Letter in my inbox…..

Dear Sheeri,

Congratulations! You have been accepted as a presenter for the O’Reilly Open Source Convention 2006 at the Oregon Convention Center July 24, 2006 – July 28, 2006.

The following has been accepted as a 45 minute session for the event:

“So you’ve inherited a MySQL Instance on Unix”

…..

——————-

I’d totally forgotten I’d even submitted that!

Guess I’m going to Oregon in July…..

Dear Sheeri,

Congratulations! You have been accepted as a presenter for the O’Reilly Open Source Convention 2006 at the Oregon Convention Center July 24, 2006 – July 28, 2006.

The following has been accepted as a 45 minute session for the event:

“So you’ve inherited a MySQL Instance on Unix”

…..

——————-

I’d totally forgotten I’d even submitted that!

Guess I’m going to Oregon in July…..

Why use varchar a lot?

So it’s totally appropriate to denormalize a table when the data you’re dealing with is variable length. Variable-length rows lead to fragmentation, whereas if you have fixed-length rows, a new row can take the place of a deleted row with no problem.

With disk space so cheap these days, why is varchar used so often? I understand that varchar is probably misrepresented — if one attribute is variable length (varchar) then all char attributes are changed to varchars. However, why employ the use of varchar instead of char?

So it’s totally appropriate to denormalize a table when the data you’re dealing with is variable length. Variable-length rows lead to fragmentation, whereas if you have fixed-length rows, a new row can take the place of a deleted row with no problem.

With disk space so cheap these days, why is varchar used so often? I understand that varchar is probably misrepresented — if one attribute is variable length (varchar) then all char attributes are changed to varchars. However, why employ the use of varchar instead of char?

When To Put Images Into MySQL?

So, most of the “I want images in MySQL” conversations are terminated with “Don’t.” Some articles say “MySQL’s overhead isn’t worth it” and others that say “If you put the images in MySQL you don’t have to deal with the filesystem overhead.”

My company’s site has over 2 million images to store/retrieve, and our current setup is at its limit and does not scale well — 2 NFS servers. We’d rather avoid adding another NFS server, because it involves changing where images are located whenever we scale. The problem is that with so many images, there are so many inodes that the filesystem cannot keep up. As well, if a server is rebooted or offline for more than an hour (ie, during maintenance) it is extremely slow until the cache catches up (about half an hour).

We believe the best retrieval method is to use MySQL. We will be doing our own speed testing, so I’m not relying on what folks say here. But I’m curious — what do other folks think? Flickr uses MySQL for image storage…..

So, most of the “I want images in MySQL” conversations are terminated with “Don’t.” Some articles say “MySQL’s overhead isn’t worth it” and others that say “If you put the images in MySQL you don’t have to deal with the filesystem overhead.”

My company’s site has over 2 million images to store/retrieve, and our current setup is at its limit and does not scale well — 2 NFS servers. We’d rather avoid adding another NFS server, because it involves changing where images are located whenever we scale. The problem is that with so many images, there are so many inodes that the filesystem cannot keep up. As well, if a server is rebooted or offline for more than an hour (ie, during maintenance) it is extremely slow until the cache catches up (about half an hour).

We believe the best retrieval method is to use MySQL. We will be doing our own speed testing, so I’m not relying on what folks say here. But I’m curious — what do other folks think? Flickr uses MySQL for image storage…..

Education vs. Experience

I hated university and graduate school. I had a liberal arts education, so I was learning theory. Now, computer science theory is boring to someone who wants to make something useful. I understand why many folks dropped out of college to go work for the internet boom.

However, in the brief 5 years of work experience I’ve had since getting my Master’s, I’ve learned that a self-taught practicum only goes so far. Whether it’s system administration or database administration, the more basic levels of problem solving do not require much insight into how the system (OS or DB) works, the deeper problems do.

Some of that you can learn by doing, but most of the concepts aren’t learnable unless you’re digging into the source code.

That being said, I found a good web page today that goes through Entity-Relationship modeling. It’s very good, comprehensive.

http://www.utexas.edu/its/windows/database/datamodeling/index.html

Perhaps a good workshop idea would be “Data Modeling Using MySQL and the Sakila sample database” — in other words, teaching generic database concepts, translating it into MySQL for folks that already use MySQL.

(As full disclosure, PostgreSQL was the standard teaching database, because at the time MySQL hadn’t been full-featured, as it is now. I’m not quite sure why PostgreSQL isn’t more popular, given that it’s been full-featured for longer. Probably either not enough marketing, or that MySQL is just more intuitive for the self-taught folks…..)

And oh, the irony: “Dictionary.com Word of the Day – autodidact: one who is self-taught.”

I hated university and graduate school. I had a liberal arts education, so I was learning theory. Now, computer science theory is boring to someone who wants to make something useful. I understand why many folks dropped out of college to go work for the internet boom.

However, in the brief 5 years of work experience I’ve had since getting my Master’s, I’ve learned that a self-taught practicum only goes so far. Whether it’s system administration or database administration, the more basic levels of problem solving do not require much insight into how the system (OS or DB) works, the deeper problems do.

Some of that you can learn by doing, but most of the concepts aren’t learnable unless you’re digging into the source code.

That being said, I found a good web page today that goes through Entity-Relationship modeling. It’s very good, comprehensive.

http://www.utexas.edu/its/windows/database/datamodeling/index.html

Perhaps a good workshop idea would be “Data Modeling Using MySQL and the Sakila sample database” — in other words, teaching generic database concepts, translating it into MySQL for folks that already use MySQL.

(As full disclosure, PostgreSQL was the standard teaching database, because at the time MySQL hadn’t been full-featured, as it is now. I’m not quite sure why PostgreSQL isn’t more popular, given that it’s been full-featured for longer. Probably either not enough marketing, or that MySQL is just more intuitive for the self-taught folks…..)

And oh, the irony: “Dictionary.com Word of the Day – autodidact: one who is self-taught.”

How many different db problems can happen in one day?

So, today I have had the following problems on various databases:

1) lack of hosts file causing intermittent connectivity
2) running out of space in innodb tablespace
3) a backup causing undue slowness
4) a data partition running out of space and corrupting the binary log, thus making the slaves not replicate
5) mysql server not starting after I added space (see 2) because I put the same logfile in twice

All in all, not bad. Only problem 4 caused me to furrow my brow to fix it, because I hadn’t seen anything like it on the list, but one of the first google hits led me to the cause. Basically the symptom was a slave server that would not start replication, with the error in mysqld.err only (not in SHOW SLAVE STATUS — that would always show NULL seconds behind). But Google immediately gave me the answer.

What a day. Luckily I slept for 11 hours last night; I’ve been working 12 so far!

So, today I have had the following problems on various databases:

1) lack of hosts file causing intermittent connectivity
2) running out of space in innodb tablespace
3) a backup causing undue slowness
4) a data partition running out of space and corrupting the binary log, thus making the slaves not replicate
5) mysql server not starting after I added space (see 2) because I put the same logfile in twice

All in all, not bad. Only problem 4 caused me to furrow my brow to fix it, because I hadn’t seen anything like it on the list, but one of the first google hits led me to the cause. Basically the symptom was a slave server that would not start replication, with the error in mysqld.err only (not in SHOW SLAVE STATUS — that would always show NULL seconds behind). But Google immediately gave me the answer.

What a day. Luckily I slept for 11 hours last night; I’ve been working 12 so far!

binlog feature?

So, I have what I consider a feature request for binlog. I’d like there to be an option to log DDL only or DML only, or both DDL and DML (current functionality).

Sure, I can take a diff of the schema, or grep for TABLE or INDEX and put that into its own logfile; however, this is something that really should be an option to mysqlbinlog.

(http://bugs.mysql.com/bug.php?id=16916)

So, I have what I consider a feature request for binlog. I’d like there to be an option to log DDL only or DML only, or both DDL and DML (current functionality).

Sure, I can take a diff of the schema, or grep for TABLE or INDEX and put that into its own logfile; however, this is something that really should be an option to mysqlbinlog.

(http://bugs.mysql.com/bug.php?id=16916)

Book Review: Time Management for System Administrators

I picked up this book for a few reasons:

1) I have a sysadmin background, and it helps me understand databases immensely (though not as much as the actual database courses I’ve had).

2) The author is a friend of mine.

3) The author is an exceptional sysadmin and writer.

4) I watched the Google video presentation he gave to a user group about the book, and wanted to learn more.

So, without ado, here is my full review of the book:

Firstly, let’s delve into why system administrators (and database administrators!) need a book on time management. Most folks have too much on their plate. System administrators, however, have a lot on their plate in terms of projects, but are also interrupted many times per day with numerous situations.

System administrators have the situation where folks come to them with a problem, expecting immediate attention, much like a retail cashier — “I’m here, I’d like to make a purchase, please help me.” If the cashier is busy doing other things, such as stocking merchandise, and takes a while to acknowledge the customer, the customer gets upset.

However, system administrators have large projects looming — they need to reconcile “create this account” with “rebuild the network infrastructure.”

Tom Limoncelli’s book, “Time Management for System Administrators” deals with this quandary. He focuses on making the work environment sane and handleable, so a sysadmin feels more accomplished (and his boss feels he is, too). However, this book is also focused on keeping folks who need work from you happy. Limoncelli reforms the idea of an overworked, grumpy sysadmin, and shows you how you can transform yourself.

In the first part of Chapter 2, “Focus Versus Interruptions”, he states:

“You might say that this chapter teaches you how to keep yourself focused and deal with interruptions without being a jerk.”

Limoncelli has the uncanny way of answering my questions with the next paragraph. For instance, I was a bit wary of his “Cycle system”, and thought, “well, I could use Life Balance for that, couldn’t I? And it’s really better!” And lo and behold, on page 75, he gives that option.

The values of routines and automation are discussed, as well as how to set those up. It’s all well and good to say “you should automate things,” but Limoncelli explains how to do that for tasks. He explains how to get over ‘mental garbage’ that blocks us from improving ourselves. There’s an entire chapter devoted to email management, which comes after one on stress management. And yes, he even delves into such sticky issues as documentation and prioritization.

All of his tips are useful. I did not find myself disagreeing with any tips — they either fell into the category of “I’m already doing it,” “I should do it now,” and “I’m not ready to do that now, but I will do it in the future.”

This is a book I would read once or twice a year and keep refining my techniques based on it.

I intend to give this book to anyone who works in this type of environment — sysadmins I know, in particular, but DBAs as well.

I picked up this book for a few reasons:

1) I have a sysadmin background, and it helps me understand databases immensely (though not as much as the actual database courses I’ve had).

2) The author is a friend of mine.

3) The author is an exceptional sysadmin and writer.

4) I watched the Google video presentation he gave to a user group about the book, and wanted to learn more.

So, without ado, here is my full review of the book:

Continue reading “Book Review: Time Management for System Administrators”

mysqldumpslow

Wow! I just stumbled across mysqldumpslow (while RTFM’ing) and what a tool!

It goes through the slow query log you give as an argument and tallies up the number of times each query appears and a lot of other summary data. Not knowing about this before, it makes it easy to prioritize which slow queries to work on first.

Wow! I just stumbled across mysqldumpslow (while RTFM’ing) and what a tool!

It goes through the slow query log you give as an argument and tallies up the number of times each query appears and a lot of other summary data. Not knowing about this before, it makes it easy to prioritize which slow queries to work on first.

I Have Created a Monster

Tonight is the 2nd MySQL User Group meeting. The group has 114 people registered. I saw a clear need to make the group back in October, given that there were 32 people waiting for a Boston MySQL User Group.

And of course, Boston being a major high-tech area, folks are driving in from a 30-mile radius (New Hampshire and places out on 495) to attend.

Tonight is the 2nd MySQL User Group meeting. The group has 114 people registered. I saw a clear need to make the group back in October, given that there were 32 people waiting for a Boston MySQL User Group.

And of course, Boston being a major high-tech area, folks are driving in from a 30-mile radius (New Hampshire and places out on 495) to attend.

Aliases

How much time do you spend every week typing in ‘mysqladmin -u root -p showprocesslist’ ? or ‘mysqladmin -u root -p kill 123,456,789 ? After listening to Tom Limoncelli‘s video presentation of a workshop called “Time Management for System Administrators” (which was the forerunner to the O’Reilly book of the same name), I realized that I could implement some of his tips right away.

Specifically, aliasing. I’ve now aliased the following in my .bash_profile:

alias myps=’mysqladmin -u root -p processlist’
alias mystatus=’mysql -u root -p -e “show status; status;”‘
alias mykill=’mysqladmin -u root -p kill’
alias mysqlr=’mysql -u root -p’

I’d rather not put passwords into my profile, and this makes for easy copying. I use CVS instead of copying to track and easily update changes to my .bash_profile. But I realized while watching the recording that there are many commands, particularly relating to mysql, where I end up typing a lot for the same command over and over.

How much time do you spend every week typing in ‘mysqladmin -u root -p showprocesslist’ ? or ‘mysqladmin -u root -p kill 123,456,789 ? After listening to Tom Limoncelli‘s video presentation of a workshop called “Time Management for System Administrators” (which was the forerunner to the O’Reilly book of the same name), I realized that I could implement some of his tips right away.

Specifically, aliasing. I’ve now aliased the following in my .bash_profile:

alias myps=’mysqladmin -u root -p processlist’
alias mystatus=’mysql -u root -p -e “show status; status;”‘
alias mykill=’mysqladmin -u root -p kill’
alias mysqlr=’mysql -u root -p’

I’d rather not put passwords into my profile, and this makes for easy copying. I use CVS instead of copying to track and easily update changes to my .bash_profile. But I realized while watching the recording that there are many commands, particularly relating to mysql, where I end up typing a lot for the same command over and over.