MySQL Camp Lost and Found

Well, I have a lot to say about MySQL Camp, but for now, I just want to say thank you to Jay Pipes for organizing the Camp, and to Google for hosting and feeding us, and to MySQL for support.

I have a few lost and found items — if you lost or found something, please contact me by e-mailing me at awfief@gmail.com and I will see to it that you get your item back, or that found items get to the right place.

LOST:
white 4-port USB hub

FOUND:
a charger (looks like it’s for a phone, it’s small-duty)
small stack of Avery blank printer sheets, for file folder labels
Sunglasses

If you ended up with something that’s not yours, or are missing something, contact me and we’ll do as much as we can to get items to their owners.

Well, I have a lot to say about MySQL Camp, but for now, I just want to say thank you to Jay Pipes for organizing the Camp, and to Google for hosting and feeding us, and to MySQL for support.

I have a few lost and found items — if you lost or found something, please contact me by e-mailing me at awfief@gmail.com and I will see to it that you get your item back, or that found items get to the right place.

LOST:
white 4-port USB hub

FOUND:
a charger (looks like it’s for a phone, it’s small-duty)
small stack of Avery blank printer sheets, for file folder labels
Sunglasses

If you ended up with something that’s not yours, or are missing something, contact me and we’ll do as much as we can to get items to their owners.

MySQL Is a Great Database, But Do Not Use It????

I just read this post, thanks to the database log buffer:

http://marist89.blogspot.com/2006/11/where-am-i-deploying-mysql.html

I think the major point is subtle, and can be gotten by taken the first and last few sentences of the post:

“If cost were no object, I’d always deploy Oracle. I’m comfortable with Oracle technology and I think I have a pretty good idea how to implement and administer it….My company is sold on MySQL and as our confidence grows in the software, so will our installed base.”

He says that MySQL is an “80% solution” but he would not use it for “mission critical applications.” I will venture to say that this is because he is not familiar with MySQL. If he were, he would be more comfortable with the fact that MySQL is an enterprise-level, mission-critical solution. James Day recently commented on a blog post (http://sheeri.net/archives/141#comment-7043) saying,

“Well, I work for MySQL and I’m happy enough to say it: the Community Edition is fine for production systems with huge loads like Wikipedia, where two of the members of the MySQL Support team come from. Merely one of the top 20-30 web sites in the world, doing around a couple of billion queries a day.”

The article goes on to say where MySQL is good, which I actually appreciate. However, it basically says “MySQL is good as a database, but does not have as many ‘extras’ as other DBMS’s have.”

Take the first example:

The first [sweet spot of MySQL] is small to medium size OLTP databases (<100 GB) that are fronted by something like a java middle-tier. These applications typically control most of the business logic and authentication/authorization in the middle-tier (right or wrong) and use the database as a big storage bucket. These applications rely on the backend serving data as fast as it can and MySQL can serve data just as fast as the next guy.

Is there something wrong with expecting a database to be “a big storage bucket”? Isn’t that what it’s for? Isn’t that like saying “HTML is great, but it sucks for business logic.” That is the entire point — it’s the UI layer.

Similarly, databases are the storage layers. In a good theoretical model, databases are not supposed to have business logic embedded in them.

Sure, if the database does it faster, then put the business logic there. But know that you are doing that, make a conscious decision, and say “yes, we’re breaking from good theory for better performance”. Much like, say, denormalizing a table.

And MySQL isn’t “just as fast as the next guy”. It’s FASTER, otherwise why would people use it, because it’s much less featured than Oracle? This I’ll admit — however, you can code around features. No amount of code can make Oracle faster.

In the 2nd “sweet spot”, it says that MySQL excels at retrieving data. Hot damn, isn’t that what a database is supposed to do? Is Oracle not good at retrieving data? If Oracle is better at storing business logic than retrieving data, shouldn’t it be called a programming language, not a database?

In the third sweet spot, you say MySQL is good at stuffing lots of data into denormalized tables. I’ve been using MySQL for quite a long time, and I missed the part where stuffing lots of data into normalized tables is slow. It’s fast for stuffing lots of data into tables, period.

And yes, it’s snappy for retrieving data. So it’s good for putting data in and taking data out. I am not sure how this says “it is not good for mission critical applications.” It does everything a database is supposed to do. So does the blog post implying MySQL doesn’t have 24×7 reliability?

What’s your uptime with Oracle?

MySQL has fabulous uptimes, multi-threaded processing, etc. My company relies on MySQL, and we make > USD $220,000 per week.

Is it perfect? no. But is it reliable? Damn skippy.

To Jeff Hunter’s comment I say: Incremental backups exist — that’s what binary logs are.

I just read this post, thanks to the database log buffer:

http://marist89.blogspot.com/2006/11/where-am-i-deploying-mysql.html

I think the major point is subtle, and can be gotten by taken the first and last few sentences of the post:

“If cost were no object, I’d always deploy Oracle. I’m comfortable with Oracle technology and I think I have a pretty good idea how to implement and administer it….My company is sold on MySQL and as our confidence grows in the software, so will our installed base.”

He says that MySQL is an “80% solution” but he would not use it for “mission critical applications.” I will venture to say that this is because he is not familiar with MySQL. If he were, he would be more comfortable with the fact that MySQL is an enterprise-level, mission-critical solution. James Day recently commented on a blog post (http://sheeri.net/archives/141#comment-7043) saying,

“Well, I work for MySQL and I’m happy enough to say it: the Community Edition is fine for production systems with huge loads like Wikipedia, where two of the members of the MySQL Support team come from. Merely one of the top 20-30 web sites in the world, doing around a couple of billion queries a day.”

The article goes on to say where MySQL is good, which I actually appreciate. However, it basically says “MySQL is good as a database, but does not have as many ‘extras’ as other DBMS’s have.”

Take the first example:

The first [sweet spot of MySQL] is small to medium size OLTP databases (<100 GB) that are fronted by something like a java middle-tier. These applications typically control most of the business logic and authentication/authorization in the middle-tier (right or wrong) and use the database as a big storage bucket. These applications rely on the backend serving data as fast as it can and MySQL can serve data just as fast as the next guy.

Is there something wrong with expecting a database to be “a big storage bucket”? Isn’t that what it’s for? Isn’t that like saying “HTML is great, but it sucks for business logic.” That is the entire point — it’s the UI layer.

Similarly, databases are the storage layers. In a good theoretical model, databases are not supposed to have business logic embedded in them.

Sure, if the database does it faster, then put the business logic there. But know that you are doing that, make a conscious decision, and say “yes, we’re breaking from good theory for better performance”. Much like, say, denormalizing a table.

And MySQL isn’t “just as fast as the next guy”. It’s FASTER, otherwise why would people use it, because it’s much less featured than Oracle? This I’ll admit — however, you can code around features. No amount of code can make Oracle faster.

In the 2nd “sweet spot”, it says that MySQL excels at retrieving data. Hot damn, isn’t that what a database is supposed to do? Is Oracle not good at retrieving data? If Oracle is better at storing business logic than retrieving data, shouldn’t it be called a programming language, not a database?

In the third sweet spot, you say MySQL is good at stuffing lots of data into denormalized tables. I’ve been using MySQL for quite a long time, and I missed the part where stuffing lots of data into normalized tables is slow. It’s fast for stuffing lots of data into tables, period.

And yes, it’s snappy for retrieving data. So it’s good for putting data in and taking data out. I am not sure how this says “it is not good for mission critical applications.” It does everything a database is supposed to do. So does the blog post implying MySQL doesn’t have 24×7 reliability?

What’s your uptime with Oracle?

MySQL has fabulous uptimes, multi-threaded processing, etc. My company relies on MySQL, and we make > USD $220,000 per week.

Is it perfect? no. But is it reliable? Damn skippy.

To Jeff Hunter’s comment I say: Incremental backups exist — that’s what binary logs are.

Statistics Gathering Script

I posted some code to the MySQL Forge at:

http://forge.mysql.com/snippets/view.php?id=52

Basically I gather some stats on my webservers in a very crude way. It runs a bunch of commands on the commandline (I run it every 5 minutes from a cron script). Please note that it adds to the load of the database server by running these commands, and it connects a few times to the database, so it actually increases things like “total # of connections” by the sheer fact that it runs.

This should run on most Unix machines, running most versions of MySQL. Please comment to this entry if you find something wrong or an incompatibility.

An explanation of the code follows:

This script will produce a string that looks like this:
(‘value’,’value’,’value’,123,’value’),
and all you have to do is add a line at the top: “INSERT INTO table (field1,field2,field3,field4,field5) VALUES”

and change the last comma to a semicolon. Then you will have something suitable for entering into a database, for later processing.

So here is an explanation of the lines of code, in order — the first few set variables:
#!/bin/sh
DB=db_here
USER=user_here
PASS=pass_here
HOST=`/bin/hostname | /bin/cut -f1 -d\.`

/bin/echo \(\'$HOST\',\'\

“echo” sets up the fact that you are printing all this out (later you >> to a file). \ is the escape character, and the line starts with “(“. Then we put the hostname in quotes, print a comma, and then an open quote. The final \ means “continue on the next line”, and it shows up on every line hereafter, so we can build a long string and then save it to a file.

`/bin/date +%Y-%m-%d\ %T`\',\

Putting “ around a command ensures that it gets run first, and then echo uses the output. Otherwise echo would simply print out “/bin/date +%Y…..”

This gets the date and time in MySQL’s date format (‘YYYY-MM-DD HH:MM:SS’), and puts in the ‘ to end the quoted field and a comma.

`/usr/bin/mysql -u $USER -p$PASS -e "\s"| /bin/awk '/Threads/ {print $2","$7","$20}'`,\

If you type \s at the database prompt, you will get some information that looks like this:


mysql> \s
--------------
mysql Ver 12.22 Distrib 4.0.20, for sun-solaris2.8 (sparc)
Connection id: 296486
Current database:
Current user: root@localhost
SSL: Not in use
Current pager: stdout
Using outfile: ''
Server version: 4.0.20-log
Protocol version: 10
Connection: Localhost via UNIX socket
Client characterset: latin1
Server characterset: latin1
UNIX socket: /tmp/mysql.sock
Uptime: 57 days 8 hours 6 min 25 sec
Threads: 1 Questions: 9331360 Slow queries: 32 Opens: 19372 Flush tables: 1 Open tables: 64 Queries per second avg: 1.884
--------------

This gets fields 2, 7 and 20 (fields are delimited by spaces) from the line marked “Threads” — field 2 is “1” relating to # of threads, field 7 is “32” which refers to the number of slow queries, and field 20 is “1.884” which referes to “Queries per second avg”. So this line gets 3 values. Since they are numbers, they do not need to be quoted*.

`cat /proc/loadavg | /bin/awk '{print $1}'`,\

/proc has some great information in it, but you have to use cat to view them. This gets the current load average, that appears as the first argument to “uptime”. Again, this is a number and does not need to be quoted.*

`cat /proc/meminfo | /bin/awk '/MemTotal|MemFree|SwapTotal|SwapFree/ {print $2","}'` \

This prints the 2nd field (and a comma separation) from /proc/meminfo from the lines that deal with total memory, free memory, total swap and free swap space. Again, these are numbers and do not require quoting.*

`/usr/bin/mysql -u $USER -p$PASS $DB -e "show table status\G" | /bin/awk '/InnoDB free:/ {print $4}' | /usr/bin/head -1`,\

This is a very hacky way to find how much space is left in the InnoDB tablespace. I use one set of InnoDB data files for all the InnoDB tables I have, so I can just take the free space from the first table, and that is OK. If you are using 1 file per table, you will want to get each value.

`/usr/bin/mysql -u $USER -p$PASS $DB -e "show status" | /bin/awk '/Bytes_received\W|Bytes_sent\W|Com_delete\W|Com_insert\W|Com_replace\W|Com_select\W|Com_update\W|Connections\W|Questions\W|Uptime\W/ {printf $2 ","}'`\

This gets lots of information from the SHOW STATUS command of MySQL. The Com_* commands are the counts, so Com_insert is the count of inserts (since the database instance started, so you can do Com_insert/Uptime to get inserts/second if you want).

Connections is the total number of connections (not to be confused with Threads, which is the current # of threads connected). Questions is total number of queries.

I wrote this in fits and starts, and above I parsed some output of “\s”. However, those numbers can be gotten by the “SHOW STATUS” variables of Slow_queries, Threads_created and Questions/Uptime. Because I am lazy, I have not changed my scripts, although you could decrease the amount of data MySQL sends and lessen the impact of this script with one less connection if you added it to this line.

There are TONS of variables in “SHOW STATUS” that are interesting and you might want to track. Decide what you like…and you can always add them in later!

`df -k /var | /bin/awk '/dev/ {print $2 "," $4}'`,\

This just looks at the space of /var, which is where my company keeps our mysql data files.

`df -k /var/log | /bin/awk '/dev/ {print $2 "," $4}'`,\

This just looks at the space of /var/log — where my company keeps our mysql logfiles.

`/usr/bin/mysql -u $USER -p$PASS $DB -e "SELECT COUNT(DISTINCT uid) FROM Sessions\G" | /bin/awk '/COUNT/ {print $3}' `,\

This gets information from a particular table in our system — it shows how many people are online. This is useful for us to gather because now we can see how the statistics we gather are related to how many people are logged in.

`/usr/bin/iostat -dk | /bin/awk '/sda/ {print $2","$5","$6}'`\),\

This requires the “sysstat” package, and this line retrieves the I/O (disk) transactions per second, the KB read and the KB written to the disk.

>> /home/care/`echo $HOST`perf.csv

And this is the last line — finally, at last, we save it to a file.

PLEASE feel free to modify this and make it cleaner……obviously you will want to apply it to your own situation and what you want to gather.

I run this as a local script on each database machine, because it gets information using the commandline. I save everything to a file and do a weekly import. Theoretically I could have it save directly to a reporting database, but I do not want any dependence on network, just in case. You may want to have the process more automatic.

Weekly I have a cron job to “rotate” the .csv file: copy the file locally, copy it to a remote server, and then delete all but the first line (the first line being “INSERT INTO tbl VALUES(…..”).

(for those interested, this is how I delete all but the first line. Oh so hacky….)
HOST=`/bin/hostname | /bin/cut -f1 -d\.`
head -1 /home/care/`echo $HOST`perf.csv > /tmp/temp; mv /tmp/temp /home/care/`echo $HOST`perf.csv;

And for those who care, the following is the first line of my file:

REPLACE INTO dbstats (serverName,statsTime,procs,slowQueries,queriespersec,serverLoad,MemTotal
,MemFree,SwapTotal,SwapFree,ibFree,bytesRec,bytesSent,deletes,inserts,replaces,selects,updates
,connections,queries,mysqlUptime,sizeKBvar,availKBvar,sizeKBvarlog,availKBvarlog,Sessions,tps,
kbRead,kbWrtn) VALUES


* while numbers do not need to be quoted, if there’s ever a chance that the number may end up blank instead of 0, it should be quoted, as MySQL does not handle “VALUES (‘foo’,,’bar’)”
and throws an error.

I posted some code to the MySQL Forge at:

http://forge.mysql.com/snippets/view.php?id=52

Basically I gather some stats on my webservers in a very crude way. It runs a bunch of commands on the commandline (I run it every 5 minutes from a cron script). Please note that it adds to the load of the database server by running these commands, and it connects a few times to the database, so it actually increases things like “total # of connections” by the sheer fact that it runs.

This should run on most Unix machines, running most versions of MySQL. Please comment to this entry if you find something wrong or an incompatibility.

An explanation of the code follows:
Continue reading “Statistics Gathering Script”

The MySQL / Oracle Connection

So, in going around the room at mysqlcamp this morning, only one person mentioned having using another database (SQL Server & Sybase). Another mentioned Access…..but that’s not a “real” database.

But many people mentioned connections to Oracle — be it “used to work for Oracle” or “used to use Oracle”. I think this speaks to how similar Oracle and MySQL are. Now, there are PLENTY of differences. But compared to Sybase, Microsoft SQL Server, Ingres, Postgres and IBM DB2, it sounds like MySQL and Oracle are closer to each other than MySQL and any other database.

So, in going around the room at mysqlcamp this morning, only one person mentioned having using another database (SQL Server & Sybase). Another mentioned Access…..but that’s not a “real” database.

But many people mentioned connections to Oracle — be it “used to work for Oracle” or “used to use Oracle”. I think this speaks to how similar Oracle and MySQL are. Now, there are PLENTY of differences. But compared to Sybase, Microsoft SQL Server, Ingres, Postgres and IBM DB2, it sounds like MySQL and Oracle are closer to each other than MySQL and any other database.

Google’s Open Source MySQL Tools

Chip Turner, been @ Google approximately 1.5 years.

http://code.google.com/p/google-mysql-tools

Right now you have to click “source” and then browse the subversion tree’s trunk.

These are tools written at Google that are frequently used, and perhaps useful for other folks.

So what’s there, how does Google use the code, and how can other folks use it?

mypgrep is to show and kill processes among many database servers. Perhaps from certain users, IP addresses, how often certain queries are being run, long-running queries, if it’s locked, connectionless queries — anything you’d use show processlist or show full processlist on.

Uses threads to talk to multiple MySQL databases in parallel to try to be fast, but it’s pretty standard Python code.

Connectionless queries are those that have been killed, but MySQL doesn’t check for the connection until it tries to send rows back — I know I’ve had this issue a lot, when I kill a query but it still runs, I’d like it to just die.

You can specify it to include or not include replication queries.

compact_innodb uses mysqldump and reload with checks to make sure that stuff doesn’t corrupt. This defragments InnoDB tables (and others). Done offline.

Dumps all the innodb tables, then drops the tables, stops mysql, deletes the innodb datafile and logs, and then restarts mysql and reload the dumps. They’ve found the defragmentation works well.

Can do only the dump or only the restore if you want.

It does your largest tables first, to maximize I/O bandwidth.

Why not ALTER TABLE? Takes about the same amount of time, so you can dump and restore to and from different machines. Seems to be more reliable, you can kill the process without worrying about data corruption.

Why not OPTIMIZE? If you use one innodb file per table and autoextend, you can reclaim the space used by the fragmentation. Of course, if you don’t use one innodb data file per table, and just have a hard limit, then it’s not going to reclaim space.

When is it time to defragment? One answer: Disk busy time vs. queries per second — when the disk starts to take a lot longer to seek with the same # of queries per second, it’s time to defragment.

compat_* are the compatibility libraries.

Interestingly, making the source open was easy — Google gave approval, so the work was mostly removing comments that were to-do lists for folks that no longer work there.

Other useful tools?
One way to speed up replication is to start some processes which will do all physical I/O that replication threads are about to do, so the stuff to be updated is already in the cache. If your slave has enough disk drives, and you can use iostat and see that the disks are 10% busy, it probably means that you’re I/O bound by the serialized nature of replication, and you can use this method to decrease the time. If your disks are 100% busy AND I/O bound, it won’t help. This will be on the page linked above sometime next week.

Basically, it reads the binary logs, trying not to get too far ahead, and issues the queries so the disk can cache the right data.

Chip Turner, been @ Google approximately 1.5 years.

http://code.google.com/p/google-mysql-tools

Right now you have to click “source” and then browse the subversion tree’s trunk.

These are tools written at Google that are frequently used, and perhaps useful for other folks.

So what’s there, how does Google use the code, and how can other folks use it?

mypgrep is to show and kill processes among many database servers. Perhaps from certain users, IP addresses, how often certain queries are being run, long-running queries, if it’s locked, connectionless queries — anything you’d use show processlist or show full processlist on.

Uses threads to talk to multiple MySQL databases in parallel to try to be fast, but it’s pretty standard Python code.

Connectionless queries are those that have been killed, but MySQL doesn’t check for the connection until it tries to send rows back — I know I’ve had this issue a lot, when I kill a query but it still runs, I’d like it to just die.

You can specify it to include or not include replication queries.

compact_innodb uses mysqldump and reload with checks to make sure that stuff doesn’t corrupt. This defragments InnoDB tables (and others). Done offline.

Dumps all the innodb tables, then drops the tables, stops mysql, deletes the innodb datafile and logs, and then restarts mysql and reload the dumps. They’ve found the defragmentation works well.

Can do only the dump or only the restore if you want.

It does your largest tables first, to maximize I/O bandwidth.

Why not ALTER TABLE? Takes about the same amount of time, so you can dump and restore to and from different machines. Seems to be more reliable, you can kill the process without worrying about data corruption.

Why not OPTIMIZE? If you use one innodb file per table and autoextend, you can reclaim the space used by the fragmentation. Of course, if you don’t use one innodb data file per table, and just have a hard limit, then it’s not going to reclaim space.

When is it time to defragment? One answer: Disk busy time vs. queries per second — when the disk starts to take a lot longer to seek with the same # of queries per second, it’s time to defragment.

compat_* are the compatibility libraries.

Interestingly, making the source open was easy — Google gave approval, so the work was mostly removing comments that were to-do lists for folks that no longer work there.

Other useful tools?
One way to speed up replication is to start some processes which will do all physical I/O that replication threads are about to do, so the stuff to be updated is already in the cache. If your slave has enough disk drives, and you can use iostat and see that the disks are 10% busy, it probably means that you’re I/O bound by the serialized nature of replication, and you can use this method to decrease the time. If your disks are 100% busy AND I/O bound, it won’t help. This will be on the page linked above sometime next week.

Basically, it reads the binary logs, trying not to get too far ahead, and issues the queries so the disk can cache the right data.

Measuring MySQL Community

People didn’t even know about mysql planet! Is it over their heads, even if they did know about it?

User Activity — blogging and Developer Activity — blogging, patches

Some metrics in place now, blogs that mention MySQL, etc. What about experts-exchange.com, or sitepoint.com?

How do we measure community, how do we define the communities?

How do we know how our community is growing or shrinking?

Scoreboard Idea: have and opt-in to have a program that will send info to MySQL every so often.
issues: How do you get people to opt-in? Could be easy for Windows users. Getting stats from installations — incentive is that companies can see their own specific stuff, voluntary of course.

Other metrics — days to patch

Who are our community users???
1) enslistments in scoreboard
2) # patches
3) # blog entries
4) # forum posts

Easy API for: # of posts, # of posts solved? What can we offer sites like sitepoint.com and experts-exchange.com to get that data?

Privacy policy….what is it? make sure it’s moderated, so user can choose what to send, what not to send. ANYNONYMOUS.

What are the groups of users and how can we reach them?

How can services be advertised and marketed appropriately?

Simple enough for neophyte.

How much time was used in setting up the system rather than actually doing what they need? How can that be measured? Download and “powered by” timestamp….

Should we work with open source software like joomla! or drupal or phpmyadmin?

How can non-technical (or less-technical) people have a place to gather?

————————

What about hosting companies like Dreamhost?

Can you use google code search to find programs using mysql?

Technorati, slashdot, digg, delicio.us, etc. google blog search?

Surveys — what other dbs have you considered? what made you decide to download mysql? what kinds of people do you think could do what you did? (ie, your mother with no technical skills, etc). contest? maybe at the end of an installation of open source software? Maybe a yearly census type thing?

Blogs are a great thing, because lots of non-techie users use mysql. Also web-based services, do the users know they’re using mysql? do we care? blogspot and lj, stuff like that. You can download lj code if you want….but do we care about folks just using lj on the site…?

Referral program? Tell a friend how easy it was to install MySQL?

Servoy and db4free.net

Who’s using MySQL? What are they using it for? Where are they (location)? These are all categories…

People didn’t even know about mysql planet! Is it over their heads, even if they did know about it?

User Activity — blogging and Developer Activity — blogging, patches

Some metrics in place now, blogs that mention MySQL, etc. What about experts-exchange.com, or sitepoint.com?

How do we measure community, how do we define the communities?

How do we know how our community is growing or shrinking?

Scoreboard Idea: have and opt-in to have a program that will send info to MySQL every so often.
issues: How do you get people to opt-in? Could be easy for Windows users. Getting stats from installations — incentive is that companies can see their own specific stuff, voluntary of course.

Other metrics — days to patch

Who are our community users???
1) enslistments in scoreboard
2) # patches
3) # blog entries
4) # forum posts

Easy API for: # of posts, # of posts solved? What can we offer sites like sitepoint.com and experts-exchange.com to get that data?

Privacy policy….what is it? make sure it’s moderated, so user can choose what to send, what not to send. ANYNONYMOUS.

What are the groups of users and how can we reach them?

How can services be advertised and marketed appropriately?

Simple enough for neophyte.

How much time was used in setting up the system rather than actually doing what they need? How can that be measured? Download and “powered by” timestamp….

Should we work with open source software like joomla! or drupal or phpmyadmin?

How can non-technical (or less-technical) people have a place to gather?

————————

What about hosting companies like Dreamhost?

Can you use google code search to find programs using mysql?

Technorati, slashdot, digg, delicio.us, etc. google blog search?

Surveys — what other dbs have you considered? what made you decide to download mysql? what kinds of people do you think could do what you did? (ie, your mother with no technical skills, etc). contest? maybe at the end of an installation of open source software? Maybe a yearly census type thing?

Blogs are a great thing, because lots of non-techie users use mysql. Also web-based services, do the users know they’re using mysql? do we care? blogspot and lj, stuff like that. You can download lj code if you want….but do we care about folks just using lj on the site…?

Referral program? Tell a friend how easy it was to install MySQL?

Servoy and db4free.net

Who’s using MySQL? What are they using it for? Where are they (location)? These are all categories…

updated testimonials page

Certification Validation
certificate id: 74651
password: sheeri
               
                4.0 Core Certification Validation

Co-workers have said:

“It is with mixed feelings that I am writing today to tell you that Sheeri has accepted a job in the private sector and will be leaving Tufts on 22 February.

It’s hard to let go of a bright, cheerful and highly energetic co-worker like Sheeri. She always brings a lot of enthusiasm to bear on new projects and works hard at them even when they have lost their novelty and started to become tedious. Sheeri always puts her best effort into everything she takes on, so for purely selfish reasons I’m sorry to see her go.” – Bob Webber, Manager, University Systems Group, Tufts University

“Sheeri’s collaborative approach to systems’ design and administration ensured that relevant details came forth in the planning rather than testing or production stage of a project. Her care to get a job done well and prompt handling of issues enabled the team I led to be more productive than we had been under any other system administrator at Tufts.” http://tinyurl.com/q7fzv — must login to linkedin.com to view the original.

Clients have said:
“Your [technical] review has been really thorough, and you’ve gone above and beyond and done some things to make my job a lot easier!” — Brian Jepson, Editor, O’Reilly Media, Inc.

Other sites have said:
“Sheeri Kritzer is a certified MySQL administrator with an active interest and involvement in MySQL and the MySQL community. In her blog she discusses database security and performance in a practical and competent manner, while also covering events she is participating in as well as more general issues of interest to users and developers of MySQL. The blog seems to be updated regularly, and it contains many entries – all of which are well written and more extensive than one perhaps have come to expect from blogs today.” — Bill, the founder of PHPCentral.org at http://www.phpcentral.com/49-notable-mysql-blogs-reviews.html

Boston MySQL Meetup Group members have said the following about Sheeri’s presentations:

June 2006 Meetup
“Very KISS (keep it simple stupid). Presentation seemed accessible to people of different levels of technical knowledge; Jargon was kept to a minimum.”

March 2006 Meetup
“very interesting and informative”

January 2006 Meetup
“I found the meeting informative with many tips on backup techniques. Thank you for sharing your experience and for the pizza!”

“Very focused and informative presentation.”

“I thought it went rather well, I like the structure.”

Certification Validation
certificate id: 74651
password: sheeri
               
                4.0 Core Certification Validation

Co-workers have said:

“It is with mixed feelings that I am writing today to tell you that Sheeri has accepted a job in the private sector and will be leaving Tufts on 22 February.

It’s hard to let go of a bright, cheerful and highly energetic co-worker like Sheeri. She always brings a lot of enthusiasm to bear on new projects and works hard at them even when they have lost their novelty and started to become tedious. Sheeri always puts her best effort into everything she takes on, so for purely selfish reasons I’m sorry to see her go.” – Bob Webber, Manager, University Systems Group, Tufts University

“Sheeri’s collaborative approach to systems’ design and administration ensured that relevant details came forth in the planning rather than testing or production stage of a project. Her care to get a job done well and prompt handling of issues enabled the team I led to be more productive than we had been under any other system administrator at Tufts.” http://tinyurl.com/q7fzv — must login to linkedin.com to view the original.

Clients have said:
“Your [technical] review has been really thorough, and you’ve gone above and beyond and done some things to make my job a lot easier!” — Brian Jepson, Editor, O’Reilly Media, Inc.

Other sites have said:
“Sheeri Kritzer is a certified MySQL administrator with an active interest and involvement in MySQL and the MySQL community. In her blog she discusses database security and performance in a practical and competent manner, while also covering events she is participating in as well as more general issues of interest to users and developers of MySQL. The blog seems to be updated regularly, and it contains many entries – all of which are well written and more extensive than one perhaps have come to expect from blogs today.” — Bill, the founder of PHPCentral.org at http://www.phpcentral.com/49-notable-mysql-blogs-reviews.html

Boston MySQL Meetup Group members have said the following about Sheeri’s presentations:

June 2006 Meetup
“Very KISS (keep it simple stupid). Presentation seemed accessible to people of different levels of technical knowledge; Jargon was kept to a minimum.”

March 2006 Meetup
“very interesting and informative”

January 2006 Meetup
“I found the meeting informative with many tips on backup techniques. Thank you for sharing your experience and for the pizza!”

“Very focused and informative presentation.”

“I thought it went rather well, I like the structure.”

What’s Your Uptime? What’s Your Uptime Worth?

My company has 9 production MySQL servers.

Our company does:

over 4 billion queries a week — an average of over 450,000 per machine, though in reality 2 servers do near 1 billion themselves, 5 do about the average, and 2 do much less (about 65k and 100k queries).

receive over 1380 GB (almost 1.35 TB!!) of data per week, an average of over 153 GB per server.

send out over 1400 GB of data per week, an average of 157 GB per server.

Our hardware is only somewhat beefy — 64-bit architecture, 3.20 GHz Intel(R) Xeon(TM) CPU, 6GB of RAM in 4 of the 5 most-used servers (4GB in the others).

We make over USD $220,000 per week ($10 billion per year) in sales for our web application.

If we bought the highest level of service, Platinum, for all 9 production machines, the cost would be 0.40% of our sales. The cost would be less than the cost of a new IT person (even a junior IT person!), and the benefits that come with that include 72 hours of free consulting (8 hours per server).

What’s 0.40% of your company’s income? Would you be willing to spend that on buying MySQL Network, which not only includes the binary but *support*? How about willing to spend that on a great open source product and company?

What about 0.05% of your company’s income? Because that’s what the Basic Service Level would cost my company. What would it cost your company?

I’m not an employee of MySQL, so I can say things like “I am confident that the Community Edition is stable and reliable in enterprise-level production systems.” This is my personal experience, and I stick by that statement. I’ve been using it for years in heavy-use systems. Yes, there have been times where we’ve had severe bugs with MySQL, but those were all solved with an upgrade — minor version upgrade (ie, 4.1.14 to 4.1.19 solved our worst problem). (I’m not kidding — all of our issues we’ve ever asked on a forum about were resolved with an upgrade).

Let me be clear:
MySQL is not making the promise that the Community Edition is reliable in enterprise-level systems.

However, the fact is that the Community Edition IS reliable in an enterprise-level system.

Regardless of any guarantees MySQL makes or does not make, the Community Edition is robust and reliable, and I do not see that going away. I have worked for organizations that buy support contracts for open source products on principle, regardless of requirement or need, because they felt they should pay something for a good product, and wanted to give back to a community.

Ask yourself this question: If your company is not willing to pay $600/year for a certified enterprise-level product for your core, why are you using the product for your core at all? Remember, that price includes support, so that $600 price tag is even more meager.

This is not Oracle vs. Oracle Express. The Community Edition is not an afterthought. It’s the foundation.

My company has been running on MySQL for 6 years. Even if my company buys the Enterprise Edition, I will continue to use the Community Edition for not-for-profits, consulting gigs and my personal projects.

Folks who felt that MySQL was not robust enough for them stopped using it years ago, so what’s the issue with having the same excellent Community Edition that’s always been available to us?

I have yet to hear a valid complaint about this move. All I hear is “I don’t wanna pay!” This is, in my opinion, a very ethical and legal move for MySQL to take. Go ahead, whinge about “there’s a better version if you pay”. But you are forgetting — there’s a completely excellent version for free. The free version works very well on enterprise-level systems, even though MySQL will not say that formally.

My company has 9 production MySQL servers.

Our company does:

over 4 billion queries a week — an average of over 450,000 per machine, though in reality 2 servers do near 1 billion themselves, 5 do about the average, and 2 do much less (about 65k and 100k queries).

receive over 1380 GB (almost 1.35 TB!!) of data per week, an average of over 153 GB per server.

send out over 1400 GB of data per week, an average of 157 GB per server.

Our hardware is only somewhat beefy — 64-bit architecture, 3.20 GHz Intel(R) Xeon(TM) CPU, 6GB of RAM in 4 of the 5 most-used servers (4GB in the others).

We make over USD $220,000 per week ($10 billion per year) in sales for our web application.

If we bought the highest level of service, Platinum, for all 9 production machines, the cost would be 0.40% of our sales. The cost would be less than the cost of a new IT person (even a junior IT person!), and the benefits that come with that include 72 hours of free consulting (8 hours per server).

What’s 0.40% of your company’s income? Would you be willing to spend that on buying MySQL Network, which not only includes the binary but *support*? How about willing to spend that on a great open source product and company?

What about 0.05% of your company’s income? Because that’s what the Basic Service Level would cost my company. What would it cost your company?

I’m not an employee of MySQL, so I can say things like “I am confident that the Community Edition is stable and reliable in enterprise-level production systems.” This is my personal experience, and I stick by that statement. I’ve been using it for years in heavy-use systems. Yes, there have been times where we’ve had severe bugs with MySQL, but those were all solved with an upgrade — minor version upgrade (ie, 4.1.14 to 4.1.19 solved our worst problem). (I’m not kidding — all of our issues we’ve ever asked on a forum about were resolved with an upgrade).

Let me be clear:
MySQL is not making the promise that the Community Edition is reliable in enterprise-level systems.

However, the fact is that the Community Edition IS reliable in an enterprise-level system.

Regardless of any guarantees MySQL makes or does not make, the Community Edition is robust and reliable, and I do not see that going away. I have worked for organizations that buy support contracts for open source products on principle, regardless of requirement or need, because they felt they should pay something for a good product, and wanted to give back to a community.

Ask yourself this question: If your company is not willing to pay $600/year for a certified enterprise-level product for your core, why are you using the product for your core at all? Remember, that price includes support, so that $600 price tag is even more meager.

This is not Oracle vs. Oracle Express. The Community Edition is not an afterthought. It’s the foundation.

My company has been running on MySQL for 6 years. Even if my company buys the Enterprise Edition, I will continue to use the Community Edition for not-for-profits, consulting gigs and my personal projects.

Folks who felt that MySQL was not robust enough for them stopped using it years ago, so what’s the issue with having the same excellent Community Edition that’s always been available to us?

I have yet to hear a valid complaint about this move. All I hear is “I don’t wanna pay!” This is, in my opinion, a very ethical and legal move for MySQL to take. Go ahead, whinge about “there’s a better version if you pay”. But you are forgetting — there’s a completely excellent version for free. The free version works very well on enterprise-level systems, even though MySQL will not say that formally.

“I Work Best Under Pressure”

I am helping a friend with a project by doing DBA, sysadmin and project management work, because he has 2 junior programmers on staff. I have a lot of patience, so it works out OK. But just to give an example:

One programmer had “10 years’ experience” and when I asked his experience level when I first met him, because I knew my friend had been looking for junior programmers, he got all offended and “hoped” that’s not how he was thought of.

This same programmer said “I work best under pressure.” That really irked me. You know what that says to me? “I cannot prioritize; I procrastinate; I really need pressure to motivate me to get something done.”

Adrenaline is nice, but adrenaline does not make up for well-thought out steps, nor sane development processes.

“I work well under pressure” is great, but I am very suspicious of folks who need pressure to work well. This applies to any line of work — emergency personnel, military, etc. These folks are constantly thrust into highly pressurized environments, but I doubt anyone would say “I can handle saving a life well if I have 30 seconds, but if I have 30 minutes, I’m a lot less efficient.”

Adrenaline junkies need not apply.

I am helping a friend with a project by doing DBA, sysadmin and project management work, because he has 2 junior programmers on staff. I have a lot of patience, so it works out OK. But just to give an example:

One programmer had “10 years’ experience” and when I asked his experience level when I first met him, because I knew my friend had been looking for junior programmers, he got all offended and “hoped” that’s not how he was thought of.

This same programmer said “I work best under pressure.” That really irked me. You know what that says to me? “I cannot prioritize; I procrastinate; I really need pressure to motivate me to get something done.”

Adrenaline is nice, but adrenaline does not make up for well-thought out steps, nor sane development processes.

“I work well under pressure” is great, but I am very suspicious of folks who need pressure to work well. This applies to any line of work — emergency personnel, military, etc. These folks are constantly thrust into highly pressurized environments, but I doubt anyone would say “I can handle saving a life well if I have 30 seconds, but if I have 30 minutes, I’m a lot less efficient.”

Adrenaline junkies need not apply.

Example of Query Clarity with BETWEEN

A bit ago ago I posted some optimizer hints and recommended using them to write more readable queries. Today I stumbled across an example of my own work that I changed. This is not quite a case of using BETWEEN instead of AND statements, but it sort of is. Basically, I needed to reuse this query:

SELECT count(*),city,
CASE WHEN lastLogin < now ()-interval 90 day THEN 90 WHEN lastLogin < NOW()-interval 60 day THEN 60 WHEN lastLogin < NOW()-interval 30 day THEN 30 WHEN lastLogin > NOW()-interval 30 day THEN 0
ELSE -1
END
FROM . . .

I wrote this query less than a month ago.

I looked at this query today and wondered, “why did I not put lastLogin < now ()-interval 60 AND lastLogin>NOW()-interval 90 in there?” I then realized what I did.

Because the CASE statement evaluates in order, to make the query more concise I used a standard procedural programming trick — I put the older cases first. In this way, something older than 90 days gets caught in the first case, something between 60-90 days gets caught in the 2nd case, etc.

This is great, and probably even standard for a procedural language. And in fact, MySQL’s CASE statement works the same.

However, it is much more clear (in my mind) to write:

SELECT count(*),city,
CASE WHEN lastLogin < (NOW()-interval 90 day) THEN 90 WHEN lastLogin BETWEEN (NOW()-INTERVAL 60 DAY) AND (NOW()-INTERVAL 90 DAY) THEN 60 WHEN lastLogin BETWEEN (NOW()-INTERVAL 30 DAY) AND (NOW()-INTERVAL 60 DAY) THEN 30 WHEN lastLogin < (NOW()-INTERVAL 30 DAY) THEN 0 ELSE -1 END FROM . . .

It's more characters, which is less efficient. But I feel it is more readable, because it really does spell out declaratively what I am looking for.

A bit ago ago I posted some optimizer hints and recommended using them to write more readable queries. Today I stumbled across an example of my own work that I changed. This is not quite a case of using BETWEEN instead of AND statements, but it sort of is. Basically, I needed to reuse this query:

SELECT count(*),city,
CASE WHEN lastLogin < now ()-interval 90 day THEN 90 WHEN lastLogin < NOW()-interval 60 day THEN 60 WHEN lastLogin < NOW()-interval 30 day THEN 30 WHEN lastLogin > NOW()-interval 30 day THEN 0
ELSE -1
END
FROM . . .

I wrote this query less than a month ago.

I looked at this query today and wondered, “why did I not put lastLogin < now ()-interval 60 AND lastLogin>NOW()-interval 90 in there?” I then realized what I did.

Because the CASE statement evaluates in order, to make the query more concise I used a standard procedural programming trick — I put the older cases first. In this way, something older than 90 days gets caught in the first case, something between 60-90 days gets caught in the 2nd case, etc.

This is great, and probably even standard for a procedural language. And in fact, MySQL’s CASE statement works the same.

However, it is much more clear (in my mind) to write:

SELECT count(*),city,
CASE WHEN lastLogin < (NOW()-interval 90 day) THEN 90 WHEN lastLogin BETWEEN (NOW()-INTERVAL 60 DAY) AND (NOW()-INTERVAL 90 DAY) THEN 60 WHEN lastLogin BETWEEN (NOW()-INTERVAL 30 DAY) AND (NOW()-INTERVAL 60 DAY) THEN 30 WHEN lastLogin < (NOW()-INTERVAL 30 DAY) THEN 0 ELSE -1 END FROM . . .

It's more characters, which is less efficient. But I feel it is more readable, because it really does spell out declaratively what I am looking for.