I think many people truly realized how much they take the MySQL documentation for granted during the recent multi-hour outage from mysql.com’s data center. Apparently there is a lot of FUD floating around about the legality of mirroring the documentation, as presented by Justin Swanhart and asked by Mark Callaghan.
The manual page at http://dev.mysql.com/doc/refman/5.1/en/copyright-mysql.html says:
You shall not publish or distribute this documentation in any form or on any media, except if you distribute the documentation in a manner similar to how Sun disseminates it (that is, electronically for download on a Web site with the software) or on a CD-ROM or similar medium, provided however that the documentation is disseminated together with the software on the same medium.
Now, I am not a lawyer, however, to me this means that you can indeed mirror the documentation, so long as you mirror the binaries as well. Giuseppe commented on Mark’s post (linked above) saying “There is no license restriction to mirror the docs.”
Note that I played a part in unknowingly spreading the FUD — I thought special permission was needed to mirror the documentation (and binaries) and indeed, it is not.
As a postscript, what are everyone’s favorite site mirror programs? Searching http://www.ohloh.net for an open-source website mirror did not reveal anything very popular, though I am sure there are a few “standard” mirroring tools that folks use. (Perhaps I should have searched for spiders, and seen which spiders have sync/download capabilities?)
I think many people truly realized how much they take the MySQL documentation for granted during the recent multi-hour outage from mysql.com’s data center. Apparently there is a lot of FUD floating around about the legality of mirroring the documentation, as presented by Justin Swanhart and asked by Mark Callaghan.
The manual page at http://dev.mysql.com/doc/refman/5.1/en/copyright-mysql.html says:
You shall not publish or distribute this documentation in any form or on any media, except if you distribute the documentation in a manner similar to how Sun disseminates it (that is, electronically for download on a Web site with the software) or on a CD-ROM or similar medium, provided however that the documentation is disseminated together with the software on the same medium.
Now, I am not a lawyer, however, to me this means that you can indeed mirror the documentation, so long as you mirror the binaries as well. Giuseppe commented on Mark’s post (linked above) saying “There is no license restriction to mirror the docs.”
Note that I played a part in unknowingly spreading the FUD — I thought special permission was needed to mirror the documentation (and binaries) and indeed, it is not.
As a postscript, what are everyone’s favorite site mirror programs? Searching http://www.ohloh.net for an open-source website mirror did not reveal anything very popular, though I am sure there are a few “standard” mirroring tools that folks use. (Perhaps I should have searched for spiders, and seen which spiders have sync/download capabilities?)
OpenSQLCamp 2009 is happening “in parallel to the Free and Open Source Conference 2009 (FrOSCon) on Saturday 22nd and Sunday 23rd August in St. Augustin, Germany …. close to Bonn and Cologne.”
I plan on being at FrOSCon and OpenSQLCamp. Where I go before and after that is up to *you*. Yes, that is right, perhaps I will visit a user group, such as France’s MySQL User Group. Or perhaps your company needs the type of services Pythian can offer — we can do the “traditional consulting” model where we look over your systems for performance tuning and security gains, or fix problems in an emergency. Even more of a win, we specialize in recurring engineering — we can supplement your existing IT staff with database expertise, and do all the database work your current staff does not have time to do.
We do it all, whether it’s large project work that would ordinarily be farmed out to consultants such as new schema development, ETL and reporting scripts and data warehouse creation; or the ordinary but important stuff that keeps the business running like ensuring backups work, constructing a no-downtime architecture, making the development environment realistic, testing failover, and capacity planning.
If you have a mixed environment, that’s OK too — we cover the major database vendors (open and closed source) and have a team of system administrators if you need help in that arena.
Keep in mind — I’m not a sales person, I’m a tech person. Day-to-day I not only lead a team of MySQL DBA’s, I am one myself, and have my hands in the tech stuff all the time. Most of the reasons I like working at Pythian are the same reasons our clients love us — 24×7 coverage to me means “my pager only goes off past 5 pm if it’s the one weekend a month I’m oncall”, to a client it means “whoever is paged is already awake most of the time.”
So if you would like to sit down with me and have a drink, just say “hi”, speak at an upcoming User Group, or talk about what Pythian can do for you, comment here, or e-mail me at “cabral at pythian dot com” and we will work something out. I can come to your European location, meet you at FrOSCon or OpenSQLCamp, or we can arrange a meeting at our Prague office sometime around the end of August.
OpenSQLCamp 2009 is happening “in parallel to the Free and Open Source Conference 2009 (FrOSCon) on Saturday 22nd and Sunday 23rd August in St. Augustin, Germany …. close to Bonn and Cologne.”
I plan on being at FrOSCon and OpenSQLCamp. Where I go before and after that is up to *you*. Yes, that is right, perhaps I will visit a user group, such as France’s MySQL User Group. Or perhaps your company needs the type of services Pythian can offer — we can do the “traditional consulting” model where we look over your systems for performance tuning and security gains, or fix problems in an emergency. Even more of a win, we specialize in recurring engineering — we can supplement your existing IT staff with database expertise, and do all the database work your current staff does not have time to do.
We do it all, whether it’s large project work that would ordinarily be farmed out to consultants such as new schema development, ETL and reporting scripts and data warehouse creation; or the ordinary but important stuff that keeps the business running like ensuring backups work, constructing a no-downtime architecture, making the development environment realistic, testing failover, and capacity planning.
If you have a mixed environment, that’s OK too — we cover the major database vendors (open and closed source) and have a team of system administrators if you need help in that arena.
Keep in mind — I’m not a sales person, I’m a tech person. Day-to-day I not only lead a team of MySQL DBA’s, I am one myself, and have my hands in the tech stuff all the time. Most of the reasons I like working at Pythian are the same reasons our clients love us — 24×7 coverage to me means “my pager only goes off past 5 pm if it’s the one weekend a month I’m oncall”, to a client it means “whoever is paged is already awake most of the time.”
So if you would like to sit down with me and have a drink, just say “hi”, speak at an upcoming User Group, or talk about what Pythian can do for you, comment here, or e-mail me at “cabral at pythian dot com” and we will work something out. I can come to your European location, meet you at FrOSCon or OpenSQLCamp, or we can arrange a meeting at our Prague office sometime around the end of August.
A short time ago I posted how I was Using XtraDB Backup to backup InnoDB. Overall, the blog post was positive, but experiences that others have had (and commented to on that blog post) have made me want to put out another short article about using XtraDB backup.
The first few points remain the same — the backup process is stable, we were able to use the binaries without compiling, and using Innobackupex as the wrapper script, analogous to Innobackup.pl.
However, we did figure out why Xtrabackup had to be run as the mysql user:
Xtrabackup writes to the data dictionary file (ibdata1, for example). We have not examined if it also writes to the data and index files (still ibdata1 by default, or the .ibd files when using innodb_file_per_table). [EDIT: The authors of Xtrabackup have commented below as to why the write occurs:
xtrabackup is kind of very small and restricted instance of InnoDB, and reuses a lot of InnoDB code.
InnoDB by default requires O_RDWR option on ibdata1 files at start, and xtrabackup therefore also did that. In the latest push to trunk it was fixed, now xtrabackup opens file with O_RDONLY flag.
When the new version is released, we will be sure to test it so that we can run the backup as a read-only user, and report back.]
On the one hand, Xtrabackup is a free tool. On the other hand, modifying InnoDB’s underlying files risks corrupting all the InnoDB tables in the system. Which is a tricky situation when it is your backup tool that might cause corruption that is beyond repair, as you do not know if you can trust your backups.
Regarding the complaint Dan R commented on the previous post that innobackupex could stream the backup to gzip, and helpfully gave the syntax. Shlomi Noach also pointed out streaming as a feature of Xtrabackup that ibbackup does not have. However, Gerry Narvaja, a co-worker noted (and commented):
I’ve been trying to install xtrabackup/innobackupex for a customer and I’m finding a few glitches, especially w/ streaming:
We use innoback(ex) wrapped in our own scripts to allow for rotation and other operations. We also use ZRM for some installations, so this would apply to integration with this tool as well. These are the glitches I found:
1. Using streaming by piping it into ‘gzip’ masks the return code from innobackupex.pl. Since gzip will almost always return 0, you can’t rely on it to determine backup success.
2. The next alternative would be to review the innobackupex.pl’s output for the OK at the end. But since it redirects the output to ’stderr’ to allow for streaming, you need to add “2> innobackupex.log” before piping and grep for the “OK” at the end.
and noted that there were some limitations:
innobackupex script is limited in the type of options you can specify compared to what the xtrabackup executable supports. I find this annoying since it limits the directories where you can have the backups, data directories and logs.
Xtrabackup doesn’t work for MySQL v4.1. In the Percona forums there was a suggestion that the 5.0 patch should work. This is true, but xtrabackup.c has other dependencies on 5.x definitions and structures I didn’t have time to review. Baron Schwartz correctly suggested that these dependencies might be trivial in a tweet directed to myself. I’ll post my findings to the Percona forums and hopefully we can soon have a patched version.
A short time ago I posted how I was Using XtraDB Backup to backup InnoDB. Overall, the blog post was positive, but experiences that others have had (and commented to on that blog post) have made me want to put out another short article about using XtraDB backup.
The first few points remain the same — the backup process is stable, we were able to use the binaries without compiling, and using Innobackupex as the wrapper script, analogous to Innobackup.pl.
However, we did figure out why Xtrabackup had to be run as the mysql user:
Xtrabackup writes to the data dictionary file (ibdata1, for example). We have not examined if it also writes to the data and index files (still ibdata1 by default, or the .ibd files when using innodb_file_per_table). [EDIT: The authors of Xtrabackup have commented below as to why the write occurs:
xtrabackup is kind of very small and restricted instance of InnoDB, and reuses a lot of InnoDB code.
InnoDB by default requires O_RDWR option on ibdata1 files at start, and xtrabackup therefore also did that. In the latest push to trunk it was fixed, now xtrabackup opens file with O_RDONLY flag.
When the new version is released, we will be sure to test it so that we can run the backup as a read-only user, and report back.]
On the one hand, Xtrabackup is a free tool. On the other hand, modifying InnoDB’s underlying files risks corrupting all the InnoDB tables in the system. Which is a tricky situation when it is your backup tool that might cause corruption that is beyond repair, as you do not know if you can trust your backups.
Regarding the complaint Dan R commented on the previous post that innobackupex could stream the backup to gzip, and helpfully gave the syntax. Shlomi Noach also pointed out streaming as a feature of Xtrabackup that ibbackup does not have. However, Gerry Narvaja, a co-worker noted (and commented):
I’ve been trying to install xtrabackup/innobackupex for a customer and I’m finding a few glitches, especially w/ streaming:
We use innoback(ex) wrapped in our own scripts to allow for rotation and other operations. We also use ZRM for some installations, so this would apply to integration with this tool as well. These are the glitches I found:
1. Using streaming by piping it into ‘gzip’ masks the return code from innobackupex.pl. Since gzip will almost always return 0, you can’t rely on it to determine backup success.
2. The next alternative would be to review the innobackupex.pl’s output for the OK at the end. But since it redirects the output to ’stderr’ to allow for streaming, you need to add “2> innobackupex.log” before piping and grep for the “OK” at the end.
and noted that there were some limitations:
innobackupex script is limited in the type of options you can specify compared to what the xtrabackup executable supports. I find this annoying since it limits the directories where you can have the backups, data directories and logs.
Xtrabackup doesn’t work for MySQL v4.1. In the Percona forums there was a suggestion that the 5.0 patch should work. This is true, but xtrabackup.c has other dependencies on 5.x definitions and structures I didn’t have time to review. Baron Schwartz correctly suggested that these dependencies might be trivial in a tweet directed to myself. I’ll post my findings to the Percona forums and hopefully we can soon have a patched version.
Today’s trivial MySQL system variable:
old_alter_table
The interesting bit is that this is a system variable, and shows up in SHOW GLOBAL VARIABLES, but is not documented on the Server System Variables manual page.
Instead, it is documented on the manual page for Server options.
Unfortunately, that documentation is very sparse. It tells us:
old-alter-table
is an option that can be set in an option file (such as /etc/my.cnf)
old_alter_table
is the name of the variable.
And….that’s it. It is neither a system nor status variable, there is no scope, and no entry for whether or not it is dynamic. These last 2 are baffling, though they show up in other variables in the matrix on the “server options” manual page. The scope is either session or global; there is no NULL — the scope may not be *relevant*, but it still exists.
As well, either the variable can be settable on commandline, or not. There is no meaning to a NULL value in that column.
The most egregious issue is that there is no documentation whatsoever about what the variable does. What happens if I set old_alter_table to ON?
(My guess is that it’s a placeholder for the behavior of ALTER TABLE in 5.0 and earlier, perhaps it disables the use of ALTER TABLE ONLINE? My second guess is that whatever the functionality, it is not implemented yet, similar to date_format).
Today’s trivial MySQL system variable:
old_alter_table
The interesting bit is that this is a system variable, and shows up in SHOW GLOBAL VARIABLES, but is not documented on the Server System Variables manual page.
Instead, it is documented on the manual page for Server options.
Unfortunately, that documentation is very sparse. It tells us:
old-alter-table
is an option that can be set in an option file (such as /etc/my.cnf)
old_alter_table
is the name of the variable.
And….that’s it. It is neither a system nor status variable, there is no scope, and no entry for whether or not it is dynamic. These last 2 are baffling, though they show up in other variables in the matrix on the “server options” manual page. The scope is either session or global; there is no NULL — the scope may not be *relevant*, but it still exists.
As well, either the variable can be settable on commandline, or not. There is no meaning to a NULL value in that column.
The most egregious issue is that there is no documentation whatsoever about what the variable does. What happens if I set old_alter_table to ON?
(My guess is that it’s a placeholder for the behavior of ALTER TABLE in 5.0 and earlier, perhaps it disables the use of ALTER TABLE ONLINE? My second guess is that whatever the functionality, it is not implemented yet, similar to date_format).
The July meeting of the Boston MySQL User Group will feature Eric Day, a prominent Drizzle developer, talking about Drizzle and Gearman:
In this talk we will discuss two growing technologies: Drizzle and Gearman.
We will explain what the Drizzle project is, what we aim to accomplish, and an overview of where we are at. We will also be introducing the fundamentals of how to leverage Gearman, an open-source, distributed job queuing system. Gearman’s generic design allows it to be used as a building block for almost any use – from speeding up your website to building your own Map/Reduce cluster. We will tie Drizzle and Gearman together and demonstrate how they work in a custom Search Engine application.
————————
Here is the URL for MIT’s Map with the location of this building:
http://whereis.mit.edu/map-jpg?selection=E51&Buildings=go
This map shows the MBTA Kendall Stop:
http://whereis.mit.edu/map-jpg?selection=L5&Landmarks=go
(the stop is in red on that map, and you can see E51 in the bottom right)
Here are the URL’s for the parking lots (free and open to the public after 3 pm):
http://whereis.mit.edu/map-jpg?selection=P4&Parking=go
http://whereis.mit.edu/map-jpg?selection=P5&Parking=go
Free pizza and soda will be served, so please RSVP accurately.
To RSVP anonymously, please login to the Meetup site with the e-mail address “admin at sheeri dot com” and the password “guest”.
For more information, see: http://mysql.meetup.com/137/
The July meeting of the Boston MySQL User Group will feature Eric Day, a prominent Drizzle developer, talking about Drizzle and Gearman:
In this talk we will discuss two growing technologies: Drizzle and Gearman.
We will explain what the Drizzle project is, what we aim to accomplish, and an overview of where we are at. We will also be introducing the fundamentals of how to leverage Gearman, an open-source, distributed job queuing system. Gearman’s generic design allows it to be used as a building block for almost any use – from speeding up your website to building your own Map/Reduce cluster. We will tie Drizzle and Gearman together and demonstrate how they work in a custom Search Engine application.
————————
Here is the URL for MIT’s Map with the location of this building:
http://whereis.mit.edu/map-jpg?selection=E51&Buildings=go
This map shows the MBTA Kendall Stop:
http://whereis.mit.edu/map-jpg?selection=L5&Landmarks=go
(the stop is in red on that map, and you can see E51 in the bottom right)
Here are the URL’s for the parking lots (free and open to the public after 3 pm):
http://whereis.mit.edu/map-jpg?selection=P4&Parking=go
http://whereis.mit.edu/map-jpg?selection=P5&Parking=go
Free pizza and soda will be served, so please RSVP accurately.
To RSVP anonymously, please login to the Meetup site with the e-mail address “admin at sheeri dot com” and the password “guest”.
For more information, see: http://mysql.meetup.com/137/
This is the 150th edition of Log Buffer, the weekly review of database blogs. Someone accidentally left Dave Edwards‘ cage unlocked, and he escaped, thus leaving me with the pleasurable duty of compiling the 150th weekly Log Buffer.
Many people other than Dave are finding release this week. Giuseppe Maxia explains some details of MySQL’s New Release Model. Andrew Morgan announces a New MySQL Cluster Maintenance Release. Aleksandr Kuzminsky of the MySQL Performance Blog releases build16 of the Percona MySQL binaries (versions 5.0.77 and 5.0.82), which adds some 5.4 features and fixes some bugs.
Darran Cassar, the MySQL Preacher, has created a package for Security Roles and Password Expiry on MySQL. And for a future MySQL release, look for Two New Status Variable Patches, for query invalidation count and a last received datetime for replication heartbeat. These patches were contributed by MySQL Support Team member Andrew Hutchins.
Dave Beulke points out a new feature in DB2 9.7 — DB2 Compatible with Oracle.
To prepare for a future SQL Server release where CREATE DEFAULT, sp_bindefault and sp_unbindefault will be deprecated, Martin Bell advocates Changing Bound Defaults to Default Constraints. If you are going to upgrade SQL Server, definitely look at the notes from SQL Master of SQL Server QA’s presentation on SQL Server Upgrade Issues and How To Evaluate Potential Issues.
Stewart Smith lets is know that Drizzle Tarballs for the Next Milestone – Aloha are being released weekly. Meanwhile, Jay Pipes and the rest of the Drizzle team find, fix and explain the cause of a performance regression in Drizzle Performance Regression Solved – TCMalloc vs. No TCMalloc.
Lenz Grimmer has started organizing OpenSQLCamp 2009, Aug 22-23 in Germany; he posts details and links in Speaking at FrOSCon and Organizing the OpenSQLCamp 2009, European Edition. If you want to speak, Lenz also lets you know that Ronald Bradford gives out a discount code and reminds us that we can still attend OSCon 2009 at a Discounted Rate (until June 23rd). OSCon 2009 will be held July 20-24 in San Jose, California. And in Iowa, Michelle Ufford sends out the East Iowa SQL Saturday Call for Speakers to be held on Saturday, October 10, 2009.
Getting back to basics, Richard Foote explains Oracle’s cost-based optimizer in CBO and Indexes, an Introduction for Absolute Beginners. Speaking of optimizations, Valcora has Another Way To Do Performance Tuning — make sure you actually need the queries that are running against your system!
Tanel Poder points to a blog post on Using Perfsheet and TPT Scripts for Solving Real Life Performance Problems in an Oracle RAC environment. And Jonathan Lewis provides a script you can run if you are concerned about the potantial of Oracle PGA leaks. Over at Oraclue, Miladin Modrakovic shows how to discover memory “leaks and other problems with allocations of memory” in Memory Annotations and Oradebug.
If you are migrating a database from Oracle to MySQL, you may be interested in George Trujillo’s process of Converting an Oracle Schema to MySQL.
Kimberly L. Tripp reveals a lot of information about how SQL Server optimizes queries and common myths when she reveals The Tipping Point Query Answers. David Fetter shares Materialized Views Performance Tips in Postgres, and Leo Hsu and Regina Obe talk about Planner Statistics in the Postgres optimizer.
In the land of DB2, Henrik Loeser shares a PureXML Performance Tip: A Sequence of Good Indexes.
Coskan explains
How to Use Sysman Schema Without Oracle Enterprise Manager. John Hallas notes that using Oracle’s EM to migrate a database to ASM is easy, but seems slow, in ASM Metadata and Migrating a Database to ASM. He then goes on to share a coworker’s Script to Backup ASM Metadata. J. Arneil shows how to go about Fixing up ASM Disk Header Corruption, should you find yourself in a rough spot.
Aaron Alton has a great article telling us that in Defensive Programming, Assumptions Must be Guaranteed or Tested, and another one on handling tags efficiently in Full Text Search vs. Denormalized Tables Remus Rusanu provides a Transact-SQL stored procedure template for Exception Handling and Nested Transactions.
I’ll end with a link to another survey on What’s the hardest part of becoming an involuntary DBA? It’s one simple question, so go fill it out! You have the time, especially since Craig Mullins points out that on average, we got a 4.6% salary increase in 2008 in Salaries for Data Professionals Inching Upward. To learn more and become even better in your field, get a 15-day free trial to Safari Books Online from O’Reilly, with a 15% discount if you continue past the free trial, courtesy of Susan Visser.
This is the 150th edition of Log Buffer, the weekly review of database blogs. Someone accidentally left Dave Edwards‘ cage unlocked, and he escaped, thus leaving me with the pleasurable duty of compiling the 150th weekly Log Buffer.
Many people other than Dave are finding release this week. Giuseppe Maxia explains some details of MySQL’s New Release Model. Andrew Morgan announces a New MySQL Cluster Maintenance Release. Aleksandr Kuzminsky of the MySQL Performance Blog releases build16 of the Percona MySQL binaries (versions 5.0.77 and 5.0.82), which adds some 5.4 features and fixes some bugs.
Darran Cassar, the MySQL Preacher, has created a package for Security Roles and Password Expiry on MySQL. And for a future MySQL release, look for Two New Status Variable Patches, for query invalidation count and a last received datetime for replication heartbeat. These patches were contributed by MySQL Support Team member Andrew Hutchins.
Dave Beulke points out a new feature in DB2 9.7 — DB2 Compatible with Oracle.
To prepare for a future SQL Server release where CREATE DEFAULT, sp_bindefault and sp_unbindefault will be deprecated, Martin Bell advocates Changing Bound Defaults to Default Constraints. If you are going to upgrade SQL Server, definitely look at the notes from SQL Master of SQL Server QA’s presentation on SQL Server Upgrade Issues and How To Evaluate Potential Issues.
Stewart Smith lets is know that Drizzle Tarballs for the Next Milestone – Aloha are being released weekly. Meanwhile, Jay Pipes and the rest of the Drizzle team find, fix and explain the cause of a performance regression in Drizzle Performance Regression Solved – TCMalloc vs. No TCMalloc.
Lenz Grimmer has started organizing OpenSQLCamp 2009, Aug 22-23 in Germany; he posts details and links in Speaking at FrOSCon and Organizing the OpenSQLCamp 2009, European Edition. If you want to speak, Lenz also lets you know that Ronald Bradford gives out a discount code and reminds us that we can still attend OSCon 2009 at a Discounted Rate (until June 23rd). OSCon 2009 will be held July 20-24 in San Jose, California. And in Iowa, Michelle Ufford sends out the East Iowa SQL Saturday Call for Speakers to be held on Saturday, October 10, 2009.
Getting back to basics, Richard Foote explains Oracle’s cost-based optimizer in CBO and Indexes, an Introduction for Absolute Beginners. Speaking of optimizations, Valcora has Another Way To Do Performance Tuning — make sure you actually need the queries that are running against your system!
Tanel Poder points to a blog post on Using Perfsheet and TPT Scripts for Solving Real Life Performance Problems in an Oracle RAC environment. And Jonathan Lewis provides a script you can run if you are concerned about the potantial of Oracle PGA leaks. Over at Oraclue, Miladin Modrakovic shows how to discover memory “leaks and other problems with allocations of memory” in Memory Annotations and Oradebug.
If you are migrating a database from Oracle to MySQL, you may be interested in George Trujillo’s process of Converting an Oracle Schema to MySQL.
Kimberly L. Tripp reveals a lot of information about how SQL Server optimizes queries and common myths when she reveals The Tipping Point Query Answers. David Fetter shares Materialized Views Performance Tips in Postgres, and Leo Hsu and Regina Obe talk about Planner Statistics in the Postgres optimizer.
In the land of DB2, Henrik Loeser shares a PureXML Performance Tip: A Sequence of Good Indexes.
Coskan explains
How to Use Sysman Schema Without Oracle Enterprise Manager. John Hallas notes that using Oracle’s EM to migrate a database to ASM is easy, but seems slow, in ASM Metadata and Migrating a Database to ASM. He then goes on to share a coworker’s Script to Backup ASM Metadata. J. Arneil shows how to go about Fixing up ASM Disk Header Corruption, should you find yourself in a rough spot.
Aaron Alton has a great article telling us that in Defensive Programming, Assumptions Must be Guaranteed or Tested, and another one on handling tags efficiently in Full Text Search vs. Denormalized Tables Remus Rusanu provides a Transact-SQL stored procedure template for Exception Handling and Nested Transactions.
I’ll end with a link to another survey on What’s the hardest part of becoming an involuntary DBA? It’s one simple question, so go fill it out! You have the time, especially since Craig Mullins points out that on average, we got a 4.6% salary increase in 2008 in Salaries for Data Professionals Inching Upward. To learn more and become even better in your field, get a 15-day free trial to Safari Books Online from O’Reilly, with a 15% discount if you continue past the free trial, courtesy of Susan Visser.
About six months ago, the question of storing images in a database came up. This is one of my favorite topics, and has many database-agnostic parts.
Personally, I think “tell me about storing images in a database” is actually a great interview question, because you will be able to see the difference between someone who has just memorized “what’s right” versus someone who is really thinking. It also helps you see how someone will communicate — if they just say “NEVER do it, it’s as bad as crossing the streams!” then they are a type of person that gives you a short answer, without much explanation, and without many nuances. (That may be what you are looking for, but usually you want someone who gives reasons for why they strongly feel one way or another).
Consider the following cases:
What about storing lots of small
(say 1Mb or less) images that change frequently (say, dating profile pictures)?
What about large images that change infrequently (say, highly detailed satellite images)?
How important is consistency, and how will data consistency be maintained if images are stored outside the database? How will consistent backups be done?
What other resources (such as a CDN) are available?
What compliance and auditing issues might this client have?
What is the “sweet spot” in terms of size of an image stored in the database, and does that apply to text too? Would it be OK to have a TEXT or LONGTEXT field in the database, but all images are stored outside of the database?
All of these (and more) may swing the pendulum one way or another.
Putting images on a filesystem is the most popular method of doing things. We do have a client that stores “big BLOBby data” in a table (documents, images) — this is mostly for consistency and archiving purposes.
The cons of storing images in a database:
– more db traffic — in, out, and through (where through = replication)
– backups and exports from the db are more of a pain. Data/index files and tablespaces can get quite large. It’s nicer to be able to rsync
files….
– restores are more of a pain too.
– there is no computational stuff really that the db can do (other than, say, length). It cannot sort, compare, etc (well it can but it’s
meaningless). Using separate files makes coding actions such as like making thumbnails much easier.
The pros of storing images in a database:
– you have data integrity and consistency. If you delete a record you can cascade that delete down to the image. You cannot do a JOIN of an OS
and a db.
– databases are really really good at storage and retrieval. It can be argued that filesystems are really really good at storage and retrieval too, but for high activity only certain types of filesystems are, and if you do not have a ReiserFS or MogileFS expert, you may end up with more
than you bargained for. Very likely, you have someone already knowledgeable about databases.
Things that can be pros and cons:
– filesystem vs. database caching. This really depends on the filesystem and the DBMS you’re using. Many folks also use other caching methods for images too…..(webserver/app server/memcached/CDN/whatever)
Here are some references:
Oracle’s whitepaper on The Move to Store Images in a Database
An Article I wrote in Mar 2006, which is still highly relevant. Make sure to read all the comments, including following links like Mike Kruckenberg’s thoughts.
About six months ago, the question of storing images in a database came up. This is one of my favorite topics, and has many database-agnostic parts.
Personally, I think “tell me about storing images in a database” is actually a great interview question, because you will be able to see the difference between someone who has just memorized “what’s right” versus someone who is really thinking. It also helps you see how someone will communicate — if they just say “NEVER do it, it’s as bad as crossing the streams!” then they are a type of person that gives you a short answer, without much explanation, and without many nuances. (That may be what you are looking for, but usually you want someone who gives reasons for why they strongly feel one way or another).
Consider the following cases:
What about storing lots of small
(say 1Mb or less) images that change frequently (say, dating profile pictures)?
What about large images that change infrequently (say, highly detailed satellite images)?
How important is consistency, and how will data consistency be maintained if images are stored outside the database? How will consistent backups be done?
What other resources (such as a CDN) are available?
What compliance and auditing issues might this client have?
What is the “sweet spot” in terms of size of an image stored in the database, and does that apply to text too? Would it be OK to have a TEXT or LONGTEXT field in the database, but all images are stored outside of the database?
All of these (and more) may swing the pendulum one way or another.
Putting images on a filesystem is the most popular method of doing things. We do have a client that stores “big BLOBby data” in a table (documents, images) — this is mostly for consistency and archiving purposes.
The cons of storing images in a database:
– more db traffic — in, out, and through (where through = replication)
– backups and exports from the db are more of a pain. Data/index files and tablespaces can get quite large. It’s nicer to be able to rsync
files….
– restores are more of a pain too.
– there is no computational stuff really that the db can do (other than, say, length). It cannot sort, compare, etc (well it can but it’s
meaningless). Using separate files makes coding actions such as like making thumbnails much easier.
The pros of storing images in a database:
– you have data integrity and consistency. If you delete a record you can cascade that delete down to the image. You cannot do a JOIN of an OS
and a db.
– databases are really really good at storage and retrieval. It can be argued that filesystems are really really good at storage and retrieval too, but for high activity only certain types of filesystems are, and if you do not have a ReiserFS or MogileFS expert, you may end up with more
than you bargained for. Very likely, you have someone already knowledgeable about databases.
Things that can be pros and cons:
– filesystem vs. database caching. This really depends on the filesystem and the DBMS you’re using. Many folks also use other caching methods for images too…..(webserver/app server/memcached/CDN/whatever)
Here are some references:
Oracle’s whitepaper on The Move to Store Images in a Database
An Article I wrote in Mar 2006, which is still highly relevant. Make sure to read all the comments, including following links like Mike Kruckenberg’s thoughts.
As Baron points out, Percona’s Xtrabackup tool can be used just like InnoDB Hot Backup.
Many are wondering, “is it good enough?” In fact, I wondered the same thing, and after a few weeks of using and testing Xtrabackup (on machines that have MyISAM and InnoDB tables), I can say:
0) We have not run into any problems with backing up InnoDB tables; on the machines we’ve run it on (RHEL 5.3 and Debian) it has been stable.
1) We were able to use one of the binaries provided by Percona — we did not have to compile anything. Binaries are provided for 64-bit versions of Linux
2) Innobackupex is the script analogous to Innobackup. Basically, this is a wrapper script for Xtrabackup that will copy the non-InnoDB tables.
3) Usually we run InnoDB Hot Backup from a non-privileged user. We had to run Xtrabackup as the mysql
user; we think this is because it writes a state file in the datadir
, but we did not dig too hard — once we saw that running it from the mysql
user worked, we just left it like that.
4) Xtrabackup does not have a –compress option like InnoDB hot backup does. Having the option would be useful, but I believe this option will come soon.
Currently, for clients who want a hot backup solution but do not want to pay for InnoDB Hot Backup, Pythian recommends Xtrabackup.
As Baron points out, Percona’s Xtrabackup tool can be used just like InnoDB Hot Backup.
Many are wondering, “is it good enough?” In fact, I wondered the same thing, and after a few weeks of using and testing Xtrabackup (on machines that have MyISAM and InnoDB tables), I can say:
0) We have not run into any problems with backing up InnoDB tables; on the machines we’ve run it on (RHEL 5.3 and Debian) it has been stable.
1) We were able to use one of the binaries provided by Percona — we did not have to compile anything. Binaries are provided for 64-bit versions of Linux
2) Innobackupex is the script analogous to Innobackup. Basically, this is a wrapper script for Xtrabackup that will copy the non-InnoDB tables.
3) Usually we run InnoDB Hot Backup from a non-privileged user. We had to run Xtrabackup as the mysql
user; we think this is because it writes a state file in the datadir
, but we did not dig too hard — once we saw that running it from the mysql
user worked, we just left it like that.
4) Xtrabackup does not have a –compress option like InnoDB hot backup does. Having the option would be useful, but I believe this option will come soon.
Currently, for clients who want a hot backup solution but do not want to pay for InnoDB Hot Backup, Pythian recommends Xtrabackup.
Today’s contradiction:
MySQL has server variables named new
and old
.
The new
variable can be set per-session and globally, and is dynamic. The old
variable is not dynamic, and only global in scope. Both default to FALSE in MySQL 5.1.
According to the manual, the new
variable:
was used in MySQL 4.0 to turn on some 4.1 behaviors, and is retained for backward compatibility.
That same page notes the following about the old
variable:
when old is enabled, it changes the default scope of index hints to that used prior to MySQL 5.1.17. That is, index hints with no FOR clause apply only to how indexes are used for row retrieval and not to resolution of ORDER BY or GROUP BY clauses.
That’s right — the old
variable changes some index hint behavior to before 5.1.17, and the new
variable is provided for backwards compatibility of a 4.0 function.
Perhaps now that they have already used the vague terms, MySQL will stick to descriptive names, such as old_index_hints
and new_behavior
(or, even better, specifying what that new behavior is. If not in the variable itself, at least in the manual!)
Today’s contradiction:
MySQL has server variables named new
and old
.
The new
variable can be set per-session and globally, and is dynamic. The old
variable is not dynamic, and only global in scope. Both default to FALSE in MySQL 5.1.
According to the manual, the new
variable:
was used in MySQL 4.0 to turn on some 4.1 behaviors, and is retained for backward compatibility.
That same page notes the following about the old
variable:
when old is enabled, it changes the default scope of index hints to that used prior to MySQL 5.1.17. That is, index hints with no FOR clause apply only to how indexes are used for row retrieval and not to resolution of ORDER BY or GROUP BY clauses.
That’s right — the old
variable changes some index hint behavior to before 5.1.17, and the new
variable is provided for backwards compatibility of a 4.0 function.
Perhaps now that they have already used the vague terms, MySQL will stick to descriptive names, such as old_index_hints
and new_behavior
(or, even better, specifying what that new behavior is. If not in the variable itself, at least in the manual!)
Yesterday, The Pythian Group issued a press release about my book, Pythian’s partnership with Sun, and our new “MySQL Adoption Accelerator Package”. I am not a marketing guru, but I can tell you what we the package means in terms of new work that the MySQL teams have been doing.
Basically, the MySQL Adoption Accelerator Package combines customized training with a comprehensive audit of systems. The name “Adoption Accelerator” makes it sound like it’s only for new applications that are almost ready to go live. What the program actually does is have us evaluate your systems, and intensively train you in the areas you want and need. The program is designed to suit all your needs, whether it’s teaching you about one topic (say, query optimization) or an entire range of topics, from Architecture to ZFS (special issues with running MySQL on ZFS, that is, but that did not fit a cute “from A-Z” model…).
Whether you have already adopted MySQL or are thinking of converting from Oracle, DB2, Microsoft SQL Server or even sqlite, this new package may be what you need.
And now, the full text of the press release, for the curious:
‘MySQL Administrator’s Bible’ Hits the Bookstands: Pythian Launches MySQL Accelerator Adoption Package
The Pythian Group, the leading provider of remote database services, is pleased to announce that the much-anticipated MySQL Administrator’s Bible, written by employee Sheeri K. Cabral, is now available.
Published by Wiley, MySQL Administrator’s Bible is a comprehensive guide to using and administering MySQL. Ideal for the beginner and the experienced administrator, the book’s 22 chapters cover the fundamentals of MySQL database management-including MySQL’s unique approach to basic database features and functions-as well as SQL queries, data and index types, stored procedure and functions, triggers and views, events and transactions. It also discusses such topics as MySQL server tuning, administering storage engines, caching, backup and recovery, managing users, index tuning, database and performance monitoring, security, and more.
Cabral is a world-renowned MySQL expert and a sought-after speaker for many conferences, including the MySQL Conference and Expo (where she delivered a keynote this year), OSCon and LISA. She founded and organizes the Boston, Massachusetts MySQL User Group and in 2007 and 2008 won the MySQL Community Member of the Year Award.
In her job at Pythian, Cabral provides database administration support and consulting services to enterprises worldwide. She is one of several Certified MySQL DBAs at Pythian, which boasts an impressive 75 per cent certification rate.
On April 21, Sun Microsystems named Pythian its first authorized MySQL Enterprise Remote DBA partner, as part of Sun’s MySQL Enterprise Connection Alliance (MECA) partner program. As the only Platinum partner in the program, Pythian offers the highest level of support and care available for customers.
IODA, the industry-leading digital distribution company for the global independent music community, and Conductor, a leading provider of search engine optimization technology and services, have just signed on as Pythian’s first Platinum customers within the program. Pythian is celebrating the launch of Cabral’s book by giving complimentary copies to IODA and Conductor, and to the next eight customers to join the program.
“Pythian is uniquely positioned to take advantage of two interesting trends that are happening in the IT industry,” said Paul Vallee, Founder and Executive Chairman at Pythian. “First, a growing number of companies, large and small, are moving applications to MySQL. Second, outsourcing remote DBA service providers is on the rise. Enterprises want to reduce costs without sacrificing performance and industry best-practices.”
Anticipating this demand, Pythian has launched the MySQL Adoption Accelerator Package: a series of interactive training and consulting sessions that prepares DBA teams to deploy MySQL more securely, reliably and efficiently in their production environments. The package is a comprehensive deployment readiness program that covers all aspects of a deployment from an overview of MySQL and comparisons to other databases to strategies for performance testing and developer support and mentoring.
Cost and Availability
MySQL Administrator’s Bible is sold through most bookstores including Wiley and Amazon.com. It retails for $US 49.99. The MySQL Adoption Accelerator Package is available immediately. For more information, contact sales@pythian.com.
About Pythian
The Pythian Group is a global industry-leader in remote database administration services and consulting for MySQL, Oracle and SQL Server. Since 1997, companies have trusted Pythian to keep their database infrastructures running efficiently while strategically aligning IT with business goals. Pythian’s unparalleled DBA skills, mature methodologies, best practices and tools enable clients to do more with fewer resources. Pythian’s corporate headquarters is in Ottawa Canada, with offices worldwide. Pythian was the first managed services provider to build a practice around MySQL in 2002. For more information visit: www.pythian.com. Follow Sheeri Cabral’s blogs on MySQL: http://www.pythian.com/news/author/sheeri. Have a question? Use our hashtag #Pythian on Twitter and ask away.
Contacts:
Jolita Communications for Pythian
Lisa Courtney Lloyd
613-271-7512
lcourtneylloyd@jolita.ca
Yesterday, The Pythian Group issued a press release about my book, Pythian’s partnership with Sun, and our new “MySQL Adoption Accelerator Package”. I am not a marketing guru, but I can tell you what we the package means in terms of new work that the MySQL teams have been doing.
Basically, the MySQL Adoption Accelerator Package combines customized training with a comprehensive audit of systems. The name “Adoption Accelerator” makes it sound like it’s only for new applications that are almost ready to go live. What the program actually does is have us evaluate your systems, and intensively train you in the areas you want and need. The program is designed to suit all your needs, whether it’s teaching you about one topic (say, query optimization) or an entire range of topics, from Architecture to ZFS (special issues with running MySQL on ZFS, that is, but that did not fit a cute “from A-Z” model…).
Whether you have already adopted MySQL or are thinking of converting from Oracle, DB2, Microsoft SQL Server or even sqlite, this new package may be what you need.
And now, the full text of the press release, for the curious:
‘MySQL Administrator’s Bible’ Hits the Bookstands: Pythian Launches MySQL Accelerator Adoption Package
The Pythian Group, the leading provider of remote database services, is pleased to announce that the much-anticipated MySQL Administrator’s Bible, written by employee Sheeri K. Cabral, is now available.
Published by Wiley, MySQL Administrator’s Bible is a comprehensive guide to using and administering MySQL. Ideal for the beginner and the experienced administrator, the book’s 22 chapters cover the fundamentals of MySQL database management-including MySQL’s unique approach to basic database features and functions-as well as SQL queries, data and index types, stored procedure and functions, triggers and views, events and transactions. It also discusses such topics as MySQL server tuning, administering storage engines, caching, backup and recovery, managing users, index tuning, database and performance monitoring, security, and more.
Cabral is a world-renowned MySQL expert and a sought-after speaker for many conferences, including the MySQL Conference and Expo (where she delivered a keynote this year), OSCon and LISA. She founded and organizes the Boston, Massachusetts MySQL User Group and in 2007 and 2008 won the MySQL Community Member of the Year Award.
In her job at Pythian, Cabral provides database administration support and consulting services to enterprises worldwide. She is one of several Certified MySQL DBAs at Pythian, which boasts an impressive 75 per cent certification rate.
On April 21, Sun Microsystems named Pythian its first authorized MySQL Enterprise Remote DBA partner, as part of Sun’s MySQL Enterprise Connection Alliance (MECA) partner program. As the only Platinum partner in the program, Pythian offers the highest level of support and care available for customers.
IODA, the industry-leading digital distribution company for the global independent music community, and Conductor, a leading provider of search engine optimization technology and services, have just signed on as Pythian’s first Platinum customers within the program. Pythian is celebrating the launch of Cabral’s book by giving complimentary copies to IODA and Conductor, and to the next eight customers to join the program.
“Pythian is uniquely positioned to take advantage of two interesting trends that are happening in the IT industry,” said Paul Vallee, Founder and Executive Chairman at Pythian. “First, a growing number of companies, large and small, are moving applications to MySQL. Second, outsourcing remote DBA service providers is on the rise. Enterprises want to reduce costs without sacrificing performance and industry best-practices.”
Anticipating this demand, Pythian has launched the MySQL Adoption Accelerator Package: a series of interactive training and consulting sessions that prepares DBA teams to deploy MySQL more securely, reliably and efficiently in their production environments. The package is a comprehensive deployment readiness program that covers all aspects of a deployment from an overview of MySQL and comparisons to other databases to strategies for performance testing and developer support and mentoring.
Cost and Availability
MySQL Administrator’s Bible is sold through most bookstores including Wiley and Amazon.com. It retails for $US 49.99. The MySQL Adoption Accelerator Package is available immediately. For more information, contact sales@pythian.com.
About Pythian
The Pythian Group is a global industry-leader in remote database administration services and consulting for MySQL, Oracle and SQL Server. Since 1997, companies have trusted Pythian to keep their database infrastructures running efficiently while strategically aligning IT with business goals. Pythian’s unparalleled DBA skills, mature methodologies, best practices and tools enable clients to do more with fewer resources. Pythian’s corporate headquarters is in Ottawa Canada, with offices worldwide. Pythian was the first managed services provider to build a practice around MySQL in 2002. For more information visit: www.pythian.com. Follow Sheeri Cabral’s blogs on MySQL: http://www.pythian.com/news/author/sheeri. Have a question? Use our hashtag #Pythian on Twitter and ask away.
Contacts:
Jolita Communications for Pythian
Lisa Courtney Lloyd
613-271-7512
lcourtneylloyd@jolita.ca