So it’s totally appropriate to denormalize a table when the data you’re dealing with is variable length. Variable-length rows lead to fragmentation, whereas if you have fixed-length rows, a new row can take the place of a deleted row with no problem.
With disk space so cheap these days, why is varchar used so often? I understand that varchar is probably misrepresented — if one attribute is variable length (varchar) then all char attributes are changed to varchars. However, why employ the use of varchar instead of char?
So it’s totally appropriate to denormalize a table when the data you’re dealing with is variable length. Variable-length rows lead to fragmentation, whereas if you have fixed-length rows, a new row can take the place of a deleted row with no problem.
With disk space so cheap these days, why is varchar used so often? I understand that varchar is probably misrepresented — if one attribute is variable length (varchar) then all char attributes are changed to varchars. However, why employ the use of varchar instead of char?
So, how do I do it?
How do I enable comments to be posted immediately, but seem to avoid spam? Well, there are 2 features I employ. The first is that comments are moderated, but *only* if they contain more than 3 links. All the spam I get is a series of links, and there’s only been one time that a real message has been put in the moderation queue because it contains a series of links. And I have not yet had spam with 3 or fewer links, so it works out OK by me. I get an e-mail whenever a comment is in the moderation queue. About once a week, I go to the moderation panel and delete about 15 spam messages.
The second feature is that I have all comments e-mailed to me, so if a spam comment does get through, I see it right away.
This saves a lot of time — the real messages are sent to me, so I get feedback right away (as do other readers). The spam messages get sent to me in full, so I just glance at the message, delete it from my inbox, and deal with the moderation queue when I want to.
So, how do I do it?
How do I enable comments to be posted immediately, but seem to avoid spam? Well, there are 2 features I employ. The first is that comments are moderated, but *only* if they contain more than 3 links. All the spam I get is a series of links, and there’s only been one time that a real message has been put in the moderation queue because it contains a series of links. And I have not yet had spam with 3 or fewer links, so it works out OK by me. I get an e-mail whenever a comment is in the moderation queue. About once a week, I go to the moderation panel and delete about 15 spam messages.
The second feature is that I have all comments e-mailed to me, so if a spam comment does get through, I see it right away.
This saves a lot of time — the real messages are sent to me, so I get feedback right away (as do other readers). The spam messages get sent to me in full, so I just glance at the message, delete it from my inbox, and deal with the moderation queue when I want to.
So, most of the “I want images in MySQL” conversations are terminated with “Don’t.” Some articles say “MySQL’s overhead isn’t worth it” and others that say “If you put the images in MySQL you don’t have to deal with the filesystem overhead.”
My company’s site has over 2 million images to store/retrieve, and our current setup is at its limit and does not scale well — 2 NFS servers. We’d rather avoid adding another NFS server, because it involves changing where images are located whenever we scale. The problem is that with so many images, there are so many inodes that the filesystem cannot keep up. As well, if a server is rebooted or offline for more than an hour (ie, during maintenance) it is extremely slow until the cache catches up (about half an hour).
We believe the best retrieval method is to use MySQL. We will be doing our own speed testing, so I’m not relying on what folks say here. But I’m curious — what do other folks think? Flickr uses MySQL for image storage…..
So, most of the “I want images in MySQL” conversations are terminated with “Don’t.” Some articles say “MySQL’s overhead isn’t worth it” and others that say “If you put the images in MySQL you don’t have to deal with the filesystem overhead.”
My company’s site has over 2 million images to store/retrieve, and our current setup is at its limit and does not scale well — 2 NFS servers. We’d rather avoid adding another NFS server, because it involves changing where images are located whenever we scale. The problem is that with so many images, there are so many inodes that the filesystem cannot keep up. As well, if a server is rebooted or offline for more than an hour (ie, during maintenance) it is extremely slow until the cache catches up (about half an hour).
We believe the best retrieval method is to use MySQL. We will be doing our own speed testing, so I’m not relying on what folks say here. But I’m curious — what do other folks think? Flickr uses MySQL for image storage…..
Oracle bought 2 companies which develop 2 open source storage engines that MySQL uses. Those storage engines are both transactional.
There’s the possibility Oracle does the stupid (from a PR perspective) move of making those databases proprietary. But it’s OPEN SOURCE right now. Which means that it’ll branch.
Worst case scenario? The developing stops here, and InnoDB and BDB never get any more functional than they currently are.
The “bad” scenario? MySQL is a company. With hired staff. They pay the hired staff to get familiar with the source code for those engines and continue to develop them. Hence the branch.
The likely scenario? 1) Oracle tries to have a pluggable storage engine architecture like MySQL. Oracle Express or Lite or whatever would be free with those engines, pay with the original. 2) Oracle tries to go on the good partnership of “we support MySQL by developing their architecture, we’re so nice, buy our product.”
Really, everyone is worried over nothing. The whole point of open source is that it’s open! Once you go open, you can close later on and people still have something to work with.
Oracle bought 2 companies which develop 2 open source storage engines that MySQL uses. Those storage engines are both transactional.
There’s the possibility Oracle does the stupid (from a PR perspective) move of making those databases proprietary. But it’s OPEN SOURCE right now. Which means that it’ll branch.
Worst case scenario? The developing stops here, and InnoDB and BDB never get any more functional than they currently are.
The “bad” scenario? MySQL is a company. With hired staff. They pay the hired staff to get familiar with the source code for those engines and continue to develop them. Hence the branch.
The likely scenario? 1) Oracle tries to have a pluggable storage engine architecture like MySQL. Oracle Express or Lite or whatever would be free with those engines, pay with the original. 2) Oracle tries to go on the good partnership of “we support MySQL by developing their architecture, we’re so nice, buy our product.”
Really, everyone is worried over nothing. The whole point of open source is that it’s open! Once you go open, you can close later on and people still have something to work with.
The first hour of a 2-hour workshop on what to do with a MySQL instance you’ve inherited. Very detailed, goes into real examples using shell prompts.
Unfortunately, the settings were incorrect on the video recorder, so only the 1st hour was captured. Presented at the Boston MySQL User Group on February 13th, 2006.
Special thanks to Tom C. for securing the MIT space.
google video
flash slides
pdf slides (this might be broken, please let me know)
worksheet
As always, stylistic criticism as well as content criticism is appreciated.
The first hour of a 2-hour workshop on what to do with a MySQL instance you’ve inherited. Very detailed, goes into real examples using shell prompts.
Unfortunately, the settings were incorrect on the video recorder, so only the 1st hour was captured. Presented at the Boston MySQL User Group on February 13th, 2006.
Special thanks to Tom C. for securing the MIT space.
google video
flash slides
pdf slides (this might be broken, please let me know)
worksheet
As always, stylistic criticism as well as content criticism is appreciated.
The February meeting of the Boston MySQL User Group will be held on Monday, February 13th (tomorrow!) at 7 pm. We have a new location, on the MIT campus. This means that there is close subway access AND plenty of FREE PARKING. The meeting is free, including pizza (with toppings) and soda.
To view details, see
http://mysql.meetup.com/137/events/4829769/?a=rem_c
The topic is “So You’ve Inherited a MySQL Instance” (which I *just* found out last week was not accepted for the MySQL Users Conference. Probably too basic of a workshop for the conference). It will be useful for beginners, and experience MySQL users will refresh their memory on what needs to be secured.
It has a unix focus, but most of the ideas can be applied to Windows users as well.
The February meeting of the Boston MySQL User Group will be held on Monday, February 13th (tomorrow!) at 7 pm. We have a new location, on the MIT campus. This means that there is close subway access AND plenty of FREE PARKING. The meeting is free, including pizza (with toppings) and soda.
To view details, see
http://mysql.meetup.com/137/events/4829769/?a=rem_c
The topic is “So You’ve Inherited a MySQL Instance” (which I *just* found out last week was not accepted for the MySQL Users Conference. Probably too basic of a workshop for the conference). It will be useful for beginners, and experience MySQL users will refresh their memory on what needs to be secured.
It has a unix focus, but most of the ideas can be applied to Windows users as well.
I hated university and graduate school. I had a liberal arts education, so I was learning theory. Now, computer science theory is boring to someone who wants to make something useful. I understand why many folks dropped out of college to go work for the internet boom.
However, in the brief 5 years of work experience I’ve had since getting my Master’s, I’ve learned that a self-taught practicum only goes so far. Whether it’s system administration or database administration, the more basic levels of problem solving do not require much insight into how the system (OS or DB) works, the deeper problems do.
Some of that you can learn by doing, but most of the concepts aren’t learnable unless you’re digging into the source code.
That being said, I found a good web page today that goes through Entity-Relationship modeling. It’s very good, comprehensive.
http://www.utexas.edu/its/windows/database/datamodeling/index.html
Perhaps a good workshop idea would be “Data Modeling Using MySQL and the Sakila sample database” — in other words, teaching generic database concepts, translating it into MySQL for folks that already use MySQL.
(As full disclosure, PostgreSQL was the standard teaching database, because at the time MySQL hadn’t been full-featured, as it is now. I’m not quite sure why PostgreSQL isn’t more popular, given that it’s been full-featured for longer. Probably either not enough marketing, or that MySQL is just more intuitive for the self-taught folks…..)
And oh, the irony: “Dictionary.com Word of the Day – autodidact: one who is self-taught.”
I hated university and graduate school. I had a liberal arts education, so I was learning theory. Now, computer science theory is boring to someone who wants to make something useful. I understand why many folks dropped out of college to go work for the internet boom.
However, in the brief 5 years of work experience I’ve had since getting my Master’s, I’ve learned that a self-taught practicum only goes so far. Whether it’s system administration or database administration, the more basic levels of problem solving do not require much insight into how the system (OS or DB) works, the deeper problems do.
Some of that you can learn by doing, but most of the concepts aren’t learnable unless you’re digging into the source code.
That being said, I found a good web page today that goes through Entity-Relationship modeling. It’s very good, comprehensive.
http://www.utexas.edu/its/windows/database/datamodeling/index.html
Perhaps a good workshop idea would be “Data Modeling Using MySQL and the Sakila sample database” — in other words, teaching generic database concepts, translating it into MySQL for folks that already use MySQL.
(As full disclosure, PostgreSQL was the standard teaching database, because at the time MySQL hadn’t been full-featured, as it is now. I’m not quite sure why PostgreSQL isn’t more popular, given that it’s been full-featured for longer. Probably either not enough marketing, or that MySQL is just more intuitive for the self-taught folks…..)
And oh, the irony: “Dictionary.com Word of the Day – autodidact: one who is self-taught.”
For the geeky folks running a non-windows operating system on a windows system, what are you using, why, and do you like it?
I just found colinux, which looks good, and I’m installing mysql on it now. I’ve used cygwin and it’s just a pig, and things don’t work so well there.
VMware costs $$, and also isn’t really great either (I’ve had stability problems). What do folks think about these three?
For the geeky folks running a non-windows operating system on a windows system, what are you using, why, and do you like it?
I just found colinux, which looks good, and I’m installing mysql on it now. I’ve used cygwin and it’s just a pig, and things don’t work so well there.
VMware costs $$, and also isn’t really great either (I’ve had stability problems). What do folks think about these three?
So, today I have had the following problems on various databases:
1) lack of hosts file causing intermittent connectivity
2) running out of space in innodb tablespace
3) a backup causing undue slowness
4) a data partition running out of space and corrupting the binary log, thus making the slaves not replicate
5) mysql server not starting after I added space (see 2) because I put the same logfile in twice
All in all, not bad. Only problem 4 caused me to furrow my brow to fix it, because I hadn’t seen anything like it on the list, but one of the first google hits led me to the cause. Basically the symptom was a slave server that would not start replication, with the error in mysqld.err only (not in SHOW SLAVE STATUS — that would always show NULL seconds behind). But Google immediately gave me the answer.
What a day. Luckily I slept for 11 hours last night; I’ve been working 12 so far!
So, today I have had the following problems on various databases:
1) lack of hosts file causing intermittent connectivity
2) running out of space in innodb tablespace
3) a backup causing undue slowness
4) a data partition running out of space and corrupting the binary log, thus making the slaves not replicate
5) mysql server not starting after I added space (see 2) because I put the same logfile in twice
All in all, not bad. Only problem 4 caused me to furrow my brow to fix it, because I hadn’t seen anything like it on the list, but one of the first google hits led me to the cause. Basically the symptom was a slave server that would not start replication, with the error in mysqld.err only (not in SHOW SLAVE STATUS — that would always show NULL seconds behind). But Google immediately gave me the answer.
What a day. Luckily I slept for 11 hours last night; I’ve been working 12 so far!
So, I have what I consider a feature request for binlog
. I’d like there to be an option to log DDL only or DML only, or both DDL and DML (current functionality).
Sure, I can take a diff of the schema, or grep for TABLE or INDEX and put that into its own logfile; however, this is something that really should be an option to mysqlbinlog.
(http://bugs.mysql.com/bug.php?id=16916)
So, I have what I consider a feature request for binlog
. I’d like there to be an option to log DDL only or DML only, or both DDL and DML (current functionality).
Sure, I can take a diff of the schema, or grep for TABLE or INDEX and put that into its own logfile; however, this is something that really should be an option to mysqlbinlog.
(http://bugs.mysql.com/bug.php?id=16916)