Statistics Gathering Script

I posted some code to the MySQL Forge at:

http://forge.mysql.com/snippets/view.php?id=52

Basically I gather some stats on my webservers in a very crude way. It runs a bunch of commands on the commandline (I run it every 5 minutes from a cron script). Please note that it adds to the load of the database server by running these commands, and it connects a few times to the database, so it actually increases things like “total # of connections” by the sheer fact that it runs.

This should run on most Unix machines, running most versions of MySQL. Please comment to this entry if you find something wrong or an incompatibility.

An explanation of the code follows:

This script will produce a string that looks like this:
(‘value’,’value’,’value’,123,’value’),
and all you have to do is add a line at the top: “INSERT INTO table (field1,field2,field3,field4,field5) VALUES”

and change the last comma to a semicolon. Then you will have something suitable for entering into a database, for later processing.

So here is an explanation of the lines of code, in order — the first few set variables:
#!/bin/sh
DB=db_here
USER=user_here
PASS=pass_here
HOST=`/bin/hostname | /bin/cut -f1 -d\.`

/bin/echo \(\'$HOST\',\'\

“echo” sets up the fact that you are printing all this out (later you >> to a file). \ is the escape character, and the line starts with “(“. Then we put the hostname in quotes, print a comma, and then an open quote. The final \ means “continue on the next line”, and it shows up on every line hereafter, so we can build a long string and then save it to a file.

`/bin/date +%Y-%m-%d\ %T`\',\

Putting “ around a command ensures that it gets run first, and then echo uses the output. Otherwise echo would simply print out “/bin/date +%Y…..”

This gets the date and time in MySQL’s date format (‘YYYY-MM-DD HH:MM:SS’), and puts in the ‘ to end the quoted field and a comma.

`/usr/bin/mysql -u $USER -p$PASS -e "\s"| /bin/awk '/Threads/ {print $2","$7","$20}'`,\

If you type \s at the database prompt, you will get some information that looks like this:


mysql> \s
--------------
mysql Ver 12.22 Distrib 4.0.20, for sun-solaris2.8 (sparc)
Connection id: 296486
Current database:
Current user: root@localhost
SSL: Not in use
Current pager: stdout
Using outfile: ''
Server version: 4.0.20-log
Protocol version: 10
Connection: Localhost via UNIX socket
Client characterset: latin1
Server characterset: latin1
UNIX socket: /tmp/mysql.sock
Uptime: 57 days 8 hours 6 min 25 sec
Threads: 1 Questions: 9331360 Slow queries: 32 Opens: 19372 Flush tables: 1 Open tables: 64 Queries per second avg: 1.884
--------------

This gets fields 2, 7 and 20 (fields are delimited by spaces) from the line marked “Threads” — field 2 is “1” relating to # of threads, field 7 is “32” which refers to the number of slow queries, and field 20 is “1.884” which referes to “Queries per second avg”. So this line gets 3 values. Since they are numbers, they do not need to be quoted*.

`cat /proc/loadavg | /bin/awk '{print $1}'`,\

/proc has some great information in it, but you have to use cat to view them. This gets the current load average, that appears as the first argument to “uptime”. Again, this is a number and does not need to be quoted.*

`cat /proc/meminfo | /bin/awk '/MemTotal|MemFree|SwapTotal|SwapFree/ {print $2","}'` \

This prints the 2nd field (and a comma separation) from /proc/meminfo from the lines that deal with total memory, free memory, total swap and free swap space. Again, these are numbers and do not require quoting.*

`/usr/bin/mysql -u $USER -p$PASS $DB -e "show table status\G" | /bin/awk '/InnoDB free:/ {print $4}' | /usr/bin/head -1`,\

This is a very hacky way to find how much space is left in the InnoDB tablespace. I use one set of InnoDB data files for all the InnoDB tables I have, so I can just take the free space from the first table, and that is OK. If you are using 1 file per table, you will want to get each value.

`/usr/bin/mysql -u $USER -p$PASS $DB -e "show status" | /bin/awk '/Bytes_received\W|Bytes_sent\W|Com_delete\W|Com_insert\W|Com_replace\W|Com_select\W|Com_update\W|Connections\W|Questions\W|Uptime\W/ {printf $2 ","}'`\

This gets lots of information from the SHOW STATUS command of MySQL. The Com_* commands are the counts, so Com_insert is the count of inserts (since the database instance started, so you can do Com_insert/Uptime to get inserts/second if you want).

Connections is the total number of connections (not to be confused with Threads, which is the current # of threads connected). Questions is total number of queries.

I wrote this in fits and starts, and above I parsed some output of “\s”. However, those numbers can be gotten by the “SHOW STATUS” variables of Slow_queries, Threads_created and Questions/Uptime. Because I am lazy, I have not changed my scripts, although you could decrease the amount of data MySQL sends and lessen the impact of this script with one less connection if you added it to this line.

There are TONS of variables in “SHOW STATUS” that are interesting and you might want to track. Decide what you like…and you can always add them in later!

`df -k /var | /bin/awk '/dev/ {print $2 "," $4}'`,\

This just looks at the space of /var, which is where my company keeps our mysql data files.

`df -k /var/log | /bin/awk '/dev/ {print $2 "," $4}'`,\

This just looks at the space of /var/log — where my company keeps our mysql logfiles.

`/usr/bin/mysql -u $USER -p$PASS $DB -e "SELECT COUNT(DISTINCT uid) FROM Sessions\G" | /bin/awk '/COUNT/ {print $3}' `,\

This gets information from a particular table in our system — it shows how many people are online. This is useful for us to gather because now we can see how the statistics we gather are related to how many people are logged in.

`/usr/bin/iostat -dk | /bin/awk '/sda/ {print $2","$5","$6}'`\),\

This requires the “sysstat” package, and this line retrieves the I/O (disk) transactions per second, the KB read and the KB written to the disk.

>> /home/care/`echo $HOST`perf.csv

And this is the last line — finally, at last, we save it to a file.

PLEASE feel free to modify this and make it cleaner……obviously you will want to apply it to your own situation and what you want to gather.

I run this as a local script on each database machine, because it gets information using the commandline. I save everything to a file and do a weekly import. Theoretically I could have it save directly to a reporting database, but I do not want any dependence on network, just in case. You may want to have the process more automatic.

Weekly I have a cron job to “rotate” the .csv file: copy the file locally, copy it to a remote server, and then delete all but the first line (the first line being “INSERT INTO tbl VALUES(…..”).

(for those interested, this is how I delete all but the first line. Oh so hacky….)
HOST=`/bin/hostname | /bin/cut -f1 -d\.`
head -1 /home/care/`echo $HOST`perf.csv > /tmp/temp; mv /tmp/temp /home/care/`echo $HOST`perf.csv;

And for those who care, the following is the first line of my file:

REPLACE INTO dbstats (serverName,statsTime,procs,slowQueries,queriespersec,serverLoad,MemTotal
,MemFree,SwapTotal,SwapFree,ibFree,bytesRec,bytesSent,deletes,inserts,replaces,selects,updates
,connections,queries,mysqlUptime,sizeKBvar,availKBvar,sizeKBvarlog,availKBvarlog,Sessions,tps,
kbRead,kbWrtn) VALUES


* while numbers do not need to be quoted, if there’s ever a chance that the number may end up blank instead of 0, it should be quoted, as MySQL does not handle “VALUES (‘foo’,,’bar’)”
and throws an error.

I posted some code to the MySQL Forge at:

http://forge.mysql.com/snippets/view.php?id=52

Basically I gather some stats on my webservers in a very crude way. It runs a bunch of commands on the commandline (I run it every 5 minutes from a cron script). Please note that it adds to the load of the database server by running these commands, and it connects a few times to the database, so it actually increases things like “total # of connections” by the sheer fact that it runs.

This should run on most Unix machines, running most versions of MySQL. Please comment to this entry if you find something wrong or an incompatibility.

An explanation of the code follows:
Continue reading “Statistics Gathering Script”

“I Work Best Under Pressure”

I am helping a friend with a project by doing DBA, sysadmin and project management work, because he has 2 junior programmers on staff. I have a lot of patience, so it works out OK. But just to give an example:

One programmer had “10 years’ experience” and when I asked his experience level when I first met him, because I knew my friend had been looking for junior programmers, he got all offended and “hoped” that’s not how he was thought of.

This same programmer said “I work best under pressure.” That really irked me. You know what that says to me? “I cannot prioritize; I procrastinate; I really need pressure to motivate me to get something done.”

Adrenaline is nice, but adrenaline does not make up for well-thought out steps, nor sane development processes.

“I work well under pressure” is great, but I am very suspicious of folks who need pressure to work well. This applies to any line of work — emergency personnel, military, etc. These folks are constantly thrust into highly pressurized environments, but I doubt anyone would say “I can handle saving a life well if I have 30 seconds, but if I have 30 minutes, I’m a lot less efficient.”

Adrenaline junkies need not apply.

I am helping a friend with a project by doing DBA, sysadmin and project management work, because he has 2 junior programmers on staff. I have a lot of patience, so it works out OK. But just to give an example:

One programmer had “10 years’ experience” and when I asked his experience level when I first met him, because I knew my friend had been looking for junior programmers, he got all offended and “hoped” that’s not how he was thought of.

This same programmer said “I work best under pressure.” That really irked me. You know what that says to me? “I cannot prioritize; I procrastinate; I really need pressure to motivate me to get something done.”

Adrenaline is nice, but adrenaline does not make up for well-thought out steps, nor sane development processes.

“I work well under pressure” is great, but I am very suspicious of folks who need pressure to work well. This applies to any line of work — emergency personnel, military, etc. These folks are constantly thrust into highly pressurized environments, but I doubt anyone would say “I can handle saving a life well if I have 30 seconds, but if I have 30 minutes, I’m a lot less efficient.”

Adrenaline junkies need not apply.

Sometimes, I wonder….

If foreign keys are a part of the database namespace, why is it that you have to ALTER TABLE to add and remove them? Why not alter the database? I suspect the issue is more in MySQL’s limitations that names of foreign keys be unique among tables in a database. . . but it does make me wonder.

If foreign keys are a part of the database namespace, why is it that you have to ALTER TABLE to add and remove them? Why not alter the database? I suspect the issue is more in MySQL’s limitations that names of foreign keys be unique among tables in a database. . . but it does make me wonder.

Vote For Something That Matters To You

Vote for your favorite MySQL Community(TM) logo! Add comments on existing logos! Even mock up

http://mysqlcamp.pbwiki.com/LogoVoting

Seriously, though — think of how a new logo will reflect the Community, and how we all want to be perceived. Also consider what makes a nice favicon.ico …..

So vote on something that really matters to you, and make your voice heard. It does not cost anything and takes just a few seconds.

Vote for your favorite MySQL Community(TM) logo! Add comments on existing logos! Even mock up

http://mysqlcamp.pbwiki.com/LogoVoting

Seriously, though — think of how a new logo will reflect the Community, and how we all want to be perceived. Also consider what makes a nice favicon.ico …..

So vote on something that really matters to you, and make your voice heard. It does not cost anything and takes just a few seconds.

9Rules Accepting Submissions!

I have found http://9rules.com/ to have pretty good content, and they’re accepting submissions for their community. This is a great opportunity to post your great content to get more exposure.

Submit your site at: http://9rules.com/submit/

I have found http://9rules.com/ to have pretty good content, and they’re accepting submissions for their community. This is a great opportunity to post your great content to get more exposure.

Submit your site at: http://9rules.com/submit/

MySQL Community Edition and Website Changes

I posted earlier about the MySQL announcement (see http://www.mysql.com/news-and-events/news/article_1171.html )and how I see it affecting Enterprise level users.

However, as someone who uses it for work other than my day job, some at the hobby level and some above that level, the Community edition is also important to me.

The way I see it, the changes are great for people who use the Community edition.

Wait, there will be 2 codebases, and they have not doubled their staff, so won’t there actually be less development? Are they expecting the community to write all the code for the Community edition?

Kind of. I do know that MySQL has been hiring lots of people, but I’m not an employee, so I have no idea of their growth. There will be 2 codebases, but that works out for the best for everyone. Now, there will be 2 priority lists — things that Enterprise customers really want, and things that Community users really want.

So Community users do not have to hear “Sorry, that’s not a priority, but if you pay us we’ll make it one.” Community users can get the features that are a priority for them, and Enterprise users can get the features that are a prioirty for them, too.

If a priority is high for Enterprise, and lower for Community, it can be developed in Enterprise first, and moved to Community. And, of course, vice versa. The part where the community coding comes in is that not only does MySQL have priorities for Community development, but if a user wants to submit something, they can, thus changing the priority order, because the work is mostly done, and all MySQL needs to do is integrate it.

But Community users do not have to suffer with a package that has more code (and thus is a bit bloatier) just because someone paid MySQL USD $1 million to extend the maximum table size to 1 terabyte. Nor do they have to wait for development on something like that to finish before other features get implemented.

The announcement also had some poor wording choices. The Community edition will be as stable and reliable as ever. The truth is that new features have bugs, and there will be more new features in Community than in Enterprise. As well, both Community and Enterprise are tested quite thoroughly. But Enterprise will be tested to enterprise-level standards — for example, throughput could be tested and MySQL could say “buy Enterprise! It can handle 10,000 queries per second!”

My company is using the Community Edition, and we have over 3600 queries per second being run — and that’s an AVERAGE. So there’s no question in my mind that the Community Edition is stable and reliable enough to be used at an “enterprise” level. That quality will NOT go away.

I’m not quite clear on what the licensing will be for the Enterprise and Community Editions, but I would think that MySQL will keep the ability to embed and redistribute both editions. I don’t know for sure, but I think it would be a bad business move not to.

I have had a few people come to me and say “Did you see the announcement? How could MySQL do this? They’re forcing everyone that’s not a hobbyist to buy the product!!!!” Everyone predicted the end of InnoDB when Oracle bought it — I said “it’s no big deal” (see http://sheeri.net/archives/38).

Similarly, this is not a loss for the MySQL community. It’s actually a gain for both Enterprise and Community users. I’m not a MySQL employee, so I don’t know everything that’s going on behind the scenes, but I have used and administered MySQL for over 5 years in real life situations and environments, and I can tell you that this is a win. We will hear lots less of “well, we could do that, but we’re programming something really big right now, so it will have to wait”.

As for the website changes, my one complaint is that at http://mysql.com/products/, the Community edition isn’t listed — there’s a link to the “Community” site that’s hard to miss on the right-hand side, and a menu item similar to “Products” at the top nav bar called “Community” — but it doesn’t seem consistent. Either you have “Products” and “Services” and you have a “Free download” section under “Products”, or you have a section with “Enterprise” and a section with “Community”….mixing the 2 is bad UI.

But if that’s the worst I have to say, then the changes are pretty good. And it’s more obvious that http://dev.mysql.com is for Community stuff….and when I went there, I found the polls….NEAT!

I posted earlier about the MySQL announcement (see http://www.mysql.com/news-and-events/news/article_1171.html )and how I see it affecting Enterprise level users.

However, as someone who uses it for work other than my day job, some at the hobby level and some above that level, the Community edition is also important to me.

The way I see it, the changes are great for people who use the Community edition.

Wait, there will be 2 codebases, and they have not doubled their staff, so won’t there actually be less development? Are they expecting the community to write all the code for the Community edition?

Kind of. I do know that MySQL has been hiring lots of people, but I’m not an employee, so I have no idea of their growth. There will be 2 codebases, but that works out for the best for everyone. Now, there will be 2 priority lists — things that Enterprise customers really want, and things that Community users really want.

So Community users do not have to hear “Sorry, that’s not a priority, but if you pay us we’ll make it one.” Community users can get the features that are a priority for them, and Enterprise users can get the features that are a prioirty for them, too.

If a priority is high for Enterprise, and lower for Community, it can be developed in Enterprise first, and moved to Community. And, of course, vice versa. The part where the community coding comes in is that not only does MySQL have priorities for Community development, but if a user wants to submit something, they can, thus changing the priority order, because the work is mostly done, and all MySQL needs to do is integrate it.

But Community users do not have to suffer with a package that has more code (and thus is a bit bloatier) just because someone paid MySQL USD $1 million to extend the maximum table size to 1 terabyte. Nor do they have to wait for development on something like that to finish before other features get implemented.

The announcement also had some poor wording choices. The Community edition will be as stable and reliable as ever. The truth is that new features have bugs, and there will be more new features in Community than in Enterprise. As well, both Community and Enterprise are tested quite thoroughly. But Enterprise will be tested to enterprise-level standards — for example, throughput could be tested and MySQL could say “buy Enterprise! It can handle 10,000 queries per second!”

My company is using the Community Edition, and we have over 3600 queries per second being run — and that’s an AVERAGE. So there’s no question in my mind that the Community Edition is stable and reliable enough to be used at an “enterprise” level. That quality will NOT go away.

I’m not quite clear on what the licensing will be for the Enterprise and Community Editions, but I would think that MySQL will keep the ability to embed and redistribute both editions. I don’t know for sure, but I think it would be a bad business move not to.

I have had a few people come to me and say “Did you see the announcement? How could MySQL do this? They’re forcing everyone that’s not a hobbyist to buy the product!!!!” Everyone predicted the end of InnoDB when Oracle bought it — I said “it’s no big deal” (see http://sheeri.net/archives/38).

Similarly, this is not a loss for the MySQL community. It’s actually a gain for both Enterprise and Community users. I’m not a MySQL employee, so I don’t know everything that’s going on behind the scenes, but I have used and administered MySQL for over 5 years in real life situations and environments, and I can tell you that this is a win. We will hear lots less of “well, we could do that, but we’re programming something really big right now, so it will have to wait”.

As for the website changes, my one complaint is that at http://mysql.com/products/, the Community edition isn’t listed — there’s a link to the “Community” site that’s hard to miss on the right-hand side, and a menu item similar to “Products” at the top nav bar called “Community” — but it doesn’t seem consistent. Either you have “Products” and “Services” and you have a “Free download” section under “Products”, or you have a section with “Enterprise” and a section with “Community”….mixing the 2 is bad UI.

But if that’s the worst I have to say, then the changes are pretty good. And it’s more obvious that http://dev.mysql.com is for Community stuff….and when I went there, I found the polls….NEAT!

MySQL Enterprise & Website Changes

OK, so I am excited. Yesterday MySQL announced MySQL Enterprise (http://www.mysql.com/news-and-events/news/article_1171.html). Now, I had not been so keen on MySQL Network — I had not realized the level of service MySQL will go to. Does Oracle or Microsoft offer schema review as part of a service package? I do not think so.

Basically, MySQL Enterprise is really a full suite of enterprise solutions for companies that need them. Many folks know how hard it can be to convince their boss that a DBA (or team of DBA’s) is needed. Many folks know how hard it is to be a DBA, and many folks know how hard it is to be a developer or sysadmin with DBA duties.

It’s easier to convince a company to spend money on a support contract than it is to spend money on headcount. And with MySQL, companies get so much for their money.

Oh, sure, there is what we already know — the enterprise binary, which is rigorously tested against enterprise-level demands. The “GA” or “Generally Available” (which will now be the “Community” edition) source code/binaries are tested for functionality, but not necessarily for things like throughput. (I have no idea if the Enterprise binary is tested for throughput, it’s just a good example of the kind of thing that people want in enterprise-level software).

There is also what anyone would expect from a support contract — access to updates, incident response (phone and web), emergency help, a knowledgebase, etc.

But looking at the features (http://mysql.com/products/enterprise/features.html), there’s a whole world of services that as far as I know, very few software companies are offering.

Let me say that again — MySQL offers features in their support contracts that very few other software companies offer. As far as I know, you cannot buy all of these from ANY of the enterprise-level database vendors.

MySQL offers consultative services, including schema reviews, performance tuning and SQL code reviews. Yes, that’s right, MySQL offers the service of reviewing your SQL queries, including procedures and server extensions.

The feature list can be seen here:

http://mysql.com/products/enterprise/features.html

All in all, very exciting for those who use MySQL in an enterprise-level capacity (which I do at work).

OK, so I am excited. Yesterday MySQL announced MySQL Enterprise (http://www.mysql.com/news-and-events/news/article_1171.html). Now, I had not been so keen on MySQL Network — I had not realized the level of service MySQL will go to. Does Oracle or Microsoft offer schema review as part of a service package? I do not think so.

Basically, MySQL Enterprise is really a full suite of enterprise solutions for companies that need them. Many folks know how hard it can be to convince their boss that a DBA (or team of DBA’s) is needed. Many folks know how hard it is to be a DBA, and many folks know how hard it is to be a developer or sysadmin with DBA duties.

It’s easier to convince a company to spend money on a support contract than it is to spend money on headcount. And with MySQL, companies get so much for their money.

Oh, sure, there is what we already know — the enterprise binary, which is rigorously tested against enterprise-level demands. The “GA” or “Generally Available” (which will now be the “Community” edition) source code/binaries are tested for functionality, but not necessarily for things like throughput. (I have no idea if the Enterprise binary is tested for throughput, it’s just a good example of the kind of thing that people want in enterprise-level software).

There is also what anyone would expect from a support contract — access to updates, incident response (phone and web), emergency help, a knowledgebase, etc.

But looking at the features (http://mysql.com/products/enterprise/features.html), there’s a whole world of services that as far as I know, very few software companies are offering.

Let me say that again — MySQL offers features in their support contracts that very few other software companies offer. As far as I know, you cannot buy all of these from ANY of the enterprise-level database vendors.

MySQL offers consultative services, including schema reviews, performance tuning and SQL code reviews. Yes, that’s right, MySQL offers the service of reviewing your SQL queries, including procedures and server extensions.

The feature list can be seen here:

http://mysql.com/products/enterprise/features.html

All in all, very exciting for those who use MySQL in an enterprise-level capacity (which I do at work).

The Trend of Managed Schemas: A Database is Not a Messaging System

This thread on the Boston MySQL User Group Board is getting interesting:
http://mysql.meetup.com/137/boards/view/viewthread?thread=2280640

(From the original poster:)

I think that nonequivalence comes from the past when the data sharing was a
rare situation. The data changes were always initiated by application and it
always knew about those changes. Now the situation is different. When the data
are shared between multiple remote applications you have to notify other
interested parties about those changes.

Currently databases are mostly used as “pull” components. If they had standard
“push” functionality they could compete with messaging systems with the advantages
of automatic data persistence and powerful query language.

(my response:)
Well, and that’s the problem — the paradigm *has* changed. MySQL is fast and reliable because it does NOT put things like messaging into their database, which Oracle and SQL Server do. A database is not a messaging system, it’s a database.

What effect would notification that there have been changes have on MVCC? I do wish there was a “pull” way to check if the data has changed.

The paradigm change of the application managing the schema causes this. I do not believe messaging is the correct way to handle this problem.

Consider the parallel to source code version control. Much like MVCC, you check out code, change it, and commit the code. Unlike many source code version control systems, though, MVCC (“data version control”) does not have the equivalent of an “update” command, except for doing another pull from the database. It would be great if there was an easy way to do a “diff” of what’s in the database versus what the application is changing, but that seems like it would be a programmatic thing (function or method), not a database thing.

And consider the database overhead and bandwidth….instead of just running queries, MySQL would have to somehow keep track of which thread has what data, and then notify every single thread that has that data, that it’s changed. The applications will have to be written to keep threads open longer, which will consume lots of resources. That’s lots more overhead for the database, and much more bandwidth, because there may be instances of the application that are using data that they do not care if it changed….so the messaging system would be wasting bandwidth, sending messages to instances that do not care. Although that could be mitigated by the application keeping a thread open when it cares about whether or not the data has changed.

Then again, I’m not fond of managed schema in the application…or at least, when the developers write that code. Seems to me it should be the DBA writing that code. It’s *very* useful for data consistency and integrity, which is a function of the DBA, not a developer.

What effects do you see the managed schema having on databases? Who should be responsible for writing a managed schema? Should a managed schema be used for database consistency within an application? Where is the line drawn between the application putting the required information into the database, and the database’s job of maintaining consistency and integrity?

It’s somewhat ironic, since for a long time MySQL advocated using the application to ensure the consistency and integrity (ie, before MySQL had a storage engine with foreign keys and transactions).

I often say that the biggest reason MySQL is a widely used database is because it is fast. A fast database can be complemented by an application that adds the features the database is missing; but a slow database that is full-featured cannot be made faster by an application. So it worries me when folks request very specialized systems such as a messaging server (or ANY “push” system) into the database, because that could be easily done with a “pull” mechanism, only using the bandwidth needed by the instances of the applications that care. Otherwise, it will end up adding Microsoft-level bloat to a really nice and fast program.

This thread on the Boston MySQL User Group Board is getting interesting:
http://mysql.meetup.com/137/boards/view/viewthread?thread=2280640

(From the original poster:)

I think that nonequivalence comes from the past when the data sharing was a
rare situation. The data changes were always initiated by application and it
always knew about those changes. Now the situation is different. When the data
are shared between multiple remote applications you have to notify other
interested parties about those changes.

Currently databases are mostly used as “pull” components. If they had standard
“push” functionality they could compete with messaging systems with the advantages
of automatic data persistence and powerful query language.

(my response:)
Well, and that’s the problem — the paradigm *has* changed. MySQL is fast and reliable because it does NOT put things like messaging into their database, which Oracle and SQL Server do. A database is not a messaging system, it’s a database.

What effect would notification that there have been changes have on MVCC? I do wish there was a “pull” way to check if the data has changed.

The paradigm change of the application managing the schema causes this. I do not believe messaging is the correct way to handle this problem.

Consider the parallel to source code version control. Much like MVCC, you check out code, change it, and commit the code. Unlike many source code version control systems, though, MVCC (“data version control”) does not have the equivalent of an “update” command, except for doing another pull from the database. It would be great if there was an easy way to do a “diff” of what’s in the database versus what the application is changing, but that seems like it would be a programmatic thing (function or method), not a database thing.

And consider the database overhead and bandwidth….instead of just running queries, MySQL would have to somehow keep track of which thread has what data, and then notify every single thread that has that data, that it’s changed. The applications will have to be written to keep threads open longer, which will consume lots of resources. That’s lots more overhead for the database, and much more bandwidth, because there may be instances of the application that are using data that they do not care if it changed….so the messaging system would be wasting bandwidth, sending messages to instances that do not care. Although that could be mitigated by the application keeping a thread open when it cares about whether or not the data has changed.

Then again, I’m not fond of managed schema in the application…or at least, when the developers write that code. Seems to me it should be the DBA writing that code. It’s *very* useful for data consistency and integrity, which is a function of the DBA, not a developer.

What effects do you see the managed schema having on databases? Who should be responsible for writing a managed schema? Should a managed schema be used for database consistency within an application? Where is the line drawn between the application putting the required information into the database, and the database’s job of maintaining consistency and integrity?

It’s somewhat ironic, since for a long time MySQL advocated using the application to ensure the consistency and integrity (ie, before MySQL had a storage engine with foreign keys and transactions).

I often say that the biggest reason MySQL is a widely used database is because it is fast. A fast database can be complemented by an application that adds the features the database is missing; but a slow database that is full-featured cannot be made faster by an application. So it worries me when folks request very specialized systems such as a messaging server (or ANY “push” system) into the database, because that could be easily done with a “pull” mechanism, only using the bandwidth needed by the instances of the applications that care. Otherwise, it will end up adding Microsoft-level bloat to a really nice and fast program.

Choosing Datatypes for Fields

Twice recently I have had to import data from a flat file into a database — both sets of data were being imported to make better use of the data in the files. Importing a file is not that difficult using mysqlimport, particularly since in both cases the data was tab-delimited. I also could have used LOAD DATA INFILE.

The problem with mysqlimport and LOAD DATA INFILE is that you need to create the table beforehand. Often, though, I may not know the exact data types. Folks have written about PROCEDURE ANALYSE(), but they use it in the context of “you can check to see that your tables have the correct data types.”

Now, that’s always a good thing — say, quarterly or yearly, doing a schema review, including indexes, foreign keys and data types. However, where PROCEDURE ANALYSE() really comes in handy for me is when I am importing data.

Basically, I create a table with the correct number of fields, all with the datatype of tinytext. Then I run a SELECT * FROM tbl PROCEDURE ANALYSE()\G to find out what types my data really want to be.

Of course, you want to pay attention to warnings when importing, especially truncation warnings. Generally I use this method a few times on the data, and refine the column types a few times before I get it right.

My one complaint is that I wish PROCEDURE ANALYSE() would state the current column datatype as well as the “ideal” data type. It wastes a lot of time for me to make sure that the “ideal” data type is the one I already have, and it should not be that difficult for the function itself to do it.

Twice recently I have had to import data from a flat file into a database — both sets of data were being imported to make better use of the data in the files. Importing a file is not that difficult using mysqlimport, particularly since in both cases the data was tab-delimited. I also could have used LOAD DATA INFILE.

The problem with mysqlimport and LOAD DATA INFILE is that you need to create the table beforehand. Often, though, I may not know the exact data types. Folks have written about PROCEDURE ANALYSE(), but they use it in the context of “you can check to see that your tables have the correct data types.”

Now, that’s always a good thing — say, quarterly or yearly, doing a schema review, including indexes, foreign keys and data types. However, where PROCEDURE ANALYSE() really comes in handy for me is when I am importing data.

Basically, I create a table with the correct number of fields, all with the datatype of tinytext. Then I run a SELECT * FROM tbl PROCEDURE ANALYSE()\G to find out what types my data really want to be.

Of course, you want to pay attention to warnings when importing, especially truncation warnings. Generally I use this method a few times on the data, and refine the column types a few times before I get it right.

My one complaint is that I wish PROCEDURE ANALYSE() would state the current column datatype as well as the “ideal” data type. It wastes a lot of time for me to make sure that the “ideal” data type is the one I already have, and it should not be that difficult for the function itself to do it.

Open Source Removes the Need for Warez . . . . ?

I was listening to Practical Web Design Magazine’s podcast on Free Web Designer’s Toolkit, and towards the end he said something along the lines of:

With all this great open source software, there’s no need to buy or crack expensive software anymore.

I thought it was an interesting statement. I do not believe WAREZ will ever go away, because many like to have the expensive software as a bragging right — or even boast that they have a cracked version.

However, it does keep the honest people honest. If you want to do some photo editing, you can use the GIMP. If you want to write a document, spreadsheet or presentation, you can use OpenOffice. And these programs are compatible; they can read and save in proprietary formats as well.

It is getting to the point where not only are there excellent open source programs for anything you want to do, there are also “good enough” open source (or closed, but freeware) programs for almost anything. For instance, I was looking into open source versions of money management software (such as Quicken) for corporations and I found some web-based versions, which is great for organizations that want to be able to share information without having multiple software licenses.

Stealing will never go away; however, with less stealing, companies will be able to focus more on the products. And perhaps they’ll realize that if they go open source, “stealing” becomes a different beast, depending on the pricing model. It’s much more difficult to steal a support contract, which may involve a serial # of a machine, than a software key. Or consulting support, for that matter.

What do you think?

I was listening to Practical Web Design Magazine’s podcast on Free Web Designer’s Toolkit, and towards the end he said something along the lines of:

With all this great open source software, there’s no need to buy or crack expensive software anymore.

I thought it was an interesting statement. I do not believe WAREZ will ever go away, because many like to have the expensive software as a bragging right — or even boast that they have a cracked version.

However, it does keep the honest people honest. If you want to do some photo editing, you can use the GIMP. If you want to write a document, spreadsheet or presentation, you can use OpenOffice. And these programs are compatible; they can read and save in proprietary formats as well.

It is getting to the point where not only are there excellent open source programs for anything you want to do, there are also “good enough” open source (or closed, but freeware) programs for almost anything. For instance, I was looking into open source versions of money management software (such as Quicken) for corporations and I found some web-based versions, which is great for organizations that want to be able to share information without having multiple software licenses.

Stealing will never go away; however, with less stealing, companies will be able to focus more on the products. And perhaps they’ll realize that if they go open source, “stealing” becomes a different beast, depending on the pricing model. It’s much more difficult to steal a support contract, which may involve a serial # of a machine, than a software key. Or consulting support, for that matter.

What do you think?