Cardinality.
Also a blog post on “using the sakila sample database”
Speeding-up Queries: New Features of the MySQL 5.0 Query Engine — Katchaounov — blog about it, find the slides, explain well.
new fulltext parser like sergei’s talk. dayum!
Things I learned
Tune server params is last resort AFTER application optimization (from Jay)
http://dev.mysql.com/doc/refman/5.0/en/communication-errors.html
http://lists.mysql.com/mysql/196677 and my response.
nested set/adjacency hybrid code from roland
FOUND_ROWS
ROW_COUNT
SQL_CALC_FOUND_ROWS — calculate how many there are when you’re doing a limit. (or SELECT CALC_FOUND ROWS)
http://firestuff.org/wordpress/2006/07/20/sqlsearch-100/
crosstab: http://rpbouman.blogspot.com/2005/10/creating-crosstabs-in-mysql.html
http://www.futhark.ch/mysql/106.html
(sun has a db tech group — neat, ponder a job?)
What about composite searches — and, or, not? Pivot tables? Crosstabs? SELECTs and defaults, as well as multiple selections? newsome’s “building sane query interfaces” get the code from newsome@acm.org (emailed 8/16)
‘mysqld –verbose –help’
how to find out how it was compiled?
Cardinality.
Also a blog post on “using the sakila sample database”
Speeding-up Queries: New Features of the MySQL 5.0 Query Engine — Katchaounov — blog about it, find the slides, explain well.
new fulltext parser like sergei’s talk. dayum!
Things I learned
Tune server params is last resort AFTER application optimization (from Jay)
http://dev.mysql.com/doc/refman/5.0/en/communication-errors.html
http://lists.mysql.com/mysql/196677 and my response.
nested set/adjacency hybrid code from roland
FOUND_ROWS
ROW_COUNT
SQL_CALC_FOUND_ROWS — calculate how many there are when you’re doing a limit. (or SELECT CALC_FOUND ROWS)
http://firestuff.org/wordpress/2006/07/20/sqlsearch-100/
crosstab: http://rpbouman.blogspot.com/2005/10/creating-crosstabs-in-mysql.html
http://www.futhark.ch/mysql/106.html
(sun has a db tech group — neat, ponder a job?)
What about composite searches — and, or, not? Pivot tables? Crosstabs? SELECTs and defaults, as well as multiple selections? newsome’s “building sane query interfaces” get the code from newsome@acm.org (emailed 8/16)
‘mysqld –verbose –help’
how to find out how it was compiled?
If foreign keys are a part of the database namespace, why is it that you have to ALTER TABLE to add and remove them? Why not alter the database? I suspect the issue is more in MySQL’s limitations that names of foreign keys be unique among tables in a database. . . but it does make me wonder.
If foreign keys are a part of the database namespace, why is it that you have to ALTER TABLE to add and remove them? Why not alter the database? I suspect the issue is more in MySQL’s limitations that names of foreign keys be unique among tables in a database. . . but it does make me wonder.
SHOW STATUS and SHOW VARIABLES are very useful features for analyzing server performance, server setup and general debugging. Many times, there are items applicable to a scenario in both lists. Further, sometimes it’s difficult to know whether an option or variable can be changed on the commandline or in an option file.
For example, let’s say you are debugging a lot of aborted connects. You may care about the following items in “SHOW STATUS”:
Aborted_connects
Connections
Max_used_connections
Threads_connected
Uptime
And the following items in SHOW VARIABLES:
connect_timeout
init_connect
max_connect_errors
max_connections
max_user_connections
You might compare max_connections to max_used_connections, for example. Let’s say you realize you want to increase max_connections from 100 to 200 — can you do it dynamically on the commandline? Do you have to put it in an option file and restart?
Now MySQL has a matrix of variables and options that easily shows what can be changed where:
http://dev.mysql.com/doc/refman/5.0/en/mysqld-option-tables.html
As a bonus, many of the items in the list have links to the section that explains what they do.
This is a great reference — the matrix is only for 5.0 and up, but most of it likely applies to below 5.0 — just that some options may not be available. At any rate, you can use the 5.0 matrix, click through to the variable/option’s description, and then use the link on the left-hand side to go to the “3.23, 4.0, 4.1” documentation, which will give you the correct information for that version.
I know this will save me much time.
SHOW STATUS and SHOW VARIABLES are very useful features for analyzing server performance, server setup and general debugging. Many times, there are items applicable to a scenario in both lists. Further, sometimes it’s difficult to know whether an option or variable can be changed on the commandline or in an option file.
For example, let’s say you are debugging a lot of aborted connects. You may care about the following items in “SHOW STATUS”:
Aborted_connects
Connections
Max_used_connections
Threads_connected
Uptime
And the following items in SHOW VARIABLES:
connect_timeout
init_connect
max_connect_errors
max_connections
max_user_connections
You might compare max_connections to max_used_connections, for example. Let’s say you realize you want to increase max_connections from 100 to 200 — can you do it dynamically on the commandline? Do you have to put it in an option file and restart?
Now MySQL has a matrix of variables and options that easily shows what can be changed where:
http://dev.mysql.com/doc/refman/5.0/en/mysqld-option-tables.html
As a bonus, many of the items in the list have links to the section that explains what they do.
This is a great reference — the matrix is only for 5.0 and up, but most of it likely applies to below 5.0 — just that some options may not be available. At any rate, you can use the 5.0 matrix, click through to the variable/option’s description, and then use the link on the left-hand side to go to the “3.23, 4.0, 4.1” documentation, which will give you the correct information for that version.
I know this will save me much time.
Even if you have listed your name on the Participants page of mysqlcamp.org, you need to register.
Granted, the registration page says that all that’s needed to register is to send
Jay Pipes (jay@mysql.com) the following information:
Your Name
Your Company Name
City and State of Residence
And that’s it. So do it today! The conference is free and open to anyone who registers.
Even if you have listed your name on the Participants page of mysqlcamp.org, you need to register.
Granted, the registration page says that all that’s needed to register is to send
Jay Pipes (jay@mysql.com) the following information:
Your Name
Your Company Name
City and State of Residence
And that’s it. So do it today! The conference is free and open to anyone who registers.
Vote for your favorite MySQL Community(TM) logo! Add comments on existing logos! Even mock up
http://mysqlcamp.pbwiki.com/LogoVoting
Seriously, though — think of how a new logo will reflect the Community, and how we all want to be perceived. Also consider what makes a nice favicon.ico …..
So vote on something that really matters to you, and make your voice heard. It does not cost anything and takes just a few seconds.
Vote for your favorite MySQL Community(TM) logo! Add comments on existing logos! Even mock up
http://mysqlcamp.pbwiki.com/LogoVoting
Seriously, though — think of how a new logo will reflect the Community, and how we all want to be perceived. Also consider what makes a nice favicon.ico …..
So vote on something that really matters to you, and make your voice heard. It does not cost anything and takes just a few seconds.
I have found http://9rules.com/ to have pretty good content, and they’re accepting submissions for their community. This is a great opportunity to post your great content to get more exposure.
Submit your site at: http://9rules.com/submit/
I have found http://9rules.com/ to have pretty good content, and they’re accepting submissions for their community. This is a great opportunity to post your great content to get more exposure.
Submit your site at: http://9rules.com/submit/
I posted earlier about the MySQL announcement (see http://www.mysql.com/news-and-events/news/article_1171.html )and how I see it affecting Enterprise level users.
However, as someone who uses it for work other than my day job, some at the hobby level and some above that level, the Community edition is also important to me.
The way I see it, the changes are great for people who use the Community edition.
Wait, there will be 2 codebases, and they have not doubled their staff, so won’t there actually be less development? Are they expecting the community to write all the code for the Community edition?
Kind of. I do know that MySQL has been hiring lots of people, but I’m not an employee, so I have no idea of their growth. There will be 2 codebases, but that works out for the best for everyone. Now, there will be 2 priority lists — things that Enterprise customers really want, and things that Community users really want.
So Community users do not have to hear “Sorry, that’s not a priority, but if you pay us we’ll make it one.” Community users can get the features that are a priority for them, and Enterprise users can get the features that are a prioirty for them, too.
If a priority is high for Enterprise, and lower for Community, it can be developed in Enterprise first, and moved to Community. And, of course, vice versa. The part where the community coding comes in is that not only does MySQL have priorities for Community development, but if a user wants to submit something, they can, thus changing the priority order, because the work is mostly done, and all MySQL needs to do is integrate it.
But Community users do not have to suffer with a package that has more code (and thus is a bit bloatier) just because someone paid MySQL USD $1 million to extend the maximum table size to 1 terabyte. Nor do they have to wait for development on something like that to finish before other features get implemented.
The announcement also had some poor wording choices. The Community edition will be as stable and reliable as ever. The truth is that new features have bugs, and there will be more new features in Community than in Enterprise. As well, both Community and Enterprise are tested quite thoroughly. But Enterprise will be tested to enterprise-level standards — for example, throughput could be tested and MySQL could say “buy Enterprise! It can handle 10,000 queries per second!”
My company is using the Community Edition, and we have over 3600 queries per second being run — and that’s an AVERAGE. So there’s no question in my mind that the Community Edition is stable and reliable enough to be used at an “enterprise” level. That quality will NOT go away.
I’m not quite clear on what the licensing will be for the Enterprise and Community Editions, but I would think that MySQL will keep the ability to embed and redistribute both editions. I don’t know for sure, but I think it would be a bad business move not to.
I have had a few people come to me and say “Did you see the announcement? How could MySQL do this? They’re forcing everyone that’s not a hobbyist to buy the product!!!!” Everyone predicted the end of InnoDB when Oracle bought it — I said “it’s no big deal” (see http://sheeri.net/archives/38).
Similarly, this is not a loss for the MySQL community. It’s actually a gain for both Enterprise and Community users. I’m not a MySQL employee, so I don’t know everything that’s going on behind the scenes, but I have used and administered MySQL for over 5 years in real life situations and environments, and I can tell you that this is a win. We will hear lots less of “well, we could do that, but we’re programming something really big right now, so it will have to wait”.
As for the website changes, my one complaint is that at http://mysql.com/products/, the Community edition isn’t listed — there’s a link to the “Community” site that’s hard to miss on the right-hand side, and a menu item similar to “Products” at the top nav bar called “Community” — but it doesn’t seem consistent. Either you have “Products” and “Services” and you have a “Free download” section under “Products”, or you have a section with “Enterprise” and a section with “Community”….mixing the 2 is bad UI.
But if that’s the worst I have to say, then the changes are pretty good. And it’s more obvious that http://dev.mysql.com is for Community stuff….and when I went there, I found the polls….NEAT!
I posted earlier about the MySQL announcement (see http://www.mysql.com/news-and-events/news/article_1171.html )and how I see it affecting Enterprise level users.
However, as someone who uses it for work other than my day job, some at the hobby level and some above that level, the Community edition is also important to me.
The way I see it, the changes are great for people who use the Community edition.
Wait, there will be 2 codebases, and they have not doubled their staff, so won’t there actually be less development? Are they expecting the community to write all the code for the Community edition?
Kind of. I do know that MySQL has been hiring lots of people, but I’m not an employee, so I have no idea of their growth. There will be 2 codebases, but that works out for the best for everyone. Now, there will be 2 priority lists — things that Enterprise customers really want, and things that Community users really want.
So Community users do not have to hear “Sorry, that’s not a priority, but if you pay us we’ll make it one.” Community users can get the features that are a priority for them, and Enterprise users can get the features that are a prioirty for them, too.
If a priority is high for Enterprise, and lower for Community, it can be developed in Enterprise first, and moved to Community. And, of course, vice versa. The part where the community coding comes in is that not only does MySQL have priorities for Community development, but if a user wants to submit something, they can, thus changing the priority order, because the work is mostly done, and all MySQL needs to do is integrate it.
But Community users do not have to suffer with a package that has more code (and thus is a bit bloatier) just because someone paid MySQL USD $1 million to extend the maximum table size to 1 terabyte. Nor do they have to wait for development on something like that to finish before other features get implemented.
The announcement also had some poor wording choices. The Community edition will be as stable and reliable as ever. The truth is that new features have bugs, and there will be more new features in Community than in Enterprise. As well, both Community and Enterprise are tested quite thoroughly. But Enterprise will be tested to enterprise-level standards — for example, throughput could be tested and MySQL could say “buy Enterprise! It can handle 10,000 queries per second!”
My company is using the Community Edition, and we have over 3600 queries per second being run — and that’s an AVERAGE. So there’s no question in my mind that the Community Edition is stable and reliable enough to be used at an “enterprise” level. That quality will NOT go away.
I’m not quite clear on what the licensing will be for the Enterprise and Community Editions, but I would think that MySQL will keep the ability to embed and redistribute both editions. I don’t know for sure, but I think it would be a bad business move not to.
I have had a few people come to me and say “Did you see the announcement? How could MySQL do this? They’re forcing everyone that’s not a hobbyist to buy the product!!!!” Everyone predicted the end of InnoDB when Oracle bought it — I said “it’s no big deal” (see http://sheeri.net/archives/38).
Similarly, this is not a loss for the MySQL community. It’s actually a gain for both Enterprise and Community users. I’m not a MySQL employee, so I don’t know everything that’s going on behind the scenes, but I have used and administered MySQL for over 5 years in real life situations and environments, and I can tell you that this is a win. We will hear lots less of “well, we could do that, but we’re programming something really big right now, so it will have to wait”.
As for the website changes, my one complaint is that at http://mysql.com/products/, the Community edition isn’t listed — there’s a link to the “Community” site that’s hard to miss on the right-hand side, and a menu item similar to “Products” at the top nav bar called “Community” — but it doesn’t seem consistent. Either you have “Products” and “Services” and you have a “Free download” section under “Products”, or you have a section with “Enterprise” and a section with “Community”….mixing the 2 is bad UI.
But if that’s the worst I have to say, then the changes are pretty good. And it’s more obvious that http://dev.mysql.com is for Community stuff….and when I went there, I found the polls….NEAT!
OK, so I am excited. Yesterday MySQL announced MySQL Enterprise (http://www.mysql.com/news-and-events/news/article_1171.html). Now, I had not been so keen on MySQL Network — I had not realized the level of service MySQL will go to. Does Oracle or Microsoft offer schema review as part of a service package? I do not think so.
Basically, MySQL Enterprise is really a full suite of enterprise solutions for companies that need them. Many folks know how hard it can be to convince their boss that a DBA (or team of DBA’s) is needed. Many folks know how hard it is to be a DBA, and many folks know how hard it is to be a developer or sysadmin with DBA duties.
It’s easier to convince a company to spend money on a support contract than it is to spend money on headcount. And with MySQL, companies get so much for their money.
Oh, sure, there is what we already know — the enterprise binary, which is rigorously tested against enterprise-level demands. The “GA” or “Generally Available” (which will now be the “Community” edition) source code/binaries are tested for functionality, but not necessarily for things like throughput. (I have no idea if the Enterprise binary is tested for throughput, it’s just a good example of the kind of thing that people want in enterprise-level software).
There is also what anyone would expect from a support contract — access to updates, incident response (phone and web), emergency help, a knowledgebase, etc.
But looking at the features (http://mysql.com/products/enterprise/features.html), there’s a whole world of services that as far as I know, very few software companies are offering.
Let me say that again — MySQL offers features in their support contracts that very few other software companies offer. As far as I know, you cannot buy all of these from ANY of the enterprise-level database vendors.
MySQL offers consultative services, including schema reviews, performance tuning and SQL code reviews. Yes, that’s right, MySQL offers the service of reviewing your SQL queries, including procedures and server extensions.
The feature list can be seen here:
http://mysql.com/products/enterprise/features.html
All in all, very exciting for those who use MySQL in an enterprise-level capacity (which I do at work).
OK, so I am excited. Yesterday MySQL announced MySQL Enterprise (http://www.mysql.com/news-and-events/news/article_1171.html). Now, I had not been so keen on MySQL Network — I had not realized the level of service MySQL will go to. Does Oracle or Microsoft offer schema review as part of a service package? I do not think so.
Basically, MySQL Enterprise is really a full suite of enterprise solutions for companies that need them. Many folks know how hard it can be to convince their boss that a DBA (or team of DBA’s) is needed. Many folks know how hard it is to be a DBA, and many folks know how hard it is to be a developer or sysadmin with DBA duties.
It’s easier to convince a company to spend money on a support contract than it is to spend money on headcount. And with MySQL, companies get so much for their money.
Oh, sure, there is what we already know — the enterprise binary, which is rigorously tested against enterprise-level demands. The “GA” or “Generally Available” (which will now be the “Community” edition) source code/binaries are tested for functionality, but not necessarily for things like throughput. (I have no idea if the Enterprise binary is tested for throughput, it’s just a good example of the kind of thing that people want in enterprise-level software).
There is also what anyone would expect from a support contract — access to updates, incident response (phone and web), emergency help, a knowledgebase, etc.
But looking at the features (http://mysql.com/products/enterprise/features.html), there’s a whole world of services that as far as I know, very few software companies are offering.
Let me say that again — MySQL offers features in their support contracts that very few other software companies offer. As far as I know, you cannot buy all of these from ANY of the enterprise-level database vendors.
MySQL offers consultative services, including schema reviews, performance tuning and SQL code reviews. Yes, that’s right, MySQL offers the service of reviewing your SQL queries, including procedures and server extensions.
The feature list can be seen here:
http://mysql.com/products/enterprise/features.html
All in all, very exciting for those who use MySQL in an enterprise-level capacity (which I do at work).
This thread on the Boston MySQL User Group Board is getting interesting:
http://mysql.meetup.com/137/boards/view/viewthread?thread=2280640
(From the original poster:)
I think that nonequivalence comes from the past when the data sharing was a
rare situation. The data changes were always initiated by application and it
always knew about those changes. Now the situation is different. When the data
are shared between multiple remote applications you have to notify other
interested parties about those changes.
Currently databases are mostly used as “pull” components. If they had standard
“push” functionality they could compete with messaging systems with the advantages
of automatic data persistence and powerful query language.
(my response:)
Well, and that’s the problem — the paradigm *has* changed. MySQL is fast and reliable because it does NOT put things like messaging into their database, which Oracle and SQL Server do. A database is not a messaging system, it’s a database.
What effect would notification that there have been changes have on MVCC? I do wish there was a “pull” way to check if the data has changed.
The paradigm change of the application managing the schema causes this. I do not believe messaging is the correct way to handle this problem.
Consider the parallel to source code version control. Much like MVCC, you check out code, change it, and commit the code. Unlike many source code version control systems, though, MVCC (“data version control”) does not have the equivalent of an “update” command, except for doing another pull from the database. It would be great if there was an easy way to do a “diff” of what’s in the database versus what the application is changing, but that seems like it would be a programmatic thing (function or method), not a database thing.
And consider the database overhead and bandwidth….instead of just running queries, MySQL would have to somehow keep track of which thread has what data, and then notify every single thread that has that data, that it’s changed. The applications will have to be written to keep threads open longer, which will consume lots of resources. That’s lots more overhead for the database, and much more bandwidth, because there may be instances of the application that are using data that they do not care if it changed….so the messaging system would be wasting bandwidth, sending messages to instances that do not care. Although that could be mitigated by the application keeping a thread open when it cares about whether or not the data has changed.
Then again, I’m not fond of managed schema in the application…or at least, when the developers write that code. Seems to me it should be the DBA writing that code. It’s *very* useful for data consistency and integrity, which is a function of the DBA, not a developer.
What effects do you see the managed schema having on databases? Who should be responsible for writing a managed schema? Should a managed schema be used for database consistency within an application? Where is the line drawn between the application putting the required information into the database, and the database’s job of maintaining consistency and integrity?
It’s somewhat ironic, since for a long time MySQL advocated using the application to ensure the consistency and integrity (ie, before MySQL had a storage engine with foreign keys and transactions).
I often say that the biggest reason MySQL is a widely used database is because it is fast. A fast database can be complemented by an application that adds the features the database is missing; but a slow database that is full-featured cannot be made faster by an application. So it worries me when folks request very specialized systems such as a messaging server (or ANY “push” system) into the database, because that could be easily done with a “pull” mechanism, only using the bandwidth needed by the instances of the applications that care. Otherwise, it will end up adding Microsoft-level bloat to a really nice and fast program.
This thread on the Boston MySQL User Group Board is getting interesting:
http://mysql.meetup.com/137/boards/view/viewthread?thread=2280640
(From the original poster:)
I think that nonequivalence comes from the past when the data sharing was a
rare situation. The data changes were always initiated by application and it
always knew about those changes. Now the situation is different. When the data
are shared between multiple remote applications you have to notify other
interested parties about those changes.
Currently databases are mostly used as “pull” components. If they had standard
“push” functionality they could compete with messaging systems with the advantages
of automatic data persistence and powerful query language.
(my response:)
Well, and that’s the problem — the paradigm *has* changed. MySQL is fast and reliable because it does NOT put things like messaging into their database, which Oracle and SQL Server do. A database is not a messaging system, it’s a database.
What effect would notification that there have been changes have on MVCC? I do wish there was a “pull” way to check if the data has changed.
The paradigm change of the application managing the schema causes this. I do not believe messaging is the correct way to handle this problem.
Consider the parallel to source code version control. Much like MVCC, you check out code, change it, and commit the code. Unlike many source code version control systems, though, MVCC (“data version control”) does not have the equivalent of an “update” command, except for doing another pull from the database. It would be great if there was an easy way to do a “diff” of what’s in the database versus what the application is changing, but that seems like it would be a programmatic thing (function or method), not a database thing.
And consider the database overhead and bandwidth….instead of just running queries, MySQL would have to somehow keep track of which thread has what data, and then notify every single thread that has that data, that it’s changed. The applications will have to be written to keep threads open longer, which will consume lots of resources. That’s lots more overhead for the database, and much more bandwidth, because there may be instances of the application that are using data that they do not care if it changed….so the messaging system would be wasting bandwidth, sending messages to instances that do not care. Although that could be mitigated by the application keeping a thread open when it cares about whether or not the data has changed.
Then again, I’m not fond of managed schema in the application…or at least, when the developers write that code. Seems to me it should be the DBA writing that code. It’s *very* useful for data consistency and integrity, which is a function of the DBA, not a developer.
What effects do you see the managed schema having on databases? Who should be responsible for writing a managed schema? Should a managed schema be used for database consistency within an application? Where is the line drawn between the application putting the required information into the database, and the database’s job of maintaining consistency and integrity?
It’s somewhat ironic, since for a long time MySQL advocated using the application to ensure the consistency and integrity (ie, before MySQL had a storage engine with foreign keys and transactions).
I often say that the biggest reason MySQL is a widely used database is because it is fast. A fast database can be complemented by an application that adds the features the database is missing; but a slow database that is full-featured cannot be made faster by an application. So it worries me when folks request very specialized systems such as a messaging server (or ANY “push” system) into the database, because that could be easily done with a “pull” mechanism, only using the bandwidth needed by the instances of the applications that care. Otherwise, it will end up adding Microsoft-level bloat to a really nice and fast program.
So, the biggest issue from last month’s Boston MySQL Meetup was “What is the most efficient way to query across many values that are boolean in nature?”
Now, it was a bit more complex than that, as it always is. Values may be set or not set, and people may search across any number of boolean values, or all of them. There are over 1 million entries, and over 40 boolean values!
A few ways of doing so came up:
1) Simply put all the values into separate columns, with a BOOLEAN type (really TINYINT(1))
2) Using ENUMs
3) Using foreign keys to “join tables”
4) Using SET
5) Using bits and bit operations
Using real data, the original questioner and I sought to find the best answer for his case.
So, the biggest issue from last month’s Boston MySQL Meetup was “What is the most efficient way to query across many values that are boolean in nature?”
Now, it was a bit more complex than that, as it always is. Values may be set or not set, and people may search across any number of boolean values, or all of them. There are over 1 million entries, and over 40 boolean values!
A few ways of doing so came up:
1) Simply put all the values into separate columns, with a BOOLEAN type (really TINYINT(1))
2) Using ENUMs
3) Using foreign keys to “join tables”
4) Using SET
5) Using bits and bit operations
Using real data, the original questioner and I sought to find the best answer for his case.