OurSQL Episode 17: Hashing it out

In this episode we tackle what a hash looks like in terms of a data structure, in preparation for next episode’s discussion of the difference between hashes and B-tree indexes, and what kinds of indexes are good for which kinds of optimizations.
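As a rough sketch of what the episode describes — a minimal hash table using chaining, written in Python purely for illustration (the bucket count and method names are invented here; this is not MySQL’s implementation):

```python
class HashTable:
    """A minimal chained hash table, for illustration only."""

    def __init__(self, nbuckets=8):
        # Fixed number of buckets; each bucket is a list (the "chain").
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, key):
        # The hash function maps an arbitrary key to one bucket index.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:              # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))   # colliding keys chain in the list

    def get(self, key):
        # Lookup cost is the chain length, not the table size.
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)
```

The key property, and the reason hashes behave so differently from B-trees, is that a lookup jumps straight to one bucket rather than walking an ordered structure.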

Show Notes:
Direct play this episode at:
http://technocation.org/content/oursql-episode-17%3A-hashing-it-out-0

Download all podcasts at:
http://technocation.org/podcasts/oursql/

Subscribe to the podcast at:
http://feeds.feedburner.com/oursql

News:
MySQL Connector/NET 5.1.1 released:
http://tinyurl.com/23a9ax

Download the new Connector/NET version:
http://dev.mysql.com/downloads/connector/net/5.1.html

MySQL 5.0.x security vulnerability:
http://bugs.mysql.com/bug.php?id=27513
Solution: upgrade to 5.0.40. This bug is not known to affect major versions 3 or 4.

Learning Resource:

http://onlinesolutionsmysql.blogspot.com/

The dates for all the sessions:

* 27th March: Part 1 – High Availability and Scalability Architectures
* 19th April: Part 2 – Advanced Scalability Solutions
* 2nd May: Part 3 – MySQL Enterprise To Control Mission Critical Online Services
* 23rd May: Part 4 – 99.999% High Availability solutions
* 13th June: Part 5 – MySQL Enterprise performance and benchmarking
* 27th June: Part 6 – Advanced HA solutions

Find all the material and documentation for past webinars at:
http://onlinesolutionsmysql.blogspot.com/2007/03/links-to-material-and-documentation.html

Feature: Hash tables explained.

http://www.sparknotes.com/cs/searching/hashtables/section1.html

http://www.cs.sunysb.edu/~algorith/lectures-good/node7.html (search for “Hash Tables” on the page)

Feedback:

Email podcast@technocation.org

call the comment line at +1 617-674-2369

use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Or use the Technocation forums:
http://technocation.org/forum


Quiz Show Video Up

For folks to know — to create the page with links to conference material, I took the slides from the O’Reilly official page, combined them with the myriad of “here are my slides” posts to Planet MySQL, and added links to Baron, Kevin and Mike’s audio and video as well as the video and audio I processed. (Because Baron made statements about bandwidth, I downloaded the .ogg files and technocation.org is hosting them, whereas Kevin and Mike’s files are linked to.)

I know I hate going to 20 places to find everything I want. There’s no need for folks to have to go to more than one site, just because the content was provided by more than one person. The referenced page is one-stop shopping, as I feel it should be. If folks have anything to add (or change), I’m happy to update the page, just comment here.

So, I have updated the page at:

http://technocation.org/content/2007-mysql-user-conference-and-expo-presentations-and-videos

that fixes a link, adds some slides, and also adds the Quiz Show footage that I have. I got a late start to the Quiz Show, but I did get it in time to catch Solomon Chang’s infamous “Coder McKinnan o’ The Cubicles”. Sadly, I did not have video of the dance, but you can see that at http://people.warp.es/~nacho/blog/?p=225 — that post also contains a comment by Solomon himself with the lyrics (which he also sent to me, but I’m a bad videographer and took too long to process all the video…).

(updates to the page: Fixed the “Clash of the Database Egos” wmv link, thanks to Hakan Kücükyilmaz for pointing out the brokenness. Added the link to download slides (pdf and swf) for the SQL Kitchen talk, courtesy of Damien Seguy. Added the Quiz Show and links to audio and video.)

Go forth and enjoy!


2007 MySQL Conference Slides, Video and Audio Now Available

http://technocation.org/content/2007-mysql-user-conference-and-expo-presentations-and-videos

Need I say more? Go download the slides, video and audio from the 2007 MySQL Users Conference & Expo. I have no plans to take anything down, so please download wisely, and take only what you need. If there’s demand, I can make higher-quality versions available. I can also burn DVDs of the content if that’s desired.

Enjoy!


MySQL Boston May User Group: Auditing MySQL for Security and Compliance

Mehlam Shakir, CTO of RippleTech, discusses a practical approach for auditing MySQL databases to meet security and compliance regulations. Hear real-world cases and see a live demonstration of how RippleTech’s Informant solution complements MySQL by adding a security layer without any performance impact.

For more information on RippleTech’s INFORMANT, visit http://www.rippletech.com/

I have to say, I was a bit worried this would be a typical vendor presentation where every other word is marketing speak for how great the product is. It actually just ended up being “here’s how Informant works, and here’s how auditing, security and compliance needs can be met,” presented in a way that’s useful and valuable to anyone who is interested in auditing or security.

RippleTech’s Informant is interesting not only because it’s currently the only software that audits MySQL; it’s also impressive in its simplicity and flexibility. I think my favorite surprise about Informant was that it can store a user session as just that — a session.

Download the video of the presentation (446 MB) at:
http://technocation.org/movies/mysql/AuditingRippleTech2007MayUGbig.wmv


Work With the She-BA

You’ve heard me on the MySQL Podcast at http://www.technocation.org, now come work with me, the “She”-BA!

The company I work for is an online social networking/dating site. Our main product is for men seeking men in 87 countries throughout the world. We’re looking for another MySQL DBA for work such as designing schemas and maintaining data integrity for our 1 million users (and growing fast!). The salary is dependent upon experience of course, but the company I work for pays on the high side of the industry standard for the Boston area.

Application Instructions
Please send cover letter, résumé and sample schema to work@online-buddies.com, with “MySQL DBA” as your subject. The
sample schema should reflect your abilities, so if you send along a schema you would like to see improved, include a description of what you would love to do to that schema to make it better.

The fun “requirements”:

  • A schema of 1 database with 85 tables — all of which have an auto-increment id as their primary key — makes you cringe
  • Knowing the difference between InnoDB and MyISAM storage engines and when you might use them
  • You cry when you see field names like “ExtraInfo1” and “ExtraInfo2”
  • You want to poke your own eyes out when you see schemas full of varchar(20) default NULL
  • When someone says, “Can you add a [type, ie, INT, BLOB] field to the table for me?” your first response is, “Sure, why do you need it?”
  • You know that “data warehouse” is not a synonym for “replicated copy of the database” — and if you did not, your first thought upon reading that was “It isn’t? I wonder how the schemas would be different?”
  • When designing a schema, you draw out an ER diagram first (or are willing to learn how)
  • Thinking about what data will be stored is the most important thing to you when you’re creating a table, not how the application will access it
  • You are comfortable with a job that does not involve writing code in a procedural language, but you could help debug code written in a procedural language if you had to
  • You answer the question ‘Do images belong in the filesystem or the database?’ with ‘it depends’ and can go through scenarios of when each one is appropriate. (Alternatively you say “I have no idea” and do some research)

On to the more boring description…..

MySQL Database Administrator

The MySQL DBA will ensure that all data remains consistent across the database; that the data is clearly defined; that users can access data concurrently, in a form that suits their needs; and that there is provision for data security and recovery control (all data is retrievable in an emergency).

Essential Duties/Tasks

* assist in establishing the needs of users;
* plan dataflow for a new or revised database;
* help design databases;
* test all new systems;
* maintain data standards, including adherence to the Data Protection Act, C.I.S.P or other security requirements as implemented and dictated;
* write database documentation, including data standards, procedures and definitions for the data dictionary;
* control access permissions and privileges;
* ensure that storage, archiving, backup and recovery procedures are functioning correctly;
* plan capacity;
* work closely with IT project managers, programmers and developers;
* provide technical support for outdated legacy systems;
* ensure the database integrity and security;
* commission and install new applications.

Knowledge of Industry, Product, and Technology

* Bachelor’s degree
* At least 2 years of experience that is directly related to the
duties and responsibilities specified.

Education and other Qualifications

* Knowledge of current technological developments/trends in area
of expertise.
* Ability to interpret data models and to develop database structures.
* Ability to use standard diagramming techniques to design and
develop computer data models.
* Ability to configure, manage, and maintain the operation of
complex relational databases.
* Ability to develop and manipulate large, complex data sets.
* Knowledge of computer and/or network security systems,
applications, procedures, and techniques.
* Ability to operate on a scheduled 24-hour on-call basis.
* Knowledge of data integrity methods and techniques a plus.
* Technical writing skills.
* Ability to install, maintain, modify, and upgrade MySQL.

About our Company
Founded in 2001, Online Buddies Inc. continues to enjoy extraordinary growth as we achieve worldwide recognition for our product range of alternative lifestyle online personals; each provides a safe, friendly and exciting site through which members can express themselves, communicate and interact with one another as they wish.

Our Mission is to build upon our reputation as an internationally regarded leader for online personals as well as an organization that positively impacts the communities we serve. We work to accomplish this goal through partnerships with local, state and federal health and human service organizations; providing our members with accurate and easily accessible health-related information.

We acknowledge that our success begins with our ability to select a uniquely talented and diverse workforce that is afforded equal opportunity to enjoy both personal and professional growth, contributing to our collective success as we work to achieve our individual aspirations.

Application Instructions (repeated from top)
Please send cover letter, résumé and sample schema to work@online-buddies.com, with “MySQL DBA” as your subject. The
sample schema should reflect your abilities, so if you send along a schema you would like to see improved, include a description of what you would love to do to that schema to make it better.


Upgrade news & OpenID

Today I upgraded the blog software at sheeri.com (and sheeri.net and sheeri.org). Please let me know if you find something that doesn’t work as expected — awfief@gmail.com.

At the MySQL Users Conference, my good friend Mark Atwood (creator of the free Amazon S3 Storage Engine) mentioned that any site with a login should have OpenID as an option.

Mark, I upgraded for you! I was using WordPress 1.5.2; now I’m at the “latest” version. Anyway, this is just to let folks know that you may now use OpenID if you wish to log in and make comments.

Of course, I do not require login (and have a great spam filter) so that’s just another choice you have.


OurSQL Episode 16: The Art of Innovation, Guy Kawasaki

I take the easy way out again this week by sharing Guy Kawasaki (of the How To Change the World blog) and his irreverent and truthful keynote at the 2007 MySQL Users Conference.

Kawasaki will challenge your thoughts about being an entrepreneur in the technology industry.

The big news is that soon I’ll be able to announce that the videos from the conference sessions are up….stay tuned!

Show Notes:
Guy Kawasaki’s Blog: How to Change the World
http://blog.guykawasaki.com/

Direct play this episode at:
http://technocation.org/content/oursql-episode-16%3A-art-innovation%2C-guy-kawasaki

Download all podcasts at:
http://technocation.org/podcasts/oursql/

Subscribe to the podcast at:
http://feeds.feedburner.com/oursql

Feedback:

Email podcast@technocation.org

call the comment line at +1 617-674-2369

use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Or use the Technocation forums:
http://technocation.org/forum


OurSQL Episode 15: Eben Moglen’s Keynote at the MySQL Conference

Eben Moglen, director of the Software Freedom Law Center, discusses why Free Beer isn't so good if your data are getting drunk! The keynote looks at how "Free as in Freedom" businesses help prevent the ultimate privacy catastrophe.
 
This speech is not to be missed! 
 
 
 
Direct play this episode at:
 
Download all podcasts at:
http://technocation.org/podcasts/oursql/

Subscribe to the podcast at:
http://feeds.feedburner.com/oursql
 
Feedback:

Email
info@technocation.org

call the comment line at +1 617-674-2369

use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Or use the Technocation forums:
http://technocation.org/forum
 


SQL performance tips for podcast

Things to avoid (config)
10. --skip-name-resolve
13. increase tmp_table_size in a data warehousing environment (default is 32MB) so temporary tables don’t spill to disk (in-memory temporary tables are also constrained by max_heap_table_size, default 16MB)
4. if you can, compress text/blobs
5. compress static data
6. don’t back up static data as often

Things to avoid: schema
12. Separate text/blobs from metadata, don’t put text/blobs in results if you don’t need them
18. Redundant data is redundant

Top 1000 SQL Performance Tips

Interactive session from MySQL Camp I:

Specific Query Performance Tips (see also database design tips for tips on indexes):

1. Use EXPLAIN to profile the query execution plan
2. Use Slow Query Log (always have it on!)
3. Don’t use DISTINCT when you have or could use GROUP BY
4. Insert performance
1. Batch INSERT and REPLACE
2. Use LOAD DATA instead of INSERT
5. LIMIT m,n may not be as fast as it sounds
6. Don’t use ORDER BY RAND() if you have > ~2K records
13. Derived tables (subqueries in the FROM clause) can be useful for retrieving BLOBs without sorting them. (A self-join can speed up a query if the first part finds the IDs and then uses them to fetch the rest)
15. Know when to split a complex query and join smaller ones
16. Delete small amounts at a time if you can
18. Have good SQL query standards
19. Don’t use deprecated features
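As an illustration of tip 4.1 (batch INSERT), here is a hypothetical Python helper that builds one multi-row INSERT instead of many single-row statements. The function name is invented, and real application code should use the database driver’s parameterized API (e.g. executemany) rather than building strings like this:

```python
def batch_insert_sql(table, cols, rows):
    """Build one multi-row INSERT statement (illustration only --
    real code should let the driver parameterize values)."""
    # Each row becomes one "(v1, v2, ...)" group in the VALUES clause.
    values = ", ".join(
        "(" + ", ".join(repr(v) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO {table} ({', '.join(cols)}) VALUES {values}"
```

One round-trip carrying many rows amortizes parsing and network overhead, which is why batching (or LOAD DATA) beats a loop of single-row INSERTs.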

Scaling Performance Tips:

1. Use benchmarking
2. isolate workloads; don’t let administrative work (i.e. backups) interfere with customer performance
3. Debugging sucks, testing rocks!

Network Performance Tips:

1. Minimize traffic by fetching only what you need.
1. Paging/chunked data retrieval to limit
3. Be wary of lots of small quick queries if a longer query can be more efficient
2. Use multi_query if appropriate to reduce round-trips

OS Performance Tips:

1. Use proper data partitions
1. For Cluster, start thinking about partitions *before* you need them
2. Keep the database host as clean as possible. Do you really need a windowing system on that server?
3. Utilize the strengths of the OS
4. pare down cron scripts
5. create a test environment
6. source control schema and config files
7. for LVM innodb backups, restore to a different instance of MySQL so Innodb can roll forward
8. partition appropriately
9. partition your database when you have real data — do not assume you know your dataset until you have real data

MySQL Server Overall Tips:

1. innodb_flush_log_at_trx_commit=0 can help slave lag
2. Optimize for data types, use consistent data types. Use PROCEDURE ANALYSE() to help determine the smallest data type for your needs.
3. use optimistic locking, not pessimistic locking. Try to use shared locks, not exclusive locks (LOCK IN SHARE MODE vs. FOR UPDATE)
8. config params — http://docs.cellblue.nl/easy_mysql_performance_tweaks/ is a good reference
9. Config variables & tips:
1. use one of the supplied config files
2. key_buffer, unix cache (leave some RAM free), per-connection variables, innodb memory variables
3. be aware of global vs. per-connection variables
4. check SHOW STATUS and SHOW VARIABLES (GLOBAL|SESSION in 5.0 and up)
5. be aware of swapping esp. with Linux, “swappiness” (bypass OS filecache for innodb data files, innodb_flush_method=O_DIRECT if possible (this is also OS specific))
6. defragment tables, rebuild indexes, do table maintenance
7. If you use innodb_flush_log_at_trx_commit=1, use a battery-backed hardware cache write controller
8. more RAM is good, as is faster disk speed
9. use 64-bit architectures
11. increase myisam_sort_buffer_size to optimize large inserts (this is a per-connection variable)
12. (look up) memory tuning parameter for on-insert caching
14. Run with a strict SQL_MODE (e.g. STRICT_ALL_TABLES) to help identify warnings
15. /tmp dir on battery-backed write cache
16. consider battery-backed RAM for innodb logfiles
17. use --safe-updates for the client

Storage Engine Performance Tips:
2. Utilize different storage engines on master/slave ie, if you need fulltext indexing on a table.
3. BLACKHOLE engine and replication is much faster than FEDERATED tables for things like logs.
4. Know your storage engines and what performs best for your needs, know that different ones exist.
1. i.e., use MERGE or ARCHIVE tables for logs
2. Archive old data — don’t be a pack-rat! 2 common engines for this are ARCHIVE tables and MERGE tables
5. use row-level instead of table-level locking for OLTP workloads
6. try out a few schemas and storage engines in your test environment before picking one.

Database Design Performance Tips:

1. Design sane query schemas. Don’t be afraid of table joins; often they are faster than denormalization
2. Don’t use boolean flags
8. Use a clever key and ORDER BY instead of MAX
9. Normalize first, and denormalize where appropriate.
10. Databases are not spreadsheets, even though Access really really looks like one. Then again, Access isn’t a real database

11. use INET_ATON and INET_NTOA for IP addresses, not char or varchar
12. make it a habit to REVERSE() email addresses, so you can easily search domains (this will help avoid wildcards at the start of LIKE queries if you want to find everyone whose e-mail is in a certain domain)
13. A NULL data type can take more room to store than NOT NULL
14. Choose appropriate character sets & collations — UTF16 will store each character in 2 bytes, whether it needs it or not, latin1 is faster than UTF8.
15. Use Triggers wisely
16. use min_rows and max_rows to specify approximate data size so space can be pre-allocated and reference points can be calculated.
18. Use the PACK_KEYS=1 table option (MyISAM) for int data
19. be able to change your schema without ruining functionality of your code
20. segregate tables/databases that benefit from different configuration variables
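Tips 11 and 12 can be sketched outside of SQL. The Python below mirrors what INET_ATON()/INET_NTOA() do, and shows why a reversed e-mail column turns a domain search into a fast prefix match instead of a leading-wildcard LIKE (the helper names here are mine, for illustration):

```python
import socket
import struct

def inet_aton_int(ip):
    """Like MySQL's INET_ATON(): dotted quad -> unsigned 32-bit int."""
    return struct.unpack("!I", socket.inet_aton(ip))[0]

def inet_ntoa_str(n):
    """Like MySQL's INET_NTOA(): unsigned 32-bit int -> dotted quad."""
    return socket.inet_ntoa(struct.pack("!I", n))

def reverse_email(addr):
    """Store e-mail reversed, so everyone at example.com shares the
    prefix 'moc.elpmaxe' and LIKE 'moc.elpmaxe%' can use an index."""
    return addr[::-1]
```

Storing an IP in 4 bytes as an INT UNSIGNED rather than a varchar saves space and makes range comparisons natural; the reversed-email trick avoids the unindexable wildcard-at-the-start LIKE mentioned in tip 8.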

Other:

1. Hire a MySQL ™ Certified DBA
2. Know that there are many consulting companies out there that can help, as well as MySQL’s Professional Services.
3. Read and post to MySQL Planet at http://www.mysqlplanet.org
4. Attend the yearly MySQL Conference and Expo or other conferences with MySQL tracks (link to the conference here)
5. Support your local User Group (link to forge page w/user groups here)

Rebuild indexes and why
20. Turning OR on multiple index fields (<5.0) into UNION may speed things up (with LIMIT), after 5.0 the index_merge should pick stuff up.
UNION was introduced in MySQL 4.0.
11. ORDER BY and LIMIT work best with equalities and covered indexes
1. InnoDB ALWAYS keeps the primary key as part of each index, so do not make the primary key very large
3. Use Indexes
4. Don’t Index Everything
5. Do not duplicate indexes
6. Do not use large columns in indexes if the ratio of SELECTs:INSERTs is low.
7. be careful of redundant columns in an index or across indexes
14. ALTER TABLE…ORDER BY can take data sorted chronologically and re-order it by a different field — this can make queries on that field run faster (maybe this goes in indexing?)
4. As your data grows, indexing may change (cardinality and selectivity change). Structuring may want to change. Make your schema as modular as your code. Make your code able to scale. Plan and embrace change, and get developers to do the same.
17. Use HASH indexing for indexing across columns with similar data prefixes

Used in Episode 2:
8. Avoid wildcards at the start of LIKE queries
21. Don’t use COUNT(*) on InnoDB tables for every search; do it a few times and/or use summary tables, or if you need the total number of rows, use SQL_CALC_FOUND_ROWS and SELECT FOUND_ROWS()
2. Don’t use SELECT *

OurSQL Episode 22: Things to avoid

(in queries):
9. Avoid correlated subqueries in the SELECT and WHERE clauses (and try to avoid IN)
23. use groupwise maximum instead of subqueries
10. No calculated comparisons — isolate indexed columns
22. Use INSERT … ON DUPLICATE KEY UPDATE (or INSERT IGNORE) to avoid having to SELECT first
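Tip 23’s groupwise maximum is easy to picture as a single pass over (group, value) pairs — a Python sketch of the idea (in SQL you would typically join against a derived table of per-group maxima rather than run a correlated subquery per row):

```python
def groupwise_max(rows):
    """One pass over (group, value) pairs; no per-row re-scan needed,
    which is what a correlated subquery would effectively do."""
    best = {}
    for group, value in rows:
        if group not in best or value > best[group]:
            best[group] = value
    return best
```

The point of the tip is exactly this cost difference: one scan of the data versus re-evaluating a subquery for every outer row.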

Things to avoid (config)
10. –skip-name-resolve
13. increase temp table size in a data warehousing environment (default is 32Mb) so it doesn’t write to disk (also constrained by max_heap_table_size, default 16Mb)
4. if you can, compress text/blobs
5. compress static data
6. don’t back up static data as often

Things to avoid: schema
12. Separate text/blobs from metadata, don’t put text/blobs in results if you don’t need them
18. Redundant data is redundant

Top 1000 SQL Performance Tips

Interactive session from MySQL Camp I:

Specific Query Performance Tips (see also database design tips for tips on indexes):

1. Use EXPLAIN to profile the query execution plan
2. Use Slow Query Log (always have it on!)
3. Don’t use DISTINCT when you have or could use GROUP BY
4. Insert performance
1. Batch INSERT and REPLACE
2. Use LOAD DATA instead of INSERT
5. LIMIT m,n may not be as fast as it sounds
6. Don’t use ORDER BY RAND() if you have > ~2K records
13. Derived tables (subqueries in the FROM clause) can be useful for retrieving BLOBs without sorting them. (A self-join can speed up a query if the first part finds the IDs and then uses them to fetch the rest)
15. Know when to split a complex query and join smaller ones
16. Delete small amounts at a time if you can
18. Have good SQL query standards
19. Don’t use deprecated features
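The insert-performance tips (4.1 and 4.2) can be sketched like this; the `log` table and the file path are hypothetical, and LOAD DATA assumes the server can read the file.

```sql
-- Tip 4.1: batch many rows into one INSERT instead of one statement
-- per row, saving parsing and round-trip overhead:
INSERT INTO log (ts, msg) VALUES
  ('2007-04-27 10:00:00', 'start'),
  ('2007-04-27 10:00:01', 'step 1'),
  ('2007-04-27 10:00:02', 'done');

-- Tip 4.2: for bulk loads, LOAD DATA is typically faster still:
LOAD DATA INFILE '/tmp/log.csv'
  INTO TABLE log
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\n';
```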

Scaling Performance Tips:

1. Use benchmarking
2. isolate workloads: don’t let administrative work (e.g. backups) interfere with customer performance
3. Debugging sucks, testing rocks!

Network Performance Tips:

1. Minimize traffic by fetching only what you need.
1. Paging/chunked data retrieval to limit
2. Use multi_query if appropriate to reduce round-trips
3. Be wary of lots of small quick queries if a longer query can be more efficient

OS Performance Tips:

1. Use proper data partitions
1. For Cluster: start thinking about Cluster *before* you need it
2. Keep the database host as clean as possible. Do you really need a windowing system on that server?
3. Utilize the strengths of the OS
4. pare down cron scripts
5. create a test environment
6. source control schema and config files
7. for LVM innodb backups, restore to a different instance of MySQL so Innodb can roll forward
8. partition appropriately
9. partition your database when you have real data — do not assume you know your dataset until you have real data

MySQL Server Overall Tips:

1. innodb_flush_log_at_trx_commit=0 can help slave lag
2. Optimize for data types, use consistent data types. Use PROCEDURE ANALYSE() to help determine the smallest data type for your needs.
3. use optimistic locking, not pessimistic locking; try to use shared locks, not exclusive locks (LOCK IN SHARE MODE vs. FOR UPDATE)
8. config params — http://docs.cellblue.nl/easy_mysql_performance_tweaks/ is a good reference
9. Config variables & tips:
1. use one of the supplied config files
2. key_buffer, unix cache (leave some RAM free), per-connection variables, innodb memory variables
3. be aware of global vs. per-connection variables
4. check SHOW STATUS and SHOW VARIABLES (GLOBAL|SESSION in 5.0 and up)
5. be aware of swapping, esp. with Linux (“swappiness”); bypass the OS file cache for InnoDB data files with innodb_flush_method=O_DIRECT if possible (this is also OS-specific)
6. defragment tables, rebuild indexes, do table maintenance
7. If you use innodb_flush_log_at_trx_commit=1, use a battery-backed hardware write cache controller
8. more RAM is good, as is faster disk speed
9. use 64-bit architectures
11. increase myisam_sort_buffer_size to optimize large inserts (this is a per-connection variable)
12. (look up) memory tuning parameter for on-insert caching
14. Run with a strict SQL_MODE (e.g. STRICT_ALL_TABLES) to help identify warnings
15. /tmp dir on battery-backed write cache
16. consider battery-backed RAM for innodb logfiles
17. use --safe-updates for the client

Storage Engine Performance Tips:
2. Utilize different storage engines on master/slave ie, if you need fulltext indexing on a table.
3. BLACKHOLE engine and replication is much faster than FEDERATED tables for things like logs.
4. Know your storage engines and what performs best for your needs, know that different ones exist.
1. ie, use MERGE tables or ARCHIVE tables for logs
2. Archive old data — don’t be a pack-rat! 2 common engines for this are ARCHIVE tables and MERGE tables
5. use row-level instead of table-level locking for OLTP workloads
6. try out a few schemas and storage engines in your test environment before picking one.

Database Design Performance Tips:

1. Design sane query schemas. Don’t be afraid of table joins; often they are faster than denormalization
2. Don’t use boolean flags
8. Use a clever key and ORDER BY instead of MAX
9. Normalize first, and denormalize where appropriate.
10. Databases are not spreadsheets, even though Access really really looks like one. Then again, Access isn’t a real database

11. use INET_ATON and INET_NTOA for IP addresses, not char or varchar
12. make it a habit to REVERSE() email addresses, so you can easily search domains (this will help avoid wildcards at the start of LIKE queries if you want to find everyone whose e-mail is in a certain domain)
13. A NULLable column can take more room to store than one declared NOT NULL
14. Choose appropriate character sets & collations — UTF16 will store each character in 2 bytes, whether it needs it or not, latin1 is faster than UTF8.
15. Use Triggers wisely
16. use min_rows and max_rows to specify approximate data size so space can be pre-allocated and reference points can be calculated.
18. Use the PACK_KEYS=1 table option (MyISAM) so integer keys are packed too
19. be able to change your schema without ruining functionality of your code
20. segregate tables/databases that benefit from different configuration variables
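Tips 11 and 12 above can be sketched together. The `hits` and `users` tables and their columns are hypothetical; the point is the conversion functions and the reversed-column trick.

```sql
-- Tip 11: store IPs as INT UNSIGNED, converting on the way in and out:
INSERT INTO hits (ip) VALUES (INET_ATON('192.168.1.10'));
SELECT INET_NTOA(ip) FROM hits;

-- Tip 12: store addresses reversed, so a domain search becomes a
-- left-anchored LIKE that can use an index instead of a leading wildcard:
INSERT INTO users (email_rev) VALUES (REVERSE('sheeri@example.com'));
SELECT REVERSE(email_rev) AS email
  FROM users
 WHERE email_rev LIKE CONCAT(REVERSE('@example.com'), '%');
```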

Other:

1. Hire a MySQL ™ Certified DBA
2. Know that there are many consulting companies out there that can help, as well as MySQL’s Professional Services.
3. Read and post to MySQL Planet at http://www.mysqlplanet.org
4. Attend the yearly MySQL Conference and Expo or other conferences with MySQL tracks (link to the conference here)
5. Support your local User Group (link to forge page w/user groups here)


Part 2: Data Warehousing Tips and Tricks

Ask and you shall receive: http://face.centosprime.com/rdb-w/?p=68 linked to my previous post on the Data Warehousing Tips and Tricks session (http://sheeri.net/archives/204) with the comment, “I need to learn more about MERGE TABLES and INSERT … ON DUPLICATE KEY UPDATE”.

So here’s a bit more:

The manual pages for the MERGE storage engine:
http://dev.mysql.com/doc/refman/5.0/en/merge-storage-engine.html
and
http://dev.mysql.com/doc/refman/5.0/en/merge-table-problems.html

MySQL Forums for the MERGE table are at:
http://forums.mysql.com/list.php?93

In a nutshell, a MERGE table is really a set of pointers to similarly-schema’d MyISAM tables. So if you have the same table schema multiple times (ie, one partition per day, with tables named 2007_04_27_Sales, 2007_04_26_Sales, etc), you’d use a MERGE table to link them all together. Then you can run a query on the MERGE table, and it will query all the tables that the MERGE table points to.
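A minimal sketch of that setup; the table names and columns are hypothetical, and it assumes the daily tables already exist as MyISAM tables with this exact column definition.

```sql
-- Assumes `2007_04_26_Sales` and `2007_04_27_Sales` already exist as
-- MyISAM tables with identical schemas.
CREATE TABLE all_sales (
  id INT NOT NULL,
  amount DECIMAL(10,2) NOT NULL
) ENGINE=MERGE
  UNION=(`2007_04_26_Sales`, `2007_04_27_Sales`)
  INSERT_METHOD=LAST;

-- One query transparently covers every underlying daily table:
SELECT SUM(amount) FROM all_sales;
```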

As for INSERT . . . ON DUPLICATE KEY UPDATE

MySQL gives many ways to deal with INSERTs and unique/primary keys. If you do an INSERT and the primary key you are trying to insert is already in the table, MySQL will give an error. Ways to deal with this:

1) Try & catch errors in the application code.

2) Use INSERT IGNORE INTO . . . this will insert a new record if a record with the key does not exist. If it does exist, nothing happens. Simply add the word “IGNORE” into your INSERT query after INSERT and before INTO.

3) Use REPLACE INTO . . . this will insert a new record if a record with the key does not exist. If a record does exist, MySQL will *delete* the record and then INSERT your record. This can cause problems when you just want to update part of a row, and not insert the whole row again. And it changes timestamps and auto-increment numbers, which may not be a desired result. Simply change the word “INSERT” in your query to “REPLACE”.

4) Use INSERT . . . ON DUPLICATE KEY UPDATE. The syntax is the regular INSERT statement, and at the end add ON DUPLICATE KEY UPDATE [expression]. For instance,

INSERT INTO tbl (id,name,thing) VALUES (154,'sheeri','book') ON DUPLICATE KEY UPDATE thing='book';

and what makes it easier, if you have variables or whatever in your VALUES, you can actually set the update statement to say “just use the value I wanted to insert, OK?” as in the following:

INSERT INTO tbl (id,name,thing) VALUES (154,'sheeri','book') ON DUPLICATE KEY UPDATE thing=VALUES(thing);

Manual page:
http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html
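For comparison, options 2 and 3 above look like this on the same hypothetical tbl used in the examples:

```sql
-- Option 2: INSERT IGNORE -- silently does nothing if id 154 exists:
INSERT IGNORE INTO tbl (id,name,thing) VALUES (154,'sheeri','book');

-- Option 3: REPLACE -- deletes any existing row with id 154, then
-- inserts the new one (watch out for timestamps and auto-increments):
REPLACE INTO tbl (id,name,thing) VALUES (154,'sheeri','book');
```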

Hope this helps!
