No Buyer’s Remorse Here

So last night, during a break in the quiz show (where Prokrasti Nation had a good showing, as did the other teams — Recreational Evil, Peeps, and Safe Hex) we bid on the T-shirt that had the signatures of all the speakers at the conference. All the proceeds were to go to the EFF, so it’s a good cause.

They announced it was cash only, so I looked in my wallet. $33. Well, the bidding quickly went over that, and when it reached about $100 they said it didn’t have to be cash only. Around $300 Brian Aker said that they’d give whoever won credits in a new command, SHOW CONTRIBUTORS. Well, when they said that I knew I HAD to have my name in the source code.

I mean, dude, my NAME in the SOURCE CODE!!! But then again, this is an open source application, I could just spend some time and write a patch.

I’ve been saving for my wedding next June (14 months away) so when I bid $500, I said, “hey, I don’t need flowers for my wedding.” (My entire wedding budget is $5,000, so spending 10% of that on my name in the source code was, I felt, worth it.)

The bidding stalled at $775, so I asked, “Will MySQL match what is raised?” And indeed, if the bidding reached $1,000 then MySQL would donate $800. So then Boyd Hemphill (wearing the “practice safe hex” T-shirt) walked up to the front, plunked down $20 and said, “I’m giving cash to help make up the $225 difference. Who else will help?”

And people started giving cash, and the bidding increased. I bid $900, and Ronald Bradford bid $1,000. That was the top bid, so he won the T-shirt, but the MySQL folks were nice enough to say if I donated the $900 I was willing to, I’d also get my name in the SHOW CONTRIBUTORS function. So I did!

And that is how it happened.

In other news:

40% of the people who took an exam on Tuesday passed. That means 60% failed — which is a lot, although it was mentioned that probably many people took the tutorials, got the free exam, and just tried it, not caring if they failed or not because it was free.

I passed both certification exams, so now I’m MySQL certified! And I stumped Brian Aker with a question about what rpl_recovery_rank was, and won an iPod nano!


Tuning MySQL5 SQL and Stored Procedures: Guy Harrison

This post dedicated to Edwin DeSouza.

Un-tuned SQL or stored procedures often fail to scale as table volumes increase; inefficiency grows exponentially with size.

Tune SQL and stored procedures first, and only then buy new hardware.

Use EXPLAIN to help optimize queries. Also use the slow query log.

EXPLAIN EXTENDED shows the SQL that was actually used — the optimizer may rewrite the query — so it’s a neat tool.
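
For instance, a minimal sketch of the round trip (the table and column names here are hypothetical): after EXPLAIN EXTENDED, SHOW WARNINGS prints the query as the optimizer rewrote it.

EXPLAIN EXTENDED
SELECT customer_id, amount
FROM payment
WHERE amount > 5.00;

SHOW WARNINGS; -- the Note row contains the rewritten query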

You can always give optimizer hints, but they’re not recommended — keep re-checking them as your app grows — STRAIGHT_JOIN, FORCE INDEX, USE INDEX, and one other one.

SHOW STATUS gives you status variables. Innodb_buffer_pool_read_requests and Innodb_data_read will show how much data is being read from the buffer pool vs. from disk.
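
A quick sketch of that check (counter names as given in the talk):

SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';
SHOW GLOBAL STATUS LIKE 'Innodb_data_read';

Roughly speaking (the units differ: read requests vs. bytes), if the first counter dwarfs the second, most reads are being served from memory.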

An index isn’t always used: if a query matches more than 20% or so of the rows, MySQL will use a full table scan. There’s usually a range where MySQL chooses a full table scan when an index would be more appropriate, or vice versa — that’s when you’d use hints. Hey, nobody’s perfect!

Think indexes when joining tables of non-trivial size and for subqueries ([NOT] EXISTS, [NOT] IN) in the WHERE clause. Use an index to avoid a sort, and use “covering” indexes.

Establish the best set of multi-column indexes along with single-column indexes.
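
A sketch of the “covering” index idea from above (same hypothetical payment table): every column the query touches lives in the index, so EXPLAIN reports “Using index” and the table rows are never read.

CREATE INDEX idx_cust_date_amt ON payment (customer_id, payment_date, amount);

SELECT payment_date, amount
FROM payment
WHERE customer_id = 42; -- satisfied entirely from the index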

Derived tables (subqueries in the FROM clause) can’t use an index. VIEWs with UNION or GROUP BY also can’t use an index — all of these use the TEMPTABLE view algorithm (a temp table is created, then read from).

Sorts can be improved by increasing memory (sort_buffer_size) or using an index.

Use procedures to:

  • Avoid self joins
  • Correlated updates (subqueries accessing same data)

It’s the performance of the SQL within a stored routine that dominates. Once the SQL is tuned, optimize the routine using traditional techniques:

  • only put what’s needed in a loop
  • stop testing when you know the answer
  • order tests by most likely first

Recursion:

  • only allowed in procedures, not functions
  • depth controlled by max_sp_recursion_depth (see the sketch below)
  • iterative alternatives are almost always faster and more scalable
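
A minimal sketch of the depth cap (the procedure is hypothetical): recursion is disabled by default (max_sp_recursion_depth = 0), so it must be raised before a procedure may call itself.

SET max_sp_recursion_depth = 10;

DELIMITER //
CREATE PROCEDURE countdown(IN n INT)
BEGIN
  IF n > 0 THEN
    SELECT n;
    CALL countdown(n - 1); -- errors out beyond max_sp_recursion_depth levels
  END IF;
END //
DELIMITER ;

CALL countdown(3);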

TRIGGERS
Even the simplest trigger adds non-trivial overhead (at least 12%). No trigger should EVER contain expensive SQL, because triggers fire once for each row.
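
To make the per-row cost concrete, a sketch (reusing the hypothetical payment table, with a made-up audit table): a 10,000-row INSERT fires this trigger 10,000 times, so whatever is inside it is multiplied accordingly.

CREATE TABLE payment_audit (payment_id INT, created DATETIME);

CREATE TRIGGER payment_ai AFTER INSERT ON payment
FOR EACH ROW
  INSERT INTO payment_audit VALUES (NEW.payment_id, NOW()); -- keep this cheap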

Quest free software for MySQL — http://www.quest.com/mysql/


Web Performance and Scalability with MySQL

Some of these tips may conflict with one another, and not all apply to everyone.

1) think horizontal — everything, not just the web servers. Micro-optimizations are boring, as are other small details.
2) benchmarking techniques: measure not “how fast” but “how many” — test capacity, not speed.
3) bigger and faster vertical scaling is the enemy.
4) horizontal scaling = add another box
5) an implementation lets you scale your system a few times, but your ARCHITECTURE can scale it dozens or hundreds of times.
6) start thinking about architecture from the beginning.
7) don’t have “The server” for anything
8) stateless good, stateful bad
9) “shared nothing” good
10) don’t keep state within app server
11) caching good.
12) generate static pages periodically — works well as long as you don’t have millions of pages or constant changes.
13) cache full output in application
14) include cookies in the “cache key” so diff browsers can get diff info too
15) know when to use the cache and when not to
16) use regexps to insert customized content into the cached page
17) set Expires header to control cache times, or rewrite rule to generate page if the cached file does not exist (rails does this)
18) if content is dynamic this does not work, but great for caching “dynamic” images
19) partial pages — pre-generate static page snippets, have a handler just assemble the pieces.
20) cache little snippets, ie sidebar
21) don’t spend more time managing the cache than you save
22) cache data that’s too slow to query, fetch, calc.
23) generate page from cached data
24) use the same data to generate API responses
25) moves load to web servers
26) start with things you hit all the time
27) if you don’t use it, don’t cache it, check db logs
28) don’t depend on MySQL Query cache unless it actually helps
29) local file system not so good because you copy page for every server
30) use process memory, not shared
31) a MySQL cache table — id is the “cache key”, type is the “namespace”, metadata for things like headers for cached HTTP responses, and a purge_key field to make it easier to delete data from the cache. Index it, too: primary key on (id, type), an index on the expire field, and an index on purge_key. (See the sketch after this list.)
32) why 31 fails: how do you load balance? what if the MySQL server dies? now there’s no cache.
33) but you can use mysql scaling techniques to deal, like dual-master replication
34) use memcached, like lj, Slashdot, Wikipedia — memory based, Linux 2.6 (epoll) or FreeBSD (kqueue), low overhead for lots of connections, no master, simple!
35) how to scale the db horizontally, use MySQL, use replication to share the load, write to one master, read from many slaves, good for heavy read apps (or insert delayed, if you don’t need to write right away) — check out “High Performance MySQL”
36) relay slave replication if too much bandwidth on the master, use a replication slave to replicate to other slaves.
37) writing does not scale with replication — all servers need to do the same writes. 5.1’s row-level replication might help.
38) so partition the data, divide and conquer. separate cluster for different data sets
39) if you can’t divide, use flexible partitioning, global server keeps track for which “cluster” has what info. auto_increment columns only in the “global master”. Aggressively cache “global master” data.
40) If you use a master-master setup like 39, then you don’t have replication slaves, no latency from commit to data being available. if you are careful you can write to both masters. Make each user always use the same master, so primary keys won’t be messed up. If one master fails, use the other one.
41) don’t be afraid of the data duplication monster. use summary tables, to avoid things like COUNT(*) and GROUP BY. do it once, put result into a table — do this periodically, or do it when the data is inserted. Or data affecting a “user” and a “group” goes into both the “user” and “group” partitions (clusters). so it’s duplicating data.
42) but you can go further, and use summary dbs! copy data into special dbs optimized for special queries, ie FULLTEXT searches, anything spanning more than one or all clusters, different dbs for different latency requirements, ie RSS feeds from a replicated slave db — RSS feeds can be late).
43) save data to multiple “partitions” like the application doing manual replication — app writes to 2 places OR last_updated and deleted columns, use triggers to add to “replication_queue” table, background program to copy data based on queue table or last_updated column
44) if you’re running oracle, move read operations to MySQL with this manual replication idea. Good way to sneak MySQL into an oracle shop.
45) make everything repeatable, build summary and load scripts so they can restart or run again — also have one trusted data place, so summaries and copies can be (re)created from there.
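
Here’s a sketch of the cache table from tip 31 (the column names and types are my guesses from the description):

CREATE TABLE cache (
  id        VARBINARY(255) NOT NULL,  -- the "cache key"
  type      VARCHAR(32)    NOT NULL,  -- the "namespace"
  metadata  TEXT,                     -- e.g. headers for cached HTTP responses
  purge_key VARCHAR(64),              -- tag for easy targeted deletes
  expire    DATETIME       NOT NULL,
  value     MEDIUMBLOB,
  PRIMARY KEY (id, type),
  KEY (expire),
  KEY (purge_key)
);

DELETE FROM cache WHERE purge_key = 'user:42'; -- drop one tag's entries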

BREATHE! HALFWAY THERE!!

46) use InnoDB because it’s more robust — except for big read-only tables, high-volume streaming tables (logging), locked tables or INSERT DELAYED, and specialized engines for special needs; more engines are coming in the future, but for now, InnoDB
47) multiple MySQL instances — run different instances for different workloads, even if they share the same server. Moving to separate hardware is easier later, of course, and you can optimize each server instance for its workload. Easy to set up with the instance manager or mysqld_multi, and there are init scripts that support the instance manager.
48) asynchronous data loading when you can — if you’re updating counts or loading logs, send updates through Spread (or whatever messaging system) to a daemon that loads the data. Don’t update for each request (ie, counts); do it every 1000 updates, or every few minutes. This helps if the db loses its net connection — the frontend keeps running! Also useful if you want to lock tables, etc.
49) preload, dump and process — let the servers pre-process as much as possible. Dump never-changing data structures to JS files for the client to cache (postal data, maybe), or dump to memory, or use SQLite, or BerkeleyDB and rsync to each webserver, or a MySQL replica on each webserver.
50) stored procedures are dangerous because they’re not horizontal — more work than just adding a webserver. Only use them if it saves the db work (ie, send 5 rows to the app instead of 5,000 for the app to parse)
51) reconsider persistent db connections because it requires a thread = memory, all httpd processes talk to all dbs, lots of caching might mean you don’t need main db, mysql cxns are fast so why not just reopen?
52) innodb_file_per_table, so OPTIMIZE TABLE clears unused space. innodb_buffer_pool_size set to 80% of total memory (on a dedicated MySQL server). Also tune innodb_flush_log_at_trx_commit and innodb_log_file_size. (See the config sketch after this list.)
53) keep metadata in the db, store images in the filesystem — but then how do you replicate? Or store images in MyISAM tables, split up so tables don’t get bigger than 4G, so if one gets corrupt there are fewer problems. A metadata table might specify which table an image is in. Include the last-modified date in the metadata, and use it in URLs to optimize caching, ie with squid: /images/$timestamp/$id.jpg
54) do everything in unicode
55) UTC for everything
56) STRICT_TRANS_TABLES, so MySQL is picky about bad input and does not just turn it into NULL or zero.
57) Don’t overwork the DB — dbs don’t easily scale like web servers
58) STATELESS. Don’t make cookie ids easy to guess, or sequential, etc. Don’t save state on one server only; save it on every one. Put the data in the db, not in the cookie — that duplicates efforts. Important data goes into the db, so it gets saved; unimportant transient data goes in memcache; only SMALL data goes in the cookie. A shopping cart would go in the db, background color goes in the cookie, and last-viewed items go in memcache.
59) to make cookies safer, use checksums and timestamps to validate cookies. Encryption usually a waste of cycles.
60) use resources wisely. balance how you use hardware — use memory to save I/O or CPU, don’t swap memory to disk EVER.
61) do the work in parallel — split work into smaller pieces and run on different boxes. send sub-requests off as soon as possible and do other stuff in the meantime.
62) light processes for light tasks — thin proxy servers as “network buffers”, sitting between the user and your heavier backend application. Use httpd with mod_proxy or mod_backhand. The proxy does the ’net work, and fewer httpd processes are needed to do the real work; this saves memory and db connections. Proxies can also serve static files and cache responses. Avoid starting the main app as root. Load balancing, and very important if your background processes are “heavy”. Very EASY to set up a light process. ProxyPreserveHost On in Apache 2
63) job queues — use queues, AJAX can make this easy. The webserver submits a job to a database “queue”, the first available worker picks up the first job and sends the result back to the queue. Or use Gearman, Spread, MQ/Java Messaging Service(?)
64) log http requests to a database! log all 4xx and 5xx requests, great to see which requests are slow or fast. but only log 1-2% of all requests. Time::HiRes in Perl, microseconds from gettimeofday system call.
65) get good deals on servers http://www.siliconmechanics.com, server vendor of lj and others.
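
A my.cnf sketch pulling together tips 52 and 56 (values are illustrative; the 80% figure assumes a dedicated 4G MySQL box, and the flush setting is just one possible choice):

[mysqld]
innodb_file_per_table                # lets OPTIMIZE TABLE return unused space
innodb_buffer_pool_size = 3200M      # ~80% of RAM on a dedicated server
innodb_flush_log_at_trx_commit = 2   # example setting; pick per durability needs
innodb_log_file_size = 256M
sql_mode = STRICT_TRANS_TABLES       # reject bad input instead of mangling it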

IN SUMMARY: HORIZONTAL GOOD, VERTICAL BAD

for jobs: ask@develooper.com (jobs, moonlighters, perl/mysql etc)
slides will be up at http://develooper.com/talks/
Phew! That was a lot of fast typing (60 words per minute, baby!). Ask is smart, but QUICK!!!! His slides will be VERY useful when they appear. He said there were 53 tips, but I numbered each new line (and not smartly with OL and LI) and I have more than that…

Formed a MySQL Quiz team
Met all the requirements for the MySQL Quiz
Took a Certification exam

Everyone root for Team Prokrasti Nation!

Max DB Managing and Assorted Cool Features

talk by Roland Mallmann

MaxDB is older than I am; it was started in 1977 at the University of Berlin and is owned by SAP today. Now it’s open source under the GPL, or available under a commercial license from SAP or MySQL AB.

Why Max DB is so great:
Low cost of ownership
Few config parameters
no size estimates needed for individual db objects

no reorg — space management is done automatically. Space no longer needed is returned immediately to the db, and the occupied-to-free (holes) ratio is kept as high as possible. This is done by matching logical pages to physical pages on disk with the Converter, plus I/O and space management.

Space management done automatically
No reorganization is needed (ie, no OPTIMIZE TABLE needed)
Gaps are not allowed, so updates and deletes happen in place, and sorts happen AFTER an insertion.
Space freed is immediately returned to the DB
Done by the Converter, which matches logical pages to physical disk pages.
Data is stored in B* trees (B-star trees) for almost all objects (tables, indexes, secondary indexes, BLOBs)

Concurrent asynchronous I/O
Manages free blocks
Auto balancing of disk I/O
Savepoints
Backup Integration (including incremental)
Segmentation of the data cache
Changes are flushed to disk on a 10-minute cycle
Flushing data pages to disk is spread out over those 10 minutes

Online Backup and Restore
Consistent backups, no need to apply logs
Savepoint issued before db backup, savepoint includes undo information for remaining open transactions.
Can do incremental, full data, or log backup
Can restore from a medium or from the backup history, or restore to a point in time.

Snapshots
Can make complete database backup
Can make a snapshot for replication
Can make incremental on master and restore snapshot on replication as a backup strategy (as long as there isn’t a newer snapshot, because then incremental backup logs are reset)

Standby Database
A standby is made possible using log shipping.
Master and slave share backup media (shared disk)
Init once with complete master backup
Redo available logs

In case of emergency: back up the last log piece from the master in case it hasn’t been shipped, redo all ‘open’ log backups (there should be none), redo the final piece, then start the slave — it’s now the master!

Synchronization Manager
no permanent attention required
unattended desktop/laptop installation and operation

database snapshot functionality!

Mascot for Team Prokrasti Nation

I was told that teams had to have a physical instantiation of a mascot, so I said, “maybe I’ll knit something.” Well, I didn’t knit something, but I did hand-craft an origami butterfly for Team Prokrasti Nation’s mascot:

(click picture for larger image).

Oh, and I won a fun game from O’Reilly for submitting speaker evaluations.

Formed a MySQL Quiz team
Met all the requirements for the MySQL Quiz
Took a Certification exam

everyone root for Team Prokrasti Nation!
Formed a MySQL Quiz team
Met all the requirements for the MySQL Quiz
Took a Certification exam

everyone root for Team Prokrasti Nation!
I was told that teams had to have a physical instantiation of a mascot, I didn’t knit something, but I did hand-craft an origami butterfly for Team Prokrasti Nation’s mascot:

(click picture for larger image).

Oh, and I won a fun game from O’Reilly for submitting speaker evaluations.

The State of the Dolphin

There are over 1600 people registered, a record for the conference (this is the 4th year)!

“It’s fun to be a bus driver when the bus is full,” says Marten Mickos, CEO of MySQL. 1 billion people are on the ’net, and 2 billion have a mobile phone, so there’s a lot of connectivity out there. There are 6.6 billion people in the world, so about 15% are online (1 billion). And there are 20 million MySQL downloads per year, with half of those remaining active. So what happens when the online population doubles?

Corporations are doing the same thing consumers are, and adopting it for enterprise use. Corporate functions are hosted or put on-premise, because of the consumer experience (ie, Google search).

Humans tend to overestimate the short term, and underestimate the long term. The internet has risen slowly since the bubble burst (which was the “hey, the short term isn’t meeting our expectations!”), and we’ve underestimated that.

State of the industry — everyone wants everything online, with people making $$ from subscription-based services. Free and open source software (FOSS) is the fabric of this world.

There’s a lot of competition and freedom, which means technology innovation increases, and people are happy because they can follow their passion. We have to defend this freedom from patents.

To have a successful open source project, you need modularity, so folks can work on a part of a system. This encourages participation, crucial for success.

Pluggable storage engines mean that it’s easy to add storage engines to do what you as an individual needs. No need to recompile to add a new one, just download it, plug it in….and it’s just that easy to remove it too. Of course, you can build your own, too.

Falcon will probably become the best general-purpose OLTP storage engine.

MySQL has a very active community! Many LAMP startups, many distributors (Dell, Novell and HP), even Oracle has an open source solution.

Awards: yesterday they did community service awards. Applications of the year have been chosen, too, and they are:

Nokia: World leader in mobile communications, using MySQL Cluster to maintain realtime info about mobile network users online.

Flickr: Leading Online Photo Management and Sharing

NetQoS: Fastest Growing Network Management Company.

Partners of the Year:
Business Objects: Leading Business Intelligence Solutions

Dell: High Performance Computer Systems

Oracle: Maker of a storage engine for MySQL (ie, InnoDB)

Congrats to all!


FLUSH HOSTS, aborted connections, and max_connect_errors

This happened to me 2 weeks ago on a site I run pro bono, and I forgot to blog about it. A comment in the “MySQL Automated Failover with Scripts” BOF reminded me about it, so here goes.

Basically, the web application stopped being able to talk to the database. I run the staging/test server AND the database, and the production site is run by someone else. So I checked out the logs, and indeed, there are many of the following:

050814 18:16:42 Aborted connection 241474 to db: ‘database’ user: ‘userhere’ host: `localhost’ (Got an error reading communication packets)
050814 23:58:43 Aborted connection 241487 to db: ‘database’ user: ‘userhere’ host: `localhost’ (Got timeout reading communication packets)

(and of course going up through the time it crashed, and even today). I believe --log-warnings is ON by default, because it’s not in my my.cnf and SHOW VARIABLES shows that it’s on. But if it’s not on, you won’t see those errors.

The answer was to send a FLUSH HOSTS command, which I did, and it worked. Of course I still have to figure out what is causing the errors — likely really bad PHP code without sessions, and people hitting ESC while a page loads (that site has a lot of downloaded code).

But I figured I’d blog about it, because it was rare enough to make me notice. This can be the result of a DOS attack, too — but in my case, it wasn’t.

I’m not sure how MySQL figures out how many errors have occurred — the default is 10, and there were more than that in the logs. I’m guessing it’s “10 failed in a row” from the same user@host, but I’m not sure. It’d be great to figure that out, although if your server has log-warnings set to ON, you can check the error logs.
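
For what it’s worth, the counter that matters here is max_connect_errors, and it’s tracked per host: after that many interrupted connection attempts in a row from one host, with no successful connection in between, MySQL blocks the host until FLUSH HOSTS. A quick sketch of checking and loosening it:

SHOW GLOBAL VARIABLES LIKE 'max_connect_errors';

SET GLOBAL max_connect_errors = 1000; -- if clients legitimately abort a lot

FLUSH HOSTS; -- unblock any hosts that are currently locked out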


Higher Order MySQL

Advanced Stored Routines techniques

Higher-order functions in math and computer science are functions which:
take one or more functions as an input
OR
output a function
In MySQL, we’re talking about stored routines that create other stored routines.

Why?
Pivot tables,
Global table checksums
Getters and setters
Multi-row insert query makers
Iterators
Dynamic functions allow you to access data from INFORMATION_SCHEMA and create dedicated dynamic routines and overcome syntax that does not accept variables.

Officially, MySQL stored routines can’t do it.

However, Giuseppe is a really smart guy (as I’m learning firsthand this week)…

One way:
You can create a text definition for a stored procedure from a routine, even though you can’t actually create the stored procedure itself. Get the text back into the client, and you’re all set! ie, pass the text back to the client, and the client just passes it back to MySQL. (genius, just genius).

ie:
CREATE FUNCTION make_f_test()
RETURNS TEXT
RETURN
'CREATE FUNCTION hi()
RETURNS TEXT RETURN "hi"';

Get the output of that code back into the client and then execute the returned string.
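
The round trip looks something like this (a sketch; the client does the second step, since CREATE FUNCTION can’t be run through server-side prepared statements):

SELECT make_f_test()\G
-- the client prints the CREATE FUNCTION text; paste it back in as the next statement:
CREATE FUNCTION hi()
RETURNS TEXT RETURN "hi";

SELECT hi(); -- returns 'hi'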

Another way:
Don’t do this!
not recommended by MySQL AB, may not work in the future, may damage your data.
DON’T USE IT.

That being said, if you get:
ERROR 1457 (HY000): Failed to load routine xxx.yyyy.
The table mysql.proc is missing, corrupt or contains bad data (internal code -6)

then you did it wrong. Fixing this error is NOT trivial.

Create the text, like in the last example, and change the mysql.proc table directly. Close the MySQL connection, and then open a new one. When a connection opens, the existing stored procedures and functions are cached, so a routine inserted behind the server’s back isn’t visible until you reconnect — which is why you can only make 1 function with the make_routine function per session.

To change the mysql.proc code:
http://www.stardata.it/code/

Install the make_routine function (need full “mysql” db access)
Make a limited user who has access to the lib database.

(the make_routine function takes the following parameters)
database
name
type
params
return_type
actual code

He showed an example: given a set of values, find a record and give the primary key of the record; if the record does not exist, insert it and give the new primary key.

Another example: a function that gets all values for one field from a table and joins them into a string (called COLUMN_CONCAT). One can, of course, do this with GROUP_CONCAT, but only if the total character count is at most 1024 (otherwise it truncates the result).

1. create a function that accepts parameters
2. create a string (a ‘template’) of the function with placeholders
3. replace the placeholders in the template with the parameters (using REPLACE)
4. call make_routine using the template string (see the sketch below)
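
A sketch of those four steps (make_routine’s exact signature is whatever the stardata.it code defines; the call below just follows the parameter list given above):

-- 2) a template with a placeholder
SET @template = 'RETURN CONCAT(''Hello, '', <<who>>)';

-- 3) fill in the placeholder
SET @body = REPLACE(@template, '<<who>>', '''world''');

-- 4) hand it to make_routine (database, name, type, params, return_type, code)
SELECT make_routine('lib', 'hello', 'FUNCTION', '', 'TEXT', @body);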


Sakila Sample Database

Why is this needed?
The current sample, the “world” database, is just 3 tables: cities, countries and languages.
Not the Intellectual Property of MySQL, so it can’t be bundled with the software
No table type diversity, not a lot of data type diversity
No MySQL 5 features
Not a lot of data
Not really used in tutorials and articles because it’s not robust enough

Scenario
Basically, needed a sample database that was new, that allowed for many queries.
Used an idea from a Dell Sample DB
It’s 2 stores, in 2 different countries.

Designing the Database
Designed with MySQL Query Browser
Normalized
Community provided feedback and content
Trick was balance between simplicity and completeness
It’s a learning db, so people need to understand it
After the schema was approved, data was loaded
Stored procedures, triggers, UDFs and VIEWs were used

Schema
2 files in installation — one for schema & routines and one for data, so you can just look at schema if you want.
16 tables, 8 views.
Data is never deleted — it’s marked as deleted, but not actually deleted, for data integrity reasons.
Contains most (all?) data types, including SET and ENUM.

VIEWS
VIEWs are used to simplify queries. ie, there’s a table for actors and one for films, so the actor_info view contains the movie types and names for each actor. It shows a list of actors with the films they’ve done — 1 row per actor, with a comma-separated list of films (sorted by type).
There are a lot of good views in there, which shows that that’s what you should do for popular queries (like “show me all the movies that are rented, by customer”).
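
For example, a sketch against the published schema (column names may differ in the 0.8 release):

SELECT first_name, last_name, film_info
FROM actor_info
WHERE last_name = 'GUINESS';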

Stored Procedures
For common operations — ie, is a film in stock?
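
A sketch of that check (the procedure name and signature here are assumptions based on later Sakila releases):

CALL film_in_stock(1, 1, @count); -- film_id, store_id, OUT count
SELECT @count AS copies_in_stock;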

Stored Functions
get_customer_balance — it does what you think it does.
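
Something like (signature assumed):

SELECT get_customer_balance(1, NOW()); -- customer_id, effective date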

Triggers
There’s one to sync 2 tables, for example.

Licensing
PostgreSQL want to do
Documentation copyright MySQL
BSD licensing

Goals Going Forward
Make as few changes as possible
Implement 5.1 changes if necessary

Download it at: http://www.openwin.org/mike/download/sakila-0.8.zip
