aka… "when good queries go bad!"
So, today the developers were debugging why a script was running much longer than expected. They were inserting text into the database, and got to the point where they realized that doubling the amount of text doubled the time the queries took.
You see, they were doing similar text inserts over and over, opening a new connection for each one, instead of using connection pooling and/or batching them. Apparently the other DBA explained that it was a limitation of MySQL, but either the developers didn't convey what they were doing well, or the DBA didn't think to mention batching.
I ran a simple test on a test server. I used the command line to connect to a database server on the same machine (even though in QA and production the database lives on a different machine), just to make a point:
Type | Connections | Queries per connection | Data sent per connection | Time
---|---|---|---|---
One-off | 1000 | 1 | 619 bytes | 12.232s
Single connection | 1 | 1000 | 604 kilobytes | 0.268s
Batch | 1 | 1 | 517 kilobytes | 0.135s
So 1000 INSERTs using 1 connection is over 45 times faster than 1000 INSERTs using 1000 connections.
Using 1 batch INSERT statement is almost 2 times faster (1.98x) than 1000 separate INSERTs over 1 connection.
Using 1 batch INSERT statement is over 90 times faster than 1000 INSERTs using 1000 connections.
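If you want to try the three cases yourself, here's a minimal sketch in Python with mysql-connector-python; the `foo` table, its `col1`/`col2` columns, and the connection parameters are placeholder assumptions, not the exact test I ran:

```python
import time
import mysql.connector

# 1000 rows of sample data; foo (col1, col2) and the connection
# parameters below are placeholders -- substitute your own.
ROWS = [("val1-%d" % i, "val2-%d" % i) for i in range(1000)]
CONN_ARGS = dict(host="localhost", user="test", password="test", database="test")

def one_off():
    # 1000 connections, 1 INSERT each: the slow pattern
    for row in ROWS:
        conn = mysql.connector.connect(**CONN_ARGS)
        cur = conn.cursor()
        cur.execute("INSERT INTO foo (col1, col2) VALUES (%s, %s)", row)
        conn.commit()
        conn.close()

def single_connection():
    # 1 connection, 1000 separate INSERT statements
    conn = mysql.connector.connect(**CONN_ARGS)
    cur = conn.cursor()
    for row in ROWS:
        cur.execute("INSERT INTO foo (col1, col2) VALUES (%s, %s)", row)
    conn.commit()
    conn.close()

def batch():
    # 1 connection, 1 multi-row INSERT; executemany() batches plain
    # INSERTs into a single multi-row statement
    conn = mysql.connector.connect(**CONN_ARGS)
    cur = conn.cursor()
    cur.executemany("INSERT INTO foo (col1, col2) VALUES (%s, %s)", ROWS)
    conn.commit()
    conn.close()

for fn in (one_off, single_connection, batch):
    start = time.time()
    fn()
    print("%s: %.3fs" % (fn.__name__, time.time() - start))
```

The batch case works because mysql-connector-python's executemany() rewrites a plain INSERT into a single multi-row statement, which is exactly the batch format shown below.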
Note that while it's faster to send a batch, if you can't send 517 kilobytes to your database in a single statement (MySQL's max_allowed_packet setting caps the size of one statement), you'll want to break the batch into chunks, as sketched below. That's a small coding price to pay for 90x the database performance!!!
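For example, a hypothetical helper like this one splits the rows into smaller multi-row INSERTs (the chunk size of 100 is arbitrary; size it so each chunk fits under max_allowed_packet):

```python
def insert_in_chunks(cursor, rows, chunk_size=100):
    # Send several smaller multi-row INSERTs instead of one huge
    # statement, so each one stays under the server's packet limit.
    for i in range(0, len(rows), chunk_size):
        cursor.executemany(
            "INSERT INTO foo (col1, col2) VALUES (%s, %s)",
            rows[i:i + chunk_size],
        )
    # caller commits, e.g.:
    #   insert_in_chunks(conn.cursor(), ROWS)
    #   conn.commit()
```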
For reference, the formats used:
One-off:
INSERT INTO foo (col1, col2…) VALUES (val1, val2…);
Single Connection:
INSERT INTO foo (col1, col2…) VALUES (val1, val2…);
INSERT INTO foo (col1, col2…) VALUES (val1a, val2a…);
Batch:
INSERT INTO foo (col1, col2…) VALUES (val1, val2…), (val1a, val2a…);