Percona Repo Conflicts With Itself


On a RHEL 6.3 server, we are running MySQL 5.1 (not MariaDB, not the Percona patched version, plain old regular MySQL):

$ rpm -qa | egrep -i "percona|maria|mysql" | sort
mysql-5.1.61-4.el6.x86_64
mysql-libs-5.1.61-4.el6.x86_64
mysql-server-5.1.61-4.el6.x86_64
percona-toolkit-2.1.8-1.noarch
percona-xtrabackup-2.0.5-499.rhel6.x86_64
perl-DBD-MySQL-4.013-3.el6.x86_64
$

We went to do a system update today and had a problem with dependencies, because the Percona repository wants to replace mysql-libs with Percona's shared libraries. How rude!

# yum update mysql-libs
Loaded plugins: downloadonly, rhnplugin, security
Setting up Update Process
Resolving Dependencies
--> Running transaction check
---> Package Percona-Server-shared-51.x86_64 0:5.1.67-rel14.3.506.rhel6 will be obsoleting
---> Package Percona-Server-shared-55.x86_64 0:5.5.29-rel29.4.401.rhel6 will be obsoleting
---> Package Percona-Server-shared-compat.x86_64 0:5.5.29-rel29.4.401.rhel6 will be obsoleting
---> Package mysql-libs.x86_64 0:5.1.61-4.el6 will be obsoleted
--> Finished Dependency Resolution

Dependencies Resolved

==============================================================================================================
 Package                       Arch    Version                   Repository  Size
==============================================================================================================
Installing:
 Percona-Server-shared-51      x86_64  5.1.67-rel14.3.506.rhel6  percona     2.8 M
     replacing  mysql-libs.x86_64 5.1.61-4.el6
 Percona-Server-shared-55      x86_64  5.5.29-rel29.4.401.rhel6  percona     787 k
     replacing  mysql-libs.x86_64 5.1.61-4.el6
 Percona-Server-shared-compat  x86_64  5.5.29-rel29.4.401.rhel6  percona     3.4 M
     replacing  mysql-libs.x86_64 5.1.61-4.el6

Transaction Summary
==============================================================================================================
Install       3 Package(s)

Total size: 7.0 M
Is this ok [y/N]: N

I say this is rude because the Percona repository does not provide mysql-libs; it is not the Percona repository's place to declare it obsolete. If we were running the Percona packages, that would be fine, but we're not. We are running the MySQL packages, and we want to keep running them. Just because we run Percona Toolkit and XtraBackup doesn't mean we want the shared MySQL libraries from Percona as well.

The bizarre part is that the Percona repository wants to install BOTH Percona-Server-shared-51 AND Percona-Server-shared-55, and those packages conflict with each other. So if I had answered yes, I would have ended up with:

Transaction Check Error:
file /usr/lib64/libmysqlclient.so conflicts between attempted installs of Percona-Server-shared-51-5.1.67-rel14.3.506.rhel6.x86_64 and Percona-Server-shared-55-5.5.29-rel29.4.401.rhel6.x86_64
file /usr/lib64/libmysqlclient.so.16.0.0 conflicts between attempted installs of Percona-Server-shared-51-5.1.67-rel14.3.506.rhel6.x86_64 and Percona-Server-shared-compat-5.5.29-rel29.4.401.rhel6.x86_64
file /usr/lib64/libmysqlclient_r.so conflicts between attempted installs of Percona-Server-shared-51-5.1.67-rel14.3.506.rhel6.x86_64 and Percona-Server-shared-55-5.5.29-rel29.4.401.rhel6.x86_64
file /usr/lib64/libmysqlclient_r.so.16.0.0 conflicts between attempted installs of Percona-Server-shared-51-5.1.67-rel14.3.506.rhel6.x86_64 and Percona-Server-shared-compat-5.5.29-rel29.4.401.rhel6.x86_64

Error Summary
-------------

So in order to upgrade this machine, we had to remove the Percona repository, update the packages so that mysql-libs was properly updated from the right repository, put the Percona repository back, and run the update again to see if xtrabackup or percona-toolkit had updates to apply.
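For a one-off, the same effect can also be had without removing the repo file. A minimal sketch, assuming the repo id is "percona":

# Run the update with the Percona repo temporarily disabled:
yum --disablerepo=percona update mysql-libs

# Or permanently limit what the Percona repo is allowed to provide,
# in /etc/yum.repos.d/percona.repo:
#   [percona]
#   ...
#   includepkgs=percona-toolkit* percona-xtrabackup*

Either way, mysql-libs then comes from the RHEL channel, and the Percona repo only supplies the tools we actually want from it.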

And all is happy again:
$ rpm -qa | egrep -i "percona|maria|mysql" | sort
mysql-5.1.67-1.el6_3.x86_64
mysql-libs-5.1.67-1.el6_3.x86_64
mysql-server-5.1.67-1.el6_3.x86_64
percona-toolkit-2.1.8-1.noarch
percona-xtrabackup-2.0.5-499.rhel6.x86_64
perl-DBD-MySQL-4.014-1.el6.rfx.x86_64
$

Liveblog: How to Use Puppet Like an Adult


My Mozilla coworkers Ben Kero and Dan Maher gave a standing-room-only presentation at Linux Conf AU about “How to Use Puppet Like an Adult”. It was fantastic!

Data != logic
Business DATA does not belong in modules, but business LOGIC is OK.

What are the data sources, then?
Hiera: lightweight, pluggable, hierarchical databases. It is external to the modules, and you can use many backends, including MySQL. It is a new feature, standard in Puppet 3.0. If you like YAML (and you should), you'll like this.

$var = lookup('something') # unscoped (complicated)
$var = lookup('namespace::something') # scoped (yay!)
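For context, a minimal sketch of the YAML data such a lookup might read (the file path and key names here are hypothetical):

# /etc/puppet/hieradata/common.yaml
webapp::db_host: 'db1.example.com'
webapp::db_port: 3306

With that in place, lookup('webapp::db_host') would return 'db1.example.com'.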

Another data source is PuppetDB. This is a bigger topic, but the important thing is that it can be used for high-performance storeconfigs.

Where to find pre-built modules for puppet?
Github
Puppet Forge

Or you can write your own module… but don't waste time building your own, say, Apache module; someone else has a better one out there.

Is that module right for me?
What to check:
OS/distribution
Complexity: can you read the module and understand what it does? If not, this might not be the module for you.
Popularity: the more people using/forking it, the more support is probably around. Also check the age of the last commit.
What's the documentation like?

Recommended pre-built modules: these work, and they're good. Analyze how they work and learn from them:
puppetlabs/firewall
puppetlabs/cloud_provisioner

When rolling your own modules: if this is going to be a one-off, do whatever you want. If you want to make it open source, know that someone else will use it, and make it more generic.

Use parameterized classes. This allows you to separate your business data from your business logic. You can avoid having passwords, ssh keys, etc. in there, and then you CAN open source it.

Make sure it's documented.

Module template
puppet module generate author-mod_name gets you all the files you need with the necessary templates (e.g. the README has the sections you need).

module template slide

Note: Everybody should be doing spec testing, not just for Puppet.

Parameterized classes
Similar to definitions, they are passed in data. This is how to separate data from logic. If you don't get anything else, get this:

parameterized classes slide

These help you write your manifest one time for different nodes. If you have 10 web servers with different node names, write one manifest, and use logic and parameterized classes to instantiate that manifest 10 times. Don't write 10 manifests.
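A minimal sketch of the pattern (the class, parameter, and node names are hypothetical):

# manifests/webserver.pp: the logic lives here, the data comes in as parameters
class webserver (
  $port    = 80,
  $docroot = '/var/www/html'
) {
  file { $docroot:
    ensure => directory,
  }
}

# one node overrides the defaults; nine others can use them as-is
node 'web01.example.com' {
  class { 'webserver':
    port => 8080,
  }
}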

USE A STYLE GUIDE
Who here has written Perl code? Who here has written Perl code that someone else can read? USE A STYLE GUIDE

Parser Validation
Just run:
$ puppet parser validate manifest.pp

Parser validation example

Put this into your commit hook, so that parser errors don't get committed.
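A minimal sketch of such a hook (the path is the standard git location; the rest is an assumption about your layout):

#!/bin/sh
# .git/hooks/pre-commit: refuse the commit if any staged manifest fails to parse
for f in $(git diff --cached --name-only | grep '\.pp$'); do
    puppet parser validate "$f" || exit 1
done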

Linting
A way of making sure code meets the style guide. It is an external tool, but stable. It is very customizable: you can use your own style guide, and you can have it ignore certain things (e.g. if you don't care about quoting everything, don't error on that). You can put this into commit hooks too.
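For example, a hedged sketch of relaxing the quoting check, using puppet-lint's --no-<check>-check flag convention:

# lint a manifest, but do not complain about double-quoted strings
puppet-lint --no-double_quoted_strings-check manifests/init.pp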

Linting slide

puppet-concat
Dynamically build files out of lots of parts. This is how you can build good config files for daemons that don't support .d directories. You can assume puppet-concat is installed already; it's widely used, because pre-built modules use it too.

puppet-concat slide
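A minimal sketch of the pattern (the file path and fragment names are hypothetical):

concat { '/etc/myapp.conf': }

concat::fragment { 'myapp_header':
  target  => '/etc/myapp.conf',
  content => "# managed by puppet -- do not edit\n",
  order   => '01',
}

concat::fragment { 'myapp_listeners':
  target  => '/etc/myapp.conf',
  content => template('myapp/listeners.erb'),
  order   => '10',
}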

stdlib
Put out by Puppet Labs; despite the name, it is not actually part of the standard library, but it contains lots of useful functions. This is also widely used. It can check whether a variable is a boolean, an integer, or a string, can merge hashes together, and more.
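A couple of hedged examples of the kind of thing stdlib provides (the variable names are hypothetical):

# fail early if a parameter is not what we expect
validate_bool($enable_ssl)

# merge two hashes, with the right-hand side winning on conflicting keys
$settings = merge($default_settings, $override_settings)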

Sanity preservation
Set default values for your variables, and make sure they're sane; you can pull values out of facter.
Verify content: play it safe, don't blithely act on user data. You can throw an error (e.g. if you have a string instead of an integer).
Mutually exclusive declarations: ensure that once you start down one logical path, you can't go down the other. This comes down to if/then programming, and it adds more layers to your manifest, but you can make accurate statements about what you want the module to do and predict what it WON'T do. Being able to predict what Puppet will and won't do is important; see the sketch below.
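A minimal sketch of the "verify content" and "mutually exclusive" points together (is_integer() is from stdlib, fail() is core Puppet; the class and parameter names are hypothetical):

class myapp (
  $port    = 8080,
  $use_tcp = true,
  $use_udp = false
) {
  # verify content: throw an error instead of blithely acting on bad data
  if ! is_integer($port) {
    fail("myapp: port must be an integer, got '${port}'")
  }
  # mutually exclusive declarations: one path or the other, never both
  if $use_tcp and $use_udp {
    fail('myapp: use_tcp and use_udp are mutually exclusive')
  }
}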

Useful Log Output
Functions for each log level
e.g. notice(), warning(), err()
Make these informative and human-readable: what broke, and why? Can other people understand what's going on?

Puppet As a Source of Truth
Build an ecosystem around puppet. The more you use puppet, the more it describes your infrastructure. How do you do this, though?
You can use the Puppet Data Library (PDL): a collection of services with an API, so you can query Puppet from other services (e.g. an inventory system, a dashboard, etc.). You can also use it from within Puppet.

You can build an inventory service to know all the facter information about all the machines. You can use the run report service for dashboards like Puppet Dashboard.

You can download a .dat file and visualize it with graphviz to see how your logic paths look. The .dat file comes from Puppet itself (gem install puppet, then run puppet with some options to generate it).

The take-home:
take-home points

Linux in the Flesh: Adventures Embedding Linux in Hardware


This is not quite a liveblog, but a set of notes about the points I found most interesting in this morning's Linux Conf AU keynote, given by Dr. Andrew “bunnie” Huang, best known for Hacking the Xbox and developing the chumby.

The talk was a fascinating look at how complex developing embedded Linux devices is. Let's start with the takeaways:

Slide with takeaway points

One of the points bunnie made was that customizing embedded devices is really a frontier at this point (though he did not use the word frontier). There are not a lot of folks doing it. The Sustainability bullet point emphasizes this: bunnie talked about how people create custom environments for the device and want updates pushed to it, but that's difficult because the environment may mean special tweaks to the updates.

The cost point was interesting, because we often see a device and wonder why the manufacturer cannot add a feature that seems low-cost to us. Here is why:

slide with why low-cost features aren't that easy to add

So if the Cost of Goods Sold (COGS) is $30, you want to add some margin so you can earn a living, say $15, bringing the wholesale price to $45. The retail markup adds another $45, so the total cost to the consumer is now $90. Then there is a bit of rounding up to the magic price of $99.

Now if you decide to add a $5 feature, the COGS is now $35. Your margin becomes $17.50, and the retail markup becomes $52.50. Now the total cost is $105, which gets increased to the magic price point of $149. So there is a lot of argument about *which* $5 features to add at that price point. (Also note that $5 does not seem like much, but when your total COGS is $30, you are adding 16.67% to the cost.)

The tech stack is surprisingly full of old-school standards; I was pretty excited to see MySQL in it:

tech stack including MySQL

One of my coworkers left the keynote looking at the challenge and wanting to take it on. I heard all the complexity bunnie told us about and realized that this is not a stack I want to spend the time to learn, for what I would do with it. It is not where I want to spend my time. Here is a slide that shows where the time was spent on the chumby device design:

time spent on chumby device design

The actual hardware design, encircled by a box on the slide, is 11%. And the 19% in product and software design is iterative; some of the 11% of time in hardware design overlapped with product and software design, because they are intimately tied together. In fact, they are so intimately tied that the hardware is actually dominated by software concerns. This is not something I was aware of, but of course once I think about it, it makes sense:
shape of hardware is tied to software

bunnie got these times by going back through e-mails, so this is more like the phases of the project. The bulk of the time is the marketing and business development plus the mass-production ramp-up. bunnie also talked about the tooling, which was quite interesting. After showing us all the steps and explaining that a tooling run takes 6 weeks and can cost around $20k (I assume USD; that was not specified), this slide came up:

startup vs. Apple tooling costs

What a startup will do is try one design, spending the time and money, and if it's not right they have to spend the time and money again, serially. Apple can put down $100k and have 5 different tooling runs happen simultaneously, then pick the best one; so at the end of 6 weeks, Apple has a great design, whereas the startup only has its first design.

And with that said, Apple re-tooled the iPhone 3G at least 3 times. So that is why Apple products look so much nicer than what a startup can come up with: that, and they have brilliant designers.

The software design is not easy, either:

Why software design is so tricky

The code has to be extremely optimized and the binaries stripped, so that the UI is responsive and stable and your gadget succeeds.

The last takeaway on that first slide I showed was Ship or Die. Most folks know about this concept, but this slide made it pretty clear:

Ship or Die slide

As bunnie explains it, most gadget sales (with some exceptions) happen between Black Friday [the day after US Thanksgiving] and Christmas [the Dec 25th one]. If you are aiming to release in mid-November, and your deadline slips a few weeks, you are now missing your huge window of sales, and you will have to wait until next year to really sell your product.

There was much more content than I am able to put here, and I am glad I got the chance to see this keynote!

Liveblog: Think, Create and Critique Design


At Linux Conf AU: “Think, Create and Critique Design” by Andy Fitzsimon.

HTML slides
design checklist

Elements and principles of design.

Design is like cooking: ingredients create flavors that influence a meal. In the same way, elements create principles that influence a composition.

Some definitions.
Elements are:

    • Line: A continuous path between 2 points. Can also be a process, or a story plot.
    • Shape: When a line joins to cover an area, it evolves into a shape. Bottleneck and pyramid scheme are ways we use geometry as a metaphor.
    • Space: area between and inside things. Positive space/negative space.
    • Size: physical scale, bigness or smallness.
    • Color: perceived mixture. Color can be additive/subtractive, or a mood. Or a metaphor: a colorful personality.
    • Value: a static measure; lightness, darkness, volume.
    • Texture: structure and feel, rough/smooth, soft, etc.

 

Principles are the methods applied. They influence composition, but they're not composition. They can be made with elements and also other things.

  • Proportion: divided measure of a relative whole.
  • Pattern: using the same element(s) multiple times. Can be a template.
  • Graduation: incremental changes to one element over another.
  • Balance/Harmony/Unity: one or more elements creating cohesion.
  • Contrast
  • Emphasis: a significant use of one or more elements in a single place, to distinguish.
  • Form: the whole that a sum of parts becomes. Gestalt is German for form, but in a mind-blowing way.

If you're creating something, you look at the elements and see if they form principles (e.g. cooking: ingredients).

If you're appraising, you start with the whole and try to figure out what the elements are (e.g. this is great curry, let's see if I can taste what's in it).

You can apply these checklists (the elements and principles defined above) to anything: songs, poems, CSS, a movie, whatever.

Visual Design and Typography tools:
Baseline grid: so easy you can check it with a ruler. A baseline grid always follows a vertical rhythm.

Varied scale: Robert Bringhurst (god of typography). Pick 1 unit, 1 metric, and scale up by a proportion. It is how you can create and measure things, too.

Symbolics are a great tool; they follow typographic rules: graphical glyphs treated as type.

Art nouveau is ornate, decorated, and hard to create. But if you try to create ornate styles, there are plenty of places to hide: you can mess up some details and people won't necessarily notice. It's a good aesthetic, but hard to communicate well because it's so busy.

Style tiles and brand guides: if you have a set of rules, it's easier to follow. It also creates consistency, and it lends itself to a balanced piece of work.

Interactive Design tools:
A fantastic field; UI is hot right now. Be wary of abstracted tools.

Patterns are intuitive, isolated, and repeatable. Patterns don't dictate a complete success; they need to be interpreted in context.

Wireframes: the problem with wireframes is that anyone can make one, and if it's not done well, when people execute it, they follow the wireframe. If you start with a bad template that doesn't take things into consideration, you'll end up with a bad implementation. Whoever makes the wireframe has to know what they're doing.

Workflow: a procedure so hard to remember, you write it down. Really a series of steps in order to produce a result.

Persona: a compromise for never meeting real stakeholders, written by gamblers and liars. So often a persona doc is just punting. Does the persona come from a set of data (the average person has one breast and one testicle) or is it a real person?

So be wary of abstracted tools, but also be wary of abstracted results!

Analytics: metrics that justify slavery. You check the metrics you chose to check, and those tell you what to do. Pick them carefully!

Instrumentation: you do it to yourself. You implement the analytics yourself, so you're creating your own enslavement rules.

Surveys: the bored, attention-starved periphery of your audience. Don't forget that it's a self-selecting audience.

Reviews: http://xkcd.com/937/ (you have to read the text to figure out why the good is good and the bad is bad).

User testing: more like zombie testing. Why won't they smile?

Hyper-realism: looks like something real.
Skeuomorphism: reminds you of something real; it doesn't have to be high fidelity (e.g. a clock/watch to evoke time).

Experience design:
Deliberate design: a dog with tiger stripes is a bad tiger but a cute dog. Have deliberate differences; be something different.
Think, make, become. Empathy is a currency now. Take ownership: win empathy, forgiveness, and support.

Now he puts up Maslow's hierarchy of needs. Then there's Aaron Walter, who has "designing for emotion" and design personas.

Love, meaning, pleasure, convenience, predictability, purpose: these are the goals in design. If you hit half of these, that is great.

Easy to observe (easy/logical/predictable) vs. hard to tell (lovable/loyal/trusted).

Now let's talk about process: the process creates a product, but the design itself is a process. Design thinking is huge.

Design for Hackers: Andy says it is great if you can stomach the Apple stuff and the chapter on Web 2.0.

Pragmatic Thinking and Learning: Refactor Your Wetware is also great.

IN Subqueries in MySQL 5.6 Are Optimized Away


In MySQL 5.6, it looks like IN() subqueries are optimized even better than they are in MariaDB 5.5. Here's a typical IN() subquery, using the sakila sample database (query taken from slide 6 of the presentation about new MySQL 5.6 optimizer statistics):

SELECT title FROM film WHERE film_id IN (SELECT film_id FROM film_actor)

Before there were any subquery optimizations (say, if you are using MySQL 5.1), you would have to rewrite this query as a JOIN to avoid the dreaded DEPENDENT SUBQUERY that shows up in the EXPLAIN (a sketch of the rewrite appears after the EXPLAIN below):

mysql> explain SELECT title FROM film WHERE film_id IN (SELECT film_id FROM film_actor)\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: film
type: index
possible_keys: NULL
key: idx_title
key_len: 767
ref: NULL
rows: 1025
Extra: Using where; Using index
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: film_actor
type: index_subquery
possible_keys: idx_fk_film_id
key: idx_fk_film_id
key_len: 2
ref: func
rows: 1
Extra: Using index
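
For reference, the classic rewrite looks something like this. It is a sketch, not a query from the presentation; note the DISTINCT, needed because film_actor has many rows per film:

mysql> SELECT DISTINCT title
    -> FROM film JOIN film_actor USING (film_id);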

MariaDB 5.5 boasts subquery optimization, and rightfully so. It looks like MariaDB materializes the subquery:

MariaDB [sakila]> explain SELECT title FROM film WHERE film_id IN
-> (SELECT film_id FROM film_actor)\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: film
type: index
possible_keys: PRIMARY
key: idx_title
key_len: 767
ref: NULL
rows: 1043
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: <subquery2>
type: eq_ref
possible_keys: distinct_key
key: distinct_key
key_len: 2
ref: func
rows: 1
Extra:
*************************** 3. row ***************************
id: 2
select_type: MATERIALIZED
table: film_actor
type: index
possible_keys: idx_fk_film_id
key: idx_fk_film_id
key_len: 2
ref: NULL
rows: 4889
Extra: Using index

So MariaDB recognizes the subquery and optimizes it. But it is still optimized as a subquery: there are 3 rows here, and a new <subquery2> table is used to help optimize the results.

In MySQL 5.6, the subquery is actually optimized away: the optimizer rewrites the query. You can see this in the EXPLAIN plan by looking at the select_type. In both the MySQL 5.1 and MariaDB 5.5 examples, the select_type is PRIMARY, which is used for the outer query in a subquery (or the first SELECT in a UNION, but that does not apply here). In MySQL 5.6, the select_type is SIMPLE for both rows. Note that MySQL 5.6 also does not have to add a third table as MariaDB does:

mysql> explain SELECT title FROM film WHERE film_id IN
-> (SELECT film_id FROM film_actor)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
type: index
possible_keys: PRIMARY
key: idx_title
key_len: 767
ref: NULL
rows: 1000
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: film_actor
type: ref
possible_keys: idx_fk_film_id
key: idx_fk_film_id
key_len: 2
ref: sakila.film.film_id
rows: 1
Extra: Using index; FirstMatch(film)
2 rows in set (0.00 sec)

In the presentation, the Oracle team says that for DBT3 Query #18, execution time reduces from days to seconds. With optimizations like this, I believe it!

New EXPLAIN Features in MySQL 5.6 – FORMAT=JSON actually adds more information!


When I heard that MySQL 5.6 was implementing EXPLAIN for writes (INSERT, UPDATE, DELETE, REPLACE), I was pretty excited. Then I heard that MySQL 5.6 was also implementing a JSON format for EXPLAIN, and my thought was “I do not care about that!”

Boy, was I wrong. The JSON format does not just put the output into JSON; it also gives extra information that's actually pretty useful! It can tell you when you are doing an implicit cast, which parts of a composite index are being used, and when index condition pushdown is being used. None of these are shown in regular EXPLAIN (which seems odd; why could they extend the JSON format but not put the information into the regular EXPLAIN format?), so using the JSON format is a good idea even if you do not care about what format your output is in.

As a note, MySQL Workbench's Visual Explain (go to Query->Visual Explain Current Statement) also gives this information.

attached_condition and implicit casts

In a talk I do about EXPLAIN, I use the Sakila sample database. Here is an example of a bad query:

mysql> EXPLAIN SELECT rental_id FROM rental WHERE DATE(rental_date) = '2006-02-14'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: rental
type: index
possible_keys: NULL
key: rental_date
key_len: 10
ref: NULL
rows: 16005
Extra: Using where; Using index
1 row in set (0.00 sec)

This query is bad because it is doing a full index scan (type: index) instead of doing a range scan for just the range of dates we want (should be type: range). Ironically, the EXPLAIN does not actually explain why.

However, the JSON format does explain why:
mysql> EXPLAIN FORMAT=JSON SELECT rental_id FROM rental WHERE DATE(rental_date) = '2006-02-14'\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "rental",
      "access_type": "index",
      "key": "rental_date",
      "used_key_parts": [
        "rental_date",
        "inventory_id",
        "customer_id"
      ],
      "key_length": "10",
      "rows": 16005,
      "filtered": 100,
      "using_index": true,
      "attached_condition": "(cast(`sakila`.`rental`.`rental_date` as date) = '2006-02-14')"
    }
  }
}

Note that the attached_condition shows the implicit cast. This is MUCH more friendly to a developer or administrator who is trying to figure out why MySQL is not doing what they want it to do!

The visual EXPLAIN from MySQL Workbench also shows the implicit cast:

You may also notice it shows the filtered attribute, which is not in regular EXPLAIN but is part of EXPLAIN EXTENDED. filtered is the percentage of rows that are estimated to be returned; a higher number here is better. If it is low, it means that you are examining a lot of rows that you do not return.

used_key_parts

You may have noticed above that there is a used_key_parts array that does not show up in the traditional EXPLAIN. In a traditional EXPLAIN (or EXPLAIN EXTENDED), you do get to see the index length with the key_len field, so you can guess that only part of a composite index is used. Both the previous query and the following query use this index:

UNIQUE KEY `rental_date` (`rental_date`,`inventory_id`,`customer_id`)

Here is the traditional EXPLAIN. Note that it shows the rental_date index is used, and that the key_len is 5, which implies that only the first field of the index, rental_date, is being used, not the other 2 id fields. But you have to deduce that for yourself:

mysql> EXPLAIN EXTENDED SELECT rental_id FROM rental WHERE rental_date BETWEEN '2006-02-14 00:00:00' and '2006-02-14 23:59:59'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: rental
type: range
possible_keys: rental_date
key: rental_date
key_len: 5
ref: NULL
rows: 181
filtered: 100.00
Extra: Using where; Using index

Here is the JSON format, which shows the used_key_parts field, which reveals very clearly that only the first field of the index is used:

mysql> EXPLAIN FORMAT=JSON SELECT rental_id FROM rental WHERE rental_date BETWEEN '2006-02-14 00:00:00' and '2006-02-14 23:59:59'\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "rental",
      "access_type": "range",
      "possible_keys": [
        "rental_date"
      ],
      "key": "rental_date",
      "used_key_parts": [
        "rental_date"
      ],
      "key_length": "5",
      "rows": 181,
      "filtered": 100,
      "using_index": true,
      "attached_condition": "(`sakila`.`rental`.`rental_date` between '2006-02-14 00:00:00' and '2006-02-14 23:59:59')"
    }
  }
}

And here is the MySQL Workbench Visual EXPLAIN that shows the used_key_parts clearly:

Index condition pushdown is itself a new feature in MySQL 5.6, and I will talk about it in another blog post.

I am glad I took a second look at EXPLAIN FORMAT=JSON; the new features are awesome! My only complaint is that I think they should be added to either EXPLAIN or EXPLAIN EXTENDED. I also hope that tools like pt-query-digest will be updated to use the extra information.

MySQL 5.1 vs. MySQL 5.5: Floats, Doubles, and Scientific Notation


For the past few days, I have been upgrading a few servers. We are going from Percona’s patched MySQL 5.1 to MariaDB 5.5 (the subquery optimization is excellent, and we have lots of subqueries). Our process involves upgrading a slave first, and letting it replicate for a while, and if everything’s good, update more slaves, then the master.

This has served us well in the past. And now that we are checking data integrity between masters and slaves with pt-table-checksum, the process involves checking before we start that there is data integrity. This is easy, as we checksum twice daily and have a Nagios alert if there are any discrepancies. After the upgrade, we checksum again, to be sure no data has been changed/corrupted in the process of doing a mysqldump export and import.*

Much to my surprise, after importing the data on one of our dev servers, I found that there were a lot of discrepancies. So I picked a chunk to do some comparisons on, and found something interesting:

On Server version: 5.1.65-rel14.0-log Percona Server (GPL), 14.0, Revision 475:
mysql> select float_field from db.tbl where id=218964;
+-------------+
| float_field |
+-------------+
| 9.58084e-05 |
+-------------+
1 row in set (0.04 sec)

On Server version: 5.5.28a-MariaDB-log MariaDB Server
MariaDB [(none)]> select float_field from db.tbl where id=218964;
+--------------+
| float_field  |
+--------------+
| 0.0000958084 |
+--------------+
1 row in set (0.24 sec)

This of course causes a different checksum, because the checksum is computed over the values as the server converts them to strings. I tried SELECTing the values, and casting and converting them, but I could not get them to change in the database. MySQL 5.1 insists on presenting the value in scientific notation, and MariaDB 5.5 (and MySQL 5.5; we tested it) insists on presenting it without scientific notation.
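To see why the checksums differ, note that pt-table-checksum has the server checksum the string form of each value, conceptually something like this (the actual expression it builds is more involved):

mysql> SELECT CRC32(CONCAT(float_field)) FROM db.tbl WHERE id=218964;

On 5.1 the float is converted to the string '9.58084e-05', on 5.5 to '0.0000958084', so the CRC differs even though the stored value is the same.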

Frankly, I’m surprised this has not come up before (I did lots of Googling for MySQL 5.5 and scientific notation), since it radically changes how numbers look when they are stored and retrieved. I guess code does the right thing… except for pt-table-checksum, and I cannot really blame it.

In the end, I used the --ignore-columns option to pt-table-checksum, feeding it the result of:

SELECT GROUP_CONCAT(DISTINCT COLUMN_NAME) FROM INFORMATION_SCHEMA.COLUMNS WHERE DATA_TYPE IN ('float','double') AND TABLE_SCHEMA NOT IN ('mysql','information_schema','performance_schema');
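
A sketch of wiring the two together (the host DSN is hypothetical; --ignore-columns takes a comma-separated list):

COLS=$(mysql -BN -e "SELECT GROUP_CONCAT(DISTINCT COLUMN_NAME) FROM INFORMATION_SCHEMA.COLUMNS WHERE DATA_TYPE IN ('float','double') AND TABLE_SCHEMA NOT IN ('mysql','information_schema','performance_schema')")
pt-table-checksum --ignore-columns="$COLS" h=master.example.com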

In this way, I can get an accurate checksum to see if anything has changed, before I mark that the upgrade is complete on this slave server.

* This is just on the first slave. After the first slave is upgraded, we use xtrabackup to copy the data to another server to upgrade it.

 

Open Database Camp at Northeast LinuxFest


I am happy and proud to announce that there will be an Open Database Camp held at this year's Northeast LinuxFest! The venue is Harvard University in Cambridge, MA (“our fair city”), and it will take place Saturday, March 16 and Sunday, March 17, 2013.

Northeast LinuxFest and Open Database Camp are both free, but there is no reciprocal membership. To register for Open Database Camp, just sign up with Eventbrite. We are also soliciting session ideas ahead of time, and attendees will choose sessions during the Saturday morning planning session, as usual for Open DB Camp.

If you are interested in sponsoring, sponsor Northeast LinuxFest directly and let them know it's for Open Database Camp!

Open Database Camp is for all open databases: whether it's MySQL, Postgres, or NoSQL, whether it has been around for years or is something you're just thinking about. You can see previous session ideas at the OpenSQLCamp website.

December News from the Mozilla Database Team


With the quarter ending, there has been some planning going on this month, about goals for Q1 2013 as well as meeting our goals for Q4 2012. Our biggest goal was to stop using MySQL 5.0, which we have successfully done. We have only one server left on MySQL 5.0, and that has a compatible MySQL 5.1 server waiting for a few developers to get back from their well-deserved vacations to migrate off. In December, we finished upgrading 2 servers to MySQL 5.1.

– Looked at the top 30 Bugzilla queries and started to give optimization tips for MySQL.
– Did our regular purge/defrag of TinderBox PushLog.
– Worked on integrating our datazilla code with chart.io features.
– Helped change the data model for datazilla.
– Moved some Bugzilla tables to a different partition when the data partition filled up. There is a plan to upgrade but we had an immediate need for the move.
– Upgraded one of the Bugzilla slaves to MariaDB 5.5.
– Refreshed the support staging database with production data.
– Added grants for metrics users to support new Bugzilla custom fields.
– Did some research on whether SSDs were good enough for the addons database or if we really needed Fusion I/O. (conclusion: SSDs are good enough! The driver for this was cost of larger Fusion I/O disks, and starting to worry about space on the database systems.)
– Found a bug in new build code for the builder that builds Firefox, that would effectively stop updated builds from being recorded in the builder database. The bug was found in development, the code itself is not in production yet, but there were several hours of database debugging to figure out the problem.
– Built a new database server for backups that does not depend on NFS.
– Implemented checksum checking on several more MySQL clusters to ensure the data on the master and slaves match.
– Created databases for Game On.
– Optimized a query for a Firefox build system component (clobberer).
– Installed larger disks on our production Postgres failover server. We will be testing failover and adding more disks to the current master server in Q1.
– Created a database cluster for the main Mozilla website for failover.
– Cleaned up replication on a cluster after a power problem caused the master to crash.
– Added a Nagios check that uses pt-config-diff to all our MySQL servers to ensure that we know whenever the running MySQL configuration does not match the /etc/my.cnf file.
– Dealt with a set of queries breaking replication due to not being inside a transaction.
– Dealt with a schema change for Bugzilla taking a long time, and took slaves out of the load balancer one at a time to let the ALTER TABLE complete without read queries getting locked and causing slow responsiveness on the database servers.
– Created read-only database logins for the administrators of Air Mozilla so they can better debug problems.
– Imported some data for Graphs.
– Audited the Persona/BrowserID databases to get them ready for prime time (these databases are not normally managed by the DB team).
– Did a security review for our Engagement team to get reports of Mozillians' emails, for sending out information to registered and vouched Mozillians.
– Added documentation for 11 Nagios checks related to MySQL and Postgres.
– Researched the Zero Day Exploits for MySQL to see if Mozilla was affected.
– Puppetized the postgresql.conf files for all our postgres servers.
– Puppetized our datazilla database servers.
– Puppetized database servers for web development and for internal tools.
– We sized MySQL database machines for the Platform as a Service (PaaS) platform that the webops team will be implementing. The next step is ordering the hardware!

On the planning side, we did a lot in 2012 to stabilize our MySQL environment and build a good, sane, centralized puppet configuration for control of MySQL packages, configuration files, scripts, and backups. 2013 will be the year we do the same with Postgres:
– Stabilizing Postgres
– Streamlining Postgres configuration and installation and upgrading with puppet
– Reconfiguring Postgres logging
– Stabilizing Postgres backups

There are plenty of great things that will happen in 2013 from the Mozilla Database Team for both MySQL and Postgres databases!

2012 Mozilla DB Year in Graphs


I am not a wizard with infographics, but I can do a few pie charts. I copied the data to the right of the pie charts for those that want to see the numbers. Overall, there are almost 400 databases at Mozilla, in 11 different categories. Here is how each category fares in number of databases:

Here is how each category measures up with regard to database size; clearly, our crash-stats database (which is on Postgres, not MySQL) is the largest:

So here is another pie chart with the relative sizes of the MySQL databases:

I’m sure I’ve miscategorized some things (for instance, are metrics on AMO classified under AMO/Marketplace or internal tools?) but here are the categories I used:

Categories:
air.m.o: air.mozilla.org
AMO/Marketplace: addons/marketplace
blog/web page: a db behind a blog or a mostly static webpage
bugzilla: Bugzilla
Crash-stats: Socorro, crash-stats.mozilla.com, where apps like Firefox send crash details.
Internal tool: if the db behind this is down, moco/mofo people may not be able to do their work. This covers applications from graphs.mozilla.org to inventory.mozilla.org to the PTO app.
release tool: if this db is down, releases can not happen (but this db is not a tree-closing db).
SUMO: support.mozilla.org
Tree-closing: if this db is down, the tree closes (and releases can't happen).
World-facing: if this db is down, non-moco/mofo people will notice. These are specifically tools that folks interact with, including the Mozilla Developer Network and sites like gameon.mozilla.org.
World-interfacing: this db is critical to tools we use to interface with the world, though it is not necessarily world-visible. basket.mozilla.org, Mozillians, etc.

The count of databases includes all production/dev/stage servers. The size is the size of the database on one of the production/dev/stage machines. For example, Bugzilla has 6 servers in use: 4 in production and 2 in stage. The size counted is the size of the master in production plus the master in stage, combined. This way we have not grossly inflated the size of the database, even though technically speaking we do have to manage the data on each of the servers.

For next year, I hope to be able to gather this kind of information automatically, and have easily accessible comprehensive numbers for bandwidth, number of queries per day on each server, and more.