What Does Open Source Mean?

At last night’s event, a lot of the questions were really implicitly asking, “Is open source better? Why?”

The first answer everyone comes up with is that it’s free, and that’s better.

However, that is neither necessary nor sufficient to deem it “better”.

Even if MySQL did exactly the same tasks Oracle does, and was free, there would still be a huge amount of money involved in migrating. Merely staffing the migration costs a lot of money.

Companies using open source technologies because they are free are (probably) making the right software choice for the wrong reason.

Firstly, open source does not have to be free — MySQL proves that. Their Enterprise source code is free to paying customers (and whoever paying customers distribute to, but that is not the issue).

Secondly, open source’s benefits far outweigh mere license costs, though the license cost is definitely the most tangible benefit.

I realized while the benefits of open source were being touched upon that those benefits are not lacking in the closed software world; they are simply much harder to come by. For instance, there are companies that reverse engineer solutions and develop their own in-house versions without being able to read a line of the original code. Surely it is easier to build a home-grown solution when the code is readable to begin with.

As well, the talent pool for open source is greater, because there is a lower barrier to entry. It’s still just as difficult to separate the wheat from the chaff as it is in a closed source world. However, if your company is willing to hire only the top 10%, I’d rather try to find that top 10% in a pool of tens of thousands of people than in a pool of thousands.

The oft-quoted “you can hack it yourself if you want” still applies, and more so the idea that “even if the company goes out of business, or the core developers stop developing, others can pick up where the previous developers left off.”

One issue we did not touch upon was that open source tends to follow a popular concept in “extreme programming” — the idea that the software is always working. It may not have all the features, maybe it’s not much more than “hello world”, but it works. A feature is added, the code integrated, and it still works, now with +1 feature.

I think the issue is that in general, it is *easier* to reap these benefits from open source than from closed. It makes the argument more difficult, because it’s *possible* to reap similar (or the same) benefits from closed source, but it’s easier with open source.

MySQL Focuses on Community

Post Summary: An apology with a lesson.

When Steve Curry contacted me just after the MySQL Conference and Expo asking me if I’d be interested in a community roundtable, I was excited. Not just because Steve Curry brought me an inflatable pink dolphin after I squee’d that I needed one, although I never forget when someone does me a favor.

However, a few weeks ago it seemed like the event was more of a PR gathering than a community roundtable. I was disappointed, and told Steve as much.

And then, one of two things happened:

1) My concerns were brought up, discussed and folks decided a roundtable involving community was a good idea;
or
2) I had come up with two different pictures of the event in my mind, based on my expectations of “community roundtable” at first and “event with businesses and PR, to include community” as the final description.

Now, last night was an excellent opportunity for me and also a lot of fun. A lot of the questions were really implicitly asking, “Is open source better? Why?” More on that in the next post, I promise.

So I wanted to say to MySQL that I was wrong.

I am sorry.

Sure, MySQL did not know what I was thinking. And certainly the event could have turned out to be one I did not enjoy.

The lesson to learn from this is that sometimes we get upset at our perception of reality, and not reality itself.

And to follow up on my cranky post where I was annoyed at the MySQL website’s lack of functionality, at http://www.pythian.com/blogs/1016/mysql-website-a-reflection-of-values, I feel I should note that I got a call later that day from MySQL’s web designer telling me that my concerns were valid and MySQL was actively working on them. Indeed, www.mysql.com has added a “Documentation” link in the orange submenu (first is “Products” and second is “Downloads”, so I completely agree with their prioritization as well).

The other lesson: Always trade business cards with people, so they have your contact information when they want to contact you. A phone call was so much more powerful than an e-mail ever could have been.

Open Source – The Foundation of Civilization

Almost 2 years ago, in How Open Do You Have To Be To Be Open Source? I wrote:

Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue.

Matt Asay’s recent post Google’s slow transformation into an open, transparent company made me dig up that post, which by many standards is old in terms of time, but it’s only now that some of this change is actually happening.

Matt ponders,

It remains to be seen what, if anything, Google will actually open, but I trust its track record on living up to its word more than Microsoft’s, which also went through a flurry of “We’re now really open!” announcements lately that actually netted the industry…not much.

In interesting news, at last night’s Boston Sun/MySQL event (more on that in another post), the question was asked if the panel thought that Microsoft was really serious about open sourcing their software(s) and what that would mean for open source software.

I couldn’t wait to jump in with my answer — and even though I had to wait, I did eventually say what was on my mind.

If Microsoft opened all of their code tomorrow, how big of a *developer* community would they have? By that I mean, how many people would say “yeah, all right! I’m going to make this code better!” and how many would take a look at the internals and feel like they’d just been on a roller coaster?

Open source is the foundation of civilization. The title of this post mentions that, and now I will explain why.

Civilization occurs because prior work does not need to be repeated. Derivative works allow for more complexity, more specialization. The first tool that made this easier by leaps and bounds was language. After that, written language made it easier, and distribution due to the duplication of written language (specifically the printing press) made it even easier. Fast forward a few hundred years, add the internet, and we have our pick of giants on whose shoulders we might stand, as information is disseminated widely, instantly.

In order to actually build upon previous work, the previous work must be understood. Through reverse engineering, closed source software can be mostly understood. However, as “reverse engineering” becomes taboo to say, “design recovery” (or as I prefer, “intelligent design recovery”, as I feel that “design recovery” is as silly a phrase as “intelligent design”) is not as appealing as having the source open to begin with.

After all, if the source is open, it is easier to understand the software, and thus to create derivative works that are more complex, perhaps more specialized.

The flip side of this is that it makes the success of open source software more difficult. Open source projects are more prone to being abandoned due to early recognition that the fundamentals are flawed. That early recognition is not often by the developers who write the software, but by the early adopters themselves. Being early adopters, they are not afraid to take risks, and they likely are planning on having to do some code hacking to get the software to their liking. If they are not satisfied when they play around with it, they stop using it before they get too invested in the software.

If Microsoft opened their source, I strongly believe it would bring their demise on faster. Perhaps the only saving grace would be Bill Gates hosting a reality show, “Who Wants To Be Microsoft’s Next Developer?”, with each episode creating challenges for great developers to solve existing issues with Microsoft software. Each week one developer gets booted off the show… perhaps “saving grace” is too strong a term for that concept.

Anyway. There are many discussion points in this post, and I hope you will think about some of them and comment (or post to your own blog).

Open source — the only civilized choice.

Oracle Open World 2008 Sessions — Vote on Oracle Mix

Over the past month, several Oracle community web sites were created. I remember registering on one or two, but I couldn’t really keep an eye on that many, so I decided to wait and see which one would win. It turns out that Oracle Mix came out as the winner. Maybe it’s just my impression.

But I digress… I just wanted to make a quick note that Oracle Mix organized an interesting hybrid between call for papers and abstract judging. Anyone registered at Oracle Mix can propose a session abstract to present themselves or as an idea for others. Everyone can give their votes to the proposed sessions. At the end of the voting deadline (25th of June) Oracle will select the top sessions to be included in the Oracle Open World schedule.

So what? Well, I submitted mine a few days ago: Demystifying Workload Management with Oracle RAC, based on my Hotsos Symposium 2008 presentation. I’m not sure how wide the potential audience for this session is; it’s far from a beginner’s session and is specific to RAC. However, I do believe that this topic is often misunderstood and there is very good potential to spread the knowledge. So if you are coming to Oracle Open World and are interested in that topic, go ahead and vote. If not for mine, there are plenty of others.

PS: I managed to make 2 (!) typos in the title and can’t edit it anymore. I don’t get any error on update, but the title comes back unchanged. I have already filed a bug report; let’s see if it gets fixed.

BigTable Thoughts

So, Paul’s blog post pointing to Todd’s blog post got me thinking.

The main point Paul summarized was that duplicating data was a great way to scale, and used Todd’s reference to Flickr and how in their partition-by-user scheme, they put a comment in the commenter’s shard as well as in the commentee’s shard.

In my recent post about Twitter, I wrote:

Now, I understand that it is hard to get all the histories for the people I follow. But it only needs to be done once, and could then be cached: “Posts from who Sheeri follows on 5/20”. It would not be difficult, and I would be OK with the functionality changing such that “once you follow a new person, their tweets prior to when you followed them do not show up in the history.”

So using this thinking, every time someone I follow (say, @paulandstorm) makes a comment, it not only writes to their shard, but to mine. Now, that may not work given that the system also has to send messages at the same time, and that there can be numerous followers — dozens, hundreds, thousands.
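As a rough illustration of that write-to-every-follower idea, here is a minimal in-memory sketch. Everything in it is hypothetical: the `shards` and `followers` dictionaries stand in for real per-user shards and a real follower graph, and the function names are made up for this example. It is not how Twitter or Flickr actually implement anything.

```python
from collections import defaultdict

# Hypothetical stand-in for per-user shards: user -> list of (author, text).
shards = defaultdict(list)
# Hypothetical follower graph: author -> set of users following that author.
followers = defaultdict(set)


def follow(follower, author):
    """Record that `follower` follows `author`."""
    followers[author].add(follower)


def post(author, text):
    """Write a post to the author's shard and to each follower's shard.

    Returns the number of writes performed, which is 1 + follower count.
    """
    entry = (author, text)
    shards[author].append(entry)
    writes = 1
    for follower in followers[author]:
        shards[follower].append(entry)
        writes += 1
    return writes


def timeline(user):
    """Reading a timeline touches only the reader's own shard."""
    return list(shards[user])
```

The return value of `post` makes the cost visible: one post by an author with 1,000 followers means 1,001 writes, which is exactly the concern raised above.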

The Flickr model works because it involves 2 writes to get the faster caching later, and there are more reads than writes. Twitter is more write-heavy, and likely has more writes than reads, considering that many folks never visit the website to read their history.

This particular idea may not work for Twitter. But I’ve picked on Twitter enough….

I thought about livejournal. I’ve been a livejournal member since 2001 — after 2 months of writing my own journaling system with comments, I got wind that a system already existed, so started to use that.

Now, I can go and pick specific entries from specific days, or I can read my “friends list”. I specify my friends and livejournal dynamically populates pages of my friends list, with the number of entries per page that I specify.

Livejournal could also use the idea presented above, as well as the concept of semi-dynamic data. Instead of dynamically generating the last, let’s say 20, entries of my “friends list”, livejournal could be making my friends list as it gets written to. A friend makes a post and it gets added to my shard, whether or not I read it. Once the count gets up to 20, a new cache page is generated.
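To make the semi-dynamic idea concrete, here is a small sketch, assuming a page size of 20 as in the example above. The class name `FriendsPageCache` and its methods are invented for illustration; this is not livejournal's actual caching code.

```python
PAGE_SIZE = 20  # entries per page, the figure used in the text


class FriendsPageCache:
    """Builds a reader's friends-list pages as entries are written."""

    def __init__(self, page_size=PAGE_SIZE):
        self.page_size = page_size
        self.pages = []    # sealed pages, oldest first; never regenerated
        self.current = []  # entries waiting for the open page to fill up

    def add_entry(self, entry):
        """Called on every friend's post, whether or not the reader looks."""
        self.current.append(entry)
        if len(self.current) == self.page_size:
            # The count reached the page size: seal a new cache page.
            self.pages.append(tuple(self.current))
            self.current = []

    def page(self, index):
        """Return a sealed page without any work at read time."""
        return self.pages[index]
```

Sealed pages are immutable tuples, so they could be served as static content; only the partially filled `current` list changes between posts.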

Now, livejournal already has great caching, and has indeed had the growing pains Twitter is seeing. And for either livejournal or Twitter to take advantage of these concepts, they would likely require a rewrite from the ground up. So it’s not that I am suggesting this. I just think it’s a great idea, and if you are working on a project, think of where it might be useful to apply. Again, it may not be applicable in all situations. As on Twitter, livejournal users may have many “friends”, so doing 100 or 1,000 writes every time a post is made may not actually be feasible.

Twitter Should Get Back to Basics

Twitter has had many outages recently. On May 17th, 2008 http://blog.twitter.com/2007/05/devils-in-details.html was posted and says:

What went wrong? We checked in code to provide more accurate pagination, to better distribute and optimize our messaging system—basically we just kept tweaking when we should have called it a day. Details are great but getting too caught up in them is a mistake. I’ve been CEO of Twitter for two months now and this an awesome lesson learned. We’re seeing the bigger picture and Twitter is back. Please contact us if something isn’t working right (with Twitter that is).

(In other news, that post was made on May 17th and does not show up on http://blog.twitter.com, which it should, between the May 16th and May 19th posts. I found a reference in other posts and had to search the site to find that post.)

A real “awesome lesson learned” is “do not tweak production without testing first.” In every job I have had, I have first learned and then taught the concept of “test everything possible.” Twitter has not learned this yet, because http://blog.twitter.com/2008/05/not-true.html, posted on Tuesday May 20th, states:

We caused a database to fail during a routine update early this afternoon.

As someone who has years of experience working with MySQL, and before that was a systems administrator; as someone who was referred to as “the MySQL Queen” yesterday (by someone who wanted me to test their product, so yes, they were flattering me); I can assure you:

no matter how “routine” a change is, if you do it on production without testing it first, you are playing with fire, and 95% of the fires caused by not testing first are completely preventable.

I will repeat this, because repetition is important to learning concepts.

no matter how “routine” a change is, if you do it on production without testing it first, you are playing with fire, and 95% of the fires caused by not testing first are completely preventable.

With a proper testing environment, 19 out of 20 “whoops, didn’t expect THAT from a routine change!” issues are caught. And I can tell you that often “routine changes” cause unexpected results.

Now, I was online during an outage, and http://twitter.com/home was showing their “site isn’t working” page for at least 3 hours between 2 and 5 am EDT yesterday (Tuesday, May 20th, 2008).

So… there is no read-only copy around that Twitter could use? Maybe I cannot tweet, but I should at least be able to read what was done before!
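Here is a minimal sketch of what falling back to a read-only copy could look like. The `Store` class, the `Unavailable` exception and `read_with_fallback` are all hypothetical names for this illustration; I have no knowledge of Twitter's actual architecture, and a real replica would also have to deal with replication lag.

```python
class Unavailable(Exception):
    """Raised by a store that cannot serve requests right now."""


class Store:
    """Toy key-value store that can be switched off to simulate an outage."""

    def __init__(self, data=None, up=True):
        self.data = dict(data or {})
        self.up = up

    def read(self, key):
        if not self.up:
            raise Unavailable()
        return self.data.get(key)


def read_with_fallback(primary, replica, key):
    """Try the primary first; if it is down, serve from the read-only copy.

    Returns (value, source) so the caller can warn that data may be stale.
    """
    try:
        return primary.read(key), "primary"
    except Unavailable:
        return replica.read(key), "replica"
```

With this shape, an outage of the primary turns the site read-only instead of showing a “site isn’t working” page, as long as the replica is still reachable.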

Of course, since last week Twitter has done the opposite: often I can see the most recent 20 or so posts, but not anything prior. Now, I understand that it is hard to get all the histories for the people I follow. But it only needs to be done once, and could then be cached: “Posts from who Sheeri follows on 5/20”. It would not be difficult, and I would be OK with the functionality changing such that “once you follow a new person, their tweets prior to when you followed them do not show up in the history.”

Alternatively, you could go the snarky way, as http://www.techcrunch.com/2008/05/20/twitter-something-is-technically-wrong/ does:

What would be great is if Twitter just moved their blog to another platform so that it doesn’t fail when users need it most.

I am not a huge user of Rails. But I will say that given the content of the public announcements, the platform is not the problem. It is the code release process that is the problem. Maybe there’s “agile development” happening, pair programming and code reviews. But there is not adequate testing.

Twitter — if you truly need scaling help, please ask for help — I know Pythian would be happy to help. However, if it really is as it seems — that basic good practice is not being followed — I would like to remind you that backups are really important too, just on the off chance that backups are not happening.

Twitter has had many outages recently. On May 17th, 2008 http://blog.twitter.com/2007/05/devils-in-details.html was posted and says:

What went wrong? We checked in code to provide more accurate pagination, to better distribute and optimize our messaging system—basically we just kept tweaking when we should have called it a day. Details are great but getting too caught up in them is a mistake. I’ve been CEO of Twitter for two months now and this an awesome lesson learned. We’re seeing the bigger picture and Twitter is back. Please contact us if something isn’t working right (with Twitter that is).

(in other news, that post was made on May 17th and does not show up on http://blog.twitter.com, which it should, between the May 16th and May 19th posts. I found a reference in other posts and had to search the site to find that post).

A real “awesome lesson learned” is “do not tweak production without testing first.” In every job I have had I have first learned and then taught the concept of “test everything possible.” Which Twitter has not learned yet, because http://blog.twitter.com/2008/05/not-true.html, posted on Tuesday May 20th, states:

We caused a database to fail during a routine update early this afternoon.

As someone who has years of experience working with MySQL, and before that was a systems adminsitrator; as someone who was referred to as “the MySQL Queen” yesterday (by someone who wanted me to test their product, so yes, they were flattering me); I can assure you:

no matter how “routine” a change is, if you do it on production without testing it first, you are playing with fire, and 95% of the fires caused by not testing first are completely preventable.

I will repeat this, because repetition is important to learning concepts.

no matter how “routine” a change is, if you do it on production without testing it first, you are playing with fire, and 95% of the fires caused by not testing first are completely preventable.

With a proper testing environment, 19 out of 20 “whoops, didn’t expect THAT from a routine change!” issues are caught. And I can tell you that often “routine changes” cause unexpected results.

Now, I was online during an outage, and http://twitter.com/home was showing their “site isn’t working” page for at least 3 hours between 2 and 5 am EDT yesterday (Tuesday, May 20th, 2008).

So…..there is no read-only copy around that Twitter could use? Maybe I cannot tweet, but I should at least be able to read what was done before!

Of course, since last week Twitter has done the opposite: often I can see the most recent 20 or so posts, but not anything prior. Now, I understand that it is hard to get all the histories for the people I follow. But it only needs to be done once, and could then be cached as “Posts from who Sheeri follows on 5/20”. It would not be difficult, and I would be OK with the functionality changing such that “once you follow a new person, their tweets prior to when you followed them do not show up in the history.”

Alternatively, you could go the snarky way, as http://www.techcrunch.com/2008/05/20/twitter-something-is-technically-wrong/ does:
What would be great is if Twitter just moved their blog to another platform so that it doesn’t fail when users need it most.

I am not a huge user of Rails. But I will say that, given the content of the public announcements, the platform is not the problem; the code release process is. Maybe there’s “agile development” happening, with pair programming and code reviews. But there is not adequate testing.

Twitter — if you truly need scaling help, please ask for help — I know Pythian would be happy to help. However, if it really is as it seems — that basic good practice is not being followed — I would like to remind you that backups are really important too, just on the off chance that backups are not happening.

MySQL Website a Reflection of Values

I understand that MySQL as a company wants to recruit paying customers. However, as a community user I have a hard time finding what I want on the MySQL website. Today’s frustration is brought to you by trying to find the documentation.

Go ahead, hit http://www.mysql.com. From there, where do you go to find the documentation?

It’s not Services, not even Services -> Support.

According to Products, the community server is not even a product. How is a potential new user, who wants to learn about MySQL, supposed to know a community version exists? Here are the products listed on the Products page:

MySQL Enterprise
MySQL Enterprise Monitor
MySQL Cluster
MySQL Embedded Database
MySQL Database Drivers
MySQL Database Tools

Where’s “MySQL Database” on that list? A website user basically has to know what they’re looking for, since that page does not help find “the mysql database”. You can guess it’s the “MySQL Enterprise” — but you’d be wrong. Imagine if you’re a person unfamiliar with MySQL who has been told “go to the MySQL website and get the free version of the database, it’s great!”

OK, OK, I know, you’re saying “click on Downloads”. That takes me to the “choose which version you want to download” page. OK, nice for newbies, but really annoying for the experienced. Why aren’t the links to download at the top, with the explanations just underneath? A newbie would be presented with the links, think “I don’t know which to choose!” but then see that the descriptions are just underneath. An experienced user can just click and go.

This of course is made even more silly when you realize that the community download link simply scrolls down. Yes, that’s right, the community downloads immediately follow that chart of explanations. Which means that in the current state, with the link at the bottom of the chart, the link points you to the next line. If the link were at the top of the page, at least there would be a reason for it: to scroll down past the explanations.

Now recall that the exercise was to find documentation. There is no way to download the documentation here.

I am a non-paying user, not a developer. Going to the “Developer Zone” is not intuitive for me. However, that is exactly where I need to be:

* DevZone
* Downloads
* Documentation
* Articles
* Forums
* Bugs
* Forge
* Blogs

Given that list of topics, why on earth is this section called the “Developer Zone”? (it’s a rhetorical question!) Sure, if I’m a software developer I might think it’s useful to me. On any other website “developer zone” is relegated to advanced users, or folks using some different part of the product than what most people use (think Apple’s Developer Connection).

Take a look at that list. On any other website, it would be under “Support” or “Help” or “Learn”.

No wonder folks have no idea that the forums exist, much less the Forge or Planet MySQL. If I were a new user to MySQL and I wanted to find the documentation, I’d be very unhappy.

Heck, I’m very unhappy anyway — what kind of company has a huge community of people supporting each other and contributing back to the company, and does not give people an easy way to find the community?

Most Commonly Sought-After Command in MySQL Proxy

One of the most frequently needed pieces of functionality in the MySQL Proxy is a way to know which server you are on. The proxy withholds this on purpose, because it is supposed to be transparent: it is not supposed to matter which back-end server you are on.

However, for testing purposes we often want to know which back-end server we’re on. Thus I developed functionality for SHOW PROXY BACKEND [INDEX | ADDRESS | OTHER].

SHOW PROXY BACKEND INDEX: gives the index of the server you’re on (backend_ndx, e.g. 1)

SHOW PROXY BACKEND ADDRESS: gives the address of the server you’re on (e.g. foo.bar.com:3306)

SHOW PROXY BACKEND OTHER: gives the addresses of all the servers except the one you’re on, in multiline format.

Note that I was pretty lazy and the commands are case-sensitive. But I figured that since this is supposed to be used mostly in testing circumstances, it did not really matter.
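For readers who want to see the behavior rather than read about it, here is a small Python model of the three commands. This is not the script on the Forge (that one runs inside the proxy); the function name show_proxy_backend, its arguments and the second host name are invented for illustration, and I am assuming the back-end index counts from 1, as in the example above.

```python
PREFIX = "SHOW PROXY BACKEND "


def show_proxy_backend(query, backends, current_ndx):
    """Answer a SHOW PROXY BACKEND query, or return None for anything else.

    backends is a list of "host:port" strings; current_ndx is 1-based
    (an assumption of this sketch).
    """
    # Deliberately case-sensitive, like the commands described above.
    if not query.startswith(PREFIX):
        return None
    option = query[len(PREFIX):].strip()
    if option == "INDEX":
        return [str(current_ndx)]
    if option == "ADDRESS":
        return [backends[current_ndx - 1]]
    if option == "OTHER":
        # One row per back end, skipping the one we are on.
        return [
            address
            for ndx, address in enumerate(backends, start=1)
            if ndx != current_ndx
        ]
    return None
```

With backends of foo.bar.com:3306 and baz.bar.com:3306 and current_ndx of 1, OTHER returns only the second address, and a lowercase "show proxy backend index" is not recognized at all.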

The code is on the MySQL Forge Wiki at http://forge.mysql.com/tools/tool.php?id=139

Interestingly enough, this script is actually being used in production — a site has a primary and failover server, and wants to check that when the primary server is in use, there are no connections on the failover. I wrote that check as well, but as the logic is somewhat particular, I am not sure it would be useful to many. The logic is:

  1. Run SHOW PROXY BACKEND INDEX.
  2. If it fails, I can’t connect to the proxy: log an error and exit.
  3. If it != 1, I’m on the failover server: log a warning and exit.
  4. If it = 1, I’m on the primary server.
  5. Run SHOW PROXY BACKEND OTHER.
  6. If it is empty, either I can’t connect to the proxy or there are no failovers defined: log a warning and exit.
  7. For each OTHER address, connect to it and find what that connection’s host looks like (sometimes it looks like foo.bar.com:4325, other times like 1.2.3.4:4573). Get the value of “my host” by stripping off the port.
  8. Connect to the OTHER address again and kill connections from “my host”, except for the “system user” and a few other special accounts (replication slave being one of them). Log each kill with the thread number and a warning.
  9. If there are no connections to be killed, log OK.

Let me know if you’d like to see that. It’s a shell script, and it requires the mysql client and standard command-line tools like grep and cut.
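In the meantime, the nine steps can be sketched in Python. This is a rough outline of the logic only, not the shell script itself: proxy_query, connect and log are stand-ins you would have to supply, the "repl" account name is hypothetical, and I am assuming processlist() does not list the checker's own connection.

```python
PROTECTED_USERS = {"system user", "repl"}  # "repl" is a hypothetical special account


def strip_port(host):
    """'foo.bar.com:4325' -> 'foo.bar.com'; '1.2.3.4:4573' -> '1.2.3.4'."""
    return host.rsplit(":", 1)[0]


def check_failover(proxy_query, connect, log):
    """Walk the nine steps using injected stand-ins.

    proxy_query(sql) returns a list of strings or raises ConnectionError.
    connect(address) returns an object with my_host(), processlist()
    (yielding (thread_id, user, host) tuples) and kill(thread_id).
    log(level, message) records one line.
    """
    try:
        index = proxy_query("SHOW PROXY BACKEND INDEX")       # step 1
    except ConnectionError:
        log("ERROR", "cannot connect to the proxy")           # step 2
        return "error"
    if index != ["1"]:
        log("WARNING", "on the failover server")              # step 3
        return "warning"
    # Step 4: the index is 1, so we are on the primary.
    try:
        others = proxy_query("SHOW PROXY BACKEND OTHER")      # step 5
    except ConnectionError:
        others = []
    if not others:                                            # step 6
        log("WARNING", "proxy unreachable or no failovers defined")
        return "warning"
    killed = 0
    for address in others:
        conn = connect(address)
        my_host = strip_port(conn.my_host())                  # step 7
        for thread_id, user, host in conn.processlist():      # step 8
            if user in PROTECTED_USERS or strip_port(host) != my_host:
                continue
            conn.kill(thread_id)
            log("WARNING", f"killed thread {thread_id} on {address}")
            killed += 1
    if killed == 0:
        log("OK", "no connections on the failover")           # step 9
        return "ok"
    return "warning"
```

Passing the three helpers in as arguments keeps the logic testable without a live proxy; in a real version they would wrap the mysql client.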

The Architecture Layer

Contemporary software engineering models include many loosely-defined layers. Database developers might help with other layers, but for the most part a database administrator’s domain is the persistence layer.


  • Presentation
  • Application
  • Business Logic
  • Persistence (also called Storage)

The Daily WTF’s article The Mythical Business Layer makes the case for not separating the business layer and the application layer:

A good system (as in, one that’s maintainable by other people) has no choice but to duplicate, triplicate, or even-more-licate business logic. If Account_Number is a seven-digit required field, it should be declared as CHAR(7) NOT NULL in the database and have some client-side code to validate it was entered as seven digits. If the system allows data entry in other places by other means, that means more duplication of the Account_Number logic is required.

It almost goes without saying that business logic changes frequently and in unpredictable ways. The solution to this problem is not a cleverly coded business layer. Unfortunately, it’s much more boring than that. Accommodating change can only be accomplished through careful analysis and thorough testing.

I will call this merged business/application layer the “functional layer.”
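As a small illustration of the duplication the quote describes, here is what the application-side half of the Account_Number rule might look like in Python. The function name is my own invention; the point is only that this check repeats what CHAR(7) NOT NULL already says in the schema, and that it would need repeating again anywhere else data can be entered.

```python
import re

# Exactly seven ASCII digits, mirroring CHAR(7) NOT NULL in the database.
ACCOUNT_NUMBER = re.compile(r"[0-9]{7}")


def is_valid_account_number(value):
    """Return True only for a string of exactly seven digits."""
    return isinstance(value, str) and ACCOUNT_NUMBER.fullmatch(value) is not None
```

A missing value (None) fails the check, which is the application-side counterpart of NOT NULL.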

The serious scaling requirements posed by most applications these days call for partitioning, clustering, sharding or some other term for “dividing up the data so it does not become the bottleneck”. Enter the “architecture layer”.

“Wait a minute,” I hear you asking. “Isn’t that just the persistence layer?”

Yes and no. To me, there’s a difference between the storage and the architecture of said storage. The database schema for storing a user profile is a persistence layer issue. Figuring out which database instance to go to is an architecture layer issue.

This is an important distinction for me. Many folks are coding the architecture layer directly into the functional layer. A “save_profile()” API function might call an ORM to deal with the persistence, or it will have MySQL (or other database) connection handling and queries. However, the database will grow, and at some point you will find yourself wanting to split the data [more].

This type of information, like the presentation layer, needs to be separate. Why should the application care whether save_profile('Sheeri','hair color','blonde') accesses database1 or database2? More importantly, why should there be major code changes to the functional layer if the architecture changes? Just like no functionality has changed when you change your website color from blue to red, there is no functionality change when you go from splitting data between 2 database servers to splitting among 3, or 10.

For me, the persistence layer is about how the data is stored. Which, explicitly and for the record, I also believe should be separate from the functional layer — if you store hair color and eye color in one table or 2, the functionality of the application has not changed; all that’s needed is a change in how that data is stored and retrieved.

The architecture layer is all about where the data is stored. Early forms of the architecture layer are configuration files, though most would not call that a “layer”. Database administrators should be able to change the architecture of the database system without requiring mucking about in the application’s functional code.
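Here is a minimal Python sketch of what I mean, under some loud assumptions: the DATABASES list stands in for a configuration file, database_for is a made-up name for the architecture layer, a plain dictionary stands in for the persistence layer, and hashing the user name is just one simple way to pick an instance.

```python
import zlib

# Architecture layer: the only place that knows where data lives.
# Hypothetical instance names; in practice this would come from configuration.
DATABASES = ["database1", "database2"]


def database_for(username, databases=DATABASES):
    """Pick a database instance for a user with a stable hash."""
    return databases[zlib.crc32(username.encode("utf-8")) % len(databases)]


# Stand-in for the persistence layer: one dict per database instance.
STORAGE = {}


def save_profile(username, attribute, value, databases=DATABASES):
    """Functional layer: knows what to save, not where it goes."""
    target = database_for(username, databases)
    STORAGE.setdefault(target, {})[(username, attribute)] = value
    return target
```

Going from 2 instances to 3 or 10 means changing the list, not save_profile. A real system would also have to move existing rows when the list changes, since this simple hash sends many users to a different instance; that migration is also the architecture layer's job, not the functional layer's.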

Thoughts?

2008 MySQL Conference Videos, Notes, Slides and Photos!

All of the videos from the 2008 MySQL Conference have been processed and uploaded. Links to the videos, slides, notes, photos for each presentation are all on the mega-conference page at:
http://forge.mysql.com/wiki/MySQLConf2008Notes

This represents many hours of my own toil, but it also reflects plenty of people who have blogged, edited the wiki pages and speakers who wrote and gave tutorials and presentations. I am proud of everyone’s efforts to offer so many learning resources for free….

Enjoy! EDIT: I forgot to thank Jay, the folks at O’Reilly and all the speakers for giving me explicit permission to video and freely offer their presentations.

If you know of any video, audio, notes, slides, photos, etc that are not linked, please link them at the wiki page. If you can’t or won’t, please comment here and I will update the wiki for you.

Please note that there is still some work for a volunteer: currently there is no single page where you can get all the videos, notes and slides for a presentation. The Forge Wiki page linked above is very close, but it is still missing many presentations and their corresponding slides.

O’Reilly has all of the slides speakers submitted at http://en.oreilly.com/mysql2008/public/schedule/presentations/. If someone or a few folks work on linking the slides on the O’Reilly site to the presentations on the Forge Wiki page at http://forge.mysql.com/wiki/MySQLConf2008Notes, then the Forge Wiki page will be comprehensive and folks can go to one page to get any and all information about a presentation at the conference.
