August 23rd was the first proposed date for DBA Day since, apparently, DBAs were left out of the Sysadmin Day this year — http://www.sysadminday.com/ explicitly mentioned DBAs last year, but not this year.
I do not necessarily take it as a snub; I would rather have people treating me with respect all year round than have cake on one day. To be fair, my company has a big cake monthly for all the birthdays in the month, and my coworkers and I have mutual respect for each other.
However, I will happily partake in a celebration of me, or a celebration of what I do. So even though http://www.dbaday.com/ remains undefined, I suggest that people do something nice for their DBAs. But don't make it a token gesture; make it genuine. It does not even have to be monetary, or tangible. Tell your DBA about a specific instance you can think of where s/he made a positive difference.
I have folders called “smiley” — in my work e-mail, my personal e-mail and in my filing cabinet. In those folders I put words of praise, or thank-yous, or anything that makes me smile, feel respected and loved, etc. So give your DBA a “smiley” today, on DBA day.
I got a smiley from the sysadmin yesterday, so I’ll share it here:
[boss has] been impressed
and i know he never mentions it to you
but you’ve allowed me to not have to worry about the db side in all this
and thats a major component
(It was sent over IM while he was sick, hence the imperfect capitalization and spelling.)
(also entitled, “Who Put the J in lam-a-lam-a-LAMJ?”)
So, I have started to read MySQL Stored Procedures by Guy Harrison with Steven Feuerstein — a fabulous book already! One thing that caught my attention was this (which you can see in the Preface, available through Safari):
MySQL is the dominant open source database management system: it is being used increasingly to build very significant applications based on the LAMP (Linux-Apache-MySQL-PHP/Perl/Python) and LAMJ (Linux-Apache-MySQL-JBoss) open source stacks, and it is, more and more, being deployed wherever a high-performance, reliable, relational database is required.
Now, I figured that the “J” in “LAMJ” stood for “Java”, given that the “P” stands for a programming language beginning with “P”. The “P” does not stand for “CGI”, a specific type of web programming [popularized? created? by Perl], even though in practice it usually means CGI, because of Apache. Obviously, there are applets and servlets and JavaBeans and all sorts of ways to use Java . . .
JBoss is an architecture, as the other 3 (Linux, Apache and MySQL) all are as well. I guess what bothers me is that the “P” stands for a language, not an architecture, so I feel like the “J” should too. And what if we use Resin, WebSphere or WebLogic? Does it become LAMR or LAMW? Can we still call it LAMJ?
So I went searching, and I did not have to look a long time before finding out that nobody really knows, and folks just use what they want. Because its usage is internally inconsistent, Continuent makes a good example. This is not anything negative toward Continuent (in fact, if their marketing is not so great, perhaps it is because they are putting the bulk of their money into technology… 🙂 )
Continuent uses “Java”
Continuent uses “JBoss/J2EE”
Continuent uses “JSP/J2EE”
They use this last one in most places; perhaps being partnered with JBoss is why they use it on their “Products” site?
But then why does the Stored Procedures book use “JBoss”?
Most developers are used to programming in procedural or object-oriented languages. SQL, as a declarative language, is quite different. In declarative languages like SQL, you program what you want the result to be, not the procedure to get it. For instance, “give me all the people with the first name starting with the letter S from a certain table.” Unlike procedural programming (or even methods in object-oriented languages), you do not say how to get the information. This is, I believe, why many developers want to give the query optimizer “hints” on how to do its job.
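For instance, a declarative request like the one above could look like this (the table and column names here are made up for illustration):

```sql
-- Declarative: describe the result set, not the retrieval procedure.
-- The optimizer decides whether to use an index, a table scan, etc.
SELECT first_name, last_name
  FROM people
 WHERE first_name LIKE 'S%';
```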
That being said, I will list the top 8 Basic SQL Practices I live by, and attempt to enforce. Please feel free to comment adding your own (or post your own, linking back here).
In no particular order:
1) Always use explicit joins. If I mean INNER JOIN, then I use INNER JOIN. No use of just plain “JOIN”. Never, ever, ever use a comma join — I consider that a mistake. If I explicitly state “CROSS JOIN” then I know I have consciously made that decision. Also, keep join conditions in an ON or USING clause; they should not go in the WHERE clause. I also put my join conditions in parentheses; for whatever reason, I find:
ON (foo=bar AND baz=bop) WHERE a=b
is easier to see that the join condition contains 2 conditions than
ON foo=bar AND baz=bop WHERE a=b
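Putting practice #1 together, a minimal sketch (table and column names are hypothetical):

```sql
-- Explicit INNER JOIN, join conditions in a parenthesized ON clause,
-- filter conditions in the WHERE clause.
SELECT c.name, o.order_total
  FROM customers AS c
 INNER JOIN orders AS o
    ON (c.customer_id = o.customer_id AND o.status = 'shipped')
 WHERE c.name LIKE 'S%';
```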
2) Always define field names. No using SELECT * or INSERT INTO table VALUES. It’s a pain, and even more of a pain given that mysqldump does not specify INSERT fields. However, if it’s important enough to save in a text file (i.e., it’s seed data or a migration script) then it gets explicit field names.
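A quick sketch of what I mean, with a hypothetical users table:

```sql
-- Explicit field names keep the statement valid even if the table
-- later gains a column through a migration.
INSERT INTO users (username, email)
VALUES ('sheeri', 'sheeri@example.com');

-- Likewise, name what you want back instead of SELECT *.
SELECT username, email
  FROM users;
```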
3) Always use the database server’s timestamp. Web servers may have disparate times. Reports may come from different servers than the inserted data.
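For example (audit_log is a hypothetical table):

```sql
-- NOW() is evaluated on the database server, so rows inserted from
-- several web servers with drifting clocks get consistent timestamps.
INSERT INTO audit_log (action, logged_at)
VALUES ('login', NOW());
```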
4) Store IPs as integers with INET_ATON and retrieve them with INET_NTOA.
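For example (the visits table is hypothetical; its ip column would be an INT UNSIGNED):

```sql
-- INET_ATON('192.168.1.1') yields 3232235777, a 4-byte integer,
-- which is cheaper to store and index than a 15-character string.
INSERT INTO visits (ip) VALUES (INET_ATON('192.168.1.1'));

SELECT INET_NTOA(ip) AS ip_address FROM visits;
```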
5) When doing reports, the network traffic is usually the biggest bottleneck. If you’re going to receive information, it’s better to receive in chunks, which will likely be larger than a logical piece. For instance, state reporting — instead of making 50 connections for states in the US, get them all at once. If the dataset is very large and folks do not want to stare at a blank page while the report is loading, use paging with LIMIT to grab, say, 1000 entries at a time and display them on the screen so people can start looking at the data while the rest is being grabbed.
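Sketching both approaches with a hypothetical organizations table:

```sql
-- One query for every state at once, instead of 50 connections:
SELECT state, COUNT(*) AS num_orgs
  FROM organizations
 GROUP BY state;

-- Paging a large report, 1000 rows per screenful:
SELECT org_id, name
  FROM organizations
 ORDER BY name
 LIMIT 0, 1000;   -- the next page would use LIMIT 1000, 1000
```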
6) Running a query in a loop is usually a bad idea. If you are executing the same query with different data, consider building a query string using UNION and executing it at the end of the loop, so you can execute multiple queries with only one trip across the network to the database.
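For example, instead of looping over three nearly identical statements, send one (cities is a hypothetical table):

```sql
SELECT city_id, name FROM cities WHERE state = 'MA'
UNION ALL
SELECT city_id, name FROM cities WHERE state = 'NH'
UNION ALL
SELECT city_id, name FROM cities WHERE state = 'VT';
-- UNION ALL skips the duplicate-removal pass that plain UNION does,
-- which is what you want when the branches cannot overlap.
```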
7) Do not be afraid of JOINs. They are not necessarily resource intensive, given good indexing. Most of the time a denormalized schema without a join ends up being worse than a normalized one using a join. When there is redundant data, ensuring data integrity takes up more cycles than providing a framework for data integrity in the first place.
8) Limit the use of correlated subqueries; often they can be replaced with a JOIN.
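A sketch of that rewrite, using a hypothetical orders table:

```sql
-- Correlated subquery: re-evaluated for each row of the outer query.
SELECT o.order_id
  FROM orders AS o
 WHERE o.total > (SELECT AVG(total)
                    FROM orders
                   WHERE customer_id = o.customer_id);

-- The same question as a join against a derived table, computed once:
SELECT o.order_id
  FROM orders AS o
 INNER JOIN (SELECT customer_id, AVG(total) AS avg_total
               FROM orders
              GROUP BY customer_id) AS a
    ON (o.customer_id = a.customer_id)
 WHERE o.total > a.avg_total;
```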
(I also put SQL keywords in capital letters, which helps me easily spot the fields and variables I use.)
Back at the MySQL Users Conference, I was talking to Monty about a good PHP* interface to MySQL that would go through a database and make pages to search, update and add new fields in the database. He mentioned Unireg, and I wrote it down, but only recently got around to checking out what it was.
As far as I can tell from here, here, here and here:
- Unireg started as a curses-based interface to an SQL database.
- Unireg turned into MySQL — that is, libraries and such from Unireg were used in MySQL, and Unireg was no longer developed
- Unireg was similar to the MySQL Query Browser or any number of administration tools, but it also generated reports
Of course, I could be misunderstanding the information on these pages, or they could be wrong, so feel free to correct me…..
It’s not quite what I was talking about, but it’s an interesting history lesson. Even more interesting is how functionality that [I gather] used to be in Unireg took a long time to get into MySQL, and in the case of reporting, still is not in there.
I used PHPCodeGenie for the one system I did not hand-code, and even that was painful, with lots of code. I did a bit more research, and found lots of tools with huge learning curves, and I have not overcome that obstacle yet.
So what is your favorite program for automatically generating a database UI? Specifically, it should:
- Generate web pages in PHP*,
- Automatically connect to the database,
- Allow for easy specification of join tables based on (a) field(s),
- Allow for easy selecting of all, none, or some table fields,
- Not require that fields in the join condition be shown,
- Allow the “view”, “edit” and “add” pages to show different fields.
I do not even need the application to have authentication, as for what I am doing I do not need ACLs and a .htaccess file will suffice.
* or really, any lightweight structure — Perl would be OK, Java might be OK if it did not require middleware like JBoss or Resin — basically anything I could stick on a web server to connect to a database.
http://www.artfulsoftware.com has a “Common Queries” page, which I find rather useful. I hadn’t realized its history, as described on the homepage:
Our collection of common MySQL queries outgrew Chapter 9, and is still growing, so we turned it into a PHP page driven from a MySQL table.
One day, I clicked on the page and got the dreaded “blank” PHP page.
This gets into one of the fundamental flaws I find with “semi-dynamic data” (my terminology for it) — it is not completely dynamic data, because it gets updated by humans, and it is deterministic*, so it does not need a completely dynamic page.
Part of the updating process could be a “generate the web page” script that runs what is currently the dynamic page, but stores the result as a static HTML page. In this way, if 1000 users want the same page, there are *no* database queries done. After all, the page only needs to change when content is uploaded, which isn’t very often at all.
The “generation” script could easily be a part of a web form that uploads content, or it could be a separate form/script run after a batch of changes is done, so multiple changes do not require generating pages that will just be written over after the next update in a minute or so. As well, it could write to a temporary file, and the very last script action would move the temporary file to the right place. In this way, a generation script that takes a long time to finish would not be partially overwritten by another, simultaneous generation script.
I have used this technique in content management systems — particularly with templates, as I’ve found you can separate different content items (such as menus) and “break apart” a template into pieces, and with about an hour you can support a new template into an existing system, and have a user compare templates to see which they’d rather use, given their own content.
I have also used this technique with a listing of organizations around the world. All the listings (3,000) were stored in a database. From this, I ran a weekly (but it would be easy to run it hourly or daily) script that made “browse by” pages, categorizing all of the resources by first letter of their name as well as by their area, province/state and country. The script, which took a full 10 minutes due to poor optimization, made an overall browsing page, 26 “by letter” pages, one page for each country, and a directory with one page for each state/province and area for each country. It also generated the page and compared it to the existing page, and only overwrote the page when they differed (and then put a “last updated on:” message at the end).
Folks searching could still get truly dynamic pages, but I cut down on needless direct database calls to find out which organizations were in “England”, and more needless database calls to find out which organizations were in the “Greater London” area, when those pages change rather rarely.
This could also be useful for large sites, such as photo sharing galleries. Sites that allow comments may or may not see a performance gain — for each new comment, generating the page again may not be the best solution. However, if there’s at least one page hit for each database write, then using this method will have better performance.
* an example of a nondeterministic page is one that changes based on the time, such as “show me all activity that has happened today [up until this moment]” because it may change from second to second.
Since OSCON, most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ’Planet”).
Y’all are prolific!
Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”
Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.
Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because
there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.
So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?
I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a closely guarded secret. The Google NDA probably specifies that employees cannot even disclose whether or not Google even has search algorithms.
When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.
Restaurants publish cookbooks, yet they do not go out of business. Why is that?
It is because what they sell is not just the food, just as Google and Yahoo! do not sell just their searches. Sure, the food (and the searches) are what made them famous. But what keeps people flocking back is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.
It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.
I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot do myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and a method to upload that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, etc. that individuals just plain do not have.
Look at livejournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just not having the database under your control), you would probably want to just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.
I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?
Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.
Why should I duplicate effort in this case? If I had to be completely sustainable — including growing my own food and making my own clothes — I would not even be able to spend any time on a computer, much less be a DBA. Growing food and making clothes are “open sourced” — it’s not like one couldn’t find the information on how to do this.
So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak will invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.
I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.
There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets… but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.
Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
Since OSCON, most of my time has been focused on editing a book, which is about to be finished. As I’m getting my commutes back, I have been reading up on what I’ve missed on Planet MySQL (which I affectionately call “The ‘planet.”
Y’all are prolific!
Jeremy’s On Open Source Citizenship got me thinking about the whole movement. I think there’s still a place for proprietary software in the world, as much as folks tout that “open source is ALWAYS better, because more people see it, therefore more people can help change it.”
Whenever anyone suggests a monolithic solution, I cringe. This all ties into the patent issues that are strongly debated these days. I’m still trying to figure out how I feel about everything.
Jeremy’s article talked about how Yahoo! (as an example) couldn’t just open up all the source, because
there’d be places in the code where magic voodoo functions are called but we couldn’t really talk about what they do or how they might work. That’s called our secret sauce or “business logic” if you prefer.
So, does Yahoo! patent these functions? Should they? Why can’t the secret sauce/business logic be open? Why should parts be open and other parts closed?
I know, you’re thinking “Otherwise, how would Yahoo! make money?” Or Google, for that matter, whose search algorithms are a very huge secret. The Google NDA probably specifies that employees cannott even disclose whether or not Google even has search algorithms.
When I think open source, I tend to think everything, including the business logic, is exposed. There are some companies which would lose their business if their secrets got out. However, we know what the secret sauce is made of and yet, McDonald’s business has not suffered.
Restaurants publish cookbooks, yet they do not go out of business. Why is that?
It is because what they sell is not just the food, just as Google and Yahoo! do not sell just their searches. Sure, the food (and the searches) are what made them famous. But what keeps people flocking back is that, even though they could do the same thing themselves, they need the services and resources provided. I cannot cook a hamburger on a bun with sliced pickles and onions and thousand island dressing in 5 minutes for under $3.
It would cost less per burger to make it at home, but if I just want one burger, I have to buy a package of 8 hamburger buns, ground beef by the pound, an entire jar of thousand island dressing, a whole onion, and a jar of pickles. What I’m really paying for is the person behind the counter to assemble it for me.
I use Google and Yahoo! not merely because they have good products — that is one reason, but a very small one. I use them because they give me services and resources I cannot provide myself, either due to lack of expertise or just plain lack of time. Flickr works not because there is a secret to programming an image gallery on the web — it is because they offer free space and an upload method that many people just plain do not have. Even if a geeky person like myself has photo software on her server, Flickr also provides an easy way to share albums, contact people, and so on, that individual users typically do not have.
Look at LiveJournal! They are a perfect example — you can download the code and install it on your own server. But most of the features require the same centralized database, so unless you want control over a *very* closed community (which can exist on livejournal.com anyway, just without the database under your control), you would probably just create an account on livejournal.com, because then your “friends list” can include anyone on livejournal.com.
I use gmail as my main e-mail client; I also have a Yahoo! Mail account. I’m a geek, and I’ve helped run mail for 12,000 users at a university; I have the knowledge and expertise to run my own mail server. So why would I use these services?
Because they do everything. They run on a highly available architecture, do backups for disaster recovery, etc. If I wanted to ensure that I gave myself the quality mail service that Google and Yahoo! can deliver, it would cost lots of money and even more of my time, for just myself.
Why should I duplicate effort in this case? If I had to be completely self-sustaining — including growing my own food and making my own clothes — I would not be able to spend any time on a computer, much less be a DBA. Growing food and making clothes are “open sourced” — it’s not as if one couldn’t find the information on how to do them.
So the real question is, how open does a product have to be in order to be called “open source”? Google and Yahoo! are not rich because they have secrets. They are rich because they started with secrets, but I believe they could safely let their secrets out with very little loss of revenue. There’s always the possibility that some billionaire with a mean streak would invest the resources in copying what Google or Yahoo! do if the secrets were let out. But folks are drawn to innovation, not blatant copies.
I am reading “Hackers and Painters” by Paul Graham, where he mentioned that his trade secret with Viaweb was that they were using LISP. But he also notes that his partner did not think that needed to be a secret, because even if competitors knew, they’d have to change their infrastructure and have developers with a different skillset, and that would take way too much time.
There are certainly companies I have worked for, where giving away the source/algorithms/business logic/trade secrets would mean the end of their business, because they ran on modest hardware in a colo, and with their code anyone could run their business for about $1,000. In those cases, I’d say sure, close the source and hide the secrets….but when those companies grow bigger and have more established resources, as Google and Yahoo! have both done, they can open the source, show their secrets, with very little consequence.
Of course, that leads to “how do you determine when a product is ‘big enough’ to warrant giving away the secrets?”
Not much more to add to the wonderful posts:
How to Kill Good Ideas
How to Come Up With Good Ideas
Supporting Ideas and Being Productive
and
Yet More Ways to Kill Great Ideas
However, I think one of the most important points, though a small one, has been completely left out:
Don’t have ego.
An idea is just that: an idea. Particularly when brainstorming, lots of people like to say immediately, “Oh, that won’t work because of this” — usually because they have a previous idea of their own that they are defending. Ego steps in. If someone proposes an idea, a brainstorming meeting is not the place to play “let’s shoot this idea down as much as possible.”
Assume your co-workers are smarter than you are (even if you have evidence to the contrary); if it takes you 2 seconds to figure out why their idea won’t work, perhaps there’s something you are not thinking of. Instead of saying “that won’t work because of this,” try saying “Oh, great idea, how does that get beyond this?” or even, “I thought that wouldn’t work because of this?” The latter puts you into it, instead of your co-worker.
It’s subtle, but saying “that won’t work because of this” implies that the person had no idea about “this”. Saying “how does that get beyond this?” implies that the person knows about “this” and has a way to get beyond it. Implying that your coworkers are smart rather than that they’re dumb is a great way to make a safe environment for ideas.
Indeed, saying “I thought that wouldn’t work because of this?” says, “I must be dumb, because you obviously have a smart idea (you wouldn’t have proposed it if it were dumb), and I can’t get beyond this limitation.” This natural curiosity, rather than dismissal of the idea, might actually lead you to learn that yes, there is a solution to “this.”
The other option, “that won’t work because of this,” if “this” is solved, ends up with a heated response of “well, ACTUALLY, ‘this’ has been solved,” instead of the more neutral “I’m glad you asked, most people think you can’t do that because of this, but it’s been solved….”
The other side of this is “Don’t take it personally.” If someone stomps all over an idea of yours, it’s about the idea. Perhaps they do not understand the idea, or perhaps you were, indeed, wrong. People are wrong sometimes; that’s OK. Don’t take it personally.
Life has been super busy, but I have plenty of posting to catch up on. Fear not, there will be more content soon (after Monday, hopefully); I am working on editing a very relevant book, and I hope to be able to share my excitement after I finish.
Also on Monday is the next Boston MySQL User Group, which will go through how to do capacity planning in MySQL with very little pain and effort. In the beginning we will have 10 minutes for user questions, so we can all benefit from each other’s knowledge. I already have a user with a great question!
We have been rebuilding our product all summer, with a deadline of releasing it in the next 2 months. Our lead developer had put a surrogate key in our new schema about a month ago. He said he needed the surrogate key because “the ORM requires it.” I just accepted it.
My mistake was that I made an assumption. The table already had a primary key, but it was a composite key. I assumed that the ORM required a unique key to be one field, and thus I assumed he combined the 2 values in the primary key to get the unique key.
I was wrong. In adding the final subsystems into the schema this week, I noticed that the surrogate key was an auto-increment field. I also noticed he put it in every single table he dealt with. We had hours of meetings about schema, and this was NEVER put in as a requirement. Thus, today we had over three hours of agonizing discussions, including a back-and-forth of “If it’s going into the database I need to understand why,” and the lead developer saying “No you don’t.”
I sorely wanted to say “If I don’t understand it, it does not go in the database,” but resisted. I asked him to send me the ORM so I could understand why it required the field. At first he said he would, and then kept talking to me about why I don’t need to understand the field; it didn’t add more overhead, it didn’t change the way the data relate to each other, etc.
I need to understand this because there may be other, similar requirements in the future. Or perhaps I’ll find a better way to do it (maybe a VIEW would work). Perhaps I’ll find other places where similar fields need to be added. He finally explained that the API being used over JDBC was awkward — it basically needs to retrieve the row number of any row it is looking at, and when it deletes or changes a row it uses that row number as the key to find it.
Aha! That makes sense. However, why do the row numbers need to be in the database? Can’t it just retrieve the rows and keep a row number in its own copy? Apparently not. I cannot imagine that a mature technology would require something like that; it’s not that difficult to do. I said this, and the lead developer was insanely frustrated.
So I said, “Are you using Connector/J?” He was confused, but asked, “Is that JDBC?”
“Yes,” I replied. “Oh, then yes, we’re using it,” he said.
“I don’t think so. If the interface is awkward, you’re not using Connector/J.”
He left my office. So I typed “Connector/J” into my MySQL Community Toolbar (I love it!) and found the following on http://www.mysql.com/products/connector/j/:
New features from the JDBC-3.0 API in the latest production version of MySQL Connector/J include getGeneratedKeys which allows users to retrieve auto-increment fields in a non-database-specific way. Auto-increment fields now work with object-relational mapping tools, as well as Enterprise Java Beans (EJB) servers with Container Managed Persistence (CMP) that support JDBC-3.0.
Hrm….retrieve auto-increment fields in a non-database-specific way? I think that solves our problem!!!
[EDIT: I am, apparently wrong….but I cannot imagine that anyone using JDBC specifies an auto-increment field for EVERY SINGLE TABLE their application will touch. Do people actually do this?!?!?]
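For what it’s worth, the pattern the Connector/J documentation describes — asking the driver for the generated key instead of running a database-specific query like SELECT LAST_INSERT_ID() — can be sketched like this. This is an illustrative analogue in Python’s sqlite3 module (not our actual Java code; the table and column names are made up):

```python
import sqlite3

# Illustrative analogue of JDBC's getGeneratedKeys(): the driver hands
# back the auto-generated key directly, with no database-specific query.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    customer TEXT NOT NULL,
    sku TEXT NOT NULL
)""")

cur = conn.execute("INSERT INTO orders (customer, sku) VALUES (?, ?)",
                   ("acme", "WIDGET-1"))
generated_id = cur.lastrowid  # analogous to rs.getGeneratedKeys() in JDBC
print(generated_id)  # first row in a fresh table -> 1
```

In Java the equivalent is passing Statement.RETURN_GENERATED_KEYS to executeUpdate() and reading the result of getGeneratedKeys(); either way, the point is that the application never has to know which database it is talking to.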
OSCON 2006 workshop
Building “Sane” Query Interfaces, by Mark Newsome (newsome@acm.org)
Insane query interfaces have no help, no format, no range, and no help on how many results will be returned.
Better — pull-downs with a descriptive default like “select state”. Find the min and max date from the db. Give a count of how many items are in the pull-down menu. The UI automatically falls back to a popup button when a pull-down gets too long.
“Refine” button can be pressed when the form is filled, and it will get a count, so you can “preview” your results. When the “refine” button is pressed, it refines EVERYTHING — ie, the counts for the range for the other items. Very smart! (see the song database demo below, when you enter in a lyrics search and click refine, it repopulates the pulldown menus with only the artists and titles that have the lyrics you requested). Also, if there’s only one choice left after refining a pulldown, it can be selected automatically.
Also, once the sort comes back, clicking the header field will sort the data by that field.
This is a great idea, even for things that aren’t at heart a db query — for instance, an interface for a “broadcast e-mail” could have the user select who to send to (whole list, one location, one interest, whatever), and show a count of recipients before sending.
Framework:
QueryParameterBean
Make a query parameter “bean” with getters and setters for the parameters. If your form has 20 parameters, just set the ones that the user set. It also includes methods for determining whether a field is “empty” and whether “empty” means 0, NULL, or '' (the empty string).
Sample classes — getCompanyNameCount(), getCompanyName(), getOrderCount(), getOrders() — they all call the “master query method”.
QueryMethod(selectClause,groupClause,sortClause,parameterBean) — builds dynamic query based on the parameters that were set — if you’ve only set the company name count, you’ll set the select query to count(*) or whatever. Empty parameters don’t participate in the query.
Results are wrapped into a ResultsWrapper so the underlying ResultSet can be freed.
How to populate a pull-down on the query form — set parameters from form, call the getFoo() functions, then QueryMethod to build the query, and then call a ResultsFormatter to generate an HTML pulldown (or input field/popup button combo, depending on field size (settable)).
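The “master query method” idea — only the parameters the user actually set participate in the query — might be sketched like this (hypothetical names, in Python rather than the talk’s Java):

```python
def build_query(select_clause, table, params, group_clause="", sort_clause=""):
    """Sketch of the talk's 'master query method': only parameters the
    user actually set participate in the WHERE clause."""
    where, args = [], []
    for column, value in params.items():
        if value in (None, ""):  # empty fields don't constrain the query
            continue
        where.append(f"{column} = ?")
        args.append(value)
    sql = f"SELECT {select_clause} FROM {table}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    if group_clause:
        sql += " GROUP BY " + group_clause
    if sort_clause:
        sql += " ORDER BY " + sort_clause
    return sql, args

# The count "preview" for the Refine button reuses the same builder:
sql, args = build_query("COUNT(*)", "orders",
                        {"company": "Acme", "state": ""})
print(sql)   # SELECT COUNT(*) FROM orders WHERE company = ?
print(args)  # ['Acme']
```

The same call with a different select clause produces the real result query, which is what keeps the count preview and the actual results in sync.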
But this is tedious — lots of coding, as opposed to a simple form.
Uses more DB resources for all those extra queries.
But databases are fast. Also, you could regenerate the forms once a day instead of dynamically every time the page is loaded.
Other things to do:
Validate form with Javascript to avoid wasteful roundtrips to the server
Use JS popups to assist the user (ie, date selection)
Useful to show cardinality of each item, ie, if there are 6 company names and 50,000 records, you might just want to pick a company name and then hit Refine to see if you have a manageable set. If not, you can go to another field — ie, “browsing by queries”.
Accept * wildcards, but convert to % for the db.
Convert text fields to upper or lower case for comparison (or use a case-insensitive search, which MySQL has — just don’t use BINARY).
Use a text box for “too many” values for a pulldown
Also, use a trigger on insert to update values in a lookup table (e.g., number of users: when a user is added, just increment a count). Or write a job that periodically rebuilds this table with a COUNT(*), rather than counting every time.
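The trigger idea can be sketched as follows (illustrative; this uses sqlite3 trigger syntax for a self-contained example, but a MySQL AFTER INSERT trigger is written analogously):

```python
import sqlite3

# A trigger keeps a running count so the form never needs a COUNT(*)
# over the big table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE counts (tbl TEXT PRIMARY KEY, n INTEGER);
INSERT INTO counts VALUES ('users', 0);
CREATE TRIGGER users_count AFTER INSERT ON users
BEGIN
    UPDATE counts SET n = n + 1 WHERE tbl = 'users';
END;
""")
conn.execute("INSERT INTO users (name) VALUES ('alice')")
conn.execute("INSERT INTO users (name) VALUES ('bob')")
(n,) = conn.execute("SELECT n FROM counts WHERE tbl = 'users'").fetchone()
print(n)  # 2
```

Reading one row from the lookup table is constant-time no matter how large the users table grows, which is exactly what a pull-down that shows counts wants.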
Paged format for results — run the query, but select only the row ids (the primary key with MySQL, since not every row has an externally visible ID). Cache the ids, and retrieve the rows by id for each page. (With MySQL you could use LIMIT, but you have to be careful about running the same query at different times — if a new row is inserted, your offsets shift slightly.)
Alternatively you could use cursors for paged formats, but the cached-ID approach is better.
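A sketch of the cached-ID paging scheme (illustrative Python with sqlite3; the table and data are made up):

```python
import sqlite3

# Fetch only the primary keys once, cache them, then pull each page's
# rows by key. New inserts can't shift the pages the way re-running a
# raw LIMIT/OFFSET query can.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO songs (title) VALUES (?)",
                 [(f"song {i}",) for i in range(1, 8)])

# One cheap query for the keys; this list would live in the session cache.
cached_ids = [row[0] for row in
              conn.execute("SELECT id FROM songs ORDER BY title")]

def fetch_page(page, per_page=3):
    ids = cached_ids[page * per_page:(page + 1) * per_page]
    marks = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT id, title FROM songs WHERE id IN ({marks})", ids).fetchall()
    return sorted(rows, key=lambda r: ids.index(r[0]))  # keep cached order

print([t for _, t in fetch_page(0)])  # ['song 1', 'song 2', 'song 3']
```

The trade-off is the memory for the cached id list versus a stable, repeatable page view; for result sets small enough to page through by hand, that is usually a good trade.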
Benefits:
You can drill down large datasets without leaving the form. Improves efficiency and reduces frustration.
The db does more work, but that’s its job, and it makes the overall user time lower (ie, db load is higher, but it means the db does work instead of the user).
http://www.capitolvelo.com/newsome/music.php has a technique like this for a song database, written in PHP. The code for that page is also available.
Future work — automatically generate refinable and customizable query interfaces. Goals: SQL and code-free for users, users can specify the query graphically (using an ER diagram or layout of db), and tool generates the experience.
Now here’s where MySQL could really come in handy. One of my HUGE pet peeves is I’ll create a database and then end up wanting to make a bunch of forms for it — search, edit, insert new, etc. You could write code to go through all the tables in one or more dbs (or in all dbs), do a PROCEDURE ANALYSE(), and create forms based on the info in there (ie, if ENUM is recommended, use a pull-down menu, if char or varchar is recommended, use a text box, etc).
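A toy version of that idea (illustrative only; sqlite3’s PRAGMA table_info stands in here for the metadata that PROCEDURE ANALYSE() or information_schema would provide in MySQL, and the widget names are made up):

```python
import sqlite3

# Read column metadata and pick a form widget per column. A real tool
# would use PROCEDURE ANALYSE()'s suggestions (e.g., emit a pull-down
# when ENUM is recommended); here we map declared types to widgets.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    state TEXT CHECK (state IN ('MA', 'NH', 'RI')),
    name VARCHAR(100),
    signed_up DATE
)""")

def widget_for(decl_type):
    decl_type = decl_type.upper()
    if "INT" in decl_type:
        return "number box"
    if "DATE" in decl_type:
        return "date picker"
    return "text box"

widgets = {name: widget_for(decl)
           for _, name, decl, *_ in conn.execute("PRAGMA table_info(customers)")}
print(widgets)
```

Run once over every table in a schema, a generator like this could emit the search, edit, and insert forms mechanically, which is the pet peeve being solved.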
http://www.capitolvelo.com/newsome/index.html shows his writings.
(he’s done this in Java and PHP. He uses prepared statements in Java, which eliminates worries about quotes in data. If you’re building an SQL string, you have to worry about that).
OSCON 2006 workshop report
Embedding a Database in the Browser
David Van Couvering
Database Technology Group, Sun Microsystems, Inc.
Why would you want to embed a database in the browser? Well, for one, the mobile user isn’t always connected. It also keeps personal data off the server (and in your control). And then there’s the fast local web cache — for more performance, store the data in the client’s browser.
What you need to make this work:
cross-browser support and feature-rich language (Java), embeddable, small footprint, standards-compliant, secure, robust = automatic crash recovery (people are always closing their browser).
So why use a database? Why not a cookie or other text?
standard, portable API
ACID semantics, even with multiple threads
Flexible data model
Powerful query capabilities
Works with lots of existing tools, using existing skills; nothing new to learn.
Apache Derby fits into this — 100% Java Open Source relational DB — http://db.apache.org/derby — embeddable, small footprint, standards-compliant, secure, automatic crash recovery.
Not quite AJAX, more like LJAX — local JavaScript and XML; not asynchronous, because you’re not synching to the server. You could also use this technology to embed a web server in a browser.
He called these embedded things “applets” which makes me cringe….ie, “let’s use servlets offline, using applets!” Didn’t we already try this?
Create the db, put its directory into the application’s JAR file, and then copy the JAR to the local filesystem — for which you need to sign the JAR and wrap the file access in a PrivilegedAction.
(he talked about security — if someone steals your laptop, won’t they access the info on your browser?)
It’s basically a geek hack. I’d wanted it to basically be “how to supercharge cookies” and use features already in most browsers to enable an embedded database, therefore making apps much faster. Sadly, not the case. A neat technology nonetheless.