Good Ideas Implemented Poorly

Please share good ideas implemented poorly you have seen.

For example, in the past 24 hours I have come across 2 such beasts:

1) A company doing vertical scaling based on username. Unsurprisingly, they found that 70% of their users had usernames beginning with 1 of 2 letters. A better idea? Do it by last 2 digits of a user id. That way, even though users are created and perhaps deleted all the time, the distribution is effectively random. Language is not.

With 2 significant figures, you can divide vertically many different ways — 1 large table of all the uids, 2 tables of uids ending in 00-49 and 50-99, 4 tables, 10 tables, 25 tables, 50 tables, 100 tables. This eliminates the need for a separate table for each user.

For example, the online dating site I work for uses this method for messaging. So a user’s inbox is just one part of a table. If the user has uid 12345, their “Inbox” is the result of SELECT * FROM Inbox_45 WHERE toUid=12345. Their “Sent” box is the result of SELECT * FROM Inbox_45 WHERE fromUid=12345. We have over 650,000 active users; I can only imagine what horrors we would have if we had 1 table for each person’s mailbox (much less 1 each for Trash, Sent, Saved and Inbox).

This also helps when using InnoDB tables, as we are — the count(*) query to find the number of messages in your boxes uses the indexes on toUid and fromUid on the tables. 1 table per uid would not take advantage of this index, although it would probably be a bit faster in terms of retrieval. We kept this idea in mind — if we had one table per uid, we probably could use MyISAM tables, and counts would be faster. But again, horrors with 650,000 tables!

Also, you want to keep your code as flexible as possible. Do not hard code which servers have which tables — use SHOW TABLES in scripts and such. We truncate trashes nightly, and simply have a loop that says:

for i in "show tables"
truncate Trash_$i

2) The lead developer on a new project had the rule of “each table has a class”, which led another developer to come to me asking to resolve a problem. You guessed it — tables in different databases that have the same name. We have a static database, with a table called “Preferences”. This is a joining table, that says “preference #1 is always blah”. And in our profiles database for a site, we also have a “Preferences” table — this is where we associate users with their preferences.

The goal is world domination to be able to have multiple sites use similar codebases and data schema. Therefore, it is perfectly reasonable to have databases with identical schemas and different names. Site1 can have an identical schema and codebase to Site2, with storing the data in the “Site1” and “Site2” databases. The reason to have 2 separate databases is that we want to be avoid having to put which site the data refers to as a part of each row.

Any other examples of good ideas implemented poorly? Feel free to make an entry, but if it’s not on please put a link in the comments so I’m sure to actually read it.

Followers may note I fell way short of my goal of 1 post per day in May; in fact, I had 15 posts in May, not the 31 I had hoped. I’m OK with that, and I apologize for the long radio silence. In the meantime, I had a wonderful few weeks, including biking on the island of Martha’s Vineyard, and a wonderful phone conversation with Frank Mash last night.

If anyone cares to know, my Skype name is awfief. I’m happy to talk to folks; I’m GMT -5 (Eastern Standard Time, Daylight Savings when appropriate). I also have a tendency to leave the program on while I’m in the room, so feel free to ring (instead of Skype Chat) to see if I’m around.

Comments are closed.