Vacation As a Product Manager

Recently, I went on a 5-day vacation. It was terrific, but not because we did once-in-a-lifetime experiences. It was terrific specifically because we did “regular” vacation things.

I planned this vacation as I do most planning as a Product Manager – I started with the requirements.

Requirements

I started with coarse basics – what’s the goal of the vacation? To get away with my kids and have fun together. So I came up with a list of requirements, product management style – that is, no solutions:

  1. No flights. That’s a COVID risk I’m not willing to take yet, for a vacation. When I planned this trip a few months ago, we did not know when vaccines would be ready; as it is, by the time the vacation began, my 3-year-old had gotten one shot but was not fully vaccinated yet. Buses and trains are OK, as is driving, but nothing more than six hours or so of travel.
  2. Get away. Not a “staycation.” If we saw local sights, we should at least be in a hotel so I could be present with my kids and not worry about doing house chores.
  3. Accommodate my 3-year-old napping in the afternoons (approx 1-3 pm) and general flexibility. My 6-year-old has anxiety, and sometimes we simply have to leave. A timed experience with no re-entry means I have to be willing to go and lose the entry fee if the situation calls for it.
  4. Something engaging for my kids that they will love. They’re 6 and 3 years old, and I don’t want to spend the whole vacation trying to convince them to do something fun or listen to them complain.
  5. Focus on being active, preferably outdoors. Since it’s summer where we live, that wasn’t too difficult.
  6. Accommodations: Hotel or rental house, strong preference for a pool in the hotel, at least one bedroom that’s separated by a door so the kids can go to sleep and I can stay up without bothering them.

High-Level Solution

Once I had the requirements, I could engineer possible solutions. Requirement #1 helped narrow down what we could do. Living in Boston, we could go to the Berkshires (mountain range three hours away), New York City (four hours away), the beach, arbitrary city (Lowell MA, Mystic CT, Hartford CT, etc.), Mount Washington, amusement parks (Six Flags and Story Land)…there’s a lot of options.

However, since it is all within driving distance due to requirement #1, it was by nature something repeatable. It wasn’t a trip to Paris where I would regret not seeing the Eiffel Tower. If we didn’t see something, we could go back some other time.

I wanted to focus on an activity that was best or only possible in summer, which ruled out a lot – for example, New York City has tons to do, but summer’s heat makes everything, well, stinkier.

Accommodations were a big part of the decision – I ruled out tent camping, it’s a lot of work, and I wanted to have a vacation too. I settled on Cape Cod, basically the beach, with the second choice of Story Land because in a few years, my kids will be too old for it.

I looked for hotels or rental properties on Cape Cod, within walking distance of a beach, for the easy back-and-forth. And something that had a pool – my kids LOVE swimming and jumping into a pool. If I couldn’t find something that met our accommodation requirements on Cape Cod, we could try Story Land. I was lucky and found a beach place and booked it.

A hotel pool with blue shimmering water, and a rope with oval plastic buoys separating the deep end from the shallow end. The shallow end is in the foreground, marked as 3 FT. A three-year-old in a yellow and blue life jacket, and purple long sleeve bathing suit, is in the pool and holding on to a handrail by the pool stairs. A 6-year-old is across the pool horizontally, still in the shallow end, slipping headfirst from an inner tube decorated like a watermelon. The 6-year-old is wearing red and blue swim trunks and goggles.
Having fun in the hotel pool

My partner isn’t an outdoorsy guy, so I told him he didn’t have to come, and called my sister and invited her to go with us. This way, I’d have help wrangling the kids so I could relax on vacation too.

Things began to get complicated – my sister decided to sell her house and move while the market was hot, as her youngest is going off to college this fall. The move was nine days before the vacation, so we kept the vacation planning to a minimum until after she moved.

Tools of the Trade

Usually, we use Google Sheets to plan vacations – one sheet for ideas, one sheet with a column for each day once we figure out activities, and one sheet for figuring out who paid for what and making it fair at the end. Google Sheets isn’t lightweight on mobile devices, so I asked around and decided on a Trello board.

We liked that URLs were clickable, and things were easy to move around – we brainstormed ideas in one swimlane, then had one swimlane per day and moved around activities as we saw fit.

A trello board with several swimlanes, many activities in the 'Ideas' swimlane that did not get moved to a swimlane for any day
Our Trello Board, click to expand the image

Vacation: The MLP Version

An inflatable park sounded really co0l, but did I want to spend $100 for all of us to go for just a few hours (nap time interrupts things) and risk leaving after only an hour or so if my kids weren’t having a good time?

My 6-year-old LOVES trains, but there’s not much to do ON a train – did I really want to spend $90 for an hour on a train when my kids might get bored in the first 5 minutes?

I spent about 2 hours researching ideas. In the end, we didn’t do most of them, we spent most of our time in the pool, at the beach, and playing games together – my 6-year-old’s current favorite is Mille Bournes.

Playing in the sand at the beach.
We dug a river in the sand at the beach
An adult and child playing a card game on a coffee table in a hotel room. The child is kneeling and looking at a card holder with 6 cards in it. The adult is pointing to cards on the table in front of her, while a 3-year-old sits on her lap. The 3-year-old is holding a pink doll.
Playing Mille Bournes with my sister

Thursday, we arrived too late to see the Japanese paper theater. Sunday, it did not rain, but my 6-year-old developed swimmer’s ear, so my sister took my 3-year-old to the beach near our hotel while I took my 6-year-old to Urgent Care. We then met them at the beach. Unfortunately, we couldn’t go to Gina’s beach, but that also meant less driving around. Monday, we packed and left since checkout was 10 am; we found a local playground and then had lunch. We avoided the pool to avoid more swimmer’s ear.

MLP stands for “Minimum Lovable Product.” In the MLP that was this vacation, M stands for “minimal” – my kids are super happy swimming or playing in the sand, so that’s what we did. For food, we had free breakfast in the hotel, went to a supermarket a few times for lunch supplies and snacks, had dinner al fresco at a restaurant, or got food delivered to our hotel room.

We did make it to story time at the library on Friday morning, and we did one excursion – to the Heritage Museums and Gardens. We found the maze, looked at a Wampanoag wetu, saw art sculptures made from trash, spent time in a lily garden with a waterfall, and played in the Hidden Hollow and Treehouse. We were hot and tired after a few hours so we stopped by the indoor, air-conditioned automobile exhibit on our way back to the car. We left lots unexplored, including the labyrinth and carousel – which we can do next time.

A 6-year-old kid with short brown hair in a salmon-colored tie-die shirt and black shorts operates a water pump. The water coming out of the pump is coming near a small galvanized metal bucket being held by a 3-year-old kid with long curly dark brown hair wearing a purple tie-dyed dress. Both kids are barefoot. They are standing on rocks and slate in a shallow water feature, with rocks and greenery in the background.
Playing in the Hidden Hollow
A 6-year-old with short brown hair in a tie-dye shirt, black shorts with a green stripe, and socks and slip-on shoes points to a 1950's dark blue Ford. A 3-year-old with long black curly hair in a purple tie-die dress with a unicorn on it and tan crocs looks at the 6-year-old.
“Look, Mom, it’s Doc Hudson from Cars!”

Unexpected Extras

As minimal as it was, there were some unexpected extras that happened, mostly good ones:

  1. I thought I booked a 2-room hotel suite. As it turned out, it was a 2-bedroom hotel suite, complete with a separate living room, full kitchen, and full bathroom. I decided to invite my mother along, as she also needed a vacation, and we had the room. She carpooled with my sister and really enjoyed relaxing and doing nothing.
  2. It was handy to have other adults around, so I could take one kid to urgent care when I had to.
    A selfie with a 6-year-old with short brown hair, blue shirt, and red cloth face mask on the left. On the right is the author, a person with brown hair tied back, sunglasses perched atop her head, white N95 mask covering her chin and nose, in a dark blue shirt with a geometric diamond pattern in orange.
  3. My mother and sister took my kids to the beach one morning and let me sleep in. I got an extra TWO hours of sleep!
  4. My mother and sister babysat the kids Saturday evening while I had lunch with a friend who has a family house on the Cape.
  5. My kids wanted to sleep with me, and I learned that my 6-year-old tosses and turns ALL NIGHT. A sleep chart showing 8 hours and 3 minutes of sleep from 11:30 pm Friday to 7:33 am Saturday, with 10 'Awake' periods interrupting sleep every few hours
  6. My partner invited a friend over and had a great weekend without family responsibilities. Win/win!
  7. I brought my computer “just in case.” I did not open it once. I did no work. Some of that was luck, and some of that was my awesome coworkers covering for me.

After the trip was over, I carved two small stamps to commemorate the occasion. Like the vacation, they were simple, easy, and filled me with happiness:

Two VersaFine Clair inkpads, in pink and purple, dominate the top half of the image. They rest on a small spiral notebook with unruled pages. On the lower half of the notebook page are 2 hand-carved, blue stamps depicting a beach shovel and bucket. In the lower right quadrant of the page the stamps have been impressed on the page, showing a pink shovel laying horizontally just to the left of a purple bucket.
I carved minimal stamps to commemorate my minimal vacation.

Conclusion

So, what’s the tl;dr on how to I had a good vacation? I set core requirements and met them – minimally.

The One Where I Scream Into the Cloud

Did you know DBAs could have personalities?

Which is more exhausting – being oncall, or having a newborn?

What does a Product Manager of Technical Lineage for ETL Tools do?

How many times do I say “right?” in 36 minutes?*

For all this and more, listen to my interview on the latest Screaming in the Cloud podcast, or read the transcript at https://www.lastweekinaws.com/podcast/screaming-in-the-cloud/technical-lineage-and-careers-in-tech-with-sheeri-cabral/ or wherever you get your podcasts.

* the answer is 32

Community-Based Testing with Skoll – Presentation at MySQL Camp II (Aug 2007)

Skoll is a Community-Based Testing project out of the University of Maryland. Their first testing framework comes for MySQL. Watch Sandro Fouché, graduate researcher on this project, take you through what Skoll is, how it’s beneficial, and how you can use it with an actual demo. The Skoll testing client for MySQL can be downloaded here:
http://www.cs.umd.edu/projects/skoll/contribute/

The video can be played or downloaded using the “play” or “download” link from the original article at http://www.technocation.org.

Skoll is a Community-Based Testing project out of the University of Maryland. Their first testing framework comes for MySQL. Watch Sandro Fouché, graduate researcher on this project, take you through what Skoll is, how it’s beneficial, and how you can use it with an actual demo. The Skoll testing client for MySQL can be downloaded here:
http://www.cs.umd.edu/projects/skoll/contribute/

The video can be played or downloaded using the “play” or “download” link from the original article at http://www.technocation.org.

Tax-Deductible Sponsorship of MySQLCamp!

MySQLCamp is a free event available to the public, though geared towards the MySQL Community. There is no fee for any participant, and workshops are presented by participants and chosen by the community.

Last year, Google was kind enough to sponsor all of the logistics, from food to meeting space. This year, Polytechnic University is providing the location — and we're opening up sponsorship for the rest!

Technocation, Inc. — a US not-for-profit providing educational resources for IT professionals —  is sponsoring a donor campaign for MySQLCamp II, to take place from August 23-24, 2007 at Polytechnic University in Brooklyn, NY, USA.

By sponsoring MySQL Camp, you will not only help out the community and get a tax deduction, but your name and company's name will be mentioned throughout MySQL Camp, on the MySQL Camp website, and you will be allowed to have a banner ad on www.technocation.org. You will be contacted at your e-mail address to discuss banner details.

Technocation, Inc. is a not-for-profit organization. Your contributions are tax-deductible to the fullest extent of the law. Proof of donation will be mailed. Money may be donated through PayPal by sending payment to donate@technocation.org, or by using the links below.

Gowanda Level, $100

Ecco Level, $250

Flipper Level, $500

Dan Marino Level, $1000

Sakila Level, $2000 

To send a check or money order by mail:

MySQLCamp II Campaign
c/o Technocation, Inc.
PO Box 380221
Cambridge, MA 02238
United States

Technocation's EIN/Tax ID is 20-5445375

MySQLCamp is a free event available to the public, though geared towards the MySQL Community. There is no fee for any participant, and workshops are presented by participants and chosen by the community.

Last year, Google was kind enough to sponsor all of the logistics, from food to meeting space. This year, Polytechnic University is providing the location — and we're opening up sponsorship for the rest!

Technocation, Inc. — a US not-for-profit providing educational resources for IT professionals —  is sponsoring a donor campaign for MySQLCamp II, to take place from August 23-24, 2007 at Polytechnic University in Brooklyn, NY, USA.

By sponsoring MySQL Camp, you will not only help out the community and get a tax deduction, but your name and company's name will be mentioned throughout MySQL Camp, on the MySQL Camp website, and you will be allowed to have a banner ad on www.technocation.org. You will be contacted at your e-mail address to discuss banner details.

Technocation, Inc. is a not-for-profit organization. Your contributions are tax-deductible to the fullest extent of the law. Proof of donation will be mailed. Money may be donated through PayPal by sending payment to donate@technocation.org, or by using the links below.

Gowanda Level, $100

Ecco Level, $250

Flipper Level, $500

Dan Marino Level, $1000

Sakila Level, $2000 

To send a check or money order by mail:

MySQLCamp II Campaign
c/o Technocation, Inc.
PO Box 380221
Cambridge, MA 02238
United States

Technocation's EIN/Tax ID is 20-5445375

Quiz Show Video Up

For folks to know — to create the page with links to conference material, I took the slides from the O’Reilly official page, combines it with the myriad of “here are my slides” posts to Planet MySQL, and links to Baron, Kevin and Mike’s audio and video as well as the video and audio I processed (Because Baron made statements about bandwidth, I downloaded the .ogg files and technocation.org is hosting them, whereas Kevin and Mike’s files are linked to).

I know I hate going to 20 places to find everything I want. There’s no need for folks to have to go to more than one site, just because the content was provided by more than one person. The referenced page is one-stop shopping, as I feel it should be. If folks have anything to add (or change), I’m happy to update the page, just comment here.

So, I have updated the page at:

http://technocation.org/content/2007-mysql-user-conference-and-expo-presentations-and-videos

that fixes a link, adds some slides, and also adds the Quiz Show footage that I have. I got a late start to the Quiz Show, but I did get it in time to catch Solomon Chang’s infamous “Coder McKinnan o’ The Cubicles”. Sadly, I did not have video of the dance, but you can see that at http://people.warp.es/~nacho/blog/?p=225 — that post also contains a comment by Solomon himself with the lyrics (which he also sent to me, but I’m a bad videographer and took too long to process all the video…).

(updates to the page: Fixed the “Clash of the Database Egos” wmv link, thanks to Hakan Kücükyilmaz for pointing out the brokenness. Added the link to download slides (pdf and swf) for the SQL Kitchen talk, courtesy of Damien Seguy. Added the Quiz Show and links to audio and video.)

Go forth and enjoy!

For folks to know — to create the page with links to conference material, I took the slides from the O’Reilly official page, combines it with the myriad of “here are my slides” posts to Planet MySQL, and links to Baron, Kevin and Mike’s audio and video as well as the video and audio I processed (Because Baron made statements about bandwidth, I downloaded the .ogg files and technocation.org is hosting them, whereas Kevin and Mike’s files are linked to).

I know I hate going to 20 places to find everything I want. There’s no need for folks to have to go to more than one site, just because the content was provided by more than one person. The referenced page is one-stop shopping, as I feel it should be. If folks have anything to add (or change), I’m happy to update the page, just comment here.

So, I have updated the page at:

http://technocation.org/content/2007-mysql-user-conference-and-expo-presentations-and-videos

that fixes a link, adds some slides, and also adds the Quiz Show footage that I have. I got a late start to the Quiz Show, but I did get it in time to catch Solomon Chang’s infamous “Coder McKinnan o’ The Cubicles”. Sadly, I did not have video of the dance, but you can see that at http://people.warp.es/~nacho/blog/?p=225 — that post also contains a comment by Solomon himself with the lyrics (which he also sent to me, but I’m a bad videographer and took too long to process all the video…).

(updates to the page: Fixed the “Clash of the Database Egos” wmv link, thanks to Hakan Kücükyilmaz for pointing out the brokenness. Added the link to download slides (pdf and swf) for the SQL Kitchen talk, courtesy of Damien Seguy. Added the Quiz Show and links to audio and video.)

Go forth and enjoy!

2007 MySQL Conference Slides, Video and Audio Now Available

http://technocation.org/content/2007-mysql-user-conference-and-expo-presentations-and-videos

Need I say more? Go download the slides, video and audio from the 2007 MySQL Users Conference & Expo. I have no plans to take anything down, so please download wisely, and take only what you need. If there’s demand, I can make higher-quality versions available. I can also burn DVD’s of the content if that’s desired.

Enjoy!

http://technocation.org/content/2007-mysql-user-conference-and-expo-presentations-and-videos

Need I say more? Go download the slides, video and audio from the 2007 MySQL Users Conference & Expo. I have no plans to take anything down, so please download wisely, and take only what you need. If there’s demand, I can make higher-quality versions available. I can also burn DVD’s of the content if that’s desired.

Enjoy!

Upgrade news & OpenID

Today I upgraded the blog software at sheeri.com (and sheeri.net and sheeri.org). Please let me know if you find something that doesn’t work as expected — awfief@gmail.com.

At the MySQL Users Conference, my good friend Mark Atwood (creator of the free Amazon S3 Storage Engine) mentioned that any site with a login should have OpenID as an option.

Mark, I upgraded for you! I was using Wordpress 1.5.2, now I’m at the “latest” version. Anyway, this is just to let folks know that if you so choose, you may now use OpenId if you wish to login and make comments.

Of course, I do not require login (and have a great spam filter) so that’s just another choice you have.

Today I upgraded the blog software at sheeri.com (and sheeri.net and sheeri.org). Please let me know if you find something that doesn’t work as expected — awfief@gmail.com.

At the MySQL Users Conference, my good friend Mark Atwood (creator of the free Amazon S3 Storage Engine) mentioned that any site with a login should have OpenID as an option.

Mark, I upgraded for you! I was using WordPress 1.5.2, now I’m at the “latest” version. Anyway, this is just to let folks know that if you so choose, you may now use OpenId if you wish to login and make comments.

Of course, I do not require login (and have a great spam filter) so that’s just another choice you have.

OurSQL Episode 15: Eben Moglen’s Keynote at the MySQL Conference

Eben Moglen, director of the Software Freedom Law Center, discusses why Free Beer isn't so good if your data are getting drunk! The keynote looks at how "Free as in Freedom" businesses help prevent the ultimate privacy catastrophe.
 
This speech is not to be missed! 
 
 
 
Direct play this episode at:
 
Download all podcasts at:
http://technocation.org/podcasts/oursql/

Subscribe to the podcast at:
http://feeds.feedburner.com/oursql
 
Feedback:

Email
info@technocation.org

call the comment line at +1 617-674-2369

use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Or use the Technocation forums:
http://technocation.org/forum
 

Eben Moglen, director of the Software Freedom Law Center, discusses why Free Beer isn't so good if your data are getting drunk! The keynote looks at how "Free as in Freedom" businesses help prevent the ultimate privacy catastrophe.
 
This speech is not to be missed! 
 
 
 
Direct play this episode at:
 
Download all podcasts at:
http://technocation.org/podcasts/oursql/

Subscribe to the podcast at:
http://feeds.feedburner.com/oursql
 
Feedback:

Email
info@technocation.org

call the comment line at +1 617-674-2369

use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Or use the Technocation forums:
http://technocation.org/forum
 

Data Warehousing Tips and Tricks

It’s not easy to do a DW in MySQL — but it’s not impossible either. Easier to go to Teradata than to write your own.

DW characteristics:

1) Organic, evolves over time from OLTP systems — issues, locking, large queries, # of userss.

2) Starts as a copy of OLTP, but changes over time — schema evolution, replication lag, duplicate data issues

3) Custom — designed from the ground up for DW — issues with getting it started, growth, aggregations, backup.

4) How do you update the data in the warehouse? — write/update/read/delete, write/read/delete, or write only — which means that roll out requires partitions or merge tables.

The secret to DW is partitioning — can be based on:
data — date, groups like department, company, etc.
functional — sales, HR, etc.
random — hash, mod on a primary key.

You can partition:
manually — unions, application logic, etc.
using MERGE tables and MyISAM
MySQL 5.1 using partitions

You can load, backup and purge by partition, so perhaps keeping that logic intact — if it takes too much work to load a partition because you’ve grouped it oddly, then your partitioning schema isn’t so great.

Make sure your partitioning is flexible — you need to plan for growth from day 1. So don’t just partition once and forget about it, make sure you can change the partitioning schema without too much trouble. Hash and modulo partitioning aren’t very flexible, and you have to restructure your data to do so.

Use MyISAM for data warehousing — 3-4 times faster than InnoDB, data 2-3 times smaller, MyISAM table files can be easily copied from one server to another, MERGE tables available only over MyISAM tables (scans are 10-15% faster with merge tables), and you can make read-only tables (compressed with indexes) to reduce data size further. ie, compress older data (a year ago, or a week ago if it doesn’t change!)

Issues for using MyISAM for DW — Table locking for high volumes of real-time data (concurrent inserts are allowed when there is ONLY insertions going on, not deletions). This is where partitioning comes in! REPAIR TABLE also takes a long time — better to backup frequently, saving tables, loadset and logs, and then instead of REPAIR TABLE do a point-in-time recovery. For write-only DW, save your write-loads and use that as part of your backup strategy.

Deletes will break concurrent inserts — delayed inserts still work, but they’re not as efficient. You also have to program that in, you can’t, say, replicate using INSERT DELAYED where the master had INSERT.

[Baron’s idea — take current data in InnoDB format, and UNION over other DW tables]

No index clustering for queries that need it — OPTIMIZE TABLE will fix this but it can take a long time to run.

When to use InnoDB — if you must have a high volume of realtime loads — InnoDB record locking is better.

If ALL of your queries can take advantage of index clustering — most or all queries access the data using the primary key (bec. all indexes are clustered together with the primary key, so non-primary key lookups are much faster than regular non-primary key lookups in MySIAM). BUT this means you want to keep your primary keys small. Plus, the more indexes you have, the slower your inserts are, and moreso because of the clustering.

MEMORY storage engine: Use it when you have smaller tables that aren’t updated very often; they’re faster and support hash indexes, which are better for doing single record lookups.

Store the data for the MEMORY engine twice, once in the MEMORY table and once in MyISAM or InnoDB, add queries to the MySQL init script to copy the data from the disk tables to the MEMORY tables upon restart using –init-file=< file name >

ARCHIVE storage engine — use to store older data. More compression than compressed MyISAM, fast inserts, 5.1 supports limited indexes, good performance for full table scans.

Nitro Storage Engine — very high INSERT rates w/ simultaneous queries. Ultra high performance on aggregate operations on index values. Doesn’t require 64-bit server, runs well on 32-bit machines. High performance scans on temporal data, can use partial indexes in ways other engines can’t. http://www.nitrosecurity.com

InfoBright Storage Engine — best compression of all storage engines — 10:1 compression, peak can be as high as 30:1 — includes equivalent of indexes for complex analysis queries. High batch load rates — up to 65GB per hour! Right now it’s Windows only, Linux and other to come. Very good performance for analysis type queries, even working with >5TB data. http://www.infobright.com

Backup — For small tables, just back up. Best option for large tables is copying the data files. If you have a write-only/roll out DB you only need to copy the newly added tables. So you don’t need to keep backing up the same data, just backup the new stuff. Or, just save the load sets. Just backup what changes, and partition smartly.

Tips:
Use INSERT . . . ON DUPLICATE KEY UPDATE to build aggregate tables, when the tables are very large and sorts go to disk, or when you need it real time.

Emulating Star Schema Optimization & Hash Joins — MySQL doesn’t do these, except MEMORY tables can use has indexes. So use a MEMORY INDEX table and optimizer hints to manually do a star schema optimized hash join. Steps:

1) Create a query to filter the fact table
to select all sales from week 1-5 and display by region & store type:

SELECT D.week, S.totalsales, S.locationID, S.storeID
FROM sales S INNER JOIN date D USING (dateID)
WHERE D.week BETWEEN 1 AND 5;

Access only the tables you need for filtering the data, but select the foreign key ID’s.

2) Join the result from step 1 with other facts/tables needed for the report

(SELECT D.week, S.totalsales, S.locationID, S.storeID
FROM sales S INNER JOIN date D USING (dateID)
WHERE D.week BETWEEN 1 AND 5) AS R
INNER JOIN location AS L ON (L.locationID=R.locationID) INNER JOIN store AS S ON (S.storeId=R.storeId);

3) Aggregate the results

(SELECT D.week, S.totalsales, S.locationID, S.storeID
FROM sales S INNER JOIN date D USING (dateID)
WHERE D.week BETWEEN 1 AND 5) AS R
INNER JOIN location AS L ON (L.locationID=R.locationID) INNER JOIN store AS S ON (S.storeId=R.storeId)
GROUP BY week, region, store_type;

Critical configuration options for DW — sort_buffer_size — used to do SELECT DISTINCT, GROUP BY, ORDER BY, UNION DISTINCT (or just UNION)

Watch the value of sort_merge_passes (more than 1 per second or 4-5 per minute) to see if you need to increase sort_buffer_size. sort_buffer_size is a PER-CONNECTION parameter, so don’t be too too greedy…..but it can also be increased dynamically before running a large query, and reduced afterwards.

key_buffer_size – use multiplekey buffer caches. Use difference caches for hot, warm & cold indexes. Preload your key caches at server startup. Try to use 1/4 of memory (up to 4G per key_buffer) for your total key buffer space. Monitor the cache hit rate by watching:

Read hit rate = key_reads/key_read_requests
Write hit rate = key_writes/key_write_requests
Key_reads & key_writes per second are also important.

hot_cache.key_buffer_size = 1G
fred.key_buffer_size = 1G
fred.key_cache_division_limit = 80
key_cache_size = 2G
key_cache_division_limit = 60
init-file = my_init_file.sql

in the init file:

CACHE INDEX T1,T2,T3 INDEX (I1,I2) INTO hot_cache;
CACHE INDEX T4,T5,T3 INDEX (I3,I4) INTO fred;
LOAD INDEX INTO CACHE T1,T3 NO LEAVES; — use when cache isn’t big enough to hold the whole index.
LOAD INDEX INTO CACHE T10, T11, T2, T4, T5

http://dev.mysql.com/doc/refman/5.0/en/myisam-key-cache.html

This was implemented in MySQL 4.1.1

Temporary table sizes — monitor created_disk_tmp_tables — more than a few per minute is bad, one a minute could be bad depending on the query. tmp tables start in memory and then go to disk…increase tmp_table_size and max_heap_table_size — can by done by session, for queries that need >64MB or so of space.

ALWAYS turn on the slow query log! save them for a few logs, use mysqldumpslow to analyze queries daily. Best to have an automated script to run mysqldumpslow and e-mail a report with the 10-25 worst queries.

log_queries_not_using_indexes unless your DW is designed to use explicit full-table scans.

Learn what the explain plan output means & how the optimizer works:
http://forge.mysql.com/wiki/MySQL_Internals_Optimizer

Other key status variables to watch
select_scan — full scan of first table
select_full_join — # of joins doing full table scan ’cause not using indexes
sort_scan — # of sorts that require
table_locks_waited
uptime

mysqladmin extended:
mysqladmin -u user -ppasswd ex =i60 -r | tee states.log | grep -v ‘0’

(runs every 60 seconds, display only status variables that have changed, logs full status to stats.log every 60 seconds).

It’s not easy to do a DW in MySQL — but it’s not impossible either. Easier to go to Teradata than to write your own.

DW characteristics:

1) Organic, evolves over time from OLTP systems — issues, locking, large queries, # of userss.

2) Starts as a copy of OLTP, but changes over time — schema evolution, replication lag, duplicate data issues

3) Custom — designed from the ground up for DW — issues with getting it started, growth, aggregations, backup.

4) How do you update the data in the warehouse? — write/update/read/delete, write/read/delete, or write only — which means that roll out requires partitions or merge tables.

The secret to DW is partitioning — can be based on:
data — date, groups like department, company, etc.
functional — sales, HR, etc.
random — hash, mod on a primary key.

You can partition:
manually — unions, application logic, etc.
using MERGE tables and MyISAM
MySQL 5.1 using partitions

You can load, backup and purge by partition, so perhaps keeping that logic intact — if it takes too much work to load a partition because you’ve grouped it oddly, then your partitioning schema isn’t so great.

Make sure your partitioning is flexible — you need to plan for growth from day 1. So don’t just partition once and forget about it, make sure you can change the partitioning schema without too much trouble. Hash and modulo partitioning aren’t very flexible, and you have to restructure your data to do so.

Use MyISAM for data warehousing — 3-4 times faster than InnoDB, data 2-3 times smaller, MyISAM table files can be easily copied from one server to another, MERGE tables available only over MyISAM tables (scans are 10-15% faster with merge tables), and you can make read-only tables (compressed with indexes) to reduce data size further. ie, compress older data (a year ago, or a week ago if it doesn’t change!)

Issues for using MyISAM for DW — Table locking for high volumes of real-time data (concurrent inserts are allowed when there is ONLY insertions going on, not deletions). This is where partitioning comes in! REPAIR TABLE also takes a long time — better to backup frequently, saving tables, loadset and logs, and then instead of REPAIR TABLE do a point-in-time recovery. For write-only DW, save your write-loads and use that as part of your backup strategy.

Deletes will break concurrent inserts — delayed inserts still work, but they’re not as efficient. You also have to program that in, you can’t, say, replicate using INSERT DELAYED where the master had INSERT.

[Baron’s idea — take current data in InnoDB format, and UNION over other DW tables]

No index clustering for queries that need it — OPTIMIZE TABLE will fix this but it can take a long time to run.

When to use InnoDB — if you must have a high volume of realtime loads — InnoDB record locking is better.

If ALL of your queries can take advantage of index clustering — most or all queries access the data using the primary key (bec. all indexes are clustered together with the primary key, so non-primary key lookups are much faster than regular non-primary key lookups in MySIAM). BUT this means you want to keep your primary keys small. Plus, the more indexes you have, the slower your inserts are, and moreso because of the clustering.

MEMORY storage engine: Use it when you have smaller tables that aren’t updated very often; they’re faster and support hash indexes, which are better for doing single record lookups.

Store the data for the MEMORY engine twice, once in the MEMORY table and once in MyISAM or InnoDB, add queries to the MySQL init script to copy the data from the disk tables to the MEMORY tables upon restart using –init-file=< file name >

ARCHIVE storage engine — use to store older data. More compression than compressed MyISAM, fast inserts, 5.1 supports limited indexes, good performance for full table scans.

Nitro Storage Engine — very high INSERT rates w/ simultaneous queries. Ultra high performance on aggregate operations on index values. Doesn’t require 64-bit server, runs well on 32-bit machines. High performance scans on temporal data, can use partial indexes in ways other engines can’t. http://www.nitrosecurity.com

InfoBright Storage Engine — best compression of all storage engines — 10:1 compression, peak can be as high as 30:1 — includes equivalent of indexes for complex analysis queries. High batch load rates — up to 65GB per hour! Right now it’s Windows only, Linux and other to come. Very good performance for analysis type queries, even working with >5TB data. http://www.infobright.com

Backup — For small tables, just back up. Best option for large tables is copying the data files. If you have a write-only/roll out DB you only need to copy the newly added tables. So you don’t need to keep backing up the same data, just backup the new stuff. Or, just save the load sets. Just backup what changes, and partition smartly.

Tips:
Use INSERT . . . ON DUPLICATE KEY UPDATE to build aggregate tables, when the tables are very large and sorts go to disk, or when you need it real time.

Emulating Star Schema Optimization & Hash Joins — MySQL doesn’t do these, except MEMORY tables can use has indexes. So use a MEMORY INDEX table and optimizer hints to manually do a star schema optimized hash join. Steps:

1) Create a query to filter the fact table
to select all sales from week 1-5 and display by region & store type:

SELECT D.week, S.totalsales, S.locationID, S.storeID
FROM sales S INNER JOIN date D USING (dateID)
WHERE D.week BETWEEN 1 AND 5;

Access only the tables you need for filtering the data, but select the foreign key ID’s.

2) Join the result from step 1 with other facts/tables needed for the report

(SELECT D.week, S.totalsales, S.locationID, S.storeID
FROM sales S INNER JOIN date D USING (dateID)
WHERE D.week BETWEEN 1 AND 5) AS R
INNER JOIN location AS L ON (L.locationID=R.locationID) INNER JOIN store AS S ON (S.storeId=R.storeId);

3) Aggregate the results

(SELECT D.week, S.totalsales, S.locationID, S.storeID
FROM sales S INNER JOIN date D USING (dateID)
WHERE D.week BETWEEN 1 AND 5) AS R
INNER JOIN location AS L ON (L.locationID=R.locationID) INNER JOIN store AS S ON (S.storeId=R.storeId)
GROUP BY week, region, store_type;

Critical configuration options for DW — sort_buffer_size — used to do SELECT DISTINCT, GROUP BY, ORDER BY, UNION DISTINCT (or just UNION)

Watch the value of sort_merge_passes (more than 1 per second or 4-5 per minute) to see if you need to increase sort_buffer_size. sort_buffer_size is a PER-CONNECTION parameter, so don’t be too too greedy…..but it can also be increased dynamically before running a large query, and reduced afterwards.

key_buffer_size – use multiplekey buffer caches. Use difference caches for hot, warm & cold indexes. Preload your key caches at server startup. Try to use 1/4 of memory (up to 4G per key_buffer) for your total key buffer space. Monitor the cache hit rate by watching:

Read hit rate = key_reads/key_read_requests
Write hit rate = key_writes/key_write_requests
Key_reads & key_writes per second are also important.

hot_cache.key_buffer_size = 1G
fred.key_buffer_size = 1G
fred.key_cache_division_limit = 80
key_cache_size = 2G
key_cache_division_limit = 60
init-file = my_init_file.sql

in the init file:

CACHE INDEX T1,T2,T3 INDEX (I1,I2) INTO hot_cache;
CACHE INDEX T4,T5,T3 INDEX (I3,I4) INTO fred;
LOAD INDEX INTO CACHE T1,T3 NO LEAVES; — use when cache isn’t big enough to hold the whole index.
LOAD INDEX INTO CACHE T10, T11, T2, T4, T5

http://dev.mysql.com/doc/refman/5.0/en/myisam-key-cache.html

This was implemented in MySQL 4.1.1

Temporary table sizes — monitor created_disk_tmp_tables — more than a few per minute is bad, one a minute could be bad depending on the query. tmp tables start in memory and then go to disk…increase tmp_table_size and max_heap_table_size — can by done by session, for queries that need >64MB or so of space.

ALWAYS turn on the slow query log! save them for a few logs, use mysqldumpslow to analyze queries daily. Best to have an automated script to run mysqldumpslow and e-mail a report with the 10-25 worst queries.

log_queries_not_using_indexes unless your DW is designed to use explicit full-table scans.

Learn what the explain plan output means & how the optimizer works:
http://forge.mysql.com/wiki/MySQL_Internals_Optimizer

Other key status variables to watch
select_scan — full scan of first table
select_full_join — # of joins doing full table scan ’cause not using indexes
sort_scan — # of sorts that require
table_locks_waited
uptime

mysqladmin extended:
mysqladmin -u user -ppasswd ex =i60 -r | tee states.log | grep -v ‘0’

(runs every 60 seconds, display only status variables that have changed, logs full status to stats.log every 60 seconds).

Japanese Character Set

There are too many Japanese characters to be able to use one byte to handle all of them.

Hiragana — over 50 characters

Katakana — over 50 characters

Kanji — over 6,000 characters

So the Japanese Character set has to be multi-byte. JIS=Japan Industrial Standard, this specifies it.

JIS X 0208 in 1990, updated in 1997 — covers widely used characters, not all characters
JIS X 0213 in 2000, updated in 2004

There are also vendor defined Japanese charsets — NEC Kanji and IBM Kanji — these supplement JIS X 0208.

Cellphone specific symbols have been introduced, so the # of characters is actually increasing!

For JIS X 0208, there are multiple encodings — Shift_JIS (all characters are 2 bytes), EUC-JP (most are 2 bytes, some are 3 bytes), and Unicode (all characters are 3 bytes, this makes people not want to use UTF-8 for ). Shift_JIS is most widely used, but they are moving to Unicode gradually (Vista is using UTF-8 as the standard now). Each code mapping is different, with different hex values for the same character in different encodings.

Similarly, there are different encodings for the other charsets.

MySQL supports only some of these. (get the graph from the slides)

char_length() returns the length by # of characters, length() returns the length by # of bytes.

The connection charset and the server charset have to match otherwise…mojibake!

Windows — Shift_JIS is standard encoding, linux EUC-JP is standard. So conversion may be needed.

MySQL Code Conversion algorithm — UCS-2 facilitates conversion between encodings. MySQL converts mappings to and from UCS-2. If client and server encoding are the same, there’s no conversion. If the conversion fails (ie, trying to convert to latin1), the character is converted to ? and you get mojibake.

You can set a my.cnf paramater for “skip-character-set-client-handshake”, this forces the use of the server side charset (for the column(s) in question).

Issues:

Unicode is supposed to support worldwide characters.

UCS-2 is 2-byte fixed length, takes 2^16 = 65,536 characters. This is one Basic Multilingual Plane (BMP). Some Japanese (and Chinese) characters are not covered by UCS-2. Windows Visa supports JIS X 0213:2004 as a standard character set in Japan (available for Windows XP with the right )

UCS-4 is 4-byte fixed length, can encode 2^31 characters (~2 billion) This covers many BMP’s (128?)

UTF-16 is 2 or 4 byte length, all UCS-2 are mapped to 2 bytes, not all UCS-4 characters are supported — 1 million are. Supported UCS-4 characters are mapped to 4 bytes

UTF-8 from 1-6 bytes is fully compliant with UCS-4. This is out of date. 1-4 byte UTF-8 is fully compliant with UTF-16. From 1-3 bytes, UTF-8 is compliant with UCS-2.

MySQL interally handles all characters as UCS-2, UCS-4 is not supported. This is not enough. Plus, UCS-2 is not supported for client encoding. UTF-8 support is up to 3 bytes — this is not just a MySQL problem though.

CREATE TABLE t1 (c1 VARCHAR(30)) CHARSET=utf8;
INSERT INTO T1 VALUES (0x6162F0A0808B63646566); — this inserts ‘ab’ + 4-byte UTF-8 translation of cdef

SELECT c1,HEX(c1) from t1;
if you get ab,6162 back it means that the invalid character was truncated. MySQL does throw up a warning for this.

Possible workarounds — using VARBINARY/BLOB types. Can store any binary data, but this is always case-sensitive (and yes, Japanese characters do have case). FULLTEXT index is not supported, and application code may need to be modified to handle UTF-8 — ie, String.getBytes may need “UTF-8” parameter in it.

Alternatively, use UCS-2 for column encoding:

CREATE TABLE t1 (c1 VARCHAR(30)) CHARSET=ucs2;

INSERT INTO t1 VALUES (_utf8 0x6162F0A0808B63646566);

SELECT … now gives you ?? instead of truncating.

Another alternative: use Shift_JIS or EUC-JP. Code conversion of JIS X 0213:2004 characters is not currently supported.

Shift_JIS is the most widely used encoding, 1 or 2 byte encoding. All ASCII and 1/2 width katakana are 1-byte, the rest are 2-byte. If the first byte value is between 0x00 and 0x7F it’s ASCII 1 byte, 0XA0 – 0XDf is 1-byte, 1/2 width katakana. all the rest are 2-byte characters.

The 2nd byte might be in the ASCII graphic code area 0x40 for example.

0x5C is the escape sequence (backslash in the US). Some Shift_JIS characters contain 0x5C in the 2nd byte. If the charset is specified incorrectly, you’ll end up getting different values — for instance, hex value 0X5C6e will conver to hex value 0X0A. The backslash at the end of the string, hex value 0X5C, will be removed (truncated) if charset is specified incorrectly.

Native MySQL does not support FULLTEXT search in Japanese, Korean and Chinese (CJK issue).

Japanese words do not delimit by space, so it can’t work. 2 ways to do this: dictionary based indexing, dividing words using a pre-installed dictionary. Also N-gram indexing — divide text by N letters (n could be 1, 2, 3 etc). MySQL + Senna implements this, supported by Sumisho Computer Systems.

There are too many Japanese characters to be able to use one byte to handle all of them.

Hiragana — over 50 characters

Katakana — over 50 characters

Kanji — over 6,000 characters

So the Japanese Character set has to be multi-byte. JIS=Japan Industrial Standard, this specifies it.

JIS X 0208 in 1990, updated in 1997 — covers widely used characters, not all characters
JIS X 0213 in 2000, updated in 2004

There are also vendor defined Japanese charsets — NEC Kanji and IBM Kanji — these supplement JIS X 0208.

Cellphone specific symbols have been introduced, so the # of characters is actually increasing!

For JIS X 0208, there are multiple encodings — Shift_JIS (all characters are 2 bytes), EUC-JP (most are 2 bytes, some are 3 bytes), and Unicode (all characters are 3 bytes, this makes people not want to use UTF-8 for ). Shift_JIS is most widely used, but they are moving to Unicode gradually (Vista is using UTF-8 as the standard now). Each code mapping is different, with different hex values for the same character in different encodings.

Similarly, there are different encodings for the other charsets.

MySQL supports only some of these. (get the graph from the slides)

char_length() returns the length by # of characters, length() returns the length by # of bytes.

The connection charset and the server charset have to match otherwise…mojibake!

Windows — Shift_JIS is standard encoding, linux EUC-JP is standard. So conversion may be needed.

MySQL Code Conversion algorithm — UCS-2 facilitates conversion between encodings. MySQL converts mappings to and from UCS-2. If client and server encoding are the same, there’s no conversion. If the conversion fails (ie, trying to convert to latin1), the character is converted to ? and you get mojibake.

You can set a my.cnf paramater for “skip-character-set-client-handshake”, this forces the use of the server side charset (for the column(s) in question).

Issues:

Unicode is supposed to support worldwide characters.

UCS-2 is 2-byte fixed length, takes 2^16 = 65,536 characters. This is one Basic Multilingual Plane (BMP). Some Japanese (and Chinese) characters are not covered by UCS-2. Windows Visa supports JIS X 0213:2004 as a standard character set in Japan (available for Windows XP with the right )

UCS-4 is 4-byte fixed length, can encode 2^31 characters (~2 billion) This covers many BMP’s (128?)

UTF-16 is 2 or 4 byte length, all UCS-2 are mapped to 2 bytes, not all UCS-4 characters are supported — 1 million are. Supported UCS-4 characters are mapped to 4 bytes

UTF-8 from 1-6 bytes is fully compliant with UCS-4. This is out of date. 1-4 byte UTF-8 is fully compliant with UTF-16. From 1-3 bytes, UTF-8 is compliant with UCS-2.

MySQL interally handles all characters as UCS-2, UCS-4 is not supported. This is not enough. Plus, UCS-2 is not supported for client encoding. UTF-8 support is up to 3 bytes — this is not just a MySQL problem though.

CREATE TABLE t1 (c1 VARCHAR(30)) CHARSET=utf8;
INSERT INTO T1 VALUES (0x6162F0A0808B63646566); — this inserts ‘ab’ + 4-byte UTF-8 translation of cdef

SELECT c1,HEX(c1) from t1;
if you get ab,6162 back it means that the invalid character was truncated. MySQL does throw up a warning for this.

Possible workarounds — using VARBINARY/BLOB types. Can store any binary data, but this is always case-sensitive (and yes, Japanese characters do have case). FULLTEXT index is not supported, and application code may need to be modified to handle UTF-8 — ie, String.getBytes may need “UTF-8” parameter in it.

Alternatively, use UCS-2 for column encoding:

CREATE TABLE t1 (c1 VARCHAR(30)) CHARSET=ucs2;

INSERT INTO t1 VALUES (_utf8 0x6162F0A0808B63646566);

SELECT … now gives you ?? instead of truncating.

Another alternative: use Shift_JIS or EUC-JP. Code conversion of JIS X 0213:2004 characters is not currently supported.

Shift_JIS is the most widely used encoding, 1 or 2 byte encoding. All ASCII and 1/2 width katakana are 1-byte, the rest are 2-byte. If the first byte value is between 0x00 and 0x7F it’s ASCII 1 byte, 0XA0 – 0XDf is 1-byte, 1/2 width katakana. all the rest are 2-byte characters.

The 2nd byte might be in the ASCII graphic code area 0x40 for example.

0x5C is the escape sequence (backslash in the US). Some Shift_JIS characters contain 0x5C in the 2nd byte. If the charset is specified incorrectly, you’ll end up getting different values — for instance, hex value 0X5C6e will conver to hex value 0X0A. The backslash at the end of the string, hex value 0X5C, will be removed (truncated) if charset is specified incorrectly.

Native MySQL does not support FULLTEXT search in Japanese, Korean and Chinese (CJK issue).

Japanese words do not delimit by space, so it can’t work. 2 ways to do this: dictionary based indexing, dividing words using a pre-installed dictionary. Also N-gram indexing — divide text by N letters (n could be 1, 2, 3 etc). MySQL + Senna implements this, supported by Sumisho Computer Systems.