At Kscope this year, I attended a half day in-depth session entitled Data Warehousing Performance Best Practices, given by Maria Colgan of Oracle. In that session, there was a section on how to determine I/O throughput for a system, because in data warehousing I/O per second (iops) is less important than I/O throughput (how much actual data goes through, not just how many reads/writes).
The section contained an Oracle-specific in-database tool, and a standalone tool that can be used on many operating systems, regardless of whether or not a database exists:
If Oracle is installed, run
SET SERVEROUTPUT ON DECLARE lat INTEGER; iops INTEGER; mbps INTEGER; BEGIN -- DBMS_RESOURCE_MANAGER.CALIBRATE_IO(<DISKS>, <MAX_LATENCY>,iops,mbps,lat); DBMS_RESOURCE_MANAGER.CALIBRATE_IO (2, 10, iops, mbps, lat); dbms_output.put_line('max_mbps = ' || mbps); end;
For us MySQL folks, or even the Drizzle or NoSQL folks, Oracle offers a free standalone tool called Orion. The example given in the slides was:
./orion –run advanced –testname mytest –num_small 0 –size_large 1024 –type rand –simulate contact –write 0 –duration 60 –matrix column
-num_small is 0 because you don’t usually do small transactions in a dw.
-type rand for random I/O’s because data warehouse queries usually don’t do sequential reads
-write 0 – no writes, because you do not write often to the dw, that is what the ETL is for.
-duration is in seconds
-matrix column shows you how much you can sustain
I would be interested to see how other folks measure I/O throughput, and maybe even do a side-by-side comparison of different tools. Orion is available for:
Linux (x86, x86-64, Itanium, Power)
HPUX (PA RISC, Itanium)
I am working on a larger write-up of the session itself, which had many concise descriptions of data warehousing issues, but I thought that this merited its own post.