Log Buffer #72 — a Carnival of the Vanities for DBAs

Welcome to the 72nd edition of Log Buffer, the weekly review of database blogs.

Oracle OpenWorld (OOW) is over, and Lucas Jellema of the AMIS Technology blog notes the OOW Content Catalog has been updated with most of the presentations available for download.

On his way home from OOW, Chris Muir of the appropriately titled One Size Doesn’t Fit All blog notes how OOW and the Australian Oracle User Group Conference and OOW compare with regards to 99% fewer attendees in AUSOUG Perth conference – from 45k down to 350.

Mark Rittman of Rittman Mead Consulting summarizes OOW’s impact on business intelligence and data warehousing in Reflections on Oracle’s BI Strategy. On his way home, Mark found time for A First Look at Oracle OLAP 11g, noting the pros, cons, gotchas and suggestions for improvement for many useful new features.

Microsoft SQL Server also has a new release in the works. Ted Malone in Agile Methods for the DB Dev is excited about SQL Server 2008 “Katmai” CTP 5 New Features and descries almost 20 of them.

Ian Barwick of PostgreSQL Notes talks about Converting tsearch2 to 8.3 now that the tsearch2 full text search engine has been integrated as a core PostgreSQL feature.

Patrick Barel of the Bar Solutions Weblog explains a new feature of Oracle 11g called Virtual Columns. While virtual data may be a new topic, using databases on virtual machines is an ongoing issue. Marco Russo of SQL BI gives his opinion on when to use virtual machines in SQL Server Virtualization.

Database professionals can be real characters, and set in their ways. Bad puns make good transitions, and Corrado Pandiani sheds light on MySQL’s rules for Charsets and Collations on Multicolumn Fulltext Indexes. Adam Douglas of Binary Expressions fixed some trouble with MySQL and French Characters not rendering properly.

Greg Sabino Mullane shows reasons for his Problems with pl/perl and UTF-8. In Tending the Garden, Selena Deckelmann goes through the very easy process of Automatic Character Set Conversion in PostgreSQL. Selena has also been busy organizing the development of ptop, an interactive, command-line tool for monitoring the current status of a PostgreSQL database. If you read this in time and are in the Portland, Oregon area you can join the ptop hackathon at noon (local time) tomorrow, Saturday November 24th, or you can read the ptop meeting summary from pdxpug.

While some of us are database tools, some of us prefer to contribute database tools. Baron Schwartz honors MySQL’s trademark by announcing that MySQL Toolkit is now Ma’atkit. Ma’at, pronounced “mott”, is the ancient Egyption patron saint of truth, harmony and order. In addition, Baron proclaims “Ma’atkit Version 1297 Released!”

Hubert Lubaczewski notes the changes to the analyze.pgsql.logs.pl script of pgsql-tools in update 3 and update 4.

Hubert also notes how to find overlapping time ranges and how to find the number of ranges a time belongs to in time ranges in postgresql – part 2. Though written for PostgreSQL, both posts can easily be applied to another DBMS. In the same vein, Yves Trudeau shares the DBMS-independent graphical images of Unix memory usage in Generating graphs from vmstat output.

Jeromy McMahon posts sample SQL code for viewing Oracle extent segments for tablespaces, temporary spaces and sort segment space. The Cheap DBA gets Oracle specific with a Slick Shell Script for Reporting on Oracle Workload. Krister Axel of codeboxer.com has A really clean dynamic insert proc for PL/SQL ETL packages, including validation checking and exception handling. zillablog‘s Robert Treat treats us to a function for tracking plperl shared variables.

Jen M is Keeping IT simple by coding capacity measurements to show How Not to Outgrow Your DB Infra: A Simple Step. She follows up with more code to monitor a specific cache to resolve unexplainable slowness/resource leak in SQL Server.

This post began with a conference, and so it shall conclude. The Call For Proposals for PgCon 2008 is underway, and David Fetter lets us know that PgCon 2008 will be held May 22-23 at the University of Ottawa. This is different from Joshua Drake‘s call for volunteers for Command Prompt’s Postgresql Conference East 08, on March 28-29 at the University of Maryland. Neil Conway informs us of a Jim Gray Tribute, consisting of a general session and 9 half-hour technical sessions reviewing some of the 1998 Turing Award winner’s work.

In case this edition did not give you enough to read, Beth Breidenbach of Confessions of a Database Geek created an aggregate blog feed for posts relating to information quality.

Welcome to the 72nd edition of Log Buffer, the weekly review of database blogs.

Oracle OpenWorld (OOW) is over, and Lucas Jellema of the AMIS Technology blog notes the OOW Content Catalog has been updated with most of the presentations available for download.

On his way home from OOW, Chris Muir of the appropriately titled One Size Doesn’t Fit All blog notes how OOW and the Australian Oracle User Group Conference and OOW compare with regards to 99% fewer attendees in AUSOUG Perth conference – from 45k down to 350.

Mark Rittman of Rittman Mead Consulting summarizes OOW’s impact on business intelligence and data warehousing in Reflections on Oracle’s BI Strategy. On his way home, Mark found time for A First Look at Oracle OLAP 11g, noting the pros, cons, gotchas and suggestions for improvement for many useful new features.

Microsoft SQL Server also has a new release in the works. Ted Malone in Agile Methods for the DB Dev is excited about SQL Server 2008 “Katmai” CTP 5 New Features and descries almost 20 of them.

Ian Barwick of PostgreSQL Notes talks about Converting tsearch2 to 8.3 now that the tsearch2 full text search engine has been integrated as a core PostgreSQL feature.

Patrick Barel of the Bar Solutions Weblog explains a new feature of Oracle 11g called Virtual Columns. While virtual data may be a new topic, using databases on virtual machines is an ongoing issue. Marco Russo of SQL BI gives his opinion on when to use virtual machines in SQL Server Virtualization.

Database professionals can be real characters, and set in their ways. Bad puns make good transitions, and Corrado Pandiani sheds light on MySQL’s rules for Charsets and Collations on Multicolumn Fulltext Indexes. Adam Douglas of Binary Expressions fixed some trouble with MySQL and French Characters not rendering properly.

Greg Sabino Mullane shows reasons for his Problems with pl/perl and UTF-8. In Tending the Garden, Selena Deckelmann goes through the very easy process of Automatic Character Set Conversion in PostgreSQL. Selena has also been busy organizing the development of ptop, an interactive, command-line tool for monitoring the current status of a PostgreSQL database. If you read this in time and are in the Portland, Oregon area you can join the ptop hackathon at noon (local time) tomorrow, Saturday November 24th, or you can read the ptop meeting summary from pdxpug.

While some of us are database tools, some of us prefer to contribute database tools. Baron Schwartz honors MySQL’s trademark by announcing that MySQL Toolkit is now Ma’atkit. Ma’at, pronounced “mott”, is the ancient Egyption patron saint of truth, harmony and order. In addition, Baron proclaims “Ma’atkit Version 1297 Released!”

Hubert Lubaczewski notes the changes to the analyze.pgsql.logs.pl script of pgsql-tools in update 3 and update 4.

Hubert also notes how to find overlapping time ranges and how to find the number of ranges a time belongs to in time ranges in postgresql – part 2. Though written for PostgreSQL, both posts can easily be applied to another DBMS. In the same vein, Yves Trudeau shares the DBMS-independent graphical images of Unix memory usage in Generating graphs from vmstat output.

Jeromy McMahon posts sample SQL code for viewing Oracle extent segments for tablespaces, temporary spaces and sort segment space. The Cheap DBA gets Oracle specific with a Slick Shell Script for Reporting on Oracle Workload. Krister Axel of codeboxer.com has A really clean dynamic insert proc for PL/SQL ETL packages, including validation checking and exception handling. zillablog‘s Robert Treat treats us to a function for tracking plperl shared variables.

Jen M is Keeping IT simple by coding capacity measurements to show How Not to Outgrow Your DB Infra: A Simple Step. She follows up with more code to monitor a specific cache to resolve unexplainable slowness/resource leak in SQL Server.

This post began with a conference, and so it shall conclude. The Call For Proposals for PgCon 2008 is underway, and David Fetter lets us know that PgCon 2008 will be held May 22-23 at the University of Ottawa. This is different from Joshua Drake‘s call for volunteers for Command Prompt’s Postgresql Conference East 08, on March 28-29 at the University of Maryland. Neil Conway informs us of a Jim Gray Tribute, consisting of a general session and 9 half-hour technical sessions reviewing some of the 1998 Turing Award winner’s work.

In case this edition did not give you enough to read, Beth Breidenbach of Confessions of a Database Geek created an aggregate blog feed for posts relating to information quality.

Creating an API for My Stored Procedures

Does anyone have any ways they create an API for their stored routines (functions and procedures)? Currently it seems as though I have to parse the CREATE statement to get the input variables….Has anyone else done this? Is it in any third party tools?

Does anyone have any ways they create an API for their stored routines (functions and procedures)? Currently it seems as though I have to parse the CREATE statement to get the input variables….Has anyone else done this? Is it in any third party tools?

Great Job Interview Snippet

Now, I should probably be a good Planet MySQLer and check the MySQL Forge at http://forge.mysql.com, but Dean Swift’s “mystery festive stored procedure” linked at http://deepselect.blogspot.com/2006/12/merry-christmas.html is a pretty good interview question for a candidate. I would include the hint that it’s a Christmasy stored procedure.

I laughed out loud when I figured it out. I didn’t actually run it, but read the stored procedure to see if I could puzzle it out. And so I did. It took a few minutes, and I had to copy and paste it to a buffer that used word wrap and format it properly.

But this will show how good someone is under pressure. If you give it to them and walk away, there will be less pressure. Either way, you’ll see their reaction too — if it’s “ha ha very funny who cares?” or if it’s, “that’s pure genius!” or if it’s a big groan or hearty laugh, you’ll see a person’s personality and how they might fit in with the team.

Now, I should probably be a good Planet MySQLer and check the MySQL Forge at http://forge.mysql.com, but Dean Swift’s “mystery festive stored procedure” linked at http://deepselect.blogspot.com/2006/12/merry-christmas.html is a pretty good interview question for a candidate. I would include the hint that it’s a Christmasy stored procedure.

I laughed out loud when I figured it out. I didn’t actually run it, but read the stored procedure to see if I could puzzle it out. And so I did. It took a few minutes, and I had to copy and paste it to a buffer that used word wrap and format it properly.

But this will show how good someone is under pressure. If you give it to them and walk away, there will be less pressure. Either way, you’ll see their reaction too — if it’s “ha ha very funny who cares?” or if it’s, “that’s pure genius!” or if it’s a big groan or hearty laugh, you’ll see a person’s personality and how they might fit in with the team.

The Trend of Managed Schemas: A Database is Not a Messaging System

This thread on the Boston MySQL User Group Board is getting interesting:
http://mysql.meetup.com/137/boards/view/viewthread?thread=2280640

(From the original poster:)

I think that nonequivalence comes from the past when the data sharing was a
rare situation. The data changes were always initiated by application and it
always knew about those changes. Now the situation is different. When the data
are shared between multiple remote applications you have to notify other
interested parties about those changes.

Currently databases are mostly used as “pull” components. If they had standard
“push” functionality they could compete with messaging systems with the advantages
of automatic data persistence and powerful query language.

(my response:)
Well, and that’s the problem — the paradigm *has* changed. MySQL is fast and reliable because it does NOT put things like messaging into their database, which Oracle and SQL Server do. A database is not a messaging system, it’s a database.

What effect would notification that there have been changes have on MVCC? I do wish there was a “pull” way to check if the data has changed.

The paradigm change of the application managing the schema causes this. I do not believe messaging is the correct way to handle this problem.

Consider the parallel to source code version control. Much like MVCC, you check out code, change it, and commit the code. Unlike many source code version control systems, though, MVCC (“data version control”) does not have the equivalent of an “update” command, except for doing another pull from the database. It would be great if there was an easy way to do a “diff” of what’s in the database versus what the application is changing, but that seems like it would be a programmatic thing (function or method), not a database thing.

And consider the database overhead and bandwidth….instead of just running queries, MySQL would have to somehow keep track of which thread has what data, and then notify every single thread that has that data, that it’s changed. The applications will have to be written to keep threads open longer, which will consume lots of resources. That’s lots more overhead for the database, and much more bandwidth, because there may be instances of the application that are using data that they do not care if it changed….so the messaging system would be wasting bandwidth, sending messages to instances that do not care. Although that could be mitigated by the application keeping a thread open when it cares about whether or not the data has changed.

Then again, I’m not fond of managed schema in the application…or at least, when the developers write that code. Seems to me it should be the DBA writing that code. It’s *very* useful for data consistency and integrity, which is a function of the DBA, not a developer.

What effects do you see the managed schema having on databases? Who should be responsible for writing a managed schema? Should a managed schema be used for database consistency within an application? Where is the line drawn between the application putting the required information into the database, and the database’s job of maintaining consistency and integrity?

It’s somewhat ironic, since for a long time MySQL advocated using the application to ensure the consistency and integrity (ie, before MySQL had a storage engine with foreign keys and transactions).

I often say that the biggest reason MySQL is a widely used database is because it is fast. A fast database can be complemented by an application that adds the features the database is missing; but a slow database that is full-featured cannot be made faster by an application. So it worries me when folks request very specialized systems such as a messaging server (or ANY “push” system) into the database, because that could be easily done with a “pull” mechanism, only using the bandwidth needed by the instances of the applications that care. Otherwise, it will end up adding Microsoft-level bloat to a really nice and fast program.

This thread on the Boston MySQL User Group Board is getting interesting:
http://mysql.meetup.com/137/boards/view/viewthread?thread=2280640

(From the original poster:)

I think that nonequivalence comes from the past when the data sharing was a
rare situation. The data changes were always initiated by application and it
always knew about those changes. Now the situation is different. When the data
are shared between multiple remote applications you have to notify other
interested parties about those changes.

Currently databases are mostly used as “pull” components. If they had standard
“push” functionality they could compete with messaging systems with the advantages
of automatic data persistence and powerful query language.

(my response:)
Well, and that’s the problem — the paradigm *has* changed. MySQL is fast and reliable because it does NOT put things like messaging into their database, which Oracle and SQL Server do. A database is not a messaging system, it’s a database.

What effect would notification that there have been changes have on MVCC? I do wish there was a “pull” way to check if the data has changed.

The paradigm change of the application managing the schema causes this. I do not believe messaging is the correct way to handle this problem.

Consider the parallel to source code version control. Much like MVCC, you check out code, change it, and commit the code. Unlike many source code version control systems, though, MVCC (“data version control”) does not have the equivalent of an “update” command, except for doing another pull from the database. It would be great if there was an easy way to do a “diff” of what’s in the database versus what the application is changing, but that seems like it would be a programmatic thing (function or method), not a database thing.

And consider the database overhead and bandwidth….instead of just running queries, MySQL would have to somehow keep track of which thread has what data, and then notify every single thread that has that data, that it’s changed. The applications will have to be written to keep threads open longer, which will consume lots of resources. That’s lots more overhead for the database, and much more bandwidth, because there may be instances of the application that are using data that they do not care if it changed….so the messaging system would be wasting bandwidth, sending messages to instances that do not care. Although that could be mitigated by the application keeping a thread open when it cares about whether or not the data has changed.

Then again, I’m not fond of managed schema in the application…or at least, when the developers write that code. Seems to me it should be the DBA writing that code. It’s *very* useful for data consistency and integrity, which is a function of the DBA, not a developer.

What effects do you see the managed schema having on databases? Who should be responsible for writing a managed schema? Should a managed schema be used for database consistency within an application? Where is the line drawn between the application putting the required information into the database, and the database’s job of maintaining consistency and integrity?

It’s somewhat ironic, since for a long time MySQL advocated using the application to ensure the consistency and integrity (ie, before MySQL had a storage engine with foreign keys and transactions).

I often say that the biggest reason MySQL is a widely used database is because it is fast. A fast database can be complemented by an application that adds the features the database is missing; but a slow database that is full-featured cannot be made faster by an application. So it worries me when folks request very specialized systems such as a messaging server (or ANY “push” system) into the database, because that could be easily done with a “pull” mechanism, only using the bandwidth needed by the instances of the applications that care. Otherwise, it will end up adding Microsoft-level bloat to a really nice and fast program.

Calling application on the database server from stored procedure or trigger

I do not know enough about the subject to answer this, but I know the folks that read this do.

From: http://mysql.meetup.com/137/boards/view/viewthread?thread=2280640

Hi,

I wonder is there any way to call an application which resides on the database server computer from stored procedure or trigger? Is the language of stored procedures and triggers only restricted by SQL statements and some arithmetic operations?

It is interesting that Oracle and MS SQL Server already have this functionality. In Oracle it is Java stored procedure, in MS SQL Server it is called “extended stored procedure” where you can call C++ routine out of your stored procedure. It is difficult overestimate the convenience of being able to do some processing triggerted by the record insertion event.

In this case the database could be used not only for “pulling” data out, but also being able to “push” data to some component.

You can go to the meetup message boards and post an answer. If you’d like to post anonymously, feel free to use the account:

admin@sheeri.net
and password
guest

Thanx!

I do not know enough about the subject to answer this, but I know the folks that read this do.

From: http://mysql.meetup.com/137/boards/view/viewthread?thread=2280640

Hi,

I wonder is there any way to call an application which resides on the database server computer from stored procedure or trigger? Is the language of stored procedures and triggers only restricted by SQL statements and some arithmetic operations?

It is interesting that Oracle and MS SQL Server already have this functionality. In Oracle it is Java stored procedure, in MS SQL Server it is called “extended stored procedure” where you can call C++ routine out of your stored procedure. It is difficult overestimate the convenience of being able to do some processing triggerted by the record insertion event.

In this case the database could be used not only for “pulling” data out, but also being able to “push” data to some component.

You can go to the meetup message boards and post an answer. If you’d like to post anonymously, feel free to use the account:

admin@sheeri.net
and password
guest

Thanx!

Question #2: Trigger on One Table To Insert Data into Another

Question #2 from the September MySQL User Group was whether or not a TRIGGER can affect a different table. Apparently the documentation (perhaps for an earlier version??) specified this was not possible. Tom Hanlon, MySQL employee, put up this example (modified from the original, special thanks to Ralph Navarro for copying it down):

Basically, this trigger will insert the current user and timestamp into another table.

mysql> delimiter $$
mysql> CREATE TRIGGER BIcity BEFORE INSERT ON city
-> FOR EACH ROW BEGIN
-> INSERT INTO citytest (name,happened) values (current_user(),now());
-> END;
-> $$
Query OK, 0 rows affected (0.03 sec)

mysql> delimiter ;
mysql> create table citytest (
name varchar(60) not null default '',
happened datetime not null);
Query OK, 0 rows affected (0.19 sec)

mysql> describe city;
+-------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+----------+------+-----+---------+----------------+
| ID | int(11) | NO | PRI | NULL | auto_increment |
| Name | char(35) | NO | | | |
| CountryCode | char(3) | NO | | | |
| District | char(20) | NO | | | |
| Population | int(11) | NO | | 0 | |
+-------------+----------+------+-----+---------+----------------+
5 rows in set (0.01 sec)

mysql> insert into city (Name) values ('tomtown');
Query OK, 1 row affected (0.02 sec)

mysql> select * from citytest\G
*************************** 1. row ***************************
name: root@localhost
happened: 2006-09-07 21:45:14
1 row in set (0.01 sec)

Question #2 from the September MySQL User Group was whether or not a TRIGGER can affect a different table. Apparently the documentation (perhaps for an earlier version??) specified this was not possible. Tom Hanlon, MySQL employee, put up this example (modified from the original, special thanks to Ralph Navarro for copying it down):

Basically, this trigger will insert the current user and timestamp into another table.

mysql> delimiter $$
mysql> CREATE TRIGGER BIcity BEFORE INSERT ON city
-> FOR EACH ROW BEGIN
-> INSERT INTO citytest (name,happened) values (current_user(),now());
-> END;
-> $$
Query OK, 0 rows affected (0.03 sec)

mysql> delimiter ;
mysql> create table citytest (
name varchar(60) not null default '',
happened datetime not null);
Query OK, 0 rows affected (0.19 sec)

mysql> describe city;
+-------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+----------+------+-----+---------+----------------+
| ID | int(11) | NO | PRI | NULL | auto_increment |
| Name | char(35) | NO | | | |
| CountryCode | char(3) | NO | | | |
| District | char(20) | NO | | | |
| Population | int(11) | NO | | 0 | |
+-------------+----------+------+-----+---------+----------------+
5 rows in set (0.01 sec)

mysql> insert into city (Name) values ('tomtown');
Query OK, 1 row affected (0.02 sec)

mysql> select * from citytest\G
*************************** 1. row ***************************
name: root@localhost
happened: 2006-09-07 21:45:14
1 row in set (0.01 sec)

Simple E-mail address validator

On the forge:
Simple E-mail address validator

This stored procedure is a simple e-mail address validator — it makes sure the e-mail address is in the format word@word.word, and makes sure there are no special characters:
( ) <> @ , ; : \ . [ ] */

I allow ” because technically you can have “word”@word.com.

Folks can easily add to this snippet to make it fully compliant with RFC822 if they want. (I got bored and didn’t really feel like adding all that other stuff in. 🙂 )

On the forge:
Simple E-mail address validator

This stored procedure is a simple e-mail address validator — it makes sure the e-mail address is in the format word@word.word, and makes sure there are no special characters:
( ) <> @ , ; : \ . [ ] */

I allow ” because technically you can have “word”@word.com.

Folks can easily add to this snippet to make it fully compliant with RFC822 if they want. (I got bored and didn’t really feel like adding all that other stuff in. 🙂 )