Working Smarter, Not Harder (SET and ENUM)

So, in a previous post I talked about smart code. Today I put myself square into a discussion about ENUM and SET. ENUM is an enumerated list of values; similar to a pull-down menu, where the only values allowed in that field are the ones defined, with the option of also having a null column.

The ENUM field takes the idea of normalizing the data and eliminates the need for a join on that data. It also makes data integrity easy — if the value you’re trying to enter is not in the ENUM column definition, MySQL throws an error.

ENUM is not a standard SQL data type. It is MySQL specific.

As an example, in the real world I run a database for an international not-for-profit. Whenever a donation comes in, it is associated with some form of solicitation — either “October 2005 mailing” or “website donation” or “2005 Los Angeles House Party”, etc.

To normalize this data in a pure relational database, we’d have to have a separate table, with 2 fields — an ID, and the name of the solitication. We would add to this table whenever we have an event, or do a mailing. In the “donation” table, we would reference the solicitation via the solicitation ID, and set up referential integrity so that an ID that does not exist could not be entered into the table.

Instead, we take advantage of MySQL’s ENUM column type. I have a script that will alter the table when we need to add a new solicitation, so even non-technical folk can do it (they go to a web page, add the name of the field, and click “submit”). Pull-down menus are also not difficult with a similar script to get all the possible values in the ENUM column.

The reason ENUM is a great field is that it stores all the row values in 2 bytes; there is a limit of 65,535 elements in the ENUM column. This uses a lot less space than the hack of just having a char or varchar field, which is what folks usually do when they do not want to add a join to a query by having another table.

Only one value at a time can be stored in an ENUM field; if you want to store more than one value in a column (which makes your database not relational, by the way), you can use the SET datatype.

One example of where I use the SET datatype in the not-for-profit database is for a person’s role in the organization. For instance, a person can be a donor and a volunteer; or a major donor and have grantwriting skills and be a staff member. Or they can be none of these.

Because of the combinations that can be made, SET is limited to 64 values. Each value is stored in a maximum of 8 bytes — not impressive for one value, but rather small when it could be holding up to 64 values. This is more than enough for an application that needs “roles within a not-for-profit organization,” but not necessarily enough for an application that needs “albums in a photo database.”

Most folks will either normalize their data and have a separate table, or make their own SET field by using a text field and matching. The latter has the benefit of utilizing the relevance of a FULLTEXT search, however, the tables are likely larger because they need to store all the text. And there is no data integrity; values can be spelled wrong.

It would be great if there were a LARGE SET value; although it may well be that the computation of joining is faster than querying a table with a large field (if LARGE SET had a limit of 128 items, instead of SET’s 64, it would be stored in 16 bytes, which is pretty large if most rows only have 1 or 2 values).

I’d be interested to know where folks are using SET and ENUM, and where they are deliberately not using them.

So, in a previous post I talked about smart code. Today I put myself square into a discussion about ENUM and SET. ENUM is an enumerated list of values; similar to a pull-down menu, where the only values allowed in that field are the ones defined, with the option of also having a null column.

The ENUM field takes the idea of normalizing the data and eliminates the need for a join on that data. It also makes data integrity easy — if the value you’re trying to enter is not in the ENUM column definition, MySQL throws an error.

ENUM is not a standard SQL data type. It is MySQL specific.

As an example, in the real world I run a database for an international not-for-profit. Whenever a donation comes in, it is associated with some form of solicitation — either “October 2005 mailing” or “website donation” or “2005 Los Angeles House Party”, etc.

To normalize this data in a pure relational database, we’d have to have a separate table, with 2 fields — an ID, and the name of the solitication. We would add to this table whenever we have an event, or do a mailing. In the “donation” table, we would reference the solicitation via the solicitation ID, and set up referential integrity so that an ID that does not exist could not be entered into the table.

Instead, we take advantage of MySQL’s ENUM column type. I have a script that will alter the table when we need to add a new solicitation, so even non-technical folk can do it (they go to a web page, add the name of the field, and click “submit”). Pull-down menus are also not difficult with a similar script to get all the possible values in the ENUM column.

The reason ENUM is a great field is that it stores all the row values in 2 bytes; there is a limit of 65,535 elements in the ENUM column. This uses a lot less space than the hack of just having a char or varchar field, which is what folks usually do when they do not want to add a join to a query by having another table.

Only one value at a time can be stored in an ENUM field; if you want to store more than one value in a column (which makes your database not relational, by the way), you can use the SET datatype.

One example of where I use the SET datatype in the not-for-profit database is for a person’s role in the organization. For instance, a person can be a donor and a volunteer; or a major donor and have grantwriting skills and be a staff member. Or they can be none of these.

Because of the combinations that can be made, SET is limited to 64 values. Each value is stored in a maximum of 8 bytes — not impressive for one value, but rather small when it could be holding up to 64 values. This is more than enough for an application that needs “roles within a not-for-profit organization,” but not necessarily enough for an application that needs “albums in a photo database.”

Most folks will either normalize their data and have a separate table, or make their own SET field by using a text field and matching. The latter has the benefit of utilizing the relevance of a FULLTEXT search, however, the tables are likely larger because they need to store all the text. And there is no data integrity; values can be spelled wrong.

It would be great if there were a LARGE SET value; although it may well be that the computation of joining is faster than querying a table with a large field (if LARGE SET had a limit of 128 items, instead of SET’s 64, it would be stored in 16 bytes, which is pretty large if most rows only have 1 or 2 values).

I’d be interested to know where folks are using SET and ENUM, and where they are deliberately not using them.

Comments are closed.