Choosing Datatypes for Fields

Twice recently I have had to import data from a flat file into a database — both sets of data were being imported to make better use of the data in the files. Importing a file is not that difficult using mysqlimport, particularly since in both cases the data was tab-delimited. I also could have used LOAD DATA INFILE.

The problem with mysqlimport and LOAD DATA INFILE is that you need to create the table beforehand. Often, though, I may not know the exact data types. Folks have written about PROCEDURE ANALYSE(), but they use it in the context of “you can check to see that your tables have the correct data types.”

Now, that’s always a good thing — say, quarterly or yearly, doing a schema review, including indexes, foreign keys and data types. However, where PROCEDURE ANALYSE() really comes in handy for me is when I am importing data.

Basically, I create a table with the correct number of fields, all with the datatype of tinytext. Then I run a SELECT * FROM tbl PROCEDURE ANALYSE()\G to find out what types my data really want to be.

Of course, you want to pay attention to warnings when importing, especially truncation warnings. Generally I use this method a few times on the data, and refine the column types a few times before I get it right.

My one complaint is that I wish PROCEDURE ANALYSE() would state the current column datatype as well as the “ideal” data type. It wastes a lot of time for me to make sure that the “ideal” data type is the one I already have, and it should not be that difficult for the function itself to do it.

Twice recently I have had to import data from a flat file into a database — both sets of data were being imported to make better use of the data in the files. Importing a file is not that difficult using mysqlimport, particularly since in both cases the data was tab-delimited. I also could have used LOAD DATA INFILE.

The problem with mysqlimport and LOAD DATA INFILE is that you need to create the table beforehand. Often, though, I may not know the exact data types. Folks have written about PROCEDURE ANALYSE(), but they use it in the context of “you can check to see that your tables have the correct data types.”

Now, that’s always a good thing — say, quarterly or yearly, doing a schema review, including indexes, foreign keys and data types. However, where PROCEDURE ANALYSE() really comes in handy for me is when I am importing data.

Basically, I create a table with the correct number of fields, all with the datatype of tinytext. Then I run a SELECT * FROM tbl PROCEDURE ANALYSE()\G to find out what types my data really want to be.

Of course, you want to pay attention to warnings when importing, especially truncation warnings. Generally I use this method a few times on the data, and refine the column types a few times before I get it right.

My one complaint is that I wish PROCEDURE ANALYSE() would state the current column datatype as well as the “ideal” data type. It wastes a lot of time for me to make sure that the “ideal” data type is the one I already have, and it should not be that difficult for the function itself to do it.

Comments are closed.