Looking for data

I work as a QA Engineer in a “stealth mode” startup building a network storage appliance. I am looking for “real world” datasets to load into our appliance to profile the performance and scalability of the product under different schema models populated with a real-world distribution of data. I am looking for two significantly different datasets. One is a “flat file” schema, such as historical or logging data from web server access and error logs. The other would be a relational (preferably star schema) database, such as a reservation database or an inventory control database.

The data doesn’t need to be current. And it can be scrubbed to remove “real” data. The data won’t be used outside the QA lab. Again, this is to test “how does the product work when data that lives in the outside world is loaded.”

Ultimately, I am looking for 2 to 10 Terabytes of composite data at the end of the project.

MySQL Winter of Code

What happened to the MySQL Winter of Code? Are they waiting for winter in Australia?

I live near Boston, MA and I can tell you it’s definitely winter in the northern hemisphere….

So what are we waiting for?

Well, I can say this — we’re waiting for people. The Winter of Code idea is a great one, particularly since if MySQL works with academic institutions they could help students find Master’s Projects or part of Ph.D. work. Imagine someone writing a new storage engine and having that earn them a Master’s degree. This is exactly what MySQL needs — more people who understand database internals and best theoretical practices to start coding and see where it goes. Note the “more people” — they already have staff that does this.

I’m guessing the Winter of Code is nonexistent because of other big announcements that have been happening; still, I would love to see some collaboration with institutions and universities to give incentives to participants and push them to do it. Class credit or fulfilling graduate requirements would be perfect, and there would be many submissions.

Tying together MySQL and universities would be a great leap forward and a very important move for MySQL, as it would generate more contributions to the code. And the contest!

Access Has “SET”, Recommends Not Using It

http://www.regdeveloper.co.uk/2006/07/18/multivalued_datatypes_access/

This is an interesting read. It would be awesome if MySQL treated the “SET” or “ENUM” data types as a placeholder for a join table that it would create automatically for you. Of course, that would be a new level of functionality: MySQL does not implicitly create permanent tables with any command. But it would be neat.
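To make the idea concrete, here is a rough sketch of the kind of join table a multivalued column stands in for. It is only an illustration: it uses Python's built-in sqlite3 module rather than MySQL, and the table and column names (person, color, person_color) are made up for the example.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a hand-written join table.

    The person_color table plays the role that a SET-style multivalued
    column would otherwise play (all names here are hypothetical).
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE color (id INTEGER PRIMARY KEY, label TEXT NOT NULL UNIQUE);
        CREATE TABLE person_color (
            person_id INTEGER NOT NULL REFERENCES person(id),
            color_id  INTEGER NOT NULL REFERENCES color(id),
            PRIMARY KEY (person_id, color_id)
        );
        """
    )
    conn.execute("INSERT INTO person (id, name) VALUES (1, 'Pat')")
    conn.executemany(
        "INSERT INTO color (id, label) VALUES (?, ?)",
        [(1, "red"), (2, "green"), (3, "blue")],
    )
    # Pat "has" red and blue: two rows instead of one multivalued field.
    conn.executemany(
        "INSERT INTO person_color (person_id, color_id) VALUES (?, ?)",
        [(1, 1), (1, 3)],
    )
    return conn


def colors_for(conn, person_id):
    """Return the color labels linked to a person, ordered by color id."""
    rows = conn.execute(
        """
        SELECT c.label
        FROM color AS c
        JOIN person_color AS pc ON pc.color_id = c.id
        WHERE pc.person_id = ?
        ORDER BY c.id
        """,
        (person_id,),
    ).fetchall()
    return [label for (label,) in rows]
```

Calling colors_for(build_demo(), 1) returns the labels for person 1. The wish in this post is that the database would create and maintain something like person_color for you.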

Top 10 Largest “Databases”

Thanx to Rich McIver for passing along this link:

http://www.businessintelligencelowdown.com/2007/02/top_10_largest_.html

I’m amused mostly because the article uses “database” interchangeably with “data storage”: many of the sites have “digital documents” included in their count, and YouTube makes the list purely on the amount of space its videos take up. But is all this stuff stored in databases? I do not think so. Anyone know for sure?

OurSQL Episode 8: Basic MySQL Security

Listener feedback:

MySQL will go public. Would you buy stock if you had the money? Why or why not?
Call the comment line at +1 617-674-2369 (US phone number)

Use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Leave a message at the Technocation forums:
http://technocation.org/forum

Send an e-mail to podcast@technocation.org

Episode 8 Show Notes:
This episode’s feature is basic MySQL Security. Not only will we discuss what the basic security is, but we’ll discuss the *why*s, not just the *how*s.

Direct play this episode at:
http://technocation.org/content/oursql-episode-8%3A-basic-mysql-security-0

Subscribe to the podcast by clicking:
http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=206806301

You can directly download all the OurSQL podcasts at:
http://technocation.org/podcasts/oursql/

News
MySQL offers an unlimited number of Gold licenses per year for $40,000:
http://mysql.com/products/enterprise/unlimited.html
http://mysql.com/products/enterprise/features.html

MySQL begins to talk about going public: http://www.businessreviewonline.com/os/archives/2007/01/mysql_set_to_jo.html

Learning Resource:
http://www.hackmysql.com

Feature — MySQL Security:
Bruce Schneier’s latest Crypto-Gram newsletter refers to an article where a person gets on an airplane, having bypassed all airport security by climbing a fence.
http://www.schneier.com/crypto-gram-0701.html
http://www.newsobserver.com/102/story/523482.html

Feedback
To leave a comment, suggestion, question or other feedback:

Call the comment line at +1 617-674-2369 (US phone number)

Use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Leave a message at the Technocation forums:
http://technocation.org/forum

Send an e-mail to podcast@technocation.org

Acknowledgements/Sponsors
www.technocation.org
http://music.podshow.com
www.russellwolff.com
http://www.smallfishadventures.com/Home.html “The Thank you song” — Smallfish

OurSQL Episode 7: What’s it Like to be Normal?

In this episode, I go over database normalization in general and explain 1st Normal Form (1NF) in depth.

Direct play episode 7 at:
http://technocation.org/content/oursql-episode-7%3A-what%2526%2523039%3Bs-it-be-normal%3F-1

Subscribe to the podcast by clicking:
http://phobos.apple.com/WebObjects/MZStore.woa/wa/viewPodcast?id=206806301

You can directly download all the OurSQL podcasts at:
http://technocation.org/podcasts/oursql/

Links:
MySQL binaries centralized repository: http://www.dorsalsource.org

SQLzoo

http://www.sqlzoo.net

Links about database normalization:
http://en.wikipedia.org/wiki/1NF

http://www.datamodel.org/NormalizationRules.html

http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html

http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html

Acknowledgements

http://www.technocation.org

http://music.podshow.com

http://www.russellwolff.com

http://www.smallfishadventures.com/Home.html “The Thank you song” — Smallfish

Feedback

If you have any feedback about this podcast, or want to suggest topics to cover in future podcasts, please email

podcast@technocation.org

You can also:

Call the comment line at +1 617-674-2369

Or use Odeo to leave a voice mail through your computer:
http://odeo.com/sendmeamessage/Sheeri

Or use the Technocation forums:
http://technocation.org/forum

Donate to Help Folks Get to the MySQL Users Conference

Phorum needs to get to the MySQL Conference. Perhaps you do, too? Or perhaps you want to help people get there? Technocation, Inc. is a not-for-profit committed to helping folks get the education and networking contacts so important to IT professionals. So, they’re opening up their very first campaign!

Technocation, Inc. is a not-for-profit organization. Your contributions are tax-deductible to the fullest extent of the law. You may choose to donate money, goods or services. Money may be donated through PayPal at http://technocation.org/content/donate-now, and services should be arranged through e-mailing donate@technocation.org . To send payment by mail, see details at . Technocation’s EIN/Tax ID is 20-5445375.
Currently, this campaign is looking for:

Monetary contributions
Graphic design for print media
Physical booth board materials
Transferable frequent flyer miles
Transferable Hyatt hotel points

Technocation will provide a grant application on Thursday, February 15th to anyone interested in applying for grants.

Money may be donated through PayPal at http://technocation.org/content/donate-now, and services should be arranged through e-mailing donate@technocation.org . Directed donations are accepted (e.g., “for conference registration only” or “for travel for Phorum.org members”); simply attach a note with your payment.

Other questions may also be e-mailed to that address, including requesting donations of goods and services not currently on that list.

Women in Open Source

I stumbled across this on the fabulous http://www.everythingsysadmin.com :

from:  http://www.socallinuxexpo.org/wios07/

The Southern California Linux Expo (SCALE) will host a Women in Open Source Event as part of their upcoming 2007 conference, SCALE 5x.

The focus of this event is on the women in the open source and free software communities. The goal of this event is to encourage women to use technology, open source and free software, and to explore the obstacles that women face in breaking into the technology industry. The Women in Open Source event will be held on February 9, 2007 at the Los Angeles Airport Westin Hotel.

I have seen the dearth of women, in system administration as well as database administration, having worked in both fields.  Network administration has fewer women than both system administration and database administration. 

I have noted that there are more women in database administration than system administration.  The MySQL Conference I attended last year had the biggest percentage of women I’ve ever seen at a technical conference.  Checking out the names at http://www.planetmysql.org the first name there is female (mine), and then a few dozen names go by, and most of them are clearly identifiable as male.  There are only a few names that I can’t put a face or picture to, and to which I cannot assign a gender.

I’m not denying that barriers and prejudices exist. In fact, one of the reasons I opted to get a Master’s Degree in Computer Science is that I have no idea who’s looking at me and, consciously or subconsciously, thinking, “She’s a woman, she’s not as good as a man.”

Certainly, there are the usual barriers to entry in the scientific world.  Is open source any different from any other type of “boys’ club”?  My partner is a sleight-of-hand card magician, and he laments that there are few women in the field.  He also laments that he does not see women interested at all — it’s not that they’re starting and being turned away, or getting discouraged, it’s that they’re not even starting.  How would we know if we reach that point in the open source world? 

I have not met with any barriers to my career.  Then again, I grew up with 2 brothers (my sister is 7 years older, so I never really “grew up” with her), and adapted to “living in a world of boys/men” a long time ago.  Mostly it’s just grabbing the bull by the horns and just doing.

My issue with the lack of women in open source is that I wonder how far we can get into it without bringing up the stereotypes.  I am not a shy, quiet woman.  I am not ladylike in many ways.  Many “qualities” that American society teaches its girls and women (don’t interrupt, have a lack of self-confidence, looks are your most important asset so do not seem too intelligent) are the barriers to entry I’ve seen, though not encountered.  That, and the struggle between family and career.  I have seen more and more men struggle with this as well, so it’s not as much of a “female” issue as it has been in the past.

Professionally, I make sure to wear my glasses on a job interview or important meeting, and dress conservatively when I do (I’ve fallen back to wearing glasses all the time because I’ve been too lazy to get more contact lenses).  I actually don’t want to look too pretty on a job interview, and consciously think of this.  So I’m definitely aware of the discrimination that exists, but so far I have been lucky enough to not encounter it, or at least I’m not aware of encountering it.

It’s my belief that women make less than men for the same job because they are less likely to negotiate and hold out for more money, and more likely to take one of the first offers — “good women” don’t “rock the boat”, after all.  (tongue in cheek, of course)

As someone who is rather type A (see http://en.wikipedia.org/wiki/Type_A_personality), I know there are colleagues and co-workers who find me to be too strict, which translates to “she’s a bitch.”  For men, this turns into “he’s too strict and type A.”  The double-standard is annoying, though not something I particularly worry about, since it has not gotten in the way of advancement or recognition, and has not targeted me for anything negative in particular (that I know about).

So, what’s the point of this?  I think it’s good to teach girls and women the skills they need to make it in a male-dominated field.  But I feel it should be taught and instilled at an early age, to both girls AND boys, as “this is the way American/Western society works, and these are ways folks are successful.”

I just feel as though singling out women, while it’s accurate, is saying, “we’re going to put you in the special room because you haven’t learned something you should have” and therefore is somewhat degrading.  Not a lot degrading, but somewhat.

CHAR() vs. VARCHAR()

So, a little gotcha:

The CHAR() and VARCHAR() types are different types. MySQL silently converts any CHAR() fields to VARCHAR() when creating a table with at least 1 VARCHAR() field.

http://dev.mysql.com/doc/refman/5.0/en/silent-column-changes.html

If any column in a table has a variable length, the entire row becomes variable-length as a result. Therefore, if a table contains any variable-length columns (VARCHAR, TEXT, or BLOB), all CHAR columns longer than three characters are changed to VARCHAR columns. This does not affect how you use the columns in any way; in MySQL, VARCHAR is just a different way to store characters. MySQL performs this conversion because it saves space and makes table operations faster.

However, that’s not entirely accurate, because according to the manual page at http://dev.mysql.com/doc/refman/5.0/en/char.html:

As of MySQL 5.0.3, trailing spaces are retained when values are stored and retrieved, in conformance with standard SQL. Before MySQL 5.0.3, trailing spaces are removed from values when they are stored into a VARCHAR column; this means that the spaces also are absent from retrieved values.

If you have a field such as name, and require it to not be blank, you probably have some function testing it before it goes into the database. However, most languages are perfectly happy that “ ” isn’t blank. When it gets put into the database, however, it becomes blank if your column is a VARCHAR (in versions before MySQL 5.0.3, per the manual excerpt above). Which means folks may be able to get around your requirement of a non-blank field, and actually store a blank field in the database (as opposed to storing a space or series of spaces).
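Here is a small sketch of the gotcha in Python. The function names are made up for illustration, and stored_in_old_varchar only imitates the trailing-space stripping the manual describes for VARCHAR before 5.0.3; it does not talk to a real MySQL server.

```python
def is_blank_naive(value: str) -> bool:
    """The kind of check many applications use: only the empty string is blank."""
    return value == ""


def stored_in_old_varchar(value: str) -> str:
    """Imitate pre-5.0.3 VARCHAR storage, where trailing spaces are removed.

    This is a stand-in for the database, not a real MySQL call.
    """
    return value.rstrip(" ")


def is_blank_safe(value: str) -> bool:
    """Treat whitespace-only input as blank before it reaches the database."""
    return value.strip() == ""


if __name__ == "__main__":
    name = "   "  # a user submits only spaces
    print("naive check says blank:", is_blank_naive(name))
    print("what the old VARCHAR would keep:", repr(stored_in_old_varchar(name)))
    print("safe check says blank:", is_blank_safe(name))
```

Stripping whitespace before testing for blankness is one way to close the gap, assuming whitespace-only names are never valid in your application.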
