Final Videos of Open DB Camp Online:

The final videos from Open DB Camp, held back in May in Sardinia, Italy, are now online. The full matrix of sessions, videos, and slides can be found on the schedule page.

Hands on JDBC by Sandro Pinna – video

“MySQL Plugins, What are They? How you can use them to do wonders” by Sergei Golubchik of MariaDB – video

The State of Open Source Databases by Kaj Arnö of SkySQL – video

Coming soon, videos from OSCon Data!

Upcoming Free IOUG Webinar: Securing MySQL

Next week, on Wednesday, August 10, 2011, I will give a free IOUG webinar on Securing MySQL from 11:00 AM – 12:00 PM CDT (17:00 GMT):

Securing MySQL is extremely important, but it is often not done properly. I will explain the different ways to secure MySQL. In addition to securing users and privileges, file permissions and encrypted connectivity will be discussed. The MySQL server options that contribute to MySQL security will be pointed out, along with tips for eliminating insecure external scripts. For those who want more auditing capabilities, this session will explain how to see all login attempts (successful and not) and how to lock out accounts with repeated failed logins. The session will conclude with guidelines about how to create security policies for your organization.
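As a taste of the kind of hardening the session covers, here is a minimal SQL sketch of locking down accounts and requiring encrypted connections. This is my own illustration, not material from the webinar, and the database, user, and host names are made up:

-- find and remove anonymous accounts (if any)
SELECT user, host FROM mysql.user WHERE user = '';
DROP USER ''@'localhost';

-- give an application account only the privileges it needs, over SSL
GRANT SELECT, INSERT, UPDATE, DELETE ON appdb.* TO 'appuser'@'10.0.0.%'
  IDENTIFIED BY 'use-a-strong-password-here' REQUIRE SSL;

-- audit which accounts hold dangerous global privileges
SELECT user, host FROM mysql.user WHERE Super_priv = 'Y' OR File_priv = 'Y';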

To register for this webinar, visit https://www1.gotomeeting.com/register/979260992.  

More Videos from Open DB Camp

I have gotten around to uploading more of the videos from Open DB Camp in Sardinia, Italy, back in May:

Henrik Ingo speaks about Xtrabackup Manager – video

Linas Virbalas speaks about “Flexible Replication: MySQL -> PostgreSQL, PostgreSQL to MySQL, PostgreSQL to PostgreSQL” – video – slideshare slides

MySQL to MongoDB replication (hackfest results) – video 

Robert Hodges of Continuent speaks about Multi-Master Replication: Problems, Solutions and Arguments – video

There are a few more videos from Open DB Camp to put up, and then I will start putting up the content from OSCon Data!

Liveblogging at OSCON Data: Drizzle, Virtualizing and Scaling MySQL for the Future

Brian Aker presents “Drizzle, Virtualizing and Scaling MySQL for the Future” at OSCon Data 2011

http://drizzle.org

irc.freenode.net #drizzle

http://blog.krow.net

@brianaker

2005 MySQL 5.0 released – web developers wanted tons of features that were not in the release (better replication, for instance)

2008 Sun buys MySQL

2008 MySQL 6.0 is forked to become Drizzle

2009 Oracle buys Sun

2010 Drizzle developers leave Oracle

2011 First GA release, Drizzle7

MySQL’s Architecture – monolithic kernel, not very modular, lots of interdependence.

Drizzle has a microkernel, which includes a listener, parser, optimizer, executioner, storage system, logging/error reporting.

Drizzle can accept SQL and http blog streaming, and memcached and gearman can easily talk to Drizzle.

Drizzle has tried to have no “gotchas”

– If you make a new field with NOT NULL, MySQL makes new values NULL.  Drizzle does not do this.

– No hacky ALTER TABLE

– Real datetime (64 bit), including microseconds

– IPv6 (apparently IPv6 support is a strong reason for people switching)

– No updates that complete halfway

– Default character set is UTF-8 and default collation is utf8_general_ci (in MySQL the default character set is latin1 and the default collation is latin1_swedish_ci – “case insensitive”); see the quick SQL check below
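For reference, here is a quick way to see the MySQL side of this gotcha and to opt into Drizzle-like defaults per table. This is my own sketch, not from the talk, and the table name is made up:

-- MySQL server defaults (unless overridden in the config)
SHOW VARIABLES LIKE 'character_set_server';   -- latin1
SHOW VARIABLES LIKE 'collation_server';       -- latin1_swedish_ci

-- opt into UTF-8 per table, matching Drizzle's defaults
CREATE TABLE t (s VARCHAR(100))
  DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;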

Replication

– In MySQL, replication is kind of hacky [this is my summary and opinion, but it’s basically what Brian said]

– Drizzle replication is based on Google Protocol Buffers

– Replicates row transformations

– Integrates with RabbitMQ, Cassandra, Memcached, Gearman — right now.

DML and the analog of MySQL’s binary logs:

– DML is stored transactionally by delta in Drizzle

– InnoDB is already logging, so there is no need to add another log for the binary log; Drizzle just logs DML to the transaction log.

LibDrizzle

– supports Drizzle, MySQL, SQLite

– Asynchronous

– BSD-licensed, so lawyer-free

What else?

– No-cost authentication (PAM, LDAP, htaccess, …)

– Table functions (new data dictionary, including performance and thread information).  INFORMATION_SCHEMA in Drizzle is *exactly* what’s specified in the SQL standard.

– Data types – native type for UUID, boolean, all known types (except SET, because it’s broken by design)

– Assertions are in Drizzle; you can ask what the type of the result of combining multiple data types will be.

– About 80 conflicts in the Drizzle parser as opposed to about 200 in the MySQL parser

Roadmap – Drizzle7+

– Replication – faster than MySQL and also allows multiple masters.

Virtualization:

Virtualizing a database gives you about a 40% performance hit. How can costs be cut? In MySQL 5.0 the Instance Manager was created to solve that, but it hasn’t really been worked on. Drizzle has instead worked on virtualizing databases internally.

– So Drizzle now has catalogs.

– Each catalog has its own set of users, its own schemas with tables, etc.

– A catalog is its own sandbox; there is no syntax that allows you to connect from one catalog to another, so there are no security problems.

– Cuts the 30-40% hit from virtualizing

– Single-instance maintenance – only 1 OS and 1 database to configure, unlike VMs

    – Currently there is only one database configuration, so there is one global config for shared memory such as the InnoDB buffer pool, but that will change in the future.

– Still allows for I/O spread on SAN/NAS

 

In Drizzle 7.1, Percona’s xtrabackup supports Drizzle and ships with Drizzle. xtrabackup supports full and partial backups with no locking, and is a single solution for point-in-time recovery. Because the transaction log is stored in the database, replication is automatically consistent with the database. It currently does not do incremental backups with the transaction logs, but that is planned for the future.

DBQP:

– consolidates standard testing tasks, server/test management, reporting, REGRESSION TESTING

– extended architecture allows for complex testing scenarios

– pluggable – supports new testing tools

– randgen, sql-bench, crashme, sysbench, standard drizzle-test-run suite

– Keeping tools and testing configurations in-tree facilitates testing for everyone

– supported by SkySQL

 

Dynamic SQL/execute()

– New UTF-8 parser

– Being extended to allow for plugging in application servers.

 

>120 developers since day 1

An average of 26-36 developers commit per month.

 

Bugs database – http://bugs.launchpad.net/drizzle

Liveblogging at OSCON Data: MongoDB Schema Design

Dwight Merriman gives “MongoDB Schema Design” at OSCon Data 2011.

@dmerr

 

RDBMS / MongoDB

relational / document-oriented

database / database

table / collection

row / JSON (BSON) document

index / index

column / field (dynamic/not predeclared)

SQL / Mongo query language (JSON)

Join / Embedding & Linking

Primary Key / _id field

 

Schema design is coupled with what you want to do:

– Dynamic queries

– Secondary indexes

– Atomic updates

– Map Reduce

Considerations:

– no joins

– atomic operations are at the single document level only

– types of queries/updates to do

– sharding

– speed

 

This is the command-line mongo interface, but all of this can be done in any (modern) programming language.

post = {author: "Herge",
    date: new Date(),
    text: "Destination Moon",
    tags: ["comic", "adventure"]}

> db.posts.insert(post)

“posts” is the collection name.  Documents are analogous to rows but can be more complex.  Documents for one collection are grouped together.

> db.posts.find()

{ _id: ObjectId("4c4ba5c0672…."),
    author: "Herge",
    date: new Date(),
    text: "Destination Moon",
    tags: ["comic", "adventure"]}

_id must exist and must be unique – if you don’t create an _id, one will be made for you: 12 bytes of BSON, shorter than a normal UUID, but that’s OK because you don’t need it to be globally unique, just unique within this db cluster.

Secondary index, on “author”:

> db.posts.ensureIndex({author: 1})  // 1 means ascending, -1 means descending

> db.posts.find({author: 'Herge'}).explain()  // shows you the explain plan

 

Multi-key indexes

//build an index on the “tags” array

> db.posts.ensureIndex({tags: 1})

Arrays are exploded and every element of the array will be indexed, and added separately to the B-tree data structure of the index.

> db.posts.find({tags: 'comic'})

MongoDB assumes, when you query an array, that you mean you’re looking for an array item that matches.

 

Query operators

Conditional operators:

$ne, $in, $nin, $mod, $all, $size, $exists, $type, $lt, $lte, $gt, $gte

Update operators:

$set, $inc, $push, $pop, $pull, $pushAll, $pullAll

Extending the Schema:

new_comment = {author: "Kyle",
    date: new Date(),
    text: "great book!",
    votes: 5}

> db.posts.update({text: "Destination Moon"},   // this is the WHERE filter
    {'$push': {comments: new_comment},          // do this
     '$inc': {comments_count: 1}})              // and do this

If you $push onto the comments array before it exists, it will be created without a problem.

 

> db.posts.ensureIndex({"comments.author": 1})

> db.posts.find({"comments.author": "Kyle"})

> db.posts.find({"comments.text": "good book"})

The ‘dot’ operator

 

Find all posts with a comment that has more than 50 votes:

> db.posts.find({"comments.votes": {$gt: 50}})

Not as robust as all the operators in SQL, but it’s pretty good, and more concise than SQL. Over time more expressions will be added.

Find all posts with a comment that has more than 50 votes, ordered by author ascending:

> db.posts.find({"comments.votes": {$gt: 50}}).sort({author: 1})

No functional indexes (indexes on functions of fields).

If you index a field that does not exist in a document, it is indexed as NULL (which is necessary because not all documents have the same fields).

From a schema design standpoint, the point of MongoDB is to make the documents rich. He puts up an example of a sales order with many line items, an address field that has a name, street, and zip, and a cc field that has a number and expiration date.
There is model inheritance; for instance, if you have:

> db.shapes.find()
{_id: 1, type: "circle", area: 3.14, radius: 1}
{_id: 2, type: "square", area: 4, d: 2}
{_id: 3, type: "rect", area: 10, length: 5, width: 2}

All shapes have an area, but the other dimensions are different based on the shape.

> db.shapes.find({radius: {$gt: 0}})
– automatically finds only circles.

Note that this avoids the need to join for 1:many and many:many relationships, as in the relational model.
That was embedding; now let’s talk about linking.
– done client-side

 

So, for instance, a one-to-many relationship might look like this – in addition to the collection for posts, a collection for authors with author info:

// collection authors
{ _id: "Herge",
  email: "h@foo.com",
  karma: 14.142
}

> var p = db.posts.findOne()
> var author = db.authors.findOne({_id: p.author})
> print(author.email)
If it’s a “contains” relationship you want to embed
If you need more flexibility than that, link
Rich documents are usually easy to query

 

Rich documents are great for performance

 

Rich documents give more atomicity capability:

{
  account: "abc",
  debits: 21,
  credits: 11
}

> db.stuff.update({account: 'abc'},
    {$inc: {debits: 21, credits: -11}})

(MongoDB has no $dec operator, so the decrement here is done with a negative $inc.)
Caching is based on 4k pages, so if you have very small documents, that can be a problem if you are pulling from many collections.
Trees in MongoDB:

{ comments: [
    { author: "Kyle", text: "…",
      replies: [
          { author: "Fred", text: "…",
            replies: [] }
      ] }
] }

Mongo doesn’t search recursively, so while this nesting is great for display, it’s not great for search.
> t = db.mytree;
> t.find()
{ "_id" : "a" }
{ "_id" : "b", "ancestors" : [ "a" ], "parent" : "a" }
{ "_id" : "c", "ancestors" : [ "a", "b" ], "parent" : "b" }
{ "_id" : "d", "ancestors" : [ "a", "b" ], "parent" : "b" }
{ "_id" : "e", "ancestors" : [ "a" ], "parent" : "a" }
{ "_id" : "f", "ancestors" : [ "a", "e" ], "parent" : "e" }
{ "_id" : "g", "ancestors" : [ "a", "b", "d" ], "parent" : "d" }

> t.ensureIndex( { ancestors : 1 } )

> // find all descendants of b:
> t.find( { ancestors : 'b' } )
{ "_id" : "c", "ancestors" : [ "a", "b" ], "parent" : "b" }
{ "_id" : "d", "ancestors" : [ "a", "b" ], "parent" : "b" }
{ "_id" : "g", "ancestors" : [ "a", "b", "d" ], "parent" : "d" }

> // get all ancestors of f:
> anc = db.mytree.findOne({_id: 'f'}).ancestors
[ "a", "e" ]
> db.mytree.find( { _id : { $in : anc } } )
{ "_id" : "a" }
{ "_id" : "e", "ancestors" : [ "a" ], "parent" : "a" }
The document size limit is 16 MB per document. It was 4 MB, and will probably keep increasing, roughly along with Moore’s Law. The limit is arbitrary, just a safety measure.

BSON is not compressed, in order to keep scans fast.

You can do queueing too: use the findAndModify command to find the highest-priority job and mark it as in-progress – see http://www.mongodb.org/display/DOCS/findAndModify+Command

The operation is fast; however, the entire document is locked – not ideal, but concurrency is getting better and better in MongoDB.

MongoDB tries to update documents in place; if you grow a document such that it no longer fits in its allocation unit, it has to be deleted and reinserted. This is expensive, hence the allocation unit – an adaptive padding factor per collection.

Video: Henrik Ingo’s “Buildbot & EC2: how to qa and package your db product”

And another video from OpenDBCamp is online…. Today it is from Henrik Ingo – “Buildbot & EC2: how to qa and package your db product”.

I cannot seem to find the slides Henrik used, but the video is now online at http://www.youtube.com/watch?v=07tsdSvR5C0.

By the way, there are plans to video *all* of the sessions at OSCon Data next week (MySQL and otherwise), which is what made me look in my “video to do” folder earlier this week and realize that I had not put up all the OpenDBCamp videos yet!

Video: High Performance Search with BlackRay Data Engine

I realized yesterday that I never did finish putting up the videos from this year’s Open Database Camp back in May, so I’m working on finishing that in the next few weeks.

Today I put up “High Performance Search with BlackRay Data Engine” – Felix Schupp. The slides are on the web at http://www.slideshare.net/fschupp/blackray-the-open-source-data-engine-2011683, and the video is on YouTube at http://www.youtube.com/watch?v=oCB3ZXfc8Rs.

Why We Need the Bugs Database to Stay Public

There have been many non-quantifiable (but valid) reasons why the bugs database at http://bugs.mysql.com should remain open and have as many bugs visible to the public as possible. However, while researching an article recently I uncovered a simple, concrete reason why the bugs database needs to stay as open as possible:

So we know when to upgrade.  Really!  I was looking at the CHANGELOG pages for MySQL 5.5 and was trying to figure out which changes were pushed into MySQL 5.5 along with being pushed into MySQL 5.1 and which were unique to MySQL 5.5.  

While it is fairly simple to figure out some of that – for instance, all of the patches regarding semisynchronous replication are unique to MySQL 5.5 – it was not clear for most of the fixes, particularly the bug patches. Luckily we have the bugs database, so we can double-check this.

Let’s say you work for an organization that is contemplating whether or not to upgrade from MySQL 5.1.50 to MySQL 5.5.  Looking at the change logs for MySQL 5.5.0, if you can manage to read the entirety of the page, you’ll see a change that notes:

Appending values to an ENUM or SET definition is a metadata change for which ALTER TABLE need not rebuild the table, but it was being rebuilt anyway. (Bug #45567)
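To make the quoted change concrete, this is the kind of statement the fix applies to – appending a value to the end of an ENUM definition. The table and column here are my own made-up illustration, not from the changelog:

-- before the change, the column was: status ENUM('new', 'paid', 'shipped')
ALTER TABLE orders
  MODIFY COLUMN status ENUM('new', 'paid', 'shipped', 'cancelled') NOT NULL;
-- with the fix, appending 'cancelled' at the end is a metadata-only change
-- and no longer rebuilds the table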

Well, that sounds like a great bugfix – making an ALTER TABLE operation that’s supposed to be online actually an online change. However, if you look at the bug itself, you will note that it was “Pushed into 5.1.40” as well. Thus, if you are already on MySQL 5.1.50, you already have that change.

Without the bugs database it would be virtually impossible to know this — you’d have to cross-reference the changelogs for all the MySQL 5.1 minor versions, or at least the ones that were being worked on in that timeframe….which might narrow it down to 2 or 3 versions, but it’s still excessive.

And of course, as fewer and fewer of the bugs are public, we are left trying to figure out what exactly the description on the CHANGELOG page refers to, without having the ability to search the bugs database.

So please, keep the bugs database as open as possible.

Not just so we don’t submit duplicates, wasting Oracle engineers’ (and our own) time.  

Not just so we can submit patches for fixes.

Not just so we can see that a bug exists or is being worked on.

Not just so we can know what a bugfix really entails.

But also so we can know what bugs actually affect us, and what will actually be fixed on an upgrade from one version to the next.

Liveblogging: Cassandra Internals

Cassandra Internals by Gary Dusbabek of Rackspace
Questions?
What’s the best way to access data if you’re running a program in the same JVM as Cassandra? — will talk about it during StorageProxy section of the talk
Performance characteristics of using MMAP vs. not using it? – won’t cover it.
When does repair happen?  will talk about it during repair part of the talk
How do Snitch and replication strategy work together? — will discuss though there is no slide on it.
Ring services – services that go throughout the ring.  These are in a class called StorageService.
Storage services – things that happen locally.  In a class called StorageProxy.
The cassandra executable in /bin executes cassandra.in.sh, which does:
– sets $CLASSPATH
– looks for the .jar files
– sets $CASSANDRA_CONF (mandatory, where yaml file lives)
– sets $CASSANDRA_HOME (not mandatory)
then it looks for another file [didn’t get what it was] which:
– determines heap size
– sets max heap size by default to 1/2 available memory
– sets the size for the young generation for Java GC
– sets “a whole bunch of other -X options for Java”
… then it goes to the main() class, org.apache.cassandra.thrift.CassandraDaemon, which:
extends AbstractCassandraDaemon, the guts of the startup sequence. It has a method called setup(), which reads the config file via the DatabaseDescriptor class.
“Database Descriptor is an awful class.”
– loads yaml file, reads into a config object, gets all the settings.
– then calls DatabaseDescriptor.loadSchemas() and loads the schema based on the last versionID, and sets them up to store them in the system column families (in the system datadir, schema column family).
– scrubs the data directories, takes out the trash (e.g. leftovers from compaction, bits and pieces from old SSTables)
– initializes the storage (keyspaces + CFs)
– Commit log recovery: CommitLog.recover() (row mutations)
– StorageService.initServer() and StorageService.joinTokenRing — this is where the magic of joining the ring happens
  — starts gossip
  — starts MessagingService
  — Negotiates bootstrap
  — knowledge of ring topology is in StorageService.tokenMetadata_ (btw underscore at end of a member variable means it’s old facebook stuff, b/c that’s their naming convention)
  — partitioner is also here.
Configuration
– in DatabaseDescriptor, really a side effect of AbstractCassandraDaemon.setup
– reads config settings from yaml
– defines system tables
– changes regularly
It uses a static initializer, so we might end up making a change that happens when we’re not ready for it.
MessagingService
– Verb handlers live here (initialized from StorageService)
 — main event handlers, haven’t changed much
– Socket listener
 — 2 threads per ring node
– Message gateway
 — MessagingService.sendRequestResponse()
 — MessagingService.sendOneWay()
 — MessagingService.receive() — when another node contacts you, this is the method that’s used to pass the message to a verb handler
– Messages are versioned starting in 0.8
 — with IncomingTCPConnection
StageManager – fancy java ThreadPoolExecutor
– SEDA design: http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf
Adding the API Methods
– open up the cassandra.thrift file in the interface directory; this is where you describe methods and new data structures
– regenerate files with ant gen-thrift-java gen-thrift-py
– implement stubs: o.a.c.thrift.CassandraServer
StorageProxy – where local reads and writes happen.
– Called from o.a.c.thrift.CassandraServer
– write path changed in new version b/c of counters
 — notion of WritePerformer
– eventually to Table and others
– for reads, there’s a local read path and remote read path
 — Socket->CassandraServer.  Looks at permissions, request validation, and marshalling.  
ReadCommands are created in CS.multigetSliceInternal and passed to StorageProxy – 1 per key.
StorageProxy iterates over the ReadCommands, then runs StorageProxy.read(), .fetchRows(), determines endpoints.
Locally, StorageProxy:
– READ stage executes a LocalReadRunnable
– True read vs. digest
– Table, ColumnFamilyStore
Remotely, StorageProxy:
– serializes read command
– Response handler
– Send to remote nodes
ReadRepair happens in StorageProxy.fetchRows()
Writing follows a similar pattern to reads – there is a local path and a remote path.
– The marshalling turns into row mutations in CS.doInsert()
– StorageProxy.sendToHintedEndpoints
– RowMutation – one key per row (several CFs), so it calls ColumnFamilyStore.apply() to update the memtables.
RowMutation is serialized into a Message.
Then he goes into the challenges of working with the code, which I won’t reproduce here.