Since I introduced MongoDB on Codaset, I've seen a considerable slowdown in the page load of user dashboards, especially dashboards which contain a lot of events. I spent quite a bit of time trying to figure out what was slowing the page down. First I spent time optimising the Mongo queries used to return the events. I then began to wonder if something other than MongoDB was the cause, but I quickly ruled that out and returned to the Mongo queries.
Yesterday, I had a spare hour or two, so I hopped into the #mongodb IRC room and asked a few questions, which were promptly answered by the great guys at 10gen (they build MongoDB). In short, JavaScript queries are bad, and very, very slow!
Basically, Mongo does not currently support the OR operator. I've been assured that it is forthcoming, but right now there is no alternative other than to write your queries using the $where operator, which involves building a query in JavaScript. While this is actually quite cool - and a little strange to be writing JS code within Ruby code - it's also extremely inefficient. As Uncle Ben once said: "with great power comes great responsibility".
db.mycollection.find( { $where : function() { return this.a == 3 || this.b == 4; } } );
The above is a very simple example of using the $where operator and JavaScript. You pass it a JavaScript function, which can contain pretty much anything you want. That is why it can be quite powerful and flexible, but it is also why it is very slow: the JS function is called on every single document in your collection, and no index can help. Not good!
My query was a little more involved than the above, as I needed to check several keys for their existence, and then match them against the values I provide. But I don't want to return documents only when every condition matches. I want to return documents if they match any one of the conditions.
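To get a feel for why calling a function on every document hurts, here is a toy illustration in plain JavaScript. It is only a sketch and says nothing about how Mongo works internally: the `docs` array, `scanWhere` and `buildIndex` are all made-up stand-ins for a collection, a $where-style scan and an index.

```javascript
// Hypothetical in-memory stand-in for a collection; not MongoDB's real storage.
const docs = [];
for (let i = 0; i < 10000; i++) {
  docs.push({ _id: i, a: i % 100, b: i % 250 });
}

// $where-style matching: the predicate runs once per document.
let predicateCalls = 0;
function scanWhere(collection, predicate) {
  return collection.filter((doc) => {
    predicateCalls++;
    return predicate.call(doc);
  });
}

// Index-style matching: group documents by value once, then look up directly.
function buildIndex(collection, field) {
  const index = new Map();
  for (const doc of collection) {
    if (!index.has(doc[field])) index.set(doc[field], []);
    index.get(doc[field]).push(doc);
  }
  return index;
}

const scanned = scanWhere(docs, function () { return this.a === 3; });
const indexOnA = buildIndex(docs, "a");
const lookedUp = indexOnA.get(3) || [];

console.log(predicateCalls); // one predicate call per document in the collection
console.log(scanned.length === lookedUp.length); // both find the same documents
```

The scan has to touch all 10,000 documents to find its matches, while the lookup goes straight to them once the index exists.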
One of the JS functions I was passing was something like this:
function(){
  var includePrivate = true;
  var projects = [1,2,3,4,5,6,7];
  var members = [5,6,7,8,9,11,56,23,544];
  function oc(a) {
    var o = {};
    for (var i=0; i<a.length; i++) o[a[i]]='';
    return o;
  }
  if (this.group && this.group['id'] == #{group.id}) {
    if (includePrivate || !this.project) return true;
    if (this.project) {
      if (this.project['state'] != 'private') return true;
      if (projects.length > 0 && this.project['id'] in oc(projects)) return true;
    }
  }
  if (projects.length > 0 && this.project && this.project['id'] in oc(projects)) {
    if (includePrivate || this.project['state'] != 'private') return true;
  }
  if (this.user['id'] in oc(members)) {
    if (includePrivate || !this.project) return true;
    if (this.project) {
      if (this.project['state'] != 'private') return true;
      if (projects.length > 0 && this.project['id'] in oc(projects)) return true;
    }
  }
  return false;
}
Yeah, I know!! Nasty, but necessary.
After a little more investigation, it appears that if I break the above down into several native Mongo queries and concatenate the results, I get those results much faster than with the single query above. Which kicks ass!! I'd love to show you how the above translates into separate native queries, but I haven't actually converted this one yet. Whatever it converts to, I have no doubt that it will be much faster. I have already converted the user events queries that were slowing down the user dashboard, which has given it an instant speed boost.
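The "concatenate the results" step could look something like the sketch below. It is not the actual Codaset code: `mergeResults` is a made-up helper, the two arrays stand in for whatever two native queries (say `{ a: 3 }` and `{ b: 4 }`) would return, and I'm assuming every document has a unique `_id` and a `created_at` date.

```javascript
// Merge several result arrays into one list without duplicates.
// Assumes every document carries a unique `_id` and a `created_at` Date.
function mergeResults(...resultSets) {
  const seen = new Map();
  for (const results of resultSets) {
    for (const doc of results) {
      const key = String(doc._id);
      // A document matching more than one query is kept only once.
      if (!seen.has(key)) seen.set(key, doc);
    }
  }
  // Newest first, as an activity stream would show them.
  return [...seen.values()].sort((x, y) => y.created_at - x.created_at);
}

// Hypothetical results of two native queries, e.g. { a: 3 } and { b: 4 }.
const matchesA = [
  { _id: 1, a: 3, b: 0, created_at: new Date("2010-03-01") },
  { _id: 2, a: 3, b: 4, created_at: new Date("2010-03-03") },
];
const matchesB = [
  { _id: 2, a: 3, b: 4, created_at: new Date("2010-03-03") },
  { _id: 3, a: 0, b: 4, created_at: new Date("2010-03-02") },
];

const merged = mergeResults(matchesA, matchesB);
console.log(merged.map((doc) => doc._id)); // [ 2, 3, 1 ]
```

One thing to watch: sorting and limiting have to happen after the merge, so each individual query needs to return at least a full page of results before you trim the combined list.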
I always wondered why my Mongo indexes never seemed to have actually made any speed improvements! ;)
Codaset has always had fully integrated spam protection for all your projects' tickets and comments, provided by Akismet, along with an easy to use interface for managing flagged spam. It lets you mark items as not spam (ham), or delete any confirmed spam. But now, hot on the heels of this week's major release, I'm happy to say that you can turn off spam protection for any of your projects.
This means that the tickets and comments posted to a project that has spam protection turned off will NOT be checked for possible spam. Also, any newly created private projects will now be created with spam protection turned off. It's a private project, so you don't really need to protect against spammers, as they cannot see the project anyway.
Of course, you can turn it on and off any time you want via your project's spam protection admin screen.
Enjoy!
It's been a long time coming but, since we launched back on March 1st, I've been working hard on a few really cool and very important things. And last night these were unleashed on the world. So let's run through the list of what is new and improved...
First up are groups! Codaset has always had projects and users at its heart, but these on their own can be a little limited. I wanted a way to gather a bunch of related projects together, and gather a few users into the mix, all under one easy to find screen. So that's what we now have.
Groups are free, even private ones, and you can create one right now. Add any or all of your projects to any group you create, and invite other users to join. Or you can even just open your group up, and let anyone join themselves.
I'll be building even further upon groups in the future, but right now, please play in any way you can. Would love to see what you do with them.
As I mentioned in my previous post a few days back, Codaset has a somewhat complex events system, which is growing all the time, both in terms of complexity, and in usage and disk space. So this release attempts to free the internal events system by introducing a MongoDB-backed infrastructure. This makes it much easier to scale the database schema (MongoDB has no need for pre-defined schemas). If you want to know more, check out my post on how Codaset is using MongoDB.
Codaset's permissions and member roles system has always been pretty powerful. But now it is even more powerful, after receiving some backend love in the form of an improved code base and database schema. And, by popular demand, the UI is now much improved, making it really easy to manage your projects' (and your groups') members and their permissions. I have also added in a few extra permissions for you to control, and will be adding more.
The Codaset Push API now has even more events for you to push out to your webhooks, or to Twitter. The Push API already makes it easy for you to push project and user events to Twitter, so I added a few more, and also added support for Group events too.
And finally, it is now possible to move any of your projects to another Codaset user, effectively giving the project to a new owner. Just check your project's main admin/edit screen for more on this.
Obviously, there are a lot more bits and bobs that I included in this release, as well as the usual slew of bug fixes, but I really hope that you like the new stuff that I mentioned above. The next major release will be all about the developer API, which I have already started work on. But until then, please let me know what you think of these improvements, and enjoy!
So immediately after launching the final release of Codaset back at the start of the month, I began work on implementing support for groups, which will allow you to manage any number of projects and users under a single umbrella. As I began, I realized that the current permissions system would need a little work in order to support groups. Then I remembered a few tickets requesting a few more specific permissions, allowing you to control even more of what can or cannot be done within a project.
So I figured I would kill two birds with one stone, and the permissions system was given a complete rewrite. In fact, it is almost at a stage when I could release it as a Rails plugin. But I think it's a little too specific right now. But we shall see.
Anyway, I then started adding event types for groups. Events in Codaset are what make up your activity streams, and are quite an integral part of the site. Currently, most actions that you perform on Codaset are logged as an event into a single MySQL table. It looks like this:
mysql> describe events;
+-----------------------+--------------+------+-----+---------+----------------+
| Field                 | Type         | Null | Key | Default | Extra          |
+-----------------------+--------------+------+-----+---------+----------------+
| id                    | int(11)      | NO   | PRI | NULL    | auto_increment |
| user_id               | int(11)      | YES  | MUL | 1       |                |
| project_id            | int(11)      | YES  | MUL | NULL    |                |
| action                | varchar(50)  | NO   |     | NULL    |                |
| data                  | text         | YES  |     | NULL    |                |
| notes                 | text         | YES  |     | NULL    |                |
| user_email            | varchar(255) | YES  |     | NULL    |                |
| target_id             | int(11)      | YES  |     | NULL    |                |
| target_type           | varchar(20)  | YES  | MUL | NULL    |                |
| created_at            | datetime     | YES  |     | NULL    |                |
| updated_at            | datetime     | YES  |     | NULL    |                |
| secondary_target_id   | int(11)      | YES  | MUL | NULL    |                |
| secondary_target_type | varchar(20)  | YES  |     | NULL    |                |
+-----------------------+--------------+------+-----+---------+----------------+
13 rows in set (0.01 sec)
Not the nicest of schemas, I know, and it's a little clunky when trying to show this data in some sort of activity stream. The data contained within it could be about any number of models: projects, users, blogs, wiki pages, comments, commits, pushes, etc. And each of these models has its own mostly unique set of attributes. There are currently 104,226 rows in this events table, and it's growing at a steady rate.
So after reading some really interesting and very intriguing posts by John Nunemaker about a NoSQL database called MongoDB, I began shopping around for possible solutions to cleaning up events, and making them easier to work with. Needless to say, the NoSQL movement is pretty active at the moment, but I kept coming back to MongoDB.
MongoDB (from "humongous") is a scalable, high-performance, open source, dynamic-schema, document-oriented database. Written in C++, MongoDB features:
- Document-oriented storage (the power and flexibility of JSON-like data schemas)
- Dynamic queries
- Full index support, including secondary indexes, inner-objects, embedded arrays, geospatial
- Query profiling
- Fast, in-place updates
- Efficient storage of binary data large objects (e.g. photos and videos)
- Replication and fail-over support
- Auto-sharding for cloud-level scalability
- MapReduce for complex aggregation
- Commercial Support, Training, and Consulting
MongoDB bridges the gap between key-value stores (which are fast and highly scalable) and traditional RDBMS systems.
Sounds perfect, and a lot of the examples and tutorials I read seem to talk mostly about an events system. So why the hell not, I thought.
Now I have to admit that my Mongo-Foo is not the best, so I know that what I have come up with to replace the current MySQL-backed events system will most likely leave a lot to be desired. But right now, I am importing the data from the MySQL table into the new MongoDB events collection (table). It's about halfway through, having imported almost 50,000 records, but has only needed to create just short of 15,000 records. So obviously we're winning on space savings.
Using MongoDB simply means we can keep all related data for each event within the same record, all done via embedded data, and we get this all returned in a nice Ruby hash. I can also add and remove columns anytime I want. So when more events are introduced, I don't have to modify any database table schemas or run any migrations. I can just write my code, and the data will be inserted and saved without having to define a new column. It's very liberating, I tell ya.
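To give a rough idea of what "embedded data" means here, the sketch below turns a flat row, shaped like the MySQL events table, into a single nested document. This is not the real import script: `toEventDocument`, the sample row, the lookup tables and the `login`, `title` and `state` keys are all invented for illustration. Only the column names come from the table above.

```javascript
// Hypothetical flat row, shaped like the MySQL events table.
const row = {
  id: 42,
  user_id: 7,
  project_id: 3,
  action: "ticket_created",
  target_id: 15,
  target_type: "Ticket",
  created_at: "2010-03-20 10:15:00",
};

// Hypothetical lookups standing in for the related MySQL-backed models.
const users = { 7: { id: 7, login: "alice" } };
const projects = { 3: { id: 3, title: "example", state: "public" } };

// Build one self-contained document with the related data embedded in it.
function toEventDocument(row, users, projects) {
  const doc = {
    action: row.action,
    created_at: new Date(row.created_at.replace(" ", "T") + "Z"),
    user: users[row.user_id],
  };
  // Only add the keys that apply to this event; no schema change is needed.
  if (row.project_id != null) doc.project = projects[row.project_id];
  if (row.target_id != null) {
    doc.target = { id: row.target_id, type: row.target_type };
  }
  return doc;
}

const eventDoc = toEventDocument(row, users, projects);
console.log(JSON.stringify(eventDoc, null, 2));
```

Everything needed to render the event in an activity stream then comes back in one document, and an event type that needs an extra key simply gets one.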
I was initially concerned with how I would be able to combine the use of MySQL and MongoDB within the same app, but choosing to use the MongoMapper gem really helped. I just created a new Event model, and included MongoMapper. Of course I don't have the luxury of automatic associations between ActiveRecord (MySQL) models and Mongo models, but right now I don't need to. Pretty much all the data I need for each event is already embedded within itself. So I very rarely need to fetch data from any other models.
And another thing... Mongo is fast, really fast.
I could go on further, but I only really wanted to introduce you to Mongo and what I am doing with it in Codaset. I hope to post more technical snippets about it some time in the future. But until then, I hope you like the improvements I am making - even though they are mainly ones that you don't easily see. I'm also thinking about expanding my use of Mongo into other areas of Codaset.
Woohoo! I never thought I'd make it this far, but episode 2 is here.
This week, I show you a web based ambilight video using HTML5 and Canvas. I also show you a new VPS deployment tool called Vagrant, and a great way to try and play with Redis. Then I talk about CoffeeScript, JavaScript's alter-ego. Hope you enjoy this episode, and if you have any feedback or suggestions, please leave your comments here. And don't forget to subscribe via iTunes and check us out on Blip.TV.
Show Links...
- Ambilight Sample; video and canvas
- Phusion
- Vagrant
- Try Redis
- CoffeeScript