Summer Fun with Cincom Smalltalk™ Objects and MongoDB
What Is NoSQL?
In the old days, database normalization was the thing. Data was to be broken apart into its tiniest constituent pieces, with each being a component stored in a separate table. Addresses were different from invoices, so they belonged in different tables with no wasteful overlap or duplication. But that was then. Now there is NoSQL (Not Only SQL), where gobs of related data are kept together in one piece—a single record as it were—typically in text format using JSON. For example, an entire customer transaction might be in the same “row” of a NoSQL table. This makes a normalization fanatic uncomfortable! But there is a great gain in speed.
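To make the "single record" idea concrete, here is a small Python sketch of one customer transaction kept together as a JSON document. Every field name and value is invented for illustration, and plain JSON from the standard library stands in for a real NoSQL store.

```python
import json

# Hypothetical customer transaction kept together as one document.
# All field names and values are invented for illustration.
transaction = {
    "customer": {"name": "Ada Example", "address": "1 Sample Street"},
    "invoice": {"number": 1001, "total": 59.90},
    "items": [
        {"sku": "A-1", "qty": 2, "price": 19.95},
        {"sku": "B-7", "qty": 1, "price": 20.00},
    ],
    "shipping": {"method": "ground"},
}

# One write stores everything; one read brings it all back.
record = json.dumps(transaction)
restored = json.loads(record)
print(restored["customer"]["address"])
```

In a normalized design, the customer, invoice, items and shipping parts would each live in their own table and be joined at query time.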
Why Use NoSQL?
Big Data doesn’t have time to retrieve stuff from many different tables (customer, invoice, shipping, etc.) in order to complete a transaction. The time it takes for a hard drive to seek to a sector is substantial (typically several milliseconds), but once reached, a contiguous read from that sector takes only a tiny fraction of the seek time. Gathering data from multiple tables multiplies that seek time. It is much faster to grab everything in one go instead of having to fetch from a number of different tables. Space, or disk storage, is cheap, but real-time speed is crucial. So in the Big Data world, speed wins over space.
What Is MongoDB?
MongoDB is a popular NoSQL database. It uses BSON (B for binary) format for storage, which is very similar to JSON. In MongoDB, a table is called a “collection,” and a row has been upgraded to “document.” Fields (columns) are still “fields,” but there is one huge difference with MongoDB: In any given collection (think, table), every document (row) can have the same or entirely different fields! This flexibility is optional. If your data permits, you are free to adopt the same fields for every document in your collection. Even with such apparent disorder, MongoDB claims to have very powerful indexing capabilities.
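The "same or entirely different fields" point can be sketched without a database at all. Below, a collection is modeled as a plain Python list of dicts; no MongoDB driver is involved, and the documents and the find helper are hypothetical. Only the `_id` field name follows MongoDB's own convention.

```python
# A "collection" modeled as a plain list of dicts; no MongoDB driver is used.
# The documents are invented and deliberately share almost no fields.
pets = [
    {"_id": 1, "name": "Rex", "species": "dog", "tricks": ["sit", "roll"]},
    {"_id": 2, "name": "Tweety", "wingspan_cm": 12},
    {"_id": 3, "title": "unnamed goldfish"},
]

def find(collection, field):
    """Return the documents that have the given field at all."""
    return [doc for doc in collection if field in doc]

named = find(pets, "name")
all_fields = sorted({key for doc in pets for key in doc})
print(len(named), all_fields)
```

A query has to cope with documents that simply lack a field, which is the price of the flexibility.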
BSON and JSON Can Store Smalltalk Objects
Documents are composed of text and other primitive types encoded in BSON format. Remember that JSON is a way of describing an object so that it can readily be re-created. Smalltalk is made from objects, and MongoDB stores and retrieves objects. There’s a very strong natural resonance between MongoDB and Smalltalk, so why not use MongoDB as an “object repository” for Smalltalk objects? Borrowing existing code from the public repository, written by expert Smalltalkers from the community a while back, I tweaked it a bit here and there to match the current MongoDB API, and set about building an object repository.
A Smalltalk Repository for Objects
A toy demonstration system fell into place fairly quickly, consisting of only two new classes. The code is available in the Cincom Public Store Repository as MongoObjectRepository version 1.1. MongoIDTracker manages the objects to store and to retrieve. It holds some shared variables. For example, there is a Set of “DomainClasses,” each of which deserves its own collection (think, table) in MongoDB. There is an ObjectIDDictionary that has a separate dictionary for every class visited. In each of these are kept references to the in-image objects and their assigned serial IDs. The other new class is PendingObject, a kind of proxy (an idea stolen from Glorp) that can be fetched via its ID. More on that in a minute.
Objects Are Like Dictionaries that Contain (key : value) Pairs—One for Each Instance Variable
The idea is that Smalltalk objects map to JSON (okay, BSON) objects that live in the database. Most of the conversion code was created by the early pioneers who published to the Cincom repository. (Cincom also has JSON generating code, but the public code was ready to roll with MongoDB.) Most of my effort went into adding some extension methods on Object to convert any arbitrary class instance into a BSON object. The steps are basically these: for each ivar, create a key->value pair with the ivar name and the ivar value; add all of these to a dictionary; convert it to BSON and store it. Retrieval works in reverse.
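The ivar-to-dictionary steps can be sketched in a few lines of Python. This is an analogue, not the Smalltalk code from the repository: the Point class, both function names and the "_class" key are assumptions made for the example, and the BSON encoding step is left out.

```python
class Point:
    """Hypothetical domain class with two primitive instance variables."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

def as_document(obj):
    """One key/value pair per instance variable, plus the class name."""
    doc = {"_class": type(obj).__name__}  # "_class" is an assumed key
    doc.update(vars(obj))
    return doc

def from_document(doc, classes):
    """Retrieval in reverse: create an instance and fill in its variables."""
    cls = classes[doc["_class"]]
    obj = cls.__new__(cls)  # skip __init__; state comes from the document
    for name, value in doc.items():
        if name != "_class":
            setattr(obj, name, value)
    return obj

doc = as_document(Point(3, 4))
copy = from_document(doc, {"Point": Point})
print(doc, copy.x, copy.y)
```

This flat version only works while every instance variable holds a primitive value.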
Not So Fast – a Few Surprises
Well, there were a few complications. What if the ivar isn’t a primitive object like Integer or Float? In that case, we need to convert it to a BSON dictionary first and use that for its value and so on recursively until we reach primitives. So, a single class instance can result in a fairly large object graph, where the final nodes are all primitive objects. To save space and CPU time and to avoid infinite recursion when an object’s graph leads back to itself, the ObjectIDDictionary described above is used to record objects that we’ve already processed. On storage, before a BSON conversion is attempted, the ObjectIDDictionary is first consulted. If the object is found there, we simply record <#refID : id> as a stub. Later, during retrieval, when we encounter such a stub, a PendingObject is created with that ID. When the real object is finally retrieved, any corresponding PendingObjects are replaced by the real thing.
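Here is a rough Python sketch of that bookkeeping, under several assumptions: the ids dictionary plays the role of the ObjectIDDictionary, the "_class" and "_id" keys are invented, a stub is written as {"refID": id}, and collections are ignored. It is not the repository's code, only an illustration of stubs on storage and PendingObject replacement on retrieval.

```python
PRIMITIVES = (int, float, str, bool, type(None))

class Node:
    """Hypothetical domain class; `next` may point back to an earlier node."""
    def __init__(self, name):
        self.name = name
        self.next = None

class PendingObject:
    """Placeholder for an object that is known only by its ID so far."""
    def __init__(self, ref_id):
        self.ref_id = ref_id

def store(obj, ids):
    """Convert obj to a nested dict, writing a stub for anything seen before."""
    if isinstance(obj, PRIMITIVES):
        return obj
    if id(obj) in ids:                      # already processed: stub only
        return {"refID": ids[id(obj)]}
    ids[id(obj)] = len(ids) + 1             # register before recursing
    doc = {"_class": type(obj).__name__, "_id": ids[id(obj)]}
    for name, value in vars(obj).items():
        doc[name] = store(value, ids)
    return doc

def load(doc, classes, loaded, pending):
    """Rebuild objects; stubs become PendingObjects to be replaced later."""
    if not isinstance(doc, dict):
        return doc
    if "refID" in doc:
        return PendingObject(doc["refID"])
    cls = classes[doc["_class"]]
    obj = cls.__new__(cls)
    loaded[doc["_id"]] = obj
    for key, value in doc.items():
        if key in ("_class", "_id"):
            continue
        child = load(value, classes, loaded, pending)
        if isinstance(child, PendingObject):
            pending.append((obj, key, child))
        setattr(obj, key, child)
    return obj

def resolve(loaded, pending):
    """Swap every PendingObject for the real object with the same ID."""
    for owner, key, stub in pending:
        setattr(owner, key, loaded[stub.ref_id])

# Two nodes that point at each other: a -> b -> a.
a, b = Node("a"), Node("b")
a.next, b.next = b, a

doc = store(a, {})
loaded, pending = {}, []
a2 = load(doc, {"Node": Node}, loaded, pending)
was_pending = isinstance(a2.next.next, PendingObject)
resolve(loaded, pending)
print(was_pending, a2.next.next is a2)
```

Registering the ID before recursing into the instance variables is what stops the a -> b -> a cycle from looping forever.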
Okay, but It’s Just a Toy, Right?
Well, yes, it’s just an exploration into the kind of things possible with MongoDB. Smalltalk objects are a natural fit. There are a few more methods to implement for some classes. Most classes simply inherit these methods from Object, but some classes need their own versions of up to three extension methods, which their subclasses can usually inherit.
>>asMongoDocument returns a Dictionary, the BSON version of the receiver. Primitive classes like Character need to return self. Some collections override it, and if the receiver has behaviorType #bytes (like some graphics objects), we punt and return a Dictionary based on BOSS.
>>registerAll registers the object with the ID tracker. The superclass implementation on Object works by default, but primitive objects and some collections must override this.
>>asObjectFromMongoDocument restores the original object from the parameter, a dictionary. Object’s implementation is overridden for some collections.
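The "default on Object, overridden where needed" pattern has a loose Python analogue in functools.singledispatch, sketched below for the document-conversion method only. The function name mirrors >>asMongoDocument, but the Invoice and Line classes and the choice of overrides are assumptions, and the #bytes/BOSS case is not covered.

```python
from functools import singledispatch

@singledispatch
def as_mongo_document(obj):
    """Default, like the method on Object: one entry per instance variable."""
    return {name: as_mongo_document(value) for name, value in vars(obj).items()}

@as_mongo_document.register(int)
@as_mongo_document.register(float)
@as_mongo_document.register(str)
def primitive_as_document(obj):
    """Primitives override the default and return themselves."""
    return obj

@as_mongo_document.register(list)
def list_as_document(obj):
    """A collection override: convert each element in turn."""
    return [as_mongo_document(item) for item in obj]

class Line:
    """Hypothetical domain class."""
    def __init__(self, sku, qty):
        self.sku = sku
        self.qty = qty

class Invoice:
    """Hypothetical domain class holding a list of Line objects."""
    def __init__(self, number, lines):
        self.number = number
        self.lines = lines

doc = as_mongo_document(Invoice(1001, [Line("A-1", 2)]))
print(doc)
```

Dispatch follows the class hierarchy, so a subclass of list or int picks up its parent's override, much as Smalltalk subclasses inherit the extension methods.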
What Can It Do?
Maybe not a lot, but the idea is fun! For a look at some unit tests with examples, click here. Perhaps this will inspire someone to use MongoDB for their own projects. MongoDB has a powerful cloud-based back-end that’s being used by a large number of companies.
Ready to Try Cincom Smalltalk For Yourself?
If this story has inspired you to use Cincom Smalltalk and MongoDB together, here’s your chance to try it out for yourself. Download your personal use copy for free here.