Choosing the data model for a project requires an extended conversation about the project itself. Its plans, goals and dreams. What does the project hope to be? It should not be a “choose your database and then fit your model”. You don’t buy a Ferrari and then see how much lumber you can fit on top: you choose the car for the purpose that you need it for.

For the BOBJ data model, several considerations were shared. And these pointed towards the direction of the best datastore type. The project had some particularities that demanded some thought considering the future of the platform, and the expected growth of the database. 

The most important of the particularities were:

  1. Flexibility
  2. Scale

1 – The project demanded an elevated level of flexibility since the main objects could have any number and type of attributes. So, for any particular object, it was near impossible to define the quantity of attributes. This requirement meant that if we were to store this data in a traditional RDBMS, there would be either (A) 1 table, with an infinite number of columns for all the possible attributes (and mostly a sparse table) or (B) a child table with X amount of rows per X amount of attributes with a unique data type (which would have to be character-based to allow maximum flexibility). 

Option A does not promote efficient data storage or data quality.

Option B makes indexing difficult and inefficient given the variety of possible data types (integer, character, dates and so forth).

2 – based on requirement #1, as the database grew, the options (A) and (B) from the previous point would reach a critical performance point. With either very sparse tables, or gigantic one-tables with billions of attributes and less than ideal indexing.

Just the first requirement would immediately point towards a non-relational datastore. Because that is the main feature that is always presented when talking about non-relational datastore: “schemaless” or “schema free” (and if you don’t think ahead, you can put anything into the database, and then you don’t know what you have, how to look for it and how to get it).

So, based on the flexibility and the required scale, MongoDB was chosen as the datastore, in this case, a document database. It allows for a flexible schema (for the different possible attributes), large collections (for a large amount of objects and if needed their attributes and relationships) and it can linearly scale by adding new nodes.

Hence, based on possible maximum dependencies, the schema was designed. Some adjustments were made based on revised estimates, to improve the future performance of the schema. And some indexes were defined based on expected access patterns.

And so, another Database Schema was born. May it have a happy, full and performant life.

João Pinela

João Pinela

Senior DBA and Data Consultant.

Leave a Reply