Original article was published on Artificial Intelligence on Medium
Story of Jay’s NoSQL Interview (Design Fundamentals)
It was a crowded day in the lobby, and Jay was waiting for the lift to arrive. He glanced nervously at his watch; he did not want to be late. He had an interview scheduled with Mr. Robert, the database expert. Jay had worked hard to clear the previous rounds of interviews. Clear this one and he would land the position he was aspiring to, but not before he convinced Mr. Robert of his NoSQL skills; his research had told him that the current projects were about NoSQL. The lift arrived and Jay hurried in.
Mr. Robert was already waiting in his cabin when Jay arrived. "Hello, Jay, come in," said Mr. Robert, pointing to a chair in front of him. "Take a seat. I know that you have done well in your previous interviews. I want to talk to you about NoSQL before we proceed; it is very important for this project." Jay nodded in agreement.
"Maybe you should start by giving an overview of NoSQL," prompted Mr. Robert. "Sure," said Jay, and he began explaining and drawing diagrams. "NoSQL is a broad term for a group of database technologies that do not store data in a relational structure. As different kinds of heterogeneous data sources emerged, there was a need to store data differently from the traditional rows, columns, and relational structures." Mr. Robert looked at Jay and said, "So you are talking about different kinds of data storage. Can you outline a few important ones?"
"Certainly," said Jay, and he listed four:
1. Key-Value Database
2. Column Store Database
3. Graph Database
4. Document Database
Key-Value Database: This type of NoSQL database stores data as key-value pairs. The key is unique and is generally a number, string, GUID, or similar. The value can be a simple data type such as a string or number, or a complex one such as JSON or a binary image. Each record in a table may have a different structure; for example, one row can have 5 attributes while another has 3. This provides tremendous flexibility for storing unstructured and semi-structured data that can be retrieved quickly using the unique key.
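As a minimal sketch of the idea, a plain Python dictionary can stand in for a key-value store. The `put` and `get` helpers and the sample keys below are hypothetical and are not the API of any real product; they only show that two values under different keys need not share a structure.

```python
# Hypothetical in-memory key-value store; a dict stands in for the database.
store = {}

def put(key, value):
    """Store any value (string, number, dict, bytes) under a unique key."""
    store[key] = value

def get(key):
    """Return the value stored under key, or None if the key is absent."""
    return store.get(key)

# Two "rows" with different shapes: one has 5 attributes, the other 3.
put("item:1", {"name": "A", "color": "red", "size": "M", "price": 10, "stock": 4})
put("item:2", {"name": "B", "price": 12, "stock": 0})

# Values can also be simple types or raw bytes (for example, image data).
put("greeting", "hello")
put("logo", b"\x89PNG")

print(len(get("item:1")), len(get("item:2")))
```

Lookups go straight to the key, which is why retrieval by key is fast; searching by anything other than the key would require scanning the values.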
Column Store Database: This type of database stores data in column families rather than rows. This works especially well for aggregations, as the values of a column are stored together. BigTable is another variant of this model; such databases support nested columns and provide good write performance and easy scale-out.
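The difference between the two layouts can be illustrated with a small sketch. It only pivots in-memory Python lists and does not model how any real column store lays out data on disk; the sample records and the `to_columns` helper are made up for illustration.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "title": "A", "price": 10.0},
    {"id": 2, "title": "B", "price": 12.5},
    {"id": 3, "title": "C", "price": 7.5},
]

def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns

# Column-oriented layout: all values of one column sit next to each other.
columns = to_columns(rows)

# An aggregation now reads a single column instead of every full row.
total_price = sum(columns["price"])
print(total_price)
```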
Graph Database: This kind of database organizes data in a graph-like structure of nodes and edges (relations). This organization makes it possible to run many different graph algorithms to extract information. An easily identifiable application is a social network, but graph databases have many other uses, such as storing and making sense of semi-structured information like language and knowledge graphs.
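To make nodes, edges, and graph algorithms concrete, here is a small sketch that stores a made-up social graph as an adjacency map and uses a breadth-first search to find everyone within a given number of hops. The names and helper functions are hypothetical; a real graph database would expose its own query language for this.

```python
from collections import deque

# Hypothetical undirected "knows" edges between people (the nodes).
edges = [("jay", "robert"), ("robert", "ana"), ("ana", "li")]

def build_adjacency(edge_list):
    """Turn a list of undirected edges into a node -> neighbours map."""
    adjacency = {}
    for a, b in edge_list:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def within_hops(adjacency, start, max_hops):
    """Breadth-first search: nodes reachable from start in at most max_hops edges."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    seen.discard(start)
    return seen

graph = build_adjacency(edges)
print(sorted(within_hops(graph, "jay", 2)))
```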
Document Database: These databases store data as documents, typically JSON. This allows the storage of semi-structured and nested data, and the ability to store different data structures in a single container. Document databases have received a lot of interest in recent times because they are JSON based and hence work well with REST services. They are easy to scale out, query, and partition.
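A rough sketch of a document container follows, assuming a plain Python list in place of a real database. The field names and the `find` helper are invented for illustration; the point is that documents of different shapes share one container and serialize directly to JSON.

```python
import json

# Hypothetical container holding documents with different structures.
container = [
    {"id": "p1", "type": "product", "title": "Sample Book",
     "details": {"pages": 200, "price": 25.0}},          # nested data
    {"id": "r1", "type": "review", "productId": "p1", "rating": 4},
]

def find(documents, **criteria):
    """Return the documents whose top-level fields match every criterion."""
    return [doc for doc in documents
            if all(doc.get(key) == value for key, value in criteria.items())]

# A document maps directly to a JSON payload, as a REST service would send it.
payload = json.dumps(find(container, type="product")[0])
restored = json.loads(payload)
print(restored["details"]["pages"])
```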
Mr. Robert looked at the drawings and said, "Looks accurate, Jay. We are quite interested in a document database for our upcoming project, so let me give you a sample JSON document," handing him a sheet of paper.
Mr. Robert continued, "I have designed a product store like this. Do you think this is a good design? If you would like to change it, what changes would you make, and why?"
Jay studied the JSON document and began, "It looks like a good NoSQL design, but I would like to make some changes. The first thing I would do is move the Sales and Reviews sections out of the Product document and create them as separate documents."
Mr. Robert looked at Jay and said, "Don’t you think you are moving towards a normalized structure, which is generally meant for relational databases?"
Jay was quick to reply. "At first glance it looks like that, but I am using guiding principles to ensure a feasible NoSQL design." "How so?" demanded Mr. Robert. Jay continued, "Please consider the following principles for embedding related data in a single document:"
1. Entities that have contained relationships.
2. One-to-few relationships.
3. Data that is queried frequently together.
4. Data that changes infrequently.
5. Data that will not grow without bound.
"The data for Abstract, Price, and Pages is queried frequently together, so we need to keep it together. Reviewers have only a one-to-few relationship with the book; that is, there are only a limited number of reviewers and they don’t change once the book is published, so they are also a good candidate for embedding. Reviews, on the other hand, are not limited. This data can grow without bound, so we need to pull it out and store one document per review. Sales can also grow without bound, and they change frequently. Updating a large document every time a sale happens is not efficient, so we need to pull out the Sales as well and store one document per sale."
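Jay's split can be sketched as follows. The original JSON sheet is not reproduced in this article, so every id, value, and helper name here is a hypothetical stand-in; only the field names Abstract, Price, Pages, Reviewers, Reviews, and Sales come from the discussion above.

```python
# Product document: embeds data that is read together, small, and stable.
product = {
    "id": "book-1",                       # hypothetical id
    "abstract": "Placeholder abstract",   # placeholder value
    "price": 30.0,
    "pages": 250,
    "reviewers": ["Reviewer A", "Reviewer B"],  # one-to-few, fixed at publication
}

# Unbounded, frequently changing data lives in separate documents.
reviews = []
sales = []

def add_review(product_id, rating, text):
    """Create one document per review, pointing back at the product."""
    review = {"id": f"review-{len(reviews) + 1}", "productId": product_id,
              "rating": rating, "text": text}
    reviews.append(review)
    return review

def add_sale(product_id, quantity):
    """Create one document per sale, pointing back at the product."""
    sale = {"id": f"sale-{len(sales) + 1}", "productId": product_id,
            "quantity": quantity}
    sales.append(sale)
    return sale

add_review("book-1", 5, "Great read")
add_review("book-1", 3, "Decent")
add_sale("book-1", 2)
print(len(reviews), len(sales), sorted(product))
```

Adding a review or a sale never rewrites the product document, so its size stays fixed however many reviews and sales accumulate.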
Mr. Robert nodded in partial agreement and said, "OK, we can agree on your rationale. Let me understand the reference keys a little better. Can you show me how a publisher document would look?"
"Sure," said Jay, drawing the Publisher JSON.
Jay continued, "Referencing by key is appropriate in the following cases:"
1. Representing one-to-many relationships.
2. Representing many-to-many relationships.
3. Related data changes frequently.
4. Referenced data could be unbounded.
"Books are in a one-to-many relationship with the publisher and hence can be referenced." Mr. Robert looked at the JSON and said, "While I am in overall agreement with what you are saying, this design will lead to multiple trips to the server: one to get the list of books, and then several more to get the details of each book. How can we improve that?"
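Mr. Robert's concern can be made concrete with a small sketch. The publisher drawing is not reproduced here, so the document shapes, the `bookIds` field, and the `CountingStore` class are all assumptions made for illustration; the class simply counts how many reads a query needs.

```python
class CountingStore:
    """Hypothetical document store that counts every read by id."""

    def __init__(self, documents):
        self._documents = documents
        self.reads = 0

    def read(self, doc_id):
        self.reads += 1
        return self._documents[doc_id]

store = CountingStore({
    # The publisher references its books by key only.
    "pub-1": {"id": "pub-1", "name": "Example Press",
              "bookIds": ["book-1", "book-2"]},
    "book-1": {"id": "book-1", "title": "First Book"},
    "book-2": {"id": "book-2", "title": "Second Book"},
})

def publisher_with_titles(doc_store, publisher_id):
    """One read for the publisher, then one more read per referenced book."""
    publisher = doc_store.read(publisher_id)
    titles = [doc_store.read(book_id)["title"] for book_id in publisher["bookIds"]]
    return publisher["name"], titles

name, titles = publisher_with_titles(store, "pub-1")
print(name, titles, store.reads)
```

With two books the query costs three reads, and the count grows by one for every additional book.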
Jay said, "Oh yes, we need to improve this. Data that is queried frequently together can be stored in the same document, even if it is redundant, for more efficient reads." He modified the publisher document accordingly.
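One way to read Jay's fix is that the publisher document carries a redundant copy of the fields that are usually shown with it. The sketch below assumes that only the book title is duplicated; the field names and helpers are hypothetical, since the modified document itself is not reproduced in this article.

```python
# The full book document remains the source of truth.
book = {"id": "book-1", "title": "First Book", "pages": 250}

# The publisher keeps a redundant copy of each book's id and title.
publisher = {
    "id": "pub-1",
    "name": "Example Press",
    "books": [{"id": "book-1", "title": "First Book"}],
}

def list_titles(publisher_doc):
    """Titles come from the publisher document alone: a single read."""
    return [entry["title"] for entry in publisher_doc["books"]]

def rename_book(book_doc, publisher_doc, new_title):
    """The cost of redundancy: a change must be applied to every copy."""
    book_doc["title"] = new_title
    for entry in publisher_doc["books"]:
        if entry["id"] == book_doc["id"]:
            entry["title"] = new_title

print(list_titles(publisher))
rename_book(book, publisher, "First Book, 2nd Edition")
print(list_titles(publisher))
```

The trade-off is cheaper reads in exchange for extra work on writes, which suits fields such as a title that rarely change.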
Mr. Robert smiled and said, "We are on the right track here. I am still not happy with what you did with the Product document, though, removing the Sales and Reviews from it completely. In the spirit of NoSQL, I am sure you can find a better solution."
Jay thought about it for a minute and replied, "Sure, Mr. Robert. We can have a hybrid model that strikes a balance between referencing and embedding. Instead of moving the sales and reviews data out completely, we could keep a summary of sales and an average rating in the Product document. That gives us enough information about these entities without having to fetch the referenced documents." Jay drew up the new product document.
Mr. Robert looked at the solution and seemed quite satisfied. He rose from his desk and said, "Looks like we have found our man. Jay, welcome to the team." Jay was very happy as he walked out, brimming with confidence.
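A sketch of the hybrid model might look like the following. Jay's final drawing is not reproduced in this article, so the `salesSummary` and `ratingSummary` fields and both helper functions are assumptions; they show one way to keep a small summary in the product while the individual reviews and sales stay in their own documents.

```python
product = {
    "id": "book-1",                                      # hypothetical id
    "price": 30.0,
    "salesSummary": {"unitsSold": 0, "revenue": 0.0},    # assumed field names
    "ratingSummary": {"count": 0, "average": 0.0},       # assumed field names
}
reviews = []  # one document per review, stored separately
sales = []    # one document per sale, stored separately

def record_review(product_doc, rating):
    """Store the review on its own and fold it into the running average."""
    reviews.append({"productId": product_doc["id"], "rating": rating})
    summary = product_doc["ratingSummary"]
    total = summary["average"] * summary["count"] + rating
    summary["count"] += 1
    summary["average"] = total / summary["count"]

def record_sale(product_doc, quantity):
    """Store the sale on its own and update the small sales summary."""
    sales.append({"productId": product_doc["id"], "quantity": quantity})
    summary = product_doc["salesSummary"]
    summary["unitsSold"] += quantity
    summary["revenue"] += quantity * product_doc["price"]

record_review(product, 4)
record_review(product, 5)
record_sale(product, 3)
print(product["ratingSummary"], product["salesSummary"])
```

The summary fields stay the same size no matter how many reviews or sales exist. In this sketch they are updated on every write; a real system might instead refresh them periodically so that the product document is not rewritten on each sale.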