NoSQL, Azure Cosmos DB and DocumentDB – Storage, Retrieval, and More

In my previous post, I explained what NoSQL is. This post gives further insight into how data is stored and retrieved in NoSQL, using Microsoft’s DocumentDB as an example.

Here is a quick recap of how NoSQL compares with traditional relational databases:

In relational databases, we first normalize data and save it in different tables. Then, when data is needed, we join these tables to retrieve information. As you would imagine, the joins add extra time and effort to data retrieval. But data is not duplicated, so more information can be saved in the same amount of space. There is typically a single physical server, and to scale up, more memory, processors, and storage need to be added to it.

In NoSQL, data is not stored in related tables; instead, it is stored as individually wrapped pieces of information. Information can be stored in the form of key-value pairs, columns, documents, or graphs. Information is not rigid and does not have to follow a schema. Each piece of information has a unique id that distinguishes it from other information. The data structures used to store data in NoSQL differ from those in relational databases; e.g., a key-value store uses a dictionary, and a document database uses JSON. Data retrieval is very fast, as each piece of information has all the data it needs without having to locate other pieces of information. NoSQL can be scaled out as far as needed simply by adding more servers.

What is Azure Cosmos DB?

Azure Cosmos DB is a globally distributed, multi-model database service that lets you elastically scale throughput and storage across any number of geographical regions, while offering low latency, high availability, and consistency guarantees. DocumentDB is part of Cosmos DB.

What is DocumentDB?

DocumentDB is Microsoft’s flavor of a nonrelational document database. The name may give the false impression that it is a collection of documents, such as a SharePoint document library, but it is far from that. You can think of it much like a traditional SQL database, in that it saves information that can later be retrieved. The only difference is that it is nonrelational and schema-free. DocumentDB stores data as “documents,” which are actually JSON objects.

As I mentioned earlier, DocumentDB is a NoSQL database that saves data as “documents”. A document, in this context, is a self-contained piece of data saved as a JSON object. JSON stands for JavaScript Object Notation, and it represents data as a collection of name-value pairs.

Let’s consider an example in a relational database and compare its storage with DocumentDB. Data in an RDBMS is structured in a tabular format with a fixed number of columns, and each piece of information is saved as a row. A relational database table has a fixed structure, and in order to make changes to a table, such as adding a new column, changing a specific column to allow NULL values, or changing a data type, it is necessary to modify the table’s schema.

FirstName  LastName  Gender
Minal      Wad       F
Sid        Atreya    M
Roma       Kole      Null

 

In the JSON representation of the same data, each row becomes a JSON object and the table becomes a collection of JSON objects. In the third row above, ‘Gender’ is an optional field, so the document can simply leave it out when its value is Null. This is in keeping with the schema-free behavior of DocumentDB.
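To make this concrete, here is a minimal Python sketch of the conversion. It is only an illustration, not how DocumentDB works internally: the `to_document` helper is a hypothetical name of mine, and Python's `None` stands in for the SQL Null.

```python
import json

# Rows from the customer table; None stands in for the SQL Null.
rows = [
    {"FirstName": "Minal", "LastName": "Wad", "Gender": "F"},
    {"FirstName": "Sid", "LastName": "Atreya", "Gender": "M"},
    {"FirstName": "Roma", "LastName": "Kole", "Gender": None},
]


def to_document(row):
    """Return a copy of the row without the fields whose value is None."""
    return {name: value for name, value in row.items() if value is not None}


# The "table" becomes a plain collection (list) of JSON objects.
customers = [to_document(row) for row in rows]
print(json.dumps(customers, indent=2))
```

The third document ends up with only FirstName and LastName; there is no Gender field in it at all, rather than a Gender field holding Null.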

Some customers may have additional information, such as Age, which can simply be appended to the JSON without having to change the original definition of the Customer object.

This means that different Customer objects may have different schemas and will still be valid customers. It may not be completely wrong to call them amorphous objects.
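The flip side of this flexibility is that the code reading the documents has to cope with fields that may not be there. A small Python sketch of that idea follows; the Age value of 30 and the `describe` helper are made up purely for illustration.

```python
# Two customer documents in the same collection. Only the second one
# carries an Age field (the value 30 is a made-up example).
customers = [
    {"FirstName": "Minal", "LastName": "Wad", "Gender": "F"},
    {"FirstName": "Sid", "LastName": "Atreya", "Gender": "M", "Age": 30},
]


def describe(customer):
    """Build a one-line summary, tolerating documents that lack Age."""
    name = f'{customer["FirstName"]} {customer["LastName"]}'
    age = customer.get("Age")  # None when the document has no Age field
    if age is None:
        return f"{name}, age unknown"
    return f"{name}, age {age}"


for customer in customers:
    print(describe(customer))
```

Both documents are valid customers even though their shapes differ; nothing had to be altered up front to make room for Age.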

Now, let’s see how related data is stored in DocumentDB. In a relational database, a customer’s address is saved in a separate table that is linked to the customer through a foreign key:

FirstName  LastName  Gender  AddressId (FK)
Minal      Wad       F       Null
Sid        Atreya    M       1
Roma       Kole      Null    2

 

AddressId  AddressLine1     State  Country  Zipcode
1          123 Land Rd      NJ     USA      33345
2          456 Sunset Blvd  VA     USA      22278

In DocumentDB the related address information is saved with the customer information. It is also not uncommon in document databases to repeat some data so that each document has the data it needs without having to locate other documents.

In the JSON representation of the same data, each customer document carries its address inside it. Even multiple addresses can be saved without breaking the schema, because there is no schema ☺
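Here is a rough Python sketch of that embedding, built from the two relational tables above. The `Addresses` field name and the `embed_address` helper are my own assumptions for illustration; DocumentDB does not prescribe either.

```python
import json

# The customer table, with AddressId as the foreign key (None = Null).
customer_rows = [
    {"FirstName": "Minal", "LastName": "Wad", "Gender": "F", "AddressId": None},
    {"FirstName": "Sid", "LastName": "Atreya", "Gender": "M", "AddressId": 1},
    {"FirstName": "Roma", "LastName": "Kole", "Gender": None, "AddressId": 2},
]

# The address table, keyed by AddressId.
address_rows = {
    1: {"AddressLine1": "123 Land Rd", "State": "NJ",
        "Country": "USA", "Zipcode": "33345"},
    2: {"AddressLine1": "456 Sunset Blvd", "State": "VA",
        "Country": "USA", "Zipcode": "22278"},
}


def embed_address(row):
    """Swap the foreign key for the address itself, kept in a list."""
    document = {
        name: value
        for name, value in row.items()
        if name != "AddressId" and value is not None
    }
    address = address_rows.get(row["AddressId"])
    if address is not None:
        # A list leaves room for more addresses later.
        document["Addresses"] = [dict(address)]
    return document


documents = [embed_address(row) for row in customer_rows]
print(json.dumps(documents, indent=2))
```

Because Addresses is a list, giving a customer a second address is just a matter of appending another object to it. No join is needed to read a customer back, since everything about that customer lives in one document.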

Querying DocumentDB

One of the best features of DocumentDB is that its native query language is very similar to SQL. Let’s build our first query.

SELECT * FROM Customers

Here, Customers is the alias for the active collection.
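If you would rather run this query from code than from a query window, the sketch below shows roughly what that could look like in Python. It assumes a container object with a `query_items` method that accepts `query` and `enable_cross_partition_query` arguments, which is how I understand the azure-cosmos Python SDK to work. `InMemoryContainer` and `fetch_all_customers` are hypothetical names; the former is only a stand-in so the sketch runs without an Azure account.

```python
class InMemoryContainer:
    """Hypothetical stand-in for a Cosmos DB container (runs offline)."""

    def __init__(self, documents):
        self._documents = list(documents)

    def query_items(self, query, enable_cross_partition_query=None):
        # This stand-in understands only the one query used below; a real
        # container would send the SQL text to the service instead.
        if query != "SELECT * FROM Customers":
            raise ValueError("unsupported query in this stand-in")
        return iter(self._documents)


def fetch_all_customers(container):
    """Run the query from the text and collect the returned documents."""
    results = container.query_items(
        query="SELECT * FROM Customers",
        enable_cross_partition_query=True,
    )
    return list(results)


container = InMemoryContainer([
    {"id": "1", "FirstName": "Minal", "LastName": "Wad", "Gender": "F"},
    {"id": "2", "FirstName": "Roma", "LastName": "Kole"},
])

for document in fetch_all_customers(container):
    print(document)
```

With a real account, you would pass in the container object obtained from the SDK instead of the stand-in; the query text itself stays the same.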

In my next post, I will demonstrate how to create your first DocumentDB database in Cosmos DB.
