Introduction to MongoDB and the Movie Database
MongoDB is a cross-platform, document-oriented database that provides high performance, easy scalability, and high availability. It is built on the concepts of collections and documents.
Database
In MongoDB, a database is a physical container for collections. Each database gets its own set of files on the file system. A single MongoDB server can host multiple databases.
Collection
A collection is a group of MongoDB documents, similar to a table in an RDBMS. A collection exists within a single database. Collections do not enforce a schema, so documents within a collection can have different fields. Ideally, all documents in a collection serve a similar or related purpose.
Document
A document is a set of key-value pairs. Documents have a dynamic schema, meaning that documents in the same collection do not need to have the same set of fields, and fields common to a collection’s documents may hold different data types.
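The dynamic schema described above can be illustrated with a small in-memory sketch. It uses a plain JavaScript array as a stand-in for a collection, so no MongoDB server or driver is involved. The second document and the helper name describeFields are assumptions made for illustration only.

```javascript
// In-memory stand-in for a collection: a plain array of documents.
// No MongoDB server or driver is involved in this sketch.
const movies = [
  { MovieID: 1, MovieName: "2001", Director: "Stanley Kubrick", ReleaseDate: "1968" },
  // Hypothetical second document: no Director field, an extra Genres field,
  // and ReleaseDate stored as a number instead of a string.
  { MovieID: 2, MovieName: "Example Movie", ReleaseDate: 1999, Genres: ["Drama"] },
];

// Collect every field name used in the collection, with the set of
// value types seen for that field (hypothetical helper).
function describeFields(docs) {
  const fields = new Map();
  for (const doc of docs) {
    for (const [key, value] of Object.entries(doc)) {
      const type = Array.isArray(value) ? "array" : typeof value;
      if (!fields.has(key)) fields.set(key, new Set());
      fields.get(key).add(type);
    }
  }
  return fields;
}

const fieldTypes = describeFields(movies);
for (const [name, types] of fieldTypes) {
  console.log(name, [...types].join(" | "));
}
```

In this sketch ReleaseDate is reported with two types and Genres appears in only one document, which is exactly the kind of variation a collection tolerates.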
The main challenge in data modeling is balancing the needs of the application, the performance characteristics of the database engine, and the data access and retrieval patterns. When designing data models, we always consider how the application uses the data (queries, updates, and processing) as well as the inherent structure of the data itself. (Sadalage and Fowler, 2012)
In SQL databases we must determine and declare the database and table schema before inserting data, whereas MongoDB collections do not require their documents to share the same schema. That is:
- The documents in a single collection do not need to have the same set of fields, and the data type of a field can differ across documents in the collection.
- To change the structure of the documents in a collection, for example to add new fields, remove existing fields, or change field values to a new type, we update the documents to the new structure.
This flexibility eases the mapping of documents to an entity or an object. Every document can match the data fields of the represented entity, even when the document varies significantly from other documents in the collection. Figure shows the database creation. (Vaish, 2013)
In the MongoDB schema, the database design has one collection holding multiple documents with the following structure:
{ "_id" : ObjectId("5bb15535f15d627e984b0c57"), "MovieID" : 1, "MovieName" : "2001", "Director" : "Stanley Kubrick", "Leading actors" : [ "Daniel Richter", "Gary Lockwood", "Keir Dullea", "William Sylvester" ], "ReleaseDate" : "1968", "OscarsWon" : 1, "Country" : "USA" }
Data Model (Relationships)
Here we discuss how relationships are handled in the database.
A database in MongoDB has a flexible schema. Documents in the same collection need not have the same set of fields or structure, and fields common to a collection’s documents may hold different types of data.
Some considerations to keep in mind while designing a schema in MongoDB:
- Design the database schema according to user requirements.
- Combine objects into one document if they are used together. Otherwise separate them, but make sure there is no need for joins.
- Duplicate data, but in limited amounts, because disk space is cheap compared to compute time.
- Do joins on write, not on read.
- Optimize the schema for the most frequent use cases.
- Do complex aggregation in the schema.
Suppose a client needs a database design for a website or blog. This lets us compare RDBMS and MongoDB schema design. The website has the following requirements:
- Each post has a unique title, url, and description.
- Each post can have any number of tags.
- Each post has the name of its publisher and a total number of likes.
- Each post has user comments, each with the user's name, message, date and time, and likes.
- Each post can have zero or more comments.
Designing this database with an RDBMS schema requires at least three tables.
With a MongoDB schema, the design needs only one collection.
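As a rough illustration of the single-collection design, the sketch below models one blog post as a plain JavaScript object with its tags and comments embedded. All field names (title, url, description, publisher, likes, tags, comments, dateTime) and the validatePost helper are assumptions chosen to mirror the listed requirements, not a schema taken from the original design. Uniqueness of the title is not checked here, since that needs the whole collection.

```javascript
// Hypothetical blog post document; every field name is an assumption
// chosen to match the listed requirements.
const post = {
  title: "Example title",
  url: "https://example.com/posts/example-title",
  description: "Example description",
  publisher: "Example Publisher",
  likes: 12,
  tags: ["mongodb", "nosql"],
  comments: [
    { user: "Alice", message: "Nice post", dateTime: "2018-06-15T10:00:00Z", likes: 3 },
    { user: "Bob", message: "Thanks", dateTime: "2018-06-16T08:30:00Z", likes: 0 },
  ],
};

// Check a single post against the listed requirements and return a list
// of problems (hypothetical helper; an empty list means all checks passed).
function validatePost(doc) {
  const problems = [];
  for (const field of ["title", "url", "description", "publisher"]) {
    if (typeof doc[field] !== "string" || doc[field].length === 0) {
      problems.push(`missing ${field}`);
    }
  }
  if (typeof doc.likes !== "number") problems.push("missing likes");
  if (!Array.isArray(doc.tags)) problems.push("tags must be an array");
  if (!Array.isArray(doc.comments)) problems.push("comments must be an array");
  return problems;
}

console.log(validatePost(post));
console.log(post.comments.length, "comments read from the post itself, without a join");
```

In a relational design the tags and comments would typically live in their own tables and be joined back to the post; here they travel with the post document.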
The key decision in designing data models for MongoDB applications revolves around the structure of documents and how the application represents relationships between data. (Plugge and Membrey, 2014)
Data modeling is the first step for any database, whether relational or NoSQL. It is the process of iteratively creating a database design that meets the requirements of the application. This involves analyzing and depicting the database entities and the relationships between them.
Creating a MongoDB Database for Movies
In MongoDB, relationships are defined by matching field values across documents in different collections. These relationships are semantic: the MongoDB engine does not enforce them, so it is entirely up to the application to implement and respect them when reading and writing data.
References store the relationships between data by including links from one document to another. Applications resolve these references to access the related data. Broadly, these are normalized data models. In our database we maintain the relationship by referencing the MovieID of the “Movies” collection from the “Ratings” collection, as shown below.
Reference relationships should be used:
- To implement one-to-many (1:N) relationships between documents.
- To implement many-to-many (N:N) relationships between documents.
- When the referenced entities are updated frequently.
- When the referenced entities grow indefinitely.
In the Ratings collection below, the document holds the reference relationship in its MovieID field:
{ "_id" : ObjectId("5bb1556ef15d627e984b0c70"), "MovieID" : 1, "ReviewedBy" : "Joe", "Date" : "6/15/2018", "Rating" : 9, "Comments" : "The best ever!" }
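Because the engine does not enforce the reference, the application has to follow it. The sketch below shows one way this could look, using plain arrays as in-memory stand-ins for the Movies and Ratings collections. The second rating and the movieWithRatings helper are hypothetical, added only to show the one-to-many case.

```javascript
// In-memory stand-ins for the Movies and Ratings collections.
const movies = [
  { MovieID: 1, MovieName: "2001", Director: "Stanley Kubrick" },
];
const ratings = [
  { MovieID: 1, ReviewedBy: "Joe", Date: "6/15/2018", Rating: 9, Comments: "The best ever!" },
  // Hypothetical second rating, added only to show the one-to-many case.
  { MovieID: 1, ReviewedBy: "Ann", Date: "6/20/2018", Rating: 7, Comments: "Slow but good" },
];

// Follow the reference by hand: one lookup for the movie,
// then a second lookup for the ratings that point at it.
function movieWithRatings(movieId) {
  const movie = movies.find((m) => m.MovieID === movieId);
  if (!movie) return null;
  const movieRatings = ratings.filter((r) => r.MovieID === movieId);
  return { ...movie, Ratings: movieRatings };
}

const result = movieWithRatings(1);
console.log(result.MovieName, result.Ratings.map((r) => r.Rating));
```

The two separate lookups are the cost of the normalized model. In exchange, a rating can be added or changed without touching the movie document.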
Alternative modeling discussed:
Here we discuss alternative ways the relationships could have been modeled and implemented in MongoDB, and the benefits and issues of each.
We can also relate documents by embedding them, but in this approach the database is denormalized. Embedded documents capture relationships by storing related data within a single document, which makes the data easy to retrieve and maintain. Such denormalized data models allow applications to retrieve and manipulate related data in a single database operation. For large databases, however, this can be inefficient, because a great deal of data must be stored in each document. (Chodorow and Dirolf, 2010)
Embedded documents should be used when:
- A containment relationship exists between the entities.
- The embedded entities are not updated frequently.
- The embedded entity is an integral part of the document.
- The relationship between the embedding and embedded entities is one-to-few.
- The embedded entities do not grow indefinitely.
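For comparison, here is a minimal sketch of the embedded alternative, again using plain JavaScript objects rather than a live database. The Ratings array inside the movie, the second rating, and the helpers addRating and averageRating are assumptions for illustration. They are not part of the database built in this document.

```javascript
// Hypothetical denormalized variant: ratings embedded in the movie document.
const movie = {
  MovieID: 1,
  MovieName: "2001",
  Director: "Stanley Kubrick",
  Ratings: [
    { ReviewedBy: "Joe", Date: "6/15/2018", Rating: 9, Comments: "The best ever!" },
  ],
};

// Adding a rating produces a new version of the parent document,
// so the document grows with every review.
function addRating(doc, rating) {
  return { ...doc, Ratings: [...doc.Ratings, rating] };
}

// Average rating computed from the embedded array; no second lookup needed.
function averageRating(doc) {
  if (doc.Ratings.length === 0) return null;
  const total = doc.Ratings.reduce((sum, r) => sum + r.Rating, 0);
  return total / doc.Ratings.length;
}

const updated = addRating(movie, {
  ReviewedBy: "Ann", Date: "6/20/2018", Rating: 7, Comments: "Slow but good",
});
console.log(averageRating(updated)); // (9 + 7) / 2 = 8
```

Everything about the movie is available in one read, but each new rating enlarges the parent document. This is why embedding suits one-to-few relationships that do not grow indefinitely.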
Justification of indexes chosen:
Indexes play an important role in any database, and MongoDB is no exception. With indexes, queries in MongoDB run more efficiently.
If a collection holds thousands of documents and has no indexes, a query for particular documents forces MongoDB to scan the whole collection. With a suitable index, MongoDB uses the index to locate the matching documents, which takes far less time. (Banker, 2011)
An index stores a small portion of the collection’s data in a form that is easy to traverse, so data access becomes faster. Below we create an index on the Director field of the Movies collection.
> db.Movies.ensureIndex({"Director":1})
{ "createdCollectionAutomatically" : false, "numIndexesBefore" : 1, "numIndexesAfter" : 2, "ok" : 1 }
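The difference between scanning a collection and using an index can be sketched in memory as follows. This is a simplification under stated assumptions: MongoDB's real indexes are B-tree structures, while this sketch uses a JavaScript Map, and the second and third documents as well as the function names are hypothetical.

```javascript
// In-memory stand-in for the Movies collection; the second and third
// documents are hypothetical.
const movies = [
  { MovieID: 1, MovieName: "2001", Director: "Stanley Kubrick" },
  { MovieID: 2, MovieName: "Example A", Director: "Director A" },
  { MovieID: 3, MovieName: "Example B", Director: "Stanley Kubrick" },
];

// Without an index: examine every document (a collection scan).
function scanByDirector(docs, director) {
  let examined = 0;
  const matches = [];
  for (const doc of docs) {
    examined += 1;
    if (doc.Director === director) matches.push(doc);
  }
  return { matches, examined };
}

// Build a simple index: field value -> positions of matching documents.
// MongoDB's real indexes are B-tree structures; a Map is a simplification.
function buildIndex(docs, field) {
  const index = new Map();
  docs.forEach((doc, position) => {
    const key = doc[field];
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(position);
  });
  return index;
}

// With the index: jump straight to the matching positions.
function findByIndex(docs, index, value) {
  const positions = index.get(value) || [];
  return { matches: positions.map((p) => docs[p]), examined: positions.length };
}

const directorIndex = buildIndex(movies, "Director");
const scanned = scanByDirector(movies, "Stanley Kubrick");
const indexed = findByIndex(movies, directorIndex, "Stanley Kubrick");
// 3 documents examined by the scan, 2 via the index
console.log(scanned.examined, indexed.examined);
```

Both approaches return the same two movies. The scan touches every document, while the indexed lookup touches only the matches, and that gap widens as the collection grows.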
Recommendations:
Here we used the director name as the index, which supports searching on a text field, but we can also support phrase searching by creating a text index on a field that holds sentences, such as the comments field. (Hoberman, 2014) As an example, we take a collection messages and create a text index on content, which contains phrases:
db.messages.find({$text: {$search: "smart birds which cook"}}, {score: {$meta: "textScore"}}).sort({score: {$meta: "textScore"}})
The output shows the result:
{ "_id" : ObjectId("55f5289cb592880356441ead"), "subject" : "Birds can be cook", "content" : "Birds do not eat rats", "likes" : 50, "year" : 2015, "location" : "Chicago", "language" : "english", "score" : 3 }
{ "_id" : ObjectId("55f5289bb592880356441eab"), "subject" : "Cats eat rats", "content" : "Rats do not cook food", "likes" : 60, "year" : 2015, "language" : "english", "score" : 0.77777777777777 }
To perform an exact phrase search (logical AND), enclose the phrase in escaped double quotes inside the search text.
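The contrast between matching any term and matching an exact phrase can be sketched with plain string handling. This is only an illustration of the idea: it is not how MongoDB implements text search, which also stems words, drops stop words, and scores results. The message contents and function names are hypothetical.

```javascript
// Hypothetical message contents.
const messages = [
  { content: "Birds do not eat rats" },
  { content: "Rats do not cook food" },
  { content: "Smart birds which cook are rare" },
];

// Lowercase and split into words. Real text indexes also stem words and
// drop stop words, which this sketch does not attempt.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Term search: a document matches if it contains ANY of the search terms.
function searchTerms(docs, query) {
  const terms = new Set(tokenize(query));
  return docs.filter((doc) => tokenize(doc.content).some((word) => terms.has(word)));
}

// Phrase search: the words must appear together and in the same order.
function searchPhrase(docs, phrase) {
  const needle = tokenize(phrase).join(" ");
  return docs.filter((doc) =>
    ` ${tokenize(doc.content).join(" ")} `.includes(` ${needle} `)
  );
}

const anyTerm = searchTerms(messages, "smart birds which cook");
const exact = searchPhrase(messages, "smart birds which cook");
console.log(anyTerm.length, exact.length); // 3 documents match a term, 1 matches the phrase
```

The term search is broad because a single shared word is enough, while the phrase search returns only the document that contains the whole phrase.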
References
Sadalage, P. J. and Fowler, M. (2012) NoSQL Distilled. USA: Addison-Wesley.
Chodorow, K. and Dirolf, M. (2010) MongoDB: The Definitive Guide. 2nd ed. USA: O’Reilly Media.
Banker, K. (2011) MongoDB in Action. USA: Manning Publications.
Plugge, E. and Membrey, P. (2014) MongoDB Basics. USA: Apress.
Hoberman, S. (2014) Data Modeling for MongoDB: Building Well-Designed and Supportable MongoDB Databases. USA: Technics Publications.
Vaish, G. (2013) Getting Started with NoSQL. UK: Packt Publishing.