Wednesday, March 11, 2015

mongoDB – an intro & usage



A database is a collection of information that is organized so that it can easily be accessed, managed, and updated. That is the textbook definition.
For a developer (like me), a database is the back-end that is drawn as a cylinder in the three-tier architecture diagram: a place to store data and to write queries that get and put the data used by an application. I have mostly used RDBMS databases such as MS SQL Server and Oracle, or flat files (years ago, when I was working on mainframes). RDBMS databases were developed to overcome the shortcomings of flat-file databases, to provide a standardised data repository and to make data manipulation easy (among many other reasons).
Life was fine … until the concept of NoSQL databases came about. A NoSQL database provides a mechanism for storing and retrieving data that is modeled in ways other than the tabular relations used in relational databases. In an RDBMS we have the concepts of table, row/tuple, column etc.; NoSQL databases do not use these constructs to define how data is stored. One of the motivations for this approach is simplicity of design. NoSQL databases are increasingly used in big data and real-time web applications. You can read more about NoSQL databases here.
One of the approaches to NoSQL database design (there are many others) is the document-oriented approach, in which the model of the database is viewed as a set of documents. The everyday meaning of the word might make you think of a text document holding data about, say, a book: its name, ID, author, publisher, price etc. And yes, data is stored much like that, but in BSON (Binary JSON), a binary-encoded, JSON-like format. The example below shows how a book can be defined in mongoDB.

{
    _id: ObjectId('54fecdce3fc8e5af96a6c7c3'),
    name: 'Intro to mongoDB',
    author: 'XYZ',
    publisher: 'Paperback',
    price: 40 // in dollars
}


mongoDB is an open-source, scalable, robust and highly flexible database. It is developed by a company called 10gen, now known as MongoDB, Inc. mongoDB works on the concepts of collections and documents.
In mongoDB, a collection is a group of mongoDB documents; it is the equivalent of an RDBMS table. A document is a set of key-value pairs. The example above is a document for one book; if there were multiple books, there would be one document per book, and together they would make up a collection of books.
Below is a books collection that has 3 documents:
{
    _id: ObjectId('54fecdce3fc8e5af96a6c7c3'),
    name: 'Book1',
    author: 'XYZ',
    publisher: 'Paperback',
    price: 40
}
{
    _id: ObjectId('54fecdce3fc8e5af96a6c7c4'),
    name: 'Book2',
    author: 'XYZ',
    price: 30
}
{
    _id: ObjectId('54fecdce3fc8e5af96a6c7c5'),
    name: 'Book3',
    author: 'XYZ',
    publisher: 'Paperback',
    price: 30,
    Bestseller: 'Yes'
}


Together, the documents above form a collection (comparable to a table), and each book (comparable to a data row) is a document.
The _id field is mandatory and can be thought of as (in fact, it is) the primary key. By default it holds an ObjectId: a 12-byte value, usually shown as a 24-character hexadecimal string, that ensures every document is unique. You can provide _id yourself when inserting a document; if you don't, mongoDB generates a unique ObjectId for it. Of these 12 bytes, the first 4 hold a timestamp (seconds since the Unix epoch), the next 3 a machine identifier, the next 2 the id of the process that generated the ObjectId, and the remaining 3 a simple incrementing counter.
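To make the layout concrete, here is a minimal plain-JavaScript sketch (runnable in Node.js, no mongoDB needed) that splits an ObjectId's hex string into those four parts. decodeObjectId is a hypothetical helper, not part of mongoDB, and it assumes the 2015-era layout described above; newer versions of mongoDB replace the machine and process bytes with a random value. Inside the mongo shell, ObjectId("54fecdce3fc8e5af96a6c7c3").getTimestamp() returns the creation time directly.

```javascript
// Hypothetical helper: split an ObjectId's 24-character hex string into the
// four parts of the 2015-era layout (timestamp, machine, process, counter).
function decodeObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("an ObjectId is 12 bytes, i.e. 24 hex characters");
  }
  return {
    // bytes 0-3: seconds since the Unix epoch (big-endian)
    timestamp: new Date(parseInt(hex.slice(0, 8), 16) * 1000),
    // bytes 4-6: machine identifier
    machineId: hex.slice(8, 14),
    // bytes 7-8: id of the process that generated the ObjectId
    processId: hex.slice(14, 18),
    // bytes 9-11: incrementing counter
    counter: parseInt(hex.slice(18, 24), 16),
  };
}

const parts = decodeObjectId("54fecdce3fc8e5af96a6c7c3");
console.log(parts.timestamp.toISOString()); // 2015-03-10T10:56:14.000Z
```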
Notice that the first document has 5 fields: _id, name, author, publisher and price. The second has only 4 fields: _id, name, author and price (publisher is not present). And the third contains 6 fields, with the addition of 'Bestseller'. In mongoDB, as in other document-oriented databases, there is no requirement that every field present in one document be present in all the others. A field can also hold multiple values; look below:
{
    _id: ObjectId('54fecdce3fc8e5af96a6c7c3'),
    name: 'Book1',
    author: 'XYZ',
    publisher: ['Publisher1', 'Publisher2'],
    price: 40
}
{
    _id: ObjectId('54fecdce3fc8e5af96a6c7c4'),
    name: 'Book2',
    author: 'XYZ',
    publisher: ['Publisher1'],
    price: 30
}

The publisher field has two values in the first document, publisher: ['Publisher1', 'Publisher2'], and one in the second, publisher: ['Publisher1']. This gives us a one-to-many relationship that is defined easily, without having to create another table and constraints.
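The practical effect of an array field shows up when querying: in the shell, db.books.find({publisher:'Publisher2'}) matches a document if any element of its publisher array equals 'Publisher2'. The plain-JavaScript sketch below (Node.js, no mongoDB involved; fieldMatches is a hypothetical name) mimics that rule for scalar values only.

```javascript
// Sketch of how an equality filter like {publisher: 'Publisher2'} treats a
// multi-valued field: an array matches if ANY of its elements equals the
// value. (Scalar values only; matching a whole array is not covered here.)
function fieldMatches(fieldValue, wanted) {
  if (Array.isArray(fieldValue)) {
    return fieldValue.some((v) => v === wanted);
  }
  return fieldValue === wanted;
}

const books = [
  { name: "Book1", author: "XYZ", publisher: ["Publisher1", "Publisher2"], price: 40 },
  { name: "Book2", author: "XYZ", publisher: ["Publisher1"], price: 30 },
];

// Roughly what db.books.find({publisher: 'Publisher2'}) would select.
const withPublisher2 = books.filter((b) => fieldMatches(b.publisher, "Publisher2"));
console.log(withPublisher2.map((b) => b.name));
```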
In mongoDB, a field can also hold several embedded documents; look at the example below:

{
    _id: ObjectId('54fecdce3fc8e5af96a6c7c3'),
    name: 'Book1',
    author: 'XYZ',
    publisher: [
        {
            name: 'publisher1',
            city: 'Mumbai',
            country: 'India'
        },
        {
            name: 'publisher2',
            city: 'Delhi',
            country: 'India'
        }
    ],
    price: 30
}

Here the publisher field holds two embedded documents, each with its own set of fields. In an RDBMS we would need a separate publisher table with key constraints referencing the main (books) table; in mongoDB, all of this data is read from a single collection.
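To see what the embedded form saves us, here is a rough plain-JavaScript sketch (Node.js) of the relational alternative: a hypothetical publisher table linked to the books table by a bookId key, joined by hand into the single embedded document shown above. The table layout and the function name toEmbeddedDocument are assumptions made for illustration.

```javascript
// Relational view (hypothetical rows): publishers sit in their own table
// with a bookId foreign key pointing back at the books table.
const bookRows = [{ bookId: 1, name: "Book1", author: "XYZ", price: 30 }];
const publisherRows = [
  { bookId: 1, name: "publisher1", city: "Mumbai", country: "India" },
  { bookId: 1, name: "publisher2", city: "Delhi", country: "India" },
];

// "Join" the two tables into the single embedded document that mongoDB
// stores directly in the books collection.
function toEmbeddedDocument(book) {
  const publisher = publisherRows
    .filter((p) => p.bookId === book.bookId)
    .map(({ name, city, country }) => ({ name, city, country }));
  return { name: book.name, author: book.author, publisher, price: book.price };
}

const doc = toEmbeddedDocument(bookRows[0]);
console.log(JSON.stringify(doc, null, 2));
```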

Query Language
 
In mongoDB, data manipulation is done with what is called a document-oriented query language. So NoSQL does not mean that there is no query language; in fact, NoSQL is usually read as "Not Only SQL". Once the mongoDB server is set up, we can go straight ahead and execute queries to insert or manipulate data.
A mongo server has already been set up on the mongoDB site, where you can start executing code right away in your browser: navigate to http://try.mongodb.org/, where a test database is ready to use. As the page title says, ‘A MongoDB Shell in your browser Just enough to scratch the surface.’ It helps do exactly that.

Now let us execute something.
If you type db, it'll show you the current database we are working on. If you type show collections, it'll list the collections in that database.
 

Right now there are no collections.
One main thing to know is that the mongo shell acts as a JavaScript console: if you execute JS code, it works. For example:
var x = 'mongoDB';
x
This prints mongoDB.
 
                      

This means that the mongo shell understands JavaScript. How cool is that!


Now, let us create a document by inserting some data.
Type db.books.insert({name:'Book1',author:'XYZ',publisher:'publisher1',price:40}) in the browser console
 

 

We find that one document (record) has been inserted. It's important to note that the collection called books was created automatically by mongoDB. This insert used string fields and a numeric field; mongoDB supports many other data types, listed here: http://docs.mongodb.org/manual/reference/bson-types/

Now let’s insert another document.
db.books.insert({name:'Book2',author:'ABC',publisher:'publisher2',price:50})

 

So now we have two documents in the collection called books. If you type
show collections, it should list the one collection we have, books.
 
Now, to see all the documents in the collection, just run
db.books.find()
This returns all the documents.
 

Or you can use db.books.findOne() to retrieve only the first document in the collection.

Notice that the _id field has been added automatically by mongoDB.

Let us search for a book using its _id:
db.books.find({_id:ObjectId("54ffd5561cdcaf4e4fd70a0a")})
Here we are searching for the second document based on the _id field, so this returns the Book2 document. The _id values in your own database will be different.
 

In the same way, we can search using the name of the book:
db.books.find({name:"Book2"})

 

We can also narrow down what is returned by adding more parameters to the find query. When several parameters are given, a document must match all of them.
The example below retrieves data based on the name of the book and its price:
db.books.find({name:"Book2",price:50})

 
 
If I instead give price as 30 and publisher as publisher2, no data is returned, because no document matches both conditions:
db.books.find({price:30,publisher:"publisher2"})
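The rule behind both results is that every field listed in a find() filter must match (an implicit AND). Below is a minimal plain-JavaScript sketch of that rule (Node.js), assuming simple equality on top-level fields; matchesAll is a hypothetical name and the data mirrors the two books inserted earlier.

```javascript
// Sketch of a find() filter with several fields: a document is returned
// only if EVERY field in the filter equals the given value (implicit AND).
function matchesAll(doc, filter) {
  return Object.keys(filter).every((field) => doc[field] === filter[field]);
}

// The two books inserted earlier in this post.
const books = [
  { name: "Book1", author: "XYZ", publisher: "publisher1", price: 40 },
  { name: "Book2", author: "ABC", publisher: "publisher2", price: 50 },
];

const found = books.filter((b) => matchesAll(b, { name: "Book2", price: 50 }));
const none = books.filter((b) => matchesAll(b, { price: 30, publisher: "publisher2" }));
console.log(found.length, none.length); // 1 0
```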

 
 

Suppose we want to retrieve only the name and the author of the book; then we can give
db.books.find({name:"Book2"},{name:1,author:1})
The second argument (the projection) specifies that we want only the name and the author, so the result contains just the _id, name and author fields.
 
Notice that publisher and price are not shown, but the _id field is always shown by default. To hide the _id field, use:
db.books.find({name:"Book2"},{name:1,author:1,_id:0})

 


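Note, however, that you cannot mix 1 (include) and 0 (exclude) in the same projection; _id is the only field that may be excluded while other fields are included. So the following query gives an error:
db.books.find({name:"Book2"},{name:1,author:1,_id:0,publisher:0,price:0})
A projection made up only of exclusions, such as {publisher:0,price:0}, is valid and returns every field except the excluded ones.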
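The projection rules can be summarised in a small plain-JavaScript sketch (Node.js). project is a hypothetical function written for illustration; it follows the rules as described in this post and is not mongoDB's own implementation.

```javascript
// Sketch of find()'s second argument (the projection). Rules assumed here:
// truthy = include, falsy = exclude; _id is kept unless explicitly
// excluded; inclusions and exclusions cannot be mixed, except for _id.
function project(doc, projection) {
  const fields = Object.keys(projection).filter((f) => f !== "_id");
  const includes = fields.filter((f) => projection[f]);
  const excludes = fields.filter((f) => !projection[f]);
  if (includes.length > 0 && excludes.length > 0) {
    throw new Error("cannot mix inclusion and exclusion in one projection");
  }
  const keepId = !("_id" in projection) || Boolean(projection._id);
  const inclusionMode =
    includes.length > 0 || (excludes.length === 0 && Boolean(projection._id));
  const result = {};
  if (inclusionMode) {
    if (keepId && "_id" in doc) result._id = doc._id;
    for (const f of includes) if (f in doc) result[f] = doc[f];
    return result;
  }
  // Exclusion mode (or an empty projection): copy everything not excluded.
  for (const [key, value] of Object.entries(doc)) {
    const keep = key === "_id" ? keepId : !excludes.includes(key);
    if (keep) result[key] = value;
  }
  return result;
}

const book = { _id: "id2", name: "Book2", author: "ABC", publisher: "publisher2", price: 50 };
console.log(project(book, { name: 1, author: 1, _id: 0 }));
```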
 
Let’s insert a document with embedded documents, like the publisher example above.

db.books.insert({name:'Book3',author:'MNO',publisher:[
             {
               name:'publisher1',
               city:'Mumbai',
               country:'India'   
             },
             {
               name:'publisher2',
               city:'Delhi',
               country:'India'
             }
            ]
  ,price:60})

 

This inserts a document with two publishers and their addresses.
Let's try to retrieve this document using the city of one publisher; this can be done by
db.books.find({'publisher.city':"Mumbai"})
Notice the quotes around 'publisher.city': a dotted field name must be quoted, otherwise it is not a valid JavaScript object key and the shell reports a syntax error instead of recognizing the . operator.

This returns the Book3 document, since one of its publishers is in Mumbai.

 

And if we want to see only the name and author of the books whose publisher city is Mumbai:
db.books.find({'publisher.city':"Mumbai"},{name:1,author:1})
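The dot-notation lookup can be pictured with a short plain-JavaScript sketch (Node.js): follow the path one key at a time and, whenever the value is an array of embedded documents, look inside every element. valuesAtPath and matchesDotted are hypothetical names; the real query engine handles many more cases.

```javascript
// Sketch of dot notation: collect every value found at a path such as
// ["publisher", "city"], stepping into each element of an array of
// embedded documents along the way.
function valuesAtPath(value, path) {
  if (path.length === 0) return Array.isArray(value) ? value : [value];
  if (Array.isArray(value)) {
    return value.flatMap((item) => valuesAtPath(item, path));
  }
  if (value === null || typeof value !== "object") return [];
  return valuesAtPath(value[path[0]], path.slice(1));
}

// Roughly what the filter {'publisher.city': "Mumbai"} checks.
function matchesDotted(doc, dottedField, wanted) {
  return valuesAtPath(doc, dottedField.split(".")).includes(wanted);
}

const book3 = {
  name: "Book3",
  author: "MNO",
  publisher: [
    { name: "publisher1", city: "Mumbai", country: "India" },
    { name: "publisher2", city: "Delhi", country: "India" },
  ],
  price: 60,
};

console.log(matchesDotted(book3, "publisher.city", "Mumbai")); // true
```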

  
 

The Update and Save methods
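Updating a document is simple enough. The syntax is
db.COLLECTION_NAME.update(SELECTION_CRITERIA, UPDATED_DATA)
Suppose we want to update the author of Book2; we simply need to write
db.books.update({name:"Book2"},{$set:{author:"saud"}})
By default, update() modifies only the first document that matches the criteria; pass {multi:true} as a third argument to update all matching documents.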
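To illustrate what $set changes, here is a minimal plain-JavaScript sketch (Node.js) with a hypothetical updateFirstWithSet function: only the listed fields are modified and, as with update() by default, only the first matching document. By contrast, passing {author:"saud"} without $set as the second argument to update() would replace the whole document, keeping only its _id.

```javascript
// Sketch of update(criteria, {$set: {...}}): only the fields listed in $set
// change, and only the FIRST matching document is touched.
// (Dotted paths inside $set are not handled in this sketch.)
function updateFirstWithSet(docs, criteria, update) {
  const target = docs.find((d) =>
    Object.keys(criteria).every((k) => d[k] === criteria[k])
  );
  if (!target) return 0; // nothing matched, nothing updated
  Object.assign(target, update.$set);
  return 1;
}

const books = [
  { name: "Book1", author: "XYZ", publisher: "publisher1", price: 40 },
  { name: "Book2", author: "ABC", publisher: "publisher2", price: 50 },
];

updateFirstWithSet(books, { name: "Book2" }, { $set: { author: "saud" } });
console.log(books[1]); // author is now "saud"; publisher and price are unchanged
```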



 


When we search for the document again with db.books.find({name:"Book2"}), we see that its author is now "saud", while its other fields are unchanged.





The save method, on the other hand, replaces the entire document that has the given _id (and inserts a new document if no document has that _id). The syntax for save is
db.COLLECTION_NAME.save({_id:ObjectId(EXISTING_ID), NEW_DATA})
For example:
db.books.save({"_id" : ObjectId("5500085440694708a5fcba65"),"name":"this document has been saved with a new one"})
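The save() behaviour (replace the whole document if its _id already exists, otherwise insert it) can be sketched in plain JavaScript (Node.js) as follows. save here is a hypothetical function working on an array, plain strings stand in for ObjectId values, and the existing document's fields are invented for illustration.

```javascript
// Sketch of save(doc): if a document with the same _id exists it is
// replaced as a whole, otherwise the document is inserted. This sketch
// requires an _id; the real save() would generate an ObjectId instead.
function save(docs, doc) {
  if (doc._id === undefined) {
    throw new Error("this sketch expects the document to carry an _id");
  }
  const i = docs.findIndex((d) => d._id === doc._id);
  if (i === -1) {
    docs.push(doc);
    return "inserted";
  }
  docs[i] = doc; // fields missing from `doc` are gone after this
  return "replaced";
}

// Hypothetical existing document with that _id.
const books = [
  { _id: "5500085440694708a5fcba65", name: "Book1", author: "XYZ", price: 40 },
];

save(books, { _id: "5500085440694708a5fcba65", name: "this document has been saved with a new one" });
console.log(books[0]); // only _id and name remain
```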

 


Searching for that document now returns only the _id and the new name field; all of its other fields have been replaced.


That was a very brief intro to mongoDB, so… why mongoDB?

  •  mongoDB's document data model makes it easy for you to store data of any structure and dynamically modify the schema.
  •  Ad hoc queries are supported: search by field, search by range, or search with regular expressions.
  •  mongoDB is very scalable and can run across multiple servers, which makes it suitable for big data. New machines can be added to a running database, so horizontal scaling is easy.
  • Supports automatic load balancing
  • mongoDB can be used as a file system (GridFS), taking advantage of its load-balancing features.
  •  I think one of the best features is that JavaScript can be used in queries!
  • Another nice feature is its support for location data: latitude and longitude are supported out of the box, instead of you having to define decimal fields and use them to represent locations.
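The range and regular-expression queries mentioned in the list look like {price:{$gte:40,$lte:50}} and {name:/^Book/} in the shell. The plain-JavaScript sketch below (Node.js; matchCondition and find are hypothetical names) mimics just those two operators plus plain equality; the real server also compares values by BSON type, which this sketch ignores.

```javascript
// Sketch of two ad hoc query styles:
//   range:              {price: {$gte: 40, $lte: 50}}
//   regular expression: {name: /^Book[12]$/}
// Only $gte, $lte, RegExp and plain equality are handled here.
function matchCondition(value, condition) {
  if (condition instanceof RegExp) {
    return typeof value === "string" && condition.test(value);
  }
  if (condition !== null && typeof condition === "object") {
    for (const op of Object.keys(condition)) {
      if (op !== "$gte" && op !== "$lte") {
        throw new Error("operator not supported by this sketch: " + op);
      }
    }
    if ("$gte" in condition && !(value >= condition.$gte)) return false;
    if ("$lte" in condition && !(value <= condition.$lte)) return false;
    return true;
  }
  return value === condition;
}

// Every field of the filter must satisfy its condition.
function find(docs, filter) {
  return docs.filter((d) =>
    Object.entries(filter).every(([field, cond]) => matchCondition(d[field], cond))
  );
}

const books = [
  { name: "Book1", price: 40 },
  { name: "Book2", price: 50 },
  { name: "Book3", price: 60 },
];

console.log(find(books, { price: { $gte: 40, $lte: 50 } }).map((b) => b.name));
console.log(find(books, { name: /^Book[12]$/ }).map((b) => b.name));
```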


The installation is pretty easy: just download the latest version of mongoDB (version 3.0 right now) best suited for your system. I use a Windows 7 32-bit PC, so I downloaded the appropriate version. Go to the mongoDB download page: http://www.mongodb.org/downloads
 

One thing to remember is that mongoDB is not well suited for applications involving complex transactions: we cannot perform multiple updates and inserts and then commit or roll them back as one transaction (writes to a single document are atomic, though). mongoDB also does not support constraints or joins, so all such rules have to be managed by the application.


Further reading:
This article is intended to give a brief working idea of mongoDB. To use it in a real application, you need to connect to it from a programming language such as PHP or Java, through the corresponding mongoDB driver.
By combining mongoDB with JavaScript, Node.js, AngularJS etc., new platforms such as Meteor and MEAN.js are being developed.