A database is a collection of information that is organized so that it can easily be accessed, managed, and updated. That is the textbook definition.
For a developer (like me), a database is the back end drawn as a cylinder in the three-tier architecture diagram: a place to store data and to write queries that get and put the data an application uses. I have mostly used RDBMS databases like MS SQL Server, Oracle, etc., or flat files (years ago, when I was working on mainframes). RDBMS databases were developed to overcome the shortcomings of flat-file databases, to provide a standardised data repository, and to make data manipulation easy (there are many other reasons why RDBMSs were developed…).
Life was fine… until the concept of NoSQL databases came about. A NoSQL database provides a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases. That is, in an RDBMS we have the concepts of table, row/tuple, column, etc.; NoSQL databases do not use these constructs to define how data is stored. One of the motivations for this approach is simplicity of design. NoSQL databases are increasingly used in big data and real-time web applications. You can read about NoSQL databases here.
One of the approaches to NoSQL database design (there are many others) is the document-oriented approach. It views the database design, the model of the database, as a document. The general meaning of the word would make you think of a text document with some data, like that of a book: its name, ID, author, publisher, price, etc. And yes, data is stored like this, but in BSON. BSON (binary JSON) is a JSON-like format. The example below shows how a book can be defined in mongoDB:
{
    _id: ObjectId('54fecdce3fc8e5af96a6c7c3'),
    name: 'Intro to mongoDB',
    author: 'XYZ',
    publisher: 'Paperback',
    price: 40
}
mongoDB
mongoDB is an open-source, scalable, robust, and highly flexible database. It is maintained by a company called 10gen, now known as MongoDB Inc. mongoDB works on the concepts of collections and documents.
In mongoDB, a collection is a group of mongoDB documents; it is the equivalent of an RDBMS table. A document is a set of key-value pairs. The example above represents a document for one book; if there were multiple books, there would be multiple book documents, and together they would make up a collection of books.
Below is a books collection that has 3 documents:
{
    _id: ObjectId('54fecdce3fc8e5af96a6c7c3'),
    name: 'Book1',
    author: 'XYZ',
    publisher: 'Paperback',
    price: 40
}
{
    _id: ObjectId('54fecdce3fc8e5af96a6c7c4'),
    name: 'Book2',
    author: 'XYZ',
    price: 30
}
{
    _id: ObjectId('54fecdce3fc8e5af96a6c7c5'),
    name: 'Book3',
    author: 'XYZ',
    publisher: 'Paperback',
    price: 30,
    Bestseller: 'Yes'
}
The above code represents a collection (which can be thought of as a table), and each book (which can be regarded as a data row) is a document.
The _id field is mandatory and can be thought of as (it in fact is…) a primary key. By default, _id is an ObjectId: a 12-byte value, usually displayed as a 24-character hexadecimal string, which ensures that every document's _id is unique. You can provide _id yourself when inserting a document; if you don't, mongoDB generates a unique one for every document. Of these 12 bytes, the first 4 are the current timestamp, the next 3 are a machine ID, the next 2 are a process ID, and the remaining 3 are a simple incrementing value.
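To make that byte layout concrete, here is a minimal plain-JavaScript sketch (it runs in Node.js, no driver needed) that splits an ObjectId's 24-character hex string into the four parts listed above. decodeObjectId is a hypothetical helper, and it assumes the legacy layout described here; newer MongoDB versions replace the machine and process IDs with a 5-byte random value, so only the timestamp and counter positions stay the same.

```javascript
// Hypothetical helper: decode the legacy (mongoDB 3.0-era) ObjectId layout.
// Input is the 24-character hex string shown inside ObjectId('...').
function decodeObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error('an ObjectId is 12 bytes, i.e. 24 hex characters');
  }
  // Read `len` bytes starting at byte offset `start` as a big-endian integer.
  const bytes = (start, len) =>
    parseInt(hex.slice(start * 2, (start + len) * 2), 16);

  const seconds = bytes(0, 4); // bytes 0-3: seconds since the Unix epoch
  return {
    seconds,
    createdAt: new Date(seconds * 1000),
    machineId: bytes(4, 3), // bytes 4-6
    processId: bytes(7, 2), // bytes 7-8
    counter: bytes(9, 3),   // bytes 9-11: incrementing value
  };
}

// The three book _ids used in this post differ only in the counter.
for (const id of ['54fecdce3fc8e5af96a6c7c3', '54fecdce3fc8e5af96a6c7c4', '54fecdce3fc8e5af96a6c7c5']) {
  const p = decodeObjectId(id);
  console.log(id, p.createdAt.toISOString(), p.counter);
}
```

Because the timestamp comes first, ObjectIds generated later sort after earlier ones (to one-second precision).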
If you notice, the first document has 5 fields: _id, name, author, publisher, and price. The second has only 4 fields: _id, name, author, and price (publisher is not present). And the third document contains 6 fields, with the addition of Bestseller. In mongoDB, as in other document-oriented databases, there is no requirement that every field present in one document be present in all the others. Also, a field can hold multiple values; look below:
{
    _id: ObjectId('54fecdce3fc8e5af96a6c7c3'),
    name: 'Book1',
    author: 'XYZ',
    publisher: ['Publisher1', 'Publisher2'],
    price: 40
}
{
    _id: ObjectId('54fecdce3fc8e5af96a6c7c4'),
    name: 'Book2',
    author: 'XYZ',
    publisher: ['Publisher1'],
    price: 30
}
The publisher field has two values in the first document, publisher:['Publisher1','Publisher2'], and one in the second document, publisher:['Publisher1']. So this one-to-many relationship is defined quite easily, without having to define another table and constraints.
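One practical effect of storing publishers as an array is that, in the mongo shell, an equality query such as db.books.find({publisher:'Publisher1'}) matches any document whose publisher array contains that value. The plain-JavaScript sketch below only imitates that matching rule in memory (it is not driver or server code; fieldMatches and booksByPublisher are hypothetical names), to show why no join table is needed to ask "which books does this publisher have?".

```javascript
// Books from the example above (_id omitted); publisher is an array.
const books = [
  { name: 'Book1', author: 'XYZ', publisher: ['Publisher1', 'Publisher2'], price: 40 },
  { name: 'Book2', author: 'XYZ', publisher: ['Publisher1'], price: 30 },
];

// Imitates how an equality condition treats arrays: a scalar query value
// matches if the field equals it or if the field is an array containing it.
function fieldMatches(fieldValue, wanted) {
  return Array.isArray(fieldValue) ? fieldValue.includes(wanted) : fieldValue === wanted;
}

function booksByPublisher(list, publisher) {
  return list.filter((b) => fieldMatches(b.publisher, publisher)).map((b) => b.name);
}

console.log(booksByPublisher(books, 'Publisher1')); // both books
console.log(booksByPublisher(books, 'Publisher2')); // only Book1
```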
In mongoDB, a field can also hold multiple document-like structures; look at the example below:
{
    _id: ObjectId('54fecdce3fc8e5af96a6c7c3'),
    name: 'Book1',
    author: 'XYZ',
    publisher: [
        {
            name: 'publisher1',
            city: 'Mumbai',
            country: 'India'
        },
        {
            name: 'publisher2',
            city: 'Delhi',
            country: 'India'
        }
    ],
    price: 30
}
You can see that the publisher field holds multiple document-like data sets within one field; here it has two values. In an RDBMS we would need another table for publishers, with key constraints to the main (books) table. In mongoDB, the data comes from one collection only.
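To see what the embedded form saves, here is a plain-JavaScript sketch that takes the document above and splits it into the rows a relational design would need: one books row plus one publishers row per publisher, linked by a foreign key. The table and column names (books, publishers, book_id) and the helper toRelationalRows are hypothetical, chosen only for illustration, and the _id is shortened to a plain string.

```javascript
// The embedded Book1 document from above (_id as a plain string here).
const book = {
  _id: '54fecdce3fc8e5af96a6c7c3',
  name: 'Book1',
  author: 'XYZ',
  publisher: [
    { name: 'publisher1', city: 'Mumbai', country: 'India' },
    { name: 'publisher2', city: 'Delhi', country: 'India' },
  ],
  price: 30,
};

// Split one document into the rows an RDBMS design would need:
// one row for a books table and one row per publisher, each carrying
// a book_id foreign key back to the book.
function toRelationalRows(doc) {
  const { publisher = [], ...rest } = doc;
  const bookRow = { id: rest._id, name: rest.name, author: rest.author, price: rest.price };
  const publisherRows = publisher.map((p) => ({ book_id: rest._id, ...p }));
  return { bookRow, publisherRows };
}

const { bookRow, publisherRows } = toRelationalRows(book);
console.log(bookRow);
console.table(publisherRows);
```

In mongoDB the single document already holds both "tables", so reading a book with its publishers needs no join.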
Query Language
In mongoDB, data manipulation is done using what is called a document-oriented query language. So NoSQL does not mean there is no query language; in fact, NoSQL is read as "Not Only SQL". After the mongoDB server is set up, we can go straight ahead and execute queries to insert or manipulate data.
A mongo server is already set up on the mongoDB site, where you can start executing code straight away in your browser: navigate to http://try.mongodb.org/, where a test database is already set up. As the page title says, ‘A MongoDB Shell in your browser. Just enough to scratch the surface.’ It helps do exactly that.
Now let us execute something.
If you type db, it'll show you the current database that we are working on. If you type show collections, it'll show you the collections in that database. Right now there are no collections.
One main thing to know is that the mongo shell acts as a JavaScript console; that is, if you execute JS code, it works. For example:
var x = 'mongoDB';
x
This prints mongoDB.
This means the mongo shell understands JavaScript. How cool is that!
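Because the shell is a JavaScript console, ordinary loops and variables work there too. As a sketch, the loop below builds three hypothetical book documents (the LoopBook names are made up so they don't clash with the examples that follow). It is plain JavaScript, so it also runs in Node.js. In the mongo shell the resulting array could be passed to db.books.insert(docs), since insert() accepts either one document or an array of documents; that call needs a running server, so it is only a comment here.

```javascript
// Build several documents with an ordinary JavaScript loop.
const docs = [];
for (let i = 1; i <= 3; i++) {
  docs.push({ name: 'LoopBook' + i, author: 'XYZ', price: 10 * i });
}
console.log(JSON.stringify(docs));

// In the mongo shell (needs a server):
// db.books.insert(docs);
```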
Now let us create a document by inserting some data. Type
db.books.insert({name:'Book1',author:'XYZ',publisher:'publisher1',price:40})
in the browser console.
We see that one document (record) has been inserted. It's important to note here that the collection called books is created automatically by mongoDB. The insert used string fields and a numeric field; mongoDB supports various data types, and here is a list of all of them: http://docs.mongodb.org/manual/reference/bson-types/
Now let's insert another document.
db.books.insert({name:'Book2',author:'ABC',publisher:'publisher2',price:50})
So now we have two documents in the collection called books. If you type show collections, it should list the one collection we have, called books.
Now if we want to see all the documents in the collection, just do a
db.books.find()
This will return all the documents.
Or you can use db.books.findOne() to retrieve a single document, the first one in the collection's natural order.
If you notice, the _id field has been added automatically by mongoDB.
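The behaviour described above (a collection created on the first insert, an _id added when you don't supply one, find() returning everything and findOne() returning a single document) can be imitated with a toy in-memory class. This is only a sketch to make the semantics concrete: TinyDb and its methods are hypothetical, its _id is 12 random bytes rather than a real ObjectId, and findOne() here simply returns the first document in insertion order.

```javascript
// Toy in-memory stand-in for the shell calls used above; not the real
// mongoDB implementation.
const crypto = require('crypto');

class TinyDb {
  constructor() {
    this.collections = new Map(); // collection name -> array of documents
  }

  // Like the shell, the collection is created on the first insert.
  insert(collectionName, doc) {
    if (!this.collections.has(collectionName)) {
      this.collections.set(collectionName, []);
    }
    // Use the caller's _id if present, otherwise a random 24-hex-char id.
    const stored = { _id: crypto.randomBytes(12).toString('hex'), ...doc };
    this.collections.get(collectionName).push(stored);
    return stored;
  }

  showCollections() {
    return [...this.collections.keys()];
  }

  find(collectionName) {
    return [...(this.collections.get(collectionName) || [])];
  }

  findOne(collectionName) {
    return this.find(collectionName)[0] || null;
  }
}

const db = new TinyDb();
db.insert('books', { name: 'Book1', author: 'XYZ', publisher: 'publisher1', price: 40 });
db.insert('books', { name: 'Book2', author: 'ABC', publisher: 'publisher2', price: 50 });
console.log(db.showCollections());
console.log(db.find('books').length);
console.log(db.findOne('books').name);
```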
Let us search for a book using the _id; that would be
db.books.find({_id:ObjectId("54ffd5561cdcaf4e4fd70a0a")})
Here we are searching for the second document based on the _id field (the ObjectId value will be different in your own database). In the same way, we can search using the name of the book:
db.books.find({name:"Book2"})
We can also selectively display what we want to show by adding parameters to the find query.
We can use multiple conditions to retrieve data. For example, the following retrieves data based on both the name of the book and its price:
db.books.find({name:"Book2",price:50})
If I were to give price as 30 and publisher as publisher2, i.e. db.books.find({price:30,publisher:"publisher2"}), no data is returned, because all the conditions must match and Book2 costs 50.
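Several fields in one filter are combined with an implicit AND, which is why the price 30 / publisher2 query finds nothing. A minimal in-memory sketch of that rule in plain JavaScript (matches and find are hypothetical names; only plain equality is handled, with operators, arrays, and dot notation deliberately left out):

```javascript
// The two documents inserted earlier (_id omitted for brevity).
const books = [
  { name: 'Book1', author: 'XYZ', publisher: 'publisher1', price: 40 },
  { name: 'Book2', author: 'ABC', publisher: 'publisher2', price: 50 },
];

// Every field/value pair in the query must match (an implicit AND).
function matches(doc, query) {
  return Object.entries(query).every(([field, value]) => doc[field] === value);
}

const find = (query) => books.filter((doc) => matches(doc, query));

console.log(find({ name: 'Book2', price: 50 }).length);
console.log(find({ price: 30, publisher: 'publisher2' }).length);
```

An empty filter has no conditions to fail, so find({}) returns every document, just like db.books.find().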
Suppose we want to retrieve only the name and the author of the book; then we can give
db.books.find({name:"Book2"},{name:1,author:1})
The above query specifies that we need only the name and the author fields.
Notice that the publisher and price are not shown, but the _id field is always shown by default. If you want to hide the _id field, use:
db.books.find({name:"Book2"},{name:1,author:1,_id:0})
Note that a projection cannot mix inclusion (1) and exclusion (0); _id is the only field that can be set to 0 alongside fields set to 1. (A projection made only of 0s, such as {publisher:0,price:0}, is fine.) That means if you do
db.books.find({name:"Book2"},{name:1,author:1,_id:0,publisher:0,price:0})
it'll give an error.
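The projection rule can be sketched as a small function. This is a hypothetical in-memory imitation in plain JavaScript, not the server's implementation; project is a made-up name, and it only handles top-level fields and the values 1 and 0.

```javascript
// Toy projection: 1 = include, 0 = exclude. _id is returned unless
// explicitly excluded, and apart from _id a projection may not mix 1s and 0s.
function project(doc, projection) {
  const fields = Object.entries(projection).filter(([k]) => k !== '_id');
  const includes = fields.filter(([, v]) => v === 1).map(([k]) => k);
  const excludes = fields.filter(([, v]) => v === 0).map(([k]) => k);
  if (includes.length && excludes.length) {
    throw new Error('cannot mix inclusion and exclusion in one projection');
  }
  let out;
  if (includes.length) {
    out = {};
    if ('_id' in doc) out._id = doc._id;
    for (const k of includes) if (k in doc) out[k] = doc[k];
  } else {
    out = { ...doc };
    for (const k of excludes) delete out[k];
  }
  if (projection._id === 0) delete out._id;
  return out;
}

const book2 = { _id: 'id2', name: 'Book2', author: 'ABC', publisher: 'publisher2', price: 50 };
console.log(project(book2, { name: 1, author: 1 }));         // keeps _id, name, author
console.log(project(book2, { name: 1, author: 1, _id: 0 })); // name and author only
console.log(project(book2, { publisher: 0, price: 0 }));     // exclusion-only is allowed
```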
Let's insert a document with embedded documents, like the publisher example above.
db.books.insert({name:'Book3',author:'MNO',publisher:[
    {
        name:'publisher1',
        city:'Mumbai',
        country:'India'
    },
    {
        name:'publisher2',
        city:'Delhi',
        country:'India'
    }
],price:60})
This has inserted the document with two publishers with
their addresses.
Let's try to retrieve this document using the city of one publisher; this can be done by
db.books.find({'publisher.city':"Mumbai"})
Notice the quotes around 'publisher.city': a dotted key must be quoted, because publisher.city without quotes is not a valid key in a JavaScript object, and the shell reports a syntax error instead of recognizing the . operator.
This returns the Book3 document, the only one with a publisher in Mumbai.
And if we want to see only the name and author of the books that have a publisher in Mumbai:
db.books.find({'publisher.city':"Mumbai"},{name:1,author:1})
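What the dotted key does can be sketched in plain JavaScript: follow the path one field at a time and, when an array is met, look inside each of its elements. This is a simplified, hypothetical imitation for equality matching only (valuesAtPath and matchesPath are made-up names), not how the server is implemented.

```javascript
// Collect every value found at the end of a dotted path, descending
// into arrays of embedded documents along the way.
function valuesAtPath(value, parts) {
  if (parts.length === 0) return [value];
  if (Array.isArray(value)) return value.flatMap((item) => valuesAtPath(item, parts));
  if (value === null || typeof value !== 'object') return [];
  return valuesAtPath(value[parts[0]], parts.slice(1));
}

function matchesPath(doc, path, wanted) {
  return valuesAtPath(doc, path.split('.')).includes(wanted);
}

const books = [
  { name: 'Book2', author: 'ABC', publisher: 'publisher2', price: 50 },
  {
    name: 'Book3', author: 'MNO', price: 60,
    publisher: [
      { name: 'publisher1', city: 'Mumbai', country: 'India' },
      { name: 'publisher2', city: 'Delhi', country: 'India' },
    ],
  },
];

console.log(books.filter((b) => matchesPath(b, 'publisher.city', 'Mumbai')).map((b) => b.name));
```

Book2's publisher is a plain string, so the path stops there and Book2 simply does not match.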
The Update and Save methods
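Updating a document is simple enough. The syntax is
db.COLLECTION_NAME.update(SELECTION_CRITERIA, UPDATED_DATA)
Suppose we want to update the author of Book2; we simply need to write
db.books.update({name:"Book2"},{$set:{author:"saud"}})
When we search for the document again with db.books.find({name:"Book2"}), the author is now saud and the other fields are unchanged, because $set modifies only the fields it names. (By default, update() changes only the first matching document.)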
The save method replaces all the data in the document that has the given _id (if no document has that _id, it inserts a new one). The syntax for save is
db.COLLECTION_NAME.save({_id:ObjectId(),NEW_DATA})
For example:
db.books.save({"_id" : ObjectId("5500085440694708a5fcba65"),"name":"this document has been saved with a new one"})
Searching for that document now shows only the _id and the new name field; all of its other fields are gone.
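The difference between the two calls can be sketched with plain objects. This is an in-memory illustration under simplifying assumptions (top-level fields only, no dotted paths, no upsert), not the server's update logic; applySet and replaceDoc are hypothetical names.

```javascript
// $set changes only the named top-level fields; save() with an existing
// _id replaces the whole document, keeping just the _id.
function applySet(doc, fields) {
  return { ...doc, ...fields };
}

function replaceDoc(doc, newDoc) {
  return { ...newDoc, _id: doc._id };
}

const book2 = { _id: 'id2', name: 'Book2', author: 'ABC', publisher: 'publisher2', price: 50 };

console.log(applySet(book2, { author: 'saud' }));
console.log(replaceDoc(book2, { name: 'this document has been saved with a new one' }));
```

Both helpers return new objects and leave book2 itself untouched, which keeps the before/after comparison easy to see.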
That was a very brief intro to mongoDB, so… why mongoDB?
- mongoDB's document data model makes it easy to store data of any structure and to modify the schema dynamically.
- Ad hoc queries are supported: search by field, search by range, and regular expressions.
- mongoDB scales well and can run on multiple servers, so it suits big data. New machines can be added to a running database, which makes horizontal scaling easy.
- It supports automatic load balancing.
- mongoDB can be used as a file system (GridFS), taking advantage of the load balancing features.
- I think one of the best features is that JavaScript can be used in queries!
- Another nice feature is its support for location (geospatial) data: latitude and longitude are supported out of the box, instead of you having to define decimal fields and use them to represent locations.
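In the shell, the range and regular-expression searches from the list above would be written as db.books.find({price:{$gte:40,$lte:50}}) and db.books.find({name:/^Book/}). The sketch below only imitates those two cases in memory with plain JavaScript (inRange is a hypothetical helper, and the sample documents are made up) to show what each one selects.

```javascript
// Sample documents for the illustration.
const books = [
  { name: 'Book1', price: 40 },
  { name: 'Book2', price: 50 },
  { name: 'Book3', price: 60 },
  { name: 'Intro to mongoDB', price: 40 },
];

// Inclusive range check; a missing bound means "no limit on that side".
const inRange = (value, { $gte = -Infinity, $lte = Infinity }) => value >= $gte && value <= $lte;

console.log(books.filter((b) => inRange(b.price, { $gte: 40, $lte: 50 })).map((b) => b.name));
console.log(books.filter((b) => /^Book/.test(b.name)).map((b) => b.name));
```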
The installation is pretty easy: just download the latest version of mongoDB (right now version 3.0) best suited for your system. I use a Windows 7 32-bit PC, so I downloaded the appropriate version. Go to the mongoDB download page: http://www.mongodb.org/downloads
One thing to remember is that mongoDB (at the time of writing, version 3.0) is not well suited for applications involving complex transactions: we cannot perform multiple updates and inserts and then commit or roll them back as one transaction. mongoDB also does not support constraints or joins; all database constraints have to be managed by the application.
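Because constraints live in the application, a check that a table definition would normally enforce has to be written in your own code before calling insert. A minimal, hypothetical sketch (validateBook and its rules are illustrative only): note that a read-then-insert uniqueness check like this can race under concurrent writes, and mongoDB's unique indexes are the database-side way to guarantee that particular rule.

```javascript
// Hypothetical application-side validation, standing in for the
// NOT NULL / CHECK / UNIQUE constraints an RDBMS table would enforce.
function validateBook(book, existingBooks) {
  const errors = [];
  if (typeof book.name !== 'string' || book.name.trim() === '') {
    errors.push('name is required');
  }
  if (typeof book.price !== 'number' || book.price < 0) {
    errors.push('price must be a non-negative number');
  }
  if (existingBooks.some((b) => b.name === book.name)) {
    errors.push('name must be unique');
  }
  return errors;
}

const existing = [{ name: 'Book1', price: 40 }];
console.log(validateBook({ name: 'Book2', price: 50 }, existing)); // []
console.log(validateBook({ name: 'Book1', price: -5 }, existing));
```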
Further reading:
This article is intended to give a brief working idea of mongoDB; to use it in a real application, it is necessary to connect it to a programming language like PHP or Java.
By combining mongoDB with JavaScript, NodeJS, AngularJS, etc., new platforms are being developed, like Meteor and MEAN.JS.