DataBase - MongoDB

August 18, 2020

What is MongoDB ? 🤔🔨

MongoDB is a popular NoSQL database management system that stores data as BSON (binary JSON) documents. It's frequently used for big data and real-time applications running at multiple different locations. Relative to the CAP theorem, MongoDB is a CP data store—it resolves network partitions by maintaining consistency, while compromising on availability.

MongoDB is a single-master system—each replica set can have only one primary node that receives all the write operations. All other nodes in the same replica set are secondary nodes that replicate the primary node's operation log and apply it to their own data set. By default, clients also read from the primary node, but they can also specify a read preference that allows them to read from secondary nodes.

When the primary node becomes unavailable, the secondary node with the most recent operation log will be elected as the new primary node. Once all the other secondary nodes catch up with the new master, the cluster becomes available again. As clients can't make any write requests during this interval, the data remains consistent across the entire network.

🛑 What is the CAP theorem?

Have you ever seen an advertisement for a landscaper, house painter, or some other tradesperson that starts with the headline, “Cheap, Fast, and Good: Pick Two”?

CAP Theorem

The CAP theorem applies a similar type of logic to distributed systems—namely, that a distributed system can deliver only two of three desired characteristics:

👉 consistency - that reads are always up to date, which means any client making a request to the database will get the same view of data. Or - Every read receives the most recent writes or an error. For consistency, any read operation that begins after a write operation completes must return that value, or the result of a later write operation.

👉In a consistent system, once a client writes a value to any server and gets a response, it expects to get that value (or a fresher value) back from any server it reads from.

👉 availability - database requests always receive a response (when valid). Or - Every request received by a non-failing node in the system must result in a response. Whether you want to read or write you will get some response back.

👉 partition tolerance - that a network fault doesn’t prevent messaging between nodes. The system continues to operate despite an arbitrary number of messages being dropped(or delayed) by the network between nodes. When a network is partitioned, all messages sent from nodes in one component of the partition to nodes in another component are lost.

CAP theorem states that it is impossible for a distributed system to simultaneously provide more than two out of the above three guarantees.

🛑 All distributed system needs partition tolerance because no distributed system is safe from network failure. In presence of a partition tolerance, we can select one out of two options: Consistency and Availability.

👉 ⚠️ CAP Theorem is frequently misunderstood that one has to choose two out of three guarantees all the times. In fact, one has to choose between Consistency and Availability only when there is a network partition or failure happens. In absence of network partition or network failures, both availability and consistency can be satisfied.

AP(Availability and Partition tolerance): When availability is chosen over consistency, the system is will always process the client request and try to return the most recent available version of the information even if it cannot guarantee it is up to date due to network partitioning.

CP(Consistency and Partition tolerance): When consistency is chosen over availability, the system will return an error or time-out if particular information cannot be updated to other nodes due to network partition or failures. Database system designed with ACID guarantees (RDBMS) usually chooses consistency over availability whereas system Designed with BASE guarantees, chooses availability over consistency.

🛑 A distributed system is a network that stores data on more than one node (physical or virtual machines) at the same time. Because all cloud applications are distributed systems, it’s essential to understand the CAP theorem when designing a cloud app so that you can choose a data management system that delivers the characteristics your application needs most.

👉 In the context of distributed (NoSQL) databases, this means there is always going to be a trade-off between consistency and availability. This is because distributed systems are always necessarily partition tolerant (ie. it simply wouldn’t be a distributed database if it wasn’t partition tolerant.)

CAP Theorem when making database decisions

Although the CAP Theorem can feel quite abstract, it has practical, real-world consequences. From both a technical and business perspective the trade-offs will lead you to some very important questions. There are no right answers. Ultimately it will be all about the context in which your database is operating, the needs of the business, and the expectations and needs of users.

Consider things like:

  • Is it important to avoid throwing up errors in the client?
  • Or are we willing to sacrifice the visible user experience to ensure consistency?
  • Is consistency an actual important part of the user’s experience
  • Or can we actually do what we want with a relational database and avoid the need for partition tolerance altogether?

Up next