Databases are like a delivery service
MongoDB, however, is the weakest link in this popularized technology stack. Someone coming out of a bootcamp whose only knowledge of databases is Mongo will often struggle as they build an application. I worry for those individuals who might walk out thinking they have all the tools necessary to do something great. They may be thinking MongoDB is an “all-purpose database”.
For anyone who hasn’t struggled to scale Mongo in production: Mongo is an incredibly easy database to use and develop with, but it’s not an all-purpose database. Mongo is a document-store database. While Mongo is flexible, it cannot possibly do everything a growing service demands.
That’s the thing: no database is all-purpose.
Let’s compare: Databases vs. Delivery Service
Let’s compare different types of databases to aspects of a delivery service. Imagine your application provides a product. Imagine that product is a vacuum cleaner.
Your product is nifty and neat; therefore, it’s in high demand. You want to be able to deliver these amazing vacuums to your customers quickly and efficiently!
Delivering products involves a lot of logistics problems. There is no “one way” to deliver a product; you need different capabilities for different purposes.
A semi is quite versatile. It can do just about everything. You could even store all your vacuum cleaners on your semis. You can, if necessary, drive your semi up to your customer’s doorstep and drop off their brand-new vacuum.
You can probably tell, though, that a semi isn’t going to be great at everything. Storing all of your vacuum inventory on semi trucks certainly can’t be the most efficient method; what happens when you run out of space? You’d have to buy another semi truck. You might also have difficulty finding the vacuum you need to deliver because too many extra vacuums are in the way. Often, a semi won’t work for that last-mile delivery either, getting the vacuum onto the customer’s doorstep. Semi trucks are too big to fit on some streets or in some driveways.
Document-store databases, like MongoDB, are like a semi truck. Storing all of your data in a document store can get you started, but it’s not going to be efficient. When you have enough data, you need to start sharding your database, which adds a lot of complexity to finding the right data (or vacuum) when you need it. Document stores aren’t optimized for searching, so searching over your data sets, even with an index, can be slow. It can also be difficult to optimize queries to return only the data you need, resulting in a lot of wasted effort transmitting unnecessary data. Additionally, a document store isn’t quite fast enough to provide the blistering performance your application might demand.
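To make the sharding point concrete, here is a minimal plain-Python sketch; it is not MongoDB’s actual API. It assumes a hypothetical cluster of four in-memory shards, with each document placed by a hashed `customer_id` shard key. MongoDB itself distinguishes between targeted queries, which include the shard key and can be routed to a single shard, and broadcast (scatter-gather) queries, which must ask every shard; the sketch mimics that difference. The field names and the MD5-based routing are illustrative assumptions.

```python
import hashlib

# Hypothetical cluster: four "shards", each just an in-memory list of documents.
NUM_SHARDS = 4
shards: list[list[dict]] = [[] for _ in range(NUM_SHARDS)]


def shard_for(shard_key: str) -> int:
    """Map a shard-key value to a shard using a stable hash (MD5 is an arbitrary choice here)."""
    digest = hashlib.md5(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def insert(doc: dict) -> None:
    # Each document lives on exactly one shard, chosen by its "customer_id".
    shards[shard_for(doc["customer_id"])].append(doc)


def find_by_customer(customer_id: str) -> tuple[list[dict], int]:
    """Targeted query: the filter includes the shard key, so only one shard is consulted."""
    shard = shards[shard_for(customer_id)]
    return [d for d in shard if d["customer_id"] == customer_id], 1


def find_by_model(model: str) -> tuple[list[dict], int]:
    """Scatter-gather query: "model" is not the shard key, so every shard must be consulted."""
    hits = [d for shard in shards for d in shard if d["model"] == model]
    return hits, NUM_SHARDS


# Usage: 100 hypothetical orders, alternating between two vacuum models.
for i in range(100):
    insert({"customer_id": f"c{i}", "model": "V-100" if i % 2 else "V-200"})

orders, shards_asked = find_by_customer("c42")
print(len(orders), shards_asked)  # 1 1
orders, shards_asked = find_by_model("V-100")
print(len(orders), shards_asked)  # 50 4
```

The lookup by customer touches one shard, while the lookup by model has to visit all four, which is the extra complexity of finding the right vacuum once your data is split up.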
Clearly using only semi trucks isn’t the best idea. What else do we need to make our delivery service work?
If semi trucks cannot provide the best storage for all of your vacuums, could you store the vacuums you aren’t actively delivering in a warehouse? A warehouse will definitely help keep your vacuums safe. With some foresight, you can organize your warehouse so it’s always easy to find the right vacuum when you need it. Clearly, any good delivery service needs a warehouse.
SQL databases, like MySQL or PostgreSQL, are like warehouses. They are designed for organizing and storing all the data you need.
Just like actual warehouses, SQL databases require proper setup to scale. Without foresight into your database design, retrieval times will increase significantly as you add data, making your service slower. A poorly designed database will destroy your application’s ability to scale as the amount of data it indexes and organizes grows.
So what do you do? Rather than organizing your pile of vacuums, you might say, “Let’s just buy another acre of land and expand our warehouse!” That doesn’t scale infinitely, and neither does throwing hardware at a database that is failing to scale. It cannot last.
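A minimal sketch of that foresight, using Python’s built-in `sqlite3` module as a stand-in for MySQL or PostgreSQL (the table, columns, and index name are hypothetical). It asks SQLite’s `EXPLAIN QUERY PLAN` how it would find one vacuum model before and after adding an index. The exact wording of the plan varies between SQLite versions, but it typically changes from a full-table SCAN to an index SEARCH.

```python
import sqlite3

# A throwaway in-memory table standing in for a hypothetical vacuum inventory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vacuums (id INTEGER PRIMARY KEY, model TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO vacuums (model, color) VALUES (?, ?)",
    [(f"V-{i % 500}", "red" if i % 3 else "blue") for i in range(10_000)],
)


def query_plan(sql: str, params: tuple = ()) -> str:
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


lookup = "SELECT * FROM vacuums WHERE model = ?"

before = query_plan(lookup, ("V-42",))  # no index yet: SQLite must scan every row
conn.execute("CREATE INDEX idx_vacuums_model ON vacuums (model)")
after = query_plan(lookup, ("V-42",))   # with the index: SQLite can seek directly to matches

print(before)
print(after)
```

The data and the query are identical in both cases; only the organization of the “warehouse” changed.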
There are times, even when your warehouse is perfectly organized, that retrieving the right vacuum from your inventory is still too slow. Warehouses are really only designed to hold your vacuum cleaners.
Since a warehouse cannot drop a vacuum off at a customer’s doorstep, if you used only warehouses, your customers would have to come pick up their orders from your warehouse. That would be terribly slow and inconvenient for your customers. Clearly, a warehouse isn’t going to be very good at those last-mile, to-the-doorstep deliveries.
It is starting to feel like you’re going to need both a warehouse and semi trucks, but warehouses clearly still have weaknesses. What can we do about that?
Robots are cool, aren’t they? When a warehouse has enough of those amazing vacuums being stored, it slows down the time it takes to find a specific one. What if we used robots to help us get the right vacuum even faster?
Warehouses often use robots to retrieve inventory from their shelves. This speeds things up, helping the products get to customers faster, while putting less pressure on the non-robotic employees of the warehouse.
Warehouse robots are similar to an indexing service, like Solr. An indexing service speeds up searches over the massive amounts of data you keep in your databases, reducing the time it takes to find whatever your application needs. As you continue to add inventory to your warehouses, good indexing and automation become increasingly important.
Using warehouse robots can certainly help us speed up warehouses when they need to perform difficult searches. Now you’re getting to a reliable delivery service! If you use all three of the current options (semis, warehouses, and robots), your delivery service will do pretty well! But what if you continue growing and have to optimize your delivery service even more?
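Solr is built on Apache Lucene, whose core data structure is an inverted index: a map from each term to the documents that contain it. The sketch below is a tiny plain-Python inverted index, not Solr’s API, assuming simple lowercase tokenization and AND semantics (a result must contain every query term); the class name and sample descriptions are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each term to the set of document IDs containing it (its 'postings')."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)  # term -> set of doc IDs

    def add(self, doc_id: int, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set[int]:
        """Return IDs of documents containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # Start from the first term's postings and intersect with the rest;
        # .get() avoids creating empty entries for unknown terms.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


# Usage: index a few hypothetical product descriptions.
index = InvertedIndex()
index.add(1, "Cordless stick vacuum, red")
index.add(2, "Robot vacuum with HEPA filter")
index.add(3, "Red upright vacuum with HEPA filter")

print(sorted(index.search("red vacuum")))  # [1, 3]
print(sorted(index.search("hepa robot")))  # [2]
```

A search only touches the postings for the query terms instead of reading every description, which is where the speedup comes from.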
As we know, a warehouse can’t even attempt a last-mile delivery, and semi trucks face a lot of challenges making it all the way to our customers’ doorsteps. So what about a van?
A van is much smaller than a semi, so it can fit down those narrow roads and onto tight driveways. That would allow our drivers to get those vacuums to our customers even faster! A van is nimble, capable of handling crowded city streets and country roads alike.
A van is comparable to a key-value store, like Redis or Memcached. A key-value store is extremely fast! It provides amazing performance for data that has been loaded into it from a warehouse or a semi truck, and it’s versatile enough to be used as a cache that makes the entire service faster.
Unfortunately, vans are cramped. You definitely cannot use them for storing your inventory, and when a van is overfilled, it becomes difficult to find the right vacuum quickly.
Key-value stores provide amazing performance, but they rely heavily (or entirely) on a server’s memory to provide it. If you try to put too much into a key-value store, deliveries slow down, and searching a key-value store by anything other than its keys is nearly impossible. They’re simply not designed for it.
Clearly, your delivery service needs warehouses and either semi trucks or vans, possibly all three! Depending on how popular our delivery service is, we might still need those warehouse robots. If you are using all of these options at your disposal, is it possible to outgrow what they provide?
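The caching role described above is often implemented with the cache-aside pattern: check the fast store first, and only on a miss load from the slower database and keep a copy. Below is a minimal sketch in plain Python, assuming a small in-memory LRU cache as a stand-in for Redis or Memcached (the class and function names are hypothetical, not either product’s API). The fixed `capacity` models the memory limit: once it is exceeded, the least recently used key is evicted, which is broadly how Redis behaves when configured with a `maxmemory` limit and an LRU eviction policy. Note that `get` can only look things up by key; finding a vacuum by any other attribute would mean scanning every entry.

```python
from collections import OrderedDict
from typing import Callable, Optional


class LRUCache:
    """A bounded in-memory key-value store that evicts the least recently used key when full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data = OrderedDict()  # key -> value, ordered from least to most recently used

    def get(self, key: str) -> Optional[str]:
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def set(self, key: str, value: str) -> None:
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used key


def get_cache_aside(cache: LRUCache, key: str, load: Callable[[str], str]) -> str:
    """Cache-aside read: try the cache first; on a miss, load from the database and cache it."""
    value = cache.get(key)
    if value is None:
        value = load(key)
        cache.set(key, value)
    return value


# Usage: a fake "database" that records every read it serves.
db_reads: list[str] = []


def load_vacuum_from_db(key: str) -> str:
    db_reads.append(key)
    return f"details for {key}"


cache = LRUCache(capacity=2)
for key in ["V-1", "V-1", "V-2", "V-3", "V-1"]:
    get_cache_aside(cache, key, load_vacuum_from_db)

print(db_reads)          # ['V-1', 'V-2', 'V-3', 'V-1']
print(list(cache.data))  # ['V-3', 'V-1']
```

The second request for `V-1` is served from the cache, but by the last request `V-1` has been evicted to make room, so the database has to deliver it again: an overfilled van.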
Sometimes a delivery service has too much going on. It can get so complicated, with multiple warehouses, dozens of semi trucks, and potentially hundreds of vans and robots helping us out, that we really need something to piece everything together.
We clearly need something to help with these logistics: something that keeps track of where everything is, who has what, and what belongs to whom.
You need an operations manager: someone, or something, that stays aware of the high-level details, just the status of things. An operations manager doesn’t need to know exactly how much inventory is available or when the earliest possible delivery might be, but the operations manager would help speed up operations between services and provide helpful logistics, reporting, insight, and analytics.
A graph database, such as Neo4j, is a great tool for this. While you’d be hard-pressed to use a graph database for storing any inventory, it can certainly help with that network of information. It can speed up those messier, relationship-heavy queries and help you quickly find the exact status and location of the data you need.
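Graph databases treat relationships as first-class data, so a question like “how does this vacuum get from a warehouse to this customer?” becomes a traversal. In Neo4j you would express it as a Cypher pattern match; the sketch below instead uses plain Python and breadth-first search over a hypothetical delivery network stored as an adjacency dictionary, just to show what following relationships hop by hop looks like. The node names and edges are made up for illustration.

```python
from collections import deque

# Hypothetical delivery network: each key is connected to the nodes in its list.
edges = {
    "warehouse:north": ["semi:7"],
    "semi:7": ["van:12", "van:15"],
    "van:12": ["customer:alice"],
    "van:15": ["customer:bob"],
    "warehouse:south": ["van:20"],
    "van:20": ["customer:carol"],
}


def find_path(start: str, goal: str):
    """Breadth-first search: return the shortest chain of hops from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(find_path("warehouse:north", "customer:bob"))
# ['warehouse:north', 'semi:7', 'van:15', 'customer:bob']
print(find_path("warehouse:south", "customer:alice"))  # None
```

Answering the same question in a table-oriented store would typically take one join per hop; a graph model makes each hop a direct step to a neighbor.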
Clearly, we now have a scalable delivery service, one that will allow us to keep growing through new challenges and deliver any number of vacuums our customers demand.
I hope you’ve come to realize that a single-database application will fail, like a delivery service that has only one way to deliver a product. To build an application that can scale under constantly growing demand, you need to use a variety of databases and tools. While a database like MongoDB is going to help you get started, if you don’t plan for growth, you’ll quickly run into challenges managing it.
I hope more people start mentioning the importance of using the right tool for the job and learn about the types of technologies that are available to help you through even the toughest scaling challenges.