Docker Application: Controlled Persistence

The past few days I’ve been working a lot with Docker. As with many powerful tools, there are quite a few ways to get the job done, and one of the most debated is how to persist data within Docker’s containers: you can bind-mount directories from your local system, or spin up containers with only one purpose in life, storage. Here I’m going to be discussing and experimenting with a slightly different way of handling the issue: Controlled Persistence.


The project I’m going to be talking about is located here. Before I get into the data part of it, I’d like to briefly talk about the application itself. The architecture is loosely based on this well-written demo Redis application by Anand Mani Sankar. I especially liked the load balancing, and you’ll see the same technique in this example. For the sake of keeping things simple and compartmentalized, I used the official registry images for ubuntu, mongo, redis, and nginx, and wrote a Dockerfile only for the node application itself. So really, each container does one thing and does it well. The application itself simply inserts a new cat into the Cat collection on every page refresh. It then displays the cat inserted and how many cats are in the system. Cool.

Automating Builds

Something I really love in theory is docker-compose, mainly because of the YAML. However, as of this writing it just doesn’t have the control and features I would like. So in the spirit of automation, I felt a Makefile was appropriate for handling the most common commands necessary while developing and deploying to staging and production.

It’s fairly straightforward: it lets you build the image, run the application, and destroy it. If you look deeper into the Makefile itself, you’ll see there isn’t anything too special happening. Here is how the mongo container gets built:
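A sketch of what that build step amounts to (the container name, image tag, and host paths are assumptions, not the repo’s exact Makefile):

```shell
# Hypothetical sketch of the mongo step; names and paths are assumptions.
# Two volumes: the config file and a host-bound backups directory.
# Deliberately NO volume for /data/db -- the data lives in the container.
docker run -d \
  --name mongo \
  -v "$(pwd)/mongod.conf:/etc/mongod.conf" \
  -v /var/backups/mongo:/var/backups \
  mongo \
  mongod --config /etc/mongod.conf
```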

Notice that MongoDB’s data directory is not mounted.

Data & Beyond

There are two volumes being created: a mongod.conf file and a backups directory. You’ll notice that there isn’t a /data directory mounted for local storage. That isn’t the best practice because it relies on the host admin to make sure permissions are correct, the directory exists, etc. You’ll also notice there isn’t a --volumes-from flag pointing to another “dummy” instance of itself. How is the data any more or less safe in a different container? It’s not like we’re going to be destroying this one any more frequently than you’d be rm -v-ing the store container. Let the container do what it does well: database stuff.

Times to use a data-only container:

  1. Storing logs (or any other static data type: data backups, zip files, etc.) across load-balanced containers. Also, if you use a data container, use the same image to avoid permission issues.

Times NOT to use a data-only container:

  1. If you’re load balancing database containers, then you should be using a sharded cluster instead.
  2. You’ve properly set up the architecture so that each container has one job to do. Don’t overcomplicate that job by splitting up where data is stored.

Which leads me to the make save and make restore commands. Using these, you can take snapshots of your application’s data at any point in time and, in turn, restore the data to any of those points.

Take a look at the Makefile:
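The underlying commands boil down to something like this (a hedged sketch; the container name “mongo” and the dump path are assumptions):

```shell
# Hypothetical sketch of the save/restore targets; names are assumptions.

# make save: dump every database into the bind-mounted backups directory
docker exec mongo mongodump --out /var/backups/dump

# make restore: reload the latest dump back into the running container
docker exec mongo mongorestore /var/backups/dump
```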

Note that /var/backups is bound to the host. This way the actual data is stored within the container and backups (dumps) of the data are stored in a controlled and persistent state. More importantly, if the backup folder doesn’t exist at the time of running or if there are specific users involved, there will be no issues. If you really wanted to keep things in containers, you could use a data-container to hold these backups. There would be absolutely nothing wrong with that since it is static content.

Generally, containers won’t get completely removed after a certain point in development, so it’s not like you’re going to be destroying the database container regularly. However, those times when you feel like destroying everything can now be more controlled, and in fact, even more desirable. You can easily visit your application in its squeaky-clean state and quickly get it back to a populated state in seconds. Or roll back to a previous point in time (I’ll be implementing this soon; currently it only pulls the latest dump). I think that flexibility is pretty powerful.

Another interesting aspect is how deployments work. A post deployment script could look something like this:
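Something along these lines, where the target names are assumptions based on the Makefile described above:

```shell
#!/bin/sh
# Hypothetical post-deploy sketch; target names are assumptions.
make build     # rebuild the application image
make clean     # tear down the old containers
make run       # start fresh containers
make restore   # repopulate the database from the latest dump
```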

Or how about a cron job for daily backups? Simple: make save.
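For example, a crontab entry like this would snapshot the data every night (the project path and log location are placeholders):

```shell
# Hypothetical crontab entry: run `make save` in the project directory at 2am daily.
0 2 * * * cd /srv/cat-app && make save >> /var/log/cat-app-backup.log 2>&1
```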

As time goes on I will be making updates and perhaps complete re-writes to this demo project in my discovery process for Docker. Please feel free to join me on GitHub.