Taming the Docker Blob

Or understanding how to best use Docker.

Docker is a great way to build services with modular and changeable components without borking your server / computer. I like to think of Docker containers as a system-level version of Python’s virtual environments – you can build a stack of services and applications from a Dockerfile, and then easily reuse it on different computers.

However, Docker can become an amorphous blob if you are not careful. Containers and volumes can multiply until your computer starts freezing because you have used up 100GB of space.

There are two tricks I have learnt to manage my Docker-based systems:

  • Working out the difference between images and containers, and understanding the lifecycle of the latter; and
  • Clever use of volumes.


Images

Images are like ISO disk images, the difference being that they are built layer by layer, so that layers may be shared between images. An image may be thought of as a class definition.

Images are created when you issue a docker build command. Images are normally identified by an ID in the form of a hash, so to keep them organised, make sure you build each image with the -t option to give it a name (e.g. -t image_name). To view the images on your computer use docker images.
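Putting those two commands together (image_name is a placeholder for whatever tag you choose):

```shell
# Build an image from the Dockerfile in the current directory,
# tagging it with a readable name instead of relying on the hash ID
docker build -t image_name .

# List the images on this machine
docker images
```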

Containers

Containers are the computers that are created from images. They can be thought of as virtual machines, or as instances of a class definition.

One image may be used to create multiple container instances. A container is created when you use the docker run command and pass an image name. I recommend also using the --name option to create a name for your container, e.g. --name my_container.

Running containers may be viewed using the docker ps command. What took me a while to work out is that stopped and exited containers also stick around. These can be viewed by using the -a option, i.e. docker ps -a.

Another tip is to use the --rm flag to automatically remove temporary containers after use. Beware though: removing a container also deletes all data generated while it was running, unless that data is stored in a separate volume.

Containers are really designed to run as long-lived background processes. If you are working on a desktop or laptop, though, you will want to turn your machine off and on again. If a container has stopped, you can restart it using docker restart my_container.
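The container lifecycle described above can be sketched as follows (my_container and image_name are placeholder names):

```shell
# Create and start a named container from an image
docker run --name my_container image_name

# List running containers; add -a to also see stopped/exited ones
docker ps
docker ps -a

# A one-off container that removes itself on exit
# (any data not stored in a volume is lost when it goes)
docker run --rm image_name

# Restart a stopped container, e.g. after a reboot
docker restart my_container
```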

Volumes

Volumes are chunks of file system that are managed by Docker. They can be mounted into multiple containers.

It’s good practice to explicitly create a volume, so that it is easier to keep track of. To do this use the docker volume create vol_name command.

I had a problem where very large databases were filling up a local solid state drive (SSD), and I wanted to move them to a 1TB hard drive. The easiest way I found to store volumes on a different drive is to create a symlink (by right-clicking in the Nautilus file manager or using ln -s) to the location where you want to store the data (e.g. /HDD/docker_volumes/vol_1), name the link after the proposed volume (e.g. vol_1), and place it in the Docker volumes directory (typically /var/lib/docker/volumes). Then create the volume, e.g. docker volume create vol_1. The volume will now be managed by Docker, but the data will be stored in the linked folder.
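The symlink approach above can be done entirely from the command line; this is a sketch, and the paths are examples that you should adapt to your own drives:

```shell
# Example paths: /HDD/docker_volumes holds the real data;
# /var/lib/docker/volumes is Docker's default volume directory
mkdir -p /HDD/docker_volumes/vol_1

# Place a symlink named after the proposed volume in Docker's
# volumes directory (usually needs root)
sudo ln -s /HDD/docker_volumes/vol_1 /var/lib/docker/volumes/vol_1

# Docker now manages the volume, but the bytes live on the HDD
docker volume create vol_1
```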

To use a volume with a container use the -v flag with the docker run command, e.g. docker run -v vol_1:path_in_container --name my_container my_image.
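As a quick sanity check, docker volume inspect shows where Docker thinks the volume's data lives (the mount point /data below is just an example path):

```shell
# Mount vol_1 at an example path inside a new container
docker run -v vol_1:/data --name my_container my_image

# Show the volume's metadata, including its mount point on the host
docker volume inspect vol_1
```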

Checking Disk Usage

Once you get the hang of all this, a good check on disk use may be performed using docker system df -v. This provides a full breakdown of your Docker disk usage across images, containers and volumes.

A Mongo Example

Here is an example that pulls this all together. The situation is that we want to store some data in a Mongo database. Instead of installing Mongo locally, we can use the mongo Docker image.

Now through detective work (docker inspect mongo) I worked out that the mongo Docker image is designed to work with two volumes: one mapped to the database directory /data/db in the container, and one mapped to the configuration database directory /data/configdb. If you don’t explicitly map volumes to these locations, Docker creates anonymous volumes for them, and these can grow quite large. I thus created two Docker volumes, mongo_data and mongo_config, with symlinks to a larger hard disk drive as described above.

To download the image and start a container with your data you can then use: docker run -v mongo_data:/data/db -v mongo_config:/data/configdb -p 27017:27017 --name mongo_container mongo. The -p flag maps the local port 27017 to the exposed Mongo port of the container, so we can connect to the Mongo database using localhost and the default port.

If you stop the container (e.g. to restart or switch off your server/computer), you can restart it using docker restart mongo_container. Even if you accidentally delete the container, the data survives, because it is stored in the mongo_data volume. (Although I recommend backing up that data just in case you aggressively prune the wrong volumes!)
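The whole Mongo sequence can be sketched as follows (the -d flag is an addition here: it detaches the container so it runs in the background):

```shell
# Create the named volumes (optionally symlinked to a larger
# drive first, as described above)
docker volume create mongo_data
docker volume create mongo_config

# Pull the mongo image and start a container with both volumes
# mounted and the default Mongo port published on localhost
docker run -v mongo_data:/data/db -v mongo_config:/data/configdb \
    -p 27017:27017 --name mongo_container -d mongo

# Stop and restart across reboots without losing any data
docker stop mongo_container
docker restart mongo_container
```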
