Docker images can get really big. Many are over 1G in size. How do they get so big? Do they really need to be this big? Can we make them smaller without sacrificing functionality?

Here at CenturyLink we've spent a lot of time recently building different docker images. As we began experimenting with image creation one of the things we discovered was that our custom images were ballooning in size pretty quickly (it wasn't uncommon to end up with images that weighed-in at 1GB or more). Now, it's not too big a deal to have a couple gigs worth of images sitting on your local system, but it becomes a bit of pain as soon as you start pushing/pulling these images across the network on a regular basis. I decided it was worthwhile to really dig into the docker image creation process so that I could understand how it works and whether there was anything we could do differently to minimize the size of our images.

As a quick aside, Adriaan de Jonge recently published an article titled Create The Smallest Possible Docker Container in which he describes how to create an image that literally contains nothing but a statically linked Go binary that is run when the container is started. His entire image is an amazingly small 3.6 MB. I'm not going to be discussing anything quite so drastic here. As someone who's more comfortable working with languages like Python and Ruby I need a little bit more OS-level support and will happily sacrifice a hundred megabytes of space so that I can continue to run Debian and apt-get install my dependencies. So, while I'm jealous of Adriaan's diminutive image, I need to support a wider range of applications than is practical with his approach.

Layers

Before we can talk about how to trim down the size of your images, we need to discuss layers. The concept of image layers involves all sorts of low-level technical details about things like root filesystems, copy-on-write and union mounts -- luckily those topics have been covered pretty well elsewhere so I won't rehash those details here. For our purposes, the important thing to understand is that each instruction in your Dockerfile results in a new image layer being created. Let's look at an example Dockerfile to see this in action:

RUN mkdir /tmp/foo
RUN fallocate -l 1G /tmp/foo/bar

This is a pretty useless image, but will help illustrate the point about image layers. We're using the debian:wheezy image as our base, creating the /tmp/foo directory and then allocating a 1 GB file named bar in that directory. Let's build this image: $ docker build -t sample, sending build context to Docker daemon 2.56 kB:

Step 0 : FROM debian:wheezy ---> e8d37d9e3476

Step 1 : RUN mkdir /tmp/foo ---> Running in 3d5d8b288cc2 ---> 9876aa270471 Removing intermediate container 3d5d8b288cc2

Step 2 : RUN fallocate -l 1G /tmp/foo/bar ---> Running in 6c797329ee43 ---> 3ebe08b36733 Removing intermediate container 6c797329ee43 Successfully built 3ebe08b36733

If you read the output of the docker build command you can see exactly what Docker is doing to construct our sample image:

  1. Using the value specified in our FROM instruction, Docker will docker run a new container from the debian:wheezy image (the container's ID is 3d5d8b288cc2).
  2. Within that running container Docker executes the mkdir /tmp/foo instruction.
  3. The container is stopped, committed (resulting in a new image with ID 9876aa270471) and then removed.
  4. Docker spins-up another container, this time from the image that was saved in the previous step (this container's ID is 6c797329ee43).
  5. Within that running container Docker executes the fallocate -l 1G /tmp/foo/bar instruction.
  6. The container is stopped, committed (resulting in a new image with ID 3ebe08b36733) and then removed.

We can see the end result by looking at the output of the docker images --tree command (unfortunately, the --tree flag is deprecated and will likely be removed in a future release):

$ docker images --tree Warning: '--tree' is deprecated, it will be removed soon. See usage. └─511136ea3c5a Virtual Size: 0 B Tags: scratch:latest └─59e359cb35ef Virtual Size: 85.18 MB └─e8d37d9e3476 Virtual Size: 85.18 MB Tags: debian:wheezy └─9876aa270471 Virtual Size: 85.18 MB └─3ebe08b36733 Virtual Size: 1.159 GB Tags: sample:latest

In the output you can see the image that's tagged as debian:wheezy followed by the two layers we described above (one for each instruction in our Dockerfile). We often talk about "images" and "layers" as if they are different things -- in reality each layer is itself an image. An image's layers are nothing more than a collection of other images. In the same way that we could say: docker run -it sample:latest /bin/bash, we could just as easily execute one of the untagged layers: docker run -it 9876aa270471 /bin/bash. Both are images that can be turned into running containers -- the only difference is that the former has a tag associated with it ("sample:latest") while the later does not. In fact, the ability to create a container from any image layer can be really helpful when trying to debug problems with your Dockerfile.

Image Size

Knowing that an image is really a collection of other images, it should come as no surprise that the size of an image is the sum of the sizes of its constituent images. Let's look at the output of the docker history command:

$ docker history sample IMAGE CREATED CREATED BY SIZE 3ebe08b36733 3 minutes ago /bin/sh -c fallocate -l 1G /tmp/foo/bar 1.074 GB 9876aa270471 3 minutes ago /bin/sh -c mkdir /tmp/foo 0 B e8d37d9e3476 4 days ago /bin/sh -c #(nop) CMD [/bin/bash] 0 B 59e359cb35ef 4 days ago /bin/sh -c #(nop) ADD file:1e2ba3d9379f 85.18 MB 511136ea3c5a 13 months ago 0 B

This shows all of the sample image's layers along with the instruction that was used to generate that layer and the resulting size (note that the ordering of layers in the docker history output is reversed from that of the docker images --tree output). There are only two instructions that contribute anything of substance to our image: the ADD instruction (which comes from the debian:wheezy image) and our fallocate command. Together, these two layers total 1.16 GB which is roughly the size of our image. Let's actually save our image to a tar file and see what the resulting size is:

$ docker save sample > sample.tar $ ls -lh sample.tar -rw-r--r-- 1 core core 1.1G Jul 26 02:35 sample.tar

When an image is saved to a tar file in this way it also includes a bunch of metadata about each of the layers so the total size will be slightly bigger than the sum of the various layers. Let's add one more instruction to our Dockerfile:

FROM debian:wheezy RUN mkdir /tmp/foo RUN fallocate -l 1G /tmp/foo/bar RUN rm /tmp/foo/bar

Our new instruction will immediately delete the big file that was generated with the fallocate command. If we docker build our updated Dockerfile and look at the history again we'll see the following:

$ docker history sample IMAGE CREATED CREATED BY SIZE 9d9bdb929b00 8 seconds ago /bin/sh -c rm /tmp/foo/bar 0 B 3ebe08b36733 24 minutes ago /bin/sh -c fallocate -l 1G /tmp/foo/bar 1.074 GB 9876aa270471 24 minutes ago /bin/sh -c mkdir /tmp/foo 0 B e8d37d9e3476 4 days ago /bin/sh -c #(nop) CMD [/bin/bash] 0 B 59e359cb35ef 4 days ago /bin/sh -c #(nop) ADD file:1e2ba3d9379f 85.18 MB 511136ea3c5a 13 months ago 0 B

Note that our addition of the rm command has added a new (0 byte) layer to the image but everything else remains exactly the same. If we save our new image we should see that it's almost exactly the same size as the previous one (it'll differ only in the little bit of space that is needed to store the metadata about the new layer we added):

$ docker save sample > sample.tar $ ls -lh sample.tar -rw-r--r-- 1 core core 1.1G Jul 26 02:55 sample.tar

If we were to docker run this image and look in the container's /tmp/foo directory we would find it empty (after all, the file was removed). However, since our Dockerfile generated an image layer that contains a 1 GB file it becomes a permanent part of this image.

Note: Each additional instruction in your Dockerfile will only ever increase the overall size of your image.

Clearly, this example is a little silly, but the idea that images are the sum of their layers becomes important when looking for ways to reduce the size of your images. In the following sections, I'll discuss some strategies for doing just that.

Choose Your Base

This is a pretty obvious tip, but the base image you choose can have a significant impact on your overall image size. Here are the sizes for some of the more common base images:

$ docker images REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE scratch latest 511136ea3c5a 13 months ago 0 B busybox latest a9eb17255234 7 weeks ago 2.433 MB debian latest e8d37d9e3476 4 days ago 85.18 MB ubuntu latest ba5877dc9bec 4 days ago 192.7 MB centos latest 1a7dc42f78ba 2 weeks ago 236.4 MB fedora latest 88b42ffd1f7c 10 days ago 373.7 MB

On our team, we had been using the ubuntu image as the base for most of our work -- only because most of the team was already familiar with Ubuntu. However, after playing with the debian image we realized that it actually did everything we needed and saved us 100+ MB in image size. The list of viable base images are going to vary depending on the needs of the image you're building, but it's certainly worth examining -- if you're using Ubuntu when BusyBox would actually meet your needs you're consuming a lot of extra space unnecessarily. One thing I would love to see is the image size displayed on the Docker registry. Right now, the only way to see the size of an image is to first pull it to your local system.

Re-use Your Base

One of the nice things about the layered approach to creating images is that layers can be re-used across different images. In the example below you can see that we have three different images that all use debian:wheezy as their base:

$ docker images --tree Warning: '--tree' is deprecated, it will be removed soon. See usage. └─511136ea3c5a Virtual Size: 0 B Tags: scratch:latest └─e8d37d9e3476 Virtual Size: 85.18 MB Tags: debian:wheezy ├─22a0de5ea279 Virtual Size: 85.18 MB │ └─057ac524d834 Virtual Size: 85.18 MB │ └─bd30825f7522 Virtual Size: 106.2 MB Tags: creeper:latest ├─d689af903018 Virtual Size: 85.18 MB │ └─bcf6f6a90302 Virtual Size: 85.18 MB │ └─ffab3863d257 Virtual Size: 95.67 MB Tags: enderman:latest └─9876aa270471 Virtual Size: 85.18 MB └─3ebe08b36733 Virtual Size: 1.159 GB └─9d9bdb929b00 Virtual Size: 1.159 GB Tags: sample:latest

Each of the three images adds something on top of the debian:wheezy image, but there aren't three copies of Debian. Instead each image simply maintains a reference to the single instance of the Debian layer (one of the reasons I like the docker images --tree view is that it makes the relationship between the different layers very clear). This means that once you've pulled the debian:wheezy image you shouldn't have to pull those layers again and the bits for the image only exist once on your local file system. So you can save a significant amount space and network traffic by re-using a common base across your different images.

Chain Your Commands

In our sample image above we created a file and then immediately deleted it. That was a contrived example but similar scenarios often do occur when building images. Let's look at a more realistic example:

FROM debian:wheezy WORKDIR /tmp RUN wget -nv RUN tar -xvf someutility-v1.0.0.tar.gz RUN mv /tmp/someutility-v1.0.0/someutil /usr/bin/someutil RUN rm -rf /tmp/someutility-v1.0.0 RUN rm /tmp/someutility-v1.0.0.tar.gz

Here we're pulling down a tar file, extracting it, moving some files around and then cleaning up. As we saw before, each one of these instructions results in a separate layer. Even though we're removing the tar file and the extracted directory those things are still part of the image.

$ docker history some utility IMAGE CREATED CREATED BY SIZE 33f4a994ae27 16 seconds ago /bin/sh -c rm /tmp/someutility-v1.0.0.tar.gz 0 B fec7b5e24c1f 17 seconds ago /bin/sh -c rm -rf /tmp/someutility-v1.0.0 0 B 0851974d72e7 18 seconds ago /bin/sh -c mv /tmp/someutility-v1.0.0/someuti 12.21 MB 5b6b996f0364 19 seconds ago /bin/sh -c tar -xvf someutility-v1.0.0.tar.gz 99.91 MB 0eebad5d831c 20 seconds ago /bin/sh -c wget -nv http://centurylinklabs.co 55.34 MB d6798fc80576 8 minutes ago /bin/sh -c #(nop) WORKDIR /tmp 0 B e8d37d9e3476 5 days ago /bin/sh -c #(nop) CMD [/bin/bash] 0 B 59e359cb35ef 5 days ago /bin/sh -c #(nop) ADD file:1e2ba3d9379f7685a1 85.18 MB 511136ea3c5a 13 months ago

The wget command resulted in a layer that was 55 MB and then extracting the tar created a 99 MB layer. We don't actually need either of these things in the final image so we've got 150+ MB of wasted space here. We can work around this issue by refactoring the Dockerfile a little bit:

FROM debian:wheezy WORKDIR /tmp RUN wget -nv http://centurylinklabs.com/someutility-v1.0.0.tar.gz && \ tar -xvf someutility-v1.0.0.tar.gz && \ mv /tmp/someutility-v1.0.0/someutil /usr/bin/someutil && \ rm -rf /tmp/someutility-v1.0.0 && \ rm /tmp/someutility-v1.0.0.tar.gz

Instead of executing each command as a separate RUN instruction we've chained them all together in a single line using the && operator. While this makes the Dockerfile a little harder to read it allows us to clean-up the tar file and extracted directory before the image layer is committed. Here's what the resulting image looks like:

$ docker history some utility IMAGE CREATED CREATED BY SIZE 8216b5f01ea8 7 seconds ago /bin/sh -c wget -nv http://centurylinklabs.co 12.21 MB d6798fc80576 17 minutes ago /bin/sh -c #(nop) WORKDIR /tmp 0 B e8d37d9e3476 5 days ago /bin/sh -c #(nop) CMD [/bin/bash] 0 B 59e359cb35ef 5 days ago /bin/sh -c #(nop) ADD file:1e2ba3d9379f7685a1 85.18 MB 511136ea3c5a 13 months ago 0 B

Note that we end-up with exactly the same result (at least as far as the running container is concerned) but we've trimmed some unnecessary layers and 150 MB out of the final image. I wouldn't recommend that you go and string together every command in your Dockerfile but if you find a pattern like the one above where you're creating and later removing files then chaining a few instructions together can help keep your image size down.

Flatten Your Image

The strategies we've discussed so far all assume that you're building your own image or at least have access to the Dockerfile. However, there may be situations where you have an image that was created by someone else that you'd like to slim down. To do this we can take advantage of the fact that creating a container from an image results in all of the image's layers being merged. Let's return to our sample image (the one with the fallocate and rm commands) and docker run it:

$ docker run -d sample 7423d238b754e6a2c5294aab7b185f80be2457ee36de22795685b19ff1cf03ec

Since our image doesn't really do anything it exits immediately and we're left with a stopped container that is union of all our image layers (I used the -d flag here only so that the ID of container would be displayed). If we export that container and pipe the contents into the docker import command we can turn the container back into an image:

$ docker export 7423d238b | docker import - sample:flat 3995a1f00b91efb016250ca6acc31aaf5d621c6adaf84664a66b7a4594f695eb $ docker history sample:flat IMAGE CREATED CREATED BY SIZE 3995a1f00b91 12 seconds ago 85.18 MB

Note how the history for our new sample:flat image has only one layer and the total size is 85 MB -- the layer containing our short-lived, 1 GB file is completely gone. While this is a neat trick it should be noted that there are some significant downsides to flattening your images in this way:

  • By squashing all the layers together you lose all the efficiencies described above about sharing layers between images. Our sample:flat image now has embedded within in it a separate copy of the debian:wheezy image.
  • All of the metadata typically stored alongside the image gets lost in the run/export/import process. Things that may have been defined in the original image like exposed ports, environment variables and the default command are lost.

So I certainly would NOT recommend that you go and flatten all of your images, but it can be a useful in a pinch if you're trying to optimize someone else's image. Also, it can be a helpful tool if you simply want to see just how much space you could squeeze out of your own images.

Summary

Managing the size of Docker images is a challenge. It's easy to make them smaller without sacrificing functionality. Don't have an account on CenturyLink Cloud? No problem. Get started for free and receive a substantial credit toward any of our products or services.

Sign up for our Developer-focused newsletter CODE. Designed hands-on by developers, for developers. Keep up to date on topics of interest: tutorials, tips and tricks, and community building events.