This blog post will help you understand two similar Dockerfile instructions, ADD and COPY: how they differ, how they became what they are today, and which one we recommend you use. (Hint: it's not ADD.)

When building Docker images from a Dockerfile you have two instructions you can choose from to add directories/files to your image: ADD and COPY. Both instructions follow the same basic form and accomplish pretty much the same thing:

ADD <src>... <dest>
COPY <src>... <dest>

In both cases, directories or files (the <src>) are copied and added to the filesystem of the container at the specified <dest> path.

So if both instructions do pretty much the same thing, why do they both exist, and which one should you use? Read on to find out.

TL;DR

If you're not interested in the nuances of ADD and COPY and just want an answer to "which one should I use?", all you need to know is: use COPY.

In the Beginning...

Unlike the COPY instruction, ADD was part of Docker from the beginning and supports a few additional tricks beyond simply copying files from the build context.

The ADD instruction allows you to use a URL as the <src> parameter. When a URL is provided, a file is downloaded from the URL and copied to the <dest>.

ADD http://foo.com/bar.go /tmp/main.go

The file above will be downloaded from the specified URL and added to the container's filesystem at /tmp/main.go. Another form allows you to simply specify the destination directory for the downloaded file:

ADD http://foo.com/bar.go /tmp/

Because the <dest> argument ends with a trailing slash, Docker will infer the filename from the URL and add it to the specified directory. In this case, a file named /tmp/bar.go will be added to the container's filesystem.

Another feature of ADD is the ability to automatically unpack archives. If the <src> argument is a local tar archive in a recognized compression format (uncompressed, gzip, bzip2, or xz), then it is unpacked as a directory at the specified <dest> in the container's filesystem.

ADD /foo.tar.gz /tmp/

The command above would result in the contents of the foo.tar.gz archive being unpacked into the container's /tmp directory.

Interestingly, the URL download and archive unpacking features cannot be used together. Any archive downloaded from a URL will NOT be automatically unpacked.
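
To make that last point concrete, here is a minimal sketch of a Dockerfile that uses both forms side by side. The base image, the foo.tar.gz archive in the build context, the /opt paths, and the foo.com URL are all placeholders picked for illustration, so treat this as a sketch rather than something to copy verbatim.

```dockerfile
FROM debian:bookworm-slim

# Local archive from the build context: ADD unpacks its contents into /opt/local/.
ADD foo.tar.gz /opt/local/

# Remote archive: ADD only downloads it, so /opt/remote/foo.tar.gz stays a tarball.
ADD http://foo.com/foo.tar.gz /opt/remote/

# Unpacking the downloaded archive takes an explicit step.
RUN tar -xzf /opt/remote/foo.tar.gz -C /opt/remote
```

Same instruction, same kind of file, two different outcomes depending on where the file came from.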

Too Much Magic

Clearly, there is a lot of functionality behind the simple ADD instruction. While this makes ADD quite versatile, it does NOT make it particularly predictable. Here's a quote from an issue that was logged against the ADD instruction back in December of 2013:

Currently the ADD command is IMO far too magical. It can add local and remote files. It will sometimes untar a file and it will sometimes not untar a file. If a file is a tarball that you want to copy, you accidentally untar it. If the file is a tarball in some unrecognized compressed format that you want to untar, you accidentally copy it. - amluto

The consensus seemed to be that ADD tried to do too much and was confusing to the user. Obviously, no one wanted to break backward compatibility with existing usage of ADD, so it was decided that a new instruction would be added which behaved more predictably.

Like ADD, But Less

When version 1.0 of Docker was released, the new COPY instruction was included. Unlike ADD, COPY does a straightforward, as-is copy of files and folders from the build context into the container.

COPY doesn't support URLs as a <src> argument so it can't be used to download files from remote locations. Anything that you want to COPY into the container must be present in the local build context.

Also, COPY doesn't give any special treatment to archives. If you COPY an archive file it will land in the container exactly as it appears in the build context without any attempt to unpack it.
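
As a rough sketch of what that means in practice (the base image, the foo.tar.gz file name, and the /opt/foo path are assumptions made up for this example), extracting an archive that was brought in with COPY is something you have to spell out yourself:

```dockerfile
FROM debian:bookworm-slim

# COPY leaves the archive untouched: the image contains /tmp/foo.tar.gz as a file.
COPY foo.tar.gz /tmp/

# If you want the contents, you have to ask for them explicitly.
RUN mkdir -p /opt/foo \
  && tar -xzf /tmp/foo.tar.gz -C /opt/foo
```

Nothing happens to the archive unless the Dockerfile says so, which is exactly the predictability that ADD lacks.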

COPY is really just a stripped-down version of ADD that aims to meet the majority of the "copy-files-to-container" use cases without any surprises.

Which to Use?

In case it isn't obvious by now, the recommendation from the Docker team is to use COPY in almost all cases.

Really, the only reason to use ADD is when you have an archive file that you definitely want to have auto-extracted into the image. Ideally, ADD would be renamed to something like EXTRACT to really drive this point home (again, for backward-compatibility reasons, this is unlikely to happen).

OK, but what about fetching packages from remote URLs? Isn't ADD still useful for that? Technically, yes, but in most cases you're probably better off RUNning a curl or wget. Consider the following example:

ADD http://foo.com/package.tar.bz2 /tmp/
# Assumes the archive unpacks into a top-level package/ directory.
RUN tar -xjf /tmp/package.tar.bz2 -C /tmp \
  && make -C /tmp/package \
  && rm /tmp/package.tar.bz2

Here we have an ADD instruction which retrieves a package from a URL, followed by a RUN instruction which unpacks it, builds it, and then attempts to clean up the downloaded archive.

Unfortunately, since the package retrieval and the rm command are in separate image layers, we don't actually save any space in our final image: the archive is still stored in the layer created by ADD (for a more detailed explanation of this phenomenon, see my Optimizing Docker Images article).

In this case you're better off doing something like this:

# Assumes the archive unpacks into a top-level package/ directory.
RUN curl http://foo.com/package.tar.bz2 \
  | tar -xjC /tmp \
  && make -C /tmp/package

Here we curl the package and pipe it right into the tar command for extraction. This way we aren't left with an archive file on the filesystem that we need to clean up.
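
One thing a pipe can hide is the exit status of curl, since by default the shell only reports the status of the last command in the pipeline. Here is a slightly more defensive sketch of the same idea. It assumes a Debian-based image where bash is available, that the archive unpacks into a top-level package/ directory, and that make alone is enough to build it (a real package will likely need a compiler and other build dependencies). The URL is still the placeholder used throughout this post.

```dockerfile
FROM debian:bookworm-slim

# Run RUN instructions under bash with pipefail, so a failed download
# fails the build instead of being masked by the tar that follows it.
SHELL ["/bin/bash", "-o", "pipefail", "-c"]

# Tools needed by the next step. The package list is an assumption for this sketch.
RUN apt-get update \
  && apt-get install -y --no-install-recommends bzip2 ca-certificates curl make \
  && rm -rf /var/lib/apt/lists/*

# Download, extract, and build in a single layer. No archive is ever written to disk.
# curl flags: -f fails on HTTP errors, -sS is quiet but still prints errors, -L follows redirects.
RUN curl -fsSL http://foo.com/package.tar.bz2 \
  | tar -xjC /tmp \
  && make -C /tmp/package
```

The shape is the same as before (download and extraction share one RUN), with the difference that a broken download now stops the build.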

There may still be valid reasons to ADD a remote file to your image, but that should be an explicit decision and not your default choice.

Ultimately, the rule is this: use COPY (unless you're absolutely sure you need ADD).
