Have you ever wondered how many images are hosted on the Docker Hub? Or what the most popular base images are? We've got answers to those questions and lots more in this overview of data we've pulled from the Docker Registry API.
The CenturyLink Labs team has been working with Docker for about a year and a half now. One of the things that has been fascinating to see is the growth of the community around Docker in that time. One place you can see this growth pretty clearly is in the number of public images hosted on the Docker Hub.
When the Docker Hub was announced at DockerCon 2014, Ben Golub (CEO, Docker Inc.) made a point of mentioning that there were already "over 14,000 Dockerized applications" in their public registry. Compare that to DockerCon 2015 where Marianna Tessel (SVP Engineering, Docker Inc.) showed that the Docker Hub had grown to over 150,000 repos. That's a tenfold increase in one year!
Seeing the crazy growth in the number of images pushed to the Docker Hub got me wondering what other stats we could tease out of all that data. Many of the projects we've worked on (Panamax, Image Layers, etc.) have direct ties to the Docker Hub, so we knew that we could get a lot of information simply by querying the Docker Registry API. We decided to see if we could mine the data from the Hub to learn anything interesting about the 150,000 repositories being hosted there.
Note that repositories and tags are constantly being added to and deleted from the Hub, so the data below is just a snapshot at a particular point in time. The specific numbers shown below will likely have changed by the time you read this, but the general scale of these numbers is still interesting.
Our last count showed 125,289 public repositories. This is slightly lower than the 150,000 repos mentioned at DockerCon in June (and that number has certainly grown since then), but the difference can likely be attributed to the number of unlisted and private repositories.
Of those public repos, 39,441 (31%) use automated builds, which means that the images are built by the Hub itself from a published Dockerfile. The remaining 85,848 repos house images that have been uploaded manually with the "docker push" command.
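The split above is a simple count over per-repository records. Here is a minimal Python sketch. It assumes each repo has already been fetched into a record with a boolean flag for automated builds. The `Repo` class, its `is_automated` field, and the sample names are hypothetical stand-ins and do not reflect the Registry API's actual response format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Repo:
    name: str
    is_automated: bool  # hypothetical flag: True if the Hub builds the image from a Dockerfile


def build_mix(repos: List[Repo]) -> Tuple[int, int, float]:
    """Return (automated count, manually pushed count, automated share in percent)."""
    automated = sum(1 for repo in repos if repo.is_automated)
    manual = len(repos) - automated
    share = 100.0 * automated / len(repos) if repos else 0.0
    return automated, manual, share


# Made-up sample records, purely for illustration.
sample = [
    Repo("example/web", True),
    Repo("example/db", False),
    Repo("example/cache", False),
    Repo("example/api", True),
]
print(build_mix(sample))  # (2, 2, 50.0)

# Cross-check against the totals quoted above: 39,441 automated out of 125,289.
total, automated = 125_289, 39_441
print(total - automated, round(100.0 * automated / total))  # 85848 31
```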
There are 84 official repositories representing common services like MySQL, MongoDB and Redis. This is a pretty big increase over the 13 repos that were part of the initial launch of the Official Repositories program when it was announced at DockerCon 2014.
The CenturyLink Labs team has published a number of images but we can't even begin to compete with some of the more prolific organizations publishing to the Docker Hub. Below are the ten organizations with the most public repos:
I'm not sure what datdocker is doing with those 8,261 repositories, but it looks like they may be generating new repositories as part of some CI process (there are lots of similarly named repos with numerically increasing suffixes).
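Ranking organizations like this comes down to grouping repository names by namespace, which is the part before the slash. The sketch below is minimal and assumes you already have a flat list of repository names. `top_namespaces` and the sample names are hypothetical. Slash-less names are assigned to `library`, which mirrors how official images such as `redis` are addressed as `library/redis`.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_namespaces(repo_names: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n namespaces that own the most repositories, largest first."""
    counts: Counter = Counter()
    for name in repo_names:
        # "alice/web" -> "alice"; official images like "redis" live under "library".
        namespace = name.split("/", 1)[0] if "/" in name else "library"
        counts[namespace] += 1
    return counts.most_common(n)


# Made-up repository names, purely for illustration.
names = ["alice/web", "alice/db", "bob/tool", "redis", "alice/cache", "bob/cli"]
print(top_namespaces(names, n=2))  # [('alice', 3), ('bob', 2)]
```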
In case you were wondering, the CenturyLink Labs team with our 65 repositories ranks 30th on this list.
It would be interesting to see which images have been pulled the most but that information isn't available via the Docker Registry API. The closest metric available for image popularity is the star count.
Not surprisingly, the repos with the highest star counts are all official repositories. The top ten official repos are:
The top ten non-official repos are:
The most surprising entry on this list for me was the aspnet image. There are clearly a lot of people in the Microsoft dev community who are excited about Docker.
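Stars are the only popularity signal available, so producing the two lists above is a matter of splitting repos by their official flag and keeping the highest star counts in each group. This is a minimal sketch. The `RepoStars` record, its field names, and the sample repos and star counts are hypothetical. It makes no claim about how the Hub itself orders search results.

```python
import heapq
from typing import Iterable, List, NamedTuple, Tuple


class RepoStars(NamedTuple):
    name: str
    stars: int
    is_official: bool  # hypothetical flag for Official Repositories


def top_by_stars(
    repos: Iterable[RepoStars], n: int = 10
) -> Tuple[List[RepoStars], List[RepoStars]]:
    """Return the n most-starred official repos and the n most-starred non-official repos."""
    repos = list(repos)  # we iterate twice, so materialize the input once
    official = heapq.nlargest(n, (r for r in repos if r.is_official), key=lambda r: r.stars)
    community = heapq.nlargest(n, (r for r in repos if not r.is_official), key=lambda r: r.stars)
    return official, community


# Made-up names and star counts, purely for illustration.
sample = [
    RepoStars("official-a", 900, True),
    RepoStars("someone/app-a", 40, False),
    RepoStars("official-b", 1500, True),
    RepoStars("someone/app-b", 75, False),
]
official, community = top_by_stars(sample, n=1)
print([r.name for r in official], [r.name for r in community])  # ['official-b'] ['someone/app-b']
```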
Across the 125,289 public repos, there are 243,966 tagged images. That's an average of 1.9 tags per repo. Over 100,000 repos (82%) have only a single tag, while a handful of repos have thousands of tags.
The overall winner is rstiller/jetty with 2,104 tags (they appear to be creating tagged images for every possible combination of OS, JDK and Jetty version).
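The figures in this section can all be computed in one pass over a per-repository tag count. These include the average number of tags per repo, the share of single-tag repos, the repo with the most tags, and the count of empty repos. This minimal sketch assumes you have a non-empty mapping from repository name to its number of tags. `tag_stats` and the sample data are hypothetical.

```python
from typing import Mapping


def tag_stats(tag_counts: Mapping[str, int]) -> dict:
    """Summarize how tags are distributed across repositories (expects a non-empty mapping)."""
    total_repos = len(tag_counts)
    total_tags = sum(tag_counts.values())
    single = sum(1 for count in tag_counts.values() if count == 1)
    empty = sum(1 for count in tag_counts.values() if count == 0)
    busiest = max(tag_counts, key=tag_counts.__getitem__)
    return {
        "average_tags": total_tags / total_repos,
        "single_tag_pct": 100.0 * single / total_repos,
        "empty_repos": empty,
        "most_tags": (busiest, tag_counts[busiest]),
    }


# Made-up tag counts, purely for illustration.
sample = {"example/a": 1, "example/b": 1, "example/c": 0, "example/d": 6}
print(tag_stats(sample))
# {'average_tags': 2.0, 'single_tag_pct': 50.0, 'empty_repos': 1, 'most_tags': ('example/d', 6)}

# The article's average: 243,966 tags over 125,289 repos.
print(round(243_966 / 125_289, 1))  # 1.9
```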
The top five most common tag names across all the repositories are:
Given that "latest" is the tag assigned by default if you don't specify one explicitly, it's not surprising to see it at the top of the list by a huge margin.
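Finding the most common tag names is a straightforward frequency count. This minimal sketch assumes a mapping from repository name to its list of tag names. The helper name and the sample data are hypothetical. Each tag name is counted at most once per repository, so the result reads as "number of repos using this tag".

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def most_common_tags(
    tags_by_repo: Mapping[str, Iterable[str]], n: int = 5
) -> List[Tuple[str, int]]:
    """Return the n tag names used by the most repositories."""
    counts: Counter = Counter()
    for tags in tags_by_repo.values():
        counts.update(set(tags))  # a repo contributes at most once per tag name
    return counts.most_common(n)


# Made-up repos and tags, purely for illustration.
sample = {
    "example/web": ["latest", "1.0"],
    "example/db": ["latest"],
    "example/api": ["latest", "1.0", "dev"],
}
print(most_common_tags(sample, n=2))  # [('latest', 3), ('1.0', 2)]
```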
It didn't make the cut, but the most interesting tag we found was "kitten", which has been applied to 73 different images. My best guess is that it has something to do with a Kubernetes demo app.
Interestingly, there are 247 repositories with no tags at all -- they are just empty repos with no images.
One of the neat things about Docker's layered image format is that you don't have to build all of your images from scratch -- you can use any of the public images as a foundation for creating your own image. With that in mind we wanted to see which images were being referenced most often as the base for other images.
Here are the top ten most referenced base images:
| Rank | Layer ID | Tags | References |
|---|---|---|---|
| 4 | d0955f21bf24 | ubuntu:trusty, ubuntu:14.04, ubuntu:latest * | 7,196 |
| 5 | 9a61b6b1315e | debian:8, debian:8.1, debian:jessie, debian:latest | 6,742 |
| 6 | d2a0ecffe6fa | ubuntu:14.04, ubuntu:14.04.2, ubuntu:latest, ubuntu:trusty, ubuntu:trusty-20150630 | 6,301 |
| 7 | b39b81afc8ca | ubuntu:trusty, ubuntu:14.04, ubuntu:latest * | 5,766 |
| 9 | bf84c1d84a8f | debian:jessie, debian:latest * | 5,395 |
| 10 | 5506de2b643b | ubuntu:trusty, ubuntu:14.04, ubuntu:latest * | 5,342 |
The scratch image is a pretty obvious #1. Prior to Docker 1.5, scratch was the empty layer at the root of every image, so almost every image built with a Docker version earlier than 1.5 will have scratch at the base of its layer hierarchy.
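Counting how often a layer serves as a base means walking each image's parent chain. The post doesn't say exactly how the ranking above was computed, so this is only a plausible sketch. It assumes a mapping from each layer ID to its parent ID, with None at the root, which is roughly how the v1 registry described image ancestry. The layer IDs are readable made-up names. When every chain ends at a shared root layer, that root is counted once for every image, which mirrors why scratch lands at #1.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional


def count_base_references(
    parents: Mapping[str, Optional[str]], images: Iterable[str]
) -> Counter:
    """For each layer ID, count how many of `images` have it somewhere in their ancestry."""
    counts: Counter = Counter()
    for image in images:
        layer = parents.get(image)  # start at the image's parent, not the image itself
        seen = set()  # guards against malformed data containing a cycle
        while layer is not None and layer not in seen:
            seen.add(layer)
            counts[layer] += 1
            layer = parents.get(layer)
    return counts


# Made-up layer graph, purely for illustration.
parents = {
    "scratch": None,
    "ubuntu-base": "scratch",
    "debian-base": "scratch",
    "app-1": "ubuntu-base",
    "app-2": "ubuntu-base",
    "app-3": "debian-base",
}
print(count_base_references(parents, ["app-1", "app-2", "app-3"]).most_common())
# [('scratch', 3), ('ubuntu-base', 2), ('debian-base', 1)]
```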
Clearly, Ubuntu is extremely popular, with different flavors occupying 6 of the top 10 spots on the list. It's a natural choice given that a lot of developers are familiar with it. However, I encourage you to look at alpine, especially if you are interested in minimizing the size of your images.
The alpine image is 5MB (compared to ubuntu's 188MB) and has a nice package manager. So you can start small and then add just the packages you absolutely need for your application. We've slowly been migrating a lot of our images from Ubuntu to Alpine and have seen some significant size reductions as a result.
Note that four of the layers shown in the "top 10 most referenced base images" list (marked with a *) don't currently have a tag associated with them. They were likely tagged at one point with the values shown in the table above, but those tags have since been reassigned.
For example, at some point in the past the layer bf84c1d84a8f was tagged as debian:latest (as well as debian:jessie). While those tags were in place, a number of images were built that used "FROM debian" or "FROM debian:jessie" in their Dockerfile.
Some time after those images were built, the Debian images were rebuilt and the tags were moved to point at different layers. This probably isn't a big deal for most of these images, but it is worth noting that image tags are not static -- they may change to point to different images over time.
If you want your image to always be built upon the latest and greatest Debian Jessie or Ubuntu Trusty, you need to track those tags and rebuild your image whenever they change. The repository links feature on the Docker Hub can help you rebuild your images automatically whenever your base image changes.
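If you track base tags yourself instead of relying on repository links, the check comes down to one comparison. You compare the layer ID each of your images was built on with the ID the base tag points to now. This is a minimal sketch. `images_needing_rebuild` and the `myorg/...` image names are hypothetical, and it leaves out how you look up a tag's current ID. The layer IDs come from the snapshot discussed above. bf84c1d84a8f is the old debian:jessie layer. 9a61b6b1315e and d2a0ecffe6fa are where debian:jessie and ubuntu:14.04 pointed at the time of the snapshot.

```python
from typing import Dict, List, Optional, Tuple


def images_needing_rebuild(
    builds: Dict[str, Tuple[str, str]],
    current_tags: Dict[str, str],
) -> List[str]:
    """Return images whose base tag no longer points at the layer they were built on.

    builds maps each of your images to (base tag, base layer ID at build time);
    current_tags maps a base tag to the layer ID it points at right now.
    """
    stale = []
    for image, (base_tag, built_on) in builds.items():
        current: Optional[str] = current_tags.get(base_tag)
        # A moved tag (or a deleted one, which yields None) means the base has changed.
        if current != built_on:
            stale.append(image)
    return sorted(stale)


builds = {
    "myorg/api": ("debian:jessie", "bf84c1d84a8f"),
    "myorg/web": ("ubuntu:14.04", "d2a0ecffe6fa"),
}
current_tags = {"debian:jessie": "9a61b6b1315e", "ubuntu:14.04": "d2a0ecffe6fa"}
print(images_needing_rebuild(builds, current_tags))  # ['myorg/api']
```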
Trends Over Time
As I said above, this is just a snapshot of the data (as of August 6th, 2015). It will be really interesting to see how these trends evolve over time.