Docker images are “supposed” to be small and fast. However, unless you’re precompiling Go binaries and dropping them into the busybox image, they can get quite large and complicated. Without a well-constructed Dockerfile to improve build cache hits, your Docker builds can become unnecessarily slow.
Dockerfiles are regularly (and incorrectly) treated like bash scripts, and are therefore often written out as a series of commands of the kind you would curl | sudo bash from a website to install. This usually makes for an inefficient and slow Dockerfile.
When you’re building a new Dockerfile for an application, there can be a lot of trial and error in determining which packages are needed and which commands need to run. Optimizing your Dockerfile ensures that the build cache hits more often, so each build between changes is faster.
The general rule of thumb is to sort your commands by how often they change, how long they take to run, and how shareable they are with other images.
This means that commands like ENV should go towards the bottom, while a RUN apt-get -y update should go towards the top, as it takes longer to run and can be shared with all of your images.
ADD commands (and any other commands that invalidate the cache) should go as far down as possible, as this is where you’re likely to make lots of changes, and each change invalidates the cache of every subsequent command.
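As a rough sketch of that ordering, the skeleton below puts the slow, shareable step first and the frequently changing one last. The APP_ENV variable and the /app path are placeholders chosen for illustration, not part of any real project.

```dockerfile
FROM ubuntu:trusty

# Slow to run and identical for every image: keep it at the top so its
# cached layer is reused by later builds and by other images.
RUN apt-get -y update

# Cheap and still being tweaked: keep it low so that editing it does not
# invalidate the expensive layer above. (APP_ENV is a hypothetical name.)
ENV APP_ENV development

# Changes on nearly every build: keep it last. (/app is a placeholder.)
ADD . /app
```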
There are a lot of base images to choose from, from bare OS images like ubuntu:trusty to application-specific ones like java:7. Common sense might tell you to use ruby:2 to run a Ruby-based app and python:3 to run a Python app. However, you now have two base images with little in common that you need to download and build on. If you instead use ubuntu:trusty for both, you only need to download the base image once.
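To illustrate, here is a sketch of two separate Dockerfiles, shown in one listing, that both start from ubuntu:trusty. The directory names and the choice of the ruby and python3 packages are assumptions for the example; a real app will likely need more.

```dockerfile
# --- ruby-app/Dockerfile (hypothetical) ---
FROM ubuntu:trusty
RUN apt-get -yq update && apt-get -yqq install ruby

# --- python-app/Dockerfile (hypothetical) ---
FROM ubuntu:trusty
RUN apt-get -yq update && apt-get -yqq install python3
```

Both images sit on the same ubuntu:trusty layers, so the base is downloaded and stored only once per host.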
Each command in a Dockerfile is an extra layer. You can very quickly end up with an image that has 30+ layers. This is not necessarily a problem, but by joining RUN commands together and using a single EXPOSE line to list all of your open ports you can reduce the number of layers.
By joining RUN commands together intelligently, you can also share more layers between containers. Of course, if you have a common set of packages across multiple containers, you should look at creating a separate base image containing them that all of your images are built from.
Each layer that you can share across multiple images can save a ton of disk space.
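A minimal sketch of both ideas: a shared base image that is built once, and an application image on top of it that uses joined RUN commands and a single EXPOSE line. The image name example/base and the package choices are assumptions for illustration, and the listing shows two separate Dockerfiles.

```dockerfile
# --- base/Dockerfile: built once, e.g. `docker build -t example/base base/` ---
FROM ubuntu:trusty
# One layer for update plus install instead of two.
RUN apt-get -yq update \
 && apt-get -yqq install \
      curl \
      wget

# --- app/Dockerfile: every application image starts from the shared base ---
FROM example/base
RUN apt-get -yqq install apache2 apache2-utils
# One EXPOSE instruction instead of one per port.
EXPOSE 80 2003
```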
If you use volume containers, don’t bother trying to save space by using a small image; use the image of the application you’ll be serving data to. If you do that, the container holds not just your data but the actual application as well, which is very useful for debugging. Be aware that docker commit does not capture data stored in a volume, so committing the container snapshots the application but not the volume’s contents.
If you’ve built an image and discover when you run it that there’s a package missing, add it to the bottom of your Dockerfile rather than to the RUN apt-get command at the top. This means you can rebuild the image faster. Once your image is correct and working, you can reorganize your Dockerfile to clean such changes up before committing it to source control.
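A sketch of what that looks like mid-development, assuming for the sake of the example that sqlite3 turned out to be the missing package:

```dockerfile
FROM ubuntu:trusty

# Expensive install near the top; left untouched so it stays cached.
RUN apt-get -yq update \
 && apt-get -yqq install \
      apache2 \
      python-pip

ADD . /app

# Temporary fix found while testing: sqlite3 was missing at run time.
# Appending it here only adds one small new layer; fold it into the
# install above once the image works.
RUN apt-get -yqq install sqlite3
```

As long as the added files have not changed, a rebuild reuses every cached layer and only runs the final RUN.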
A Dockerfile for installing Graphite would look something like this if it were written like a bash script:
    FROM ubuntu:trusty
    MAINTAINER Paul Czarkowski "email@example.com"

    RUN apt-get -yq update

    # Apache
    RUN \
      apt-get -yqq install \
        apache2 \
        apache2-utils \
        libapache2-mod-python \
        python-dev \
        python-pip \
        python-cairo \
        python-pysqlite2 \
        python-mysqldb \
        python-jinja2 \
        sqlite3 \
        curl \
        wget \
        git \
        software-properties-common

    RUN \
      curl -sSL https://bootstrap.pypa.io/get-pip.py | python && \
      pip install whisper \
        carbon \
        graphite-web \
        'Twisted<12.0' \
        'django<1.6' \
        django-tagging

    # Add start scripts etc
    ADD . /app
    RUN mkdir -p /app/wsgi
    RUN useradd -d /app -c 'application' -s '/bin/false' graphite
    RUN chmod +x /app/bin/*
    RUN chown -R graphite:graphite /app
    RUN chown -R graphite:graphite /opt/graphite
    RUN rm -f /etc/apache2/sites-enabled/*
    ADD ./apache-graphite.conf /etc/apache2/sites-enabled/apache-graphite.conf

    # Expose ports.
    EXPOSE 80
    EXPOSE 2003
    EXPOSE 2004
    EXPOSE 7002

    ENV APACHE_CONFDIR /etc/apache2
    ENV APACHE_ENVVARS $APACHE_CONFDIR/envvars
    ENV APACHE_RUN_USER www-data
    ENV APACHE_RUN_GROUP www-data
    ENV APACHE_RUN_DIR /var/run/apache2
    ENV APACHE_PID_FILE $APACHE_RUN_DIR/apache2.pid
    ENV APACHE_LOCK_DIR /var/lock/apache2
    ENV APACHE_LOG_DIR /var/log/apache2

    WORKDIR /app

    # Define default command.
    CMD ["/app/bin/start_graphite"]
However, an optimized version of this same Dockerfile, based on what was discussed earlier, would look like the following:
    # 1 - Common Header / Packages
    FROM ubuntu:trusty
    MAINTAINER Paul Czarkowski "firstname.lastname@example.org"

    RUN apt-get -yq update \
      && apt-get -yqq install \
        wget \
        curl \
        git \
        software-properties-common

    # 2 - Python
    RUN \
      apt-get -yqq install \
        python-dev \
        python-pip \
        python-pysqlite2 \
        python-mysqldb

    # 3 - Apache
    RUN \
      apt-get -yqq install \
        apache2 \
        apache2-utils

    # 4 - Apache ENVs
    ENV APACHE_CONFDIR /etc/apache2
    ENV APACHE_ENVVARS $APACHE_CONFDIR/envvars
    ENV APACHE_RUN_USER www-data
    ENV APACHE_RUN_GROUP www-data
    ENV APACHE_RUN_DIR /var/run/apache2
    ENV APACHE_PID_FILE $APACHE_RUN_DIR/apache2.pid
    ENV APACHE_LOCK_DIR /var/lock/apache2
    ENV APACHE_LOG_DIR /var/log/apache2

    # 5 - Graphite and Deps
    RUN \
      apt-get -yqq install \
        libapache2-mod-python \
        python-cairo \
        python-jinja2 \
        sqlite3

    RUN \
      pip install whisper \
        carbon \
        graphite-web \
        'Twisted<12.0' \
        'django<1.6' \
        django-tagging

    # 6 - Other
    EXPOSE 80 2003 2004 7002

    WORKDIR /app

    VOLUME /opt/graphite/data

    # Define default command.
    CMD ["/app/bin/start_graphite"]

    # 7 - First use of ADD
    ADD . /app

    # 8 - Final setup
    RUN mkdir -p /app/wsgi \
      && useradd -d /app -c 'application' -s '/bin/false' graphite \
      && chmod +x /app/bin/* \
      && chown -R graphite:graphite /app \
      && chown -R graphite:graphite /opt/graphite \
      && rm -f /etc/apache2/sites-enabled/* \
      && mv /app/apache-graphite.conf /etc/apache2/sites-enabled/apache-graphite.conf
Section 1 is our most shareable layer. All the images running on the same host should start with this. You can see I’ve added a few things like git which, while not strictly needed, are useful for debugging, and because they’re in such a shareable layer they don’t take up much room.
Sections 2 and 3 are where we get to our language-specific packages. I’ve included both the Python and Apache sections here because it’s not super clear which should go first.
If we put Python first, then any other image that uses Apache gets a few free Python packages. If we put Apache first, then we could have a Ruby app that also includes that layer and gets Apache for free (hell, you can just give it Python for free anyway).
I’m calling the Apache ENVs in section 4 out separately for a few reasons.
Firstly, they should come directly after the Apache section so that it’s easier to make them common (and cached) between multiple images. You might not think it matters since calls like ENV are so cheap, but I have seen random ENV calls take 10 seconds or so. If you have a lot of them, it’s good to keep them cached, but you also don’t want a changed ENV to invalidate the cache of the Apache install.
They’re a pretty good example of something you might want to start with at the bottom of your Dockerfile and move up higher once you’re unlikely to change them again.
Secondly, I really wish Docker provided a way to specify multiple ENVs on the same line so that I could reduce the number of layers I end up with.
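Later Docker releases accept several key=value pairs in one ENV instruction. Here is a sketch using the Apache variables from the example above. One caveat: variables are substituted using the values that were set before the instruction starts, so anything that references $APACHE_CONFDIR or $APACHE_RUN_DIR has to go in a later ENV instruction.

```dockerfile
FROM ubuntu:trusty

# Independent values can share one ENV instruction.
ENV APACHE_CONFDIR=/etc/apache2 \
    APACHE_RUN_USER=www-data \
    APACHE_RUN_GROUP=www-data \
    APACHE_RUN_DIR=/var/run/apache2 \
    APACHE_LOCK_DIR=/var/lock/apache2 \
    APACHE_LOG_DIR=/var/log/apache2

# Derived values go in a second instruction so that the references
# resolve to the values set above.
ENV APACHE_ENVVARS=$APACHE_CONFDIR/envvars \
    APACHE_PID_FILE=$APACHE_RUN_DIR/apache2.pid
```

This turns the eight ENV instructions of the example into two.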
Section 5 contains the Graphite-specific packages, installed with apt-get and pip. You could join the two RUN commands into a single command with &&, but I kept them separate so that if the pip package requirements change, the build won’t also need to re-fetch the apt packages.
Section 6 contains a bunch of cheap commands like VOLUME. They’re probably less likely to change than the previous package installs, but they are also cheaper to run, so it’s less important if their cache is invalidated.
Keep them towards the bottom though, as you don’t want any changes to them to invalidate the cache for a more costly command.
You should wait until the last possible moment to use the ADD command, as any change to the added files invalidates the cache for every command after it.
I have grouped these final commands into a single layer. They come after the ADD command because they manipulate the files that it adds.
Hopefully this has given you some insight into how to build a better Dockerfile. These are all things I have learned from experience in building my own Docker images, and while they may not apply to all situations (or may be flat out wrong), they definitely seem to improve my development experience.