
LXC and Docker

December 14, 2014 | 7 minutes read

Tags: devops, container, docker, lxc, talk

Slides from my talk about LXC and Docker.

  • Introduction
  • LXC and Docker in a Nutshell
  • Isolation and Resource Management
  • Security Considerations
  • Container Applications
  • Outlook

  • full virtualization: enables running an unmodified OS.
    • Examples: Parallels, VirtualBox, Xen
  • paravirtualization: enables running a modified guest system (kernel)
    • Examples: Xen
  • OS-level virtualization: enables running an isolated process (tree)
    • Examples: OpenVZ, LXC, BSD-jails, Linux-VServer, Solaris Zones
    • Virtualized Containers: LXC, Docker


  • Lightweight (almost no overhead)
  • Docker Inc.: 55M venture capital
  • Docker Supporters:
    • RedHat (OpenShift)
    • Microsoft (Azure, Windows Server)
    • Google (GCE)
    • Amazon (AWS Beanstalk)

From https://linuxcontainers.org:

  • LXC is a userspace interface for the Linux kernel containment features.
  • … it lets Linux users easily create and manage system or application containers.

Started in 2008, implemented in C/Python


From http://www.docker.com:

  • Build, Ship and Run Any App, Anywhere
  • Docker - An open platform for distributed applications for developers and sysadmins.
    • portable, lightweight runtime and packaging tool
    • cloud service for sharing applications and automating workflows
  • As a result, IT can ship faster and run the same app, unchanged, on laptops, data center VMs, and any cloud.

Started in 2013, implemented mostly in Go




  • Create debian wheezy LXC container
  • Install and test openssh-server
  • Cleanup container
  • lxc-create / lxc-start / lxc-stop / lxc-destroy
  • lxc-attach: shell in running container
  • lxc-ls: list container (status)
  • pstree: show process tree on host
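
The demo steps above can be sketched as a small shell function. This is a minimal sketch, not the exact commands from the talk: the function name `lxc_demo`, the `RUN` dry-run switch and the container name are assumptions for illustration, and a real run needs root plus an LXC installation that ships the debian template.

```shell
#!/bin/sh
# Sketch of the demo workflow. Set RUN=echo to print the commands instead of
# executing them (RUN and lxc_demo are hypothetical names, not LXC tooling).
lxc_demo() {
    name="$1"
    # Build a root filesystem from the debian template, release wheezy.
    ${RUN:-} lxc-create -n "$name" -t debian -- -r wheezy
    # Start the container in the background and list containers with status.
    ${RUN:-} lxc-start -n "$name" -d
    ${RUN:-} lxc-ls --fancy
    # Run a command inside the running container (needs working networking).
    ${RUN:-} lxc-attach -n "$name" -- apt-get install -y openssh-server
    # Clean up: stop the container and delete its root filesystem.
    ${RUN:-} lxc-stop -n "$name"
    ${RUN:-} lxc-destroy -n "$name"
}
```

Calling `RUN=echo; lxc_demo demo` prints the six commands without touching the system; while a container is running, `pstree` on the host shows its processes as an ordinary subtree.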


  • Create salt-master image
  • Run salt-master container
  • Publish salt-master image
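
A rough sketch of these three steps on the command line follows. The function name `salt_master_demo`, the `RUN` dry-run switch and the container name `salt-master` are assumptions for illustration; the image name is passed in and has to match a registry you can push to.

```shell
#!/bin/sh
# Sketch only: names are placeholders. Set RUN=echo for a dry run.
salt_master_demo() {
    image="$1"    # e.g. localhost:5000/saltmaster
    # Create the image from the Dockerfile in the current directory.
    ${RUN:-} docker build -t "$image" .
    # Run a container from it, detached, publishing port 4505 on the host.
    ${RUN:-} docker run -d --name salt-master -p 4505:4505 "$image"
    # Publish the image to the registry encoded in its name.
    ${RUN:-} docker push "$image"
}
```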


  • Filesystem (Layer)
  • Meta-Information:
    • Exposed ports
    • Mountable volumes
    • Entrypoint / Command
    • (typically) derived from other layer
  • Template aka Dockerfile

FROM ubuntu
RUN export DEBIAN_FRONTEND=noninteractive && \
    apt-get update && \
    apt-get install -y salt-master && \
    apt-get clean

EXPOSE 4505
COPY master /etc/salt/master

VOLUME ["/etc/salt/pki"]
VOLUME ["/srv/"]

ENTRYPOINT [ "/usr/bin/salt-master" ]


Containers are instances of images

  • Create & Start = Run
  • Can be stopped and restarted
  • Have their own filesystem layer on top
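
The points above map onto the Docker CLI roughly as follows. This is a sketch under assumptions: `container_lifecycle` and the `RUN` dry-run switch are hypothetical helpers, and `docker create` requires Docker 1.3 or newer.

```shell
#!/bin/sh
# Sketch only. Set RUN=echo to print the commands instead of running them.
container_lifecycle() {
    image="$1"
    name="$2"
    # "docker run" is shorthand for these two steps.
    ${RUN:-} docker create --name "$name" "$image"
    ${RUN:-} docker start "$name"
    # Stopping keeps the container's own filesystem layer,
    ${RUN:-} docker stop "$name"
    # so a restart still sees the changes made before the stop.
    ${RUN:-} docker start "$name"
    # Only removing the container discards that layer.
    ${RUN:-} docker rm -f "$name"
}
```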


Images with appropriate tag can be pushed / pulled.

  • docker [push|pull] imagename
  • Tag Format: [REGISTRYHOST/][USERNAME/]NAME[:TAG]
    • Public (“official”): busybox, ubuntu:trusty, redis
    • Public (user): martinhoefling/salt-minion:latest
    • Private: localhost:5000/saltmaster
  • Public Registry (aka Dockerhub)
  • Private Registry
    • docker run -p 5000:5000 registry
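
To make the tag format concrete, here is a small sketch that splits a name into its parts. It is not Docker's own parser: `parse_image_ref` is a hypothetical helper, and the rule that a first component containing `.` or `:` (or equal to `localhost`) is a registry host is an assumption modeled on how Docker treats such names.

```shell
#!/bin/sh
# parse_image_ref REF: split [REGISTRYHOST/][USERNAME/]NAME[:TAG] into parts.
parse_image_ref() {
    ref="$1"
    registry=""
    user=""
    tag="latest"    # default when no tag is given
    # A tag is whatever follows the last ':' unless that part contains a '/'
    # (then the ':' belongs to a registry port such as localhost:5000).
    last="${ref##*:}"
    case "$ref" in
        *:*) case "$last" in
                 */*) ;;
                 *) tag="$last"; ref="${ref%:*}" ;;
             esac ;;
    esac
    # The first path component is a registry host if it looks like a hostname.
    case "$ref" in
        */*) first="${ref%%/*}"
             case "$first" in
                 *.*|*:*|localhost) registry="$first"; ref="${ref#*/}" ;;
             esac ;;
    esac
    # What remains is [USERNAME/]NAME.
    case "$ref" in
        */*) user="${ref%%/*}"; ref="${ref#*/}" ;;
    esac
    printf 'registry=%s user=%s name=%s tag=%s\n' "$registry" "$user" "$ref" "$tag"
}
```

For the private example above, `parse_image_ref localhost:5000/saltmaster` reports `localhost:5000` as registry, an empty user, `saltmaster` as name and the default tag `latest`.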


One of the most requested features, new in Docker 1.3

  • Run a shell to inspect a running container
  • Run a utility script, e.g. mongodump of a database

docker exec -t -i mycontainer /bin/bash

Should not be (ab)used to start multiple daemon processes per container (Docker antipattern)

See also Best Practices


LXC and Docker use the same or similar kernel and userland functionality.


Control groups (cgroups) limit the resources of a process tree, e.g. memory, CPU, device access, …
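
Which control group a process belongs to can be read from `/proc/PID/cgroup`. The following is a minimal sketch: `cgroup_path` is a hypothetical helper, and the optional file argument exists only so the function can be tried on sample data.

```shell
#!/bin/sh
# cgroup_path CONTROLLER [FILE]: print the cgroup path of the current process
# for one controller. FILE defaults to /proc/self/cgroup.
cgroup_path() {
    controller="$1"
    file="${2:-/proc/self/cgroup}"
    # Each line has the form HIERARCHY-ID:CONTROLLER[,CONTROLLER...]:PATH
    while IFS=: read -r _id controllers path; do
        case ",$controllers," in
            *",$controller,"*) printf '%s\n' "$path"; return 0 ;;
        esac
    done < "$file"
    return 1    # controller not listed
}
```

Inside a container, `cgroup_path memory` typically prints a path that contains the container ID, while on the host it prints the cgroup of the calling shell. On systems that use the unified cgroup v2 hierarchy the controller field is empty, so the lookup returns nothing there.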



Isolation of:

  • processes (PID namespace), the parent pid ns sees all pids
  • network (network namespace), separates physical and virtual nic
  • hostname (UTS namespace)
  • file system layout (mount namespace)
  • users and groups (user namespace), maps users including root to other users (e.g. non-privileged ones) in the parent namespace
  • System V inter-process communication (IPC namespace), separates shared memory segments, semaphores and message queues

Namespaces are created via the clone syscall (with CLONE_NEW* flags) instead of fork. The namespaces of a process are found in /proc/$PID/ns/.
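
The entries in that directory are symbolic links whose targets identify the namespace, so two processes share a namespace exactly when the link targets match. A small sketch of that comparison follows; `ns_id` and `same_ns` are hypothetical helper names.

```shell
#!/bin/sh
# ns_id PID TYPE: print the namespace identifier, e.g. "pid:[4026531836]".
ns_id() {
    readlink "/proc/$1/ns/$2"
}

# same_ns TYPE PID1 PID2: succeed if both processes are in the same namespace.
# Returns 2 if one of the links cannot be read.
same_ns() {
    a="$(ns_id "$2" "$1")" || return 2
    b="$(ns_id "$3" "$1")" || return 2
    [ "$a" = "$b" ]
}
```

On the host, comparing a container's process with the current shell, for instance `same_ns net CONTAINER_PID $$`, is expected to fail for every namespace type the container isolates. Reading the links of another user's process typically requires root.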

Verbose Article



Another Union FS (aufs) allows creating a union file system by merging separate file systems as layers into one virtual file system.

mount -t aufs -o br=/home_ssd=rw:/mnt/storage/home_hdd=rw,udba=reval none /home
mount -t aufs -o br=/volatile=rw:/mnt/home_nfs=ro,udba=none none /home

  • Apparmor / SELinux: provide Mandatory Access Control via profiles
  • seccomp-bpf (LXC): blacklist or whitelist system calls
  • capability drop: dropping capabilities not required

Optional:

  • GRSEC / PAX enabled kernel
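
Whether a capability drop took effect can be checked from inside the process tree: the kernel reports the effective capability set as a hexadecimal bitmask in the `CapEff` line of `/proc/PID/status`. The sketch below decodes such a mask. `effective_caps` and `has_cap` are hypothetical helpers, and the mask in the usage note is only sample input.

```shell
#!/bin/sh
# effective_caps PID: print the CapEff bitmask (hex) of a process.
effective_caps() {
    sed -n 's/^CapEff:[[:space:]]*//p' "/proc/$1/status"
}

# has_cap HEXMASK BIT: succeed if capability number BIT is set in HEXMASK.
# Capability numbers come from linux/capability.h, e.g. CAP_CHOWN=0,
# CAP_NET_RAW=13, CAP_SYS_ADMIN=21.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}
```

With the sample mask `a80425fb`, `has_cap a80425fb 13` succeeds (CAP_NET_RAW is present) and `has_cap a80425fb 21` fails (CAP_SYS_ADMIN was dropped). For a live process, `has_cap "$(effective_caps $$)" 21` performs the same check.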

Summarizing Talk




Main attack vector is the Hypervisor

Example Xen weakness: CVE-2014-7188

  • Improper MSR range used for x2APIC emulation

Main attack vector: … not so simple …

  • Proper implementation of kernel isolation, i.e. namespaces, cgroups
  • Security of syscalls (kernel)
  • Exotic filesystems / Aufs (Docker)
  • Proper configuration of SELinux / Apparmor
  • Direct device access is dangerous! Avoid if possible.

  • Providing isolated Environments instead of running on one instance: better than without
  • Isolation of Simple Services, networking only, unprivileged user, no device access: mostly safe in default configuration
  • Service with hardware access: can be difficult to lock down
  • Multi Tenant Platform: quite some effort to lock down / stay secure



FROM ubuntu
MAINTAINER Martin Hoefling <martin.hoefling@gmx.de>
ENV GOPATH /root/go
RUN export DEBIAN_FRONTEND=noninteractive && \
    apt-get update && \
    apt-get install -y golang git mercurial && \
    apt-get clean && \
    mkdir /root/go && cd /root/go && \
    go get github.com/syncthing/discosrv && \
    apt-get remove -y --purge golang git mercurial && \
    apt-get autoremove -y && \
    cp bin/discosrv /usr/local/bin && rm -rf /root/go

EXPOSE 22026/udp
ENTRYPOINT /usr/local/bin/discosrv

  • Resulting image (layer) with only 7MB
  • Service with isolated environment ready to start

FROM ubuntu:trusty
RUN echo "deb http://repo.aptly.info/ squeeze main" > \
    /etc/apt/sources.list.d/aptly.list && \
    apt-get update && \
    apt-get install -y aptly ca-certificates && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

COPY aptly.conf /etc/aptly.conf
VOLUME ["/aptly"]
ENTRYPOINT ["/usr/bin/aptly"]

#!/bin/bash
# Wrapper script: forwards all arguments to aptly inside a throwaway container.
docker run --rm -v /mnt/aptly:/aptly aptlyimage "$@"

$ aptly mirror create wheezy-security http://security.debian.org/ wheezy/updates main
$ aptly mirror update wheezy-security
$ aptly db cleanup

  • Portable to other machines running Docker
  • Isolated upgrades / environment

FROM martinhoefling/salt-minion:debian
MAINTAINER Martin Hoefling <martin.hoefling@gmx.de>

COPY dovecot /srv/salt/dovecot
COPY pillar.example /srv/pillar/example.sls

RUN echo "file_client: local" > /etc/salt/minion.d/local.conf && \
    echo "base:" > /srv/pillar/top.sls  && \
    echo "  '*':" >> /srv/pillar/top.sls  && \
    echo "    - example" >> /srv/pillar/top.sls && \
    salt-call --local --retcode-passthrough state.sls dovecot

docker build .

Jenkins pipeline for a Python / JavaScript web project.

  • Python:
    • PyLint (Virtualenv)
    • Nosetests (Virtualenv)
  • Javascript:
    • Various Lints (Node packages)
    • Jasmine Tests (Node packages)
  • End to End:
    • CasperJS:
      • api, portal and realtime tornado servers (Virtualenv)
      • frontend build (Node packages, Bower components)
      • elasticsearch, mongo, redis
      • CasperJS client (Node packages)

FROM myregistry:5000/trusty:latest

RUN DEBIAN_FRONTEND=noninteractive apt-get install -y tar git nodejs python 
RUN npm update -g && npm install -g grunt-cli bower casperjs
RUN pip3 install virtualenv

COPY dependencies/ /opt/
RUN virtualenv -p /usr/bin/python3 /opt/backend && \
    cd /opt/backend/ && ./bin/pip3 install -r requirements-dev.txt
RUN cd /opt && npm install && \
    cd /opt/frontend/portal/ && npm install
RUN cd /opt/frontend/ && bower --allow-root install

COPY start.sh /usr/local/bin/start.sh

EXPOSE 5000 5001 44444

VOLUME ["/sourcetree", "/data/log"]
ENTRYPOINT ["/usr/local/bin/start.sh"]

No sources in container, rebuild only required when dependencies change!


  • Update source tree & copy build dependencies
  • Build image (instantaneous if dependencies are the same)
  • Run all tests / lints / builds in parallel. Each:
    • Spin up container(s) with mounted source tree
    • Run testsuites / app / frontend build in parallel
    • Destroy containers
    • Evaluate Logs
  • Full Jenkins pipeline run in 2:50 minutes (previously ~16 minutes)
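
The "run in parallel, then evaluate" step can be sketched in plain shell with background jobs. This is not the actual Jenkins job configuration: `run_parallel` is a hypothetical helper that starts one background job per argument and fails if any of them fails.

```shell
#!/bin/sh
# run_parallel CMD...: run every command string as a background job, wait for
# all of them, and return non-zero if at least one job failed.
run_parallel() {
    pids=""
    for cmd in "$@"; do
        sh -c "$cmd" &
        pids="$pids $!"
    done
    failed=0
    for pid in $pids; do
        # wait PID returns the exit status of that particular job.
        wait "$pid" || failed=1
    done
    return "$failed"
}
```

In the pipeline above each command string would be one container run with the source tree mounted, for example `docker run --rm -v "$PWD":/sourcetree testimage lint` next to a second one for the unit tests, where the image name `testimage` and the arguments are placeholders.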

  • Rocket (App Container engine): Docker competitor from the CoreOS people.
  • Kubernetes (Docker Orchestration): Google (GoogleCloudPlatform, GCE), cluster setup based on Salt
  • Docker Machine and Docker Swarm
  • LXD (LXC Orchestration): OpenStack Nova plugin from the makers of LXC.
  • Mesosphere “Datacenter Operating System”

(*) www.flockport.com/lxc-vs-docker