My approach to a reliable self-hosting setup using Docker.
Introduction
This guide is meant for the more technically adventurous amongst us. That is, those who are willing to get their hands dirty in some basic devops work in order to self-host applications on servers they own or control.
If the idea of self-hosting is intimidating to you, fear not! There are many paid hosting services that can abstract much of this complexity away from you. While I haven't tried it myself, PikaPods looks like an interesting service that can automate the deployment of self-hosted applications.1 The service is from the creator of a backup service I've used for a while now and been happy with: BorgBase.
For those of you who are still interested in the DIY method, read on!
All of the self-hosted services I use are hosted on a single server using a docker-compose file. After years of playing around with various ways of hosting applications, I've settled on this one as a reasonably simple, stable, secure, and repeatable process for keeping the services I rely on up and running. Your preferences and goals may vary from mine and I would encourage you to see this guide as a starting point. I'd also love feedback and suggestions. If you find ways to improve this framework, please reach out.
Lastly, there are a few places on this page where I hand-wave over the details and recommend reading the documentation. I realize that isn't always the most helpful, so if you have questions or want elaboration on anything, please don't hesitate to send me an email!
Overview
infra/
The general framework for my docker setup is a single folder called infra/ that's laid out like this:
infra/
  volumes/
    photoprism/
    plex/
    freshrss/
    ...
  config/
    caddy/
    freshrss/
  scripts/
    update
    backup
  systemd/
    backup.service
    backup.timer
  docker-compose.yaml
Brief descriptions of what each directory/file does:
volumes/ is for storing the bigger volumes of data associated with each service. Some examples:
- A directory like photoprism/ would have your entire photo library within it.
- plex/ has all of your media libraries.
- freshrss/ contains the database used by FreshRSS to manage your RSS feeds.

config/ stores all of the text-based configuration files for each service you run. These configuration files should be small enough that you're comfortable managing them in git.
- Each config file (or directory) will be teleported into each service's docker container via a volume mount later on.

scripts/ stores shell scripts that make basic maintenance tasks like backing up and updating container images quick and easy.

systemd/ contains any systemd unit files that may be necessary for running scheduled tasks (e.g. daily backups).

docker-compose.yaml defines all of the services/applications and configures the various environment variables, volume mounts, etc. Everything you'd expect docker to handle. More on this below.
The idea of this framework is that everything necessary to run your self-hosted services lives within this infra/ directory. It's all self-contained. This setup has a few advantages:
- It's easy to know where everything is. You don't need to worry about backing up configuration files hidden amongst other system configurations in a shared /etc directory.
- All of your configuration can be tracked with git. If you add volumes/ to your .gitignore you'll have a small text-only repository with a change history to ensure you can never go too wrong in your experimentation.
- Backups and restores are very simple. The backup tool only really needs to keep track of volumes/, where all of your data lives. Restoring is a matter of cloning the infra/ repository and restoring the volumes/ directory from the backup tool.
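As a quick sketch of how that repository setup might look (the commands and directory names here are illustrative, not taken from my actual scripts):

```shell
# Illustrative first-time setup of the infra/ repository, with the
# large data directories excluded from version control.
mkdir -p infra/volumes infra/config infra/scripts infra/systemd
cd infra

git init --quiet

# Ignoring volumes/ keeps the repository small and text-only.
printf 'volumes/\n' > .gitignore
git add .gitignore

# Sanity check: git should report volumes/ as ignored.
git check-ignore volumes && echo "volumes/ is ignored"
```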
Accessing web services
Once your docker containers start spinning up, you'll next need to figure out how to securely access any web interfaces. You'll generally want to access these via some convenient domain; something like photoprism.yourdomain.com. Doing this will require a publicly accessible web server that can reverse proxy requests back to your containers.
Nginx is a popular web server used for this task; however, I found its configuration and maintenance requirements to be overly complicated for the needs of a home server. Instead I've found success with Caddy. It's open source, has a much simpler configuration format, and has all of the bells and whistles necessary to securely host home services.
Additionally, I recommend locking down your services even further by using a virtual networking service like Tailscale or ZeroTier. You can combine the private networks created by either of these services with rules in Caddy to ensure that only devices you trust (and that exist within your private network) can access services like your photo library or file server. This is helpful for reducing the attack surface available for potential hackers to exploit. If you go down this path you can still use custom domains; just make sure they point to the IP address within your private network rather than your server's publicly facing IP.
Backups
As mentioned above, a major convenience of this setup is the ease of creating backup images and restoring from them. A full restoration consists of re-cloning the infra/ repository, restoring the contents of the infra/volumes directory, and running docker-compose up to bring your containers back up.
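As a sketch, that full restoration could look like the following on a fresh machine (the repository URL and archive name are placeholders, and this assumes the borg container configured later on this page):

```shell
# Clone the infra repository (hypothetical URL).
git clone git@example.com:you/infra.git
cd infra

# Restore volumes/ from a borg snapshot. The archive name here is a
# placeholder for whichever snapshot you want to restore.
docker-compose run --rm borg borg extract ::my-archive-name

# Bring all services back up in the background.
docker-compose up -d
```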
I use borg, a simple command-line backup utility, and BorgBase as a host for storing my backup images. I've landed on this solution for a few reasons:
- Borg has a relatively simple command-line interface that is easy to understand and build scripts around.
- It deduplicates data to ensure that the backup repository doesn't balloon in size over time.
- Borg also allows for backups to be encrypted on-device before being uploaded to a remote host. This means that my backup images are safe even if I don't trust the hosting service I'm uploading them to.
- BorgBase was the cheapest option when I was browsing alternatives but, most importantly, has a feature to send an alert if a new backup hasn't been received in over N days.
I create backups daily and keep a few weeks of daily snapshots, a few months of monthly snapshots, and a yearly snapshot (see borg prune for details on how to set this up).
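For illustration, a retention policy along those lines might look like this (the exact counts are examples to tune; this assumes borg is on your PATH or is run through the container described below):

```shell
# Keep a few weeks of dailies, a few months of monthlies, and one
# yearly snapshot; older archives become eligible for deletion.
borg prune \
    --keep-daily 14 \
    --keep-monthly 6 \
    --keep-yearly 1 \
    --stats
```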
TODO: Switch to borgmatic for configuring backups. As of now I'm backing up Postgres databases via their data directory. The official Postgres docs explain why this is a bad idea, and pg_dump should be preferred.
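In the meantime, a dump-based approach might look something like this (the container, user, and database names are placeholders for whatever is in your docker-compose.yaml):

```shell
# Dump the database to a plain SQL file under volumes/ so it gets
# picked up by the next borg snapshot. All names here are hypothetical.
mkdir -p ./volumes/postgres-dumps
docker exec postgres-container pg_dump -U appuser -d appdb \
    > ./volumes/postgres-dumps/appdb.sql
```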
Nitty Gritty Details
Up until this point, I've tried to avoid turning this page into a mess of configuration files and technical details. In general, I'd prefer to provide a high-level overview of how things work here and leave the details up to the reader. Each of these projects' documentation will always be better written and more up-to-date than what I can provide here.
That being said, there are a few small but important tricks I've landed on within some of my configurations that I think are worth sharing here.
Configuring Docker
A typical service in my docker-compose.yaml looks like this:
plex:
  image: plexinc/pms-docker
  restart: always
  container_name: plex
  env_file: ./env/plex.env
  environment:
    - TZ=America/New_York
    - PLEX_UID=1010
    - PLEX_GID=1010
  volumes:
    - ./volumes/plex/config:/config
    - ./volumes/plex/transcode:/transcode
    - ./volumes/plex/Music:/data/music
  ports:
    ...
A major note is that, when possible, I try to specify a UID and GID parameter to the process in order to ensure that the service is running as a user/group that my Linux user has permission to access. If you don't set these parameters explicitly you can end up in situations where all of the files in your ./volumes directory are inaccessible to you (a pain if you want to modify something manually) or inaccessible to the container's process (a pain if you want to upload files into a container's volume).
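If you're not sure which IDs to use, your Linux user's values can be read with the standard id utility:

```shell
# Print the current user's numeric user and group IDs; these are the
# values to pass as PLEX_UID / PLEX_GID (or your image's equivalent).
id -u
id -g
```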
Adding restart: always will also save you time if your application's process crashes or if your server goes down. See the docs for the details on the various options available here, but you'll generally want something like this to prevent downtime.2
I also prefer splitting out most environment variables into separate .env files; however, you can also just use the environment key if this doesn't matter much to you.
Configuring Caddy
Caddy's configuration file is generally pretty straightforward (see docs here). If you decide to make your services available to the internet at large, make sure you set up your services with HTTPS. If your services are only available via a private network (i.e. Tailscale or ZeroTier) you may use HTTP, as you won't be able to procure a Let's Encrypt certificate unless your server is publicly reachable. This should still be secure, as all traffic is encrypted as it passes through Tailscale or ZeroTier.
Here are examples of a couple of Caddyfile server definitions with some useful clauses:
# This will only be accessible to devices within your private network
# thanks to the @remoteUsers matcher definition. Note the http:// prefix
# which tells Caddy not to use SSL.
http://internalservice.mydomain.com {
  @remoteUsers {
    not remote_ip 10.0.1.0/24
  }
  route @remoteUsers {
    respond "Unauthorized" 401
  }
  reverse_proxy containername:3000
}

# This service is publicly accessible. Since we've omitted the http://
# from the definition, Caddy will automatically provision a Let's Encrypt
# cert and ensure that all requests are served using HTTPS.
externalservice.mydomain.com {
  reverse_proxy container2name:3000
}
Configuring borg
The last, and arguably the most important, item on our list is configuring borg. I won't go too in-depth on this, as borg's documentation is comprehensive and very readable. However, I would like to cover a couple of the unique ways I use borg in my setup.
For starters, I've chosen to use borg within docker rather than on my host machine. This is mainly to be able to keep versions consistent if I decide to change my home server's distro. The docker-compose.yaml config looks something like this:
borg:
  image: dannyben/borg-client
  volumes:
    - './volumes:/volumes'
    - './backups:/repo'
    - './config/borg/id_rsa:/etc/id_rsa'
    - './config/borg/known_hosts:/etc/known_hosts'
  environment:
    BORG_REPO: '[repo url/path]'
    BORG_PASSPHRASE: '[encryption passphrase]'
    BORG_RSH: 'ssh -i /etc/id_rsa -o UserKnownHostsFile=/etc/known_hosts'
Notice that we're only mounting and backing up the infra/volumes/ directory. It is assumed that the rest of the infra/ directory is backed up via git.
With the container definition set up, I then have a script in scripts/backup that handles creating a snapshot. It's a bit long so you can view it in its entirety here. The script handles all of the basic tasks needed to create, maintain, and restore from snapshots. My goal with writing this wrapper was to make it as foolproof as possible to back up and restore, as my brain likes to forget things that I'm not using consistently over time. You don't want to be stuck in a situation where you're without all of your files and frantically trying to read through borg documentation to figure out how to get them back.
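For reference, a heavily simplified version of what such a wrapper can look like (the real script does more; the retention counts, archive naming, and the assumption that the container image accepts arbitrary borg subcommands are all mine):

```shell
#!/bin/sh
# Simplified sketch of a scripts/backup wrapper -- illustrative only.
set -eu

# Run borg inside the container defined in docker-compose.yaml.
run_borg() {
    docker-compose run --rm borg borg "$@"
}

case "${1:-help}" in
    create)
        run_borg create --stats "::snapshot-$(date +%Y-%m-%d)" /volumes
        ;;
    prune)
        run_borg prune --keep-daily 14 --keep-monthly 6 --keep-yearly 1
        ;;
    list)
        run_borg list
        ;;
    restore)
        run_borg extract "::${2:?usage: backup restore <archive-name>}"
        ;;
    *)
        echo "usage: backup {create|prune|list|restore <archive-name>}"
        ;;
esac
```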
I'd also recommend setting up a systemd timer to automatically take backups on a regular basis. I have two systemd unit files that I use:
backup.service
[Unit]
Description=Create borg snapshot
[Service]
ExecStart=/path/to/infra/scripts/backup create
WorkingDirectory=/path/to/infra
[Install]
WantedBy=default.target
backup.timer
[Unit]
Description=Periodically backs up infra
Requires=backup.service
[Timer]
Unit=backup.service
OnCalendar=*-*-* 02:00:00
[Install]
WantedBy=default.target
You can find documentation about how to install these systemd files on the Arch wiki (which applies to any systemd-based distro).
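As a rough sketch (assuming a system-wide install and that the unit files live in infra/systemd/), installation boils down to:

```shell
# Copy the unit files into systemd's search path and start the timer.
sudo cp systemd/backup.service systemd/backup.timer /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now backup.timer

# Confirm the timer is scheduled.
systemctl list-timers backup.timer
```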
- If you do end up using PikaPods, please reach out and let me know how the experience is. I'd love to know if they're a service I can recommend more broadly or if it ends up being complicated or buggy.↩
- Thanks to Timo Tijhof for the feedback to add this note in.↩