My approach to a reliable self-hosting setup using Docker.
Introduction
This guide is meant for the more technically adventurous amongst us. That is, those who are willing to get their hands dirty in some basic devops work in order to self-host applications on servers they own or control.
If the idea of self-hosting is intimidating to you, fear not! There are many paid hosting services that can abstract much of this complexity away from you. While I haven't tried it myself, PikaPods looks like an interesting service that can automate the deployment of self-hosted applications.1 The service is from the creator of a backup service I've used for a while now and been happy with: BorgBase.
For those of you who are still interested in the DIY method, read on!
All of the self-hosted services I use are hosted on a single server using a docker-compose file. After years of playing around with various ways of hosting applications, I've settled on this one as a reasonably simple, stable, secure, and repeatable process for keeping the services I rely on up and running. Your preferences and goals may vary from mine and I would encourage you to see this guide as a starting point. I'd also love feedback and suggestions. If you find ways to improve this framework, please reach out.
Lastly, there are a few places on this page where I hand-wave over the details and recommend reading the documentation. I realize that isn't always the most helpful, so if you have questions or want elaboration on anything, please don't hesitate to send me an email!
Overview
infra/
The general framework for my docker setup is a single folder called infra/ that's laid out like this:
infra/
  volumes/
    photoprism/
    plex/
    freshrss/
    ...
  config/
    caddy/
    freshrss/
  scripts/
    update
    backup
  systemd/
    backup.service
    backup.timer
  docker-compose.yaml
Brief descriptions of what each directory/file does:
volumes/ is for storing the bigger volumes of data associated with each service. Some examples:
- A directory like photoprism/ would have your entire photo library within it.
- plex/ has all of your media libraries.
- freshrss/ contains the database used by FreshRSS to manage your RSS feeds.

config/ stores all of the text-based configuration files for each service you run. These configuration files should be small enough that you're comfortable managing them in git.
- Each config file (or directory) will be teleported into each service's docker container via a volume mount later on.

scripts/ stores shell scripts that make basic maintenance tasks like backing up and updating container images quick and easy.

systemd/ contains any systemd unit files that may be necessary for running scheduled tasks (e.g. daily backups).

docker-compose.yaml defines all of the services/applications and configures the various environment variables, volume mounts, etc. Everything you'd expect docker to handle. More on this below.
The idea of this framework is that everything necessary to run your self-hosted services lives within this infra/ directory. It's all self-contained. This setup has a few advantages:
- It's easy to know where everything is. You don't need to worry about backing up configuration files hidden amongst other system configurations in a shared /etc directory.
- All of your configuration can be tracked with git. If you add volumes/ to your .gitignore you'll have a small text-only repository with a change history to ensure you can never go too wrong in your experimentation.
- Backups and restores are very simple. The backup tool only really needs to keep track of volumes/, where all of your data lives. Restoring is a matter of cloning the infra/ repository and restoring the volumes/ directory from the backup tool.
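As a quick sketch of how that repository setup might look (the commands and directory names here are illustrative, not taken from my actual scripts):

```shell
# Illustrative first-time setup of the infra/ repository, with the
# large data directories excluded from version control.
mkdir -p infra/volumes infra/config infra/scripts infra/systemd
cd infra

git init --quiet

# Ignoring volumes/ keeps the repository small and text-only.
printf 'volumes/\n' > .gitignore
git add .gitignore

# Sanity check: git should report volumes/ as ignored.
git check-ignore volumes && echo "volumes/ is ignored"
```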
Accessing web services
Once your docker containers start spinning up, you'll next need to figure out how to securely access any web interfaces. You'll generally want to access these via some convenient domain; something like photoprism.yourdomain.com. Doing this will require a publicly accessible web server that can reverse proxy requests back to your containers.
Nginx is a popular web server used for this task; however, I found its configuration and maintenance requirements to be overly complicated for the needs of a home server. Instead I've found success with Caddy. It's open source, has a much simpler configuration format, and has all of the bells and whistles necessary to securely host home services.
Additionally, I recommend locking down your services even further by using a virtual networking service like Tailscale or ZeroTier. You can combine the private networks created by either of these services with rules in Caddy to ensure that only devices you trust (and that exist within your private network) can access services like your photo library or file server. This is helpful for reducing the attack surface available for potential hackers to exploit. If you go down this path you can still use custom domains; just make sure they point to the IP address within your private network rather than your server's publicly facing IP.
Backups
As mentioned above, a major convenience of this setup is the ease of creating backup images and restoring from them. A full restoration consists of re-cloning the infra/ repository, restoring the contents of the infra/volumes directory, and running docker-compose up to bring your containers back up.
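As a sketch, that full restoration could look like the following on a fresh machine (the repository URL and archive name are placeholders, and this assumes the borg container configured later on this page):

```shell
# Clone the infra repository (hypothetical URL).
git clone git@example.com:you/infra.git
cd infra

# Restore volumes/ from a borg snapshot. The archive name here is a
# placeholder for whichever snapshot you want to restore.
docker-compose run --rm borg borg extract ::my-archive-name

# Bring all services back up in the background.
docker-compose up -d
```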
I use borg, a simple command-line backup utility, and BorgBase as a host for storing my backup images. I've landed on this solution for a few reasons:
- Borg has a relatively simple command-line interface that is easy to understand and build scripts around.
- It deduplicates data to ensure that the backup repository doesn't balloon in size over time.
- Borg also allows for backups to be encrypted on-device before being uploaded to a remote host. This means that my backup images are safe even if I don't trust the hosting service I'm uploading them to.
- BorgBase was the cheapest option when I was browsing alternatives but, most importantly, has a feature to send an alert if a new backup hasn't been received in over N days.
I create backups daily and keep a few weeks of daily snapshots, a few months of monthly snapshots, and a yearly snapshot (see borg prune for details on how to set this up).
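For illustration, a retention policy along those lines might look like this (the exact counts are examples to tune; this assumes borg is on your PATH or is run through the container described below):

```shell
# Keep a few weeks of dailies, a few months of monthlies, and one
# yearly snapshot; older archives become eligible for deletion.
borg prune \
    --keep-daily 14 \
    --keep-monthly 6 \
    --keep-yearly 1 \
    --stats
```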
TODO: Switch to borgmatic for configuring backups. As of now I'm backing up Postgres databases via their data directory. The official Postgres docs explain why this is a bad idea, and pg_dump should be preferred.
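In the meantime, a dump-based approach might look something like this (the container, user, and database names are placeholders for whatever is in your docker-compose.yaml):

```shell
# Dump the database to a plain SQL file under volumes/ so it gets
# picked up by the next borg snapshot. All names here are hypothetical.
mkdir -p ./volumes/postgres-dumps
docker exec postgres-container pg_dump -U appuser -d appdb \
    > ./volumes/postgres-dumps/appdb.sql
```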
Nitty Gritty Details
Up until this point, I've tried to avoid turning this page into a mess of configuration files and technical details. In general, I'd prefer to provide a high-level overview of how things work here and leave the details up to the reader. Each of these projects' documentation will always be better written and more up-to-date than what I can provide here.
That being said, there are a few small but important tricks I've landed on within some of my configurations that I think are worth sharing here.
Configuring Docker
A typical service in my docker-compose.yaml looks like this:
plex:
  image: plexinc/pms-docker
  restart: always
  container_name: plex
  env_file: ./env/plex.env
  environment:
    - TZ=America/New_York
    - PLEX_UID=1010
    - PLEX_GID=1010
  volumes:
    - ./volumes/plex/config:/config
    - ./volumes/plex/transcode:/transcode
    - ./volumes/plex/Music:/data/music
  ports:
    ...
A major note is that, when possible, I try to specify a UID and GID parameter to the process in order to ensure that the service is running as a user/group that my Linux user has permission to access. If you don't set these parameters explicitly you can end up in situations where all of the files in your ./volumes directory are inaccessible to you (a pain if you want to modify something manually) or inaccessible to the container's process (a pain if you want to upload files into a container's volume).
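If you're not sure which IDs to use, your Linux user's values can be read with the standard id utility:

```shell
# Print the current user's numeric user and group IDs; these are the
# values to pass as PLEX_UID / PLEX_GID (or your image's equivalent).
id -u
id -g
```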
Adding restart: always will also save you time if your application's process crashes or if your server goes down. See the docs for the details on the various options available here, but you'll generally want something like this to prevent downtime.2
I also prefer splitting out most environment variables into separate .env files; however, you can also just use the environment key if this doesn't matter much to you.
Configuring Caddy
Caddy's configuration file is generally pretty straightforward (see docs here). If you decide to make your services available to the internet at large, make sure you set up your services with HTTPS. If your services are only available via a private network (i.e. Tailscale or ZeroTier) you may use HTTP, as you won't be able to procure a Let's Encrypt certificate unless your server is publicly reachable. This should still be secure, as all traffic is encrypted as it passes through Tailscale or ZeroTier.
Here are examples of a couple of Caddyfile server definitions with some useful clauses:
# This will only be accessible to devices within your private network
# thanks to the @remoteUsers matcher definition. Note the http:// prefix
# which tells Caddy not to use SSL.
http://internalservice.mydomain.com {
  @remoteUsers {
    not remote_ip 10.0.1.0/24
  }
  route @remoteUsers {
    respond "Unauthorized" 401
  }
  reverse_proxy containername:3000
}

# This service is publicly accessible. Since we've omitted the http://
# from the definition, Caddy will automatically provision a Let's Encrypt
# cert and ensure that all requests are served using HTTPS.
externalservice.mydomain.com {
  reverse_proxy container2name:3000
}
Configuring borg
The last, and arguably the most important, item on our list is configuring borg. I won't go too in-depth on this, as borg's documentation is comprehensive and very readable. However, I would like to cover a couple of the unique ways I use borg in my setup.
For starters, I've chosen to use borg within docker rather than on my host machine. This is mainly to be able to keep versions consistent if I decide to change my home server's distro. The docker-compose.yaml config looks something like this:
borg:
  image: dannyben/borg-client
  volumes:
    - './volumes:/volumes'
    - './backups:/repo'
    - './config/borg/id_rsa:/etc/id_rsa'
    - './config/borg/known_hosts:/etc/known_hosts'
  environment:
    BORG_REPO: '[repo url/path]'
    BORG_PASSPHRASE: '[encryption passphrase]'
    BORG_RSH: 'ssh -i /etc/id_rsa -o UserKnownHostsFile=/etc/known_hosts'
Notice that we're only mounting and backing up the infra/volumes/ directory. It is assumed that the rest of the infra/ directory is backed up via git.
With the container definition set up, I then have a script in scripts/backup that handles creating a snapshot. It's a bit long so you can view it in its entirety here. The script handles all of the basic tasks needed to create, maintain, and restore from snapshots. My goal with writing this wrapper was to make it as foolproof as possible to back up and restore, as my brain likes to forget things that I'm not using consistently over time. You don't want to be stuck in a situation where you're without all of your files and frantically trying to read through borg documentation to figure out how to get them back.
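For reference, a heavily simplified version of what such a wrapper can look like (the real script does more; the retention counts, archive naming, and the assumption that the container image accepts arbitrary borg subcommands are all mine):

```shell
#!/bin/sh
# Simplified sketch of a scripts/backup wrapper -- illustrative only.
set -eu

# Run borg inside the container defined in docker-compose.yaml.
run_borg() {
    docker-compose run --rm borg borg "$@"
}

case "${1:-help}" in
    create)
        run_borg create --stats "::snapshot-$(date +%Y-%m-%d)" /volumes
        ;;
    prune)
        run_borg prune --keep-daily 14 --keep-monthly 6 --keep-yearly 1
        ;;
    list)
        run_borg list
        ;;
    restore)
        run_borg extract "::${2:?usage: backup restore <archive-name>}"
        ;;
    *)
        echo "usage: backup {create|prune|list|restore <archive-name>}"
        ;;
esac
```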
I'd also recommend setting up a systemd timer to automatically take backups on a regular basis. I have two systemd unit files that I use:
backup.service
[Unit]
Description=Create borg snapshot
[Service]
ExecStart=/path/to/infra/scripts/backup create
WorkingDirectory=/path/to/infra
[Install]
WantedBy=default.target
backup.timer
[Unit]
Description=Periodically backs up infra
Requires=backup.service
[Timer]
Unit=backup.service
OnCalendar=*-*-* 02:00:00
[Install]
WantedBy=default.target
You can find documentation about how to install these systemd files on the Arch wiki (which applies to any systemd-based distro).
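As a rough sketch (assuming a system-wide install and that the unit files live in infra/systemd/), installation boils down to:

```shell
# Copy the unit files into systemd's search path and start the timer.
sudo cp systemd/backup.service systemd/backup.timer /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now backup.timer

# Confirm the timer is scheduled.
systemctl list-timers backup.timer
```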
- If you do end up using PikaPods, please reach out and let me know how the experience is. I'd love to know if they're a service I can recommend more broadly or if it ends up being complicated or buggy.↩
- Thanks to Timo Tijhof for the feedback to add this note in.↩