
Nomad Orchestrator

Tags: homelab, nomad


Since January I have been running Nomad to power most of my Homelab. It has been a great experience, and the tool has helped me move from a “hands on each machine” mindset to a more cloud-oriented IaC mindset. I am currently running 12 services in Nomad, 2 of which are the Traefik instances proxying those services on each of my two networks (internal and DMZ). My “datacenter” has 3 nodes: an “internal” VM, a “DMZ” VM, and a Raspberry Pi that also operates as the NAS storing all data in this cluster.

In ages long past, my Docker (Darker) times, I used the JWilder nginx-proxy and letsencrypt-nginx-proxy-companion. I then moved on to my own solution, Nginx-Certbot, to automatically handle the Let's Encrypt certificates required in my environments; you can read more about that change in another blog post.

With this new paradigm of Nomad I was no longer able to rely on the “known host” idea behind my own Nginx-Certbot solution, so I had to look at alternatives. Despite all the negative feelings I expressed about Traefik in my previous blog post, I revisited it in the context of Nomad and was pleasantly surprised by the IaC capabilities I had either missed or been unable to capitalize on in my past endeavors. Through Nomad's configuration I am able to provision Traefik 100% through code, without enabling any of its unnecessary extras like the admin panel and monitoring.

Below is an example of how I run Bitwarden on my DMZ node, behind my DMZ Traefik instance. Through node pools I am able to place services on the proper nodes, each of which sits on its respective VLAN. Currently I store these job specs in a git repo and update the jobs by hand when an update comes out, but I plan to hook CI/CD into the jobs repo so that deploying an update is just a matter of changing the HCL file for a job in the repo.
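The planned CI step could be as simple as validating and submitting the changed spec with the Nomad CLI. This is only a sketch of what such a pipeline step might look like; the repo path and API address are placeholders, not my actual setup:

```shell
# Hypothetical CI deploy step: point the CLI at the cluster,
# validate the job spec, and submit it if it parses cleanly.
export NOMAD_ADDR="http://10.10.10.10:4646"   # placeholder API address
nomad job validate jobs/bitwarden.hcl \
  && nomad job run jobs/bitwarden.hcl
```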

A nice part of Nomad is resource allocation: by default it allocates 100 MHz of CPU and 300 MB of RAM to every container. It pre-allocates these resources, and if a client is starved Nomad won't deploy any jobs to that client. Below you can see I reduce that default allocation for my Traefik instances but increase it for Bitwarden. A little song and dance I had to figure out with Immich specifically was that the DB would use ~64 MB while running, but when I updated the Immich image version the database migration would spike RAM usage over 1 GB. To alleviate that I started declaring “memory_max”, which does not impact how much RAM is required for pre-allocation but lets my DB perform a migration when required.
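The Immich database trick above boils down to a resources stanza like the following. This is an illustrative sketch, not my exact job file, and it assumes memory oversubscription is enabled in the cluster's scheduler configuration (it is off by default):

```hcl
# Sketch of a resources stanza for the Immich DB task (values illustrative).
# "memory" is what the scheduler pre-allocates and counts against the client;
# "memory_max" is extra headroom the task may burst into during migrations.
resources {
  memory     = 64    # steady-state usage
  memory_max = 1024  # burst ceiling for schema migrations
}
```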

For backups I'm running Backrest off a Pi 4; so far it seems to be a great tool with a nice web UI. The Pi is also being used as the NAS, sharing out SMB shares for basic files and NFS shares that are mounted on each Nomad node. As seen below, I store all data in /opt/nomad_data, which is an NFS mount point. This has been working fine for the most part, though heavy IO tasks can falter. I am not able to run Kavita off the NAS shares because of its requirement to use SQLite, which apparently cannot run over any sort of network share. Postgres and other proper databases work just fine over the shares, so I will keep running in this configuration for the foreseeable future.
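The NFS mount each Nomad client carries would look something like the fstab entry below. This is a sketch under assumed names: the NAS address and export path are placeholders, and the mount options are just sensible defaults, not a claim about my exact setup:

```
# Example /etc/fstab entry on each Nomad client (server and export are placeholders).
# Mounts the Pi's NFS export at the path the job specs bind-mount from.
10.10.10.20:/export/nomad_data  /opt/nomad_data  nfs  defaults,_netdev,noatime  0  0
```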

variable "bitwarden_tag" {
  type    = string
  default = "2026.2.1"
}

job "bitwarden" {
  node_pool = "dmz"

  constraint {
    attribute = "${attr.kernel.name}"
    operator  = "="
    value     = "linux"
  }

  constraint {
    attribute = "${attr.cpu.arch}"
    operator  = "="
    value     = "amd64"
  }

  group "containers" {
    task "bitwarden" {
      driver = "podman"

      config {
        image = "ghcr.io/bitwarden/lite:${var.bitwarden_tag}"
        volumes = [
          "/opt/nomad_data/bitwarden:/etc/bitwarden:z"
        ]
      }

      resources {
        memory = 1024
      }

      env = {
        BW_DOMAIN           = ""
        BW_DB_PROVIDER      = "sqlite"
        BW_INSTALLATION_ID  = ""
        BW_INSTALLATION_KEY = ""
      }

      service {
        name = "bitwarden"
        tags = [
          "pool:${node.pool}",
          "traefik.enable=true",
          "traefik.http.routers.bitwarden.rule=Host(``)",
          "traefik.http.routers.bitwarden.tls=true",
          "traefik.http.routers.bitwarden.tls.certresolver=le",
          "traefik.http.routers.bitwarden.entrypoints=web,websecure",
          "traefik.http.services.bitwarden.loadbalancer.server.port=8080"
        ]
      }
    }
  }
}
variable "traefik_tag" {
  type    = string
  default = "v3.6.10"
}

job "traefik-dmz" {
  node_pool = "dmz"

  constraint {
    attribute = "${attr.kernel.name}"
    operator  = "="
    value     = "linux"
  }

  constraint {
    attribute = "${attr.cpu.arch}"
    operator  = "="
    value     = "amd64"
  }

  group "containers" {
    network {
      port "http" { static = 80 }
      port "https" { static = 443 }
    }

    task "traefik" {
      driver = "podman"

      resources {
        memory = 128
      }

      config {
        image = "docker.io/traefik:${var.traefik_tag}"
        args = [
          "--providers.consulcatalog=true",
          "--providers.consulcatalog.constraints=Tag(`pool:dmz`)",
          "--providers.consulcatalog.endpoint.address=10.10.10.10:8500",
          "--entryPoints.web.address=:80",
          "--entryPoints.websecure.address=:443",
          "--entryPoints.web.transport.respondingTimeouts.readTimeout=600s",
          "--entryPoints.web.transport.respondingTimeouts.writeTimeout=600s",
          "--entryPoints.web.transport.respondingTimeouts.idleTimeout=600s",
          "--entryPoints.websecure.transport.respondingTimeouts.readTimeout=600s",
          "--entryPoints.websecure.transport.respondingTimeouts.writeTimeout=600s",
          "--entryPoints.websecure.transport.respondingTimeouts.idleTimeout=600s",
          "--certificatesresolvers.le.acme.email=",
          "--certificatesresolvers.le.acme.storage=/acme.json",
          "--certificatesresolvers.le.acme.httpchallenge.entrypoint=web"
        ]
        volumes = ["/opt/nomad_data/traefik-data/acme.json:/acme.json:z"]
        ports   = ["http", "https"]
      }
    }
  }
}