So the mail server has been up and running for months and I haven’t touched it since the last mailcow update came out.
Today I got a monitoring message saying the server is down. I checked the bare metal and it’s online and functioning as expected; however, Docker isn’t running.
If I try running the mailcow update, it outputs the following:
./update.sh
Detecting if your IP is listed on Spamhaus Bad ASN List...
Check completed! Your IP is clean
Checking internet connection... OK
Detecting which build your mailcow runs on...
You are receiving stable updates (master).
To change that run the update.sh Script one time with the --nightly parameter to switch to nightly builds.
Checking for newer update script...
Updated 0 paths from b177975b
Are you sure you want to update mailcow: dockerized? All containers will be stopped. [y/N] y
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Please upgrade Docker to version 20.10.2 or above.
Validating docker-compose stack configuration...
Checking for conflicting bridges...
Saving diff to update_diffs/diff_before_update_2024-06-08-17-17-37...
Prefetching images...
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Error pulling mailcow/unbound:1.21, retrying...
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Error pulling mailcow/unbound:1.21, retrying...
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Error pulling mailcow/unbound:1.21, retrying...
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Error pulling mailcow/unbound:1.21, retrying...
Too many failed retries, exiting
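The "Please upgrade Docker" line above may be a side effect of the daemon being unreachable rather than a real version problem. As a rough sketch of how that could happen (an assumption, not checked against update.sh itself): if a script compares the version reported by the daemon against a minimum and the daemon does not answer, the reported value is empty and the comparison fails. The helper name `version_ge` is hypothetical, and `sort -V` assumes GNU coreutils.

```shell
# Hypothetical helper: succeeds when version $1 is >= version $2.
# An empty $1 (e.g. the daemon did not answer) always fails.
version_ge() {
  [ -n "$1" ] || return 1
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
}

minimum=20.10.2
for reported in 26.1.4 19.03.15 ""; do
  if version_ge "$reported" "$minimum"; then
    echo "'$reported' passes the check"
  else
    echo "'$reported' fails the check"
  fi
done
```

Under that assumption, an installed 26.1.4 would still be reported as too old whenever the daemon is down.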
The interesting bit is in bold: the update script asks for Docker 20.10.2 or above, but 26.1.4 is already installed.
```
# docker version
Client: Docker Engine - Community
Version: **26.1.4**
API version: 1.45
Go version: go1.21.11
Git commit: 5650f9b
Built: Wed Jun 5 11:28:57 2024
OS/Arch: linux/amd64
Context: default
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

Client: Docker Engine - Community
Version: 26.1.4
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.14.1
Path: /usr/libexec/docker/cli-plugins/docker-buildx
compose: Docker Compose (Docker Inc.)
Version: v2.27.1
Path: /usr/libexec/docker/cli-plugins/docker-compose
scan: Docker Scan (Docker Inc.)
Version: v0.23.0
Path: /usr/libexec/docker/cli-plugins/docker-scan
Server:
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
```
So I thought I'd run dockerd directly in the foreground rather than through systemd, and that fails too:
```
# dockerd
INFO[2024-06-08T17:19:42.816947033Z] Starting up
WARN[2024-06-08T17:19:42.817002859Z] Running experimental build
INFO[2024-06-08T17:19:42.817982678Z] detected 127.0.0.53 nameserver, assuming systemd-resolved, so using resolv.conf: /run/systemd/resolve/resolv.conf
INFO[2024-06-08T17:19:56.168336375Z] [graphdriver] using prior storage driver: overlay2
ERRO[2024-06-08T17:19:56.190270154Z] Failed to get event error="rpc error: code = Unavailable desc = error reading from server: EOF" module=libcontainerd namespace=plugins.moby
INFO[2024-06-08T17:19:56.190345659Z] Waiting for containerd to be ready to restart event processing module=libcontainerd namespace=plugins.moby
INFO[2024-06-08T17:19:56.209868913Z] Loading containers: start.
ERRO[2024-06-08T17:19:56.210374969Z] Failed to get event error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused\"" module=libcontainerd namespace=moby
INFO[2024-06-08T17:19:56.210431627Z] Waiting for containerd to be ready to restart event processing module=libcontainerd namespace=moby
ERRO[2024-06-08T17:19:56.240124266Z] failed to restore container with containerd container=d16720d852e02d1dc2a579ef726d96d245aeabc5f93ed2348ba35d43a5fdc412 error="connection error: desc = \"transport: Error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused\": unavailable" paused=false restarting=false running=false
ERRO[2024-06-08T17:19:56.240960855Z] failed to restore container with containerd container=bb42af5f284874c763c7cbb11307dc2eabed0e246f132c3a27830776425f42c1 error="connection error: desc = \"transport: Error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused\": unavailable" paused=false restarting=false running=false
ERRO[2024-06-08T17:19:56.241308259Z] failed to restore container with containerd container=13be226b1e5d6cb4d7495f8c7cc697908b92800c328a2fc72a2cdbe6aec30a67 error="connection error: desc = \"transport: Error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused\": unavailable" paused=false restarting=false running=false
ERRO[2024-06-08T17:19:56.241324807Z] failed to restore container with containerd container=e153b1b955dedd9fdc6a865027653f10c46ae30003e49d642b4272232ac85a73 error="connection error: desc = \"transport: Error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused\": unavailable" paused=false restarting=false running=false
ERRO[2024-06-08T17:19:56.241418322Z] failed to restore container with containerd container=69860de32902b413f22a1f706ddb40303a4ea261727852e0105f21377b55fbb1 error="connection error: desc = \"transport: Error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused\": unavailable" paused=false restarting=false running=false
ERRO[2024-06-08T17:19:56.241429977Z] failed to restore container with containerd container=8032412066c61ceb1fd003c7e47bdc136beb78969dd97af353eba9e264530ece error="connection error: desc = \"transport: Error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused\": unavailable" paused=false restarting=false running=false
ERRO[2024-06-08T17:19:56.241469030Z] failed to restore container with containerd container=7a7d235a13c24b42c86e360a190c0c90fe9b851246df5232e402dd3b32842f1f error="connection error: desc = \"transport: Error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused\": unavailable" paused=false restarting=false running=false
ERRO[2024-06-08T17:19:56.241529966Z] failed to restore container with containerd container=957e9facf3926c0e6ea736708fd2533fc4cfbfed5fd135f5cca55dc79da06d4f error="connection error: desc = \"transport: Error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused\": unavailable" paused=false restarting=false running=false
ERRO[2024-06-08T17:19:56.241533329Z] failed to restore container with containerd container=1048f05f9d4644395d430d51f750935ae9a9d78f9a6df29d83723b4d6bc2aef6 error="connection error: desc = \"transport: Error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused\": unavailable" paused=false restarting=false running=false
ERRO[2024-06-08T17:19:56.241610710Z] failed to restore container with containerd container=15854ae0c6b08541747f707d74a8887e58d5aea378585765f392eb95ed0bd4c9 error="connection error: desc = \"transport: Error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused\": unavailable" paused=false restarting=false running=false
ERRO[2024-06-08T17:19:56.241686796Z] failed to restore container with containerd container=5877eca56dc3caafe39463711e38cdb7f18e87ceab997b05e9ebefba7777bd41 error="connection error: desc = \"transport: Error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused\": unavailable" paused=false restarting=false running=false
ERRO[2024-06-08T17:19:56.243000210Z] failed to restore container with containerd container=6453ab8900c078a9e8a0eb1475a699513d9765d1d4b37b107629e2c7920aa0fd error="connection error: desc = \"transport: Error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused\": unavailable" paused=false restarting=false running=false
ERRO[2024-06-08T17:19:56.243067926Z] failed to restore container with containerd container=a47be2e701676c3a5a4b4514f20198d9210825b1ae48f9e58167f0e65af147b2 error="connection error: desc = \"transport: Error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused\": unavailable" paused=false restarting=false running=false
ERRO[2024-06-08T17:19:56.243130264Z] failed to restore container with containerd container=da7250d57b743fb813ae4998478c86481127961dad9157f628e0909361902a6d error="connection error: desc = \"transport: Error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused\": unavailable" paused=false restarting=false running=false
ERRO[2024-06-08T17:19:56.243206062Z] failed to restore container with containerd container=b226099e4d7fda62b9ff27da8dc2fa8967e2d149d8cc5983d4001df47553c038 error="connection error: desc = \"transport: Error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused\": unavailable" paused=false restarting=false running=false
ERRO[2024-06-08T17:19:56.243320461Z] failed to restore container with containerd container=5ddddacfa9e084dcd60f17050533c40e8e83f83d1cc4def92e00a0f6ec0dea96 error="connection error: desc = \"transport: Error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused\": unavailable" paused=false restarting=false running=false
ERRO[2024-06-08T17:19:56.243367897Z] failed to restore container with containerd container=bdc3bb6991568cd17becd977f3d6e17fa9c027ae2988ca3d7301ee001d0943cb error="connection error: desc = \"transport: Error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused\": unavailable" paused=false restarting=false running=false
ERRO[2024-06-08T17:19:56.243411530Z] failed to restore container with containerd container=0619a616091f726ae6198d82ca0f54e2ee938d9e0809ff820c5b3207bca5c5ed error="connection error: desc = \"transport: Error while dialing: dial unix /run/containerd/containerd.sock: connect: connection refused\": unavailable" paused=false restarting=false running=false
panic: invalid freelist page: 20, page type is leaf
goroutine 1 [running, locked to thread]:
go.etcd.io/bbolt.(*freelist).read(0x0?, 0x7f63f053d000)
/root/build-deb/engine/vendor/go.etcd.io/bbolt/freelist.go:267 +0x20e
go.etcd.io/bbolt.(*DB).loadFreelist.func1()
/root/build-deb/engine/vendor/go.etcd.io/bbolt/db.go:420 +0xb7
sync.(*Once).doSlow(0x5633363443e5?, 0x56333696efe0?)
/usr/local/go/src/sync/once.go:74 +0xbf
sync.(*Once).Do(...)
/usr/local/go/src/sync/once.go:65
go.etcd.io/bbolt.(*DB).loadFreelist(0xc000e3c480?)
/root/build-deb/engine/vendor/go.etcd.io/bbolt/db.go:413 +0x45
go.etcd.io/bbolt.Open({0xc001054ed0, 0x29}, 0x3634b71e?, 0xc0013f1298)
/root/build-deb/engine/vendor/go.etcd.io/bbolt/db.go:295 +0x430
github.com/docker/docker/libnetwork/internal/kvstore/boltdb.New({0xc001054ed0, 0x29}, 0xc00105e870)
/root/build-deb/engine/libnetwork/internal/kvstore/boltdb/boltdb.go:50 +0x105
github.com/docker/docker/libnetwork/datastore.newClient({0x5633380f58f7?, 0x563336344065?}, {0xc001054ed0?, 0x30?}, 0x30?)
/root/build-deb/engine/libnetwork/datastore/datastore.go:131 +0x96
github.com/docker/docker/libnetwork/datastore.New({{{0x5633380f58f7, 0x6}, {0xc001054ed0, 0x29}, 0xc00105e870}})
/root/build-deb/engine/libnetwork/datastore/datastore.go:145 +0xe9
github.com/docker/docker/libnetwork.(*Controller).initStores(0xc0012322a0)
/root/build-deb/engine/libnetwork/store.go:18 +0x66
github.com/docker/docker/libnetwork.New({0xc000042700, 0x8, 0xe})
/root/build-deb/engine/libnetwork/controller.go:119 +0x238
github.com/docker/docker/daemon.(*Daemon).initNetworkController(0xc000af2500, 0x1000?, 0xc000c14900)
/root/build-deb/engine/daemon/daemon_unix.go:841 +0x4e
github.com/docker/docker/daemon.(*Daemon).restore(0xc000af2500, 0xc0001f3180)
/root/build-deb/engine/daemon/daemon.go:574 +0x67b
github.com/docker/docker/daemon.NewDaemon({0x563338d1e838?, 0xc0000f16d0}, 0xc001107b80, 0xc000c760c0, 0xc000ad61a0)
/root/build-deb/engine/daemon/daemon.go:1220 +0x349a
main.(*DaemonCli).start(0xc00099c940, 0xc00020dd00)
/root/build-deb/engine/cmd/dockerd/daemon.go:264 +0xfa5
main.runDaemon(...)
/root/build-deb/engine/cmd/dockerd/docker_unix.go:13
main.newDaemonCommand.func1(0xc000061300?, {0x56333a330f80?, 0x7?, 0x5633380f1432?})
/root/build-deb/engine/cmd/dockerd/docker.go:37 +0x94
github.com/spf13/cobra.(*Command).execute(0xc0008fe000, {0xc0000520b0, 0x0, 0x0})
/root/build-deb/engine/vendor/github.com/spf13/cobra/command.go:983 +0xabc
github.com/spf13/cobra.(*Command).ExecuteC(0xc0008fe000)
/root/build-deb/engine/vendor/github.com/spf13/cobra/command.go:1115 +0x3ff
github.com/spf13/cobra.(*Command).Execute(...)
/root/build-deb/engine/vendor/github.com/spf13/cobra/command.go:1039
main.main()
/root/build-deb/engine/cmd/dockerd/docker.go:106 +0x17b
```
The primary error event seems to be this one:
error="rpc error: code = Unavailable desc = error reading from server: EOF" module=libcontainerd namespace=plugins.moby
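To see which line actually ends the run, it can help to summarise the captured output: the ERRO lines are logged and startup carries on, whereas a `panic:` line is where the Go process dies. The sketch below assumes the dockerd output was saved to a file first; the function name `summarise_dockerd_log` and the file path are hypothetical.

```shell
# Hypothetical helper: summarise a saved dockerd log read from stdin.
# Counts lines per log level and prints the first "panic:" line, if any.
summarise_dockerd_log() {
  awk '
    /^INFO\[/ { info++ }
    /^WARN\[/ { warn++ }
    /^ERRO\[/ { erro++ }
    /^panic:/ && panic == "" { panic = $0 }
    END {
      printf "INFO=%d WARN=%d ERRO=%d\n", info, warn, erro
      if (panic != "") print panic
    }
  '
}

# Example usage (the path is an assumption; dockerd logs to stderr):
#   dockerd > /tmp/dockerd.log 2>&1
#   summarise_dockerd_log < /tmp/dockerd.log
```

If a panic line is printed, the stack trace that follows it is probably a better starting point than the earlier ERRO entries.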
But I’m confused, as nothing has changed since the last known good running state; there have been no updates. It’s like it just randomly stopped, and I’ll be damned if I can work out how to fix it. There are lots of “similar” issues on the old net, but I can’t find anything that actually matches this.
Any suggestions, clues, or things I can look at that might help me debug or resolve this? Right now all my mail is offline 🙁
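One clue that can be read straight out of the stack trace: Go tracebacks print a string argument as {pointer, length}, and the path handed to `bbolt.Open` has length 0x29. The sketch below only compares that length with a candidate path. The candidate assumes Docker's default data root and is a guess, so a match is consistent with that file being the one that fails to open, not proof of it.

```shell
# Length of the path argument shown in the trace: bbolt.Open({0xc001054ed0, 0x29}, ...)
trace_len=$((0x29))

# Assumption: default Docker data root; libnetwork's local store would then live here.
candidate=/var/lib/docker/network/files/local-kv.db

echo "path length in trace:  $trace_len"
echo "candidate path length: ${#candidate}"

if [ "${#candidate}" -eq "$trace_len" ]; then
  echo "lengths match: $candidate is worth a closer look"
else
  echo "lengths differ: the panic concerns some other file"
fi
```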