Good moo-ning everybody! 🐮
I feel like there are some issues with the backup and restore script of mailcow. As I cannot find how to start a discussion on GitHub, I’ll post it here ^^
d-c vs d c
After going from `docker-compose` to `docker compose`, some of the internal templates for names changed (a personal contender for the TOP 10 ideas of the year 😶🌫️). This change is only partially mirrored in the script.
The queries done with `docker volume ls` (see 1 and >= 12 other places) only scan for `docker compose` volumes (underscore: `<project>_<name>`) and not for `docker-compose` ones (dash: `<project>-<name>`). There should be a switch; a possible probe is sketched below. Running instances seem to be fine (i.e. mine is), but starting with a fresh clone will break things.
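For illustration, a little probe along those lines — purely a sketch, assuming `CMPS_PRJ` holds the compose project name and that `vmail-vol-1` exists in every installation, so it can serve as the canary:

```bash
# Hypothetical probe: detect which separator this project's volumes use,
# instead of hard-coding one of the two schemes.
if [[ -n "$(docker volume ls -qf "name=^${CMPS_PRJ}_vmail-vol-1$")" ]]; then
  CMPS_SPLIT="_"
else
  CMPS_SPLIT="-"
fi
```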
Side-quest: Documentation
Docs list volumes in the `docker-compose` way. Also, the statement “take good care of these volumes” is weird to me, because `clamd-db-vol-1`, `sogo-userdata-backup-vol-1`, `sogo-web-vol-1`, and `vmail-index-vol-1` are not included in the backup script at all (`mysql-socket-vol-1` also isn’t, but that makes sense).
Leaving the side-quest: I feel like the volumes `sogo-userdata-backup-vol-1` and `sogo-web-vol-1` should be included in the backup-and-restore, because they contain user config (`sogo-userdata-backup-vol-1`) and possible changes to SOGo (`sogo-web-vol-1`). For me, the clamd DB and the mail index files can be left out of the b-n-r script.
Network on backup
`mailcow-backup` joins the docker default network and not the compose network (exception: see 2). Depending on the config of the host, this might even be considered a security problem (I only realized it because I started monitoring the default net, since I don’t run anything on it). Except for 2, I think no network is needed at all, so `--network none` might be a good addition to the `docker run` commands (if they are kept as-is); a sketch follows below.
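To illustrate, a sketch of the vmail job with the network removed — the variables are the ones the current script already uses, and the tar invocation is only an approximation of the real one:

```bash
# Hypothetical: the pure volume-archiving jobs need no network at all.
docker run --name mailcow-backup --rm \
  --network none \
  -v "${BACKUP_LOCATION}/mailcow-${DATE}:/backup:z" \
  -v "$(docker volume ls -qf name=^${CMPS_PRJ}_vmail-vol-1$):/vmail:ro,z" \
  ${DEBIAN_DOCKER_IMAGE} tar --use-compress-program="pigz --rsyncable" \
  -Pcvpf /backup/backup_vmail.tar.gz /vmail
```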
Possible race condition on restore
I cannot fully confirm this, since I don’t know the insides of mailcow too well, but the restore process always just stops the single container it is currently working on. I feel like a race condition can occur when the first services are already up again while other services are still receiving their data. My primary concern is the DB host, which is restored last (when using `all`) while postfix and dovecot are already up again at that point.
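A rough sketch of the ordering I’d expect instead — the container names are assumptions, and the restore jobs themselves are elided:

```bash
# Hypothetical: stop every affected container before restoring anything.
STOP_FIRST=(postfix-mailcow dovecot-mailcow mysql-mailcow)  # names assumed
for name in "${STOP_FIRST[@]}"; do
  docker stop "$(docker ps -qf "name=${name}")"
done
# ... run all selected restore jobs here ...
docker compose up -d  # bring the stack back up only after the last job finished (run from the mailcow dir)
```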
Code quality and optimizations
There are some issues, including word-splitting issues with variables that are used as directory names, comparing lowercased strings against uppercase characters, a one-item loop, indentation issues, and error output not going to `stderr`; see the examples right below.
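A few hypothetical one-liners for the kinds of fixes I mean (variable names made up):

```bash
echo "Invalid option: ${1}" >&2                     # errors belong on stderr, not stdout
rm -rf "${BACKUP_LOCATION}/mailcow-${DATE}"         # quote paths to avoid word splitting
if [[ "${confirm,,}" == "y" ]]; then echo ok; fi    # compare a lowercased string to lowercase
```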
Also, I think it would be an option to move much of the process into the backup container with a little helper script. That could help with the race condition mentioned above, and also make it easier to streamline the process. I imagine a helper script that, for the backup part, takes the same args as the current script, but optimizes the backup, like:
```bash
CMPS_SPLIT="_"

function backup_docker() {
  # Run the backup container inside the compose network and hand every
  # requested job to a helper script shipped in the image.
  docker run --name mailcow-backup --rm \
    --network "$(docker network ls -qf name=^${CMPS_PRJ}${CMPS_SPLIT}mailcow-network$)" \
    -v "${BACKUP_LOCATION}/mailcow-${DATE}:/backup:z" \
    -v "$(docker volume ls -qf name=^${CMPS_PRJ}${CMPS_SPLIT}vmail-vol-1$):/vmail:ro,z" \
    -v "$(docker volume ls -qf name=^${CMPS_PRJ}${CMPS_SPLIT}crypt-vol-1$):/crypt:ro,z" \
    -v "$(docker volume ls -qf name=^${CMPS_PRJ}${CMPS_SPLIT}redis-vol-1$):/redis:ro,z" \
    -v "$(docker volume ls -qf name=^${CMPS_PRJ}${CMPS_SPLIT}rspamd-vol-1$):/rspamd:ro,z" \
    -v "$(docker volume ls -qf name=^${CMPS_PRJ}${CMPS_SPLIT}postfix-vol-1$):/postfix:ro,z" \
    ${DEBIAN_DOCKER_IMAGE} /bin/run-backup.sh "$@"
}

function backup() {
  RUN_BACKUPS=()
  while (( "$#" )); do
    # ';;&' keeps testing the following patterns, so 'all' collects every job
    case "$1" in
      vmail|all)
        RUN_BACKUPS+=( "vmail" )
        ;;&
      crypt|all)
        RUN_BACKUPS+=( "crypt" )
        ;;&
      redis|all)
        RUN_BACKUPS+=( "redis" )
        ;;&
      rspamd|all)
        RUN_BACKUPS+=( "rspamd" )
        ;;&
      postfix|all)
        RUN_BACKUPS+=( "postfix" )
        ;;&
    esac
    shift
  done
  if [[ "${#RUN_BACKUPS[@]}" -gt 0 ]]; then
    backup_docker "${RUN_BACKUPS[@]}"
  fi
}
```
As mentioned, the restore process always works on just one job at a time. Since `pigz` is not really good at multithreaded decompression, multiple files could be decompressed in parallel to reduce the runtime; see the sketch below.
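A minimal sketch of that idea — `RESTORE_LOCATION` and `RESTORE_TARGET` are assumptions, the point is only the `&` plus `wait`:

```bash
# Hypothetical: one extraction process per archive, since a single pigz
# process cannot parallelize decompression.
for f in "${RESTORE_LOCATION}"/backup_{vmail,crypt,redis,rspamd,postfix}.tar.gz; do
  [[ -f "$f" ]] || continue
  tar --use-compress-program="pigz -d" -xpf "$f" -C "${RESTORE_TARGET}" &
done
wait  # continue only after every extraction has finished
```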
MySQL Backup
The DB backup part scans for `mysql` and `mariadb` in the compose file, but uses `mariabackup` for every operation on the found container. This tool is not available in MySQL images (I checked `mysql:8`); I assume `mysqldump` would be a good option to go with there, as sketched below.
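A hedged sketch of such a fallback — the container name is an assumption, and `DBROOT` is taken from `mailcow.conf` like in the existing script:

```bash
# Hypothetical: plain SQL dump for images that don't ship mariabackup.
docker exec "$(docker ps -qf "name=mysql-mailcow")" \
  mysqldump --all-databases --single-transaction -uroot -p"${DBROOT}" \
  > "${BACKUP_LOCATION}/mailcow-${DATE}/backup_mysql.sql"
```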
Also, I see that a `--prepare` is done after each backup (see 4). Does this make sense AFTER a backup, but not BEFORE a restore? The MariaDB docs are vague on that (“If you try to restore the database without first preparing the data, InnoDB rejects the new data as corrupt.”), but I feel that the backup is prepared using the available DB as reference. I could totally be wrong here (usually I am 🤷♂️), since I have not used MariaDB in over 8 years now… Also, this preparation, the owner change, and the compression are done even if the backup process fails.
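Independent of where `--prepare` belongs, the follow-up steps could at least be gated on the backup’s exit status; a minimal sketch with assumed paths and uid/gid:

```bash
# Hypothetical: skip prepare, chown, and compression when the backup failed.
if mariabackup --backup --target-dir=/backup --user=root --password="${DBROOT}"; then
  mariabackup --prepare --target-dir=/backup
  chown -R 999:999 /backup  # uid/gid assumed
else
  echo "mariabackup failed, skipping prepare and compression" >&2
  exit 1
fi
```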
The delete-days logic’s `find` can also be optimized to avoid any manual date calculation by using `-mtime`, and it can use the `-delete` option to reduce the risk of word splitting, possibly resulting in `find "${BACKUP_LOCATION}/mailcow-"* -maxdepth 0 -mtime +${1} -delete`. But `-delete` can only handle files and empty dirs, so the DB dumps inside older backups would not be removed; an `-exec rm` might still be needed for 100% backwards compatibility. The DB backup is also the only one not using `pigz` for compression.
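For completeness, a sketch of the `-exec` variant that also removes non-empty backup directories:

```bash
# Hypothetical: find does the age math itself and removes whole backup dirs.
find "${BACKUP_LOCATION}" -maxdepth 1 -name 'mailcow-*' -type d \
  -mtime "+${1}" -exec rm -rf {} +
```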
—
Conclusion
I would offer to refactor/rewrite the backup and restore script after agreeing with you moo’ists, if all (or at least some, please 🤓) of my findings are correct. Because the script would go through a complete rewrite, I am hesitant to provide a PR before a discussion…
And sorry for the long post. 🤷♂️