Optimizing the backup and restore script

Good moo-ning everybody! 🐮

I feel like there are some issues with the backup and restore script of mailcow. As I cannot find how to start a discussion on GitHub, I'll post it here ^^

d-c vs d c

After going from docker-compose to docker compose, some of the internal templates for names changed (a personal contender for the TOP 10 ideas of the year 😶‍🌫️). This change is only partially mirrored in the script.

The queries done with docker volume ls (see 1 and at least 12 other places) only scan for docker compose volumes (underscore: <project>_<name>) and not for docker-compose ones (dash: <project>-<name>). There should be a switch, e.g. like the sketch below. Running instances seem to be fine (i.e. mine), but starting with a fresh clone will break things.
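
Something like this could work as the switch (just a sketch; CMPS_PRJ is assumed to be the sanitised compose project name, derived here from COMPOSE_PROJECT_NAME):

```bash
# Sketch only: probe which separator the existing volumes use, so both
# "docker compose" (underscore) and legacy "docker-compose" (dash) projects are found.
CMPS_PRJ=$(echo "${COMPOSE_PROJECT_NAME}" | tr -cd '0-9A-Za-z_-')

if [[ -n "$(docker volume ls -qf name=^${CMPS_PRJ}_vmail-vol-1$)" ]]; then
  CMPS_SPLIT="_"   # volumes created by "docker compose"
else
  CMPS_SPLIT="-"   # volumes created by the legacy "docker-compose"
fi
```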

Side-quest: Documentation

The docs at docs.mailcow.email list the volumes in the docker-compose way. Also, the statement "take good care of these volumes" is weird to me, because clamd-db-vol-1, sogo-userdata-backup-vol-1, sogo-web-vol-1, and vmail-index-vol-1 are not included in the backup script at all (mysql-socket-vol-1 also isn't, but that makes sense).

Leaving the side-quest: I feel like the volumes sogo-userdata-backup-vol-1 and sogo-web-vol-1 should be included in the backup and restore (see the sketch below), because they contain user config (sogo-userdata-backup-vol-1) and possible changes to SOGo (sogo-web-vol-1). For me, the clamd DB and the mail index files can be left out of the b-n-r script.
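
As an illustration, the two SOGo volumes could be handled the same way as the volumes that are already backed up. This is only a sketch and assumes the same naming pattern and the tar/pigz approach the script already uses elsewhere:

```bash
# Sketch: back up the two SOGo volumes into their own archive. Volume names
# assume the usual <project><separator><volume> pattern.
docker run --name mailcow-backup-sogo --rm \
  -v "${BACKUP_LOCATION}/mailcow-${DATE}":/backup:z \
  -v $(docker volume ls -qf name=^${CMPS_PRJ}${CMPS_SPLIT}sogo-userdata-backup-vol-1$):/sogo-userdata-backup:ro,z \
  -v $(docker volume ls -qf name=^${CMPS_PRJ}${CMPS_SPLIT}sogo-web-vol-1$):/sogo-web:ro,z \
  ${DEBIAN_DOCKER_IMAGE} /bin/tar --use-compress-program="pigz --rsyncable" \
    -Pcvpf /backup/backup_sogo.tar.gz /sogo-userdata-backup /sogo-web
```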

Network on backup

mailcow-backup joins the Docker default network and not the compose network (exception: see 2). Depending on the config of the host, this might even be considered a security problem (I only realized that because I started monitoring the default network, since I don't run anything on it). Except for 2, I think no network is needed at all, so --network none might be a good addition to the docker run commands (if they are kept as-is).
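
For example (only a sketch of the idea, shown for the vmail archive), the change would just be one extra flag on the existing docker run calls:

```bash
# Sketch: the throw-away backup container only reads volumes and writes to the
# bind-mounted backup location, so it can run without any network at all.
docker run --name mailcow-backup --rm \
  --network none \
  -v "${BACKUP_LOCATION}/mailcow-${DATE}":/backup:z \
  -v $(docker volume ls -qf name=^${CMPS_PRJ}${CMPS_SPLIT}vmail-vol-1$):/vmail:ro,z \
  ${DEBIAN_DOCKER_IMAGE} /bin/tar --use-compress-program="pigz --rsyncable" \
    -Pcvpf /backup/backup_vmail.tar.gz /vmail
```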

Possible race condition on restore

I cannot fully confirm this, as I do not know the internals of mailcow too well, but the restore process always just stops the container it is currently working on. I feel like a race condition can occur when the first services are already up again while other services are still receiving their data. My primary concern is the DB host, which is the last one to be restored (when using all) while postfix and dovecot are already up again at that time. A sketch of a safer ordering follows below.
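
A rough sketch of what I mean (restore_all_archives is a hypothetical placeholder for the actual restore steps, and COMPOSE_FILE is assumed to be known to the script):

```bash
# Sketch: stop the whole stack first, restore every archive, then bring the
# stack back up in one go, instead of stopping each container individually.
docker compose -f "${COMPOSE_FILE}" -p "${CMPS_PRJ}" stop
restore_all_archives   # hypothetical: runs the actual per-volume restores
docker compose -f "${COMPOSE_FILE}" -p "${CMPS_PRJ}" up -d
```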

Code quality and optimizations

There are some issues, including word-splitting issues with variables that are used as directory names, comparing lowercased strings against uppercase characters, a 1-item loop, indentation issues, and not using stderr for error output. Some small illustrations follow below.
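
A few generic illustrations of what I mean (not taken from the script verbatim, and assuming BACKUP_LOCATION and DATE are set as they are in the script):

```bash
# 1. Quote variables that end up as paths, so a BACKUP_LOCATION containing
#    spaces does not get word-split:
mkdir -p "${BACKUP_LOCATION}/mailcow-${DATE}"

# 2. Lower-case user input once, then only compare against lower-case chars:
read -r -p "Delete old backups? [y/N] " answer
if [[ "${answer,,}" == "y" ]]; then
  echo "deleting old backups..."
fi

# 3. Send error messages to stderr and exit non-zero:
if [[ ! -d "${BACKUP_LOCATION}" ]]; then
  echo "${BACKUP_LOCATION} is not a directory" >&2
  exit 1
fi
```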

Also, I think it would be an option to move much of the process into the backup container with a little helper script. That could help with the race condition mentioned above, but also make it easier to streamline the process. I imagine a helper script that, for the backup part, takes the args just as the current script does, but optimizes the backup, like:

CMPS_SPLIT="_"
function backup_docker() {
  docker run --name mailcow-backup --rm \
    --network $(docker network ls -qf name=^${CMPS_PRJ}${CMPS_SPLIT}mailcow-network$) \
    -v "${BACKUP_LOCATION}/mailcow-${DATE}":/backup:z \
    -v $(docker volume ls -qf name=^${CMPS_PRJ}${CMPS_SPLIT}vmail-vol-1$):/vmail:ro,z \
    -v $(docker volume ls -qf name=^${CMPS_PRJ}${CMPS_SPLIT}crypt-vol-1$):/crypt:ro,z \
    -v $(docker volume ls -qf name=^${CMPS_PRJ}${CMPS_SPLIT}redis-vol-1$):/redis:ro,z \
    -v $(docker volume ls -qf name=^${CMPS_PRJ}${CMPS_SPLIT}rspamd-vol-1$):/rspamd:ro,z \
    -v $(docker volume ls -qf name=^${CMPS_PRJ}${CMPS_SPLIT}postfix-vol-1$):/postfix:ro,z \
    ${DEBIAN_DOCKER_IMAGE} /bin/run-backup.sh "$@"
}

function backup() {
  RUN_BACKUPS=()
  while (( "$#" )); do
    case "$1" in
      vmail|all)
        RUN_BACKUPS+=( "vmail" )
        ;;&
      crypt|all)
        RUN_BACKUPS+=( "crypt" )
        ;;&
      redis|all)
        RUN_BACKUPS+=( "redis" )
        ;;&
      rspamd|all)
        RUN_BACKUPS+=( "rspamd" )
        ;;&
      postfix|all)
        RUN_BACKUPS+=( "postfix" )
        ;;&
    esac
    shift
  done

  if [[ "${#RUN_BACKUPS[@]}" -gt 0 ]]; then
    backup_docker "${RUN_BACKUPS[@]}"
  fi
}
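
The /bin/run-backup.sh referenced above does not exist yet; a minimal sketch of what it could look like inside the container (THREADS and the mount points are assumptions following the layout of backup_docker()):

```bash
#!/bin/bash
# Hypothetical /bin/run-backup.sh, executed inside the backup container.
# Every argument is one of the volumes mounted read-only by backup_docker().
set -euo pipefail

THREADS=${THREADS:-1}

for job in "$@"; do
  tar --warning='no-file-ignored' \
      --use-compress-program="pigz --rsyncable -p ${THREADS}" \
      -Pcpf "/backup/backup_${job}.tar.gz" "/${job}"
done
```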

As mentioned, the restore process is always "just" working on one job at a time. Since pigz is not really good at multithreaded decompression, multiple files could be decompressed in parallel to reduce the runtime.
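
For example (just a sketch; RESTORE_LOCATION is my placeholder for the directory holding the selected backup, and the containers are assumed to be stopped at this point):

```bash
# Sketch: one extraction job per archive, run in parallel, because a single
# pigz process barely scales when decompressing.
for archive in "${RESTORE_LOCATION}"/backup_*.tar.gz; do
  tar -I pigz -Pxf "${archive}" &
done
wait   # block until every extraction has finished
```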

MySQL Backup

The DB backup part scans for mysql and mariadb in the compose file, but uses mariabackup for every operation on the found container. This tool is not available in mysql images (I checked mysql:8), so I assume mysqldump would be a good option to go with there.
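
A rough sketch of how the branch could look (SQLIMAGE, SQL_CONTAINER and DBROOT are assumptions for whatever the script has already detected or read from mailcow.conf):

```bash
# Sketch: pick the dump tool based on the image, since mariabackup only ships
# with MariaDB images while MySQL images only provide mysqldump.
if [[ "${SQLIMAGE}" == *"mariadb"* ]]; then
  docker exec "${SQL_CONTAINER}" mariabackup --backup \
    --target-dir=/backup --user=root --password="${DBROOT}"
else
  docker exec "${SQL_CONTAINER}" mysqldump --all-databases \
    --single-transaction --user=root --password="${DBROOT}" \
    > "${BACKUP_LOCATION}/mailcow-${DATE}/backup_mysql.sql"
fi
```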

Also, I see that a --prepare is done after each backup (see 4). Does it make sense to do this AFTER a backup, but not BEFORE a restore? The MariaDB docs are vague on that ("If you try to restore the database without first preparing the data, InnoDB rejects the new data as corrupt."), but I feel that the backup is prepared using the available DB as reference. I could totally be wrong here (usually I am 🤷‍♂️), since I have not used MariaDB for more than 8 years now… Also, this preparation, the owner change, and the compression are done even if the backup process fails.

The delete-days logic's find can also be optimized to not use any calculation (just -mtime) and to make use of the -delete option, reducing the risk of word splitting. Possibly resulting in find "${BACKUP_LOCATION}/mailcow-"* -maxdepth 0 -mtime +${1} -delete. But -delete can only handle files and empty dirs, so for older backups the DB dumps would not be removed; an -exec rm might still be needed for 100% backwards compatibility (see the sketch below). The DB backup is also the only one not using pigz for compression.
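
Something along these lines (again just a sketch; ${1} is the delete-days argument as in the current script):

```bash
# Sketch: let find do the age check via -mtime and delete recursively, so the
# old uncompressed DB dump directories are removed as well.
find "${BACKUP_LOCATION}"/mailcow-* -maxdepth 0 -type d -mtime "+${1}" \
  -exec rm -rf {} +
```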

---

Conclusion

I would offer to refactor/rewrite the backup and restore script after agreeing with you moo'ists, if all (or at least some, please 🤓) of my findings are correct. Because the script would go through a complete rewrite, I am hesitant to provide a PR before discussion…

And sorry for the long post. 🤷‍♂️

15 days later

Okay, that discussion didn't quite work out. 😭

As there seemed not to be that much interest in the b'n'r script, I contacted the server cow team about my findings and ideas. I'm now in the process of implementing a new backup and restore script. Further discussion (🫣) will take place in a GitHub issue I will open once there is a version available to look at.


I guess most people neglect backup and the even more important restore (you can see it here in the forums 😉).

Others (like me) probably use a different backup solution; e.g. I have snapshots and Veeam Backup running.
