Good moo-ning everybody! 🐮
I feel like there are some issues with the backup and restore script of mailcow. As I cannot find how to start a discussion on GitHub, I’ll post it here ^^
d-c vs d c
After going from `docker-compose` to `docker compose`, some of the internal templates for names changed (a personal contender for the TOP 10 ideas of the year 😶🌫️). This change is only partially mirrored in the script.
The queries done with `docker volume ls` (see 1 and >= 12 other places) only scan for `docker compose` volumes (underscore: `<project>_<name>`) and not for `docker-compose` ones (dash: `<project>-<name>`). There should be a switch; a possible probe is sketched below. Running instances seem to be fine (i.e. mine is), but starting with a fresh clone will break things.
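For illustration, a little probe along those lines — purely a sketch, assuming `CMPS_PRJ` holds the compose project name and that `vmail-vol-1` exists in every installation, so it can serve as the canary:

```bash
# Hypothetical probe: detect which separator this project's volumes use,
# instead of hard-coding one of the two schemes.
if [[ -n "$(docker volume ls -qf "name=^${CMPS_PRJ}_vmail-vol-1$")" ]]; then
  CMPS_SPLIT="_"
else
  CMPS_SPLIT="-"
fi
```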
Side-quest: Documentation
Docs list volumes in the `docker-compose` way. Also, the statement “take good care of these volumes” is weird to me, because `clamd-db-vol-1`, `sogo-userdata-backup-vol-1`, `sogo-web-vol-1`, and `vmail-index-vol-1` are not included in the backup script at all (`mysql-socket-vol-1` also isn’t, but that makes sense).
Leaving the side-quest: I feel like the volumes `sogo-userdata-backup-vol-1` and `sogo-web-vol-1` should be included in the backup-and-restore, because they contain user config (`sogo-userdata-backup-vol-1`) and possible changes to SOGo (`sogo-web-vol-1`). For me, the clamd DB and the mail index files can be left out of the b-n-r script.
Network on backup
`mailcow-backup` joins the docker default network and not the compose network (exception: see 2). Depending on the config of the host, this might even be considered a security problem (I only realized it because I started monitoring the default net, since I don’t run anything on it). Except for 2, I think no network is needed at all, so `--network none` might be a good addition to the `docker run` commands (if they are kept as-is); a sketch follows below.
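To illustrate, a sketch of the vmail job with the network removed — the variables are the ones the current script already uses, and the tar invocation is only an approximation of the real one:

```bash
# Hypothetical: the pure volume-archiving jobs need no network at all.
docker run --name mailcow-backup --rm \
  --network none \
  -v "${BACKUP_LOCATION}/mailcow-${DATE}:/backup:z" \
  -v "$(docker volume ls -qf name=^${CMPS_PRJ}_vmail-vol-1$):/vmail:ro,z" \
  ${DEBIAN_DOCKER_IMAGE} tar --use-compress-program="pigz --rsyncable" \
  -Pcvpf /backup/backup_vmail.tar.gz /vmail
```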
Possible race condition on restore
I cannot fully confirm this, since I don’t know the insides of mailcow too well, but the restore process always just stops the single container it is currently working on. I feel like a race condition can occur when the first services are already up again while other services are still receiving their data. My primary concern is the DB host, which is restored last (when using `all`) while postfix and dovecot are already up again at that point.
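A rough sketch of the ordering I’d expect instead — the container names are assumptions, and the restore jobs themselves are elided:

```bash
# Hypothetical: stop every affected container before restoring anything.
STOP_FIRST=(postfix-mailcow dovecot-mailcow mysql-mailcow)  # names assumed
for name in "${STOP_FIRST[@]}"; do
  docker stop "$(docker ps -qf "name=${name}")"
done
# ... run all selected restore jobs here ...
docker compose up -d  # bring the stack back up only after the last job finished (run from the mailcow dir)
```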
Code quality and optimizations
There are some issues, including word-splitting issues with variables that are used as directory names, comparing lowercased strings against uppercase characters, a one-item loop, indentation issues, and error output not going to `stderr`; see the examples right below.
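A few hypothetical one-liners for the kinds of fixes I mean (variable names made up):

```bash
echo "Invalid option: ${1}" >&2                     # errors belong on stderr, not stdout
rm -rf "${BACKUP_LOCATION}/mailcow-${DATE}"         # quote paths to avoid word splitting
if [[ "${confirm,,}" == "y" ]]; then echo ok; fi    # compare a lowercased string to lowercase
```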
Also, I think it would be an option to move much of the process into the backup container with a little helper script. That could help with the race condition mentioned above, and also make it easier to streamline the process. I imagine a helper script that, for the backup part, takes the same args as the current script, but optimizes the backup, like:
```bash
CMPS_SPLIT="_"

function backup_docker() {
  # Run the backup container inside the compose network and hand every
  # requested job to a helper script shipped in the image.
  docker run --name mailcow-backup --rm \
    --network "$(docker network ls -qf name=^${CMPS_PRJ}${CMPS_SPLIT}mailcow-network$)" \
    -v "${BACKUP_LOCATION}/mailcow-${DATE}:/backup:z" \
    -v "$(docker volume ls -qf name=^${CMPS_PRJ}${CMPS_SPLIT}vmail-vol-1$):/vmail:ro,z" \
    -v "$(docker volume ls -qf name=^${CMPS_PRJ}${CMPS_SPLIT}crypt-vol-1$):/crypt:ro,z" \
    -v "$(docker volume ls -qf name=^${CMPS_PRJ}${CMPS_SPLIT}redis-vol-1$):/redis:ro,z" \
    -v "$(docker volume ls -qf name=^${CMPS_PRJ}${CMPS_SPLIT}rspamd-vol-1$):/rspamd:ro,z" \
    -v "$(docker volume ls -qf name=^${CMPS_PRJ}${CMPS_SPLIT}postfix-vol-1$):/postfix:ro,z" \
    ${DEBIAN_DOCKER_IMAGE} /bin/run-backup.sh "$@"
}

function backup() {
  RUN_BACKUPS=()
  while (( "$#" )); do
    # ';;&' keeps testing the following patterns, so 'all' collects every job
    case "$1" in
      vmail|all)
        RUN_BACKUPS+=( "vmail" )
        ;;&
      crypt|all)
        RUN_BACKUPS+=( "crypt" )
        ;;&
      redis|all)
        RUN_BACKUPS+=( "redis" )
        ;;&
      rspamd|all)
        RUN_BACKUPS+=( "rspamd" )
        ;;&
      postfix|all)
        RUN_BACKUPS+=( "postfix" )
        ;;&
    esac
    shift
  done
  if [[ "${#RUN_BACKUPS[@]}" -gt 0 ]]; then
    backup_docker "${RUN_BACKUPS[@]}"
  fi
}
```
As mentioned, the restore process always works on just one job at a time. Since `pigz` is not really good at multithreaded decompression, multiple files could be decompressed in parallel to reduce the runtime; see the sketch below.
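A minimal sketch of that idea — `RESTORE_LOCATION` and `RESTORE_TARGET` are assumptions, the point is only the `&` plus `wait`:

```bash
# Hypothetical: one extraction process per archive, since a single pigz
# process cannot parallelize decompression.
for f in "${RESTORE_LOCATION}"/backup_{vmail,crypt,redis,rspamd,postfix}.tar.gz; do
  [[ -f "$f" ]] || continue
  tar --use-compress-program="pigz -d" -xpf "$f" -C "${RESTORE_TARGET}" &
done
wait  # continue only after every extraction has finished
```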
MySQL Backup
The DB backup part scans for `mysql` and `mariadb` in the compose file, but uses `mariabackup` for every operation on the found container. This tool is not available in MySQL images (I checked `mysql:8`); I assume `mysqldump` would be a good option to go with there, as sketched below.
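A hedged sketch of such a fallback — the container name is an assumption, and `DBROOT` is taken from `mailcow.conf` like in the existing script:

```bash
# Hypothetical: plain SQL dump for images that don't ship mariabackup.
docker exec "$(docker ps -qf "name=mysql-mailcow")" \
  mysqldump --all-databases --single-transaction -uroot -p"${DBROOT}" \
  > "${BACKUP_LOCATION}/mailcow-${DATE}/backup_mysql.sql"
```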
Also, I see that a `--prepare` is done after each backup (see 4). Does this make sense AFTER a backup, but not BEFORE a restore? The MariaDB docs are vague on that (“If you try to restore the database without first preparing the data, InnoDB rejects the new data as corrupt.”), but I feel that the backup is prepared using the available DB as reference. I could totally be wrong here (usually I am 🤷♂️), since I have not used MariaDB in over 8 years now… Also, this preparation, the owner change, and the compression are done even if the backup process fails.
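Independent of where `--prepare` belongs, the follow-up steps could at least be gated on the backup’s exit status; a minimal sketch with assumed paths and uid/gid:

```bash
# Hypothetical: skip prepare, chown, and compression when the backup failed.
if mariabackup --backup --target-dir=/backup --user=root --password="${DBROOT}"; then
  mariabackup --prepare --target-dir=/backup
  chown -R 999:999 /backup  # uid/gid assumed
else
  echo "mariabackup failed, skipping prepare and compression" >&2
  exit 1
fi
```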
The delete-days logic’s `find` can also be optimized to avoid any manual date calculation by using `-mtime`, and it can use the `-delete` option to reduce the risk of word splitting, possibly resulting in `find "${BACKUP_LOCATION}/mailcow-"* -maxdepth 0 -mtime +${1} -delete`. But `-delete` can only handle files and empty dirs, so the DB dumps inside older backups would not be removed; an `-exec rm` might still be needed for 100% backwards compatibility. The DB backup is also the only one not using `pigz` for compression.
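For completeness, a sketch of the `-exec` variant that also removes non-empty backup directories:

```bash
# Hypothetical: find does the age math itself and removes whole backup dirs.
find "${BACKUP_LOCATION}" -maxdepth 1 -name 'mailcow-*' -type d \
  -mtime "+${1}" -exec rm -rf {} +
```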
—
Conclusion
I would offer to refactor/rewrite the backup and restore script after agreeing with you moo’ists, if all (or at least some, please 🤓) of my findings are correct. Because the script would go through a complete rewrite, I am hesitant to provide a PR before a discussion…
And sorry for the long post. 🤷♂️