Backup is the procedure that copies data from a primary server (which is running production service) to a backup server.
Backup is an important procedure to keep data safe. The basic backup procedure described in this documentation has a few drawbacks:
The backup is done in fixed "backup windows" (once per day or a few times per day). The latest data written between two backup windows will be lost if the primary server storage is damaged.
The backup procedure backs up the database and the data directory separately. On the backup server, some entries in the database may become inconsistent with the data directory, which causes some libraries to appear "corrupted" after a restore.
The real-time backup server uses a syncing algorithm similar to the Seafile desktop client to retrieve data from the primary server. It works as follows:
Whenever a library is updated, the primary server notifies the backup server to retrieve the changed data. With a delta syncing algorithm, this procedure runs quickly and updates the backup server in nearly real-time.
The backup server also checks all libraries on the primary server at a fixed interval. Any new or updated libraries are synced to the backup server. This picks up any updates that were missed because of glitches in the real-time sync procedure above.
The backup server always keeps the database and the data directory consistent, so no library on the backup server will be in a corrupted state (unless it is already corrupted on the primary server).
The full history of all libraries will be backed up. This is not like the desktop client, which only syncs the latest state of a library.
There are two sets of data that need to be backed up:
The seafile-data directory and the core library metadata tables in the seafile database. This data is the core data structures of the libraries in Seafile. They're synced to the backup server with Seafile's syncing algorithm. In this procedure, the metadata tables are kept consistent with the seafile-data directory.
All other tables in the database (including the seafile, ccnet and seahub databases) are backed up with mysqldump. mysqldump can't back up the database in real time, so you should set up a crontab entry that runs it at regular intervals. The backup latency of these tables doesn't affect the integrity of library data.
In the following discussion, we'll use "primary server" and "master server", "backup server" and "slave server" interchangeably.
We assume you already have a primary server running and now want to set up a backup server.
The steps to set up the backup server are:
Install Seafile on the backup server
Configure Seafile syncing between the primary server and the backup server
Back up the remaining database tables with mysqldump at regular intervals
You should install Seafile Pro Edition on the backup server according to this documentation. Since the real-time backup feature is only available in version 5.1.0 or later, you also have to upgrade your primary server to version 5.1.0 or later.
When installing Seafile on the backup server, note the following:
The database names (ccnet, seafile and seahub database) should be the same as the names on the primary server.
You don't need to enable other Pro features, such as Office file preview, search indexing, file auditing etc.
Don't start the seahub process on the backup server. This means the Seafile backup server normally can't provide service to users.
On the primary server, add the following options to seafile.conf:
[backup]
backup_url = http://backup-server
sync_token = c7a78c0210c2470e14a20a8244562ab8ad509734
On the backup server, add the following options to seafile.conf:
[backup]
primary_url = http://primary-server
sync_token = c7a78c0210c2470e14a20a8244562ab8ad509734
sync_poll_interval = 3
backup_url: the backup server's address in url format. You can use http or https.
primary_url: the primary server's address in url format.
sync_token: a secret shared between the primary and backup servers. It is a 40-character SHA1 string generated by the system admin. You can use the uuidgen | openssl sha1 command to generate a random token.
sync_poll_interval: the backup server polls all libraries on the primary server periodically. This option sets the poll interval in hours. The default is 1, which means the backup server polls the primary every hour. Choose a larger interval if you have a large number of libraries.
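Depending on the OpenSSL version, openssl sha1 may prefix its output with a label such as "(stdin)= ", which must not end up in seafile.conf. The following is a minimal sketch that keeps only the 40-character digest. The function name gen_sync_token and the sha1sum fallback are illustrative assumptions, not part of Seafile.

```shell
# Print a random 40-character hex string suitable for sync_token.
gen_sync_token() {
    if command -v uuidgen >/dev/null 2>&1 && command -v openssl >/dev/null 2>&1; then
        # Keep only the last whitespace-separated field, which is the hex digest.
        uuidgen | openssl sha1 | awk '{print $NF}'
    else
        # Fallback (an assumption, not from the Seafile docs): hash random bytes.
        head -c 32 /dev/urandom | sha1sum | awk '{print $1}'
    fi
}

token=$(gen_sync_token)
echo "sync_token = $token"
```

The printed line can be pasted into the [backup] section on both servers.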
If you use https to sync between the primary and backup servers, you have to use the correct Seafile server package for your system. If you run CentOS, use the Seafile package named without the "Ubuntu" suffix; if you run Debian or Ubuntu, use the package named with the "Ubuntu" suffix. Otherwise you may encounter CA errors in https requests.
After saving the configuration, restart the Seafile service on the primary and backup servers. The backup server automatically starts the backup after the restart.
Note: Don't start the seahub process on the Seafile backup server.
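Since sync_token is a secret shared by both servers, a mismatch between the two seafile.conf files is an easy mistake to make. The helper below is a hedged sketch for reading a single option from the [backup] section so the two values can be compared. The function name get_backup_opt and the file names in the usage comment are hypothetical.

```shell
# Print the value of option $2 from the [backup] section of the file $1.
get_backup_opt() {
    awk -v key="$2" '
        # Track whether the current section header is [backup].
        /^[ \t]*\[/ { in_backup = ($0 ~ /^[ \t]*\[backup\]/); next }
        in_backup && index($0, "=") {
            k = substr($0, 1, index($0, "=") - 1)
            v = substr($0, index($0, "=") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)   # trim the option name
            gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim the value
            if (k == key) { print v; exit }
        }
    ' "$1"
}

# Usage (file names are placeholders for copies of the two configs):
#   [ "$(get_backup_opt primary-seafile.conf sync_token)" = \
#     "$(get_backup_opt backup-seafile.conf sync_token)" ] && echo "tokens match"
```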
Back up the databases on the primary server's MySQL with mysqldump:
mysqldump -u <user> -p<password> --databases \
    --ignore-table=<seafile_db>.Repo \
    --ignore-table=<seafile_db>.Branch \
    --ignore-table=<seafile_db>.RepoHead \
    <seafile_db> <ccnet_db> <seahub_db> > dbdump.sql
You should replace <user> and <password> with your MySQL admin user and password, and replace <seafile_db>, <ccnet_db> and <seahub_db> with your database names.
The three ignored tables are core tables related to library data and are synced by Seafile backup server in a real-time manner. They're kept in the seafile database of the backup server and are separated from the mysqldump process.
You should set up a crontab entry to run mysqldump at regular intervals.
If you want to back up the tables (except for the 3 tables synced by Seafile) closer to real time, you can deploy master-slave replication for the MySQL/MariaDB database from the primary node to another database server. The database running on the backup server must not be used as the target of this replication. Otherwise you'll end up with replication conflicts, since the database on the backup server is also updated by the Seafile backup process.
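As a hedged sketch of such a cron job, the script below wraps the mysqldump command shown earlier and writes each dump to a timestamped file. The database names, the DUMP_DIR location, the MYSQL_USER and MYSQL_PASSWORD environment variables, the "run" argument and the script path in the crontab line are all illustrative assumptions; adapt them to your setup.

```shell
#!/bin/sh
# Hypothetical defaults; replace with your real database names and dump directory.
SEAFILE_DB="${SEAFILE_DB:-seafile_db}"
CCNET_DB="${CCNET_DB:-ccnet_db}"
SEAHUB_DB="${SEAHUB_DB:-seahub_db}"
# DUMP_DIR should live on storage other than the primary server's own disk.
DUMP_DIR="${DUMP_DIR:-/var/backups/seafile}"

# Print one --ignore-table option per line for the three tables synced by Seafile.
ignore_args() {
    for t in Repo Branch RepoHead; do
        echo "--ignore-table=$SEAFILE_DB.$t"
    done
}

# Dump all other tables to a timestamped file under DUMP_DIR.
run_dump() {
    out="$DUMP_DIR/dbdump-$(date +%Y%m%d-%H%M%S).sql"
    # $(ignore_args) is unquoted on purpose so each option becomes its own argument.
    # The dump is written to a temporary name and renamed only if mysqldump succeeds.
    mysqldump -u "$MYSQL_USER" -p"$MYSQL_PASSWORD" --databases \
        $(ignore_args) \
        "$SEAFILE_DB" "$CCNET_DB" "$SEAHUB_DB" > "$out.tmp" && mv "$out.tmp" "$out"
}

# Only dump when called with the "run" argument, so the file can be sourced safely.
if [ "${1:-}" = "run" ]; then
    run_dump
fi

# Example crontab entry (hourly; the script path is a placeholder):
#   0 * * * * /opt/seafile-backup/dump.sh run
```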
After the above setup, you should now have the below layout of your backup data:
Library data is backed up and managed by Seafile backup server. The data can be stored on external storage, object storage, or local disk, depending on your setup for the backup server.
Database tables are split into two parts:
The 3 core library tables are backed up in real time to the backup node's MySQL database.
Other tables are regularly dumped to a file with mysqldump. The backup files are stored somewhere other than the primary server.
Use the seaf-backup-cmd.sh status command to view the backup status. The output looks like this:
# ./seaf-backup-cmd.sh status
Total number of libraries: xxx
Number of synchronized libraries: xxx
Number of libraries waiting for sync: xxx
Number of libraries syncing: xxx
Number of libraries failed to sync: xxx
List of syncing libraries:
xxx
xxx
List of libraries failed to sync:
xxx
xxx
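For monitoring, the failed count can be pulled out of this output with a small filter. This is only a sketch: it assumes the line labels stay exactly as in the sample output, and the function name failed_count is hypothetical.

```shell
# Read `seaf-backup-cmd.sh status` output on stdin and print the failed-library count.
failed_count() {
    awk -F': *' '/^Number of libraries failed to sync:/ { print $2; exit }'
}

# Usage on the backup server:
#   n=$(./seaf-backup-cmd.sh status | failed_count)
#   [ "${n:-0}" -gt 0 ] && echo "WARNING: $n libraries failed to sync"
```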
There are a few reasons why the backup of a library may fail:
Some data in the primary server is corrupted. The data may be in the latest state or in history. Since the backup procedure syncs the full history, corruption in history will fail the backup.
The primary server has run seaf-fsck, which may restore a library back to an older state.
If severe data corruption happens on the primary server, you can restore your service quickly by running it directly on the backup server.
There are two steps to restore on the backup server:
Import the latest mysql dump file into the Seafile backup server's MySQL database.
Enable other Pro features on the Seafile backup server, and start the seahub process
Importing the latest mysql dump file into the backup server's database:
mysql -u <user> -p<pass> < dbdump.sql
You should replace <user> and <pass> with your MySQL admin user name and password.
Copy Seafile's configuration to the backup server, then start the seahub process on the backup server.
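If the dumps are kept as several files, the import step needs the newest one. A minimal sketch, assuming the dumps sit in one directory and are named dbdump-*.sql (a hypothetical naming scheme; the function name latest_dump and the directory in the usage comment are also illustrative):

```shell
# Print the most recently modified dbdump-*.sql file in directory $1.
latest_dump() {
    ls -1t "$1"/dbdump-*.sql 2>/dev/null | head -n 1
}

# Usage on the backup server (directory is a placeholder):
#   mysql -u <user> -p<pass> < "$(latest_dump /var/backups/seafile)"
```

The function prints nothing when no dump file is found, so check for an empty result before importing.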
If your primary service runs as a Seafile cluster, keep two points in mind when setting up a backup server:
If you're using a MariaDB cluster, use only one MySQL instance as the replication master.
You have to change seafile.conf and set the backup_url and sync_token options on each Seafile node. The configuration on all primary Seafile nodes should be the same, and they should all point to the same backup server.
Currently you cannot deploy the backup service as a cluster. That is, you can only use a single node as backup server. This support may be added in the future.
The seaf-backup-cmd.sh script is the tool for managing the backup server. It provides the following commands:
You can use the sync command to manually schedule the backup of a library:
# ./seaf-backup-cmd.sh sync <library id>
The command will block until the backup is finished.
The --force option of the sync command can be used to force a failing backup to complete. Permanent backup failures are usually caused by data corruption in a library on the primary server. The --force option asks the backup to skip corrupted objects and finish the backup.
When you find a backup error, follow these two steps:
Run seaf-fsck on the primary server for the failing libraries. Fsck fixes any corruption in the latest state of the libraries.
Run seaf-backup-cmd.sh sync --force <library id> on the backup server.
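When several libraries have failed, the second step can be repeated for each of them. The following is a sketch that assumes a plain text file with one library ID per line. BACKUP_CMD, resync_failed and the file name in the usage comment are hypothetical; only the sync --force invocation comes from the text above.

```shell
# Path to the management script; override BACKUP_CMD if it lives elsewhere.
BACKUP_CMD="${BACKUP_CMD:-./seaf-backup-cmd.sh}"

# Read library IDs from stdin (one per line) and force-sync each of them.
resync_failed() {
    while IFS= read -r id; do
        # Skip blank lines and lines starting with '#'.
        case "$id" in ''|'#'*) continue ;; esac
        # Redirect stdin so the command cannot swallow the remaining IDs.
        "$BACKUP_CMD" sync --force "$id" < /dev/null || echo "sync failed for $id" >&2
    done
}

# Usage on the backup server, after seaf-fsck has run on the primary:
#   resync_failed < failed-libraries.txt
```

Because each sync call blocks until that library's backup finishes, the libraries are processed one at a time.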