Skip to content

Seafile GC

Seafile uses storage de-duplication technology to reduce storage usage. The underlying data blocks will not be removed immediately after you delete a file or a library. As a result, the number of unused data blocks will increase on Seafile server.

To release the storage space occupied by unused blocks, you have to run a "garbage collection" program to clean up unused blocks on your server.

The GC program cleans up two types of unused blocks:

  1. Blocks that no library references to, that is, the blocks belong to deleted libraries;
  2. If you set history length limit on some libraries, the out-dated blocks in those libraries will also be removed.

Before running GC, you must shutdown the Seafile program on your server if you use the community edition. For professional edition, online GC operation is supported. If you use Professional edition, you don't need to shutdown the Seafile program if you are using MySQL. This is because new blocks written into Seafile while GC is running may be mistakenly deleted by the GC program.

At the bottom of the page there is a script that you can use to run the cleanup manually or e.g. once a week with as cronjob.

Run GC

Dry-run Mode

To see how much garbage can be collected without actually removing any garbage, use the dry-run option:

seaf-gc.sh --dry-run [repo-id1] [repo-id2] ...

The output should look like:

[03/19/15 19:41:49] seafserv-gc.c(115): GC version 1 repo My Library(ffa57d93)
[03/19/15 19:41:49] gc-core.c(394): GC started. Total block number is 265.
[03/19/15 19:41:49] gc-core.c(75): GC index size is 1024 Byte.
[03/19/15 19:41:49] gc-core.c(408): Populating index.
[03/19/15 19:41:49] gc-core.c(262): Populating index for repo ffa57d93.
[03/19/15 19:41:49] gc-core.c(308): Traversed 5 commits, 265 blocks.
[03/19/15 19:41:49] gc-core.c(440): Scanning unused blocks.
[03/19/15 19:41:49] gc-core.c(472): GC finished. 265 blocks total, about 265 reachable blocks, 0 blocks can be removed.

[03/19/15 19:41:49] seafserv-gc.c(115): GC version 1 repo aa(f3d0a8d0)
[03/19/15 19:41:49] gc-core.c(394): GC started. Total block number is 5.
[03/19/15 19:41:49] gc-core.c(75): GC index size is 1024 Byte.
[03/19/15 19:41:49] gc-core.c(408): Populating index.
[03/19/15 19:41:49] gc-core.c(262): Populating index for repo f3d0a8d0.
[03/19/15 19:41:49] gc-core.c(308): Traversed 8 commits, 5 blocks.
[03/19/15 19:41:49] gc-core.c(264): Populating index for sub-repo 9217622a.
[03/19/15 19:41:49] gc-core.c(308): Traversed 4 commits, 4 blocks.
[03/19/15 19:41:49] gc-core.c(440): Scanning unused blocks.
[03/19/15 19:41:49] gc-core.c(472): GC finished. 5 blocks total, about 9 reachable blocks, 0 blocks can be removed.

[03/19/15 19:41:49] seafserv-gc.c(115): GC version 1 repo test2(e7d26d93)
[03/19/15 19:41:49] gc-core.c(394): GC started. Total block number is 507.
[03/19/15 19:41:49] gc-core.c(75): GC index size is 1024 Byte.
[03/19/15 19:41:49] gc-core.c(408): Populating index.
[03/19/15 19:41:49] gc-core.c(262): Populating index for repo e7d26d93.
[03/19/15 19:41:49] gc-core.c(308): Traversed 577 commits, 507 blocks.
[03/19/15 19:41:49] gc-core.c(440): Scanning unused blocks.
[03/19/15 19:41:49] gc-core.c(472): GC finished. 507 blocks total, about 507 reachable blocks, 0 blocks can be removed.

[03/19/15 19:41:50] seafserv-gc.c(124): === Repos deleted by users ===
[03/19/15 19:41:50] seafserv-gc.c(145): === GC is finished ===

[03/19/15 19:41:50] Following repos have blocks to be removed:
repo-id1
repo-id2
repo-id3

If you give specific library ids, only those libraries will be checked; otherwise all libraries will be checked.

Notice that at the end of the output there is a "repos have blocks to be removed" section. It contains the list of libraries that have garbage blocks. Later when you run GC without --dry-run option, you can use these libraris ids as input arguments to GC program.

Removing Garbage

To actually remove garbage blocks, run without the --dry-run option:

seaf-gc.sh [repo-id1] [repo-id2] ...

If libraries ids are specified, only those libraries will be checked for garbage.

As described before, there are two types of garbage blocks to be removed. Sometimes just removing the first type of blocks (those that belong to deleted libraries) is good enough. In this case, the GC program won't bother to check the libraries for outdated historic blocks. The "-r" option implements this feature:

seaf-gc.sh -r

Libraries deleted by the users are not immediately removed from the system. Instead, they're moved into a "trash" in the system admin page. Before they're cleared from the trash, their blocks won't be garbage collected.

Removing FS objects

Since Pro server 8.0.6 and community edition 9.0, you can remove garbage fs objects. It should be run without the --dry-run option:

seaf-gc.sh --rm-fs

Using Multiple Threads in GC

You can specify the thread number in GC. By default,

  • If storage backend is S3/Swift/Ceph, 10 threads are started to do the GC work.
  • If storage backend is file system, only 1 thread is started.

You can specify the thread number in with "-t" option. "-t" option can be used together with all other options. Each thread will do GC on one library. For example, the following command will use 20 threads to GC all libraries:

seaf-gc.sh -t 20

Since the threads are concurrent, the output of each thread may mix with each others. Library ID is printed in each line of output.

Run GC based on library ID prefix

Since GC usually runs quite slowly as it needs to traverse the entire library history. You can use multiple threads to run GC in parallel. For even larger deployments, it's also desirable to run GC on multiple server in parallel.

A simple pattern to divide the workload among multiple GC servers is to assign libraries to servers based on library ID. Since Pro edition 7.1.5, this is supported. You can add "--id-prefix" option to seaf-gc.sh, to specify the library ID prefix. For example, the below command will only process libraries having "a123" as ID prefix.

seaf-gc.sh --id-prefix a123

GC cleanup script for Community Version

To use this script you need:

  • Setup the seafile-service file at '/etc/init.d/seafile-server'
  • Files of seafile setup need to be owner by 'seafile:nogroup' or 'seafile:seafile'
  • Run the script with sudo or as root
  • Put the script into crontab of a root user

Create the script file (change the location to your liking):

touch /opt/haiwen/seafile/cleanupScript.sh

Use your favorite text editor and paste the following code:

#!/bin/bash

#####
# Uncomment the following line if you rather want to run the script manually.
# Display usage if the script is not run as root user
#        if [[ $USER != "root" ]]; then
#                echo "This script must be run as root user!"
#                exit 1
#        fi
#
# echo "Super User detected!!"
# read -p "Press [ENTER] to start the procedure, this will stop the seafile server!!"
#####

# stop the server
echo Stopping the Seafile-Server...
systemctl stop seafile.service
systemctl stop seahub.service

echo Giving the server some time to shut down properly....
sleep 20

# run the cleanup
echo Seafile cleanup started...
sudo -u seafile $pathtoseafile/seafile-server-latest/seaf-gc.sh

echo Giving the server some time....
sleep 10

# start the server again
echo Starting the Seafile-Server...
systemctl start seafile.service
systemctl start seahub.service

echo Seafile cleanup done!

Make sure that the script has been given execution rights, to do that run this command.

sudo chmod +x /path/to/yourscript.sh

Then open crontab with the root user

crontab -e

Add the following line (change the location of your script accordingly!)

0 2 * * Sun /opt/haiwen/seafile/cleanupScript.sh

The script will then run every Sunday at 2:00 AM.

GC in the seafile docker container

To perform garbage collection inside the seafile docker container, you must run the /scripts/gc.sh script. Simply run docker exec <whatever-your-seafile-container-is-called> /scripts/gc.sh.