Automating LND Static Channel Backup (SCB) to S3 with Python

Because keeping an offsite backup of the latest version of the channels.backup file is critical for recovering a Lightning LND node.

I run my own LND node, and keeping my channels.backup file safely backed up is a necessity: a reliable offsite copy of this file is what makes node recovery possible after a failure. LND updates the file atomically by writing changes to a temporary file and then moving it over the existing channels.backup. This behavior presents a challenge when trying to monitor changes to the file, as traditional file-watching tools may not detect such replace-by-move updates reliably.
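
To make the problem concrete, here is a simplified Python illustration of that atomic-replace pattern. This is just the shape of the operation, not LND's actual implementation (LND is written in Go):

import os
import tempfile

def atomic_write(path: str, data: bytes):
    # Write the new contents to a temporary file in the same directory...
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    # ...then atomically move it over the old file. A watcher attached to
    # the file itself never sees an in-place modification, only a replace.
    os.replace(tmp, path)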

Requirements

I established the following requirements for my solution:

  • Simple AWS Usage: The backup should be easily integrated with AWS, without requiring complex dependencies or additional utilities.
  • Backup on Startup: The latest channels.backup file should be uploaded as soon as the script starts, rather than waiting for the file to change before the first backup exists.
  • Backup on File Update: The script must detect and react to updates to the channels.backup file.
  • Runs as a Daemon/Service: The solution should run continuously until explicitly stopped, which also makes it suitable to run as a systemd service.
  • Environment Variable Configuration: Inputs should be passed through environment variables for simplified configuration within a systemd service.

Developing the Solution

I chose Python for this task because I wanted a lightweight solution that didn't require managing external AWS utilities or dependencies beyond a simple virtual environment. By using Python, the setup remains clean and easy to maintain.

Backup Process

To ensure reliability, I decided to first copy the channels.backup file to a separate directory before uploading it to S3. This approach allows for keeping multiple versions if needed and provides an easy way to check the latest backup. The copied file in the temporary directory includes a timestamp in its filename, making it straightforward to identify the latest version.

When uploading to S3, the filename not only includes the timestamp but also the last six hexadecimal digits of the SHA-256 checksum of the file. This helps in quickly verifying integrity and distinguishing between multiple backups that may have been created in the same second.

Only after the file has been successfully uploaded to S3 is it removed from the temporary backup directory. This way a local copy remains available if the S3 upload fails.
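
A minimal sketch of this copy-hash-upload flow, assuming boto3 for the S3 client (the names here are illustrative, not necessarily those in the actual script):

import hashlib
import shutil
from datetime import datetime, timezone
from pathlib import Path

import boto3

def backup_once(source: Path, backup_dir: Path, bucket: str, prefix: str):
    # 1. Copy channels.backup into the local backup directory with a timestamp.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    local_copy = backup_dir / f"channels.backup.{stamp}"
    shutil.copy2(source, local_copy)

    # 2. Append the last six hex digits of the SHA-256 checksum to the S3 key,
    #    to help verify integrity and distinguish backups made in the same second.
    digest = hashlib.sha256(local_copy.read_bytes()).hexdigest()
    key = f"{prefix}channels.backup.{stamp}.{digest[-6:]}"

    # 3. Upload, and only delete the local copy once the upload has succeeded,
    #    so a failed upload still leaves a backup on disk.
    boto3.client("s3").upload_file(str(local_copy), bucket, key)
    local_copy.unlink()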

Monitoring File Changes

Detecting when channels.backup is updated was not straightforward due to the way LND modifies it. Since the file is not updated in place but instead replaced via a move operation, common file-watching tools might not detect these changes properly. I tried a few different libraries before finding one that did what I needed:

  • watchdog: A popular Python library for monitoring file system changes. However, it failed to detect when LND replaced channels.backup via a move operation.
  • inotify: The Linux kernel subsystem for file system event monitoring. The primary Python library wrapping inotify has not been actively maintained, and important fixes available in its master branch have not been published to PyPI. I wanted to avoid vendoring the library's source into my project, so this option was not ideal.
  • asyncinotify: An alternative that provided the necessary functionality while supporting an asynchronous event loop. This approach ultimately met all the requirements for reliable detection of file updates; see the sketch after this list.
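
The core of the detection logic looks roughly like this minimal sketch. It watches the parent directory for MOVED_TO events, which is how a replace-by-move surfaces in inotify; names and paths are illustrative:

import asyncio
from pathlib import Path

from asyncinotify import Inotify, Mask

async def watch_for_updates(target: Path):
    with Inotify() as inotify:
        # Watch the directory, not the file: the move that replaces
        # channels.backup shows up as a MOVED_TO event on the directory.
        inotify.add_watch(target.parent, Mask.MOVED_TO)
        async for event in inotify:
            if event.path == target:
                print(f"{target} was replaced; triggering a backup")

asyncio.run(watch_for_updates(Path("/path/to/channels.backup")))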

Writing the Backup Script

To assist in writing the script, I experimented with Codeium Windsurf to see how well it could generate the necessary code. It correctly generated around 80% of the script, excelling in particular at the scaffolding for reading values from environment variables and at the backup logic, which was nearly flawless.

However, it struggled significantly with the stop logic. No matter how I structured the prompts, it failed to correctly handle termination signals, and the multi-tasking code had to be implemented manually because it did not use asyncio.wait appropriately. Getting a smooth shutdown and proper event handling required manual intervention.
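
For reference, here is a simplified sketch of one way to combine signal handlers with asyncio.wait for a clean shutdown; the actual script in the gist differs in its details:

import asyncio
import signal

async def watch_for_updates():
    ...  # the asyncinotify loop from the previous sketch goes here

async def main():
    stop = asyncio.Event()
    loop = asyncio.get_running_loop()
    # Translate SIGINT/SIGTERM (what systemctl stop sends) into an asyncio event.
    for sig in (signal.SIGINT, signal.SIGTERM):
        loop.add_signal_handler(sig, stop.set)

    watcher = asyncio.create_task(watch_for_updates())
    stopper = asyncio.create_task(stop.wait())

    # Finish as soon as either the watcher exits or a termination signal arrives,
    # then cancel whichever task is still pending for a clean shutdown.
    done, pending = await asyncio.wait({watcher, stopper}, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()

asyncio.run(main())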

The script can be found in this gist. It is too long to embed here.

Running as a systemd Service

To ensure continuous monitoring, I set up the script as a systemd service. This setup assumes that AWS credentials are configured for the root user, but it could be adapted to run as any other user. Here’s how to do it:

1. Install the Script, Create a Python Virtual Environment, and Create the Local Backup Folder

mkdir -p /opt/lnd-backup
cd /opt/lnd-backup
<place the script here as s3-backup-lnd-channels-backup.py>
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install asyncinotify boto3 python-dotenv
mkdir -p /var/lnd-channel-backups

2. Create an Environment File

Create a file at /etc/default/lnd-backup-env with the necessary configuration:

MONITOR_FILE_PATH=/path/to/lnd/data/chain/bitcoin/mainnet/channels.backup
BACKUP_DIRECTORY=/var/lnd-channel-backups
S3_BUCKET_NAME=your-bucket-name
S3_BACKUP_PREFIX=lnd-backups/mynode/channels/
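
Since python-dotenv is among the installed dependencies, the script can read these values the usual way. A sketch of what that configuration loading might look like (variable names match the environment file above; the exact code is in the gist):

import os
from dotenv import load_dotenv

# Picks up a local .env during development; in production, systemd's
# EnvironmentFile directive provides the values directly.
load_dotenv()

MONITOR_FILE_PATH = os.environ["MONITOR_FILE_PATH"]
BACKUP_DIRECTORY = os.environ["BACKUP_DIRECTORY"]
S3_BUCKET_NAME = os.environ["S3_BUCKET_NAME"]
S3_BACKUP_PREFIX = os.environ.get("S3_BACKUP_PREFIX", "")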

3. Create the systemd Service File

Save the following as /etc/systemd/system/lnd-backup.service:

[Unit]
Description=LND Channel Backup Service
After=network.target lnd.service

[Service]
Type=simple
User=root
Group=root
EnvironmentFile=/etc/default/lnd-backup-env
ExecStart=/opt/lnd-backup/.venv/bin/python /opt/lnd-backup/s3-backup-lnd-channels-backup.py
Restart=always

[Install]
WantedBy=multi-user.target

4. Enable and Start the Service

sudo systemctl daemon-reload
sudo systemctl enable lnd-backup.service
sudo systemctl start lnd-backup.service

With this setup, the script continuously monitors the channels.backup file and ensures that it is backed up properly whenever LND updates it.

Wrap-Up

This was a fun exercise in using LLMs to generate a script, and it provided valuable insights into where these tools excel and where they struggle. It demonstrated how well AI can handle boilerplate code and structured logic but also highlighted its limitations when dealing with more complex concurrency and signal handling.

I'll be posting soon about how I set up my hybrid tor/clearnet LND node using Docker and Ansible, so stay tuned for that!
