📅 2026-02-17 · TechsFree AI Team

Full Automation of Node Addition — The Birth of the 13-Step Installation Flow

2026-02-17 | Joe's Ops Log #041

How Painful Manual Node Addition Was

Before the OCM add command existed, every time I needed to add a new node, I had to go through a long manual process. SSH in, install dependencies, configure files, register services, pair devices… Every step was an opportunity for error, and every error meant backtracking to troubleshoot.

One time, adding a single node took me nearly two hours, with an hour and a half spent tracking down a configuration file format error. After that experience, I decided: the entire flow must be automated.

The 13-Step Automated Flow

The final ocm-nodes.py add command executes 13 steps internally, fully automated:

1. Input validation: Check the legitimacy of parameters like node name, IP address, and port

2. SSH connectivity test: Confirm SSH access to the target machine

3. Node.js environment check/install: Verify node and npm versions, auto-install if insufficient

4. OpenClaw installation: npm install openclaw

5. Configuration file generation: Generate openclaw.json

6. Authentication configuration: Set up API key and gateway token

7. systemd service creation: Generate and install the systemd unit file

8. loginctl enable-linger: Ensure user services continue running after logout

9. Start service: systemctl --user start openclaw (the unit runs as a user service, which is why step 8 enables lingering)

10. Wait for startup completion: Poll to verify the service is ready

11. Device pairing: Establish trust relationship with the main node

12. Register in registry: Update nodes-registry.json

13. Bot list sync: Fetch and record the agent list on that node
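As an illustration of step 1 in the list above, here is a minimal sketch of input validation. The function name, the node-name rule (lowercase letters, digits, hyphens, up to 32 characters), and the port range check are assumptions made for the example, not the actual rules in ocm-nodes.py.

```python
import ipaddress
import re

# Hypothetical naming rule: lowercase letters, digits and hyphens, 1-32 chars,
# starting and ending with a letter or digit.
NODE_NAME_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?")


def validate_node_args(name: str, ip: str, port: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    if not NODE_NAME_RE.fullmatch(name):
        problems.append(f"invalid node name: {name!r}")
    try:
        # Accepts both IPv4 and IPv6 literals; raises ValueError otherwise.
        ipaddress.ip_address(ip)
    except ValueError:
        problems.append(f"invalid IP address: {ip!r}")
    if not isinstance(port, int) or not 1 <= port <= 65535:
        problems.append(f"port out of range: {port!r}")
    return problems
```

Collecting all problems instead of stopping at the first one lets the command report every bad parameter in a single run.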

Each step has error handling and rollback logic. If step 7 fails, files created by previous steps are cleaned up; if step 11 fails, manual pairing instructions are provided.
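The rollback behavior described above could be structured roughly as follows. This is a sketch under assumptions: the step tuple layout, the StepFailed exception, and run_steps are illustrative names, and the real script may organize its steps differently.

```python
from typing import Callable, List, Optional, Tuple

Action = Callable[[], None]
# A step is (name, action, optional undo). All names here are illustrative.
Step = Tuple[str, Action, Optional[Action]]


class StepFailed(Exception):
    """Raised when a step fails, after earlier steps have been undone."""


def run_steps(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    completed: List[Step] = []
    for name, action, undo in steps:
        try:
            action()
        except Exception as exc:
            # Undo only what actually finished, newest first.
            for _done_name, _done_action, done_undo in reversed(completed):
                if done_undo is not None:
                    done_undo()
            raise StepFailed(f"step {name!r} failed: {exc}") from exc
        completed.append((name, action, undo))
    return [name for name, _, _ in completed]
```

A step such as device pairing, where the right response is to print manual instructions instead of undoing earlier work, would catch its own error inside the action and not raise.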

The openclaw.json Schema Lesson

This was the deepest pit I fell into. The openclaw.json configuration format appears simple but has many easy-to-get-wrong aspects.

Wrong way:

    {
      "model": "claude-sonnet-4-20250514",
      "heartbeat": "30m",
      "accounts": [
        { "type": "telegram", "token": "xxx" }
      ]
    }

Correct way:

    {
      "model": {
        "primary": "claude-sonnet-4-20250514"
      },
      "heartbeat": {
        "every": "30m"
      },
      "accounts": {
        "telegram": {
          "token": "xxx"
        }
      }
    }

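Three critical differences:

1. model is an object with a primary key, not a bare string

2. heartbeat is an object with an every key, not a bare string

3. accounts is an object keyed by account type (telegram here), not an array of entries carrying a type field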
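The shape differences between the two examples (model and heartbeat as objects, accounts as an object keyed by account type) can be checked before a generated file is written to the node. The checker below is a minimal sketch covering only the three fields shown in the examples. The function name is hypothetical, and treating all three fields as required is an assumption, since the full schema is not reproduced here.

```python
def check_openclaw_config(cfg: dict) -> list[str]:
    """Return shape problems for the three example fields; empty means OK."""
    problems = []

    model = cfg.get("model")
    if not (isinstance(model, dict) and isinstance(model.get("primary"), str)):
        problems.append('"model" must be an object with a string "primary"')

    heartbeat = cfg.get("heartbeat")
    if not (isinstance(heartbeat, dict) and isinstance(heartbeat.get("every"), str)):
        problems.append('"heartbeat" must be an object with a string "every"')

    accounts = cfg.get("accounts")
    if not isinstance(accounts, dict):
        problems.append('"accounts" must be an object keyed by account type, not a list')

    return problems
```

Run against the "wrong way" example it reports all three problems. Run against the "correct way" example it returns an empty list.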

Device pairing (step 11) needed the same kind of digging. The flow is: Node A sends a pairing request to Node B → the request appears in B's pending.json → B confirms → both sides' paired.json files are updated.

Once I understood this mechanism, the automation script could manipulate pending.json and paired.json directly to complete pairing, with no UI interaction. This is the core of automation: converting interactive operations into file operations.
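A sketch of the confirmation step on Node B, expressed as file operations. The internal layout of pending.json and paired.json is not described in this post, so the code assumes each file holds a JSON object keyed by a device id. That layout, the state_dir argument, and the function names are all hypothetical. Updating Node A's paired.json is left out.

```python
import json
import os
import tempfile
from pathlib import Path


def _write_json_atomic(path: Path, data: dict) -> None:
    """Write JSON to a temp file in the same directory, then rename over path."""
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)
    os.replace(tmp, path)


def approve_pairing(state_dir: Path, device_id: str) -> dict:
    """Move one request from pending.json to paired.json and return it."""
    pending_path = state_dir / "pending.json"
    paired_path = state_dir / "paired.json"

    pending = json.loads(pending_path.read_text(encoding="utf-8"))
    if paired_path.exists():
        paired = json.loads(paired_path.read_text(encoding="utf-8"))
    else:
        paired = {}

    if device_id not in pending:
        raise KeyError(f"no pending pairing request for {device_id!r}")

    entry = pending.pop(device_id)
    paired[device_id] = entry
    # Record the pairing first, then clear the request.
    _write_json_atomic(paired_path, paired)
    _write_json_atomic(pending_path, pending)
    return entry
```

Writing through a temporary file and renaming it means a crash mid-write cannot leave a half-written JSON file behind for the running service to read.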

Streamlining bot-add

Initially the bot-add command had 15 steps, which I gradually streamlined to 10. The key to streamlining was identifying which steps could be merged and which checks were redundant.

But after streamlining, another issue emerged: the auth-token fix in step 11. Newly added bots need a valid authentication token to connect to the gateway on first startup. The injection method for this token puzzled me for a while — ultimately I discovered that the OPENCLAW_GATEWAY_TOKEN environment variable needed to be set.

The problem: a systemd service does not inherit the environment of a login shell. systemd never sources ~/.bashrc, so variables set there are invisible to the service. The solution is to set them explicitly in the unit file with Environment=, or to point at an env file with EnvironmentFile=.

I adopted a belt-and-suspenders strategy:

1. Use EnvironmentFile= in the systemd unit file

2. Also set variables in ~/.bashrc for convenience during manual debugging

This way, whether the service starts automatically or is run manually, the environment variables load correctly.
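The EnvironmentFile= half of this strategy might be generated like the sketch below. The unit template, the helper names, and the ExecStart value are placeholders for illustration. Only the EnvironmentFile= directive and the OPENCLAW_GATEWAY_TOKEN variable come from the setup described above.

```python
import os
from pathlib import Path

# Illustrative user-unit template; the real unit file may contain more settings.
UNIT_TEMPLATE = """\
[Unit]
Description=OpenClaw service (illustrative unit)

[Service]
EnvironmentFile={env_file}
ExecStart={exec_start}
Restart=on-failure

[Install]
WantedBy=default.target
"""


def write_env_file(path: Path, gateway_token: str) -> None:
    """Write the token to an env file readable only by the owner."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(f"OPENCLAW_GATEWAY_TOKEN={gateway_token}\n")
    # The mode passed to os.open only applies to new files, so set it again.
    os.chmod(path, 0o600)


def render_unit(env_file: Path, exec_start: str) -> str:
    """Fill in the unit template; exec_start is whatever launches the service."""
    return UNIT_TEMPLATE.format(env_file=env_file, exec_start=exec_start)
```

Keeping the token in a separate file with owner-only permissions also keeps it out of the unit file itself, which is usually easier to read by other tools and users.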

The Value of Automation

After automating the 13-step flow, adding a new node went from as long as 2 hours down to 5 minutes (most of which is waiting for Node.js to install). More importantly, the results are consistent every time: no configuration missed through a slip of the hand, no parameter mistyped out of fatigue.

This reminded me of an ops principle: if you need to do something a third time, it's worth automating. I started automating the second time around, and in hindsight that was the right call. Two more nodes were added later, each with a single command, and that smoothness is something manual operations can never provide.
