Troubleshooting

Missing Jeballto token

Symptom:

  • prepare fails immediately with a token-related error

Checks:

  1. confirm JEBALLTO_TOKEN is exported, or
  2. confirm JEBALLTO_TOKEN_FILE points at a readable Jeballto config file
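
A quick shell check for both conditions, run as the user that launches the runner (the variable names are the ones from the checks above):

    # At least one of these should print a confirmation; printenv only sees exported variables.
    printenv JEBALLTO_TOKEN >/dev/null && echo "JEBALLTO_TOKEN is exported"
    [ -r "${JEBALLTO_TOKEN_FILE:-}" ] && echo "JEBALLTO_TOKEN_FILE is readable: $JEBALLTO_TOKEN_FILE"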

Prepare times out while waiting for capacity

Symptom:

  • the runner stays in prepare for a long time and then fails
  • more than two jobs from the same host show as picked up, with the extra jobs waiting for VM capacity

Likely cause:

  • the host already has two VM slots occupied by RUNNING or PAUSED VMs, and no slot was freed before JEBALLTO_PREPARE_TIMEOUT expired
  • GitLab Runner accepted more jobs than the host can run

What to do:

  • set top-level concurrent = 2 in the active GitLab Runner config.toml (all three settings appear in the sketch after this list)
  • set limit = 2 directly under the Jeballto [[runners]] entry
  • set request_concurrency = 2 directly under the Jeballto [[runners]] entry
  • confirm there is only one active runner process for the host, or that all runner entries on that host add up to at most two slots
  • shorten job runtime
  • add another host
  • confirm old VMs are actually being deleted at the end of jobs
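
A minimal config.toml sketch combining the three settings above; the runner name and path here are hypothetical, only concurrent, limit, and request_concurrency are the point:

    # /etc/gitlab-runner/config.toml (typical path for a system install)
    concurrent = 2            # top level: total jobs across all runner entries

    [[runners]]
      name = "jeballto-host"  # hypothetical entry name
      executor = "custom"
      limit = 2               # max concurrent jobs for this entry
      request_concurrency = 2 # concurrent job requests to GitLab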

SSH never becomes ready

Symptom:

  • VM reaches RUNNING but prepare later fails on SSH readiness

Checks:

  1. verify the VM image enables SSH access for the configured user
  2. verify JEBALLTO_SSH_USER and JEBALLTO_VM_PASSWORD match the image
  3. verify the host can reach the forwarded SSH endpoint
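
A minimal readiness probe, assuming the image uses password auth and sshpass is available on the host; the forwarded endpoint 127.0.0.1:2222 is a placeholder, substitute the real one:

    # Exits 0 once the VM accepts an SSH login for the configured user.
    sshpass -p "$JEBALLTO_VM_PASSWORD" \
      ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no \
          -p 2222 "$JEBALLTO_SSH_USER@127.0.0.1" true \
      && echo "SSH ready"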

Toolchain probe fails

Symptom:

  • SSH works, but prepare still fails

Likely cause:

  • one of the required tools is missing in the image (see the probe sketch after the list below)

Required tools:

  • bash
  • git
  • gitlab-runner
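
Once SSH works, a short loop run inside the VM (or over the same SSH channel) reports which of the three tools is missing:

    # Prints nothing when the toolchain is complete.
    for tool in bash git gitlab-runner; do
      command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
    done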

Cleanup leaves state files behind

Symptom:

  • new jobs for the same runner and job context behave strangely

Checks:

  1. inspect JEBALLTO_STATE_ROOT
  2. look for stale JSON or lock files
  3. confirm the runner process can write to and remove files under that directory
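
A sketch of those checks in shell; the runner account name (gitlab-runner here) is an assumption, substitute the user that actually runs the executor:

    # Checks 1-2: look for leftovers under the state root.
    find "$JEBALLTO_STATE_ROOT" -name '*.json' -o -name '*.lock'
    # Check 3: prove the runner user can create and remove files there.
    sudo -u gitlab-runner sh -c \
      "touch '$JEBALLTO_STATE_ROOT/.write-test' && rm '$JEBALLTO_STATE_ROOT/.write-test'"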

Image pull contention looks stuck

Symptom:

  • several jobs appear to pause on the same image

Explanation:

  • one job is holding the image pull lock while it pulls the image
  • the other jobs are waiting intentionally

This is usually healthy behavior unless the pull itself is stalled.
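
The waiting pattern is what a plain advisory file lock produces. A purely illustrative shell sketch of the same idea (flock and the helper name are not necessarily what Jeballto uses):

    # One job pulls; the others block on the lock, then find the image already cached.
    (
      flock 9                          # blocks until the current holder releases fd 9
      pull_image_if_missing "$IMAGE"   # hypothetical helper; no-op when already pulled
    ) 9>"/var/lock/image-pull.lock"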

Job failed, but GitLab shows a system failure

This usually means the job script did not simply exit non-zero; instead, the failure happened in transport, SSH execution, API access, timeout handling, or cleanup orchestration.

Check the executor logs around the run stage to see whether the script actually started.
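
If the runner is managed by systemd (an assumption; adjust for your install), the journal shows how far the stages got:

    # Look for the run stage starting after a successful prepare.
    journalctl -u gitlab-runner --since "10 minutes ago" --no-pager | grep -iE 'prepare|run|cleanup'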