# Troubleshooting

## Missing Jeballto token

Symptom:
- `prepare` fails immediately with a token-related error
Checks:
- confirm `JEBALLTO_TOKEN` is exported, or
- confirm `JEBALLTO_TOKEN_FILE` points at a readable Jeballto config file (a quick check script follows below)
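A minimal shell sketch of both checks; it only inspects the two variables documented above and is not Jeballto-specific:

```bash
# Either the token variable is exported...
if [ -n "${JEBALLTO_TOKEN:-}" ]; then
  echo "JEBALLTO_TOKEN is exported"
# ...or the token file variable points at a readable file.
elif [ -n "${JEBALLTO_TOKEN_FILE:-}" ] && [ -r "${JEBALLTO_TOKEN_FILE}" ]; then
  echo "JEBALLTO_TOKEN_FILE is readable: ${JEBALLTO_TOKEN_FILE}"
else
  echo "no usable Jeballto token configuration found" >&2
  exit 1
fi
```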
## Prepare times out while waiting for capacity

Symptom:
- the runner stays in `prepare` for a long time and then fails
- more than two jobs from the same host are picked up, with extra jobs waiting for VM capacity
Likely cause:
- the host already has two VM slots occupied by `RUNNING` or `PAUSED` VMs, and no slot freed before `JEBALLTO_PREPARE_TIMEOUT`
- GitLab Runner accepted more jobs than the host can run
What to do:
- set top-level `concurrent = 2` in the active GitLab Runner `config.toml` (see the sketch after this list)
- set `limit = 2` directly under the Jeballto `[[runners]]` entry
- set `request_concurrency = 2` directly under the Jeballto `[[runners]]` entry
- confirm there is only one active runner process for the host, or that all runner entries on that host add up to two slots
- shorten job runtime
- add another host
- confirm old VMs are actually being deleted at the end of jobs
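A `config.toml` sketch showing where the three settings live. The `name`, `url`, and `token` values are placeholders, and `executor = "custom"` is an assumption here, since the prepare/run/cleanup stages described in this section match GitLab's custom executor:

```toml
# Global cap: this runner process never runs more than two jobs at once.
concurrent = 2

[[runners]]
  name = "jeballto-host-1"               # placeholder
  url = "https://gitlab.example.com"     # placeholder
  token = "REDACTED"
  executor = "custom"                    # assumed; see note above
  limit = 2                              # cap jobs for this runner entry
  request_concurrency = 2                # cap concurrent job requests to GitLab
```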
## SSH never becomes ready

Symptom:
- the VM reaches `RUNNING`, but `prepare` later fails on SSH readiness
Checks:
- verify the VM image enables SSH access for the configured user
- verify `JEBALLTO_SSH_USER` and `JEBALLTO_VM_PASSWORD` match the image
- verify the host can reach the forwarded SSH endpoint (a manual probe follows below)
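A manual probe from the host, assuming you know the forwarded endpoint (`HOST` and `PORT` are placeholders) and have `sshpass` installed for password authentication:

```bash
# Placeholders: set these to the forwarded SSH endpoint of the VM.
HOST=127.0.0.1
PORT=2222

# Is the TCP port reachable at all?
nc -zv "$HOST" "$PORT"

# Does password auth work for the configured user? Runs a no-op command,
# similar in spirit to an SSH readiness probe.
sshpass -p "$JEBALLTO_VM_PASSWORD" \
  ssh -o StrictHostKeyChecking=no -p "$PORT" "$JEBALLTO_SSH_USER@$HOST" true
```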
## Toolchain probe fails

Symptom:
- SSH works, but `prepare` still fails
Likely cause:
- one of the required tools is missing in the image
Required tools (the snippet below checks each one over SSH):
- `bash`
- `git`
- `gitlab-runner`
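Reusing the `HOST`/`PORT` placeholders from the SSH probe above, this reports any required tool missing from the image:

```bash
# Probe each required tool inside the VM and report the missing ones.
ssh -p "$PORT" "$JEBALLTO_SSH_USER@$HOST" '
  for tool in bash git gitlab-runner; do
    command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
  done
'
```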
## Cleanup leaves state files behind

Symptom:
- new jobs for the same runner and job context behave strangely
Checks:
- inspect `JEBALLTO_STATE_ROOT`
- look for stale JSON or lock files (the snippet below covers this and the write check)
- confirm the runner process can write to and remove files under that directory
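A short inspection pass, assuming state files sit under `JEBALLTO_STATE_ROOT` (the exact layout beneath it is not specified here):

```bash
# List anything that looks like leftover state or lock files.
find "$JEBALLTO_STATE_ROOT" -name '*.json' -o -name '*.lock'

# Confirm the runner user can create and remove files under the state root.
probe="$JEBALLTO_STATE_ROOT/.write-probe.$$"
touch "$probe" && rm "$probe" && echo "state root is writable"
```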
## Image pull contention looks stuck

Symptom:
- several jobs appear to pause on the same image
Explanation:
- one job is holding the image pull lock while it pulls the image
- the other jobs are waiting intentionally
This is usually healthy behavior unless the pull itself is stalled.
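For intuition, a minimal sketch of the serialization pattern described above, using a `flock`-style per-image lock file; Jeballto's actual lock location and naming may differ, and `image_already_present`/`pull_image` are hypothetical stand-ins:

```bash
image="debian-12"                            # placeholder image name
lock="/tmp/jeballto-pull-${image}.lock"      # assumed lock path

(
  flock 9                                    # blocks while another job holds the lock
  if ! image_already_present "$image"; then  # hypothetical helper
    pull_image "$image"                      # hypothetical helper
  fi
) 9>"$lock"
```

The first job to reach the lock pulls the image; later jobs block on `flock`, then find the image already present and skip the pull.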
## Job failed, but GitLab shows a system failure

That usually means the job script did not simply exit non-zero. Instead, the failure happened in transport, SSH execution, API access, timeout handling, or cleanup orchestration.
Check the executor logs around the `run` stage to see whether the script actually started.
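One way to do that, assuming the runner runs as a systemd service named `gitlab-runner` (adjust the unit name and time window to your deployment):

```bash
# Pull recent runner logs and look at what happened around the run stage.
journalctl -u gitlab-runner --since "30 min ago" | grep -i jeballto | tail -n 40
```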