Concepts¶
Ephemeral by default¶
Every VM is created with ephemeral: true. That means:
- jobs do not reuse VM state
- leftover VMs are less likely after crashes
- cleanup can be aggressive without worrying about persistence
This is the core isolation guarantee of the executor.
Deterministic VM naming¶
VM names are derived from runner ID and job ID. This enables crash recovery:
- if the executor creates a VM but crashes before saving state
- the next attempt can look up the same VM by deterministic name
- that avoids leaking a hidden orphan for the same job
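The recovery path above can be sketched as follows. This is a minimal illustration, not the executor's actual code: the `job-` prefix, the sanitizing rule, and the `lookup`/`create` callables are all hypothetical stand-ins for whatever the real naming scheme and Agent API calls are.

```python
import re


def vm_name(runner_id, job_id, prefix="job"):
    """Return the same VM name for the same (runner, job) pair every time.

    The "job" prefix and the name layout are assumptions for illustration.
    """
    def clean(part):
        # Collapse anything outside [A-Za-z0-9] into single dashes.
        return re.sub(r"[^A-Za-z0-9]+", "-", str(part)).strip("-").lower()

    return f"{prefix}-{clean(runner_id)}-{clean(job_id)}"


def ensure_vm(name, lookup, create):
    """Reuse a VM left behind by a crashed attempt, otherwise create one.

    `lookup(name)` returns a VM or None; `create(name)` returns a new VM.
    Both are hypothetical callables supplied by the caller.
    """
    existing = lookup(name)
    return existing if existing is not None else create(name)
```

Because the name depends only on the runner ID and job ID, a retry after a crash computes the same name and finds the earlier VM through `lookup` instead of creating a second one.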
File-backed state¶
The executor stores per-job JSON state, along with per-job and per-image lock files, under JEBALLTO_STATE_ROOT:
{runner}-{job}.json
{runner}-{job}.lock
image-{reference}.lock
Why it matters:
- GitLab Runner can invoke separate hook processes
- process-local memory is not enough to coordinate them
- file locks serialize GitLab Runner hook processes for the same job and deduplicate concurrent image pulls
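A minimal sketch of how separate hook processes could coordinate through these files is shown below. It assumes POSIX advisory locks via `fcntl.flock` and an atomic rename for writes. The source does not say which locking primitive the executor uses, and the `update_state` helper and its field names are hypothetical.

```python
import fcntl
import json
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def file_lock(lock_path):
    """Hold an exclusive advisory lock for the duration of the block."""
    with open(lock_path, "w") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks while another process holds it
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def update_state(root, runner, job, **fields):
    """Merge `fields` into {runner}-{job}.json under {runner}-{job}.lock."""
    root = Path(root)
    json_path = root / f"{runner}-{job}.json"
    lock_path = root / f"{runner}-{job}.lock"
    with file_lock(lock_path):
        state = json.loads(json_path.read_text()) if json_path.exists() else {}
        state.update(fields)
        # Write to a temporary file, then rename, so readers never see a
        # half-written JSON document.
        tmp_path = json_path.with_name(json_path.name + ".tmp")
        tmp_path.write_text(json.dumps(state))
        os.replace(tmp_path, json_path)
    return state
```

The same `file_lock` helper could guard an `image-{reference}.lock` file so that only one process pulls a given image while the others wait and then find it already present.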
Capacity waiting¶
The Jeballto Agent currently supports up to two concurrent VM slots on one host. RUNNING and PAUSED VMs both consume those slots. The executor does not treat a full host as an immediate failure. Instead it:
- retries regular transient API errors with exponential backoff
- treats VM_LIMIT_REACHED specially
- keeps polling for capacity until the prepare timeout is exhausted
That behavior is why a busy but healthy host does not instantly fail every queued job.
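The two retry paths can be sketched roughly as below. This is an illustration under stated assumptions, not the executor's implementation: the `ApiError` class, the `transient` flag, the retry cap, and all delay values are hypothetical. Only the `VM_LIMIT_REACHED` code and the prepare timeout come from the text above.

```python
import time


class ApiError(Exception):
    """Hypothetical error type carrying an API error code."""

    def __init__(self, code, transient=False):
        super().__init__(code)
        self.code = code
        self.transient = transient


def create_with_capacity_wait(create_vm, prepare_timeout, poll_interval=5.0,
                              base_delay=1.0, max_delay=30.0, max_retries=5,
                              sleep=time.sleep, clock=time.monotonic):
    """Call create_vm() until it succeeds or the prepare timeout runs out."""
    deadline = clock() + prepare_timeout
    retries = 0
    while True:
        try:
            return create_vm()
        except ApiError as err:
            if err.code == "VM_LIMIT_REACHED":
                # Host is full but healthy: poll at a steady interval and
                # do not count this against the transient retry budget.
                delay = poll_interval
            elif err.transient and retries < max_retries:
                # Ordinary transient error: exponential backoff, capped.
                delay = min(max_delay, base_delay * 2 ** retries)
                retries += 1
            else:
                raise
            if clock() + delay > deadline:
                raise TimeoutError("prepare timeout exhausted") from err
            sleep(delay)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real waiting.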
Failure classes¶
The executor separates:
- build failures: the script ran and returned a non-zero exit code
- system failures: networking, timeouts, API errors, or host-level issues
That distinction matters because GitLab Runner may retry system failures.
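GitLab Runner's custom executor passes the `BUILD_FAILURE_EXIT_CODE` and `SYSTEM_FAILURE_EXIT_CODE` environment variables to each hook so the hook can report which class of failure occurred. The sketch below shows one way to map the two classes onto those codes. The `classify` and `exit_code_for` helpers and the fallback values are hypothetical and only illustrate the idea.

```python
import os


def classify(script_exit_status, error=None):
    """Return "system", "build", or None (success).

    `error` is any exception raised before or around the script run,
    such as a network, timeout, API, or host-level problem.
    """
    if error is not None:
        return "system"
    if script_exit_status != 0:
        return "build"
    return None


def exit_code_for(failure, env=os.environ):
    """Map a failure class to the exit code the hook should return."""
    if failure is None:
        return 0
    if failure == "build":
        # Fallback of 1 is an assumption for running outside GitLab Runner.
        return int(env.get("BUILD_FAILURE_EXIT_CODE", "1"))
    if failure == "system":
        # Fallback of 2 is an assumption for running outside GitLab Runner.
        return int(env.get("SYSTEM_FAILURE_EXIT_CODE", "2"))
    raise ValueError(f"unknown failure class: {failure!r}")
```

A hook would typically finish with something like `sys.exit(exit_code_for(classify(status, error)))`, so that a failing script is reported as a build failure and infrastructure problems are reported as system failures.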