Concepts

Ephemeral by default

Every VM is created with ephemeral: true. That means:

jobs do not reuse VM state
leftover VMs are less likely after crashes
cleanup can be aggressive without worrying about persistence

This is the core isolation guarantee of the executor.

Deterministic VM naming

VM names are derived from the runner ID and job ID, so every attempt at the same job computes the same name. This enables crash recovery:

if the executor creates a VM but crashes before saving state, the VM still exists on the host
the next attempt can look up that VM by its deterministic name
this avoids leaking a hidden orphan VM for the same job
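
As a minimal sketch of the idea, a name can be built as a pure function of the two IDs. The function name vm_name, the "jeballto" prefix, and the character-sanitizing rule are assumptions made for illustration; the executor's actual naming format is not documented here.

```python
import re


def vm_name(runner_id, job_id, prefix="jeballto"):
    """Build a VM name that depends only on the runner ID and job ID.

    The prefix and the sanitizing rule are hypothetical, not the
    executor's real format.
    """

    def safe(value):
        # Replace anything other than letters, digits, and "-" with "-".
        return re.sub(r"[^A-Za-z0-9-]", "-", str(value))

    return f"{prefix}-{safe(runner_id)}-{safe(job_id)}"


# Two independent attempts for the same job compute the same name,
# so a retry can look up a VM that a crashed attempt left behind.
first_attempt = vm_name("runner42", "1001")
retry_attempt = vm_name("runner42", "1001")
assert first_attempt == retry_attempt
```

Because the name carries no timestamp or random component, no saved state is needed to recompute it.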

File-backed state

The executor stores per-job JSON state, together with per-job and per-image lock files, under JEBALLTO_STATE_ROOT:

{runner}-{job}.json
{runner}-{job}.lock
image-{reference}.lock

Why it matters:

GitLab Runner can invoke separate hook processes
process-local memory is not enough to coordinate them
file locks serialize the hook processes for a job and deduplicate concurrent pulls of the same image
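
The pattern can be sketched with POSIX advisory locks. This is an illustration under assumptions, not the executor's implementation: the helper names job_lock, save_state, and load_state are hypothetical, and only the {runner}-{job}.json and {runner}-{job}.lock file names come from the layout above.

```python
import fcntl
import json
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Optional


@contextmanager
def job_lock(state_root: Path, runner: str, job: str):
    """Hold an exclusive advisory lock on {runner}-{job}.lock (POSIX only)."""
    state_root.mkdir(parents=True, exist_ok=True)
    lock_path = state_root / f"{runner}-{job}.lock"
    with open(lock_path, "w") as handle:
        # Blocks until every other process holding the lock releases it.
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def save_state(state_root: Path, runner: str, job: str, state: dict) -> None:
    """Write {runner}-{job}.json while holding the job lock."""
    with job_lock(state_root, runner, job):
        target = state_root / f"{runner}-{job}.json"
        tmp = state_root / f"{runner}-{job}.json.tmp"
        tmp.write_text(json.dumps(state))
        # Rename over the target so readers never see a half-written file.
        os.replace(tmp, target)


def load_state(state_root: Path, runner: str, job: str) -> Optional[dict]:
    """Read the job state, or return None if no state was saved."""
    with job_lock(state_root, runner, job):
        target = state_root / f"{runner}-{job}.json"
        return json.loads(target.read_text()) if target.exists() else None
```

A later hook process for the same job could then call load_state(Path(os.environ["JEBALLTO_STATE_ROOT"]), runner, job) to pick up what an earlier hook process saved.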

Capacity waiting

The Jeballto Agent currently supports up to two concurrent VM slots per host. RUNNING and PAUSED VMs both consume those slots. The executor does not treat a full host as an immediate failure. Instead it:

retries regular transient API errors with exponential backoff
treats VM_LIMIT_REACHED as a signal to wait for capacity rather than as an ordinary error
keeps polling for capacity until the prepare timeout is exhausted
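
The loop below is a rough sketch of that behavior. Everything except the VM_LIMIT_REACHED code is an assumption for illustration: the ApiError class, its transient flag, the create_vm callable, the 5-second poll interval, the retry limit, and the 30-second backoff cap are all hypothetical.

```python
import time


class ApiError(Exception):
    """Hypothetical error type carrying an API error code."""

    def __init__(self, code, transient=False):
        super().__init__(code)
        self.code = code
        self.transient = transient


def create_vm_with_wait(create_vm, prepare_timeout, poll_interval=5.0,
                        max_retries=5, sleep=time.sleep, clock=time.monotonic):
    """Call create_vm() until it succeeds or the prepare timeout runs out."""
    deadline = clock() + prepare_timeout
    retries = 0
    while True:
        try:
            return create_vm()
        except ApiError as err:
            if err.code == "VM_LIMIT_REACHED":
                # Host is full but healthy: poll at a fixed interval and
                # do not count this against the transient-retry budget.
                delay = poll_interval
            elif err.transient and retries < max_retries:
                # Ordinary transient error: exponential backoff, capped.
                delay = min(2 ** retries, 30)
                retries += 1
            else:
                raise
            if clock() + delay > deadline:
                raise TimeoutError("prepare timeout exhausted") from err
            sleep(delay)
```

Keeping the capacity wait separate from the transient-retry counter means a long queue on a busy host cannot use up the retries reserved for real API errors.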

That behavior is why a busy but healthy host does not instantly fail every queued job.

Failure classes

The executor separates:

build failures: the script ran and returned a non-zero exit code
system failures: networking, timeouts, API errors, or host-level issues

That distinction matters because GitLab Runner may retry system failures.
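
GitLab Runner's custom executor passes the exit codes it expects for each class through the BUILD_FAILURE_EXIT_CODE and SYSTEM_FAILURE_EXIT_CODE environment variables. A minimal sketch of mapping the two classes onto those codes follows; the exception classes, the exit_code_for helper, and the fallback values are hypothetical and only illustrate the split.

```python
import os


class BuildFailure(Exception):
    """The job script ran and returned a non-zero exit code."""


class SystemFailure(Exception):
    """Networking, timeout, API, or host-level problem."""


def exit_code_for(error, env=os.environ):
    """Pick the exit code GitLab Runner expects for this failure class."""
    # The fallback values are placeholders for runs outside GitLab Runner.
    if isinstance(error, BuildFailure):
        return int(env.get("BUILD_FAILURE_EXIT_CODE", "1"))
    # Anything else, including unexpected errors, is reported as a
    # system failure so that the job stays eligible for a retry.
    return int(env.get("SYSTEM_FAILURE_EXIT_CODE", "2"))
```

Reporting unknown errors as system failures is a design choice in this sketch: it keeps an executor bug from being presented to the user as a failing build script.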