cyberneticlibrary

Checkpoint long-running batch jobs

checkpoint-resume-long-job · skill · setup · L264
Tibsfox/gsd-skill-creator
What it does

Create checkpoints and resume long-running jobs

Best for

Long-running jobs that should resume from the most recent periodic checkpoint after a failure instead of restarting from scratch.

Inputs
  • job config
  • checkpoint interval
  • resume parameters
Outputs
  • checkpoint state file
  • resume config
  • progress metrics
Requires
  • job scheduler
  • storage
Preconditions

The long-running job exposes a stateful checkpoint interface: it can save its state and later restore from it.
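
The precondition above can be illustrated with a minimal sketch. Everything named here (the `Checkpointable` protocol, `CountingJob`, `save_checkpoint`, `run`) is hypothetical and not taken from the skill itself. It assumes state is a JSON-serializable dict and that the checkpoint interval is counted in steps.

```python
import json
import os
import tempfile
from typing import Protocol


class Checkpointable(Protocol):
    """Hypothetical interface a job would expose to support checkpointing."""

    def get_state(self) -> dict: ...
    def set_state(self, state: dict) -> None: ...
    def step(self) -> None: ...
    def finished(self) -> bool: ...


class CountingJob:
    """Toy job: performs `total` units of work, one per step."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.done = 0

    def get_state(self) -> dict:
        return {"done": self.done}

    def set_state(self, state: dict) -> None:
        self.done = state["done"]

    def step(self) -> None:
        self.done += 1

    def finished(self) -> bool:
        return self.done >= self.total


def save_checkpoint(path: str, state: dict) -> None:
    # Write to a temp file in the same directory, then atomically replace,
    # so a crash mid-write never leaves a half-written checkpoint at `path`.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def run(job: Checkpointable, path: str, interval: int) -> int:
    """Resume from `path` if it exists, then checkpoint every `interval` steps.

    Returns the number of steps executed during this run.
    """
    if os.path.exists(path):
        with open(path) as f:
            job.set_state(json.load(f))
    steps = 0
    while not job.finished():
        job.step()
        steps += 1
        if steps % interval == 0 or job.finished():
            save_checkpoint(path, job.get_state())
    return steps
```

With an interval of 3, a job that fails during its eighth step would resume from the checkpoint written after step 6, so at most `interval - 1` completed steps are repeated.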

Failure modes
  • stale checkpoint (job state has drifted since the checkpoint was written)
  • incompatible resume config
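
One way to catch both failure modes before resuming is to store metadata next to the state and check it on load. The sketch below is an assumption, not the skill's actual mechanism. `SCHEMA_VERSION`, `config_fingerprint`, `make_checkpoint`, and `check_resumable` are hypothetical names, and "stale" is interpreted here as "older than a caller-chosen maximum age".

```python
import hashlib
import json
import time
from typing import List, Optional

SCHEMA_VERSION = 1  # hypothetical version of the checkpoint layout


def config_fingerprint(config: dict) -> str:
    # Canonical JSON (sorted keys) so equal configs always hash the same.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_checkpoint(config: dict, state: dict, now: Optional[float] = None) -> dict:
    return {
        "schema_version": SCHEMA_VERSION,
        "config_fingerprint": config_fingerprint(config),
        "created_at": time.time() if now is None else now,
        "state": state,
    }


def check_resumable(
    checkpoint: dict,
    config: dict,
    max_age_seconds: float,
    now: Optional[float] = None,
) -> List[str]:
    """Return a list of problems; an empty list means resuming looks safe."""
    current = time.time() if now is None else now
    problems = []
    if checkpoint.get("schema_version") != SCHEMA_VERSION:
        problems.append("incompatible checkpoint schema version")
    if checkpoint.get("config_fingerprint") != config_fingerprint(config):
        problems.append("resume config differs from the checkpointed config")
    if current - checkpoint.get("created_at", 0.0) > max_age_seconds:
        problems.append("checkpoint is older than the allowed maximum age")
    return problems
```

Returning a list of problems rather than raising on the first one lets the caller report every mismatch at once and decide whether to restart from scratch.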
Trust signals
  • checkpoint interval configurable
  • state serialization validated
  • resume path explicit
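
As a rough illustration of what "state serialization validated" could mean in practice, the sketch below round-trips the state before accepting it and wraps it in an envelope with a SHA-256 checksum. The function names and the envelope layout are assumptions made for this example. JSON is used only because it is in the standard library.

```python
import hashlib
import json


def serialize_validated(state: dict) -> bytes:
    """Serialize `state`, refusing values that do not survive a round trip."""
    payload = json.dumps(state, sort_keys=True)
    # Tuples become lists and non-string keys become strings in JSON,
    # so compare the decoded payload against the original state.
    if json.loads(payload) != state:
        raise ValueError("state does not survive a JSON round trip")
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return json.dumps({"sha256": digest, "payload": payload}).encode("utf-8")


def deserialize_validated(blob: bytes) -> dict:
    """Load a checkpoint envelope, rejecting it if the checksum does not match."""
    envelope = json.loads(blob.decode("utf-8"))
    payload = envelope["payload"]
    if hashlib.sha256(payload.encode("utf-8")).hexdigest() != envelope["sha256"]:
        raise ValueError("checkpoint checksum mismatch")
    return json.loads(payload)
```

The checksum detects truncated or corrupted files at load time, while the round-trip check rejects state that would silently change shape when restored.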