cyberneticlibrary

Track ML experiments with Hugging Face Trackio

hugging-face-trackio · skill · setup · L20
Sheshiyer/skill-clusters
What it does

Monitors live training jobs through real-time metrics (loss, throughput, hardware utilization).

Best for

Long-running training jobs where you need real-time visibility into convergence, hardware utilization, and early-stopping signals.

Inputs
  • job ID
  • Trackio API token
Outputs
  • live dashboard
  • training curves
  • hardware utilization
  • alerts on anomalies
Requires
  • Trackio
  • Hugging Face Jobs API
Preconditions

The job has been submitted to HF Jobs, and Trackio is integrated into the training script.
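
As a rough illustration of what "integrated into the training script" could mean, the sketch below emits one metrics dictionary per step through an injected `log` callable. The function name `train`, the metric keys, and the synthetic decaying loss are all assumptions made for illustration; they are not taken from the skill itself.

```python
import math
import random
from typing import Callable, Dict, List

Metrics = Dict[str, float]


def train(log: Callable[[Metrics], None], steps: int = 5, seed: int = 0) -> List[Metrics]:
    """Toy training loop (hypothetical): calls `log` once per step."""
    rng = random.Random(seed)
    history: List[Metrics] = []
    for step in range(steps):
        # Stand-in for a real optimizer step: a decaying loss plus small noise.
        loss = math.exp(-0.3 * step) + rng.uniform(0.0, 0.05)
        metrics = {"train/loss": loss, "train/step": float(step)}
        log(metrics)  # hand the metrics to whatever tracker was injected
        history.append(metrics)
    return history
```

With Trackio installed, the same loop could plausibly be wired up by calling `trackio.init(project=...)` once, passing `trackio.log` as `log`, and calling `trackio.finish()` at the end. Injecting the logger keeps the loop runnable without any tracker, which helps when testing locally.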

Failure modes
  • metrics lag of ≥5 minutes
  • dashboard times out if job_id is invalid
  • alerts trigger on harmless fluctuations
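
Two of the failure modes above can be guarded against on the consumer side. The following is a minimal sketch, not part of Trackio or the skill: `is_stale` flags metrics that are at least five minutes old, and `LossSpikeAlert` fires only after several consecutive losses exceed a rolling baseline, so a single noisy step does not raise an alert. All names, thresholds, and defaults here are hypothetical.

```python
from collections import deque

STALE_AFTER_S = 5 * 60  # the ">=5 minutes" lag threshold, in seconds


def is_stale(last_metric_ts: float, now: float, limit_s: float = STALE_AFTER_S) -> bool:
    """True when the newest metric is at least `limit_s` seconds old."""
    return (now - last_metric_ts) >= limit_s


class LossSpikeAlert:
    """Fire after `patience` consecutive losses above `ratio` x rolling mean."""

    def __init__(self, window: int = 20, ratio: float = 1.5, patience: int = 3):
        self.recent = deque(maxlen=window)  # baseline of non-spike losses
        self.ratio = ratio
        self.patience = patience
        self.strikes = 0

    def update(self, loss: float) -> bool:
        # Warm-up: collect a full window before judging anything.
        if len(self.recent) < self.recent.maxlen:
            self.recent.append(loss)
            return False
        baseline = sum(self.recent) / len(self.recent)
        if loss > self.ratio * baseline:
            # Suspicious value: count it, but keep it out of the baseline.
            self.strikes += 1
        else:
            self.strikes = 0
            self.recent.append(loss)
        return self.strikes >= self.patience
```

Requiring consecutive strikes trades a few steps of detection delay for fewer false alarms; `patience` and `ratio` would need tuning for each job.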
Trust signals
  • integrated into the HF model-trainer skill's example scripts