The library

Everything we index — ranked by what works, never by stars.

untested
Bootstrap skill evaluation lab repo · skill · Engineering · L2
os-eval-lab-setup · Bootstrapping isolated test repos with a consistent eval harness before autoresearch iterations.
untested
Schedule work resumption at specific time · skill · Productivity · L1
gsd:resume-at · Scheduling future Claude sessions around usage caps or scheduled overnight batch work.
untested
Score and gate skill improvements · skill · Engineering · L3
os-eval-runner · Autonomously iterating skill improvements using empirical scoring gates instead of human feedback.
untested
Restore previous session with full context · skill · Productivity · L1
gsd:resume-work · Seamlessly restoring GSD session state from checkpoints across multiple Claude conversations.
untested
Plan and delegate plugin evolution · skill · Engineering · L3
os-evolution-planner · Proposing and executing agent evolution paths (update/create/orchestrate) with structured planning.
untested
Promote backlog items to active milestone · skill · Product · L1
gsd:review-backlog · Systematically triaging project tasks and surfacing critical path blockers.
untested
Verify artifact evolution from architecture · skill · Engineering · L3
os-evolution-verifier · Validating that evolved agents meet their original requirements before applying to production.
untested
Request peer review from external AI CLIs · skill · Engineering · L2
gsd:review · Gate-checking workflow completion before advancing to next phase.
untested
Log agentic OS experiment runs · skill · Engineering · L2
os-experiment-log · Durable tracking of agent experiments with structured metadata and cross-run analysis.
untested
Lightweight codebase assessment · skill · Engineering · L1
gsd:scan · Quick status checkup showing active phase, pending tasks, and system health.
untested
Extract clinical NLP entities · skill · Data · L2
clinical-nlp-extractor · Extracting structured medical data from unstructured clinical notes at scale.
untested
Learn agentic OS concepts · skill · Productivity · L1
os-guide · Onboarding users to Agentic OS workflows without requiring manual documentation reading.
untested
Verify threat mitigations for phase · skill · Legal · L2
gsd:secure-phase · Retroactively auditing threat mitigations before shipping a production phase.
untested
Multi-agent concurrent improvement loop · skill · Engineering · L4
os-improvement-loop · Multi-agent systems needing concurrent eval-driven improvement cycles.
untested
Generate session report with metrics · skill · Productivity · L1
gsd:session-report · Generating shareable post-session summaries with estimated token costs.
untested
Visualize evaluation progress over cycles · skill · Data · L2
os-improvement-report · Tracking and visualizing agent loop improvement trends over cycles.
untested
Switch GSD model profile · skill · Ops · L1
gsd:set-profile · Switching GSD agent quality/cost tradeoffs without restarting work.
untested
Initialize agent environment setup · skill · Engineering · L1
os-init · Bootstrapping Agentic OS structure for new multi-agent projects.
untested
Configure GSD workflow settings · skill · Ops · L1
gsd:settings · Interactive configuration of GSD workflow toggles and model profiles.
untested
Manage session memory and learnings · skill · Productivity · L1
os-memory-manager · Closing sessions with proper three-tier memory promotion and deduplication.
untested
Create and merge pull requests safely · skill · Engineering · L1
gsd:ship · Creating PRs and tracking merges after verify-work passes.
untested
Auto-heal agentic system failures · skill · Engineering · L3
self-evolution · Self-healing failed selectors and stale scripts via tier-gated edits.
untested
Package design findings into reusable skills · skill · Product · Engineering · L1
gsd:sketch-wrap-up · Distilling throwaway sketches into persistent design skills.
untested
Audit code for technical debt · skill · Engineering · L1
todo-check · Pre-flight auditing a file for technical debt before committing.
untested
Sketch throwaway UI mockups · skill · Product · Engineering · L1
gsd:sketch · Exploring design directions with fast HTML mockup variants.
untested
Partition work across parallel agents · skill · Engineering · L3
agent-swarm · Partitioning large features into parallel independent sub-tasks.
untested
Lock requirements before implementation · skill · Product · Engineering · L1
gsd:spec-phase · Clarifying phase requirements with a Socratic interview before discuss-phase.
untested
Delegate tasks to worker agents · skill · Engineering · L3
dual-loop · Balancing dual concurrent execution cycles with separate eval gates.
untested
Preserve spike discoveries as reusable patterns · skill · Product · Engineering · L1
gsd:spike-wrap-up · Converting research findings into reusable project-local skills.
untested
Self-directed research and knowledge capture · skill · L1
learning-loop · Continuous improvement of agent skills through feedback loops.
untested
Prototype and test ideas experimentally · skill · Product · Engineering · L1
gsd:spike · Fast exploratory research on technical unknowns before planning.
untested
Route tasks to specialized agents · skill · Engineering · L2
orchestrator · Coordinating multi-agent task dispatch across a distributed team.
untested
Display project metrics and timeline · skill · Product · Ops · L1
gsd:stats · When you need a single authoritative view of all project phases, plans, and metrics across the entire codebase.
untested
Iterate designs through red team review · skill · Product · L2
red-team-review · When complex work needs independent adversarial critique before commitment, especially for architectures and research.
untested
Maintain context across sessions · skill · Engineering · L1
gsd:thread · When work spans multiple sessions but doesn't belong to a specific phase and needs persistent context.
untested
Continuously improve agentic workflows · skill · Engineering · L4
triple-loop-learning · When you want autonomous continuous improvement of system instructions based on objective headless benchmarks.
untested
Generate frontend UI design contracts · skill · Product · Engineering · L1
gsd:ui-phase · When you are starting a frontend phase and need a design specification contract from research to verification.
untested
Manage memory across agent sessions · skill · Engineering · L2
memory-management · When you need a tiered recall system balancing hot-cache speed with deep semantic search across sessions.
untested
Navigate code using knowledge graphs · skill · Engineering · L2
gitnexus-exploring · When you are learning a large unfamiliar codebase and need execution flows and symbol relationships mapped.
untested
Audit recursive language model coverage · skill · Engineering · L1
rlm-audit · When you've added new files and need to identify exactly what's missing from the semantic cache.
untested
Clean stale knowledge cache entries · skill · Engineering · L1
rlm-cleanup-agent · When files are deleted or moved and the RLM ledger contains stale entries to remove.
untested
Safely revert phase commits · skill · Engineering · L2
gsd:undo · When you need to safely revert phase or plan commits with dependency checking, not blind git reset.
untested
Distill and maintain knowledge ledger · skill · Engineering · L2
rlm-curator · When you need to distill files into the RLM Summary Ledger and keep the cache in sync with the codebase.
untested
Summarize code into semantic cache · skill · Engineering · L2
rlm-distill-agent · When you have uncached files and need high-quality 1-sentence summaries written to the ledger.
untested
Validate phase completion gaps · skill · Product · Engineering · L2
gsd:validate-phase · When a phase is complete but validation coverage is unknown and needs retroactive audit and gap-filling.
untested
Initialize RLM semantic cache · skill · Engineering · L2
rlm-init · When setting up RLM for the first time in a project or adding a new cache profile.
untested
Validate features through conversational testing · skill · Product · Engineering · L1
gsd:verify-work · When you want conversational user acceptance testing with automatic diagnosis of found issues.
untested
Search RLM ecosystem code and docs · skill · Engineering · L2
rlm-search · When you are searching for architecture, code, or documentation and need an O(1) RLM scan before semantic or exact search.
untested
Manage parallel development workstreams · skill · Ops · Engineering · L1
gsd:workstreams · When you are validating project work and need to execute the gsd:validate-phase workflow.
untested
Audit vector database coverage gaps · skill · Engineering · Data · L2
vector-db-audit · When you need O(1) keyword lookup across RLM summaries without semantic search overhead.