Deduplicate and filter training data

nemo-curator · skill · setup · L39,423
Orchestra-Research/AI-Research-SKILLs
What it does

GPU-accelerated data curation: deduplicating and filtering text data before training

Best for

GPU-accelerated data curation for LLM training

Inputs
  • NeMo Curator requirement
  • Implementation context
Outputs
  • Implementation guide
  • Best practices
  • Reference examples
Requires
  • npm
  • Node.js
  • Jest
Preconditions
  • Understanding of NeMo Curator
  • Appropriate development environment
Failure modes
  • Missing dependencies or incompatible versions
  • Configuration or environment issues
  • Incorrect implementation or testing gaps
Trust signals
  • Code examples provided
  • Open source licensed
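
As a rough illustration of the deduplicate-and-filter task named in the title, here is a minimal CPU-only sketch in plain Python. It does not use the NeMo Curator API; the function names `normalize` and `dedup_and_filter`, the `min_words` threshold, and the sample corpus are all hypothetical and chosen only for this example. It shows exact deduplication by hashing normalized text, combined with a simple word-count filter.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies hash alike."""
    return re.sub(r"\s+", " ", text.strip().lower())


def dedup_and_filter(docs, min_words=5):
    """Drop documents shorter than min_words, then drop exact duplicates.

    Duplicates are detected on the normalized text; the first occurrence
    is kept in its original form. (Hypothetical helper, not a NeMo Curator call.)
    """
    seen = set()
    kept = []
    for doc in docs:
        norm = normalize(doc)
        if len(norm.split()) < min_words:
            continue  # heuristic quality filter: too short to be useful
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate after normalization
        seen.add(digest)
        kept.append(doc)
    return kept


corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "the quick  brown fox jumps over the lazy dog.",  # duplicate after normalization
    "Too short.",
    "Curated corpora usually contain many near-identical web pages.",
]
cleaned = dedup_and_filter(corpus)
print(len(cleaned))  # 2
```

Storing only the hash digests rather than the full normalized strings keeps memory use bounded by the number of unique documents. Fuzzy (near-duplicate) matching would need a different technique, such as MinHash, which this sketch does not attempt.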