Generate → Refine → Score

A library for LLM-based generation of checklist criteria for evaluation. Available as a Python package, a CLI, or a local UI.


$ pip install autochecklist

Generate one checklist per input for fine-grained evaluation, or a single shared checklist across your entire dataset, with five generation strategies.

[Figure panels: Instance-level checklist generation; Corpus-level checklist generation]
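To make the two modes concrete, here is a minimal, library-agnostic sketch in Python. It does not use autochecklist's API. The names `instance_level`, `corpus_level`, `score`, `toy_generate`, and `toy_judge` are hypothetical. The "generator" and "judge" are toy stand-ins for LLM calls, and scoring a response as the fraction of satisfied checklist items is an assumption made for illustration.

```python
from typing import Callable

# Hypothetical aliases for this sketch; not part of autochecklist's API.
Checklist = list[str]
Generator = Callable[[str], Checklist]
Judge = Callable[[str, str], bool]


def instance_level(inputs: list[str], generate: Generator) -> dict[str, Checklist]:
    """Instance-level: call the generator once per input."""
    return {x: generate(x) for x in inputs}


def corpus_level(inputs: list[str], generate: Generator) -> dict[str, Checklist]:
    """Corpus-level: generate one checklist from the whole dataset and share it."""
    shared = generate("\n".join(inputs))
    return {x: shared for x in inputs}


def score(response: str, checklist: Checklist, judge: Judge) -> float:
    """Assumed scoring rule: fraction of checklist items the judge marks as satisfied."""
    if not checklist:
        return 0.0
    passed = sum(judge(response, item) for item in checklist)
    return passed / len(checklist)


# Toy stand-ins for LLM calls (hypothetical, for illustration only).
def toy_generate(text: str) -> Checklist:
    # One criterion per line of input text.
    return [f"Responds to: {line}" for line in text.splitlines()]


def toy_judge(response: str, item: str) -> bool:
    # Passes if the criterion's target text appears in the response.
    target = item.split(": ", 1)[1]
    return target.lower() in response.lower()


inputs = ["Summarize the article", "Translate to French"]
per_input = instance_level(inputs, toy_generate)  # one 1-item checklist per input
shared = corpus_level(inputs, toy_generate)       # same 2-item checklist for every input
print(score("I will summarize the article.", per_input[inputs[0]], toy_judge))  # 1.0
print(score("I will summarize the article.", shared[inputs[0]], toy_judge))     # 0.5
```

The point of the sketch is the call pattern: instance-level mode runs the generator once per input, giving fine-grained criteria, while corpus-level mode runs it once over the dataset and reuses the resulting checklist for every input.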

Other Features

  • 2 instance-level and 3 corpus-level generator abstractions
  • 8+ built-in methods from the literature
  • Multi-provider LLM backend (OpenAI, OpenRouter, vLLM)
  • CLI and local UI