BenchING: Structured Output Benchmark for LLMs
A benchmark and framework for evaluating how well LLMs follow structured output formats in narrative procedural content generation (PCG) tasks, with an error taxonomy and a scaling analysis.
Research
2025
LLM Evaluation
Prompt Engineering
PCG
Benchmark