BenchING: Structured Output Benchmark for LLMs
A benchmark and framework for evaluating how well LLMs follow structured output formats in narrative procedural content generation (PCG) tasks, with an error taxonomy and scaling analysis.
Selected research, systems, and prototypes I've built or led.
Interactive timeline visualizer for Thailand earthquake events with filtering and temporal context.
Built and shipped 7+ web apps integrating LLM capabilities as part of a rapid prototyping initiative.
An open Thai reasoning model (research preview) exploring test-time reasoning strategies and instruction-following for Thai language tasks.
Low-latency speech assistant using pre-generated LLM messages to guide drivers in simulator-based assessment.
Algorithms and tools for generating and evaluating branching storylines using LLMs, including UI and utilities.