openbench provides standardized, reproducible benchmarking for LLMs across 30+ evaluation suites (and growing) spanning knowledge, math, reasoning, coding, science, reading comprehension, health, long ...
CATArena (Code Agent Tournament Arena) is an open-ended environment where LLMs write executable code agents to battle each other and then learn from each other. CATArena is an engineering-level ...
The annual holiday variety show Christmas Without Tears will return to the Orpheum Theater on Tuesday, December 16, marking its 20th year. The event, created by vocalist Judith Owen and actor-comedian ...
Abstract: Recently, large language models (LLMs), those pretrained on code, have demonstrated strong capabilities in generating programs from informal natural language intent. However, LLM -generated ...