Calculator without Eval in JavaScript Code with Harry

Provider-agnostic, open-source evaluation infrastructure for language models

openbench provides standardized, reproducible benchmarking for LLMs across 30+ evaluation suites (and growing) spanning knowledge, math, reasoning, coding, science, reading comprehension, health, long ...

GitHub

CATArena: Engineering-Level Tournament Evaluation Platform for LLM-Driven Code Agents

CATArena (Code Agent Tournament Arena) is an open-ended environment where LLMs write executable code agents to battle each other and then learn from each other. CATArena is an engineering-level ...

OffBeat

Harry Shearer & Judith Owen to Present 20th Annual Christmas Without Tears

The annual holiday variety show Christmas Without Tears will return to the Orpheum Theater on Tuesday, December 16, marking its 20th year. The event, created by vocalist Judith Owen and actor-comedian ...

IEEE

LLM-based Interactive Code Generation: Empirical Evaluation

Abstract: Recently, large language models (LLMs), those pretrained on code, have demonstrated strong capabilities in generating programs from informal natural language intent. However, LLM-generated ...
