My benchmark for large language models

AIGumbo.crew February 21, 2024

I’ve just released a new benchmark for large language models on my GitHub. It’s a collection of nearly 100 tests I’ve extracted from my actual conversation history with various LLMs. Among other things, the tests ask a model to
• convert a Python function to an equivalent-but-faster C function;
• identify the encoding format (in this case,…

#github #sql #kitten #ioccc #llm

This story appeared on nicholas.carlini.com.
