
Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code. (arXiv:2312.14856v1 [cs.SE])

We present a method for systematically evaluating the correctness and robustness of instruction-tuned large language models (LLMs) for code generation via a new benchmark, Turbulence.

