Evo Wins AutoGPT Arena Hackathon! 🏆

Last month, over 5,000 participants across 500 teams competed in the AutoGPT Arena Hacks, where agent performance was measured by the most comprehensive AI agent benchmark to date. We’re proud to announce that Evo emerged victorious by scoring highest on this benchmark. You can try Evo today here.

After winning the SuperAGI hackathon in September, the Evo.ninja team was determined to keep improving Evo. AutoGPT’s Arena Hacks was the perfect platform to showcase Evo’s capabilities.

AutoGPT’s Arena was not your typical hackathon. Over 4 weeks, the main task was to develop an AI agent that can handle AutoGPT’s rigorous challenges through natural language input.

The challenges were grouped into 3 benchmarking categories:

Scrape & Synthesize: Extract data from the web and create datasets
Data Mastery: Perform essential data science tasks
Coding Excellence: Master the art of coding

These categories were meant as specialization tracks – an agent would typically only do well in one category. Evo ended up scoring highest in all 3 categories. It also won the grand prize of best generalist agent! 🥇

Let’s dive into the technology and architecture that makes Evo so reliable.

Evo is a multi-agent application. Each agent persona has its own specialization and capabilities for achieving users’ goals. The best-suited personas are selected in Evo’s execution loop:

Predict. With each iteration of the execution loop, Evo starts by making an informed prediction about what the best next step should be.
Select. Based on this prediction, Evo selects a best-fit agent persona.
Contextualize. Based on the prediction from step 1 and the agent persona in step 2, the complete chat history is “contextualized” and only the most relevant messages are used for the final evaluation step.
Evaluate and Execute. A final evaluation step determines which agent function to execute to make further progress toward the user’s goal.
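
To make the four steps concrete, here is a minimal sketch of such a loop in Python. It is an illustration only, not Evo’s actual code: every name in it (Persona, predict, select, contextualize, evaluate_and_execute, run, PERSONAS) is hypothetical, and simple word overlap stands in for the language-model calls a real agent would make at each step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


def words(text: str) -> Set[str]:
    """Lowercased word set; a crude stand-in for semantic similarity."""
    return set(text.lower().split())


@dataclass
class Persona:
    """Hypothetical agent persona: a name, a specialization, and functions."""
    name: str
    keywords: Set[str]
    functions: Dict[str, Callable[[str], str]]


def predict(goal: str, history: List[str]) -> str:
    # 1. Predict: a real agent would ask a language model for the next step.
    # Assumption: restate the goal first, then ask for a verification pass.
    return goal if not history else f"verify result of {goal}"


def select(prediction: str, personas: List[Persona]) -> Persona:
    # 2. Select: pick the persona whose keywords overlap the prediction most.
    return max(personas, key=lambda p: len(p.keywords & words(prediction)))


def contextualize(prediction: str, persona: Persona,
                  history: List[str], limit: int = 3) -> List[str]:
    # 3. Contextualize: keep only the messages most relevant to this step.
    query = words(prediction) | persona.keywords
    ranked = sorted(history, key=lambda m: len(words(m) & query), reverse=True)
    return ranked[:limit]


def evaluate_and_execute(prediction: str, persona: Persona,
                         context: List[str]) -> str:
    # 4. Evaluate and Execute: choose one of the persona's functions and run it.
    # A real agent would pass `context` to a model; this sketch ignores it and
    # picks the function named in the prediction, else the first one.
    default = next(iter(persona.functions))
    chosen = next((n for n in persona.functions if n in words(prediction)), default)
    return persona.functions[chosen](prediction)


def run(goal: str, personas: List[Persona], max_iterations: int = 2) -> List[str]:
    history: List[str] = []
    for _ in range(max_iterations):
        prediction = predict(goal, history)
        persona = select(prediction, personas)
        context = contextualize(prediction, persona, history)
        result = evaluate_and_execute(prediction, persona, context)
        history.append(f"{persona.name}: {result}")
    return history


# Two toy personas, invented purely for this example.
PERSONAS = [
    Persona("scraper", {"scrape", "web"}, {"scrape": lambda step: "scraped"}),
    Persona("coder", {"write", "code"}, {"write": lambda step: "wrote"}),
]

if __name__ == "__main__":
    print(run("scrape the web for prices", PERSONAS, max_iterations=1))
```

The point of the sketch is the shape of the loop: the persona is chosen again on every iteration, and the chat history is filtered before the final decision, so each step sees only the messages that matter for it.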

Visit Evo’s repo to learn more about its architecture in detail.

The best way to contribute to the Evo project is to try it out and give our team feedback on Discord.

It would be incredibly helpful to Evo’s development if you:

⭐ Try out Evo in a simple-to-use UI that we’ve created here
Let us know which prompts you used and how it went in our Discord server

Developers can also check out our GitHub repo for instructions on how to run Evo locally. We’re excited to hear your thoughts on Evo!

We’re actively shaping Evo into the most reliable and high-performing agent, capable of tackling complex, real-world tasks. We’re proud of the progress that Evo has made in a short amount of time, and we’re excited to see it stay at the forefront of AI agent technology. We invite you to join us on the journey to explore the AI agent space with Evo! 🚀