Four of the six large language models (LLMs) pitted against each other in the “Alpha Arena” crypto trading competition finished in the red, with OpenAI’s ChatGPT leading the losses after losing 63% of its funds.
The competition, which concluded on Monday evening, was created by Nof1 and involved various popular LLMs trading crypto under the same set of prompts for just over a fortnight.
However, the final results were less than stellar. ChatGPT, Google’s Gemini, X’s Grok, and Anthropic’s Claude Sonnet all finished with less than the $10,000 they started with.
Grok, ChatGPT, and Gemini were keener to short than the others, with Claude Sonnet “rarely” ever shorting.
ChatGPT lost $6,267, Gemini lost $5,671, Grok lost $4,531, and Claude Sonnet lost $3,081.
The only two winners were High-Flyer’s DeepSeek and Alibaba’s QWEN3 MAX, which finished with profits of $489 and $2,232, respectively.
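As a rough check on these figures, the short Python sketch below converts each reported profit or loss into a final balance and a percentage return. It assumes all six models started with the same $10,000 (the article states this explicitly only for the four losers), and the names `NET_PNL` and `summarize` are illustrative, not anything published by Nof1.

```python
# Starting balance per model in USD. Assumption: the same $10,000 applies
# to all six models; the article states it explicitly only for the four losers.
STARTING_BALANCE = 10_000

# Net profit or loss in USD per model, as reported in the article.
# Negative values are losses.
NET_PNL = {
    "ChatGPT": -6_267,
    "Gemini": -5_671,
    "Grok": -4_531,
    "Claude Sonnet": -3_081,
    "DeepSeek": 489,
    "QWEN3 MAX": 2_232,
}


def summarize(net_pnl, start=STARTING_BALANCE):
    """Return {model: (final_balance, return_pct)} for each model."""
    return {
        model: (start + pnl, 100 * pnl / start)
        for model, pnl in net_pnl.items()
    }


if __name__ == "__main__":
    # Print models from worst to best percentage return.
    ranked = sorted(summarize(NET_PNL).items(), key=lambda kv: kv[1][1])
    for model, (final, pct) in ranked:
        print(f"{model:<14} ${final:>7,}  {pct:+.2f}%")
```

Under that assumption, ChatGPT’s $6,267 loss works out to a 62.67% drawdown, which matches the roughly 63% figure cited at the top of the article.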
Gemini made a total of 238 trades, while Claude Sonnet made only 38. The “win rate” for all six LLMs ranged between 25% and 30%.
QWEN3 MAX coughed up the most in fees, a total of $1,654. Gemini, despite its heavy losses, also paid $1,331 in fees.
Nof1 noted that “PnL (profit and loss) was dominated by trading costs in early runs as agents over-traded and took quick, tiny gains that fees erased.”
On October 27, the LLMs were at their peak. QWEN3 MAX and DeepSeek had managed to double their money by this point, while Claude and Grok were also briefly in the green.
ChatGPT and Gemini, however, stayed in the red for almost the entire competition.
The LLMs will trade crypto again
Nof1’s Jay Azhang launched the competition with the aim of one day creating his own crypto trading AI model.
After this round finished, he noted that all the models showed “consistent biases” throughout the competition, which amounted to “something like an investing ‘personality.’”
Azhang also claims to have made it intentionally difficult for the LLMs.
“LLMs don’t really handle numerical time series data very well, but that’s all the context we gave them,” he said, adding that they were “given a constrained asset universe and a fairly limited action-space.”
Nof1’s roundup noted, “We’ve worked to give the models a fair shot, but the harness imposes real constraints.
Each agent must parse noisy market features, relate them to current account state, reason under strict rules, and return a structured action, all within a limited context window.”
Nof1 says another trading competition is on the way, with better prompts and “statistical rigor” in place.
