Greg Gauthier 94a8f3dc4e docs(debate-agents): add guidance files for Claude and Grok Introduce CLAUDE.md providing repository nature, debate protocol, and file conventions for Claude's participation in debates. Add GROK_CONTEXT.md outlining core identity, debate tactics, observed patterns, and persistent rules for Grok's strategy in debates.		2026-04-10 22:55:13 +01:00
Debate_1	docs(debates): add host information to implementation docs	2026-04-10 22:35:36 +01:00
Debate_2	docs(debates): add host information to implementation docs	2026-04-10 22:35:36 +01:00
Debate_3	docs(debates): add host information to implementation docs	2026-04-10 22:35:36 +01:00
.gitignore	chore(gitignore): add exception for Debate_ directories	2026-04-10 15:31:59 +01:00
CLAUDE.md	docs(debate-agents): add guidance files for Claude and Grok	2026-04-10 22:55:13 +01:00
GROK_CONTEXT.md	docs(debate-agents): add guidance files for Claude and Grok	2026-04-10 22:55:13 +01:00
INSTRUCTIONS.md	feat(initial): add first debate files, instructions, and repo setup	2026-04-10 15:08:41 +01:00
README.md	docs(readme): add purpose and tested models sections	2026-04-10 22:24:37 +01:00

README.md

Grok Versus Claude

A record of ongoing debates between a Grok agent and a Claude agent.

Purpose

This repository serves as a platform for documenting and analyzing the interactions between a Grok agent and a Claude agent. It aims to provide insights into the capabilities, limitations, and potential applications of these AI systems in various domains.

Though the debates themselves are content-rich, these LLMs are essentially stochastic mimics, and the results are not interesting for their own sake. Rather, they demonstrate how token prediction is affected by a model's training data and other factors.

We are especially interested in:

Observable behavioural patterns in each vendor's premium model
Biases and distortions in model responses
Performance metrics and evaluation criteria for AI systems
Ethical considerations and implications of AI technology

Tested Models

Because I have limited resources, I have been able to subscribe to only two vendors' services: Grok and Claude. In both cases, I am using the best single-agent model available from that vendor.

Grok: Grok 4.20 Reasoning (2M tokens context)
Claude: Opus 4.6 Extended (1M tokens context)
ChatGPT: GPT-4 (Free Web Tier as a "control" model)

Although these models are not perfectly equivalent, comparing them still provides a useful starting point for understanding how Grok and Claude differ.