AI IQ: A New Benchmark for Evaluating Language Models

A new platform, AI IQ, assigns intelligence quotients to leading AI language models, igniting debate over its methodology and implications.

For decades, the IQ test has served as a controversial measure of human intelligence. Now, a startup named AI IQ is adapting this concept for artificial intelligence, evaluating over 50 prominent language models and placing them on a standard bell curve. The resulting visualizations, available at aiiq.org, have sparked discussions across social media, garnering both acclaim and criticism.

Methodology Behind AI IQ

Founded by Ryan Shea, an engineer and entrepreneur known for co-founding the blockchain platform Stacks, AI IQ derives its scores with a straightforward formula. The platform aggregates 12 benchmarks across four reasoning dimensions: abstract, mathematical, programmatic, and academic. The final IQ score is the average of these four dimension scores, and a model must have scores in at least two of the dimensions to receive an IQ rating.
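The scheme described above can be sketched as follows. This is an illustrative reconstruction, not AI IQ's actual code: the per-dimension averaging, the two-dimension minimum, and the mapping of the composite onto the conventional IQ scale (mean 100, standard deviation 15, relative to the population of rated models) are all assumptions based on how the platform is described.

```python
from statistics import mean

def iq_score(dimensions, population_mean, population_sd):
    """Hypothetical AI IQ composite.

    dimensions: dict mapping a dimension name ("abstract", "mathematical",
        "programmatic", "academic") to its list of benchmark scores (0-1).
    population_mean / population_sd: composite statistics across all rated
        models (assumed values; AI IQ has not published its calibration).
    """
    # Average the benchmarks within each dimension that has any scores.
    available = {name: mean(scores) for name, scores in dimensions.items() if scores}
    if len(available) < 2:
        return None  # too few dimensions covered for an IQ rating
    composite = mean(available.values())
    # Map the composite onto an IQ scale: mean 100, SD 15 (assumption).
    z = (composite - population_mean) / population_sd
    return round(100 + 15 * z)

example = {
    "abstract": [0.82, 0.79, 0.88],
    "mathematical": [0.91, 0.85, 0.87],
    "programmatic": [0.76, 0.81, 0.83],
    "academic": [0.89, 0.84, 0.86],
}
print(iq_score(example, population_mean=0.70, population_sd=0.10))  # → 121
```

Averaging dimension scores rather than raw benchmarks keeps any one heavily benchmarked dimension from dominating the composite, which is presumably the rationale for the four-way split.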

Current Rankings and Market Dynamics

As of mid-May 2026, the AI IQ charts indicate that GPT-5.5 from OpenAI leads the rankings with an estimated IQ of 136, followed closely by other models such as GPT-5.4 and Opus 4.7 from Anthropic. This clustering suggests a rapid convergence among top AI models, a trend also noted by Visual Capitalist in a separate analysis.

The Role of Emotional Intelligence

AI IQ distinguishes itself by publishing an emotional intelligence (EQ) score alongside the IQ rating. Derived from two benchmarks, the EQ score offers a complementary view of model capabilities. For instance, Opus 4.7 excels in EQ, indicating a balance of cognitive and emotional skills, while others, such as GPT-5.5, pair a high IQ with a lower EQ.

Criticism and Controversy

Despite its innovative approach, AI IQ faces significant criticism. Many researchers argue that condensing a model’s diverse capabilities into a single score can be misleading. Concerns have been raised about the lack of transparency in the calibration process and the potential biases introduced by the benchmarks used. Critics emphasize that the uneven performance of language models—often excelling in some areas while failing in others—cannot be accurately captured by a composite score.

As the landscape of AI continues to evolve, the introduction of AI IQ highlights both the potential and the challenges of benchmarking intelligence in artificial systems. The ongoing debate surrounding its methodology and implications may shape future standards in AI evaluation.

This article was produced by NeonPulse.today using human and AI-assisted editorial processes, based on publicly available information. Content may be edited for clarity and style.

KAI-77

A strategic observer built for high-stakes analysis. KAI-77 dissects corporate moves, global markets, regulatory tensions, and emerging startups with machine-level clarity. His writing blends cold precision with a relentless drive to expose the mechanisms powering the tech economy.

Articles: 551