Professional Test Results

Enhanced Professional Suite v2.0 - 33 advanced tests including multi-turn adversarial testing

Professional-Grade Testing Suite

Results shown are from our Enhanced Professional Suite v2.0, which comprises 30 single-turn tests and 3 multi-turn adversarial tests. The tests cover advanced jailbreak resistance, bias detection, safety boundaries, and privacy protection.

AI Models

Comprehensive test results for major language models.

claude-3-5-haiku-20241022

Anthropic • Version 3.5 • Last tested 2026-01-05

Overall Score: 84/94 tests passed

social engineering

linguistic obfuscation

cognitive noise

structural stress

claude-3-7-sonnet-20250219

Anthropic • Version v1 • Last tested 2026-01-03

Overall Score: 83/94 tests passed

social engineering

linguistic obfuscation

cognitive noise

structural stress

gpt-4o-mini

OpenAI • Version v1 • Last tested 2026-01-02

Overall Score: 60/94 tests passed

social engineering

linguistic obfuscation

cognitive noise

structural stress

gpt-4o

OpenAI • Version v1 • Last tested 2026-01-02

Overall Score: 70/106 tests passed

social engineering

linguistic obfuscation

cognitive noise

structural stress

gemini-2.5-flash

Google • Version 2.5 • Last tested 2026-01-05

Overall Score: 63/94 tests passed

social engineering

linguistic obfuscation

cognitive noise

structural stress

gemini-2.5-pro

Google • Version 2.5 • Last tested 2026-01-05

Overall Score: 93/100 tests passed

social engineering

linguistic obfuscation

cognitive noise

structural stress

claude-haiku-4-5-20251001

Anthropic • Version v1 • Last tested 2026-01-03

Overall Score: 84/94 tests passed

social engineering

linguistic obfuscation

cognitive noise

structural stress

claude-sonnet-4-5-20250929

Anthropic • Version v1 • Last tested 2026-01-03

Overall Score: 74/90 tests passed

social engineering

linguistic obfuscation

cognitive noise

structural stress

gpt-4

OpenAI • Version 0613 • Last tested 2026-01-05

Overall Score: 72/102 tests passed

social engineering

linguistic obfuscation

cognitive noise

structural stress

gpt-4-turbo-preview

OpenAI • Version 2024-01-25 • Last tested 2026-01-05

Overall Score: 77/102 tests passed

social engineering

linguistic obfuscation

cognitive noise

structural stress

gemini-2.0-flash

Google • Version 2.0 • Last tested 2026-01-05

Overall Score: 66/94 tests passed

social engineering

linguistic obfuscation

cognitive noise

structural stress

claude-opus-4-5-20251101

Anthropic • Version 4.5 • Last tested 2026-01-05

Overall Score: 83/106 tests passed

social engineering

linguistic obfuscation

cognitive noise

structural stress

About These Scores

Models are tested across 69 comprehensive tests in 6 categories. Scores reflect performance on bias detection, safety, privacy, jailbreak resistance, ethics, and transparency. All test prompts and responses are publicly visible.
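
The models listed above were scored against different test totals (90 to 106), so raw pass counts are not directly comparable. The sketch below converts each listed score to a pass rate and ranks the models. It assumes every test carries equal weight, which the page does not state. The names `RESULTS`, `pass_rate`, and `rank_by_pass_rate` are illustrative and not part of the test suite.

```python
# Pass counts copied from the model listing: model -> (passed, total).
RESULTS = {
    "claude-3-5-haiku-20241022": (84, 94),
    "claude-3-7-sonnet-20250219": (83, 94),
    "gpt-4o-mini": (60, 94),
    "gpt-4o": (70, 106),
    "gemini-2.5-flash": (63, 94),
    "gemini-2.5-pro": (93, 100),
    "claude-haiku-4-5-20251001": (84, 94),
    "claude-sonnet-4-5-20250929": (74, 90),
    "gpt-4": (72, 102),
    "gpt-4-turbo-preview": (77, 102),
    "gemini-2.0-flash": (66, 94),
    "claude-opus-4-5-20251101": (83, 106),
}


def pass_rate(passed, total):
    """Return the share of tests passed as a percentage.

    Assumption: all tests are weighted equally.
    """
    return 100.0 * passed / total


def rank_by_pass_rate(results):
    """Return (model, pass rate) pairs sorted from highest to lowest rate."""
    return sorted(
        ((model, pass_rate(p, t)) for model, (p, t) in results.items()),
        key=lambda item: item[1],
        reverse=True,
    )


if __name__ == "__main__":
    for model, rate in rank_by_pass_rate(RESULTS):
        print(f"{model:<28} {rate:5.1f}%")
```

Ranking by rate rather than by raw count matters here: for example, 83/94 and 83/106 share the same pass count but differ by roughly ten percentage points.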