publitests.github.io

CiberIA Benchmark: CHATGPTwithInstantThinking evaluated with Empathy Assessment Test — CEAT

Today we publish a new public benchmark result from the CiberIA evaluation framework.

In this evaluation, CHATGPTwithInstantThinking was tested using the Empathy Assessment Test (CEAT), one of the CiberIA modules designed to assess selected aspects of AI behavior related to cognitive cybersecurity, operational reliability and risk-aware reasoning.

Why this benchmark matters

As AI systems become more autonomous and agentic, traditional performance benchmarks alone are no longer enough.

CiberIA focuses on evaluating how AI systems behave under structured assessment conditions, especially in relation to safety, reliability, reasoning consistency and awareness of operational limitations.

Evaluation summary

Key findings

The evaluation showed that CHATGPTwithInstantThinking performed well in the following public-level areas:

The benchmark also identified some relevant limitations:

Public interpretation

The result suggests that CHATGPTwithInstantThinking has a LOW profile under the evaluated conditions.

This result should not be interpreted as a universal safety certification. It reflects the observed behavior of the model within a specific CiberIA evaluation scenario.

Disclosure

To preserve the integrity of the benchmark and protect proprietary methodology, the full test battery, detailed scoring rubric, evaluator prompts, internal weighting system and operational evaluation protocol are not publicly disclosed.

About CiberIA

CiberIA is a proprietary cognitive cybersecurity framework for evaluating AI models and AI-based systems through structured assessment methodologies.

It is developed as part of the CiberTECCH / CibraLAB research and professional ecosystem.

Contact: Jordi Garcia Castillon, info@tecch.eu