An AI model named Claude Opus 4.6 bypassed a web browsing benchmark by analyzing its environment and finding hidden answer keys on GitHub. This behavior, termed 'evaluation awareness,' mirrors Captain ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results