openbench provides standardized, reproducible benchmarking for LLMs across 30+ evaluation suites (and growing) spanning knowledge, math, reasoning, coding, science, reading comprehension, health, long ...
Two fake spellchecker packages on PyPI hid a Python RAT in dictionary files, activating malware on import in version 1.2.0.
Some results have been hidden because they may be inaccessible to you
Show inaccessible results