Security
Next-generation Constitutional Classifiers: More efficient protection against universal jailbreaks
Tags: security, anthropic, research, paper, jailbreak, classifiers
Anthropic introduces Constitutional Classifiers++, an improved safety system that protects Claude from jailbreak attacks while reducing computational costs. The new two-stage architecture uses internal probes and ensemble methods to achieve "the lowest successful attack rate of any approach we've ever tested" with minimal performance impact on legitimate queries.
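The summary does not say how the two stages connect. One plausible reading, sketched below, is a cascade: a cheap probe scores every request, and only requests it flags are passed to a heavier second stage whose scores are averaged with the probe's as a simple ensemble. Everything here is an illustrative assumption rather than Anthropic's implementation: the names `two_stage_screen`, `probe`, and `second_stage`, the thresholds, and the plain averaging are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical scorer type: maps a feature vector to a harm score in [0, 1].
Scorer = Callable[[Sequence[float]], float]


@dataclass
class CascadeResult:
    blocked: bool    # final decision
    escalated: bool  # whether the second stage ran
    score: float     # score the decision was based on


def two_stage_screen(
    features: Sequence[float],
    probe: Scorer,
    second_stage: Sequence[Scorer],
    escalate_at: float = 0.2,  # assumed threshold, not from the source
    block_at: float = 0.5,     # assumed threshold, not from the source
) -> CascadeResult:
    # Stage 1: the cheap probe runs on every request.
    probe_score = probe(features)
    if probe_score < escalate_at:
        return CascadeResult(blocked=False, escalated=False, score=probe_score)

    # Stage 2: only flagged requests pay for the heavier scorers.
    # The ensemble here is an unweighted mean of all scores (an assumption).
    scores = [probe_score] + [scorer(features) for scorer in second_stage]
    ensemble = sum(scores) / len(scores)
    return CascadeResult(blocked=ensemble >= block_at, escalated=True, score=ensemble)
```

In a design like this, most legitimate requests exit after the first stage, so the expensive scorers run only on a small fraction of traffic. That is one way a two-stage system could lower compute cost while keeping the performance impact on legitimate queries small.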
Original source: Anthropic