Yisroel Mirsky

GAVEL Rewrites Rule-Based AI Safety
Ben-Gurion University of the Negev

Congratulations to Yisroel Mirsky, Zuckerman Faculty Scholar at Ben-Gurion University, on the publication of "GAVEL: Towards rule-based safety through activation monitoring" as a conference paper at ICLR 2026. In it, Dr. Mirsky introduces a new framework for AI safety that monitors internal activations in large language models. GAVEL breaks internal model activity into composable cognitive elements that support transparent, configurable safeguards. The system offers a practical and auditable approach to AI governance, enabling organizations to define and share precise safety constraints without retraining models or sacrificing flexibility.

Abstract:
Large language models (LLMs) are increasingly paired with activation-based monitoring to detect and prevent harmful behaviors that may not be apparent at the surface-text level. However, existing activation safety approaches, trained on broad misuse datasets, struggle with poor precision, limited flexibility, and lack of interpretability.