# AI Incidents, Audits, and the Limits of Benchmarks

![rw-book-cover](https://share.snipd.com/share-image/resize?url=https%3A%2F%2Fimg.transistorcdn.com%2FWMlp2ug34XB6LDJ3-vnzti_-_y144LUlFW0Xzzn3fss%2Frs%3Afill%3A0%3A0%3A1%2Fw%3A1400%2Fh%3A1400%2Fq%3A60%2Fmb%3A500000%2FaHR0cHM6Ly9pbWct%2FdXBsb2FkLXByb2R1%2FY3Rpb24udHJhbnNp%2Fc3Rvci5mbS8wMTZi%2FZWJmNWIwNDdmYTcw%2FNGJjMTExZjNjZmYy%2FM2ZjNS5wbmc.jpg&width=512&height=512)

URL:: https://share.snipd.com/episode/036a7ea7-a435-4e59-8413-b249f4a8502c
Author:: Practical AI

## AI-Generated Summary
None

## Highlights
> **Verifying General Purpose Models Breaks Traditional Safety Frames**
>
> - Frontier models are hard to assess because they're general-purpose, breaking safety processes that assume a specific operating context.
> - You can't verify safety across all circumstances, so we must redefine how to encapsulate and verify assurance for customers.
>
> Transcript:
> Sean McGregor
> A fundamental problem that we have, particularly at the frontier model level with things like OpenAI, Anthropic, Google's Gemini, and so forth, is it's very difficult to even know how safe your systems are because they're general purpose systems. This basically broke the safety frame. All the safety processes that we have are built around starting presumption of there's a specific context it's operating in, and you reason about its safety within that context. Well, if your context is just wildcard, star, everything, where does that leave you? Do you need to verify across all circumstances that it's going to be safe? Because the answer is going to be no.
So how do you approach the task of verifying claims? ([Time 0:14:20](https://share.snipd.com/snip/a0fd342f-ade6-4492-81f8-6de3a054f537))