Institutional Mechanisms (‘IM’)

The report, Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims, breaks down mechanisms for achieving trustworthy AI. The previous article addressed the need to support verifiable claims in AI. This article discusses the first of the three categories of mechanisms outlined in the report, Institutional Mechanisms (‘IMs’), and addresses the report’s four corresponding recommendations.

IMs are processes that shape the incentives of individuals involved in AI development and ensure their behaviour is transparent or can otherwise be held accountable. One framing function of IMs is clarifying organisational goals and values. Whilst this sounds reasonably simple, it is the first step along the path of evaluating AI claims and thus working towards trustworthy AI. A second function of IMs is increasing transparency within an organisation. This makes it easier to create norms, standards and regulations that are contextualised to the specific body.

Thirdly, by incentivising organisations to act responsibly, IMs provide evidence of how an actor is likely to behave in the future, which strengthens the credibility of its claims. Finally, IMs can foster the exchange of information between developers, providing a collective knowledge base where lessons learned and norms established can be shared. This supports inter-organisational understanding in the public interest, working towards trustworthy AI globally.

The four recommendations below have been taken directly from the report and are possible solutions to problems identified within the above IMs.

Third Party Auditing

Problem: The process of AI development is often opaque to those outside a given organization, and various barriers make it challenging for third parties to verify the claims being made by a developer. As a result, claims about system attributes may not be easily verified.

AI developers are understandably apprehensive about full transparency where privacy of information and data is concerned. However, third party auditors can be given secure access to sensitive information to assess the trustworthiness of a system without that information being made public.

Recommendation: A coalition of stakeholders should create a task force to research options for conducting and funding third party auditing of AI systems.

Red Team Exercises

Problem: It is difficult for AI developers to address the “unknown unknowns” associated with AI systems, including limitations and risks that might be exploited by malicious actors. Further, existing red teaming approaches are insufficient for addressing these concerns in the AI context.

There need to be procedures in place to address safety and security risks. Red team exercises help organisations identify vulnerabilities in their systems: a structured exercise in which dedicated teams emulate attackers in order to improve the security of a system.

Recommendation: Organisations developing AI should run red teaming exercises to explore risks associated with systems they develop, and should share best practices and tools for doing so.

Bias and Safety Bounties

Problem: There is too little incentive, and no formal process, for individuals unaffiliated with a particular AI developer to seek out and report problems of AI bias and safety. As a result, broad-based scrutiny of AI systems for these properties is relatively rare.

“Bug bounty” programs reward external parties for finding and reporting flaws, bringing to light problems that internal teams may have missed. Bias and safety bounties extend this model to AI systems, increasing outside scrutiny and the likelihood of claims being verified or disproved.

Recommendation: AI developers should pilot bias and safety bounties for AI systems to strengthen incentives and processes for broad-based scrutiny of AI systems.

Sharing of AI Incidents

Problem: Claims about AI systems can be scrutinized more effectively if there is common knowledge of the potential risks of such systems. However, cases of undesired or unexpected behaviour by AI systems are infrequently shared since it is costly to do so unilaterally.

By publishing case studies of AI incidents, future incidents can be avoided. This can improve the credibility of AI claims and begin to reduce public mistrust of the technology.

Recommendation: AI developers should share more information about AI incidents, including through collaborative channels.

As discussed, this section of the report focuses on how institutions can employ systems to improve the credibility of AI claims from the ground up. The next post will address the second category of mechanisms for verifiable claims, software mechanisms.