New Paper: Opening Black Boxes: Addressing Legal Barriers to Public Interest Algorithmic Auditing

Every day, institutions from hospitals and schools to landlords and law enforcement use algorithms to make big decisions about us. In a new paper, Opening Black Boxes: Addressing Legal Barriers to Public Interest Algorithmic Auditing, we explore the “black box” nature of algorithms and how it can prevent meaningful explanations of why algorithms arrive at certain decisions. These decisions can be particularly harmful when they determine people’s access to life opportunities, especially given the lack of regulation in the U.S. around testing and transparency standards for algorithms. As we wait for effective regulation of algorithms, Consumer Reports believes that public interest auditing can provide the public with insight into how algorithms work, when they work, and when they don’t.

We define “public interest auditing” as investigatory research into an algorithm intended to discover and inform the public about potential harms caused by the algorithm. These audits can be performed by academics, public interest groups, journalists, or just concerned citizens. However, to perform effective audits, these investigators need access to adequate information, which they do not always have.

While performing public interest audits can be practically difficult depending on the auditor’s access to data, source code, algorithm outputs, and more, there are also many legal barriers that have limited research of this kind. Laws like the Computer Fraud and Abuse Act (CFAA), which were written with the intention of criminalizing hacking, have deterred researchers from even attempting to tinker with algorithms for fear of legal repercussions. Other risks, such as potential copyright infringement when obtaining training data or breach of contracts like Terms of Service agreements that attempt to limit testing, can also frustrate auditors who don’t have the funds to go to court.

For the purposes of differentiating between the many legal and practical limitations of public interest auditing, we’ve broken down auditing into four different categories:

Code Audit: A code audit is when an auditor gains access to a company’s source code, which can be the underlying code of any model or algorithm. For example, Twitter made its image-cropping code public after it received backlash about the code’s potential biases. The public was able to review the code and test it to identify sources of bias.
Crowdsourced Audit: A crowdsourced audit is essentially a survey of users to gather data about their normal interactions with an algorithm or platform (for example, getting users to share all of their queries on a search engine). An auditor can ask volunteers to either provide information about their interactions with the algorithm or provide direct access to the auditor (with appropriate consent) to view their interactions. For example, Consumer Reports has previously done similar participatory research to identify differences in insurance cost estimators offered to consumers and to identify potential roadblocks for consumers trying to exercise their rights under the California Consumer Privacy Act.
Scraping: In a scraping audit, a computer program extracts data, typically publicly available data, by repeatedly querying the algorithm and obtaining or otherwise observing the results. For example, Googlebot, Google’s crawler that automatically discovers and scans websites to index in its search engine, is one of the most prolific web crawlers on the internet. Scraping is generally done with automated tools, such as a browser extension, that carry out a specific task the user defines (such as collecting all the images on a publicly accessible website). There can also be some overlap between scraping and the crowdsourced audit; the two can sometimes be distinguished by whether or not users consented to the data collection.
Sock Puppet Audit: In a sock puppet audit, an auditor creates fake accounts or programmatically constructed traffic to test an algorithm. This gives the auditor control over each account’s characteristics, making it easier to isolate the causes of discrimination or other harms. Another benefit is that auditors can assign characteristics to the fake accounts that volunteer participants might be hesitant to declare (such as medical history or sexual orientation).
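
To make the sock puppet approach concrete, here is a minimal Python sketch of a paired test. It is an illustration under stated assumptions, not a method taken from the paper: black_box_quote is a hypothetical stand-in for the algorithm under audit, with an invented pricing rule so the example can run locally, and the profile fields (age, zip_code, years_licensed) are made up. The idea is to build pairs of fake profiles that are identical except for one attribute and then compare the outcomes.

```python
import random
from statistics import mean


def black_box_quote(profile):
    """Hypothetical stand-in for the algorithm under audit.

    A real audit would query a live service. This toy function exists only
    so the sketch runs, and its pricing rule is invented.
    """
    price = 500 + 10 * max(0, 30 - profile["age"])
    # Invented hidden rule: a surcharge tied to one ZIP code.
    if profile["zip_code"] == "00002":
        price += 75
    return price


def make_pair(rng, attribute, value_a, value_b):
    """Build two sock puppet profiles that differ in exactly one attribute."""
    shared = {
        "age": rng.randint(18, 70),
        "zip_code": "00001",
        "years_licensed": rng.randint(0, 40),
    }
    return {**shared, attribute: value_a}, {**shared, attribute: value_b}


def paired_audit(query, attribute, value_a, value_b, n_pairs=200, seed=0):
    """Return the mean outcome difference (B minus A) across matched pairs."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_pairs):
        profile_a, profile_b = make_pair(rng, attribute, value_a, value_b)
        diffs.append(query(profile_b) - query(profile_a))
    return mean(diffs)


zip_gap = paired_audit(black_box_quote, "zip_code", "00001", "00002")
license_gap = paired_audit(black_box_quote, "years_licensed", 1, 20)
print("Mean price gap by ZIP code:", zip_gap)
print("Mean price gap by years licensed:", license_gap)
```

Because everything else in each pair is held constant, a consistent nonzero gap points to the varied attribute as the cause. A real audit would also need to account for noise in the system's responses and for the legal constraints discussed in the paper.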

In the paper, we discuss the pros and cons of each type of audit, as well as some of the restrictions that current law places on each.

Our paper also makes a number of policy recommendations that would help empower public interest researchers to conduct this needed research while balancing other important values such as privacy and the protection of intellectual property. Specifically, we recommend changes in the following areas:

Access and publication mandates
CFAA and computer trespass
Contract law
Digital Millennium Copyright Act
Copyright
Civil Rights, Privacy, and Security
Consumer Protection

We also recommend ways to legally incentivize companies to provide the public with more transparency into their algorithms, to encourage internal whistleblowers to report illegal behavior, and to incentivize good-faith research by providing auditors with safe harbors in particular cases.

Certain applications of AI have the potential to roll back much of the progress made by civil rights law. Because there is so little transparency into how these algorithms are used, what data is used to train them, and how engineers go about mitigating harm when designing them, many of these algorithms may very well be discriminating against protected classes and perpetuating other kinds of harm. While the burden must not fall entirely on public interest researchers to uncover algorithmic harm, we must clear the legal barriers that hinder important public interest research as we advocate for robust algorithmic regulation in the U.S.

Please see the full paper for more details and our full legal analysis on these issues.