By Alex Engler
The regulation of general-purpose AI (GPAI) is currently being debated by the European Union’s legislative bodies as they work on the Artificial Intelligence Act (AIA). One proposed change from the Council of the EU (the Council) would take the unusual, and harmful, step of regulating open-source GPAI. While intended to enable the safer use of these tools, the proposal would create legal liability for open-source GPAI models, undermining their development. This could further concentrate power over the future of AI in large technology companies and prevent research that is critical to the public’s understanding of AI.
What is GPAI?
The Council’s approach is to define a subset of AI systems as general-purpose, then require GPAI developers to meet requirements for risk management, data governance, technical documentation, and transparency instructions, as well as standards of accuracy and cybersecurity. The Council defines GPAI as AI that performs “generally applicable functions” and may be used in a “plurality of contexts,” but that definition is still quite vague. While there is no widely used definition of GPAI, the current generation of GPAI is characterized by the training of deep learning models on large datasets, using relatively intensive compute, to perform many or even hundreds of tasks. These tasks may include generating images, translating languages, moving a robotic arm, playing video games, or all of the above.
The Council has reasons to consider regulating GPAI models. The capabilities of these models are increasing quickly, and as a result they are being used in new applications, such as in writing assistants or photo alteration tools. There are also concerns about their use for generating disinformation and deepfakes, although this is less common.
The Council also appears concerned about the opacity of these models: training deep learning models on enormous datasets has led to more complex and difficult-to-understand behavior. Further, some companies are making GPAI available only through application programming interfaces, or APIs. This means users can only send data to the GPAI system and then get a response. They cannot directly interrogate or evaluate the model, which poses real challenges for developing downstream AI systems that would meet the AIA requirements. These are some of the reasons why the Council is considering requirements on GPAI models.
Open-source GPAI contributes to responsible GPAI development
While the goals of the Council’s approach to GPAI are understandable, the explicit inclusion of open-source GPAI undermines the Council’s ambitions. Open-source GPAI models are freely available for use by anyone, rather than being sold or otherwise commercialized. The proposed AIA draft will create legal liabilities, and thus a chilling effect, on open-source GPAI development. Open-source GPAI projects play two key roles in the future of GPAI: first, they disseminate power over the direction of AI away from well-resourced technology companies to a more diverse group of stakeholders. Second, they enable critical research, and thus public knowledge, on the function and limitations of GPAI models.
Very few institutions have the resources to train cutting-edge GPAI models, and it is reasonable to estimate that an individual GPAI model might cost many millions of dollars to develop, although each additional model an institution creates should cost much less. While some major technology companies open-source their models, such as Google’s BERT or OpenAI’s GPT-2, the corporate incentives to release these models will diminish over time as they become more commercialized.
There are already very few open-source models from non-profit initiatives, leaving the field dependent on large technology companies. The Allen Institute for AI released ELMo in 2018, but the organization announced earlier in July that it may be refocusing away from developing language models. Since mid-2020, a collaborative group of researchers called EleutherAI has managed to build open-source versions of large language models and scientific AI models. Most promising is the recent release of Bloom, a large language model developed by a broad collaboration of over 900 open science researchers and organized by the company Hugging Face. These efforts enable a far more diverse set of stakeholders to shape the future of GPAI, perhaps best exemplified by Bloom’s support of 46 human languages. Notably, Bloom was developed using a French government supercomputer, making it more exposed to the new regulations.
Beyond shaping the broad direction of GPAI research, the specific knowledge from open-source GPAI models contributes dramatically to the public interest. In a prior Brookings paper, I analyzed how open-source AI software speeds AI adoption, enables more fair and trustworthy AI, and advances the sciences that use AI—this is largely true for GPAI as well.
Further, the public availability of GPAI models helps identify problems and advance solutions in the societal interest. For instance, open-source large language models have shown how bias manifests in a model’s associations with specific words and have demonstrated how these models might be intentionally manipulated. Other papers use open-source GPAI models to compare their reliability in generating code, to build new benchmarks that gauge their understanding of language, or to measure the carbon cost of AI development. Especially as GPAI models become more common in impactful applications such as search engines and newsfeeds, as well as in factories or public utilities, understanding their limitations will be paramount.
This research leads not only to scientific advances, but also to better-informed criticism of how large tech companies use these models. For instance, understanding how GPAI models work generally can aid crowdsourced algorithmic audits, where groups of individuals collaborate to test the function of a corporate algorithmic system from the outside. A group of content creators recently used this approach to demonstrate that YouTube was unfairly demonetizing LGBTQ content.
Allowing for more open-source GPAI provides more transparency into the development of these models. Without open-source GPAI, the public will know less, and large technology companies will have more influence over the design and execution of these models. Notably, researchers at these companies do not have an entirely free hand: recall that criticisms of Google’s large language models were at the center of the conflict resulting in the termination of one of the company’s star researchers, Dr. Timnit Gebru.
Further, disincentivizing open-source GPAI could create a greater dependence on the corporate GPAI models that are hidden behind APIs. Since APIs restrict how a user can interact with a GPAI model, even a well-documented GPAI model that is only available through an API may be much harder to use safely than an open-source GPAI model.
Regulate risky and harmful applications, not open-source AI models
On net, open-source AI models deliver tremendous societal value. The Council’s treatment of GPAI (open-source and otherwise) is also a noteworthy departure from the AIA’s broader perspective, referred to as its ‘risk-based’ approach. In the original European Commission proposal, regulatory requirements were applied only to certain risky applications of AI (such as in hiring, facial recognition, or chatbots), rather than to the mere existence of a model. So, GPAI models would have been exempt until they were used for an application covered by the risk-based approach.
The Council’s draft of the AIA includes two exemptions that circumstantially apply to open-source GPAI models, but both have serious problems. The first exemption excludes all AI models that are only used for research and development from the entirety of the AIA. Yet open-source developers are most motivated by the idea of building things that people use, meaning this restriction decreases the incentive to contribute to open-source AI. The second exemption allows GPAI models to be exempted if their developers ban and manage to prevent misuse of the model. However, open-source developers cannot realistically monitor for and prevent misuse once they release a model. These exemptions will not sufficiently relieve open-source AI developers of regulatory responsibilities or legal liability.
As a result, open-source developers would be right to be concerned about how various EU member state regulators interpret the AIA. Further, it is not hard to imagine that, following a disastrous outcome of some application of a GPAI model, the company responsible attempts to deflect blame and legal responsibility by suing the open-source developers on whose work it built. These two sources of potential liability would create a significant incentive not to release open-source GPAI models, or possibly any software that contains a GPAI model.
In the end, the Council’s attempt to regulate open-source AI could create a convoluted set of requirements that endangers open-source AI contributors, likely without improving the use of GPAI. Open-source AI models deliver tremendous societal value by challenging the domination of GPAI by large technology companies and enabling public knowledge about the function of AI. The European Commission’s original approach, which exempted open-source AI until it is used for a high-risk application, would lead to far better outcomes for the future of AI.
Google is a general, unrestricted donor to the Brookings Institution. The findings, interpretations, and conclusions posted in this piece are solely those of the author and not influenced by any donation.