When OpenAI revealed the details of GPT-4, the strikingly capable AI language model that powers ChatGPT, it did so in a 100-page technical report. Yet the document omitted crucial information about how the model was built and how it operates.
“In the quest for advancing artificial intelligence, are we losing the spirit of openness and collaboration that once marked this scientific discipline?”
This omission was not an error but a deliberate choice. Major companies such as OpenAI keep the workings of their most valuable algorithms under wraps, partly out of concern that the technology could be misused, and just as much to protect the competitive advantage those algorithms provide.
The Growing Secrecy in AI Research
A recent study by researchers at Stanford University highlights the depth of secrecy surrounding cutting-edge AI systems like GPT-4. It suggests that we are on the brink of a fundamental shift in how AI research is pursued, one that could undermine the field’s scientific progress, weaken accountability, and make the technology less reliable and less safe.
The Stanford team evaluated ten different AI systems, primarily large language models, including commercial models like GPT-4 from OpenAI, PaLM 2 from Google, and Titan Text from Amazon, along with models offered by startups. They also examined open-source AI models that can be downloaded for free, such as the image-generation model Stable Diffusion 2 and Llama 2, which Meta released.
The Transparency Index
The researchers rated the transparency of these models against 13 criteria, including how open the developer is about the data used to train the model, the hardware involved, the software frameworks employed, and the project’s energy consumption. Surprisingly, no model scored more than 54 percent across all criteria on their transparency scale. Amazon’s Titan Text was judged the least transparent, while Meta’s Llama 2 was deemed the most open.
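The study’s precise scoring formula isn’t described here, but the underlying arithmetic is easy to picture: a model’s score can be read as the share of criteria it satisfies. Below is a minimal sketch in Python, assuming a simple pass/fail grade per criterion; the criterion names are hypothetical placeholders, not the study’s actual indicators.

```python
# Minimal sketch of a criteria-based transparency score.
# Assumptions: each criterion is graded pass/fail, and a model's score
# is the percentage of criteria it satisfies. The Stanford index's real
# indicators and weighting may differ; the names below are illustrative
# placeholders, not the study's actual rubric.

CRITERIA = [
    "training_data_disclosed",
    "hardware_disclosed",
    "software_frameworks_disclosed",
    "energy_consumption_disclosed",
    # ...the study scored 13 criteria in total
]

def transparency_score(disclosures: dict[str, bool]) -> float:
    """Return the percentage of criteria a model satisfies."""
    met = sum(disclosures.get(criterion, False) for criterion in CRITERIA)
    return 100 * met / len(CRITERIA)

# Hypothetical model: discloses training data and software frameworks,
# but not hardware or energy use -> satisfies 2 of 4 criteria = 50%.
example = {
    "training_data_disclosed": True,
    "software_frameworks_disclosed": True,
}
print(f"{transparency_score(example):.0f}%")  # prints 50%
```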
The Implications of AI Secrecy
As AI becomes increasingly influential, the study’s findings indicate it is also becoming more opaque. This stands in stark contrast to the last major AI boom, when transparency fueled substantial technological advances, including in speech and image recognition.
The study further suggests that models need not be kept so secretive for competitive reasons: on nearly every transparency measure, at least one leading model scored relatively high, which implies the others could become more open without losing ground to rivals.
AI: A Scientific Discipline or a Profit-Driven One?
The increasing secrecy risks transforming AI from a scientific discipline into a profit-driven one. Jesse Dodge, a research scientist at the Allen Institute for AI (AI2), warns of the trend: “The most influential players building generative AI systems today are increasingly closed, failing to share key details of their data and processes.”
AI2 aims to create a more transparent AI language model, OLMo, trained using data sourced from the web, academic publications, code, books, and encyclopedias. This data set, called Dolma, has been released under AI2’s ImpACT license. Once OLMo is ready, AI2 will share the working AI system and its underlying code, enabling others to build upon the project.
Transparency: A Path Towards Progress
Given AI models’ widespread deployment and potential risks, more openness could prove beneficial. As Dodge rightly points out, advancing science requires reproducibility. “Without access to these crucial building blocks of model creation, we will remain in a ‘closed,’ stagnating, and proprietary situation.”
The increasing secrecy in AI research raises critical questions about the future of this vital scientific field. As we continue to push the boundaries of AI, it’s crucial that we also uphold the principles of transparency and collaboration that have fueled its progress thus far.