Don’t expect large language models like the next GPT to be democratized

This article is part of our coverage of the latest in AI research

In early May, Meta released Open Pretrained Transformer (OPT-175B), a large language model (LLM) that can perform various tasks. Large language models have become one of the most popular areas of artificial intelligence research in recent years.

OPT-175B is the latest entrant in the LLM arms race triggered by OpenAI’s GPT-3, a deep neural network with 175 billion parameters. GPT-3 showed that LLMs can perform many tasks without undergoing additional training and seeing only a few examples (zero- or few-shot learning). Microsoft later integrated GPT-3 into several of its products, demonstrating not only the scientific but also the commercial promise of LLMs.

What makes OPT-175B unique is Meta’s commitment to “openness,” as the model’s name implies. Meta has made the model (with some caveats) available to the public. It has also released a wealth of details about the training and development process. In a post published on the Meta AI blog, the company described the release of OPT-175B as “Democratizing Access to Large-Scale Language Models.”

Meta’s move toward transparency is commendable. However, the race for large language models has reached a point where it can no longer be democratized.

Meta’s release of OPT-175B has several key features. It contains both the pre-trained models and the code needed to train and use the LLM. Pre-trained models are especially useful for organizations that don’t have the computing power to train the model themselves (training neural networks is much more resource-intensive than running them). Releasing them also helps reduce the massive carbon footprint caused by the computing power needed to train large neural networks.

Like GPT-3, OPT is available in different sizes ranging from 125 million to 175 billion parameters (models with more parameters have more learning power). At the time of writing, all models up to OPT-30B are available for download. The full 175 billion-parameter model will be made available to selected researchers and institutions who complete an application form.
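To make that concrete, here is a minimal sketch of running one of the smaller, publicly downloadable OPT checkpoints for inference. It assumes the weights are available through the Hugging Face transformers library under identifiers such as facebook/opt-1.3b; the model size and generation settings are illustrative choices, not Meta’s reference setup.

```python
# A minimal sketch of loading a smaller OPT checkpoint for inference.
# Assumes the weights are hosted on the Hugging Face Hub under "facebook/opt-*"
# and that the transformers and torch packages are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-1.3b"  # illustrative; smaller sizes fit on a single GPU or even a CPU
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Large language models are"
inputs = tokenizer(prompt, return_tensors="pt")

# Running (not training) the model: generate a short continuation.
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The point of the sketch is the asymmetry the article describes: downloading and running a pre-trained checkpoint is within reach of a single researcher, while producing those weights in the first place is not.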

According to the Meta AI blog, “To maintain integrity and prevent abuse, we are releasing our model under a non-commercial license to focus on research use cases. Access to the model is granted to academic researchers; those who are affiliated with organizations in government, civil society and academia; along with industrial research labs around the world.”

In addition to the models, Meta has released a full logbook with a detailed technical timeline of the development and training process of the large language model. Published papers usually only contain information about the final model. The log provides valuable insights into “how much compute was used to train OPT-175B and the human overhead required when the underlying infrastructure or the training process itself becomes unstable at scale,” Meta said.

In its blog post, Meta states that large language models are usually accessible only through “paid APIs” and that limited access to LLMs “has limited researchers’ ability to understand how and why these large language models work, hindering progress on efforts to improve their robustness and mitigate known problems such as bias and toxicity.”

This is a shot at OpenAI (and, by extension, Microsoft), which released GPT-3 as a black-box API service instead of making the model’s weights and source code available to the public. One of OpenAI’s stated reasons for not releasing GPT-3 was to limit misuse and the development of harmful applications.

Meta believes that by making the models available to a wider audience, the research community will be better able to study and prevent any harm they could cause.

Here’s how Meta describes the effort: “We hope OPT-175B will bring more voices to the frontier of large language model creation, help the community collectively design responsible release strategies, and add an unprecedented level of transparency and openness to the development of large language models in the field.”

It is worth noting, however, that “transparency and openness” are not the same as “democratizing large language models.” The costs of training, configuring, and running large language models remain prohibitive and are likely to increase in the future.

According to Meta’s blog post, the researchers have managed to significantly reduce the cost of training large language models. The company says the model’s carbon footprint has been reduced to one-seventh of GPT-3’s. Experts I spoke to previously estimated the training cost of GPT-3 at up to $27.6 million.

This means that OPT-175B will still have cost several million dollars to train. Fortunately, the pre-trained models eliminate the need for others to repeat the training, and Meta says it will provide the codebase needed to train and deploy the full model “with just 16 NVIDIA V100 GPUs.” That is the equivalent of an Nvidia DGX-2, which costs about $400,000, no small sum for a cash-strapped research lab or an individual researcher. (According to a paper that provides more details about OPT-175B, Meta trained its own model on 992 80GB A100 GPUs, which are significantly faster than the V100.)

The Meta AI logbook further confirms that training large language models is a very complicated task. OPT-175B’s timeline is full of server crashes, hardware failures, and other complications that required highly skilled technical staff to resolve. The researchers also had to restart the training process several times, adjust hyperparameters, and change loss functions. All of this adds extra costs that small labs can’t afford.

Language models such as OPT and GPT are based on the transformer architecture. One of the key features of transformers is their ability to process long sequences of data (e.g., text) in parallel and at scale.
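To illustrate that parallelism, the sketch below implements plain scaled dot-product self-attention in PyTorch: every token in the sequence attends to every other token through a single batched matrix product, rather than being processed step by step. This is a toy illustration of the mechanism, not the OPT or GPT implementation.

```python
# Toy scaled dot-product self-attention (illustrative only, not Meta's code).
# The entire sequence is processed at once with batched matrix products,
# which is what lets transformers parallelize over long inputs at scale.
import math
import torch

def self_attention(x, w_q, w_k, w_v):
    # x: (batch, seq_len, d_model) -- the whole sequence in one tensor
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = scores.softmax(dim=-1)   # each token attends to all tokens at once
    return weights @ v                 # (batch, seq_len, d_model)

d_model = 64
x = torch.randn(2, 16, d_model)        # a toy batch of two 16-token sequences
w_q, w_k, w_v = (torch.randn(d_model, d_model) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([2, 16, 64])
```

Because the computation reduces to large matrix multiplications, it maps naturally onto GPUs, and adding more layers and wider matrices is largely a matter of adding more hardware.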

In recent years, researchers have shown that adding more layers and parameters to transformer models improves their performance on language tasks. Some researchers believe that reaching higher levels of intelligence is just a matter of scale. Accordingly, cash-rich research labs such as Meta AI, DeepMind (owned by Alphabet), and OpenAI (backed by Microsoft) are racing to create bigger and bigger neural networks.

Last year, Microsoft and Nvidia created Megatron-Turing NLG (MT-NLG), a language model with 530 billion parameters. Last month, Google introduced the Pathways Language Model (PaLM), an LLM with 540 billion parameters. And there are rumors that OpenAI will release GPT-4 in the coming months.

However, larger neural networks also require greater financial and technical resources. And while larger language models will have new bells and whistles (and new failures), they will inevitably centralize power in the hands of a few wealthy corporations by making it even harder for smaller research labs and independent researchers to work on large language models.

On the commercial side, large tech companies will have an even greater advantage. Running large language models is very expensive and challenging. Companies like Google and Microsoft have dedicated servers and processors that allow them to run these models at scale and profitably. For smaller companies, the overhead of running their own version of an LLM like GPT-3 is prohibitive. Just as most businesses use cloud hosting services instead of setting up their own servers and data centers, turnkey systems such as the GPT-3 API will gain traction as large language models become more popular.
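For a sense of why turnkey APIs are attractive, the sketch below shows what consuming a hosted model looked like with the GPT-3-era OpenAI Python client: a few lines of code and an API key, with all the GPU infrastructure on the provider’s side. The engine name and parameters are illustrative assumptions, not a specific recommendation.

```python
# A sketch of a "turnkey" LLM integration: the provider runs the model and
# bills per token, so the client needs no GPUs at all.
# Assumes the GPT-3-era openai Python package and an API key in the environment.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.Completion.create(
    engine="text-davinci-002",  # illustrative engine from the GPT-3 lineup
    prompt="List the trade-offs of hosting a large language model in-house:",
    max_tokens=100,
    temperature=0.7,
)
print(response.choices[0].text)
```

The convenience is real, but it is also exactly the dependency the article describes: the model, the hardware, and the pricing all remain under the provider’s control.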

This, in turn, will further centralize AI in the hands of major tech companies. More AI research labs will have to partner with big tech to fund their research. And this gives big tech more power to set the future direction of AI research (which will likely align with its financial interests). This can come at the expense of research areas that have no short-term return on investment.

The bottom line is that, even as we celebrate Meta’s move to bring transparency to LLMs, we should not forget that large language models are, by their nature, undemocratic and favor the very companies that release them.

This article was originally written by Ben Dickson and published at TechTalks, a publication that examines trends in technology, how they affect the way we live and do business, and the problems they solve. But we also discuss the dark side of technology, the darker implications of new tech, and what we need to look out for. You can read the original article here.