Google’s PaLM 2 uses nearly five times more text data than predecessor

May 16, 2023

Google’s new large language model, which the company announced last week, uses almost five times as much training data as its predecessor from 2022, allowing it to perform more advanced coding, math and creative writing tasks, CNBC has learned.

PaLM 2, the company’s new general-use large language model (LLM) that was unveiled at Google I/O, is trained on 3.6 trillion tokens, according to internal documentation viewed by CNBC. Tokens, which are strings of words, are an important building block for training LLMs, because they teach the model to predict the next word that will appear in a sequence.

Google’s previous version of PaLM, which stands for Pathways Language Model, was released in 2022 and trained on 780 billion tokens.

While Google has been eager to showcase the power of its artificial intelligence technology and how it can be embedded into search, emails, word processing and spreadsheets, the company has been unwilling to publish the size or other details of its training data. OpenAI, the Microsoft-backed creator of ChatGPT, has also kept secret the specifics of its latest LLM called GPT-4.

The reason for the lack of disclosure, the companies say, is the competitive nature of the business. Google and OpenAI are rushing to attract users who may want to search for information using conversational chatbots rather than traditional search engines.

But as the AI arms race heats up, the research community is demanding greater transparency.

Since unveiling PaLM 2, Google has said the new model is smaller than prior LLMs, which is significant because it means the company’s technology is becoming more efficient while accomplishing more sophisticated tasks. PaLM 2, according to internal documents, has 340 billion parameters, an indication of the complexity of the model. The initial PaLM had 540 billion parameters.

Google didn’t immediately provide a comment for this story.

Google said in a blog post about PaLM 2 that the model uses a “new technique” called “compute-optimal scaling.” That makes the LLM “more efficient with overall better performance, including faster inference, fewer parameters to serve, and a lower serving cost.”

In announcing PaLM 2, Google confirmed CNBC’s previous reporting that the model is trained on 100 languages and performs a broad range of tasks. It’s already being used to power 25 features and products, including the company’s experimental chatbot Bard. It’s available in four sizes, from smallest to largest: Gecko, Otter, Bison and Unicorn.

PaLM 2 is more powerful than any existing model, based on public disclosures. Facebook’s LLM called LLaMA, which it announced in February, is trained on 1.4 trillion tokens. The last time OpenAI shared the training size behind ChatGPT was with GPT-3, when the company said the model was trained on 300 billion tokens. OpenAI released GPT-4 in March, and said it exhibits “human-level performance” on many professional tests.

LaMDA, a conversation LLM that Google introduced two years ago and touted in February alongside Bard, was trained on 1.5 trillion tokens, according to the latest documents viewed by CNBC.

As new AI applications quickly hit the mainstream, controversies surrounding the underlying technology are getting more spirited.

El Mahdi El Mhamdi, a senior Google Research scientist, resigned in February over the company’s lack of transparency. On Tuesday, OpenAI CEO Sam Altman testified at a hearing of the Senate Judiciary subcommittee on privacy and technology, and agreed with lawmakers that a new system to deal with AI is needed.

“For a very new technology we need a new framework,” Altman said. “Certainly companies like ours bear a lot of responsibility for the tools that we put out in the world.”

— CNBC’s Jordan Novet contributed to this report.

This story originally appeared on CNBC
