The world that Bert built - 2022-6-11 The Economist

by Retireconomist 2022. 6. 18.

BERLIN AND SAN FRANCISCO “Foundation models” are greatly increasing the potential of artificial intelligence 

The “Good computer” which Graphcore, a British chip designer, intends to build over the next few years might seem to be suffering from a ludicrous case of nominal understatement. Its design calls for it to carry out 10^19 calculations per second. If your laptop can do 100bn calculations a second (which is fair for an average laptop) then the Good computer will be 100m times faster. That makes it ten times faster than Frontier, a behemoth at America’s Oak Ridge National Laboratory which came top of the most recent “Top500” list of powerful supercomputers and cost $600m. Its four-petabyte memory will hold the equivalent of 2trn pages of printed text, or a pile of A4 paper high enough to reach the Moon. “Good” hardly seems to cut it.

But the word is not being used as a qualitative assessment: it is honouring an intellectual heritage. The computer is named after Jack Good, who worked with Alan Turing as a codebreaker during the second world war and followed him into computer science. In 1965 Good wrote an influential, if off-the-wall, article about what the field could lead to: “Speculations concerning the first ultraintelligent machine”. Graphcore wants its Good computer to be that ultraintelligent machine, or at least to be a big step in its direction.

That means building and running artificial intelligence (AI) models with an eye-watering number of “parameters”, the coefficients applied to different calculations within the program. Four years ago the 110m parameters boasted by a game-changing model called BERT made it a big model. Today’s most advanced AI programs are 10,000 times larger, with over a trillion parameters. The Good computer’s incredibly ambitious specifications are driven by the desire to run programs with something like 500trn parameters.

One of the remarkable things about this incredible growth is that, until it started, there was a widespread belief that adding parameters to models was reaching a point of diminishing returns.
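The scale comparisons above can be checked with a few lines of arithmetic. The sketch below uses only figures quoted in the text; the variable names are illustrative, and the implied speed for Frontier is an inference from the "ten times faster" claim, not a number the article states.

```python
# Figures as quoted in the article; names are illustrative, not from any library.
GOOD_OPS_PER_SEC = 10**19            # design target for the Good computer
LAPTOP_OPS_PER_SEC = 100 * 10**9     # "100bn calculations a second"
BERT_PARAMS = 110 * 10**6            # BERT's 110m parameters
GOOD_TARGET_PARAMS = 500 * 10**12    # "something like 500trn parameters"

# How many times faster than the laptop is the Good computer?
laptop_speedup = GOOD_OPS_PER_SEC // LAPTOP_OPS_PER_SEC

# Assumption: "ten times faster than Frontier" implies this speed for Frontier.
implied_frontier_ops = GOOD_OPS_PER_SEC // 10

# "10,000 times larger" than BERT should land above a trillion parameters.
largest_today_params = BERT_PARAMS * 10_000

# Growth from BERT to the models the Good computer is meant to run.
bert_to_good_ratio = GOOD_TARGET_PARAMS / BERT_PARAMS

print(f"Speed-up over a laptop: {laptop_speedup:,}x")
print(f"Implied Frontier speed: {implied_frontier_ops:.0e} calculations per second")
print(f"10,000 x BERT: {largest_today_params:,} parameters")
print(f"500trn parameters is about {bert_to_good_ratio:,.0f} times BERT")
```

The first figure reproduces the 100m quoted in the text, and 110m multiplied by 10,000 gives 1.1trn, which is consistent with "over a trillion".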
Experience with models like BERT showed that the reverse was true. As you make such models larger by feeding them more data and increasing the number of parameters they become better and better. “It was flabbergasting,” says Oren Etzioni, who runs the Allen Institute for AI, a research outfit.

The new models far outperformed older machine-learning models on tasks such as suggesting the next words in an email or naming things which are present in an image, as well as on more recondite ones like crafting poetry. The verse created by the second iteration of Wu Dao (“Enlightenment”), a trillion-parameter model built at the Beijing Academy of Artificial Intelligence, is said to be excellent.

They also exhibited new capabilities their creators had not expected. These do not always sound impressive. Doing arithmetic, for example, seems trivial; 50-year-old pocket calculators could do it. But those calculators were specifically designed to that end. For the ability to say what the sum of 17 and 83 is to arise as an unlooked-for side-effect of simply analysing patterns in text is remarkable.

Other emerging properties border on the uncanny. It is hard to read some of the accounts of Economist covers made using Microsoft’s Florence model and GPT-3, a model made by OpenAI, without the feeling that they are generated by something with genuine understanding of the world (see panel).

Text-to-image processes are also impressive. The illustration at the top of this article was produced by using the article’s headline and rubric as a prompt for an AI service called Midjourney. The illustration on its third page is what the model made out of “Speculations concerning the first ultraintelligent machine”. Less abstract nouns give clearer representations; a page further on you will see “A woman sitting down with a cat on her lap”. Putting an artist’s name in the prompt produces an image with traits the model expects in images associated with that word.
The effects different artists’ names have can be seen in the online version of this story.

Emergent properties are linked to another highly promising feature: flexibility. Earlier generations of AI systems were good for only one purpose, often a pretty specific one. The new models can be reassigned from one type of problem to another with relative ease by means of fine-tuning. It is a measure of the importance of this trait that, within the industry, they are often called “foundation models”.

This ability to base a range of different tools on a single model is changing not just what AI can do but also how AI works as a business. “AI models used to be very speculative and artisanal, but now they have become predictable to develop,” explains Jack Clark, a co-founder of Anthropic, an AI startup, and author of a widely read newsletter. “AI is moving into its industrial age.”

The analogy suggests potentially huge economic impacts. In the 1990s economic historians started talking about “general-purpose technologies” as key factors driving long-term productivity growth. Key attributes of these GPTs were held to include rapid improvement in the core technology, broad applicability across sectors and spillover: the stimulation of new innovations in associated products, services and business practices. Think printing presses, steam engines and electric motors. The new models’ achievements have made AI look a lot more like a GPT than it used to.

Mr Etzioni estimates that more than 80% of AI research is now focused on foundation models, which is the same as the share of his time that Kevin Scott, Microsoft’s chief technology officer, says he devotes to them. His company has a stable of such models, as do its major rivals, Meta and Alphabet, the parents of Facebook and Google.
Tesla is building a huge model to further its goal of self-driving cars. Startups are piling in too. Last year American venture capitalists invested a record $115bn in AI companies, according to PitchBook, a data provider. Wu Dao shows that China is making the field a national priority.

Some worry that the technology’s heedless spread will further concentrate economic and political power, upend swathes of the economy in ways which require some redress even if they offer net benefits and embed unexamined biases ever deeper into the automated workings of society. There are also perennial worries about models “going rogue” in some way as they get larger and larger. “We’re building a supercar before we have invented the steering wheel,” warns Ian Hogarth, a British entrepreneur and co-author of the “State of AI”, a widely read annual report.

To understand why foundation models represent a “phase change in AI”, in the words of Fei-Fei Li, the co-director of Stanford University’s Institute for Human-Centred AI, it helps to get a sense of how they differ from what went before.

All modern machine-learning models are based on “neural networks”, programming which mimics the ways in which brain cells interact with each other. Their parameters describe the weights of the connections between these virtual neurons, weights the models develop through trial and error as they are “trained” to respond to specific inputs with the sort of outputs their designers want.

Net benefits

For decades neural nets were interesting in principle but not much use in practice. The AI breakthrough of the late 2000s/early 2010s came about because computers had become powerful enough to run large ones and the internet provided the huge amounts of training data such networks required. Pictures labelled as containing cats being used to train a model to recognise the animals was a canonical example.
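A minimal sketch of what weights developed "through trial and error" on labelled examples can look like, assuming the simplest possible case: a single artificial neuron with two weights and a bias. The article does not say which training procedure any particular model uses; stochastic gradient descent is used here as one common choice. The feature values (hypothetical "ear pointiness" and "whisker length" scores) and their labels are invented purely for illustration.

```python
import math

# Hypothetical toy data: (features, label), where label 1 means "cat".
EXAMPLES = [
    ((0.9, 0.8), 1),
    ((0.8, 0.9), 1),
    ((0.7, 0.6), 1),
    ((0.2, 0.1), 0),
    ((0.1, 0.3), 0),
    ((0.3, 0.2), 0),
]

def predict(weights, bias, features):
    """Return the neuron's estimated probability that the input is a cat."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=2000, learning_rate=0.5):
    """Adjust the weights a little after every guess (stochastic gradient descent)."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            # Guess, then compare the guess with the label.
            error = predict(weights, bias, features) - label
            # Nudge each parameter in the direction that shrinks the error.
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

weights, bias = train(EXAMPLES)
predictions = [round(predict(weights, bias, f)) for f, _ in EXAMPLES]
print("learned weights:", weights, "bias:", bias)
print("predictions:", predictions)
```

The two weights and the bias are this toy model's only parameters; the models discussed in the article have millions to trillions of them, but each one is a learned number of the same kind.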
The systems created in this way could do things that no programs had ever managed before, such as provide rough translations of text, reliably interpret spoken commands and recognise the same face when seen in different pictures.

Part of what allowed the field to move beyond these already impressive achievements was, again, more processing power. Machine learning mostly uses chips called “graphics processing units” (GPUs) developed for video games by such firms as Nvidia, not just because their processing power is cheap but also because their ability to run lots of calculations in parallel makes them very well suited to neural nets. Over the 2010s the performance of GPUs improved at an impressive rate.

The conceptual breakthrough needed to make full use of this power came about in 2017. In a paper entitled “Attention is all you need” researchers at Google and the University of Toronto described the novel software architecture to be used by Google’s BERT. They had thrown away all the mechanisms which worked on input data sequentially, mechanisms researchers had previously seen as essential; instead they just used a mechanism that looked at things all at once. This new approach meant that programs could “pay attention” to patterns they had learned were salient in a field of text, rather than having to work through it word by word.

Such models are trained using a technique called self-supervised learning, rather than with pre-labelled data sets. As they burrow through piles of text they hide specific words from themselves and then guess, on the basis of the surrounding text, what the hidden word should be. After a few billion guess-compare-improve-guess cycles this Mad-Libs approach gives new statistical power to an adage coined by J.R.
Firth, a 20th-century linguist: “You shall know a word by the company it keeps.”

It has since turned out that all these clever techniques can be applied to sequential data other than language, including pictures, videos and even large molecular data sets. Instead of guessing the next combination of letters, graphical models such as OpenAI’s DALL-E predict the next cluster of pixels.

The discovery that these models work better the bigger they get turned an exciting new approach into a breakthrough. The revelation came with the release of OpenAI’s BERT-like GPT-3 in 2020. Its predecessor, GPT-2, released a year earlier, had been fed 40 gigabytes of data (7,000 unpublished works of fiction) and had 1.5bn parameters. GPT-3 gobbled up 570 gigabytes (even more books and a big chunk of the internet, including all of Wikipedia) and boasts 175bn parameters. Its training required far more resources (see chart). But it handily outperformed GPT-2 on established tests and boasted skills for which its predecessor provided no precedent.

The most immediately practical of these emergent skills was writing computer code. Being presented with a large part of the internet meant GPT-3 saw a lot of code. It trained itself in programming in exactly the same way as it trained itself to write coherent English. Two services based on GPT-3, Codex and Copilot, now aim to turn programmers’ descriptions of what they want into the code which will do it. It doesn’t always work; our attempt to have Copilot program a web-based carousel of Economist covers to the strains of Wagner was a washout. But give it easily described, discrete and constrained tasks that can act as building blocks for grander schemes and things go better. Developers with access to Copilot on GitHub, a Microsoft-owned platform which hosts open-source programs, already use it to provide a third of their code when using the most important programming languages.

An AI explains an Economist cover: The image is of a cover of The Economist magazine. The image features a roller coaster in the clouds. The roller coaster is red and blue and has people on it. Above the roller coaster are the words “When the ride ends”. The roller coaster in the clouds is a metaphor for the economy. It’s a fun, exciting ride that everyone loves until it crashes down to earth, causing economic loss and recession. A market crash is the final nail in the coffin, leaving people reeling in its wake.

Bring on the stochastic parrots

Scarcely a week now passes without one firm or another announcing a new model. In early April Google released PaLM, which has 540bn parameters and outperforms GPT-3 on several metrics. It can also, remarkably, explain jokes. So-called multimodal models are proliferating too. In May DeepMind, a startup owned by Google, unveiled Gato, which, having been trained on an appropriate range of data, can play video games and control a robotic arm as well as generating text. Meta, for its part, has begun to develop an even more ambitious “World Model” that will hoover up data such as facial movements and other bodily signals. The idea is to create an engine to power the firm’s future metaverse.

This is all good news for the chipmakers. The AI boom is one of the things that have made Nvidia the world’s most valuable designer of semiconductors, with a market value of $468bn.

It is also great for startups turning the output of foundation models into products. Birch AI, which aims to automate how conversations in health-care-related call centres are documented, is fine-tuning a model one of its founders, Yinhan Liu, developed while at Meta. Companies are using GPT-3 to provide a variety of services.
Viable uses it to help firms sift through customer feedback; Fable Studios creates interactive stories with it; on Elicit it helps people directly answer research questions based on academic papers. OpenAI charges them between $0.0008 and $0.06 for about 750 words of output, depending on how fast they need the words and what quality they require.

Foundation models can also be used to distil meaning from corporate data, such as logs of customer interactions or sensor readings from a shop floor, says Dario Gil, the head of IBM’s research division. Fernando Lucini, who sets the AI agenda at Accenture, another big corporate-tech firm, predicts the rise of “industry foundation models”, which will know, say, the basics of banking or carmaking and make this available to paying customers through an interface called an API.

The breadth of the enthusiasm helps make general-purpose-technology-like expectations of impacts across the economy look plausible. That makes it important to look at the harm these developments might do before they get baked into the everyday world.

“On the dangers of stochastic parrots: Can language models be too big?”, a paper published in March 2021, provides a good overview of concerns; it also led to one of the authors, Timnit Gebru, losing her job at Google. “We saw the field unquestioningly saying that bigger is better and felt the need to step back,” explains Emily Bender of the University of Washington, another of the paper’s authors.

Their work raises important points. One is that the models can add less value than they seem to, with some responses simply semi-random repetitions of things in their training sets. Another is that some inputs, such as questions with nonsensical premises, trigger “hallucinations” rather than admissions of defeat.

And though they have no monopoly on algorithmic bias, the amount of internet data they ingest can give foundation models misleading and unsavoury hang-ups.
When given a prompt in which Muslims are doing something, GPT-3 is much more likely to take the narrative in a violent direction than it is if the prompt refers to adherents of another faith. Terrible in any model. Worse in models aimed at becoming foundations for lots of other things.

Avoid the Turing trap

Model-makers are developing various techniques to keep their AIs from going toxic or off the rails, ranging from better curation of training data to “red teams” that try to make them misbehave. Many also limit access to the full power of the models. OpenAI has users rate outputs from GPT-3 and then feeds those ratings back into the model, something called “reinforcement learning with human feedback”. Researchers at Stanford are working on a virtual scalpel, appropriately called MEND, meant to remove “bad” neurons.

Bias in the field’s incentives may be harder to handle. Most of those involved (technologists, executives and sometimes politicians) want more powerful models. They are seen as the path to academic kudos, gobs of money or national prestige. Ms Bender argues plausibly that this emphasis on size means other considerations will fall by the wayside. The field is focused on standardised benchmark tests (there are hundreds, ranging from reading comprehension to object recognition) and neglecting more qualitative assessments, as well as the technology’s social impact.

Erik Brynjolfsson, an economist at Stanford, worries that an obsession with scale and person-like abilities will push societies into what he calls a “Turing trap”.
He argues in a recent essay that this focus lends itself to the automation of human activities using brute computational force when alternative approaches could focus on augmenting what people do. And as more people lose their jobs their ability to bargain for a fair share of the benefits of automation will be stymied, leaving wealth and power in fewer and fewer hands. “With that concentration comes the peril of being trapped in an equilibrium in which those without power have no way to improve their outcomes,” he writes.

[Chart: The blessings of scale. AI training runs, estimated computing resources used, floating-point operations, log scale. Selected systems from 1950 to 2022, by type (drawing, language, vision, other): Theseus, ADALINE, Neocognitron, NetTalk, NPLM, BERT-Large, GPT-2, GPT-3, DALL-E, LaMDA, PaLM (540B). Sources: “Compute trends across three eras of machine learning”, by J. Sevilla et al., arXiv, 2022; Our World in Data.]

Some concentration is already evident: witness the roles played by Google and Microsoft both as developers of models and as owners of capacious clouds in which those and other models can run. No one can build a foundation model in a garage. Graphcore wants to sell Good computers for more than $100m. Somewhat self-servingly, Nvidia executives are already talking about models that will cost $1bn to train.

Some companies continue to make their models open-source, and thus freely available; BERT is one such, as is a 30bn-parameter version of a model from Meta. There is good research to be done at such scales. But it takes significant power to run even what counts as a small model today. The big ones can only really live in the cloud, which means researchers on the other side of their APIs cannot see into their guts.

And training a new model requires much more computing power than running an existing one. “Academic institutions can no longer keep up,” warns Anthropic’s Mr Clark.
OpenAI, founded as a non-profit with the goal of ensuring that AI developed in human-friendly ways, spawned a “capped-profit” company in which others can invest to raise the money it needed to keep working on big models (Microsoft has put in $1bn). Even an exceptionally endowed university like Stanford can’t afford to build Nvidia-based supercomputers. Its AI research institute is pushing for a government-funded “National Research Cloud” to provide universities with computing power and data sets so that the field does not end up entirely dominated by the research agendas of private companies.

Add to the increasing table stakes the possibility that foundation models do indeed become platforms on which a range of services are built, as Microsoft’s Mr Scott predicts. The history of computing suggests that the more users and developers gravitate towards a given platform, be it an operating system or a social network, the more attractive it becomes for other users and developers. Winners take, if not all, then most.

Foundation and empire

National interests may drive centralisation, too, up to a point. Experts say that China’s best foundation model is one which its Sesame Street-smart creators at Baidu have contrived to name Enhanced Representation through kNowledge IntEgration, or ERNIE. But it is Wu Dao which is being treated as a national champion. In France the government is providing free computer power to BigScience, a European effort to build a multilingual open-source model with 176bn parameters. Is it that far-fetched to imagine the development of a Modèle Republicain uniquely able to express all the subtleties of the French language and culture?

National security will also come into play. Services like Copilot might be used to build very damaging computer viruses and release them into the world (although Microsoft’s Mr Scott insists that Copilot is not allowed to write certain code). Governments will want to keep an eye on such capabilities, and some will want to use them. Foundation models which can think up strategies for corporate consultants may be able to do the same for generals; if they can create realistic video streams they can create misinformation; if they can create art they can create propaganda. “The spooks don’t want to depend on the private sector,” says Mr Clark. Just as big military powers insist on having their own means of launching satellites, so they will insist on having their own big brains.

Unless, that is, the brains in question have other ideas. Practically no AI experts think today’s models might actually become sentient. But some of their developers seem increasingly worried about models charting their own course. “Covid has taught us that exponentials move very quickly,” says Connor Leahy, one of the leaders of Eleuther, an ambitious open-source AI project. “Imagine if someone at Google builds an AI that can build better AIs, and then that better AI builds an even better AI, and it can go really quickly.”

Having a new form of intelligence on the planet might be dangerous even if it is never more than a tool and the people who controlled it were benign. The idea that there will always be uniquely human ways in which to be productive is attractive, but it cannot be proved. The coming decades could see further developments that reduce or eliminate the need for whole swathes of human activity, as Mr Brynjolfsson fears.

But there are some signs that such models can expand the realm of the human, rather than restrict it. Take the work of Reeps One, a British composer whose real name is Harry Yeff. He has trained a model by feeding it hours of his drum-machine-like beatbox vocalisations. The way that model reacts when it hears him in person allows what he calls a “conversation with the machine”. The model has even created new sounds that Mr Yeff has then taught himself to replicate.
“Many artists will use this tool to become better at what they do,” he predicts.

So might humble hacks. AI-based transcription tools have already made one particularly tiresome aspect of journalism far easier; could the same be true for others? To investigate, your correspondent asked a doctoral candidate at Stanford, Mina Lee, to fine-tune a GPT-3-based writing tool called “CoAuthor” using his most recent 100 articles for The Economist and a host of material on AI from one of the university’s courses. He then consulted this EconoBot off and on while writing this article. The experience was enlightening. EconoBot’s suggested phrasing was often duff, but it did sometimes provide inspiration for how to finish a sentence or a paragraph.

EconoBot itself seems to like the idea. Appropriately prompted with the phrase “Foundation models are great for journalists”, it had this to say:

They take away the heavy lifting of figuring out what a story is about. But sometimes, a good story needs more than just a foundation model. It needs something to kick off the writing process, something that sparks the journalist's imagination and offers a clear path towards writing. The best models, then, are not just predictive but also inspirational.
