Quantity can have a quality of its own for language models
The recent advances in language modeling with GPT-3 got me thinking: at what point does a quantitative change in a machine's language generation ability cross a boundary into a qualitative change in our assessment of its intelligence or creativity?
When a sand heap met Eubulides
How many grains of sand can you take from a sand heap before it's no longer a heap? Or more personally, how many hairs on your head can you afford to lose before you're bald, or pounds before you're thin? Maybe it's fun to annoy someone by asking one of these Sorites paradoxes, attributed to the Greek philosopher Eubulides, precisely because they arise when language is imprecise. They expose that words we commonly use without hesitation, like heap, bald, thin, or even intelligent and creative, words we think we know the exact meaning of, actually have boundaries that can be quite vague once you really start to dig into them.
You can think about what's going on here as a quantitative change, in grains of sand, hairs, or pounds, leading to a qualitative change that ascribes a property to something, like being a heap, bald, or thin.
Hegel developed an explicit relation between quality and quantity in Science of Logic:
[W]e have seen that the alterations of being in general are not only the transition of one magnitude into another, but a transition from quality into quantity and vice versa, a becoming-other which is an interruption of gradualness and the production of something qualitatively different from the reality which preceded it — Hegel
The idea was then taken further by Marx and Engels into the law of passage of quantitative changes into qualitative changes, and finally arrived in the most familiar and widely misattributed form you’ve likely heard:
Quantity has a quality of its own — Various
While it's not what any of them had in mind, at what point does a quantitative change in a machine's language generation ability cross a boundary into a qualitative change in our assessment of its intelligence or creativity?
Language Models and GPT-3
The release of GPT-3 from OpenAI has shown that an incredibly wide variety of language generation applications — from writing fiction to poems to computer code — can be performed by a fairly typical language model scaled up and trained on the largest amount of data yet.
Language models have been used in the NLP community for decades, becoming increasingly more complicated and relying on more and more data. A language model is a technical term for a mathematical model of language produced by an algorithm that uses existing written text to calculate the probabilities of words appearing next to each other, specifically how likely the next word or sequence of words is given a previous sequence of words. After training the language model by computing these probabilities, the model can be used to generate new text: start with a word or phrase as a prompt, and continue calculating the most probable next word for as long as you want.
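To make that loop concrete, here is a minimal sketch in Python of the simplest kind of language model: a bigram model that only looks at the previous word. The toy corpus, function names, and greedy pick-the-most-probable-word decoding are illustrative assumptions; GPT-3 replaces the count table with a 175-billion-parameter neural network conditioned on long contexts, but the train-then-generate loop is the same in spirit.

```python
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Estimate P(next word | previous word) from raw text by counting."""
    words = text.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    # Normalize the counts into conditional probabilities.
    return {
        prev: {w: n / sum(following.values()) for w, n in following.items()}
        for prev, following in counts.items()
    }

def generate(model, prompt, length=10):
    """Extend the prompt by repeatedly picking the most probable next word."""
    words = prompt.split()
    for _ in range(length):
        dist = model.get(words[-1])
        if not dist:  # the last word never appeared mid-corpus; stop early
            break
        words.append(max(dist, key=dist.get))
    return " ".join(words)

# Toy corpus; real models train on hundreds of billions of words.
corpus = "the heap of sand is large and the heap of hair is small"
model = train_bigram_model(corpus)
print(generate(model, "the", length=5))  # -> "the heap of sand is large"
```

A real system would sample from the distribution rather than always taking the single most probable word, but the sketch captures the mechanics.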
When built well, they generate syntactically fluent language, although it used to be fairly easy to tell when text was generated from a model — it was clunky, repetitive, and lost coherence within at most a few sentences.
The algorithm used to build GPT-3 is still trained only by predicting the next sequence of words, but it is doing so for a model with 175 billion parameters — several orders of magnitude more than most previous language models — and on a huge amount of data taken directly from the internet (i.e. produced by us): a very impressive engineering feat.
Fluency, fool me once
The most striking aspect of the language produced by GPT-3 is how fluent it is across a variety of genres, how well it stylistically adapts to the given prompt, and how long the coherence of the generated text lasts.
It's natural to associate the fluency of language with how intelligent the process that generated the language must be. In other words, it's hard to separate thinking up something to say from being able to say it well. What to say from how to say it. It's a human bias that helps explain why we're taken in by a smooth talker before realizing there's little substance, or, vice versa, why we assume a lack of cognitive capability when someone can't express themselves well.
What to say starts by purposefully selecting some concept to represent in language. Whether the concept is an abstract idea in your mind or a spreadsheet table, it is a form of data, and you want to transform it into language as correctly and faithfully as possible. If you express your idea in language well enough to allow the reader to interpret what you’re saying correctly, your language has sufficient adequacy or accuracy.
How to say it comes back to fluency: whether the language used is understandable, regardless of whatever it is you're saying. You can write an exceptionally fluent essay on bees, but if you were trying to give someone a quinoa recipe, it's completely inadequate. A process, whether human or machine, can generate fluent language describing Mars or Elon Musk, and it doesn't have to have any connection to reality or truth to be comprehensible.
Fluency without adequacy, that’s easy to imagine. Fluency is on the surface, it’s visible. It can be untethered from trying to represent anything specific and still come off fine.
What’s harder to imagine is adequacy without fluency. For me to assess the adequacy of what you’re saying, I need to know that you’re trying to give me a recipe, and not talk about bees. Or I need to trust that whoever (or whatever) wrote the facts about Mars I’m reading knew what they (or it) was talking about. In either case, I need to be able to create an interpretation of the concept and data you’re relaying through language. But in order for me to create an interpretation, you need to first be coherent enough.
Adequacy requires selecting something specific to represent, and being able to compare how well it’s represented. I think that’s why fluency is both easier to artificially manufacture and gives the impression of adequacy. Our cognitive bias is to default to truth. If language is fluent, we understand it; if we understand it, we create an interpretation of what is being said; if we create an interpretation, we assume it’s accurately representing the concept and data it set out to represent. Why else would someone take the time to write it, right? :)
Maybe I'm a language model too
When we write or speak, words usually come out of our mouths or our hands without any conscious effort about how they got there. We have an unconscious process for generating the next word; are we similar to a language model, finding the most probable next word from our prior experience with language? Is our ability to write not only fluently but adequately just a matter of having several orders of magnitude more parameters in our brains than the current language models, and having seen lots and lots of text?
Certainly the things we say are not always correct, i.e. what we say is not adequate to what we mean, whether we think it is or not; people make mistakes. I misremember and make things up; how is that different from what the language model is doing?
Adaptability is where it's at
Going one step further, the most impressive part of GPT-3 is likely not the fluency of the language it generates, but the ease with which it can perform different tasks with only a few prompting examples. Most machine learning models are trained to perform a specific, discrete task, like predicting the sentiment of a restaurant review, or answering trivia questions, but GPT-3 has shown an impressive ability to perform many different kinds of language generation without being specifically trained to do so.
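As a hedged sketch of what that looks like in practice: the task is specified entirely inside the text of the prompt, by example, rather than by retraining the model. Everything below, the formatting, the labels, and the commented-out `complete` call, is a hypothetical illustration, not OpenAI's actual API.

```python
def build_few_shot_prompt(examples, query):
    """Lay out a few input/output pairs, then a new input for the model to finish."""
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)

examples = [
    ("The pasta was cold and the waiter ignored us.", "negative"),
    ("Best quinoa bowl I have had in years.", "positive"),
]
prompt = build_few_shot_prompt(examples, "The dessert alone is worth the trip.")
print(prompt)

# The model is then asked to simply continue the prompt; if it has picked up
# the pattern, the most probable next word is "positive". Hypothetical call:
#   completion = complete(prompt)  # -> "positive"
```

The same model, with no retraining, can be pointed at trivia questions, poems, or computer code just by swapping the examples in the prompt, which is what makes the adaptability so striking.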
Adaptability is a core human trait. We all build models of the world in our minds — of your house, your friends, yourself. You use the model of the world you’ve built from all your prior experience to go into novel situations and make reasonable decisions. Not only do you not forget how to brush your teeth just because the color, size, or shape of the toothbrush changed, but if you have the intent to brush your teeth and there’s no toothbrush around, you can create something that will act like a toothbrush from completely different materials.
Adaptability is closely tied to creativity, the ability to create something new and worthwhile. Adequacy is critical in a legal memo or biography, and it’s relatively easy to judge the adequacy by comparing these fact-based writings to some reality, but what about fiction, poetry, and other forms of creative writing? How useful or measurable is adequacy there? Is fluency sufficient for creativity?
The language produced by even the simplest language models from decades ago can be said to create something new. Maybe that's sufficient to say any such process is being creative, but that doesn't seem like a satisfactory answer.
If you didn’t know a word of French, but randomly picked words from a French dictionary until you filled 100 pages, and happened to produce a coherent work of fiction, were you being creative? Taken a bit further, if you have a monkey, a typewriter, and infinite time, eventually it will type out any book you can think of, but it’s unlikely you’d call that creative.
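To put a rough number on why nobody waits for the monkey, here is a back-of-the-envelope calculation, assuming a 27-key typewriter (26 letters plus a space bar) and uniformly random, independent keystrokes:

```python
phrase = "call me ishmael"      # 15 characters, spaces included
keys = 27                       # 26 letters plus the space bar
p = (1 / keys) ** len(phrase)   # chance of typing the phrase in one attempt
print(f"{p:.2e}")               # about 3.39e-22
```

Fifteen characters is already roughly a one-in-three-sextillion shot per attempt; a coherent 100-page work is unimaginably less likely, which is why producing one by pure chance doesn't feel like creativity.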
Where do we draw the line? It seems like we need to look at the worthwhile aspect of creativity, but how do we measure whether a work of fiction is worthwhile? (that sounds awfully close to asking what the purpose of art is…)
Intent to the rescue?
It seems like the adequacy and creativity questions of language models, including GPT-3, come down to introspection and intent. A typical flow in human conversation can be seen as four steps. First, you intentionally choose what to say (let’s leave free will out of this for now). You start with an intent: a concept or idea of what you want to say. Second, you choose words to transform that intent or concept into language. Third, the listener hears or reader reads the words. Fourth, they interpret the words into a concept in their mind.
You use language for a reason: to transform something conceptual from one form into words. That concept can take the form of a sales report, where your words reference customers, transactions, dollars, profits and losses; or it can be a creative idea for a novel, where you imagine a character, and the words describe a person, their hair color, how they walk, their own thoughts and concepts (I know, meta).
The point is, when you think of words, they represent something in the real world, they refer to objects, whether real or imagined. Words are connected to your other perceptions of the world, and the actions you can take.
When GPT-3 produces sequences of characters, that’s all they are, even though we see them as meaning-carrying words. For GPT-3, the words it produces do not refer to any concept, intent it is trying to represent, or action it is trying to take. There is no concept behind the words. When it produces a poem about Elon Musk on Mars, it has no concept of who Elon Musk is or what Mars or a poem are; no connection to any objects.
Instead of the four steps above, when you read text produced by a language model like GPT-3, something differs in a very important way. The language model doesn't have its own intent. It's not an agent acting in the world. A human has to start by prompting GPT-3 with the seed text. The language model takes your concept, which you transformed into words (so you're still doing the first two steps), and continues the second step by generating the words most probable to occur next in the sequence.
The human prompter seems more analogous to a teacher assigning an essay topic that the student (GPT-3) needs to write about. We as humans are still reading and interpreting a meaning, because for us words actually have meaning and refer to objects, but those references were not intended by the model. The fact that we can interpret them is a result of the fluency, not adequacy.
Even for creative writing, there’s a reason why someone wrote a poem or a novel, and one or more concepts they were trying to express. Maybe we need to separate out creativity into the process of introspection, the effort that goes into the proper translation of a concept into language, and the final linguistic expression.
GPT-3 has certainly produced writing that is funny, sarcastic, or makes you think, so it would qualify for the third form of creativity. But since it has no understanding of the words, it is not trying, through any intent of its own, to be funny, sarcastic, or to make you think. Those are your interpretations, and could even be a result of GPT-3 reusing long sequences of words it was previously given, taken directly from people's writings on the internet.
Many of the examples of its writing are also cherry-picked by humans. Maybe it would be unfair to do otherwise; after all, many human attempts at writing fail. But are we then applying our human standards and judging, or are we choosing a biased sample produced by a small percentage of monkeys?
The same lack of intent to be funny or to make you think can be found in writing produced by people, so perhaps the effect on the reader is what matters. If some future language model can produce thousands of novels a day whose storylines and characters resonate with readers and sell, despite the model not having any intent to do so, maybe it will seem quaint that I think something critical is missing on the writer's side.
Intelligence is as intelligence does
It's clear that with GPT-3's size and training data, it has achieved language generation capabilities that force us to sharpen some of the questions we need to answer about machine intelligence. It has taken fluency, adaptability, and perhaps even a form of creativity to a level we have not seen before in language models. While some qualitative transitions in our interpretations of its writing seem justified, it should not be seen as having qualitatively crossed the boundary into the general type of intelligence we associate with people. Without the ability to connect the words it produces to concepts in the world beyond other words, how can it be said to understand, and without understanding what it's saying, how can something be intelligent?
If in the future this mathematical model of language is coupled with other types of models for vision, action, and other perceptions, we may have something that does have concepts, and that imbues its language with adequacy. We may also need to be more exact in our definition of what "intelligent" or "intelligence" means, and define different kinds of intelligence. There has certainly been continuous progress in the biological sciences in understanding our own and other animals' cognitive behaviors, abilities, and limitations. But if the problems of precisely defining what AI is over the last 70 years, or far simpler-seeming terms like heap, are any indication, precision in our definition may be a moving target. Maybe there's a range of behavior where it's truly indeterminate whether something is exhibiting intelligence or creativity. Or maybe the meaning comes down to how we use the words, and what function they serve in our everyday language. If we think of something as intelligent or creative, then it is.
Originally published at machineopinings.com on August 8, 2020.
Translated from: https://medium.com/machine-opinings/quantity-can-have-a-quality-of-its-own-for-language-models-fe5e665869a3