Future #67: It takes two rainbows to jump from Hawaii to seventeen
This week, we explore many aspects of machine intelligence: What does it mean, philosophically, for computers to create original texts and imagery? How do we define humanness, and how might that change in response to new kinds of creative production by AI systems? How might computational systems be taught “common sense”? And are algorithmic recommendation systems affecting our own use of language?
—Alexis & Matt
1: To speak, perchance to dream
This was quite a week for conversations about large language models (LLMs) and what they mean for humanity. The discussion popped up in Steven Johnson’s longform New York Times Magazine piece, as well as in Tobias Rees’s more philosophy-oriented deep dive in Daedalus. And finally, Emily Bender authored a detailed critique of Johnson’s Times article, arguing that it fell prey to AI boosterism and elided many important distinctions about the capabilities of LLMs.
One of the core questions at the heart of these pieces is what it means for computers to compose language that, in many cases, is indistinguishable from human-generated text. Bender’s perspective is that LLMs are simply “stochastic parrots”. She argues that true comprehension and mastery of language involve a relationship between linguistic form and experience of the world outside the language model. AI systems have only the signifiers with no referents, and the appearance of intelligence is therefore mere mimicry.
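The “stochastic parrot” critique can be made concrete at miniature scale: even a trivial bigram model produces plausible-looking word sequences purely from co-occurrence statistics, with no referent behind any word. A toy sketch (the corpus and code are our own illustration, not anything from the articles):

```python
import random
from collections import defaultdict

# A tiny bigram "language model": pure surface statistics, no meaning.
# The corpus is an invented toy example.
corpus = "the cat sat on the mat and the cat saw the dog".split()

# Record which words follow which in the training text.
model = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    model[prev].append(nxt)

def generate(start: str, length: int, seed: int = 0) -> str:
    """Emit a sequence by repeatedly sampling an observed next word."""
    random.seed(seed)
    words = [start]
    for _ in range(length):
        options = model.get(words[-1])
        if not options:
            break
        words.append(random.choice(options))
    return " ".join(words)

print(generate("the", 8))
```

Every transition in the output is statistically licensed by the corpus, so the result looks vaguely like English, yet the model has no concept of what a cat or a mat is. An LLM is this idea scaled up enormously, which is precisely what the debate above is about.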
Rees, in his essay, actually begins with Bender’s analysis (these three authors all quote one another in a referential ouroboros). He states that, while it’s a strong argument, it only holds true as long as the underlying ontology does. He then proceeds on a historical journey to explore: “When and under what circumstances did the idea that language is about meaning, and that only existentially situated subjects can have words, first emerge? What sets this concept apart from prior conceptualizations?”
Rees comes to the conclusion that we are in the midst of a fairly profound restructuring of our worldview, which was previously grounded in Enlightenment concepts of human as distinct from nature and machines (with language as a key point of differentiation):
“By undoing the formerly exclusive link between language and humans, GPT-3 created the condition of the possibility of elaborating a much more general concept of language: as long as language needed human subjects, only humans could have language. But once language is understood as a communication system, then there is in principle nothing that separates human language from the language of animals or microbes or machines. A bit as if language becomes a general theme and human language a variation among many other possible variations.”
→ On NYT Magazine on AI: Resist the Urge to be Impressed | Emily Bender
→ AI is mastering language: should we trust what it says? | The New York Times
2: Leg booty, spicy eggplant, and other codewords
Not only are computational systems able to generate language in new ways, they are also transforming the way that humans speak. History is rife with examples of coded language used to communicate within subcultures, from criminal slang meant to evade police detection to secret terms for homosexuality in oppressive social contexts. That behavior usually develops in response to fear of punishment by the government or ostracism by a peer group. But now we’re seeing the same kind of linguistic phenomenon emerge in response to algorithmic power structures. This Taylor Lorenz article documents the rise of “algospeak”: creators using code words to avoid being downranked or demonetized by platforms. Coded terms are used for everything from sexuality (“le dollar bean” for lesbian) to the pandemic (“panini” or “panda express”), as creators continuously try to reverse engineer platforms’ recommendation logic and uncover which words will trip the algorithm.
We’re always fascinated by the way that humans and technology are continuously co-evolving. But we see more and more of these kinds of adaptations, where instead of designing technology to support desired behavior, we are building clumsy systems that force us to contort our behavior in unwanted ways. We know that content moderation is a difficult problem to solve, but in most cases, blacklisting words and phrases without consideration of context is only going to lead to more behaviors like these.
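To see why context-free blacklisting invites exactly this arms race, consider a minimal sketch of such a filter (the word list and sample posts are invented for illustration; no platform’s actual rules are being described):

```python
# A naive, context-free blacklist filter of the kind critiqued above.
# The blacklist and example posts are invented for illustration only.
BLACKLIST = {"dead", "lesbian", "covid"}

def is_flagged(post: str) -> bool:
    """Flag a post if any blacklisted word appears, regardless of context."""
    words = post.lower().split()
    return any(word.strip(".,!?") in BLACKLIST for word in words)

# A benign use of a blacklisted word is still flagged...
print(is_flagged("My phone battery is dead again"))      # True
# ...while "algospeak" substitutions sail straight through.
print(is_flagged("Check out this le dollar bean film"))  # False
print(is_flagged("I caught the panini last week"))       # False
```

The filter punishes innocuous speech while missing every coded substitute, so creators keep probing for new euphemisms and the filter keeps chasing them.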
→ Internet ‘algospeak’ is changing our language in real time, from ‘nip nops’ to ‘le dollar bean’ | The Washington Post
3: Androids draw electric sheep
If you’ve been on social media in the last few weeks, you may have seen friends raving about DALL-E 2, a text-to-image generator that can take phrases like “a dog wearing a beret and a black turtleneck” and create photo-realistic images. DALL-E 2 is built on the same tech as GPT-3, namely a neural network that understands how to describe pictures, just run in reverse. (I’m deeply oversimplifying this, of course.)
Given the dangers that unrestricted image generation could create, the team behind DALL-E has put in some thoughtful safeguards. Access has been limited to a small group of researchers who have been asked not only to test the system’s skills, but also to point out biases in the results it generates. The training data excluded some objectionable imagery, and the system won’t create faces of actual people, though as with all training datasets it was impossible to eliminate all human bias. (For example, when researchers removed training images of a sexual nature, they found that output images underrepresented women.)
While imperfect, these protections are more thoughtful than we’ve seen with releases of other large ML tools, and indicate that researchers are thinking more critically about the harm their systems could perpetuate. That leaves us to enjoy images like “a bowl of soup that looks like a monster, knitted out of wool” with slightly less guilt.
4: From mimicry to meaning?
It’s a common lesson programmers learn early, and learn often: a computer will do exactly what you tell it to, no matter what you might mean by your instructions. Computing is quite literal, built on top of simple binary numbers and math, even though it can be used to express far more nuance and meaning than 1 and 0 would imply. Given the advances in machine learning, could computers someday learn to understand intent and nuance?
This basic question has plagued computer scientists since the advent of AI research, and so far the answer has been a resounding “no”. No matter how compelling the output of a GPT-3 model, certain simple prompts can produce nonsensical results, including the title of this week’s newsletter. Contemporary machine learning models capture nothing about the meaning of the symbols they manipulate; they simply infer importance from patterns of repetition. This is why many algorithms tend to find “cheats”, like the diagnostic algorithm that flagged patients lying down as sicker than those standing up.
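That “cheat” failure mode has a simple mechanical explanation: if a spurious feature happens to correlate with the label in training data, a model that tracks only co-occurrence statistics will latch onto it. A toy sketch with invented data (the posture/diagnosis framing borrows the anecdote above; the numbers are ours):

```python
from collections import defaultdict

# Invented training data: (posture, truly_sick). In this sample, posture
# happens to correlate with the label, echoing the diagnostic anecdote.
train = [
    ("lying_down", True), ("lying_down", True), ("lying_down", True),
    ("standing", False), ("standing", False), ("lying_down", False),
]

def fit_shortcut(data):
    """'Learn' by predicting the majority label seen for each feature value."""
    tallies = defaultdict(lambda: [0, 0])
    for posture, sick in data:
        tallies[posture][sick] += 1  # bool indexes 0 (False) or 1 (True)
    return {p: counts[1] > counts[0] for p, counts in tallies.items()}

model = fit_shortcut(train)
# The model has learned posture, not sickness: a healthy patient who
# happens to be photographed lying down would be classified as sick.
print(model["lying_down"])  # True
print(model["standing"])    # False
```

Nothing in this procedure knows what “sick” means; it only knows which symbol co-occurred with which, which is the gap the researchers above are trying to close.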
Researchers are working hard to impart some meaning to their models, and results are starting to hew closer to what a human’s responses would be. While more reliably logical responses from AI would be welcome, and could lead to safer inclusion in our daily decisions, we offer two observations. First, machine learning needs to move beyond the simple “bag of words” processes that power today’s models. Early AI research used hand-coded facts and rules to try to build an understanding of how the world worked, but those models were hamstrung by scarce computing resources and small datasets; a system combining a large language model with an understanding of meaning and syntax may be a promising next step. Second, at least some of the joy of working with these systems is seeing what they come up with: they give us a view of the world that’s entirely alien to our own, a jumping-off point for more creative work. Letting the machine “think” like a machine may be fruitful in ways that teaching it to think like a human may not.
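The “bag of words” limitation is easy to make concrete: once word order and syntax are discarded, sentences with opposite meanings collapse into the identical representation. A minimal sketch (our own toy example):

```python
from collections import Counter

def bag_of_words(sentence: str) -> Counter:
    """Reduce a sentence to word counts, discarding order and syntax."""
    return Counter(sentence.lower().split())

a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")

# Opposite meanings, identical representation:
print(a == b)  # True
```

Any model fed only these counts literally cannot distinguish who bit whom, which is why purely statistical representations struggle with meaning.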
→ Can computers learn common sense? | The New Yorker
5: Devout data
If Silicon Valley had a motto, it might be “Move Fast and Break Things,” the slogan put forward in Facebook’s early days. Few people would apply that slogan to the Vatican or the Catholic Church, though that doesn’t mean the ethical issues of artificial intelligence, and its impact on humanity, have escaped the church’s notice.
Father Paolo Benanti, a trained engineer and coder, now works and lives in the Vatican, helping to broker discussions between tech firms and the church. These conversations consider the impact that AI may have on society and humanity, conversations too often left aside when these systems are first being conceived and developed. The church’s position on AI is evolving, but centers human dignity first and foremost. “Algorithms make us quantifiable. The idea that if we transform human beings into data, they can be processed or discarded,” Benanti laments, giving just one example of how the church is concerned that AI may increase inequality.
The Vatican’s efforts to bring together industry and clergy have led to other surprising partnerships as well. This May in Abu Dhabi, Christian, Muslim, and Jewish leaders will jointly sign a statement of AI ethics to help “protect human society from AI harms”. “To my knowledge, these three monotheistic faiths have never come together and signed a joint declaration on anything before,” Benanti says. It gives us hope: if these three groups can agree on a way forward, perhaps those building AI systems will heed their warnings and create systems that enhance equality and human connection.
→ The Franciscan monk helping the Vatican take on — and tame — AI | Financial Times
6: Idiosyncratic interfaces
The design aesthetic of screen-based devices for the past 10–15 years can best be described as “flat glass rectangles”. Since the launch of the iPhone in 2007, all other phones have inexorably slid towards a certain sameness. Put another way, as the phone becomes primarily a vehicle for a world of software, the hardware has become more of a featureless, unobtrusive window into that digital space. There’s nothing wrong with this, per se, but as Clive Thompson notes in this essay about his 2004 Sidekick, there may be some qualities that got lost along the way in this optimization.
After discovering his 18-year-old device, and seeing his teenage son’s reaction to it, Thompson goes on to elucidate “reasons why the Sidekick was the most stylish phone ever”. Those reasons include the ergonomics of the device, from the satisfying feel of swiveling it open to the tactility of its keyboard; with touchscreens and XR interfaces, we have effectively lost touch as a way to navigate digital space. He also calls attention to the Sidekick’s affordances for ambient information display: you could customize the glow of the device so that you knew what kind of information you were being alerted to without picking up the phone or reading anything. While we have some of this capability today with audio and haptics, it’s less fine-grained. As we’re all painfully aware, very little we’re alerted to isn’t distracting, constantly pulling our attention away from whatever we meant to do.
While, as Thompson puts it, “there’s something extremely useful about today’s form factor”, we can’t help but wonder if we will continue down this “glass rectangle” path because of its efficiency, or if there will be some pendulum swing away from homogeneity. What would a Sidekick built for 2022 even look like? How might we create ergonomic, glanceable, and calm devices — not for the sake of novelty or nostalgia, but built with the best of both worlds to create a desirable experience?
→ The Sidekick was the best smartphone ever | Clive Thompson