Wednesday, November 20, 2024

An AI experiment for writing the Blueprint - text without context

This is an excerpt from Blueprint 2025, which will be published on January 15, 2025.

When I was writing Blueprint 2025, I started out with an experiment to review the last 15 years. First, I commissioned an independent reader to review the Blueprints and let me know what they learned.

Second, I used a large language model (LLM – the methodology underpinning most of the chatbots in use) to analyze 15 Blueprints for me. I chose NotebookLM, a Google research and note-taking tool made specifically for writers that can quickly search, find text, and generate summaries.

Part of my reasoning was that Google has already sucked up all the Blueprint content – it’s been on the web for years, is licensed under Creative Commons, and I knew I wouldn’t be feeding the digital beast any information it hadn’t already taken from me. I was then going to write a short piece comparing what I learned from the independent reader, Susan Joanis, and what I learned from NotebookLM.

 

I scrapped the 3-part idea, first because I didn’t learn much from NotebookLM. The few things I did learn are outlined below.

 

But I mostly scrapped the longer section because I don’t want to encourage playing around like this with AI. I want to encourage you to be very, very skeptical of how AI systems are being developed and by whom. I want you to think twice, and then again, before feeding them information from your organization. I want you to read Jill Lepore’s words again, and think about how it’s the slow drip, drip of promised convenience that embeds technology in our lives in inextricable ways. I want to encourage you to seek out noncommercial and non-government options. I want you to be VERY clear on the risks and benefits to your mission, your constituents, and your colleagues. And I want your organization to participate in building any such systems in better ways – better for the environment, better for human rights, better for your purpose.

 

Reflection 2: Text without Context

 

NotebookLM is built on a large language model (the same kind of system that underlies chatbots such as ChatGPT) and uses a writer's own documents as its source material. Google says it is designed to help writers gain insights into their own documents faster. The team that developed it includes the writer Steven B. Johnson (The Ghost Map, The Invention of Air, and other books).

 

To find out what insights NotebookLM could provide, I uploaded 15 Blueprints, queried them in a variety of ways, and noted what I learned.[i] One question I posed was “In what year did the Blueprint first discuss data donations?” Through this and many other queries, I learned that the system is good at answering questions that ask it to find facts within the pile of text, such as “what year” or “how many.”

 

I also tried several of the pre-loaded questions that the system prompts you to ask, such as “Create a thematic outline” of the documents. This is basically what I asked Susan Joanis to do: read 15 Blueprints and tell me what they argue. I learned that the AI uses certain types of text as signposts, so subheadings and tables of contents are transformed into emphasis. Beyond that, nothing. NotebookLM can only find text; it can’t understand or add context, certainly not any context beyond the words in its database. There’s a huge difference between pattern-matching text (what AIs do) and understanding the context (what humans do).

 

NotebookLM also provided misleading and false emphasis, which can best be experienced through its Audio Overview. With the click of a button, the site will generate an audio summary – you can hear it here. These fake voices attempt to add context by adding tone and emphasis and, indeed, they sound like real people. That’s frightening, precisely because they sound so real.

 

Here's what I learned: 

    This AI system is good at counting and pattern finding, and it’s fast.

That's all.

 

These benefits come at a cost – I’ve paid it in my data, my IP, and my time, all of which contribute to the growth of Google.

 

One of the last things I think we need right now is more ways to manipulate information. We already can’t tell truth from fiction. It’s not good for us as readers, as neighbors, as professionals, or as citizens. It’s not good for democracy to be creating and proliferating systems that further corrode trust and truth. We’re already watching disaster responders and city managers get attacked because of good old-fashioned human-spread lies. Building systems that spread more lies, faster and further, is self-destructive. It’s not good for disaster response; it’s not good for democracy.

 



[i] I believe the AI companies have illegally taken the copyrighted material of countless authors and should be penalized. However, I wasn’t concerned about this for the Blueprints because 1) they live on the web, so I figured they’ve already been used by every AI company; 2) I license them under Creative Commons to make them easy to use; and 3) basically, everything in them is already publicly available to do with as you (almost) please. There are a lot of other things that I would not put into this system or any other AI.