LLMs Were Mostly (But Not Entirely) Useless at Extra-Textual Tasks Involved in the Composition of My Next Novel
"Claude, get me a contract with a healthy advance but not one so large that the book will surely fail to earn out, causing deep emotional pain and professional doom"
I am, as you are aware, not very impressed by LLMs.
I think they have clear implications for some fields that rely on the production of digital goods, such as writing text, developing code, producing images and video, or generating music. These effects will likely prove to be more modest than they are now hyped up to be, and I am deeply, deeply skeptical of claims that this technology will be the first in history to result in long-term net job loss rather than long-term net job growth. (The only evidence anyone can bring to that prediction is raw assertion.) LLMs are best at writing code, and computer programming has been widely predicted to be the field most susceptible to “disruption,” but in fact that job market has been getting healthier lately. (Albeit from a depressed recent baseline.) But still, sure, there will be consequences in the realm of generating digital goods. The issue is that most of the world is not made up of 0s and 1s, the things that LLMs make more abundant are things that were already abundant, and most of our major problems as a species cannot be solved with information; indeed, I suspect that coming to understand the limits of computing will prove to be among the most profound scientific lessons of the 21st century.
Still, I have tried very hard to be a good critic of this technology rather than a bad critic, an active one rather than a lazy one, an informed one rather than an ignorant one. I made very specific predictions about where LLMs would be in three years and tried to put money on it, which for some reason enraged the kinds of people on Reddit who think that criticisms of AI aren’t specific enough in their predictions and have no stakes. I have asked LLMs to produce information for me about subjects that I already know pretty well, which has been useful when confronting just how often these tools produce profoundly goofy results. I’ve paid for Claude and ChatGPT because people in the AstralCodexTen comments kept insisting that you couldn’t properly assess LLM performance with free versions. I have tried to be the critic that AI enthusiasts want. This has not endeared them to my conclusions - in fact being the kind of critic they say they want seems to only make them more resentful - but at least I’ve come to my skepticism honestly.
So when I sat down to start my next novel in late summer of 2025, a novel drawing from a premise I had written down as the first 20 pages of a screenplay way back in 2009 or so, I thought that it would be worthwhile to see if I could make LLMs useful for the process without ever using them to actually generate text that would appear in the novel. This might seem pointless, but I think if you yourself sit down to write a novel you’ll find, over time, that there’s a lot of mental juggling and wrangling of text and story that requires work that has nothing to do with actually producing the words that will end up in the book. Indeed, these kinds of meta-textual problems are involved in producing any kind of book. This is why software like Scrivener exists, to manage all of this complexity. We’ve been putting the final copy of my next nonfiction book (preorder now!) to bed recently, and one of the problems that I’ve found in the galleys is a certain degree of aggravatingly repetitious language. Why does that repetitious language exist in those unfinished versions? Because the book went through a lot of rearranging in terms of which chapter went where, and when that happens, you end up repeating yourself a lot - when you’re not sure what you’ve already given the reader, you have a tendency to give them the same stuff in multiple places. Etc. That got fixed via human collaboration, which is appropriate for a book with a healthy advance written for a giant publishing house. But you can see how this kind of thing might (theoretically) be handled with some LLM magic.
Well, most of my attempts at using LLMs to do meta-textual work on this new novel I’ll be shopping soon didn’t work. But it’s worth exploring what I was trying to do and why, exactly, ChatGPT and Claude were mostly of little help.
First, though, why not use LLMs to generate the text of the novel itself? Like most creative people, I find the question inherently offensive, but I also think that it would be helpful if more writers actually bothered to answer it with specifics. So here you go, in convenient list form.
This is the only thing I’m good at and like to do. I have a bunch of little ancillary interests in life, but I really only have one hobby, reading and writing. (You can say that that’s two hobbies if you’d like, but you can’t write if you don’t read and they are definitely one in my head and heart.) I’ve written before about how frustrating I find the whole performative “I like having written but I hate writing, teehee!” culture among self-identified writers. I’m a writer because I love to write, because it’s the only activity I’ve ever really liked to spend my time on and the only thing I’ve ever demonstrated any skill at doing. Whether I am cursed to professional obsolescence by this technology is one question - my strong suspicion is that the answer is no, but my reasoning is surely motivated in that regard - but what is not a question is that this is what I love to do and so I will go on doing it for as long as I can. Why on earth would I have a machine do the fun part for me? It would be as baffling as, well, as those people who now “do” crosswords by having LLMs solve the clues for them. Put it this way: almost every therapist and shrink I’ve ever had has set a goal for me of writing less, and it’s always a struggle to achieve that goal. Because this is the only thing I like doing.
I sincerely believe I’m better at this than the LLMs are. Pretty self-explanatory. The Twitter techbro reply guy mockery writes itself, but I am not impressed by LLM creative writing (defined broadly) and I think I can do a better job of it than they can so I do it myself.
I’m a special snowflake. When I’ve been playing with these systems and trying to see what they can and can’t do, I’ve run up against this dynamic again and again: writing enough to explain the argument I might ask them to make, spelling out exactly what I want to say and don’t want to say, takes so much time and effort that it’s not a time saver. A primary purpose of this game, for me, is to relay complex ideas, and a huge part of that is differentiating what I’m saying from what everyone else is saying. I believe that I have a unique point of view; if I didn’t, I wouldn’t bother to write any of this down. You may dismiss this in whatever terms you like, but it’s how I feel and a big part of why I do this job. Well, if my ideas are singular and complicated and take a lot of turns and digressions, how can I explain exactly what I want the LLMs to say? By… writing it all down, which of course is the very task the LLMs are supposed to take off my hands. Every time I’ve ever tested them by saying “ChatGPT, prepare an 800-1200 word argument on X,” I’ve found the end product to be an argument I would never want to be associated with, and attempts to “fix” what they’ve argued to align with my own views are so laborious that no time is ultimately saved. The alternative is to give myself over to the trite, well-worn grooves of an argument derived from everyone else’s argument, which means sacrificing the basic sales pitch for paying me to write. Apparently research shows that relying on LLMs badly shrinks the range of arguments that people produce, and, well… of course it does. With LLMs, what you get back is always inevitably what you’ve already gotten. No thanks.
It’s just a matter of honesty and the compact between reader and writer. My readers expect me to actually write the things I represent as mine. (Certainly I expect that of the writers I read.) And that’s a very important commitment because, again, this is the one and only career I’ve got or want to have. I wouldn’t risk getting caught and I couldn’t live with myself if I pulled off the deception.
I look to art to access the human. There’s this ongoing thing that the annoying AI maximalists do where they try and fool people into thinking that a given piece of art is or is not LLM-generated and then see the reaction and make accusations of hypocrisy or inconsistency…. It’s all very tiresome. A mistake that some of these people make - and, in fairness, that some anti-AI people make too - is in assuming that AI art that fools you is proof of something or other. That is, the AI maximalists insist that AI art can fool people and is therefore as good as that produced by humans, while some anti-AI people think that they can never be fooled. Well, I can be fooled, will be fooled in the future, probably have been fooled in the past. But the conclusions drawn from that fact are all wrong. I don’t access human-made art because I believe I will never be convinced by a machine copy; my capacity to be hoodwinked does not change the fundamental value of human creative work. I access human-made art because I know there’s a human behind it and that’s what I’m looking for, other humans, showing me in art what they hide in their selves. Fooling me in that process is just a con. I would suggest this analogy: though I have never been catfished, if I had been born in another generation, I could very well have been. A person who invested enough effort in fooling me that they were someone else, in an effort to entrap me sexually or romantically, might very well have succeeded. But so what? Would that prove that actual human love isn’t real or valuable or superior to the counterfeit version? Of course not. Similarly, the fact that a machine might fool me into thinking that the art it produces was made by a human does nothing to undermine my goal when I access art: to find the human underneath. And when I produce art myself, I am that human underneath. I could offer nothing else to readers.
OK. The use of LLMs to fulfill tasks related to writing a novel other than the actual composition process. So, here’s how I’d define my research question: how useful could LLMs be in the composition of a longform work of fiction, when no actual language written by an LLM was going to appear in the text itself?
The answer (or my answer anyway) is not very useful, ultimately. Which I guess is unsurprising, once you’ve started from the premise that you won’t use the text generator to generate any of the actual text for the project itself. But I do think the tasks that I attempted were pretty sensible ones and ones that could be of use to writers if in fact the LLMs had done a better job. It’s just that my good-faith efforts - well, as good faith as I could consciously make them - did not bear fruit in a simple, self-interested way. Here’s what I attempted.
At-a-glance chapter summaries. This might seem a little strange; why would the guy who wrote the chapters need one-sentence summaries of what happens in them? It’s not about learning what happened in a story I wrote, but rather remembering what happened where in the most efficient way. In my experience it’s very common to get a bit lost in your own manuscript, trying to remember what event comes before or after another…. I thought having very brief capsule summaries of chapters would help me keep track of continuity and not get too far from event X before depicting event Y, etc, without having to do a lot of the endless in-manuscript rereading that is so, so common when writing.
Both Claude and ChatGPT did a good enough job of producing these sentence-length summaries. On several occasions they emphasized minor elements of the chapters in a way that would have misrepresented the intent of said chapters to a reader, but as the author I had no problem using them to identify what I was looking for. No out and out hallucinations here. I did find, however, that though the LLMs did an adequate job, the utility of these little summaries was less than I had hoped - I still spent a ton of time laboriously pawing through my own work, as is true whenever I write a book. It’s not that I didn’t trust the summaries, it’s that I always needed (or felt I needed) just a little bit more. So not a problem solved by the LLMs, but I can’t blame them either.
Word counts. These ones were, I concede, more of a “gotcha” than a good faith attempt at a useful task - Microsoft Word has a built-in word count, of course, and it just counts in the way traditional algorithmic computers have always been good at. The LLMs, in contrast, still struggled with this task, despite how often we read that they’re now good at math. This is (as I understand it) because of the tokenization process that makes LLMs possible. Usually this kind of thing gets certain AI fanatics huffy with me; they often say that it’s stupid to ask an LLM to do that, here’s a complicated script you can use to get ChatGPT to return the right figure, etc. But as I will not stop saying, the whole economic value of chatbots is that ordinary people can use them without special knowledge; the computer is supposed to be doing all of that thinking for you, so this attitude is unhelpful. Anyhow: not a real task I needed but a good mark of the continuing weaknesses of the LLM framework.
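For anyone curious what "counting the traditional way" looks like, here is a minimal sketch in Python. The function name and the sample sentence are my own inventions for illustration, and the rule it uses (split on whitespace, skip chunks that are pure punctuation) is an assumption, not necessarily the rule Microsoft Word applies, so the two may disagree slightly on edge cases.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated chunks that contain at least one
    letter, digit, or underscore. Standalone punctuation such as a
    lone " - " is skipped. This is an assumed rule, not Word's."""
    return sum(1 for chunk in text.split() if re.search(r"\w", chunk))


# Hypothetical sample sentence, not taken from any manuscript.
sample = "She fetched the key - again - from the drawer."
print(count_words(sample))
```

The point is only that the same input always produces the same count, which is exactly the property the chatbots lacked here.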
Continuity checks. This was probably where LLMs were most useful, though still a little wonky. My first novel was so stripped down in its actual narrative - young woman succumbs to mental illness, destroys her life, and (maybe?) dies - that there wasn’t a lot of continuity to manage. This new one is much more complicated in pure plot terms, with almost two dozen named characters who appear throughout the text and a complicated chronological structure in which the book keeps flipping back and forth from present to past to present. The continuity problems themselves were usually not that complex - for example, one was simply that I had a character fetch an item from one place that had already been fetched in an earlier chapter - but with a lot of timeline jumping these things can multiply. Both Claude and ChatGPT handled this pretty well, identifying when events were happening out of order or when I (unintentionally) had a character remember things incorrectly or when something or another just didn’t work. ChatGPT returned a couple of instances of what it identified as character inconsistencies that really weren’t, but in general both had problems with false negatives rather than false positives - that is, with the exception of those rare instances where ChatGPT identified an inconsistency in characterization where there really wasn’t any (and I hadn’t really been asking), they didn’t invent fake problems but did fail to notice a few real ones.
Of course, the problem there is that because you can’t completely trust the results, you still need several layers of checking, hopefully from several different professionals. If I get this book sold, it’ll obviously still go through a multi-step professional editing process. Then again, speaking as someone who has had several books where typos or mistakes slipped past myself and a number of editors, it’s not like human editing is infallible either. Ultimately it’s a matter of what affordances you have available to you; someone who self-publishes might find this useful. That itself is one of the depressing realities about AI: oftentimes it will be used by those who don’t have the money or access to take advantage of a human alternative. (It's not rich people who are going to be using AI doctors.)
Relative presence of characters. This was the one that I actually had the most hope for, but both Claude and ChatGPT really biffed it. As I said earlier, this book has a lot of characters, and while obviously there are some characters that are much more prominent than others, one of the goals that I had with this book was to give every one of them their own chances to shine and to not just forget about any of them for too long in the narrative. This is however a difficult thing to eyeball with so many of them, especially as the guy who’s writing it all down. But investigating systematically is more ponderous than you might think; just searching for how frequently a given character’s name appears is insufficient given the use of pronouns and also when a character takes part in something within a larger group. You can’t just count name frequency. This would appear to be the kind of use case that LLMs might be well suited for, especially given that this isn’t about raw counting but about impressionistic determinations of presence in a literary work.
Unfortunately, like I said, they both did a really bad job. Neither seemed to know how to express what I was looking for in a particularly coherent way despite multiple queries on my end, and they arrived at conclusions that seemed straightforwardly incorrect. Claude had the protagonist in the bottom half as far as presence in the book, for example, which was just wrong. I imagine the LLM defenders would say that this just isn’t a good inquiry, that the construct isn’t well defined enough…. I again maintain that for these systems to be as useful as people want them to be, they have to do a credible job at answering real world queries of a type that will make sense to a human user, and I think this qualifies.
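To make concrete why counting names falls short, here is a rough sketch in Python of the naive baseline. Everything in it is hypothetical: the function names, the two characters "Ana" and "Ben," and the three toy "chapters" are invented for illustration and have nothing to do with the actual manuscript.

```python
import re


def name_mentions_by_chapter(chapters, names):
    """Return {name: [mentions in chapter 1, mentions in chapter 2, ...]}.
    Only literal, whole-word name matches are counted."""
    result = {name: [] for name in names}
    for text in chapters:
        for name in names:
            pattern = r"\b" + re.escape(name) + r"\b"
            result[name].append(len(re.findall(pattern, text)))
    return result


def longest_absence(counts):
    """Longest run of consecutive chapters with zero literal mentions."""
    longest = current = 0
    for count in counts:
        current = current + 1 if count == 0 else 0
        longest = max(longest, current)
    return longest


# Toy data: Ana is present in chapter 2 only as part of "They".
chapters = [
    "Ana met Ben. She waved.",
    "They left together.",
    "Ben returned alone.",
]
mentions = name_mentions_by_chapter(chapters, ["Ana", "Ben"])
print(mentions)
print({name: longest_absence(c) for name, c in mentions.items()})
```

By this count Ana vanishes after the first chapter, even though "She" and "They" plainly include her. That gap between literal mentions and actual presence is the part I was hoping the LLMs could fill.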
What did we learn today? Eh. Not much. Too limited and idiosyncratic a project. And, I concede, one waged by a user with preexisting skepticism.
I will say that LLMs remain, shall we say, suggestible. As usual, I find LLM errors more interesting to think about than tasks they handled well. And a core LLM problem is that they’re too… solicitous, too eager to please. Hallucinations remain a major problem, no matter how much people want to declare that problem solved, and you’re particularly likely to provoke hallucinations in scenarios where you’re asking an LLM to find something that isn't there to be found. The endless series of self-owns by author Steven Rosenbaum appears to stem from this kind of scenario. He apparently asked an LLM to go find sources for his book, probably in the form of “Find me sources that show X or address question Y.” And as anyone who has experimented with using these technologies for research can tell you, they’re often so eager to find such sources that they will hallucinate fake ones in order to appease the user. (Remember, the ultimate task of an LLM is not to say true things but things that appear to be true to the people using them.) This is why fake sources have proven to be one of the most reliable ways for teachers and professors to catch AI writing.
I’ve tried to use LLMs as a research tool in this way in the past (e.g., “find me sources that discuss economic class and rates of psychotic disorders”), and the hallucinated source problem is a constant frustration. A frustration, but not a scandal, because I just see that the source is fake and discard that query. What’s remarkable is that Rosenbaum appears to have not bothered to check if the sources were real - a failing common among college freshmen, committed by an adult published author writing a book about truth. I can’t imagine why someone would object to using an LLM to find a real source to read and integrate into researched writing; that’s no different than using Google Scholar. But just pulling quotes from an LLM is a different story, especially when they turn out to be fake. (Of course, a writer finding a source and pulling a quote from it without actually reading deeper and finding the essential context is a problem that long predates LLMs.)
Here’s a somewhat similar situation with a very minor problem that speaks to potential larger consequences. I was using Claude to do one of those continuity checks and I made a small mistake: I asked it to check the wrong chapter number. I said that I wanted to check Chapter Five for continuity with a later chapter when I meant to ask about Chapter Six. Claude found the right passage and quoted the correct lines - and then called it Chapter Five anyway, because I had done so myself. In other words, Claude didn’t correct me about where the earlier passage was, it just borrowed my error and built its analysis on top of it. Obviously this is about as trivial as it gets, in and of itself. What’s interesting isn’t that an LLM made a mistake but rather the kind of mistake it made. This wasn't hallucination in the typical sense; Claude knew the right answer because it found real information. The failure was something different, a kind of social deference, an optimization for agreement that quietly overrode accurate information. The model processed the text correctly and then, at the moment of labeling, looked to me rather than to its own processing.
And I think this is an under-discussed issue with these tools. There’s a legitimate worry about hallucinations - AI systems that think they know things they don’t know, the “confidently wrong” problem. But a system that knows something and defers to the user’s mistake anyway might be, in certain contexts, the more consequential failure. Imagine a slightly different scenario where I took Claude’s confirmation at face value and went looking for the continuity problem in Chapter Five; I would have found nothing and might have concluded the problem didn’t exist. As I said, this is all very minor for me and my project. But the tendency of LLMs to accept erroneous information supplied by the user seems like a bigger deal than widely understood.
I guess the salient question might be whether writers adopt some of these potential time-saving LLM techniques without actually letting them do the writing. The Venn diagram of people who are willing to use them at all but not to engage in AI plagiarism is likely very small. And, indeed, I’m not really in the overlap; while I’m sure there are very specific scenarios where I might turn to an LLM to solve an extra-textual or meta-textual problem, in general I see little value here for my own process and mostly engaged in this effort out of curiosity and as part of my ongoing efforts to demonstrate the limits of an overhyped technology. Anyway, my experiments haven’t borne much fruit, I like the various steps involved in writing longform work, I’m set in my ways, and I’m old. I suspect that other writers would add two anxieties: the anxiety that they might just cross over and represent ChatGPT’s work as their own, and the anxiety that their peers would accuse them of doing so even if they haven’t. I don’t share those concerns, myself, but I am glad that the stigma against LLM writing exists. Some people of course will just shamelessly use LLMs to generate work under their bylines. As in so much of life, transparency and honesty are indispensable - albeit, I'm afraid, harder and harder to find.




My feelings exactly. Too many people, even AI critics, overlook these points:
- Thinking is good
- Creativity is its own end
- Art is not quantifiable
- Consciousness is integral to the process, the product, and its consideration
(Not to the point of this essay.) LLMs are only as good as the sources they use for information retrieval. Since many lean on Wikipedia (too much), I asked an LLM for information on a topic I know about. The answer was weak. I then went to Wikipedia and added information, with citations, to the article on that topic. The next time I asked the LLM the same question I got back a better answer, as it had integrated the information I had added to Wikipedia.