No, I Put That (Em)Dash There…

I would like to supply a TL;DR for this document, but it's too long for there to be a simple summary. The closest would be the conclusion of the article, but there is a lot of information along the way that would be lost if you jumped to that. Therefore, if you clicked on this article, I would recommend reading it completely.

Introduction

I read Jola's article “The social contract of writing” the other day, and was impressed by their well-reasoned arguments. I then read the response from Segun Famisa, “No. You can't tell it was written by AI”, and felt that it was quite a bit off the point.

Jola's primary arguments were:

Segun, on the other hand, seems to have disagreed with the concept that LLM-assisted writing can be identified. He makes the arguments that:

There is a lot to be said about both of these pieces. I have a few points that haven't yet been considered to toss into the discussion surrounding the use of LLMs in writing. And, along the way, I want to rebut a couple of the arguments that have been made.

I Am the Author

Before we dig into the exceptionally muddy waters of the ethical questions of using LLMs in writing, it's necessary to be clear about this: I am the sole author of this article. Yes, I use some tools to assist in the writing of this article: (self-hosted) LanguageTool and Harper. Why? Because my grammar and spelling suck.

But, rest assured, all the words on your screen were typed by me. All the awkward analogies, idioms, and other quirks of this text emerged from my mind, and not from an LLM. I will most certainly guarantee that this will lead to vocabulary choices that are unusual, and the text may, therefore, not read in the same way that others might have written it.

And, I am the one making the choice to use properly formatted em-dashes and ellipses throughout this text. These have only been the convention for typeset prose since the invention of the movable type printing press by Johannes Gutenberg around the year 1440.

Donald Knuth spent years working on TeX, trying to find a way to automate properly typeset texts. Along the way he solved many problems that were far more complicated than anyone had thought they were. I don't understand why we don't honor the work of Knuth and use these machines to the fullest extent of their capabilities, especially when it comes to typesetting.

I do feel like these silly “tell-tales” were originally intended as a joke: “An em-dash? No one writes like that! Right?” and somewhere along the way the joke was lost.

Simple Tell-Tales Are a Lie

That's one of the first things I would like to contribute to the discussion. Simple, easy, tell-tales just aren't a reality. LLMs are not simple pieces of software that are just glorified “autocomplete” machines as some people would try to have us believe.

Even a cursory study of how an LLM is designed and trained will make this fact self-evident. While I am not an expert in this field (far from it), I have watched and read enough background information to have some appreciation for the complexity of the accomplishments in the field.

The facts are that there are many elements that go into an LLM:

This is only a top-level view of some of the factors. It's pretty clear that the resulting output from an LLM can be vastly different based on changes to any or all of these factors. This also points to the fact that, in my opinion, this should still be considered a research technology, and not something that is being implemented as a tool for the public to use.

Given this level of complexity, how can we think that there are simple tell-tales for LLM-written works? It's simply ridiculous.

There Are Indicators of LLM Writing

There is a whole field of study within linguistics that can, using varying methods, be used to identify patterns in written language. This is often used with historically significant documents to verify the authorship of said documents.

How is this done? By analyzing works known to be authored by a person, and comparing the documents in question to the known writings of the author. This process involves numerous factors:

This idea of examining the constructs and patterns found within a document isn't limited to written bodies of work. These concepts are applied to analyze things like music compositions.
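To make the text-comparison idea a little more concrete, here is a minimal sketch in Python of one simple approach from this field (commonly called stylometry): measure how often each text uses a handful of common "function words", then see which known author's habits sit closest to the document in question. Everything in it is an assumption for illustration only. The word list, the function names, and the toy texts are all made up, and real analyses use far larger word lists, far more text, and more careful statistics.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny word list; real studies use hundreds of words.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "but", "however", "i"]

def profile(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]

def distance(a, b):
    """Euclidean distance between two profiles; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest_author(known, questioned):
    """Pick the known author whose profile is nearest to the questioned text."""
    target = profile(questioned)
    return min(known, key=lambda name: distance(profile(known[name]), target))

# Toy example with made-up "authors" A and B.
known = {
    "A": "the cat and the dog and the bird sat in the sun",
    "B": "however i think that i know that i was right",
}
questioned = "the fox and the hen ran to the barn"
print(closest_author(known, questioned))  # prints A
```

A sketch like this only says which known sample a text resembles most. It says nothing about how confident that match is, which is where the real work in the field lies.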

Then why have tools like OpenAI's AI Classifier failed?

Because it's kind of like a dog chasing its own tail. A dog generally chases its own tail not because it wants to catch it, but because the activity itself is fun. The dog knows how to catch its tail; it does so all the time by lying down and bending over to get to its tail.

An LLM trying to analyze a text to determine its provenance is like the dog chasing its tail. It can do the assigned activity, but the results are unlikely to end in success. Why? Quite simply, an LLM is likely unable to make a distinction between the information it has been trained on and the information it has generated. To an LLM these two things are equal. The LLM-written document is the result of the information it was trained on. What can it do to find a distinction?

The Author's Social Contract

This is a topic that authors discuss at length. The relationship between an author and their audience is frequently questioned.

Take, for example, a mystery author. Just how much information do they have to present to the reader throughout the text to make the solution satisfying? Is there an issue with presenting too many facts that lead to the solution? What amount of misdirection is appropriate in a story? Where is the line between misdirection and confusing the reader to the point of frustration?

Similarly, science fiction authors face the dilemma of writing “hard” sci-fi, that is, basing the technology strictly on known science and technology, versus using known science and technology as the basis for more advanced systems. In some cases, is there a basis for inventing whole new scientific concepts or principles in order to introduce a new technology to their world? Where is the line? What will the reader accept or reject?

These, and many more, types of questions exist throughout all genres of fiction writing. And they likely exist throughout many more forms of written communication. This discussion is not going away anytime in the future. And now, there is a new topic to add to this: what is the line with LLMs?

Jola's reference to the Oxide Computing RFD 576 brings a lot of subtlety to the questions surrounding the use of LLMs in writing. It simultaneously seems to be advocating for some roles for LLMs, while still acknowledging that there are issues and dangers in using LLMs.

The issue I see with this is there is little to no objective measurement for the effective use of LLMs in their environment. I see this as a missed thought that needs to be carefully addressed. For example, it is suggested that LLMs can be used as research assistants (which is something that I've thought about myself). However, the document warns of the propensity for LLMs to make things up, insert claims that aren't true, or hallucinate, or fail in other ways. Therefore, the user of the LLM for research needs to verify all the claims in the output, and go through all the references to make certain they are valid.

When I considered this question, I started to wonder what the impact would be on my personal research process. In other words, I wondered if the time that I would save would be sufficiently greater than the amount of time I needed to validate the work of the LLM. While I have no experiential data on which to make such a judgment, I do have an experience that informed my thoughts on this topic.

A few years ago I asked ChatGPT to write a profile of an artist I am deeply familiar with. I asked it to write a profile under 500 words in length to be used in the liner notes for a new release by said artist. The results were scary. The first part was not all that bad: it got the artist's real name, his approximate age, and the region of the world he was from. It then went on to explain his style of music, and his range within that style of music.

Then things fell completely apart. It started listing the most popular releases from the artist. First, several of the releases were little known works by him. But if that wasn't bad enough, it listed a work that he has publicly and widely disavowed. And, to take things to the last level: it made up two works.

I spent over an hour researching the releases, especially because I couldn't find any reference to any of these works in the artist's online discographies. Once I had determined all the facts surrounding those works, I went back to the LLM and challenged it on the works it had listed. It took numerous exchanges to get it to recognize the fictional works, but it would never explain where those works had come from. As for the disavowed work, it did acknowledge that the artist had disavowed it, but wouldn't answer why that work had been included on the list.

Realize, this is only a 500-word profile. About 2 pages of typewritten text. Several hours were spent validating and interrogating the LLM about its output. This was a task that would have taken me approximately 30 minutes to complete on my own. At this point I saw that using an LLM for such tasks had an actual negative impact on my work process.

Now, a sample set of one does not make for a good basis to draw broader conclusions. However, when considering where the line is with LLMs, it should be considered whether the use of the tool is going to significantly contribute to the quality of the work, or if it is going to become an unduly burdensome tool. While some roles, such as proofreading or critiquing one's writing, might be viable and useful options, other areas such as researching, writing assistance, or editing a work might be more burdensome than useful.

Does AI Fit?

There is something that I have been thinking about. We have been worried that LLM writing is changing the way that we communicate. However, isn't that something that we have been doing throughout history? Consider a few brief examples.

Pens

Yeah, this might seem like a boring way to start things, but there are things that we have to consider, seriously.

The first writing implements were reeds or feathers that were cut to a point, with a slit cut in them. When dipped into an ink pot they held a small amount of ink to allow a person to write. This was an actual advancement over earlier forms of writing. And, because of the limited education at the time, only royalty (and possibly extremely wealthy people) had access to these tools. From the quill, we moved on to the dip pen. Same concept as before, but now we had nibs made from steel.

The next major change would be to what we now call “dropper” pens, but were referred to at the time as reservoir pens. These were pens where the barrel portion was treated as a reservoir which could be filled with ink. The ink would flow into a part of the pen known as the feed. The feed was connected to the nib, and supplied ink to it. There is a lot of uncertainty about when this was first invented. There are claims that such a pen was invented in the mid-900s in Egypt, but there is no physical evidence of such a pen existing. It is also believed that Leonardo da Vinci may have made one for his own use. His journals contain designs for such a pen, and it's notable that his journals reflect a more continuous ink flow than other writings of the same period. However, there is definitely evidence that there were reservoir pens being produced and sold in 17th century Germany.

Innovation in the design and manufacture of pens remained steady through the 19th century. However, at the beginning of the 20th century a new era of fountain pens was born when self-filling pens began hitting the market. These pens used various mechanisms to allow the pens to be filled by sucking ink into the barrel of the pen. These pens quickly became runaway hits with the general population.

However, their years were numbered. There were efforts to come up with what is known now as the ballpoint pen dating all the way back to the late 1880s. However, all the early versions of such a pen had problems with ink flow, reliability, and material choices. These were much the same issues that had plagued the progress in fountain pen development. However, László Bíró and his brother György decided to undertake this problem in the 1930s, and by 1938 they filed for their first patent for the Biro pen. After World War II, the design of the pen was refined, and eventually came down in price to the point where it was easily accessible to the general public, and quickly supplanted the fountain pen.

Typewriters

The first patented typewriter, known as the Typographer, was developed in 1829 by William Austin Burt. (There were several machines before this one, but this is the first extensively documented machine.) In the mid-19th century the desire and need for speeding up communications brought the development of the typewriter to the forefront of technology. Typists, stenographers, and telegraphers could take down information at the rate of approximately 130 words per minute, whereas people writing with a pen tended to top out around 30 words per minute.

The first commercially successful typewriter was patented in 1868, and was sold under the name “Sholes and Glidden Typewriter”. In 1873 Remington would bring its first typewriter to market. In the early 1900s the design of typewriters reached a point where it was somewhat standardized, and there were at least a dozen notable manufacturers of typewriters.

The first electric typewriter began production in 1900. However, the first practical electric typewriter wasn't produced until 1914, and wasn't successfully brought to market until 1920. From the 1920s to the 1940s the main company producing electric typewriters (Northeast Electric Company) changed hands several times, saw its typewriter division spun off into a separate company, and was eventually acquired by IBM.

IBM would take the technology, and begin producing its Electromatic series of typewriters, which introduced the ability to vary the spacing of the characters, producing a typewritten page that appeared more like a typeset document. In 1961 IBM would introduce the Selectric typewriter, which used a typeball that could be changed, enabling different fonts or type styles to be achieved.

The final step in the development of the typewriter was the electronic typewriter (not to be confused with the electric typewriter). The distinguishing features of electronic typewriters were the use of the daisy wheel type head, and the use of circuitry to control the type head, instead of the purely mechanical mechanisms of the previous electric typewriters.

The typewriter market began to recede in the 1990s after the invention of the personal computer. Of course, much of the invention around the typewriter continues on today in the computer keyboards used by many people.

Getting to the Point

Why discuss the invention and progression of pens and typewriters? They are highly relevant to this discussion, as are other communication inventions throughout the ages. These were just the two most direct examples that came to my mind.

What is worth considering is how these technologies impacted the work of creating written documents. The progress from using a quill or reed and ink, or dip pens, to the self-filling fountain pens enabled writers to expand on their work. It made it easier for them to write in a continuous flow of thoughts. Writers like James Joyce and Ernest Hemingway were known to use fountain pens in composing their works. How much would they have been impacted if they only had a dip pen to work with? The invention of the ballpoint pen took writing to another level altogether, being more portable than fountain pens, and requiring fewer and less messy refills.

The typewriter removed many of the restrictions of the pen in terms of the speed at which a work could be produced, and the accuracy and legibility of the document produced by an author. The penmanship of the author was less of an issue. The speed at which a document could be typed was up to four times the speed of handwriting, and potentially higher with the invention of the electric and electronic typewriters. And, by the time electric and electronic typewriters emerged, the ability to correct the text as it was written was greatly improved.

We can move along and look at what the personal computer and the internet have enabled for authors… That is such a large topic I didn't even want to start writing about it.

Is AI Next?

Is AI the next fountain pen or typewriter? Is this a technology that will have an impact on writing? That seems to be a foregone conclusion at this point as we look at the works that are being produced now.

The question then becomes where will AI fit in? AI is currently still in its early stages, not the mature technology that the marketers and AI companies would like us to believe that it is. But, is there a point where it reaches maturity and becomes a tool that is going to be seen in the same way as earlier tools were?

There were authors that resisted using typewriters. There were authors that resisted using specific brands of pens. (There is a humorous story in which H. P. Lovecraft complained bitterly when he was forced to use a Conklin fountain pen after losing his Waterman pen. Meanwhile, Mark Twain was so in love with the Conklin pens that he endorsed them.)

What It Looks Like

The final question I have been asking myself with regard to AI and writing is: what does it look like? That is, what does it look like if and when the technology reaches a point where it is considered to be mature enough to become just another tool for writers and artists?

This is another question that I don't have an answer for, and I don't have any predictions on it. I only have some hopes for what will come. What hopes do I have?

My hope is that the technology will not remain under the direction of large corporations. I would rather see the technology becoming something that individuals choose to implement for themselves, with the ability to customize what it does for them, and how they interact with it.

This basically means local LLM implementations. We have machines that are capable of running small local LLMs right now. (I'm typing this on an AMD Ryzen AI MAX+ system.) I think there are ways to allow individuals to implement and customize this tool in the same manner that many of us choose to install and customize our Linux systems. I think this could also open the door(s) to ways of correcting the wrongs of the current AI industry in terms of their use of other people's property. But that's a whole other thought process that I have been going through, which belongs in a different article.

Conclusion

The future of AI as it stands is uncertain. There are people who are bullish and people who are bearish on the state of the industry from a business standpoint. I tend to align myself more with the bears. I believe that the financing is a sham, and there will (hopefully) be either some kind of market correction, or a day of reckoning where these companies are concerned.

While the things that many people think are tells really aren't, that doesn't mean it isn't possible to identify AI-generated writing more accurately given the correct set of linguistic analysis tools. And at this point we should be identifying these works, as the technology is really not at a level where it should be so broadly accepted.

When I look at a company like Oxide Computing I am encouraged that they are taking an approach that addresses many of the subtleties of the questions surrounding AI. However, what I am not encouraged by is that they missed the single most critical point: how does one quantify the usefulness of this technology? My personal experience showed me that it could quite easily and substantially get in my way, turning a thirty-minute task into two hours of work. That's not a productivity boon.

But, I am also wondering if we are looking at this from the wrong perspective. Throughout history there have been technologies that have substantially impacted our ability to write and create. Has the impact of those technologies been positive or negative? That's a question with an unknown answer, and one that should be researched more deeply in order to understand what we should expect as the impact of AI.

I know there have been artists and creators that have resisted the technologies that I brought up in this article. I mentioned that one author couldn't stand a self-filling fountain pen (Lovecraft), while another (Twain) loved it so much he endorsed the company. There were authors that resisted using the typewriter when it became a reasonably commonplace and affordable tool for writing, despite all of its benefits.

Where does AI fit in for writers? I don't know, but I do have some hopes for it. The primary hope is that it is a technology that does not remain in the hands of large corporations. I hope that it instead becomes a personal technology that individuals can implement for themselves, and customize to integrate into their lives.
