The complaint shows up on r/WritingWithAI every few weeks, and this week it was blunt: “Creative writing got so much worse lately I wanted to just scream.” A writer who drafts most of their story themselves and uses ChatGPT for tricky scenes and dialogue watched the output slide from genuinely good prose to one-line dialogue fragments, flattened description, and what they called “flat, generic, too careful and essentially lifeless” writing. Nothing about their process had changed. So what happened, and what do you actually do about it?
⚡ TL;DR – Key Takeaways
- “ChatGPT got worse” is usually three problems stacked: long chats that degrade as the context fills with the model's own output, models converging to the statistical middle of every possible story, and silent model updates that change writing behavior overnight.
- A 61,608-story study (University of Maryland + Google DeepMind) found the real AI tells live in story structure, not prose (stated morals, zero subplots, chronological reveals), and prose-level editing does not remove them.
- The fixes Reddit converges on: fresh chat per scene, show the style with pasted examples instead of describing it with adjectives, generate dialogue and action beats in separate passes, and keep the story architecture human.
- Ask for several deliberately divergent treatments of a scene and pick the strangest one: you're reintroducing the variance the model averages away.
- Sometimes it really is the model, not you. Test the same scene brief on a different model before rewriting all your prompts.
The Thread That Prompted This Article
The post appeared in r/WritingWithAI and collected upvotes fast, because half the subreddit has lived it. The author isn't a lazy one-prompt generator: they write most parts themselves and bring ChatGPT in for scenes they struggle to envision, or for dialogue. For a long stretch that partnership worked. Then, roughly two weeks ago, it stopped working.
r/WritingWithAI
Creative writing got so much worse lately I wanted to just scream
“The dialogue just is weird and generic, everything is flattened and it's driving me mad… If it does the chat gets too long and we're back at square one… But now everything is flat, generic, too careful and essentially lifeless.”
Three details in that post matter, because each points to a different cause: the quality drops within long chats, the dialogue collapses into fragments, and the whole thing started abruptly a couple of weeks ago. Let's take those one at a time.
Cause One: Long Chats Poison Themselves
The most common and most fixable cause. A chat model's context fills up with everything in the conversation — including its own previous answers. The deeper you get into one long chat, the more of that context is the model's own mediocre output, and the more it imitates itself instead of you. Drift compounds: a slightly flat paragraph becomes the style reference for the next one, which becomes the reference for the next.
The OP even describes this without naming it: things improve briefly after steering, then “the chat gets too long and we're back at square one.” That's not the model getting worse. That's the chat getting long.
The fix is unglamorous: start a fresh chat per scene, and re-paste only what the scene needs — a short style note, the scene brief, and the last few paragraphs for continuity. Never the whole conversation. This is the same principle behind keeping a story bible outside the chat, which we covered in depth in our guide to keeping an AI-generated book consistent: the model should see a compressed, curated state, not its own raw history.
Cause Two: Models Write From the Middle
The second cause runs deeper than any chat hygiene, and a new study put numbers on it. A paper called StoryScope, from the University of Maryland and Google DeepMind, was posted to the subreddit the same week and became one of its top threads. (We broke the full study down in a separate deep dive; here's what matters for the quality problem.)
r/WritingWithAI
A new study analysed 61,608 AI-written stories. The tells aren't in the prose, they're in the story itself, and editing doesn't remove them.
“They deliberately threw away all the style signals… They only looked at narrative decisions: plot structure, character agency, how information gets revealed, how endings resolve. From story structure alone, a classifier told human from AI 93.2% of the time.”
The study had five models (including Claude, GPT and Gemini) and human authors write stories from 10,272 identical premises, 61,608 stories in total. The findings explain the “flat and generic” feeling better than any anecdote:
The tells are structural, and editing doesn't remove them
AI stories stated their moral outright in 77% of cases, ran almost no subplots, and revealed information in strict chronological order. When the researchers rewrote AI stories with a professional editing framework, the prose improved — and the structural tells stayed. A classifier could still spot the AI story, because editing polishes sentences, not narrative decisions.
The human story is the outlier
Given six versions of the same premise, the human-written one was the statistical outlier 57.8% of the time (chance would be 16.7%). Models sample from the middle of the distribution of plausible stories; humans wander off it. That's the “no surprising twists anymore” part of the complaint, quantified.
Each model has a fingerprint
The classifier could tell which model wrote a story 68% of the time. Claude wrote restrained, quiet endings and loved epilogues; GPT leaned on gossip and rumor as a plot engine in 64% of its stories; Gemini tagged 88% of its settings bleak and oppressive. One commenter said they'd had to hard-code negative constraints like “do not state the theme of the book” just to fight the defaults — which matches the 77% stated-morals figure almost too well.
Cause Three: Sometimes the Model Actually Changed
The abrupt “two weeks ago” timing in the original post is the one thing chat hygiene can't explain. Model providers ship silent updates, and writing behavior shifts with them — OpenAI has pulled a ChatGPT update after user complaints before. You can't fix a model update with prompting. You can only detect it: run the same scene brief, same style examples, on another model. If the output is dramatically better elsewhere, it was never your prompt.
Commenters in the thread had already arrived there — one recommended testing the same material across models, noting different models have different genre strengths, from Claude for warmer character work to Gemini's love-it-or-hate-it prose. The study's fingerprint data says the same thing from the other direction: these models genuinely write differently, so switching is a real lever, not superstition.
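The cross-model check is easy to run by hand, but if you do it often, a small helper keeps the comparison honest by guaranteeing every model sees the identical prompt. The sketch below is only an illustration: `Generate`, `run_same_brief` and `side_by_side` are hypothetical names, and each model is represented by a plain function you would write yourself around whatever tool or provider you use. No real API is called here.

```python
from typing import Callable, Dict

# Hypothetical type: any function that takes a prompt and returns the model's text.
# How you reach each provider is up to you; nothing here is a real API.
Generate = Callable[[str], str]


def run_same_brief(prompt: str, models: Dict[str, Generate]) -> Dict[str, str]:
    """Send one identical prompt to every model and collect the outputs by name."""
    return {name: generate(prompt) for name, generate in models.items()}


def side_by_side(outputs: Dict[str, str]) -> str:
    """Format the outputs so they can be read one after another."""
    sections = [f"=== {name} ===\n{text.strip()}" for name, text in outputs.items()]
    return "\n\n".join(sections)


if __name__ == "__main__":
    # Stand-in functions so the sketch runs without network access.
    fake_models: Dict[str, Generate] = {
        "model_a": lambda p: "She said nothing.",
        "model_b": lambda p: "Rain ticked on the glass while she weighed her answer.",
    }
    prompt = "SCENE BRIEF:\nMara confronts her brother about the will."
    print(side_by_side(run_same_brief(prompt, fake_models)))
```

Read the outputs blind if you can. If one model is clearly better on the same brief and the same style examples, the prompt was probably not the problem.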
The Workflow That Fixes Most of It
Here's the answer I posted in the thread, expanded. (Disclosure: I'm Stefan, founder of Automateed, an AI book creator with a publishing marketplace — fighting exactly this failure mode is a large part of what we build.)
Stefan | Founder of Automateed
“Long chats are the killer. Quality sinks the deeper you get into one conversation, because the context fills up with the model's own mediocre output and it starts imitating itself… Stop describing the style you want and show it instead. Paste two or three paragraphs of prose in the register you're after… Models follow examples far better than adjectives like 'vivid' or 'creative.'”
1. One scene, one chat
Treat chats as disposable. Each scene gets a fresh context containing exactly three things: a short style note, the scene brief (who's in it, what must happen, what it sets up), and the closing paragraphs of the previous scene. Everything else — plot state, character facts, timeline — lives outside the chat and gets summarized in, never pasted in raw.
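If you assemble that opening message the same way every time, it is harder to slip and paste in the whole conversation. Here is a minimal sketch, assuming you keep the three pieces as plain text; the function name `build_scene_prompt` and the section labels are made up for illustration, not a required format.

```python
def build_scene_prompt(style_note: str, scene_brief: str, previous_closing: str) -> str:
    """Assemble the opening message for a fresh chat from exactly three parts."""
    parts = [
        ("STYLE NOTE", style_note),
        ("SCENE BRIEF", scene_brief),
        ("PREVIOUS SCENE, CLOSING PARAGRAPHS", previous_closing),
    ]
    # Refuse to build a prompt with a missing piece instead of silently sending less.
    for label, text in parts:
        if not text.strip():
            raise ValueError(f"{label} is empty")
    return "\n\n".join(f"{label}:\n{text.strip()}" for label, text in parts)


if __name__ == "__main__":
    print(build_scene_prompt(
        "Short declarative sentences. Dialogue carries the tension.",
        "Mara confronts Jon about the will. She must leave without an answer.",
        "The door closed behind her, and the house went quiet.",
    ))
```

The function takes three arguments on purpose: there is no parameter for chat history, so there is nowhere to put it.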
2. Show the style, don't describe it
“Write vivid, punchy prose” selects for the model's average idea of vivid, which is exactly the flatness you're fighting. Pasting two or three paragraphs in the register you want — your own best pages, or a public-domain author's — and asking the model to match sentence rhythm and dialogue density works dramatically better. If you need ready-made scaffolding, our collection of ChatGPT prompts for writing a book is built around this show-don't-describe principle.
3. Two passes for dialogue
The one-line fragmented dialogue the OP describes usually appears when the model juggles dialogue, blocking and interiority in a single pass and collapses into screenplay style. Generate the dialogue exchange alone first, then run a second pass that adds action beats and interior reaction. Two mediocre passes merge into one scene that reads composed.
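The two passes can be sketched as two separate prompts, where the second one receives the output of the first. This is an assumption-heavy illustration: `generate` stands for any function you supply that sends a prompt to a model in a fresh context and returns its text, and the prompt wording is just one plausible phrasing.

```python
from typing import Callable

# Hypothetical type: prompt in, model text out. Supply your own implementation.
Generate = Callable[[str], str]


def two_pass_scene(scene_brief: str, generate: Generate) -> str:
    """Pass 1 drafts dialogue only; pass 2 adds beats and interiority around it."""
    dialogue_prompt = (
        "Write only the spoken dialogue for this scene. "
        "No action beats, no description, no interior thoughts.\n\n"
        f"SCENE BRIEF:\n{scene_brief}"
    )
    dialogue = generate(dialogue_prompt)

    beats_prompt = (
        "Here is a dialogue exchange. Keep every spoken line as written. "
        "Add action beats and the viewpoint character's interior reactions "
        "between the lines.\n\n"
        f"SCENE BRIEF:\n{scene_brief}\n\nDIALOGUE:\n{dialogue}"
    )
    return generate(beats_prompt)


if __name__ == "__main__":
    # Stand-in model that just reports how much text it was given.
    echo: Generate = lambda prompt: f"[model output for a {len(prompt)}-character prompt]"
    print(two_pass_scene("Mara confronts Jon about the will.", echo))
```

Keeping the passes separate also gives you a checkpoint: you can edit the dialogue yourself before the second pass builds on it.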
4. Keep the architecture human, and order the variance
The study's deepest lesson: everything that gave AI stories away — stated morals, no subplots, chronological reveals — is an outline-level decision, cheap to fix before drafting and nearly impossible after. So own the outline. And when you want the model to surprise you, don't ask for “an unexpected twist”; ask for four deliberately divergent directions the next scene could take, ranked from safe to strange, and pick the one that unsettles you a little. You choose the outlier instead of hoping the model lands on it.
5. Track state outside the model
Style drift and fact drift feed each other. A story-state note — where each character is, what's been revealed, what the next chapter must accomplish — updated after every chapter and pasted fresh into each new chat kills most of it. This is the same discipline covered in our broader guide to using AI for book creation, and it's what we automated in Automateed's pipeline: the generator never sees “the book so far,” it sees the outline plus a compressed state that can't rot. If you'd rather not run that loop by hand in chat windows, an AI ebook creator that manages outline, state and per-chapter context for you is the lazier path to the same result — though plenty of writers in these threads get there with ChatGPT or Claude plus discipline, or with drafting tools like Sudowrite and Novelcrafter, which give you more direct knobs over prose voice.
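For writers running the loop by hand, the story-state note can be as simple as a small structure you update after each chapter and render to text before each new chat. The sketch below is one possible shape, not a description of how any particular tool stores state; the class `StoryState` and its field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StoryState:
    """Hypothetical story-state note kept outside the chat."""
    character_locations: Dict[str, str] = field(default_factory=dict)
    revealed: List[str] = field(default_factory=list)
    next_chapter_goal: str = ""

    def update_after_chapter(
        self, moves: Dict[str, str], new_reveals: List[str], next_goal: str
    ) -> None:
        """Overwrite locations, append reveals not already recorded, set the next goal."""
        self.character_locations.update(moves)
        for fact in new_reveals:
            if fact not in self.revealed:
                self.revealed.append(fact)
        self.next_chapter_goal = next_goal

    def as_note(self) -> str:
        """Render a compact note to paste at the top of a fresh chat."""
        lines = ["STORY STATE"]
        lines += [f"- {name} is {place}" for name, place in sorted(self.character_locations.items())]
        lines += [f"- Revealed: {fact}" for fact in self.revealed]
        lines.append(f"- Next chapter must: {self.next_chapter_goal}")
        return "\n".join(lines)


if __name__ == "__main__":
    state = StoryState()
    state.update_after_chapter(
        {"Mara": "at the harbor"}, ["Jon forged the will"], "force Mara to choose a side"
    )
    print(state.as_note())
```

Because the note is regenerated from current facts each time, it stays the same length as the story grows, which is the point: the model gets the present state, not the history.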
What This Isn't
One honest caveat from the study thread: several of the most upvoted commenters use AI only for brainstorming and grammar, not for drafting whole stories, and one pointed out that studies of 5,000-word stories say little about novel-length work, where models struggle without heavy human supervision anyway. They're right about the ceiling. The workflow above narrows the gap between AI-assisted and human-only drafts; it doesn't erase it. The writers getting the best results in these threads all converge on the same division of labor: human owns the story decisions, model drafts inside them, never the reverse.
FAQ
Did ChatGPT actually get worse at creative writing?
Sometimes, temporarily, yes — providers ship model updates that change writing behavior, and OpenAI has rolled at least one back after complaints. But most perceived degradation is reproducible on any model: long chats degrade as the context fills with the model's own output. Test the same scene brief in a fresh chat and on a second model before concluding the model broke.
Why does AI writing quality drop in long chats?
Because the model's context increasingly consists of its own previous answers, it starts imitating itself instead of following your instructions. Drift compounds with every exchange. Fresh chat per scene, with only a style note, scene brief and the last few paragraphs carried over, resets the spiral.
Can editing fix AI-sounding stories?
Editing fixes AI-sounding sentences. The 61,608-story StoryScope study found that after professional-framework rewriting, classifiers still identified AI stories from structure alone — stated morals, no subplots, chronological reveals survive any line edit. Structure has to be fixed at the outline stage, by a human.
Which AI model is best for fiction writing?
They differ more than people assume — the study identified each model's stories 68% of the time from narrative habits alone (Claude: restrained, quiet endings; GPT: rumor-driven ensemble plots; Gemini: bleak settings and tidy endings). There is no universal best; there's a best fit for your genre, which is why running one scene brief across two or three models is the fastest diagnostic there is.
Sources
The Reddit threads this article draws on: the original complaint thread and the StoryScope study discussion, both in r/WritingWithAI. Comments by other users are paraphrased; the quoted answer is my own.







