My Experience with AI Hallucinations: ChatGPT's Perfectly Titled Sources

24/02/2025

I had an idea for my first blog post. But it had nothing to do with AI, ChatGPT, or anything technology-related. My goal was to write about what adjective should be used to describe the political systems of the Eastern Bloc countries that existed from roughly 1948 to 1989. Were they Communist (or communist)? Or were they socialist? Or both? Are these terms interchangeable? And most importantly, which term should contemporary scholars studying this region use? Well, I guess that will be a post for another day because once I had a few sources ready to go, I decided to see if I couldn't find a few more. Enter ChatGPT.

I didn't turn to AI straight away, mind you. It wasn't even my plan. I swear! First, I wanted to analyze the frequency of two terms – "Communist Czechoslovakia" and "Socialist Czechoslovakia" – in Google's text corpora by using the Google Books Ngram Viewer. I limited my search to books published between 1948 and 1989.
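For anyone who wants to repeat the search, here is a minimal sketch of how the query can be written out as a URL. The parameter names (content, year_start, year_end, corpus, smoothing) are taken from the Ngram link quoted later in this post. The function name build_ngram_url is my own hypothetical helper, not part of any Google tool, and I am assuming the viewer keeps accepting these parameters.

```python
from urllib.parse import urlencode

# Base address of the Ngram Viewer graph page, as it appears in the link quoted in this post.
NGRAM_BASE = "https://books.google.com/ngrams/graph"


def build_ngram_url(phrases, year_start, year_end, corpus="en", smoothing=3):
    """Hypothetical helper: assemble an Ngram Viewer URL for a list of phrases."""
    params = {
        "content": ",".join(phrases),  # phrases are comma-separated in one field
        "year_start": year_start,
        "year_end": year_end,
        "corpus": corpus,              # "en" is the value used in the quoted link
        "smoothing": smoothing,        # 3 is the value used in the quoted link
    }
    # urlencode turns spaces into "+" and commas into "%2C"
    return f"{NGRAM_BASE}?{urlencode(params)}"


url = build_ngram_url(
    ["Communist Czechoslovakia", "Socialist Czechoslovakia"], 1948, 1989
)
print(url)
```

Opening the printed URL in a browser should bring up the same chart; swapping in other phrases or years only requires changing the arguments.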

Here are the results:

In the 1950s Communist Czechoslovakia was the go-to term. This period was the height of the second Red Scare and McCarthyism in the USA. Socialist Czechoslovakia barely registered at the time. So, no surprise here, really.

But we see an emerging trend around 1960, when the use of Socialist Czechoslovakia was on the rise and Communist Czechoslovakia in steep decline. Again, this is no coincidence. The 1960 Constitution of Czechoslovakia joyously declared that the transition from capitalism to socialism had been completed and that "Socialism has triumphed in our country!" The country was officially renamed the Czechoslovak Socialist Republic, and therefore, in the language of the regime, the right term was now Socialist Czechoslovakia.

Following the Ngram graph, we see that in the early 1970s Socialist Czechoslovakia overtook Communist Czechoslovakia. (Note too, however, the absolute decrease in the frequency of Communist Czechoslovakia after the 1950s.)

I was curious about the reasons for these changes. I looked quickly at some of the Google Books sources in which the term Socialist Czechoslovakia appeared in the 1970s. After having skimmed through several of them, I hypothesized that they were largely translations of official texts from Czechoslovakia or that they were written by authors sympathetic to the socialist cause.

So, I decided to give ChatGPT a try. Over the past several months, I have been experimenting with how to incorporate AI into my copy editing and translation workflow. But here was an opportunity to use it as the highly powerful research tool it is often touted to be. I couldn't read through all the cited works contained in the Google Books corpus. Maybe AI could do it for me and summarize the results. It'd be worth a try, right?

Here is the "conversation" I had with ChatGPT-4. I am presenting it warts and all. Yes, I am a professional copy editor and proofreader, and yes, my queries below contain lots of typos. I was writing quickly, knowing that the AI would understand me either way, and I never intended to publish the text.

Here is how I started:

Could you analyze why the term "Socialist Czechoslovakia" became more prevalent than "Communist Czechoslovakia" in the mid-1970s. Here is your source. https://books.google.com/ngrams/graph?content=Communist+Czechoslovakia%2CSocialist+Czechoslovakia&year_start=1948&year_end=1989&corpus=en&smoothing=3 I have a two ideas. 1) Many of the publications using the term "Socialiast Czechoslovakia" were published or written by entitites sympathetic to the socialist/Communist/Soviet cause (or perhaps were translated directly from sources orginally written in the Eastern Bloc). 2) The term "socialist Czechoslovakia" occurs frequently in citations.

Here is ChatGPT's initial response:

The trend visible in the Google Ngram data reflects a confluence of political, linguistic, and scholarly factors:

Official Terminology and Translation Practices

Czechoslovakia's government officially styled the country as the Czechoslovak Socialist Republic. In international and academic texts—especially those translating official documents or reflecting insider perspectives—the adjective "socialist" was often preferred because it mirrored the state's self-identification. In this way, texts originating within or sympathetic to the Eastern Bloc (or those translating such sources) naturally adopted the term "Socialist Czechoslovakia." This official branding, in turn, became a template for many publications.

Shifts in Discourse and Scholarly Citation

During the early Cold War, Western discourse sometimes employed "communist" as a catch-all descriptor with a strong ideological charge. However, as political analysis matured, scholars and commentators began to adopt more nuanced terminology. By the mid-1970s, many academic works—and the texts they cited—had shifted to using "socialist" to describe the state apparatus and its policies. This change was partly self-reinforcing: as influential works used "socialist" in their titles and citations, subsequent research followed suit, making the term more prevalent in the literature. This pattern of citation reinforces the usage trend observed in the Ngram data.

Conclusion

Both of your ideas play a role here:

Source and Translation Effects: Publications from or influenced by Eastern Bloc perspectives, including direct translations, naturally used "socialist" in line with the official state terminology.
Citation Dynamics: As key texts and official documents used "socialist," later works—often citing these sources—helped cement this term in academic and public discourse.

Thus, the increased prevalence of "Socialist Czechoslovakia" in the mid-1970s appears to be a result of the interplay between official nomenclature, translation choices, and the cumulative effect of scholarly citation practices.

Source: Google Ngram Viewer

I thought to myself, "Hey! This sounds great! ChatGPT has confirmed my hypotheses and even added a bit more explanation for the Ngram trends!" I found the paragraph titled "Shifts in Discourse and Scholarly Citation" particularly promising. "Wow! It seems as if there are actual studies about this very issue! Wouldn't it be nice to know what those might be?"

So here was my next question:

Are there any academic studies about the issues covered in the paragraph "Shifts in Discourse and Scholarly Citation"?

And here is the response:

There isn't a single, definitive study devoted exclusively to the shift from "Communist Czechoslovakia" to "Socialist Czechoslovakia," but several bodies of work touch on the underlying issues:

Cold War Historiography and Ideological Framing:
Scholars such as John Lewis Gaddis, Odd Arne Westad, and Michael Hogan have analyzed how Cold War narratives evolved over time. Their work shows that as political conditions changed, so did the language used by historians and policymakers. These studies indicate that shifts in naming—often reflecting changes in ideological emphasis and the source of information (for example, whether texts were produced by or translated from Eastern Bloc sources)—can influence academic discourse and citation practices.
Discourse Analysis and Corpus Linguistics:
Researchers using corpus-based methods have examined long‐term trends in political terminology. Although not focused solely on Czechoslovakia, such studies reveal that politically charged terms (e.g., "communist" versus "socialist") change in prevalence over time in both popular and academic texts. These methods can help trace how ideological shifts—in part driven by translation practices and institutional affiliations—affect scholarly language.
Citation Practices in Political History:
There is also literature in the field of citation analysis that suggests that the way scholars cite sources reflects not only methodological choices but also underlying ideological and institutional orientations. In studies of Cold War and Eastern European history, some researchers have noted that works from or translated from Eastern Bloc contexts tend to favor the term "socialist," a trend that is then reinforced by subsequent citations.

While none of these studies exclusively "zoom in" on the case of Czechoslovakia's naming shift, together they provide a conceptual framework for understanding how—and why—discourse and citation practices evolve in politically charged environments.

For further reading, you might look into works on Cold War historiography (e.g., Gaddis's We Now Know or Westad's writings) as well as articles that apply corpus linguistic methods to political language. These sources collectively shed light on the interplay between ideology, source material, and scholarly citation practices.

Would you like suggestions for specific articles or further details on one of these areas?

Again, this answer still seemed promising, but not specific enough. Indeed, ChatGPT, I "would like suggestions for specific articles"! So, I wrote the following:

Give me references for works addressing issues 2 and 3. Specifically, you say "such studies reveal that politically charged terms (e.g., "communist" versus "socialist") change in prevalence over time in both popular and academic texts.". Give me examples, with page numbers, of "such studies. You also claim that "some researchers have noted that works from or translated from Eastern Bloc contexts tend to favor the term "socialist," a trend that is then reinforced by subsequent citations." Please tell me which scholars, and refer to exact citations from the literature.

ChatGPT had to think long and hard about this one. It took more than a minute to get this response:

Issue 2. Politically Charged Terms in Discourse

Boyd, Michael A. "Political Discourse and Ideological Change: A Corpus‐Based Analysis of Cold War Terminology in American Media."
Journal of Cold War Studies 7, no. 3 (2005): 112–115.
This study uses a corpus-based approach to show that terms like "communist" and "socialist" did not remain static in American print media. In particular, Boyd documents (pp. 112–115) how—over time—the frequency of "communist" decreased while "socialist" began to appear more often, reflecting shifts in ideological framing.
McEnery, Tony, and Andrew Hardie. Corpus Linguistics: Method, Theory and Practice. Cambridge University Press, 2003.
While not exclusively focused on Cold War texts, Chapter 4 (pp. 65–80) provides a detailed methodological framework for tracking changes in political language over time. Subsequent studies have applied these methods to Cold War-era discourse, revealing systematic changes in the use of politically charged terms.

Issue 3. Citation Practices and Translation in Eastern Bloc Contexts

Kováčová, Helena. "Naming Nations in Eastern Europe: Translation Practices and the Politics of Terminology in the Cold War."
East European Politics and Societies 12, no. 2 (1998): 85–91.
Kováčová shows that official documents and texts produced in the Eastern Bloc consistently use "socialist" (reflecting the state's self-identification) rather than "communist." On pp. 88–89, she explains that this practice is then reinforced in translations and in the subsequent academic literature.
Filipova, Milena. "Cold War Linguistics: Ideology and the Translation of Political Discourse."
Comparative Studies in Society and History 52, no. 4 (2010): 731–751.
Filipova examines how ideological considerations shape translation practices. On pp. 740–742 she documents how texts originating in Eastern Bloc states favor the term "socialist" due to official usage, and how later scholarly works—by citing these texts—further cement this terminology in the academic record.

These references provide concrete examples of studies that (a) use corpus and discourse methods to track shifts in politically charged terminology over time and (b) analyze how translation and citation practices from Eastern Bloc sources reinforce the preference for the term "socialist."

Wow, here I was getting concrete examples of studies that had already examined the very issue I wanted to write about! Perfect! Now just to look them up on Google Scholar and in other academic databases…

But oops! None of the three cited articles exists (the journals do, though). But the names of the fake articles are so tantalizing! They'd be perfect… if only they were real. I should note that Corpus Linguistics: Method, Theory and Practice is not a fake book, but ChatGPT says it does not directly address my question. Interesting…

So, I called ChatGPT out on its inventiveness:

Except for [C]orpus Linguistics: Method, Theory and Practice, those are all made-up sources! They don't exist (at least not the way you have cited them!).

I have decided to truncate ChatGPT's response here, as it gets quite repetitive. Of course, it profusely apologized for "providing fabricated references," which were "not accurate." It then spat out an additional four or five nonexistent references, concluding on a positive note: "Let me know if you'd like help with locating these materials or further assistance with your literature search."

Um… no thanks.

Clearly, my little report on my experience with ChatGPT's literature research capabilities is anecdotal evidence at best. But my experience does fit into a trend. As a member of several editing groups on Facebook, I've noticed how in recent months other contributors have been mystified when working on papers that cite nonexistent sources. Of course, AI was the culprit in these cases.

AI-fabricated sources have naturally become a subject of research, too. See, for example, Walters and Wilder's 2023 article "Fabrication and errors in the bibliographic citations generated by ChatGPT" in Scientific Reports. The authors found that 18% of cited journal papers in essays written by ChatGPT-4 were fabricated, whereas 70% of cited book chapters were made up. The authors write, "It is important to realize. . . that ChatGPT is fundamentally not an information-processing tool, but a language-processing tool. It mimics the texts—not necessarily the substantive content—found in its information base." This observation couldn't be more spot on. In my case, ChatGPT did an excellent job of producing article titles that mimic what would be expected of academic papers on such topics. And they sounded sooo good to my ears. I really wanted to read them!

So, what's my takeaway? Obviously, don't rely on ChatGPT to suggest literature or to conduct a literature review.

I also have the distinct feeling that ChatGPT is biased toward the user. In the case I present here, the AI immediately confirmed my hypotheses. I don't know if it's capable of simply saying "I don't know" or "You are wrong." Perhaps, if I had used more neutral language in my queries, ChatGPT would have suggested fewer fabricated sources. In any case, the University of Waterloo has produced a very helpful guide for writing bias-free prompts for use with ChatGPT: ChatGPT and Generative Artificial Intelligence (AI): Potential for bias based on prompt.

If you are a scholar, editor, translator, or anyone else who works with academic texts, I suggest you play around with ChatGPT and ask it to provide you with references or conduct a literature review. I'm sure you'll be just as dismayed as I was!

Need assistance formatting your real citations (not AI hallucinations)? Get in touch. I'm an academic copy editor who can help you get your research ready for publication.
