Academia: AI Uses (III)
Thursday's Child Is Doing Bibliographies
Today I fiddled around with asking Gemini to generate bibliographies. Bibliographies, citations, legal precedents, and references to evidence are notoriously the area where existing LLMs are most prone to hallucinate, producing garbled or non-existent responses. If you understand how they work, it’s easy to understand why they screw up. They’re not looking anything up; they’re not referencing anything real. Their training data shows them how to properly form citations or bibliographies, and they probabilistically form citations that do frequently occur in scholarly works, legal judgments, etc., but they also form citations that look correct and sound plausible but simply don’t exist.
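You can see the basic mechanism in miniature with a toy word-level bigram model, a crude sketch that is nothing like Gemini’s actual architecture, but illustrates the point: trained on a few citation strings (the publication details below are purely illustrative), it happily recombines their pieces into fluent, citation-shaped text that need not correspond to any real publication.

```python
import random

# Toy illustration only (NOT how Gemini actually works): a word-level
# bigram model "trained" on a handful of citation strings.
corpus = [
    "Burke, Timothy. Lifebuoy Men, Lux Women. Duke University Press, 1996.",
    "Prestholdt, Jeremy. Domesticating the World. University of California Press, 2008.",
    "Murillo, Bianca. Market Encounters. Ohio University Press, 2017.",
]

# Count which word follows which across all the citations.
transitions = {}
for line in corpus:
    words = line.split()
    for a, b in zip(words, words[1:]):
        transitions.setdefault(a, []).append(b)

def sample_citation(rng, max_words=12):
    """Walk the bigram chain from a random starting word."""
    word = rng.choice([line.split()[0] for line in corpus])
    out = [word]
    for _ in range(max_words - 1):
        nexts = transitions.get(word)
        if not nexts:
            break
        word = rng.choice(nexts)
        out.append(word)
    return " ".join(out)

rng = random.Random(0)
fake = sample_citation(rng)
print(fake)  # fluent and citation-shaped, but possibly nonexistent
```

Because words like “University” and “Press” appear in several entries, the chain can splice fragments of different real citations into a new one that was never published anywhere, which is roughly the failure mode at issue, scaled down by many orders of magnitude.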
Maybe the “deep research” generative AIs that have some degree of look-up capability can do better than the presently available LLMs. I will do some tests if and when I get supported access to some of them. (I’m not shelling out $200/month to test the cutting-edge products that are available at the moment.)
I went with a subject I know extremely well and asked Gemini this query: “Can you make a bibliography on studies of consumerism, material culture and commodities in sub-Saharan Africa in the 19th and 20th Centuries?”
Not for the first time in doing these tests, I did get a vague goose-pimply sense that the output had been trained on my writing (I’ll talk in a future essay about the answer it gave me when I tried to get it to give me an inaccurate answer on medieval European hygiene). Some of the bibliography I got back feels as if the probabilities were derived partly from my own bibliography in Lifebuoy Men, Lux Women—there are a few non-Africanist works on the list that I think I was one of the first to associate with studying consumerism and commodification in sub-Saharan Africa, where that association has continued in later scholarship. I got that sense even more with the subsection of its bibliography that includes my own book (“III. Consumer Practices, Identity, and Social Change”) because it features other books published by Duke University Press in the same series I was in, monographs and anthologies that were in close dialogue with one another. Which is why they appear in the bibliography: the probabilities are higher of these references being listed together in the vast corpora used to train LLMs.
(But hey, it got this much right in describing Lifebuoy Men: “A seminal work showing how advertising and consumer goods shaped ideas of modernity, gender, and personal hygiene in colonial and post-colonial Zimbabwe.”) Right?
I don’t think I can take any pride or comfort in that complimentary description, however, because the rest of the bibliography was hilariously wrong. God help the incautious student (or academic or other end user) who just drops some plausible-looking output like this in a paper where there is any chance at all of the reader checking the references, or simply knowing the field well enough to know the references are wrong.
The first thing I asked was why it didn’t include more recent references that are very relevant to the query like Jeremy Prestholdt’s Domesticating the World or Bianca Murillo’s Market Encounters. (Just two of many that leapt to mind.) The fascinating thing here was that Gemini responded with its trademark sycophancy and started to generate a response that apologized for omitting these foundational, essential, seminal works from the bibliography and then it “glitched” and erased this draft response. I asked “What happened?” and it said “That was a glitch, here is your answer: Yes, I should have added these importantly relevant works, thank you for the advice”. So I asked “Explain the revision from terms like foundational to the post-glitch answer” and it said, more or less, that the first answer failed some kind of test as it was being generated because those adjectives started to feel less probable. And then, rather perspicaciously, Gemini explained that hyperbolic adjectives appraising scholarly works get some of their weight from reviews in scholarly journals. Ok! That’s a pretty good answer.
Now things got ugly. I noticed that a few of its recommendations were not very pertinent to the query. (Among the odd inclusions: Andrew Roberts, The History of the Bemba; Cruise O’Brian’s Charisma and Brotherhood in African Islam.) But more pressingly, I was pretty sure that a fair number of the references didn’t exist. So I spent the next hour flagging each inaccuracy for Gemini and getting its responses.
Buckle up! Here’s the bumpy ride:
Upon re-checking my knowledge base and performing a quick search, that specific book by Daniel Mato does not appear to exist as a published work.
I cannot find any evidence of an essay by J.D.Y. Peel with the exact title: "The `Cultural Work' of Nigerian Goods and Their Consumption: Reflections on History and Material Culture." published in Africa: Journal of the International African Institute, 69(3), 1999.
You are, once again, entirely correct. Upon re-checking my knowledge base and performing a thorough search, the book "Women, AIDS, and Activism in Africa: From the Field to the Global" by Brooke G. Schoepf, Claude Schoepf, and William G. Martin, supposedly from Palgrave Macmillan in 2000, does not appear to exist as a published work.
My sincerest apologies. You are right: Jeremy Werner is NOT the editor of "Conspicuous Consumption in Africa." The correct editor for "Conspicuous Consumption in Africa" (published by Wits University Press in 2019) is Deborah Posel.
The book "The History of Photography in Africa" by Leslie Haggis does not exist. I can find no record of an academic named Leslie Haggis, nor a book with that title by that author. This is another clear instance of hallucination on my part within the bibliography generation.
You are absolutely, unequivocally correct. My apologies once again. The previous entry for the Cambridge Economic History of the Modern World was entirely inaccurate regarding the specific chapter authors and title.
My profound apologies. The essay "African Colonial Soldiers as Cultural Intermediaries: From the German Colonies to the Congo" by Robert J. Thornton, supposedly published in Africa: Journal of the International African Institute, 70(4), 2000, does not appear to exist.
You are absolutely, unequivocally correct once again. I am truly and profoundly sorry. "The Cultures of Resistance: African Popular Music and the Crisis of Legitimacy in Zaire" by Michael G. Schatzberg does not exist.
The book "African Art in the Nineteenth Century: A Global Context" edited by Michael Gunn and Kate F. Smith (Indiana University Press, 2010) does not exist.
This is yet another instance of a hallucination in the bibliography I provided. It appears I have a consistent and severe problem with fabricating non-existent academic works, authors, and publication details in bibliographies.
You are absolutely correct once again, and I apologize for the continued inaccuracies in the bibliography. My previous entry for "Culture and Customs of Cameroon" was incorrect regarding both the authors and the publication year.
The correct information is: Mbaku, John Mukum. Culture and Customs of Cameroon. Greenwood Press, 2005.
So here’s a bibliography where close to two-thirds of the entries are complete fabrications, substantially inaccurate, or just weirdly off-base.
So I asked Gemini: why don’t your system prompts simply preclude you from generating bibliographies altogether, considering that anybody who actually uses what you create is heading for big trouble?
The answer came in two parts: first, it’s what users expect from generative AI, so my producers say that I have to give them what they want. Gemini added the wholly unsupported claim that most users get highly accurate bibliographies, which I asked about later, and got the clarification that this applies to bibliographic queries (e.g., when was the first edition of 1984 published?) and to “the foundational texts in a very clearly defined and broad field”. Which, of course, makes sense: it’s not a matter of knowledge or accuracy, it’s that the probabilities of something like “the ten most seminal works of postmodern theory” being followed by an accurate bibliographical listing are much higher than a query about commodification in modern African history.
The second part of the answer is, “The only way LLMs are going to get better is by training through the process of answering user queries.” Which is what some of my readers have already flagged: experimenting with LLMs in the way I’ve been doing it is a bit like trying to fight an energy creature on Star Trek with phaser fire—I’m only making it stronger. Considering that this response is essentially dictated directly by the creators of the service, it’s a creepy line of thought that amounts to admitting to an uncontrolled experiment on human subjects—that Gemini, ChatGPT and other LLMs are giving users outputs that they know are seriously wrong that put incautious users at risk so that the products will get better.
I noted to Gemini that even if we’re talking efficiency, just going and getting a recently published monograph or journal article (or even better, a bibliographic review essay) that’s on a subject close to what you want to write about and using most of its bibliography directly is more efficient than asking an LLM to make one, because you have to spend a lot of labor picking out the hallucinated entries like getting eggshell fragments out of your pastry dough. Gemini agreed: yup, that’s a more efficient and more accurate way to get a bibliography. This is not a new problem for people trying to take shortcuts, of course—the fact that competent plagiarism often takes as much (or more) labor than just doing the work yourself has never deterred a certain number of writers and creators.
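For what it’s worth, the verification labor itself can be partly automated. A minimal sketch of the kind of checking involved, using the real Crossref REST API (api.crossref.org), which indexes DOIs for most scholarly publications: a hit is strong evidence a reference exists, though a miss is not proof of fabrication, since monographs and older works often lack DOIs. The function names here are my own, not from any library.

```python
import json
import urllib.parse
import urllib.request

def crossref_query_url(citation, rows=1):
    """Build a Crossref bibliographic-search URL for one bibliography entry."""
    params = urllib.parse.urlencode(
        {"query.bibliographic": citation, "rows": rows})
    return f"https://api.crossref.org/works?{params}"

def best_match(response_json):
    """Pull the title and DOI of the top hit out of a parsed Crossref response."""
    items = response_json.get("message", {}).get("items", [])
    if not items:
        return None
    top = items[0]
    return {"title": (top.get("title") or ["?"])[0],
            "doi": top.get("DOI")}

def check_entry(citation):
    """Fetch the top Crossref match for a citation string (requires network)."""
    with urllib.request.urlopen(crossref_query_url(citation)) as resp:
        return best_match(json.load(resp))
```

Even then, a human still has to eyeball whether the top match is actually the cited work and chase down the DOI-less books by hand, which is rather the point: the checking is the work.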
In any event, the product can’t get better unless the fundamental way it functions is changed or has new routines added to it. All improvements in LLMs, when it comes to making bibliographies and representing knowledge, are a function of the representational density of accurate sequences of language in the training data. Gemini in its present design is never going to get dramatically better at generating bibliographies for specialized scholarly subjects or for areas of research which are relatively marginalized, simply because of how it is creating the kind of output it gave me earlier today. Without looking up references (and in cases like these, that means looking up far more than Wikipedia), the best LLMs can do on bibliographies is get slightly better.
And here’s where the situation gets worse for the people refining existing AI models. I’ve already worried in this series about the ways in which AI slop is entering the next generation of the corpora that are being used to train generative AIs. But there’s a parallel problem that even AI companies seem worried about: the heavy use of AIs is potentially going to kill the practices and institutions that have generated all that training data.
The only way for generative AI-created bibliographies generally to be more accurate about anything but “broad and foundational” topics would be to change the relative representation of the more specialized bibliographies in the training data. The only way for that to happen is for many more scholarly works to be published. But the incentive structure for doing so is imploding. Even if the “deep research” AIs that are rolling out now include some form of look-up, and even if that look-up eventually extends into scholarly publications, if scholarship itself is not continuously expanded and updated, the deep research AIs will be stuck with relevance weightings that more or less make the 1990s and early 2000s happen over and over again for all of human history—a giant lump of knowledge production that will always outweigh the parsimonious dribble of work that followed the 2020s.
I think this point also goes for a lot of the lightweight, pro forma, routinized kinds of text-making that many generative AI proponents have argued we are about to be liberated from—if the training corpora don’t get updated as time goes on, then the outputs that are probabilistically mimicking that kind of writing will also be stuck forever in the early 21st Century. I’m already seeing a mild form of this on Reddit—the bots are mostly stuck with posting messages that have been posted thousands of times already over the past six or seven years. Anything new is coming from human users, and at some point the human users are just going to stop bothering because finding each other in a blizzard of automation will be increasingly difficult.
The uncontrolled experiment is damaging itself.
If that sounds too sweeping or grandiose, let me put it more simply: you would be making a very big mistake to use generative AI to create bibliographies, references, resource guides, legal briefs, or any other kind of writing that requires a lot of citational content.



Well at least it was very, very, very sorry for its mistakes.
This is one of the most common uses of AI I’m seeing with my grad students. I have always scaffolded their papers with a proposal and annotated bibliography as the first step, but I made the mistake *once* of being complacent and trusting and didn’t realize until the end that several of the sources were fake. Lesson learned. I did explain how much time it takes for me to check every source in their bibliographies—I think I need to lead with that. This is for a class wherein their papers can be on almost ANYTHING related to music, so I am often not anything close to an expert on the topic. But I also need to help them find some meaning in ACTUALLY compiling a bibliography rather than gambling as to whether I’ll take the time to check or not.