Some years ago, I made a few waves with an ill-tempered call to “burn the catalog”. Specifically, I meant the Library of Congress subject headings, which I argued had become profoundly useless for discovery, whether by undergraduates or by skilled researchers. My main point was that conventional approaches to expert bibliographic control had produced subject headings so fine-grained and specific that books and journal articles were being herded into categories where they would remain the sole inhabitants for decades to come. In that case, you’d only find a publication via a subject search if you already knew it was there, and you’d only use the subject term in a keyword search if you already knew a lot about the subject.
In other cases, you’d come across a book or journal article assigned a subject heading so ridiculously broad that it was equally unhelpful. Meanwhile, actual clusters of scholarly work that were in meaningful dialogue with one another were discoverable only by looking at the bibliographies and citations within individual works, and that had the drawback that older work couldn’t cite its way forward to later work belonging to the same cluster.
I’ve been experimenting in the last week with some searches in different catalogs and realizing that things have gotten far worse. First, some of my favorite work-arounds are now completely crippled. I used to tell people to use Amazon’s “people who bought this book also bought” as a crude form of crowd-sourcing the process of discovery, where you’d discover there was a cluster of work that was connected essentially because some other set of readers out there knew there was such a cluster. Amazon no longer particularly cares if you find what you’re looking for, or if you find something that you didn’t know you were looking for but that is highly germane to what you are thinking about. Now you only get information that someone is paying Amazon to make you see (various “sponsored” searches) or other information that is part of their revenue streams. Amazon now only takes information from you: it doesn’t give anything back.
But the unravelling of bibliographic control or cataloging more generally as a public good is sufficiently advanced that the catalogs that corporate vendors provide are now highly balkanized and therefore also just profoundly chaotic in terms of discovery-oriented searches.
When I run test keyword searches in my local institutional catalog, a product owned by ProQuest, I rarely get the results I would expect from any reasonably well-functioning relevance weighting.
For example, earlier this week a colleague asked me to recommend readings that would help familiarize him with the historiography on East Africa. I don’t expect bibliographic control or any form of cataloging to solve a basic high-level problem with that request, which is that the useful answer is not going to be about East Africa as an overall region: the last ‘layer’ of work by historians that attempted to be comprehensively regional is fairly old. You need an area expert to field the query so that any search process fits the actually-existing scholarship.
What I wanted was to do a bit of discovery work to make sure I’m not missing recent scholarship, since it’s a field I don’t keep up with as closely as I do southern Africa, and I haven’t taught in the field for five or six years. So I knew that I might get useful results if I looked for recent scholarship on coastal Swahili communities or society; national histories in the region (including work that uses national names to frame colonial and precolonial studies); work associated with a handful of prominent ethnonyms (Kikuyu, Luo, etc.); and work associated with the “Great Lakes” region and with the Horn of Africa as a region. Perhaps also work on the Indian Ocean as an overall region. All of this with the goal of finding good “entry-level” scholarship for a historian who doesn’t study African history.
The other thing I had in hand was a list of books and authors I knew I’d recommend already—and that’s not a bad place to start with a discovery-inspired search, I thought.
How about John Middleton, The World of the Swahili?
In one listing in our catalog, the one that’s a JSTOR e-book, here’s the subject heading:
Swahili-speaking peoples
For the physical copy:
Swahili-speaking peoples -- Commerce
Swahili-speaking peoples -- Kinship
Swahili-speaking peoples -- Social life and customs
Lamu (Kenya) -- Commerce
Zanzibar -- Commerce
Zanzibar -- Social life and customs
Same book! Different listings.
But not, peculiarly, Swahili-speaking peoples -- History, despite the fact that the book most assuredly is about that subject. (Also, oddly, the physical copy has become linked visually to a thumbnail of a book titled Schnitzler, Hofmannsthal, and the Austrian Theatre.)
Partly as a result, a keyword search for “Swahili history” doesn’t turn up Middleton’s 1992 The World of the Swahili in the first three pages of results. It does turn up a couple of comparably good overviews, by Derek Nurse & Thomas Spear and by James de Vere Allen, but they are scattered amid articles that are highly specialized, works of literary criticism, book reviews, or otherwise just not very good results to pop up high on a relevance listing. It doesn’t turn up a 2020 anthology called The Swahili World that is bursting with good newer scholarship. It doesn’t turn up Middleton’s later 2000 book, which is a good update on his 1992 book.
Look at the subject headings for The Swahili World:
Swahili-speaking peoples -- Africa, East -- Civilization
Swahili-speaking peoples -- Africa, East -- History
Swahili-speaking peoples -- Africa, East -- Social conditions
Swahili-speaking peoples -- Africa, East -- Economic conditions
Africa, East -- Antiquities
What do you get with Africa, East added to the mix? Only The Swahili World itself and books by one of its editors, Stephanie Wynne-Jones.
Middleton’s 2000 book The Swahili: The Social Landscape of a Mercantile Society?
Swahili-speaking peoples -- History
Swahili-speaking peoples -- Commerce
Swahili-speaking peoples -- Social life and customs
Merchants -- Africa, East -- History
Muslims -- Africa, East -- History
Mercantile system -- Africa, East -- History
Africa, East -- Social life and customs
By any standard, these books are deeply interrelated (they are all absolutely talking to one another about the same subject), but here we have barely adjacent subject headings separating them, and keyword searches that don’t fetch some of them or that put them way down the relevance hierarchy.
We have faceting in our catalog, and that does relate them, but it lumps that relationship in with a bunch of much older work and with primary sources that are barely connected. (Lord Lugard’s diaries and a 1955 book on the Capricorn Society, for example.)
And these are the instruments we expect our students and our publics to use to find information or conduct college-level research? I can only do discovery in these environments because I already know so much. Sure, because of that, I also have a good working understanding of why these books aren’t connected, and some of that isn’t the fault of any catalog or cataloguer: it’s about the authors’ disciplinary differences (history, anthropology, archaeology) and about some of the subjects each work addresses thematically. That has always been one root of the confusion that infests any historiography, and a reason why you actually need professional training to do this kind of scholarship. There’s another confusion that isn’t rooted in the catalog: Swahili is a language, a culture, and an ethnonym. But even knowing what I know, I leave this discovery search uneasy, almost certain that I’ve missed a good, relevant, “entry-level” work published more recently than the ones I know best. (That sensation is at least one of the reasons that “imposter syndrome” afflicts academics so much as they age.)
The problem is now much deeper than any particular practice of bibliographic control or expert cataloging, at any rate. Over and over again in our catalog, I don’t get the results I’d expect on almost any search where I know what ought to appear at the top. My Swahili searches were, if anything, among the best outcomes. The situation doesn’t seem better in other search environments. WorldCat gives me a couple of great results on its first page for “Swahili history” that our catalog didn’t, but it also returns some anthologies and monographs that have only one relevant chapter or touchpoint to what I’m looking for. WorldCat displays the LC subject headings for most of the books, but other headings as well, some of which return ridiculously generic search results. (I’m sure they’re not there to be used for further discovery but instead to help books get called up in keyword searches. In no imaginable discovery process is clicking on the subject heading “culture” after finding The Swahili World going to do any inquirer good.) The Library of Congress catalog is still more stringently organized and unfriendly to the keyword-search styles that have come to dominate most discovery work. If you know how to work it, you at least get a good accounting of the subject headings overall. But once again, the work is completely jumbled up unless the user applies another filter (a second subject heading, a publication date, etc.).
It’s not just library catalogs or other expert-driven tools, of course. Everyone is noticing that search in Google (or any other big commercial search engine) is approaching uselessness. There are all sorts of reasons why: the search engines are losing the battle against search optimizers; the results are clogged with bait-and-switch garbage choked with ads or with information stolen from Wikipedia or Reddit; and, most importantly, Google no longer really gives a fuck whether you find what you’re looking for, just as Amazon is no longer in the business of actually helping you find something you’d like to buy. That’s not how they get paid off now. The problem is that this same reality has increasingly come inside the house, into the very tools that people whose business is producing knowledge and teaching students want to use.
ProQuest (or now its parent, Clarivate) isn’t getting paid off because its cataloging products have especially effective relevance algorithms, and its business model doesn’t include room for the massive staff of expert cataloguers who might help build new structures to improve discovery outcomes. It’s getting paid off the same way a landlord gets paid off: because it owns the building. At this point, it’s one of the few buildings left in a city where a lot of people need an apartment. All the buildings in town have the same number of roaches in the kitchen and the same peeling paint on the walls. It doesn’t matter how much money you have to spend: that’s all that’s on offer, wherever you turn. The staff in the building are lovely people and very helpful, but they can’t fix the problems that only the owners can address, and the owners don’t have any reason to fix those problems.
What this means is that any faculty member who isn’t really paying attention to what’s going on (often despite librarians and information scientists trying to tip them off) doesn’t really understand why many students can’t do research effectively. Because the search environments are capricious, cluttered, and confusing, students can’t do research effectively unless faculty provide a ton of guardrails that tightly constrain the outcomes of a discovery search.
Image credit: Photo by fauve othon on Unsplash
Thanks. And ha ha, chipmunks. I wonder if some enterprising regulator will ever look at ProQuest as a monopoly. I haven’t read the burn-the-catalog piece; maybe repost it here? I agree article searching is the worst, because of reviews. I find the new JSTOR awful to use, but maybe a refresher with a librarian would help. I have the grad student desk assistant physically type in all the new books I highlight in the AHR and my specialist journals, then populate my running bibliographies with them. It’s crazy that this old-fashioned way is the workaround. It still doesn’t work for actual articles, but I suppose I should be making a more diligent effort to read them once a week.
Thank you for this. I noticed this recently when a scheduling snafu with my institution’s librarian meant I had to do a subject-search instruction session myself. If you are a historian of medicine, it’s even more complicated, because those books use the MeSH subject headings in some settings. I hadn’t realized that ProQuest is the spine of keyword and subject searches for library catalogs, or that ProQuest uses different subject terms than LOC or MeSH. Did I get that right? Is there variation between institutions? Do the Ivies’ catalogs and big cross-systems like the UCs use a better keyword/cataloging system than, say, a SLAC or an R2? To familiarize my students with subject terms (and to populate them), I always have them do a discovery assignment as homework. I give them a handout that asks them to plug one of the books for our class into an Ivy’s catalog (because they have more books on hand) and into WorldCat, see what subject terms this yields, and then find five more books or articles using variations on those subject terms. It was somewhat useful; at the very least, it got them to do this on their own outside of a library instruction session. Outside of Amazon, have the changes you wrote about here (the takeover by ProQuest) occurred in roughly the last five years?