Till We Have Built Jerusalem?: A Short Note on Archival Digitization and Humanistic Research (Reading, if Not The Read)
Friday's Child Is Worried About the Future
A day late on this one. Not quite my usual Read of a particular text, more on reading as an experience.
Anybody remember those ads where some guy comes into what seems like a fleabag motel out in the desert somewhere and as he checks in he’s informed that in his room he’ll be able to watch or listen to everything ever made or recorded? And he asks in wonder, “How is that possible?” The ad promises that the technology to make it possible is right around the corner, that this guy has stumbled into our future. And when we all first saw the ad, we were like the guy: how is that possible? Not everything! No way!
Of course it is possible. Technologically we have arrived in that future, more or less. But of course it is not at all the experience that any of us have, even if we pay for every streaming service known to humankind. There are films which are digitally available and yet not on any streaming service at all. There are films which are available in one country or one market but not in another.
Culture’s one thing. Reading and researching in libraries and archives is another.
Libraries have always had to balance the continuity provided by catalogs and structure with the fundamental discontinuities involved in reading more than one text. Even before the advent of digitization, you could take the entire contents of a shelf where all the texts had the same Library of Congress catalog number and find in reading them one after another that they were strikingly different: that they might be separated by decades (and thus often by much more than mere time), by the temperament of the authors, by the genre or form of the book, and much more. Only after a great amount of labor (essentially in the acquisition of expertise) would the unity implied by the catalog number begin to fully flower in the mind of the reader; only then would a sense of reading a larger meta-text or inter-text develop. But the catalog numbers and the larger structure of the library at least lightly nurtured that possibility, as did the continuous nature of the books themselves.
Archives generally conceal their continuities far more fiercely—thank goodness, for those of us whose expertise lies in using them as evidence about the world both past and present. Materials make their way to archives haphazardly (and are sometimes destroyed en route, whether by presidents who flush documents down toilets or whole organizations stuffing their entire documentary existence into burn bags) such that a reader can thread through texts which reference other texts where the question of whether the other texts can ever be read is a mystery (and one that even archivists themselves—or surviving authors of the documents—are often helpless to resolve). In the last few weeks, I’ve read materials that were bound together in a series of volumes that were plainly intended to be consulted as continuous (and yet some of those are only held together by categorical conceits that are no longer coherent or evident) and other binders and folders full of papers that were collected only after the fact by a secretary or office worker so that they could be filed away and likely forgotten. Reading through any of these materials is a matter of stops and starts. You drop out of one file halfway through to see if you can discover the names or designations of other files and conversations mentioned in the one in front of you. In others the language doesn’t matter—it is merely that the file’s existence alone is evidence of something. In others still, you can decide to read the language as important when no one else has—not searching for the intent of historical actors or their worldview, not reading a text as communication but as representation, as an imagining of the world with language that masters its authors rather than the other way around.
I am thinking on the way that reading is experienced in archives, in libraries, in large collections of texts, because I am wondering partly where it’s all going. I am familiar with where it’s been, and that cautions me. The conventions of structure and organization and metadata that I take for granted are relatively young and often fragile. The index, the footnote, the table of contents, the title or document number, the abstract, the bibliography (and the authority of the bibliographers), the notion of institutions whose entire mission is to collect texts and then provide access to them, only cohered fully together in the 20th Century. National archives were born alongside documentary states that created vast domains of text to coordinate their governance, direct their power, know their citizens and residents, understand their territories, communicate with other governments and so on. The materials kept by pre-national, pre-modern states that relied on literacy are typically much narrower in scope and documentary ambition and their survival into the present an even iffier or more unpredictable matter with a few notable exceptions. So I can’t stand on what we have had in libraries and archives and say, “This is an immortal and essential fact of life that we must not allow to change in any way.”
And yet I am troubled when I try to make my way through the digital catalog of a library where the catalog is a product purchased from a large for-profit corporation rather than maintained by the staff of that library. I am concerned when accessing digital materials ostensibly available through that catalog that the experience of reading through can be so disconcertingly discontinuous in several ways. I don’t always know whether what is actually listed in the catalog is genuinely available. Clicking on some links (while already logged into a proxy or VPN, so that I’m a permitted user) sometimes dumps me in another institutional repository where I have to re-enter the search terms I used in the first place to get the text that was attested to at my point of entry. Sometimes the internal search engine used at that repository is of such poor quality that the search fails and I have to do a whole new discovery process to finally find the thing I already found once. Sometimes it never appears regardless: it is a ghost haunting one catalog, refusing to manifest at its supposed home.
Increasingly the description of the text that appears in the corporate catalog is a kind of metadata extrusion from its holder which may not accurately describe at all what the text really is. That’s always been a problem, certainly. As I once infamously observed of the Library of Congress subject headings, they were once upon a time so finely tuned that they only described a single book shorn of all its conceptual and categorical relations, discoverable only if you already knew the book existed. But now I might see in top-level catalog entries an exhaustively detailed list of chapters in an anthology (each of them discoverable as a separate catalog entity as well as all together) or I might simply see a title and author of something that I only know is neither a book nor a film. Sometimes the relevance sorting on a search returns ten reviews of a book before it returns the book itself.
You fall through portals as precipitously and peculiarly as in the computer game of the same name (though so far, no artificial intelligence promises cake at the end of your journey). Sometimes you don’t know: where is this text, exactly? What is it? How do I get home from here?
In archives, I’ve been far more accustomed to expect the peculiarities of what is held and withheld within their walls. (Though it is still astonishing to come across a file on British communists in Africa from 1950, say, and see that it will not be available for researchers to view until 2031, if then.) Some big national archives have in fact gotten even better about directing researchers to materials held in other archival collections. The catalog of the UK National Archives now does so in a way I really appreciate, for example, making far more material discoverable even if it is not held there. Except, on the other hand, for the creeping growth of exactly the same thing that’s happening to catalogs. Occasionally a discoverable document turns out to have been digitized (hooray!) only for it to turn out to be behind a private company’s paywall (what?). If you’re inside the building, you can view it, but if you’re a citizen of the UK (or elsewhere) finding that document in that catalog means you’ll have to get through that paywall on your own recognizance. I can, because my own institutional library has a portal to it. But not everybody could.
That plainly raises the same questions that privatization everywhere raises, and especially sharply so in the U.K., where a well-functioning network of public goods has been systematically dismantled by four decades of governance by both the Conservatives and New Labour and turned into a failing shambles that sharply aggravates inequality. That paywall feels a bit like the first shot across the bow, the sort of thing that ends with poor-quality chairs in a public park costing an hourly fee to sit in.
But bracket that. I’m more concerned in this context with again thinking about how that feels as a reading experience, to go from files on the desk all held in one place (sometimes referencing other files all held in another place, to be viewed on another desk) to having to toggle between files on the desk and files on the desktop. Especially when on the desktop, the user interfaces the reader has to navigate and work through turn out to be a higgledy-piggledy collection of different forms and permissions, some of them seemingly built to inhibit reading to the maximum extent possible. (I’m looking at you, Adam Matthew.)
We talk a lot now in public culture about how kids these days don’t know how to read a full book, that they are accustomed to read in fragments. But much of how we now store, access and work with the exalted book or journal article whose continuously coherent expanse is so celebrated by the readers who bemoan kids these days requires a more and more fragmented approach to reading. Not because of inability to concentrate or devote attention, not because of a fear of length or depth, but because how we store, access, circulate, catalog and provision the vast repositories at the heart of our reading culture is so increasingly governed by a kind of seizure of a commons, riven by fences and guarded portals, undersupported and underprovisioned. Even if the kids these days (or the old men and women) were perfectly able and willing to read across vast textual domains, they would have to cross a hundred borders in any journey, presenting different travel documents at each, traversing infrastructures that almost seem meant to break any sense of a continuous experience of reading, interpreting or thinking.
Readers and reading have survived, even flourished, in such chaotic, uncurated, competitive environments of creation and ownership before, whether in reading market literature in Onitsha or early modern British print culture. In most of those moments that I can think of, however, that chaos unleashed a flood of texts upon readers. The discontinuities of reading were a matter of dealing with excess and access, of learning to live with being a kid left alone in a well-stocked candy store. Now it seems to me we are left to manage something else: literature going behind paywalls, documents falling into private hands, the sources of information about texts and their reading being an algorithmic black box that alternately enables and frustrates our encounter with writing, film, art, images, documents and data. Reading is not spilling out into a public world; it is being locked up in towers and going into strange, untracked cellars and wildernesses, becoming hard to see and harder to follow. It’s not a technological thing at all. We could be at this moment staying at that motel where everything we could read was readable by all. If that is fading out of reach, it is for other reasons, and if the experience of reading is as a result fragmenting and being pulled towards strange attractors, then that is not the fault of the readers, of whatever generation.