Tools – Found History

What’s in a name? AI, LLMs, Chatbots and what we hope our words will accomplish

There’s a lot of debate in academic circles about what to call ChatGPT, the new Bing, Bard, and the set of new technologies that operate on similar principles.

“AI” or “Generative AI” are the terms preferred by industry. These terms are rightly criticized by many scholars in the fields of communications, science and technology studies, and even computer science as not only inaccurate (there is nothing “intelligent” about these systems in the way we usually think about intelligence; they have no understanding of the text they produce) but also as crass marketing ploys. These critics say these terms perpetuate a cynical “AI hype cycle” that’s simply intended to drive attention and investment capital to the companies that make these technologies—and deflect attention from their harms.

But I worry that the use of the term “AI hype cycle” isn’t doing the work its proponents think it is. My understanding is that those proponents hope the term will help people see through the breathless predictions of corporate interests and focus instead on the harms these interests are perpetrating in the present. I worry, however, that it reads, to the casual observer, as “nothing to see here.”

Surely that’s not the outcome we want.

“Chatbot” is another contender for the term of choice. And indeed it describes the kinds of interactions that most people are having with these technologies at the moment. But as Ted Underwood has said, “chatbot” doesn’t capture the range of applications that these technologies enable beyond chat through their APIs. They’re clearly more than the chatbots we used 20 years ago on AIM.

“LLM” (“Large Language Model”) seems to be preferred by academics (according to Simon Willison’s Mastodon poll), because it’s more accurate. LLM surely does a better job of describing how these technologies work (by statistically predicting the next most likely word in a sequence) without suggesting there’s any kind of deeper understanding at work.

But “LLM” has a serious problem. It’s an acronym, and therefore, it’s jargon. And therefore, it’s boring.

That leads me to my title’s second question: “What do we want our words to do?” Is the purpose of our words to asymptotically approach truth above and against all other considerations? Or do we want to draw people’s attention to that truth, even if the words we use are not quite as precise?

I have very mixed feelings about this. LLM is more accurate. But it’s also easy (for ordinary people, decision makers, regulators, politicians) to ignore. That is, it’s boring. I understand not wanting to amplify Silicon Valley’s hype, but we also don’t want to downplay the likely consequences of this technology. There has to be a way to tell people that something will be transformative (quite probably for the worse), and command their necessary attention, without being a cheerleader for it.

For example, the term “World Wide Web” was both completely hype and totally inaccurate when it was coined by Tim Berners-Lee in 1989. But after the Internet had been ignored by pretty much the entire world for 20 years, “the Web” surely captured the popular mind and brought the Internet to public attention.

Conversely, the terms “SARS-CoV-2” and “COVID-19” were both sober and accurate. But I worry that the clinical nature of the terms enabled people who were already predisposed to looking the other way to do so more easily. Calling it “Pangolin Flu” or “Raccoon Dog Virus” would have been less accurate, but it would have caught people’s attention. “Bird Flu,” “Swine Flu,” and “Zika Virus,” which have killed very few people in this country, get a ton of attention relative to their impact. Surely terminology wasn’t the root cause of our society’s lazy response to the pandemic. But I don’t think it helped.

Now, I am NOT suggesting we develop intentionally misleading terminology just to get people’s attention. We can leave that to Silicon Valley. But I am suggesting that we think about what we want our words to do. Do we value accuracy to the exclusion of all other considerations? In that case, an acronym of some sort may be in order. Or do we also want people to pay attention to what we’re saying?

I don’t have a good suggestion for a specific term that accomplishes both goals (accuracy and attention), but I think we need one. And if we can’t come up with one that does both jobs, and the public conversation settles on “generative AI” or some other term coined by the industry, I don’t think we should spend too much time banging our heads against it or trying to push alternatives. We’ll be better served if our many and valid and urgent criticisms of “generative AI” and its industry are heard by people than by sticking earnestly with a term that lets people ignore us.

If there’s anything we should have learned from the past 25 years of the history of the internet it’s that academics “calling bullshit” is not a plan for dealing with unwanted technology outcomes.

Academics have two very deeply held and interrelated attachments. One is to accuracy and the truth. The other is to jargon. The one is good. The other can cause trouble. I hope in this case our attachment to the first doesn’t lead us to adopt a jargon-y language that enables people already predisposed to ignore the harms of the tech industry to do so more comfortably.

Teaching and Learning with Primary Sources in the age of Generative AI

The following is a (more or less verbatim) transcript of a keynote address I gave earlier today to the Dartmouth College Teaching with Primary Sources Symposium. My thanks to Morgan Swan and Laura Barrett of the Dartmouth College Library for hosting me and giving me the opportunity to gather some initial thoughts about this thoroughly disorienting new development in the history of information.

Thank you, Morgan, and thank you all for being here this morning. I was going to talk about our Sourcery project today, which is an application to streamline remote access to archival materials for both researchers and archivists, but at the last minute I’ve decided to bow to the inevitable and talk about ChatGPT instead.

Dartmouth College Green on a beautiful early-spring day

I can almost feel the inner groan emanating from those of you who are exhausted and perhaps dismayed by the 24/7 coverage of “Generative AI.” I’m talking about things like ChatGPT, DALL-E, MidJourney, Jasper, Stable Diffusion, and Google’s just-released Bard. Indeed, the coverage has been wall to wall, and the hype has at times been breathless, and it’s reasonable to be skeptical of “the next big thing” from Silicon Valley. After all, we’ve just seen the Silicon Valley hype machine very nearly bring down the banking system. In just the past year, we’ve seen the spectacular fall of the last “next big thing,” so-called “crypto,” which promised to revolutionize everything from finance to art. And we’ve just lived through a decade in which the social media giants have created a veritable dystopia of teen suicide, election interference, and resurgent white nationalism.

So, when the tech industry tells you that this whatever is “going to change everything,” it makes sense to be wary. I’m wary myself. But with a healthy dose of skepticism, and more than a little cynicism, I’m here to tell you today as a 25-year veteran of the digital humanities and a historian of science and technology, as someone who teaches the history of digital culture, that Generative AI is the biggest change in the information landscape since at least 1994 and the launch of the Netscape web browser which brought the Internet to billions. It’s surely bigger than the rise of search with Google in the early 2000s or the rise of social media in the early 2010s. And it’s moving at a speed that makes it extremely difficult to say where it’s headed. But let’s just say that if we all had an inkling that the robots were coming 100 or 50 or 25 years into the future, it’s now clear to me that they’ll be here in a matter of just a few years—if not a few months.

It’s hard to overstate just how fast this is happening. Let me give you an example. Here is the text of a talk entitled (coincidentally!) “Teaching with primary sources in the next digital age.” This text was generated by ChatGPT—or GPT-3.5—the version which was made available to the public last fall, and which really kicked off this wall-to-wall media frenzy over Generative AI.

You can see that it does a plausible job of producing a three-to-five paragraph essay on the topic of my talk today that would not be an embarrassment if it was written by your ninth-grade son or daughter. It covers a range of relevant topics, provides a cogent, if simplistic, explanation of those topics, and it does so in correct and readable English prose.

Now here’s the same talk generated by GPT-4 which came out just last week. It’s significantly more convincing than the text produced by version 3.5. It demonstrates a much greater fluency with the language of libraries and archives. It correctly identifies many if not most of the most salient issues facing teaching in archives today and provides much greater detail and nuance. It’s even a little trendy, using some of the edu-speak and library lingo that you’d hear at a conference of educators or librarians in 2023.

Now here’s the outline for a slide deck of this talk that I asked GPT-4 to compose, complete with suggestions for relevant images. Below that is the text of speaker notes for just one of the bullets in this talk that I asked the bot to write.

Now, if I had generated speaker notes for each of the bullets in this outline and asked GPT’s stablemate and image generator, DALL-E, to create accompanying images (all of which would have taken the systems about 5 minutes) and then delivered this talk more or less verbatim to this highly educated, highly accomplished, Ivy League audience, I’m guessing the reaction would have been: “OK, seems a little basic for this kind of thing” and “wow, that talk was a big piece of milquetoast.” It would have been completely uninspiring, and there would have been plenty to criticize, but neither would I have seemed completely out of place at this podium. After all, how many crappy, uninspiring, worn-out PowerPoints have you sat through in your career? But the important point to stress here is that in less than six months, the technology has gone from writing at a ninth-grade level to writing at a college level and maybe even more.

Much of the discourse among journalists and in the academic blogs and social media has revolved around picking out the mistakes these technologies make. For example, my good friend at Middlebury, Jason Mittell, along with many others, has pointed out that ChatGPT tends to invent citations: references to articles attributed to authors with titles that look plausible in real journals that do not, in fact, exist. Australian literary scholar, Andrew Dean, has pointed out how ChatGPT spectacularly misunderstands some metaphors in poetry. And it’s true. Generative AIs make lots of extremely weird mistakes, and they wrap those mistakes in extremely convincing-sounding prose, which often makes them hard to catch. And as Matt Kirschenbaum has pointed out: they’re going to flood the Internet with this stuff. Undoubtedly there are issues here.

But don’t mistake the fact that ChatGPT is lousy at some things for the reality that it’ll be good enough for lots, and lots, and lots of things. And based on the current trajectory of improvement, do we really think these problems won’t be fixed?

Let me give another couple of examples. Look at this chart, which shows GPT-3.5’s performance on a range of real-world tests. Now look at this chart, which shows GPT-4’s improvement. If these robots have gone from writing decent five-paragraph high school essays to passing the Bar Exam (in the 90th percentile!!) in six months, do we really think they won’t figure out citations in the next year, or two, or five? Keep in mind that GPT-4 is a general purpose model that’s engineered to do everything pretty well. It wasn’t even engineered to take the Bar Exam. Google CEO Sundar Pichai tells us that AI computing power is doubling every six months. If today it can kill the Bar Exam, do we really think it won’t be able to produce a plausible article for a mid-tier peer-reviewed scholarly journal in a minor sub-discipline of the humanities in a year or two? Are we confident that there will be any way for us to tell that machine-written article from one written by a human?

(And just so our friends in the STEM fields don’t start feeling too smug, GPT can write code too. Not perfectly of course, but it wasn’t trained for that either. It just figured it out. Do we really think it’s that long until an AI can build yet another delivery app for yet another fast-food chain? Indeed, Ubisoft and Roblox are starting to use AI to design games. Our students’ parents are going to have to start getting their heads around the fact that “learning to code” isn’t going to be the bulletproof job-market armor they thought it was. I’m particularly worried for my digital media students who have invested blood, sweat, and tears learning the procedural ins and outs of the Adobe suite.)

There are some big philosophical issues at play here. One is around meaning. The way GPT-4 and other generative AIs produce text is by predicting the next word in a sentence statistically, based on a model drawn from an unimaginably large (and frankly unknowable) corpus of text the size of the whole Internet (a “large language model” or LLM), not by understanding the topic they’re given. In this way the prose they produce is totally devoid of meaning. Drawing on philosopher Harry Frankfurt’s definition of “bullshit” as “speech intended to persuade without regard for truth”, Princeton computer scientists Arvind Narayanan and Sayash Kapoor suggest that these LLMs are merely “bullshit generators.” But if something meaningless is indistinguishable from something meaningful (if it holds meaning for us, but not the machine), is it really meaningless? If we can’t tell the simulation from the real, does it matter? These are crucial philosophical, even moral, questions. But I’m not a philosopher or an ethicist, and I’m not going to pretend to be able to think through them with any authority.

What I know is: here we are.

As a purely practical matter, then, we need to start preparing our students to live in a world of sometimes bogus, often very useful, generative AI. The first-year students arriving in the fall may very well graduate into a world that has no way of knowing machine-generated from human-generated work. Whatever we think about them, however we feel about them (and I feel a mixture of disorientation, disgust, and exhaustion), these technologies are going to drastically change what those Silicon Valley types might call “the value proposition” of human creativity and knowledge creation. Framing it in these terms is ugly, but that’s the reality our students will face. And there’s an urgency to it that we must face.

So, let’s get down to brass tacks. What does all this mean for what we’re here to talk about today, that is, “Teaching with Primary Sources”?

One way to start to answer this question is to take the value proposition framing seriously and ask ourselves, “What kinds of human textual production will continue to be of value in this new future and what kinds will not?” One thing I think we can say pretty much for sure is that writing based on research that can be done entirely online is in trouble. More precisely, writing about things about which there’s already a lot online is in trouble. Let’s call this “synthetic writing” for short. Writing that synthesizes existing writing is almost certainly going to be done better by robots. This means that what has passed as “journalism” for the past 20 years since Google revolutionized the ad business—those BuzzFeed style “listicles” (“The 20 best places in Dallas for tacos!”) that flood the internet and are designed for nothing more than to sell search ads against—that’s dead.

But it’s not only that. Other kinds of synthetic writing—for example, student essays that compare and contrast two texts or (more relevant to us today) place a primary source in the context drawn from secondary source reading—those are dead too. Omeka exhibits that synthesize narrative threads among a group of primary sources chosen from our digitized collections? Not yet, but soon.

And it’s not just that these kinds of assignments will be obsolete because AI will make it too easy for students to cheat. It’s also: what’s the point of teaching students to do something that they’ll never be asked to do again outside of school? This has always been a problem with college essays that were only ever destined for a file cabinet in the professor’s desk. But at least we could tell ourselves that we were doing something that simulated the kind of knowledge work they would do as lawyers and teachers and businesspeople out in the real world. But now?

(Incidentally, I also fear that synthetic scholarly writing is in trouble, for instance, a Marxist analysis of Don Quixote. When there’s a lot of text about Marx and a lot of text about Don Quixote out there on the Internet, chances are the AI will do a better—certainly a much faster—job of weaving the two together. Revisionist and theoretical takes on known narratives are in trouble.)

We have to start looking for the things we have to offer that are (at least for now) AI-proof, so to speak. We have to start thinking about the skills that students will need to navigate an AI world. Those are the things that will be of real value to them. So, I’m going to use the rest of my time to start exploring with you (because I certainly don’t have any hard and fast answers) some of the shifts we might want to start to make to accommodate ourselves and our students to this new world.

I’m going to quickly run through eight things.

The most obvious thing we can do is to refocus on the physical. GPT and its competitors are trained on digitized sources. At least for now they can only be as smart as what’s already on the Internet. They can’t know anything about anything that’s not online. That’s going to mean that physical archives (and material culture in general) will take on a much greater prominence as the things that AI doesn’t know about and can’t say anything about. In an age of AI, there will be much greater demand for the undigitized stuff. Being able to work with undigitized materials is going to be a big “value add” for humans in the age of these LLMs. And our students do not know how to access it. Most of us were trained in using card catalogs, sorting through library stacks, traveling to different archives, and sifting through boxes of sources. Having been born into the age of Google, our students are much less good at this, and they’re going to need to get better. Moreover, they’re going to need better ways of getting at these physical sources that don’t always involve tons of travel, with all its risks to climate and contagion. Archivists, meanwhile, will need new tools to deal with the increased demand. We launched our Sourcery app, which is designed to provide better connections between researchers and archivists and to provide improved access to remote undigitized sources, before these LLMs hit the papers. But tools like Sourcery are going to be increasingly important in an age when the kind of access that real humans need isn’t the digital kind, but the physical kind.
Moreover, we should start rethinking our digitization programs. The copyright issues around LLMs are (let’s say) complex, but currently OpenAI, Google, Microsoft, Meta, and the others are rolling right ahead, sucking up anything they can get their hands on, and processing those materials through their AIs. This includes all of the open access materials we have so earnestly spent 30 years producing for the greater good. Maybe we want to start asking ourselves whether we really want to continue providing completely open, barrier-free access to these materials. We’ve assumed that more open meant more humane. But when it’s a robot taking advantage of that openness? We need a gut check.
AIs will in general just be better at the Internet than us. They’ll find, sort, sift, and synthesize things faster. They’ll conduct multi-step online operations—like booking a trip or editing a podcast—faster than us. This hits a generation that’s extremely invested in being good at the Internet, and, unfortunately, increasingly bad at working in the real world. Our current undergraduates have been deeply marked by the experience of the pandemic. I’m sure many of you have seen a drastic increase in class absences and a drastic decrease in class participation since the pandemic. We know from data that more and more of our students struggle with depression and anxiety. Students have difficulty forming friendships in the real world. There are a growing number of students who choose to take all online classes even though they’re living in the dorms. This attachment to the virtual may not serve them well in a world where the virtual is dominated by robots who are better than us at doing things in the digital world. We need to get our students re-accustomed to human-to-human connections.
At the same time, we need to encourage students to know themselves better. We need to help them cultivate authentic, personal interests. This is a generation that has been trained to write to the test. But AIs will be able to write to the test much better than we can. AIs will be able to ascertain much better than we can what they (whomever they is: the school board, the college board, the boss, the search algorithm) want. But what the AI can’t really do is tell us what we want, what we like, what we’re interested in and how to get it. We need to cultivate our students’ sense of themselves and help them work with the new AIs to get it. Otherwise, the AI will just tell them what they’re interested in, in ways that are much more sophisticated and convincing than the Instagram and TikTok algorithms that are currently shoving content at them. For those of us teaching with primary sources this means exposing them to the different, the out of the ordinary, the inscrutable. It means helping them become good “pickers” – helping them select the primary sources that truly hold meaning for them. As educators of all sorts, it means building up their personalities, celebrating their uniqueness, and supporting their difference.
I think we also need to return to teaching names and dates history. That’s an unfashionable statement. The conventional wisdom of at least the last 30 years is that names, dates, and places aren’t that important to memorize because the real stuff of history is the themes and theories, and anyway, the Google can always give us the names and dates. Moreover, names and dates history is boring, and with the humanities in perpetual crisis and on the chopping block in the neoliberal university, we want to do everything we can to make our disciplines more attractive. But memorized names, and dates, and places are the things that allow historians to make the creative leaps that constitute new ideas. The biggest gap I see between students of all stripes, including graduate students, and the privileged few like me who make it into university teaching positions (besides white male privilege) is a fluency with names, dates, and places. The historians that impress most are the ones who can take two apparently disconnected happenings and draw a meaningful connection between them. Most often the thing that suggests that connection to them is a connected name, date, place, source, event, or institution that they have readily at hand. Those connections are where new historical ideas are born. Not where they end, for sure, but where they are born. AI is going to be very good at synthesizing existing ideas. But it may be less good at making new ones. We need students who can birth new ideas.
Related to this is the way we teach students to read. In the last 20 years, largely in response to the demands of testing, but also in response to the prioritization of “critical thinking” as a career skill, we’ve taught students not to read for immersion, for distraction, for imagination, but for analysis. Kids read tactically. They don’t just read. In many cases, this means they don’t read at all unless they have to. Yet, this is exactly how the AI reads. Tactically. Purely for analysis. Purely to answer the question. And they’ll ultimately be able to do this way better than us. But humans can read in another way. To be inspired. To be moved. We need to get back to this. The imaginative mode of reading will set us apart.
More practically, we need to start working with these models to get better at asking them the right questions. If you’ve spent any time with them, you’ll know that what you put in is very important in determining what you get out. Here’s an example. In this chat, I asked GPT-3.5, “How can I teach with primary sources.” OK. Not bad. But then in another chat I asked, “Give me a step-by-step plan for using primary sources in the classroom to teach students to make use of historical evidence in their writing” and I followed it up with a few more questions: “Can you elaborate?” and “Are there other steps I should take?” and then “Can you suggest an assignment that will assess these skills?” You’ll see that it gets better and better as it goes along. I’m no expert at this. But I’m planning on becoming one because I want to be able to show our students how to use it well. Because, don’t fool yourselves, they’re going to use it.
Finally, then, perhaps the most immediate thing we can do is to inculcate good practice around students’ use of AI generated content. We need to establish citation practices, and indeed the MLA has just suggested some guidance for citing generative AI content. Stanford, and other universities, are beginning to issue policies and teaching guidance. So far, these policies are pretty weak. Stanford’s policy basically boils down to, “Students: Don’t cheat. Faculty: Figure it out for yourselves.” It’s a busy time of year and all, but we need urgently to work with administration to make these things better.

I’m nearly out of time, and I really, really want to leave time for conversation, so I’ll leave it there. These are just a couple of thoughts that I’ve pulled together in my few weeks of following these developments. As I’ve said, I’m no expert in computer science, or philosophy, or business, but I think I can fairly call myself an expert in digital humanities and the history of science and technology, and I’m convinced this new world is right around the corner. I don’t have to like it. You don’t have to like it. If we want to stop it, or slow it down, we should advocate for that. But we need to understand it. We need to prepare our students for it.

At the same time, if you look at my list of things we should be doing to prepare for the AI revolution, they are, in fact, things we should have been (and in many cases have been) doing all along. Paying more attention to the undigitized materials in our collections? I’m guessing that’s something you already want to do. Helping students have meaningful, in-person, human connections? Ditto. Paying more attention to what we put online to be indexed, manipulated, sold against search advertising? Ditto. Encouraging students to have greater fluency with names, dates, and places? Helping them format more sophisticated search queries? Promoting better citation practice for born-digital materials and greater academic integrity? Ditto. Ditto. Ditto.

AI is going to change the way we do things. Make no mistake. But like all other technological revolutions, the changes it demands will just require us to be better teachers, better archivists, better humans.

Thank you.

Innovation, Use, and Sustainability

Revised notes for remarks I delivered on the topic of “Tools: Encouraging Innovation” at the Institute of Museum and Library Services (IMLS) National Digital Platform summit last month at the New York Public Library.

What do we mean when we talk about innovation? To me innovation implies not just the “new” but the “useful.” And not just the “useful” but the “implemented” and the “used.” Used, that is, by others.

If a tool stays in house, in just the one place where it was developed, it may be new and it may be interesting (let’s say “inventive”), but it is not “innovative.” Other terms we use in this context, “ground breaking” and “cutting edge,” for example, share this meaning. Ground is broken for others to build upon. The cutting edge precedes the rest of the blade.

The IMLS program that has been charged and most generously endowed with encouraging innovation in the digital realm is the National Leadership Grants: Advancing Digital Resources program. The idea that innovation is tied to use is implicit in the title of the program: the word “leadership” implies a “following.” It implies that the digital resources that the program advances will be examples to the field to be followed widely, that the people who receive the grants will become leaders and gain followers, that the projects supported by the program will be implemented and used.

This is going to be difficult to say in present company, because I am a huge admirer of the NLG program and its staff of program officers. I am also an extremely grateful recipient of its funds. Nevertheless, in my estimation as an observer of the program, a panelist, and an awardee, the program has too often fallen short in this regard: it has supported a multitude of new and incredibly inventive work, but that work has too rarely been taken up by colleagues outside of the originating institution. The projects the NLG program has spawned have been creative, exciting, and new, but they have too rarely been truly innovative. This is to say that the ratio of “leaders” to “followers” is out of whack. A model that’s not taken up by others is no model at all.

I would suggest two related remedies for the Leadership Grants’ lack of followers:

More emphasis on platforms. Surely the NLG program has produced some widely used digital library and museum platforms, including the ones I have worked on. But I think it bears emphasizing that the limited funds available for grants would generate better returns if they went to enabling technologies rather than end products, to platforms rather than projects. Funding platforms doesn’t just mean funding software (there are also social and institutional platforms like standards and convening bodies), but IMLS should be funding tools that allow lots of people to do good work, not the good work itself of just a few.
More emphasis on outreach. Big business doesn’t launch new products without a sales force. If we want people to use our products, we shouldn’t launch them without people on staff who are dedicated to encouraging their use. This should be reflected in our budgets, a much bigger chunk of which should go to outreach. That also means more flexibility in the guidelines and among panelists and program officers to support travel, advertising, and other marketing costs.

Sustainability is a red herring

These are anecdotal impressions, but it is my belief that the NLG program could be usefully reformed by a more laser-like focus on these and other uptake and go-to-market strategies in the guidelines and evaluation criteria for proposals. In recent years, a higher and higher premium has been placed on sustainability in the guidelines. I believe the effort we require applicants to spend crafting sustainability plans and grantees to spend implementing them would be better spent on outreach, that is, on sales. The greatest guarantor of sustainability is use. When things are used they are sustained. When things become so widely implemented that the field can’t do without them, they are sustained. Like the banks, tools and platforms that become too big to fail are sustained. Sustainability is very simply a function of use, and we should recognize this in allocating scarce energies and resources.

Where's the Beef? Does Digital Humanities Have to Answer Questions?

The criticism most frequently leveled at digital humanities is what I like to call the “Where’s the beef?” question, that is, what questions does digital humanities answer that can’t be answered without it? What humanities arguments does digital humanities make?

Concern over the apparent lack of argument in digital humanities comes not only from outside our young discipline. Many practicing digital humanists are concerned about it as well. Rob Nelson of the University of Richmond’s Digital Scholarship Lab, an accomplished digital humanist, recently ruminated in his THATCamp session proposal, “While there have been some projects that have been developed to present arguments, they are few, and for the most part I sense that they haven’t had a substantial impact among academics, at least in the field of history.” A recent post on the Humanist listserv expresses one digital humanist’s “dream” of “a way of interpreting with computing that would allow arguments, real arguments, to be conducted at the micro-level and their consequences made in effect instantly visible at the macro-level.”

These concerns are justified. Does digital humanities have to help answer questions and make arguments? Yes. Of course. That’s what humanities is all about. Is it answering lots of questions currently? Probably not really. Hence the reason for worry.

But this suggests another, more difficult, more nuanced question: When? When does digital humanities have to produce new arguments? Does it have to produce new arguments now? Does it have to answer questions yet?

In 1703 the great instrument maker, mathematician, and experimenter Robert Hooke died, vacating the suggestively named position he occupied for more than forty years, Curator of Experiments to the Royal Society. In this role, it was Hooke’s job to prepare public demonstrations of scientific phenomena for the Fellows’ meetings. Among Hooke’s standbys in these scientific performances were animal dissections, demonstrations of the air pump (made famous by Robert Boyle but made by Hooke), and viewings of pre-prepared microscope slides. Part research, part ice breaker, and part theater, one important function of these performances was to entertain the wealthier Fellows of the Society, many of whom were chosen for election more for their patronage than their scientific achievements.

Upon Hooke’s death the position of Curator of Experiments passed to Francis Hauksbee, who continued Hooke’s program of public demonstrations. Many of Hauksbee’s demonstrations involved the “electrical machine,” essentially an evacuated glass globe which was turned on an axle and to which friction (a hand, a cloth, a piece of fur) was applied to produce a static electrical charge. The device had been invented some years earlier, but Hauksbee greatly improved it to produce ever greater charges. Perhaps his most important improvement was the addition to the globe of a small amount of mercury, which produced a glow when the machine was fired up. In an age of candlelight and on a continent of long, dark winters, the creation of a new source of artificial light was sensational and became a popular learned entertainment, not only in meetings of early scientific societies but in aristocratic parlors across Europe. Hauksbee’s machine also set off an explosion of electrical instrument making, experimentation, and descriptive work in the first half of the 18th century by the likes of Stephen Gray, John Desaguliers, and Pieter van Musschenbroek.

And yet not until later in the 18th century and early in the 19th century did Franklin, Coulomb, Volta, and ultimately Faraday provide adequate theoretical and mathematical answers to the questions of electricity raised by the electrical machine and the phenomena it produced. Only after decades of tool building, experimentation, and description were the tools sufficiently articulated and phenomena sufficiently described for theoretical arguments to be fruitfully made.*

There’s a moral to this story. One of the things digital humanities shares with the sciences is a heavy reliance on instruments, on tools. Sometimes new tools are built to answer pre-existing questions. Sometimes, as in the case of Hauksbee’s electrical machine, new questions and answers are the byproduct of the creation of new tools. Sometimes it takes a while, in which meantime tools themselves and the whiz-bang effects they produce must be the focus of scholarly attention.

Eventually digital humanities must make arguments. It has to answer questions. But yet? Like 18th century natural philosophers confronted with a deluge of strange new tools like microscopes, air pumps, and electrical machines, maybe we need time to articulate our digital apparatus, to produce new phenomena that we can neither anticipate nor explain immediately. At the very least, we need to make room for both kinds of digital humanities, the kind that seeks to make arguments and answer questions now and the kind that builds tools and resources with questions in mind, but only in the back of its mind and only for later. We need time to experiment and even—as we discussed recently with Bill Turkel and Kevin Kee on Digital Campus—time to play.

The 18th century electrical machine was a parlor trick. Until it wasn’t.

* For more on Hooke, see J.A. Bennett, et al., London’s Leonardo : The Life and Work of Robert Hooke (Oxford, 2003). For Hauksbee and the electrical machine see W.D. Hackmann, Electricity from glass : The History of the Frictional Electrical Machine, 1600-1850 (Alphen aan den Rijn, 1978) and Terje Brundtland, “From Medicine to Natural Philosophy: Francis Hauksbee’s Way to the Air-Pump,” The British Journal for the History of Science (June, 2008), pp. 209-240. For 18th century electricity in general J.L. Heilbron, Electricity in the 17th and 18th Centuries : A Study of Early Modern Physics (Berkeley, 1979) is still the standard. Image of Hauksbee’s Electrical Machine via Wikimedia Commons.

E-Book Readers: Parables of Closed and Open

During a discussion of e-book readers on a recent episode of Digital Campus, I made a comparison between Amazon’s Kindle and Apple’s iPod which I think more or less holds up. Just as Apple revolutionized a fragmented, immature digital music player market in the early 2000s with an elegant, intuitive new device (the iPod) and a seamless, integrated, but closed interface for using it (iTunes)—and in doing so managed very nearly to corner that market—so too did Amazon hope to corner an otherwise stale e-book market with the introduction last year of its slick, integrated, but closed Kindle device and wireless bookstore. No doubt Amazon would be more than happy with the eighty percent of the e-book market that Apple now enjoys of the digital music player market.

In recent months, however, there have been a slew of announcements that seem to suggest that Amazon will not be able to get the same kind of jump on the e-book market that Apple got on the digital music market. Several weeks ago, Sony announced that it was revamping its longstanding line of e-book readers with built-in wifi (one of the big selling points of the Kindle) and support for the open EPUB standard (which allows it to display Google Books). Now it appears that Barnes & Noble is entering the market with its own e-book reader, and in more recent news, that its device will run on the open source Android mobile operating platform.

If these entries into the e-book market are successful, they may foretell a more open future for e-books than has befallen digital music. It would also suggest that the iPod model of a closed, end-to-end user experience isn’t the future of computing, handheld or otherwise. Indeed, as successful and transformative as it is, Apple’s iPhone hasn’t been able to achieve the kind of dominance of the “superphone” market that the iPod did of the music player market, something borne out by a recent report by Gartner, which has Nokia’s Symbian and Android in first and second place by number of handsets by 2012 with more than fifty percent market share. This story of a relatively open hardware and operating system combination winning out over a more closed, more controlled platform is the same one that played out two decades ago when the combination of the PC and Windows won out over the Mac for leadership of the personal computing market. If Sony, Barnes & Noble, and other late entrants into the e-book game finish first, it will have shown the end-to-end iPod experience to be the exception rather than the rule, much to Amazon’s disappointment I’m sure.

One Week, One Tool: A Digital Humanities Barn Raising

I’m very happy to report that CHNM has been awarded a grant from the National Endowment for the Humanities under its Institute for Advanced Topics in Digital Humanities program to do for the summer scholarly institute what THATCamp is doing for the scholarly conference. Under the banner of “better, faster, lighter”—as well as more pragmatic, more collaborative, and more fun—CHNM will host a diverse group of twelve digital humanists for a busy week of tool-building in Summer 2010. Welcome to One Week, One Tool, a digital humanities barn raising.

With a decade of successful digital tool-building experience under our belts, we at CHNM have come to the conclusion that effective digital tools are forged mostly in practice rather than theory. Although inspirational ideas and disciplinary training are necessary, the creative process succeeds or fails due to pragmatic, often hidden or ignored fundamentals such as good user interface design, thorough code commenting and documentation, community engagement, dissemination and “marketing,” and effective project management. We may have a vision for an ideal end product, but frequently a tool is made or broken in seemingly more mundane aspects of software development.

Too often these practical aspects get lost in our conferences and workshops, only to be encountered by inexperienced tool builders at later stages of development and release. We thus believe a useful digital humanities institute should involve a great deal of doing in addition to basic instruction. There is no reason that a week long institute can’t both teach and produce something useful to the community—an actual digital humanities tool—while also laying the foundation and skills for future endeavors by the participants. Indeed, the act of doing, of building the tool, should be the best way for participants to learn what digital humanities really is and how it really happens.

We therefore propose a unique kind of institute: One Week, One Tool will teach participants how to build a digital tool for humanities scholarship by actually building a tool, from inception to launch, in a week—a digital humanities barn raising.

One Week, One Tool won’t be for the faint of heart. For one week in June 2010, from early mornings to late nights, we will bring together a group of twelve digital humanists of diverse disciplinary backgrounds and practical experience to build something useful and usable. A short course of training in principles of open source software development will be followed by an intense five days of doing and a year of continued community engagement, development, testing, dissemination, and evaluation. Comprising designers and programmers as well as project managers and outreach specialists, the group will conceive a tool, outline a roadmap, develop and disseminate a modest prototype, lay the groundwork for building an open source community, and make first steps toward securing the project’s long-term sustainability.

One Week, One Tool is inspired by both longstanding and cutting edge models of rapid community development. For centuries rural communities throughout the United States have come together for “barn raisings” when one of their number required the diverse set of skills and enormous effort required to build a barn—skills and effort no one member of the community alone could possess. In recent years, Internet entrepreneurs have likewise joined forces for crash “startup” or “blitz weekends” that bring diverse groups of developers, designers, marketers, and financiers together to launch a new technology company in the span of just two days. One Week, One Tool will build on these old and new traditions of community development and the natural collaborative strengths of the digital humanities community to produce something useful for digital humanities work and to help reset the balance between learning and doing in digital humanities training.

Are you ready to rumble?

Briefly Noted: Timetoast; Google Books Settlement; Curators and Wikipedians

Via Mashable, yet another timeline service: Timetoast.

Many readers will have seen this already, but Robert Darnton’s February piece in The New York Review of Books is the most readable discussion I have seen of the Google Books settlement.

Fresh + New(er), the Powerhouse Museum’s always interesting blog, describes that museum’s recent open house for local Wikipedians and the common ground they found between expert curators and amateur encyclopedists.

Briefly Noted: Surviving the Downturn; Help with Creative Commons; Yahoo Pipes

The American Association for State and Local History (AASLH) provides cultural heritage professionals with some relevant information on surviving the economic downturn.

JISC provides advice on choosing (or not choosing) a Creative Commons license.

Missed it at the launch? Didn’t see the point? Don’t know where to start? Ars Technica has a nice reintroduction and tutorial for Yahoo Pipes, a visual web content mashup editor. Here’s an example of the kind of thing you can do very easily (20 minutes in this case) with Pipes: an aggregated feed of CHNMers’ tweets displayed on a Dipity timeline.

CHNM Tweeps on Dipity.

Briefly Noted for December 19, 2008

Ahoy, Mateys! Mills Kelly’s fall semester course “Lying about the Past” was revealed today in The Chronicle of Higher Education. Read how Mills and his students perpetrated an internet hoax about “the last American pirate” and what they learned in the process. The Chronicle is, unfortunately, gated, but you can read more on Mills’ fantastic blog, edwired.

I’m sure many of you have encountered NITLE’s prediction markets, but a recent presentation at CNI by NITLE’s Director of Research Bryan Alexander reminded me I haven’t blogged it yet. As I told Bryan recently, the prediction markets are a great example of form (crowdsourcing educational technology intelligence) fitting function (NITLE’s mission to advise member schools on emergent practices) in the digital humanities.

Sadly, The Times of London recently reported a raid on the offices of Memorial, a human rights and educational organization that seeks to document the abuses of the Soviet Gulag prison camp system. Memorial was a key partner on CHNM’s Gulag: Many Days, Many Lives, and its generous research assistance and loan of documents, images, and other artifacts was essential to our successful completion of the project. It is very sad to see this brave and worthy organization suffering the same abuses in Putin’s Russia that it has worked so hard to expose in Stalin’s.

Last month the Smithsonian Institution’s National Museum of American History (NMAH) celebrated its grand reopening after an extended closure for major renovations. Meanwhile, in the web space, NMAH launched its History Explorer, which aggregates and categorizes online educational content from across the museum. Worth a look.

Hello (Linux) World

Feeling increasingly alienated by commercial software companies and increasingly uncomfortable with my absurd level of Mac lust, I finally decided this weekend to get off the Apple train and make the switch to Linux.

Until I’m sure I’ve worked out all the kinks, I’m running a dual boot setup of Ubuntu 8.10b and Mac OS 10.5 on my MacBook Pro. It was a pretty simple operation, which took up the better part of my Sunday morning, but not much more than that. I more or less followed the Ubuntu support community’s MacBook Pro documentation line for line, and everything more or less seemed to work. A few quick Google searches showed me how to install Skype and a few other applications that aren’t included in the main Ubuntu repositories. Aside from a couple of minor annoyances (e.g. “right-click” is confusingly keyed to F12 or a two-finger trackpad click), so far I’m very happy.

In the coming weeks, once I’m sure I have everything I need off my old system, I hope to leave Apple entirely. I’m a little worried about what I’m going to do about my music; I’ve bought quite a bit from the iTunes Store. But the fact that my music is locked up in iTunes shouldn’t be a reason for sticking with Apple. It is yet another reason to leave.

Currently I’m looking at a combination of a Dell Mini 9 and either a desktop or 15″ laptop. If nothing else, I have a whole new range of hardware to ogle.