Teaching and Learning with Primary Sources in the age of Generative AI

The following is a (more or less verbatim) transcript of a keynote address I gave earlier today to the Dartmouth College Teaching with Primary Sources Symposium. My thanks to Morgan Swan and Laura Barrett of the Dartmouth College Library for hosting me and giving me the opportunity to gather some initial thoughts about this thoroughly disorienting new development in the history of information.

Thank you, Morgan, and thank you all for being here this morning. I was going to talk about our Sourcery project today, which is an application to streamline remote access to archival materials for both researchers and archivists, but at the last minute I’ve decided to bow to the inevitable and talk about ChatGPT instead.

Dartmouth College Green on a beautiful early-spring day

I can almost feel the inner groan emanating from those of you who are exhausted and perhaps dismayed by the 24/7 coverage of “Generative AI.” I’m talking about things like ChatGPT, DALL-E, MidJourney, Jasper, Stable Diffusion, and Google’s just-released Bard. Indeed, the coverage has been wall to wall, and the hype has at times been breathless, and it’s reasonable to be skeptical of “the next big thing” from Silicon Valley. After all, we’ve just seen the Silicon Valley hype machine very nearly bring down the banking system. In just the past year, we’ve seen the spectacular fall of the last “next big thing,” so-called “crypto,” which promised to revolutionize everything from finance to art. And we’ve just lived through a decade in which the social media giants have created a veritable dystopia of teen suicide, election interference, and resurgent white nationalism.

So, when the tech industry tells you that its latest whatever is “going to change everything,” it makes sense to be wary. I’m wary myself. But with a healthy dose of skepticism, and more than a little cynicism, I’m here to tell you today, as a 25-year veteran of the digital humanities, a historian of science and technology, and someone who teaches the history of digital culture, that Generative AI is the biggest change in the information landscape since at least 1994 and the launch of the Netscape web browser, which brought the Internet to billions. It’s surely bigger than the rise of search with Google in the early 2000s or the rise of social media in the early 2010s. And it’s moving at a speed that makes it extremely difficult to say where it’s headed. But let’s just say that if we all had an inkling that the robots were coming 100 or 50 or 25 years into the future, it’s now clear to me that they’ll be here in a matter of just a few years—if not a few months.

It’s hard to overstate just how fast this is happening. Let me give you an example. Here is the text of a talk entitled (coincidentally!) “Teaching with primary sources in the next digital age.” This text was generated by ChatGPT—or GPT-3.5—the version which was made available to the public last fall, and which really kicked off this wall-to-wall media frenzy over Generative AI.

You can see that it does a plausible job of producing a three-to-five paragraph essay on the topic of my talk today that would not be an embarrassment if it were written by your ninth-grade son or daughter. It covers a range of relevant topics, provides a cogent, if simplistic, explanation of those topics, and it does so in correct and readable English prose.

Now here’s the same talk generated by GPT-4, which came out just last week. It’s significantly more convincing than the text produced by version 3.5. It demonstrates a much greater fluency with the language of libraries and archives. It correctly identifies many, if not most, of the salient issues facing teaching in archives today and provides much greater detail and nuance. It’s even a little trendy, using some of the edu-speak and library lingo that you’d hear at a conference of educators or librarians in 2023.

Now here’s the outline for a slide deck of this talk that I asked GPT-4 to compose, complete with suggestions for relevant images. Below that is the text of speaker notes for just one of the bullets in this talk that I asked the bot to write.

Now—if I had generated speaker notes for each of the bullets in this outline and asked GPT’s stablemate and image generator, DALL-E, to create accompanying images—all of which would have taken the systems about 5 minutes—and then delivered this talk more or less verbatim to this highly educated, highly accomplished, Ivy League audience, I’m guessing the reaction would have been: “OK, seems a little basic for this kind of thing” and “wow, that talk was a big piece of milquetoast.” It would have been completely uninspiring, and there would have been plenty to criticize—but neither would I have seemed completely out of place at this podium. After all, how many crappy, uninspiring, worn-out PowerPoints have you sat through in your career? But the important point to stress here is that in less than six months, the technology has gone from writing at a ninth-grade level to writing at a college level and maybe beyond.

Much of the discourse among journalists and in the academic blogs and social media has revolved around picking out the mistakes these technologies make. For example, my good friend at Middlebury, Jason Mittell, along with many others, has pointed out that ChatGPT tends to invent citations: plausible-looking references to articles, attributed to real authors and real journals, that do not, in fact, exist. Australian literary scholar Andrew Dean has pointed out how ChatGPT spectacularly misunderstands some metaphors in poetry. And it’s true. Generative AIs make lots of extremely weird mistakes, and they wrap those mistakes in extremely convincing-sounding prose, which often makes them hard to catch. And as Matt Kirschenbaum has pointed out: they’re going to flood the Internet with this stuff. Undoubtedly there are issues here.

But don’t let the fact that ChatGPT is lousy at some things blind you to the reality that it’ll be good enough for lots, and lots, and lots of things. And based on the current trajectory of improvement, do we really think these problems won’t be fixed?

Let me give another couple of examples. Look at this chart, which shows GPT-3.5’s performance on a range of real-world tests. Now look at this chart, which shows GPT-4’s improvement. If these robots have gone from writing decent five-paragraph high school essays to passing the Bar Exam (in the 90th percentile!!) in six months, do we really think they won’t figure out citations in the next year, or two, or five? Keep in mind that GPT-4 is a general-purpose model that’s engineered to do everything pretty well. It wasn’t even engineered to take the Bar Exam. Google CEO Sundar Pichai tells us that AI computing power is doubling every six months. If today it can kill the Bar Exam, do we really think it won’t be able to produce a plausible article for a mid-tier peer-reviewed scholarly journal in a minor sub-discipline of the humanities in a year or two? Are we confident that there will be any way for us to tell that machine-written article from one written by a human?

(And just so our friends in the STEM fields don’t start feeling too smug, GPT can write code too. Not perfectly, of course, but it wasn’t trained for that either. It just figured it out. Do we really think it’s that long until an AI can build yet another delivery app for yet another fast-food chain? Indeed, Ubisoft and Roblox are starting to use AI to design games. Our students’ parents are going to have to start getting their heads around the fact that “learning to code” isn’t going to be the bulletproof job-market armor they thought it was. I’m particularly worried for my digital media students who have invested blood, sweat, and tears learning the procedural ins and outs of the Adobe suite.)

There are some big philosophical issues at play here. One is around meaning. The way GPT-4 and other generative AIs produce text is by predicting the next word in a sentence statistically, based on a model drawn from an unimaginably large (and frankly unknowable) corpus of text the size of the whole Internet—a “large language model” or LLM—not by understanding the topic they’re given. In this way the prose they produce is totally devoid of meaning. Drawing on philosopher Harry Frankfurt’s definition of “bullshit” as “speech intended to persuade without regard for truth,” Princeton computer scientists Arvind Narayanan and Sayash Kapoor suggest that these LLMs are merely “bullshit generators.” But if something meaningless is indistinguishable from something meaningful—if it holds meaning for us, but not the machine—is it really meaningless? If we can’t tell the simulation from the real, does it matter? These are crucial philosophical, even moral, questions. But I’m not a philosopher or an ethicist, and I’m not going to pretend to be able to think through them with any authority.
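
To make the mechanics concrete, here is a toy sketch of that next-word loop. It is an illustration only, under loud assumptions: real LLMs use neural networks trained on web-scale corpora, not a word-pair lookup table like this one, and the tiny corpus and function names here are my own hypothetical choices. But the generating loop, which samples a statistically plausible next word, appends it, and repeats, captures the sense in which the text is produced without understanding.

```python
import random

# Toy "language model": count which words follow which in a tiny corpus.
# (Hypothetical example corpus; real models train on web-scale text.)
corpus = "the archive holds the sources and the sources hold the past".split()

model = {}
for prev, nxt in zip(corpus, corpus[1:]):
    model.setdefault(prev, []).append(nxt)

def generate(start, length=8):
    """Generate text by repeatedly sampling a plausible next word."""
    word, output = start, [start]
    for _ in range(length):
        followers = model.get(word)
        if not followers:
            break  # no observed continuation; stop
        word = random.choice(followers)  # statistics, not understanding
        output.append(word)
    return " ".join(output)

print(generate("the"))  # e.g. "the sources hold the past"
```

The output can read as passable English, yet nothing in the program “knows” what an archive or a source is, which is the scaled-down version of Narayanan and Kapoor’s point.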

What I know is: here we are.

As a purely practical matter, then, we need to start preparing our students to live in a world of sometimes bogus, often very useful, generative AI. The first-year students arriving in the fall may very well graduate into a world that has no way of telling machine-generated from human-generated work. Whatever we think about them, however we feel about them (and I feel a mixture of disorientation, disgust, and exhaustion), these technologies are going to drastically change what those Silicon Valley types might call “the value proposition” of human creativity and knowledge creation. Framing it in these terms is ugly, but that’s the reality our students will face. And there’s an urgency to it that we must face.

So, let’s get down to brass tacks. What does all this mean for what we’re here to talk about today, that is, “Teaching with Primary Sources”?

One way to start to answer this question is to take the value proposition framing seriously and ask ourselves, “What kinds of human textual production will continue to be of value in this new future and what kinds will not?” One thing I think we can say pretty much for sure is that writing based on research that can be done entirely online is in trouble. More precisely, writing about things about which there’s already a lot online is in trouble. Let’s call this “synthetic writing” for short. Writing that synthesizes existing writing is almost certainly going to be done better by robots. This means that what has passed as “journalism” for the past 20 years since Google revolutionized the ad business—those BuzzFeed style “listicles” (“The 20 best places in Dallas for tacos!”) that flood the internet and are designed for nothing more than to sell search ads against—that’s dead.

But it’s not only that. Other kinds of synthetic writing—for example, student essays that compare and contrast two texts or (more relevant to us today) place a primary source in a context drawn from secondary source reading—those are dead too. Omeka exhibits that synthesize narrative threads among a group of primary sources chosen from our digitized collections? Not yet, but soon.

And it’s not just that these kinds of assignments will be obsolete because AI will make it too easy for students to cheat. It’s that there’s little point in teaching students to do something they’ll never be asked to do again outside of school. This has always been a problem with college essays that were only ever destined for a file cabinet in the professor’s desk. But at least we could tell ourselves that we were doing something that simulated the kind of knowledge work they would do as lawyers and teachers and businesspeople out in the real world. But now?

(Incidentally, I also fear that synthetic scholarly writing is in trouble, for instance, a Marxist analysis of Don Quixote. When there’s a lot of text about Marx and a lot of text about Don Quixote out there on the Internet, chances are the AI will do a better—certainly a much faster—job of weaving the two together. Revisionist and theoretical takes on known narratives are in trouble.)

We have to start looking for the things we have to offer that are (at least for now) AI-proof, so to speak. We have to start thinking about the skills that students will need to navigate an AI world. Those are the things that will be of real value to them. So, I’m going to use the rest of my time to start exploring with you (because I certainly don’t have any hard and fast answers) some of the shifts we might want to start to make to accommodate ourselves and our students to this new world.

I’m going to quickly run through eight things.

  1. The most obvious thing we can do is to refocus on the physical. GPT and its competitors are trained on digitized sources. At least for now they can only be as smart as what’s already on the Internet. They can’t know anything about anything that’s not online. That’s going to mean that physical archives (and material culture in general) will take on a much greater prominence as the things that AI doesn’t know about and can’t say anything about. In an age of AI, there will be much greater demand for the undigitized stuff. Being able to work with undigitized materials is going to be a big “value add” for humans in the age of these LLMs. And our students do not know how to access it. Most of us were trained on card catalogs, sorting through library stacks, and traveling to different archives to sift through boxes of sources. Having been born into the age of Google, our students are much less good at this, and they’re going to need to get better. Moreover, they’re going to need better ways of getting at these physical sources that don’t always involve tons of travel, with all its risks to climate and contagion. Archivists, meanwhile, will need new tools to deal with the increased demand. We launched our Sourcery app, which is designed to provide better connections between researchers and archivists and to provide improved access to remote undigitized sources, before these LLMs hit the papers. But tools like Sourcery are going to be increasingly important in an age when the kind of access that real humans need isn’t the digital kind, but the physical kind.
  2. Moreover, we should start rethinking our digitization programs. The copyright issues around LLMs are (let’s say) complex, but currently OpenAI, Google, Microsoft, Meta, and the others are rolling right ahead, sucking up anything they can get their hands on, and processing those materials through their AIs. This includes all of the open access materials we have so earnestly spent 30 years producing for the greater good. Maybe we want to start asking ourselves whether we really want to continue providing completely open, barrier-free access to these materials. We’ve assumed that more open meant more humane. But when it’s a robot taking advantage of that openness? We need a gut check.
  3. AIs will in general just be better at the Internet than us. They’ll find, sort, sift, and synthesize things faster. They’ll conduct multi-step online operations—like booking a trip or editing a podcast—faster than us. This hits a generation that’s extremely invested in being good at the Internet, and, unfortunately, increasingly bad at working in the real world. Our current undergraduates have been deeply marked by the experience of the pandemic. I’m sure many of you have seen a drastic increase in class absences and a drastic decrease in class participation since the pandemic. We know from data that more and more of our students struggle with depression and anxiety. Students have difficulty forming friendships in the real world. There are a growing number of students who choose to take all online classes even though they’re living in the dorms. This attachment to the virtual may not serve them well in a world where the virtual is dominated by robots who are better than us at doing things in the digital world. We need to get our students re-accustomed to human-to-human connections.
  4. At the same time, we need to encourage students to know themselves better. We need to help them cultivate authentic, personal interests. This is a generation that has been trained to write to the test. But AIs will be able to write to the test much better than we can. AIs will be able to ascertain much better than we can what they (whoever “they” is: the school board, the college board, the boss, the search algorithm) want. But what the AI can’t really do is tell us what we want, what we like, what we’re interested in and how to get it. We need to cultivate our students’ sense of themselves and help them work with the new AIs to get it. Otherwise, the AI will just tell them what they’re interested in, in ways that are much more sophisticated and convincing than the Instagram and TikTok algorithms that are currently shoving content at them. For those of us teaching with primary sources this means exposing them to the different, the out of the ordinary, the inscrutable. It means helping them become good “pickers” – helping them select the primary sources that truly hold meaning for them. As educators of all sorts, it means building up their personalities, celebrating their uniqueness, and supporting their difference.
  5. I think we also need to return to teaching names-and-dates history. That’s an unfashionable statement. The conventional wisdom of at least the last 30 years is that names, dates, and places aren’t that important to memorize because the real stuff of history is the themes and theories—and anyway, Google can always give us the names and dates. Moreover, names-and-dates history is boring, and with the humanities in perpetual crisis and on the chopping block in the neoliberal university, we want to do everything we can to make our disciplines more attractive. But memorized names, and dates, and places are the things that allow historians to make the creative leaps that constitute new ideas. The biggest gap I see between students of all stripes, including graduate students, and the privileged few like me who make it into university teaching positions (besides white male privilege) is a fluency with names, dates, and places. The historians that impress most are the ones who can take two apparently disconnected happenings and draw a meaningful connection between them. Most often the thing that suggests that connection to them is a connected name, date, place, source, event, or institution that they have readily at hand. Those connections are where new historical ideas are born. Not where they end, for sure, but where they are born. AI is going to be very good at synthesizing existing ideas. But it may be less good at making new ones. We need students who can birth new ideas.
  6. Related to this is the way we teach students to read. In the last 20 years, largely in response to the demands of testing, but also in response to the prioritization of “critical thinking” as a career skill, we’ve taught students not to read for immersion, for distraction, for imagination, but for analysis. Kids read tactically. They don’t just read. In many cases, this means they don’t read at all unless they have to. Yet, this is exactly how the AI reads. Tactically. Purely for analysis. Purely to answer the question. And they’ll ultimately be able to do this way better than us. But humans can read in another way. To be inspired. To be moved. We need to get back to this. The imaginative mode of reading will set us apart.
  7. More practically, we need to start working with these models to get better at asking them the right questions. If you’ve spent any time with them, you’ll know that what you put in is very important in determining what you get out. Here’s an example. In this chat, I asked GPT-3.5, “How can I teach with primary sources?” OK. Not bad. But then in another chat I asked, “Give me a step-by-step plan for using primary sources in the classroom to teach students to make use of historical evidence in their writing” and I followed it up with a few more questions: “Can you elaborate?” and “Are there other steps I should take?” and then “Can you suggest an assignment that will assess these skills?” You’ll see that it gets better and better as it goes along (see the sketch just after this list). I’m no expert at this. But I’m planning on becoming one because I want to be able to show our students how to use it well. Because, don’t fool yourselves, they’re going to use it.
  8. Finally, then, perhaps the most immediate thing we can do is to inculcate good practice around students’ use of AI generated content. We need to establish citation practices, and indeed the MLA has just suggested some guidance for citing generative AI content. Stanford, and other universities, are beginning to issue policies and teaching guidance. So far, these policies are pretty weak. Stanford’s policy basically boils down to, “Students: Don’t cheat. Faculty: Figure it out for yourselves.” It’s a busy time of year and all, but we need urgently to work with administration to make these things better.
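
As promised in point seven above, here is a minimal sketch of what that kind of iterative questioning looks like done programmatically. It assumes the openai Python package and its chat API as they stood in early 2023; the model name, the placeholder API key, and the ask() helper are my own illustrative choices, not a recipe, and the same conversational pattern works just as well in the ChatGPT web interface.

```python
# A minimal sketch of iterative prompting, assuming the openai Python
# package (chat API circa early 2023). Model name and prompts are
# illustrative; supply your own API key.
import openai

openai.api_key = "YOUR_API_KEY"  # assumption: set to your own key

history = []  # the running conversation, which is what improves answers

def ask(prompt):
    """Send a follow-up question, keeping the whole conversation as context."""
    history.append({"role": "user", "content": prompt})
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=history,
    )
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

# Start specific, then refine: each question builds on the last.
print(ask("Give me a step-by-step plan for using primary sources in the "
          "classroom to teach students to make use of historical evidence "
          "in their writing."))
print(ask("Can you elaborate?"))
print(ask("Are there other steps I should take?"))
print(ask("Can you suggest an assignment that will assess these skills?"))
```

The point of the sketch is simply that context accumulates: a vague one-shot question gets a vague answer, while a specific request followed by targeted follow-ups gets something much closer to usable.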

I’m nearly out of time, and I really, really want to leave time for conversation, so I’ll leave it there. These are just a couple of thoughts that I’ve pulled together in my few weeks of following these developments. As I’ve said, I’m no expert in computer science, or philosophy, or business, but I think I can fairly call myself an expert in digital humanities and the history of science and technology, and I’m convinced this new world is right around the corner. I don’t have to like it. You don’t have to like it. If we want to stop it, or slow it down, we should advocate for that. But we need to understand it. We need to prepare our students for it.

At the same time, if you look at my list of things we should be doing to prepare for the AI revolution, they are, in fact, things we should have been (and in many cases have been) doing all along. Paying more attention to the undigitized materials in our collections? I’m guessing that’s something you already want to do. Helping students have meaningful, in-person, human connections? Ditto. Paying more attention to what we put online to be indexed, manipulated, sold against search advertising? Ditto. Encouraging students to have greater fluency with names, dates, and places? Helping them formulate more sophisticated search queries? Promoting better citation practice for born-digital materials and greater academic integrity? Ditto. Ditto. Ditto.

AI is going to change the way we do things. Make no mistake. But like all other technological revolutions, the changes it demands will just require us to be better teachers, better archivists, better humans.

Thank you.

How Humanists Should Use Mastodon

I’m brand new to Mastodon. Many of us are. This might suggest that we shouldn’t have opinions. But I think the opposite is true. If Mastodon is truly a decentralized platform, if it’s truly designed to support distinctive communities and their distinctive needs, then we, as a community of humanists, should decide how we’re going to use it. We should start doing it now, before it gets away from us.

Deciding how we want to use it—what Mastodon will mean to us—means not putting too much stock in the “norms” and “rules” that other communities have established on the site. That is not to say we should be bulls in the china shop (or as Shawna Ross tooted, we “don’t want to go all Kool-Aid man”), or that we should be disrespectful to other, more established communities and their needs and concerns. As always, we should approach our work, our tools, and our public engagements with humility. But it’s legitimate for us to use the technology to meet our needs and concerns, needs and concerns that have for too long gone unmet by Twitter, needs and concerns that may not be the same as those of other, older Mastodon communities.

In that spirit, here are a few early thoughts on how I think we should use Mastodon to build a supportive, inclusive, interesting, and useful thing for the humanities community.

First, you should join a server (e.g. hcommons.social) where a lot of other humanists can be found, and spend most of your time in your “local” or “community” timeline/tab. It is all well and good to follow people from other servers, and you should keep up with friends and happenings in those other places. But if you’re on the right server, your main source of serendipity, delight, information, and community will come from that local timeline. If your server’s local timeline is not delivering those things, find another server.

Second, and relatedly, you should mostly avoid “the fediverse” (i.e. the feed of posts aggregated from across Mastodon’s servers found in the “federated” or “all” tab in your app). It seems to me that in time this aggregated feed will just reproduce Twitter, in all its disorienting chaos and vitriol. It probably won’t be quite so bad because it won’t have an algorithm pushing ads and outrage down your throat. But there’s bound to be plenty of ugly distraction nonetheless.

Third, and this is bound to be controversial: don’t be too fussed about content warnings (CWs), except insofar as you think members of your local server will appreciate them. That is, I wouldn’t be too worried about sticking to the “norms” or “best practices” that other, earlier communities on Mastodon have established. I appreciate that these norms are in place because Mastodon has been a refuge for marginalized BIPOC, LGBTQ+, and other communities—and I think we want to be a refuge for members of those communities too. But we shouldn’t simply adopt the practices of the early adopters because they say we should. We should decide the ways in which we want to use the tools Mastodon gives us to support our aims, including, but not limited to, our aims of diversity, equity, and inclusion. So, for example, I think it’s totally fine to use the CW feature to truncate and expand a long toot. One of the distinctive features of the humanities community is its tolerance for difference. Another is its longwindedness. It’s OK to use the tool to support both things!

Fourth, let’s start blogging again. One of the great things about early #DH Twitter was that we were all still blogging. Twitter became a place where we could let a wider audience know that we blogged something and then support a discussion around that something that was more free-flowing than the blog’s own comments thread could support. Let’s bring that practice back! One easy step would be to stop posting long, narrative threads (i.e. tweets “1/27”) to social media. Instead just post a title, a one-sentence description, a link to your post with a #blogpost hashtag, and an invitation to discuss. If we could use Mastodon to reinvigorate the culture of humanities blogging, that would be an amazing success.

Fifth, keep politics to a minimum. It’s not that we should never talk about politics, but reworking takes that one can get elsewhere in the media (cable news, the op-ed pages, Twitter, etc.) isn’t going to make this a nicer place to be. If you’re going to get political, clearly tie it to your research, teaching, public humanities practice, or something else that connects you to the community that your local server is intended for. Otherwise, set up another account on another, more clearly political server, and post there.

Those are just some early thoughts. I’ll probably follow up in the next week or so with some more. In the meantime, I’d love to hear yours.

Sourcery: “Disruption,” Austerity, Equity, and Remote Access to Archives

I’ve spent the last 24 hours thinking about and responding to Mark Matienzo’s recent post about Sourcery and its response on social media. I’ve enjoyed engaging with the concerns Mark raises and I’ve learned a lot from the conversation it has spurred. Everything Mark wonders and worries about in connection with Sourcery is something we are actively questioning ourselves. It’s the reason we held a series of workshops with the archives profession this past fall and it’s the reason we’re working with a set of institutional partners to pilot Sourcery while it’s still under active development — so that we can address these questions and concerns in conversation with the community and have those conversations inform the functionality of the application.

These conversations, however, have demonstrated to me that there’s a bit of a misperception circulating, not so much about the app itself, but about the way in which we aim to develop it, a misperception that’s born, I think, more of a learned skepticism of Silicon Valley and university austerity politics than of a real look at the way in which we’re actually going about things.

The first thing to say is that we don’t begrudge archivists their skepticism. We share it. A decade that’s seen democracy undermined by social media and labor undermined by “gig economy” apps has made us justifiably skeptical of technology. Likewise, a decade or more of austerity budgets has made archivists justifiably skeptical of “external” “solutions.” But Sourcery is not “external” to these concerns: Greenhouse Studios is based in the library of a state university and staffed by unionized librarians and faculty members. We’re well familiar with austerity budgets, believe me. And we’re not promising a “solution.” What we want to do is engage the field — both researchers and archivists — in a conversation about how some of the technologies of the past decade might be retrofitted to expand access to archival sources. 

Sourcery is not a stealth operation to undermine or “disrupt” archival labor or paid researchers like Uber was a stealth operation to undermine taxi drivers. If it were, we wouldn’t have released a roadmap and half-baked app to the community for comment and reflection in a series of workshops, talks, and pilot projects with institutional partners. (We are very much still in our “planning” phase.) Informed by nearly 20 years of building open source, not-for-profit, community-based software systems for libraries and scholarship, our purpose (and the explicit terms of our funding) has always been to engage the community in a process of conversation and co-creation around alleviating the (sometimes cross-cutting!) pressures on archivists and researchers and then to build something that responds to those pressures. Some may disagree, but I don’t think that a technology’s having been used badly by some means that it’s necessarily bad. Certainly Uber has used peer-to-peer technologies in some very bad ways. But GoFundMe has used peer-to-peer technology in some very good ways (the broader SNAFU that is our healthcare system notwithstanding). Our aim is to work with all the relevant groups to make sure we do the good things and avoid the bad things.

If we haven’t made that clear, that’s on me. This post certainly isn’t intended as a defense against unwanted critique or tough conversations.

At the same time, it does offer a challenge to archivists. Just as it’s incumbent on us to understand the challenges archivists face and to work to meet those challenges in our outreach and our software, the tough conversations must also include an acknowledgment of the fact that the current systems for getting remote access to documents aren’t very good and haven’t kept up with either the possibilities of the available technology or the needs of diverse researchers. The process for requesting remote assistance hasn’t really changed since the advent of email and the simple web forms of the mid-1990s (although the pandemic has complicated that picture). We should acknowledge that existing systems of remote access to non-digitized sources create confusion for researchers who need to learn a new system for every repository they encounter. They create disjointed reference workflows for archivists that can be hard to monitor, allocate within teams, track, record, and report. And their failings cause more visits to the reading room than are probably strictly necessary or desirable for either archivists or researchers. By no means do we want to replace the necessary, sustained, intellectually fruitful in-person exchanges between archivists and researchers and the mutual journeys of discovery that take place in the reading room. But we do aim to replace the unnecessary ones.

Here it’s crucial to point out the perhaps under-appreciated fact that in-person visits are available to only a small subset of the researching population — that is, those with the money and flexibility to make a trip. The same is true of the informal networks whereby friends-of-friends and colleagues’ grad students go get stuff for scholars. Travel and professional networks are a privilege of the rich and well-connected, graduate student labor is often exploited for these purposes, and the “gift economy” whereby junior scholars do uncompensated service on behalf of more senior scholars is insidious. Whether intentional or not, current systems that place an enormous premium on the in-person visit end up providing access on extremely unequal terms. Emily Higgs correctly pointed out on Twitter that Sourcery is responsible for the ill effects of its service, whatever its good intentions. It’s likewise true that—given the possibility of change—the archives profession will be at least partly responsible for the ill effects of the status quo, whether it ever intended them or not. Sourcery runs the risk of creating new inequalities, for sure. But sticking with a status quo that privileges the in-person visit even when it’s not strictly necessary — a status quo that privileges rich scholars and ones with fancy connections and ones with grad students to exploit — runs the risk of perpetuating old inequalities. Not doing something to address the situation is an affirmative choice. That something doesn’t have to be Sourcery … but we should be honest that some things should change.

I’m seriously not trying to call anybody out. I’m just saying that researchers, archivists, digital humanists, software developers, and their funders and administrators need to work together if we’re going to expand access in ways that neither create new inequalities nor perpetuate old ones. That’s the conversation we want to have, and I know the archival profession wants to have, and I’m glad Sourcery is causing it.

When UConn broke up with Adobe: A parable of artists and copyright

One of the things I try very hard to do in my DMD 2010 “History of Digital Culture” class is to teach students that their technology choices are neither inevitable nor determined primarily by what’s “best,” but rather that their technology choices are values choices, reflections of their ethical commitments and those of the communities that create and use those technologies.

When the University of Connecticut’s UITS (University Information Technology Services) made a choice not to renew its Adobe Creative Cloud site license, my students correctly judged that this was a values choice about the relative importance the higher administration places on artistic work at the university. The decision not to support software for artists, while at the same time maintaining support for software for, say, engineers, is a statement about how the university values different kinds of work on campus. I was pleased that the students immediately saw that this wasn’t just a choice about the quality of the software or even its cost, but about the intellectual commitments and identity of the university. What the students didn’t so easily grasp, however, was that the controversy over the Adobe suite also reflects on the values choices of the students themselves, on the values choices that digital artists have made over many years to put the Adobe suite and other expensive, proprietary, closed-source software packages at the center of their creative practice, which in turn stem from a set of larger choices artists have made vis-à-vis our prevailing copyright regime.

Artists have largely chosen to think about copyright as something that exists to protect them and their work, and they have generally supported our ever-stricter copyright regime. Moving from a humanities and social sciences faculty to a fine arts faculty when I came to UConn from George Mason in 2013, I was struck by how poorly my storm-the-barricades, anti-copyright, open access agenda went over with my colleagues. Not that anyone really cared, but it was apparent from the beginning that I was coming at conversations that touched upon intellectual property (for example, a conversation about making faculty syllabi freely available on the web) from one side of the fence and they were coming at them from the other. Indeed, UConn’s School of Fine Arts offers a course on copyright for artists called Protecting the Creative Spirit: The Law and the Arts, which is taught by two lawyers. You can tell from the title of the course where its sympathies lie.

My DMD 2010 students (most of whom are freshmen and sophomores studying in the department of Digital Media & Design, which resides within the School of Fine Arts) are no exception. When I teach the unit on copyright, the first question I ask the class is, “What is the purpose of copyright?” Inevitably, students answer with some version of “to keep people from ripping you off.” My next move is to put the copyright clause of the Constitution up on the overhead and explain to them that, in fact, the purpose of copyright is to “promote the Progress of Science and useful Arts” and that protecting an author’s exclusive rights for a limited term is simply a means to an end.

What is more, I tell them that the ever-stricter copyright regime we live with today wasn’t really designed to protect artists at all, although some may have used and benefited from its protections. Instead, it was designed by and for big corporations, and it does a much better job of protecting those corporations than it does of protecting individual artists. It is true that many of these corporations employ artists (several former DMD 2010 students are now working for Disney), but those artists’ works are works for hire. The works may be protected by copyright law, but they are protected to the benefit of the employer, not the employee.

It is telling that the feelings of outrage and abandonment regarding the UITS Adobe announcement weren’t evenly distributed among my students. Digital Media & Design students at UConn choose from six different “concentrations,” electing to focus on either 2D animation/motion graphics; 3D animation; game design and development; web design and development; digital media business strategies; or digital culture, learning, and advocacy. (Students from all concentrations are required to take DMD 2010.) Especially hard hit by the news were the 2D/motion graphics students, for whom Adobe After Effects sits at the heart of their practice and for which there really isn’t a substitute, commercial or open source. Letting the Adobe license lapse was basically going to kill their creative practice, or, at the very least, put them out several hundred dollars.

My web design and development students, on the other hand, felt sympathy for their colleagues, but were pretty blasé about the whole thing. For them, letting the Adobe license lapse wouldn’t really change anything. The Adobe corporation has very little leverage over a web developer. To drive the point home, I challenged these web development students to think of a single piece of software that, if taken away from them, would affect their practice in any significant way. A few came up with TCP/IP, but quickly corrected themselves: TCP/IP is a protocol, not a piece of software, and is an open standard in any case. Apache was another, but, again, it’s open source, and there are serviceable alternatives. Certainly, they couldn’t name an existing corporation that could raise its prices and bring their web development work to a halt in the way that Adobe was threatening to stop the work of our motion graphics artists. The difference, of course, is that our web developers rely on an open source technology stack and our motion graphics artists rely on proprietary software protected by a copyright law that was written in part by the very companies that produce it. Our web developers are not captive to copyright. Our motion graphics artists are.

Far from protecting artists, this is the best example I have of how our overly restrictive copyright regime harms artists. Rather than teaching our students how to situate their creative practice within a framework of intellectual property protection and thereby reinforce a copyright regime that wasn’t put in place for them in the first place, we should be encouraging our students to resist this regime. We should be teaching them to advocate for open access and open source software. In the longer term, we should be helping them to develop open source and open access alternatives themselves. This is an especially important message for my digital media and design students who, with their considerable skills, will be in a position to effect the longer-term project of building the open source tools that will be necessary to free artists’ creative practice from proprietary software. In the long term, maybe the very long term, this is the only way we can keep digital artists from being held hostage to corporations as Adobe held my students hostage this semester.

Fortunately, we’ve sorted out the Adobe license issue for now by cutting a licensing deal (shall we call it a hostage negotiation?) apart from UITS for students enrolled in the School of Fine Arts. For now, our students are safe. But only for now. You can bet I’ll be screaming this example over the fence at my colleagues in the School of Fine Arts the next time we talk about copyright.

The Hacker Way

On December 21, 2012, Blake Ross—the boy genius behind Firefox and currently Facebook’s Director of Product—posted this to his Facebook page:

Some friends and I built this new iPhone app over the last 12 days. Check it out and let us know what you think!

The new iPhone app was Facebook Poke. One of the friends was Mark Zuckerberg, Facebook’s founder and CEO. The story behind the app’s speedy development and Zuckerberg’s personal involvement holds lessons for the practice of digital humanities in colleges and universities.

Late last year, Facebook apparently entered negotiations with the developers of Snapchat, an app that lets users share pictures and messages that “self-destruct” shortly after opening. Feeding on user worries about Facebook’s privacy policies and its use and retention of personal data, Snapchat had taken off among young people in a matter of weeks. By offering something Facebook didn’t—confidence that your sexts wouldn’t resurface in your job search—Snapchat exploded.

It is often said that Facebook doesn’t understand privacy. I disagree. Facebook understands privacy all too well, and it is willing to manipulate its users’ privacy tolerances for maximum gain. Facebook knows that every privacy setting is its own niche market, and if its privacy settings are complicated, it’s because the tolerances of its users are so varied. Facebook recognized that Snapchat had filled an unmet need in the privacy marketplace, and tried first to buy it. When that failed, it moved to fill the niche itself.

Crucially for our story, Facebook’s negotiations with Snapchat seem to have broken down just weeks before a scheduled holiday moratorium for submissions to Apple’s iTunes App Store. If Facebook wanted to compete over the holiday break (prime time for hooking up, on social media and otherwise) in the niche opened up by Snapchat, it had to move quickly. If Facebook couldn’t buy Snapchat, it had to build it. Less than two weeks later, Facebook Poke hit the iTunes App Store.

Facebook Poke quickly rose to the top of the app rankings, but has since fallen off dramatically in popularity. Snapchat remains among iTunes’ top 25 free apps. Snapchat continues adding users and has recently closed a substantial round of venture capital funding. To me Snapchat’s success in the face of such firepower suggests that Facebook’s users are becoming savvier players in the privacy marketplace. Surely there are lessons in this for those of us involved in digital asset management.

Yet there is another lesson digital humanists and digital librarians should draw from the Poke story. It is a lesson that depends very little on the ultimate outcome of the Poke/Snapchat horse race. It is a lesson about digital labor.

Mark Zuckerberg is CEO of one of the largest and most successful companies in the world. It would not be illegitimate if he decided to spend his time delivering keynote speeches to shareholders and entertaining politicians in Davos. Instead, Zuckerberg spent the weeks between Thanksgiving and Christmas writing code. Zuckerberg identified the Poke app as a strategic necessity for the service he created, and he was not too proud to roll up his sleeves and help build it. Zuckerberg explained the management philosophy behind his “do it yourself” impulse in the letter he wrote to shareholders prior to Facebook’s IPO. In a section of the letter entitled “The Hacker Way,” Zuckerberg wrote:

The Hacker Way is an approach to building that involves continuous improvement and iteration. Hackers believe that something can always be better, and that nothing is ever complete. They just have to go fix it – often in the face of people who say it’s impossible or are content with the status quo….

Hacking is also an inherently hands-on and active discipline. Instead of debating for days whether a new idea is possible or what the best way to build something is, hackers would rather just prototype something and see what works. There’s a hacker mantra that you’ll hear a lot around Facebook offices: “Code wins arguments.”

Hacker culture is also extremely open and meritocratic. Hackers believe that the best idea and implementation should always win – not the person who is best at lobbying for an idea or the person who manages the most people….

To make sure all our engineers share this approach, we require all new engineers – even managers whose primary job will not be to write code – to go through a program called Bootcamp where they learn our codebase, our tools and our approach. There are a lot of folks in the industry who manage engineers and don’t want to code themselves, but the type of hands-on people we’re looking for are willing and able to go through Bootcamp.

Now, listeners to Digital Campus will know that I am no fan of Facebook, which I abandoned years ago, and I’m not so naive as to swallow corporate boilerplate hook, line, and sinker. Nevertheless, it seems to me that in this case Zuckerberg was speaking from the heart and not the wallet. As Business Insider’s Henry Blodget pointed out in the days of Facebook’s share price freefall immediately following its IPO, investors should have read Zuckerberg’s letter as a warning: he really believes this stuff. In the end, however, whether it’s heartfelt or not, or whether it actually reflects the reality of how Facebook operates, I share my colleague Audrey Watters’ sentiment that “as someone who thinks a lot about the necessity for more fearlessness, openness, speed, flexibility and real social value in education (technology) — and wow, I can’t believe I’m typing this — I find this part of Zuckerberg’s letter quite a compelling vision for shaking up a number of institutions (and not just “old media” or Wall Street).”

There is a widely held belief in the academy that the labor of those who think and talk is more valuable than the labor of those who build and do. Professorial contributions to knowledge are considered original research while librarians and educational technologists’ contributions to these endeavors are called service. These are not merely imagined prejudices. They are manifest in human resource classifications and in the terms of contracts that provide tenure to one group and, often, at will employment to the other.

Digital humanities is increasingly in the public eye. The New York Times, the Los Angeles Times, and the Economist all have published feature articles on the subject recently. Some of this coverage has been positive, some of it modestly skeptical, but almost all of it has focused on the kinds of research questions digital humanities can (or maybe cannot) answer. How digital media and methods have changed humanities knowledge is an important question. But practicing digital humanists understand that an equally important aspect of the digital shift is the extent to which digital media and methods have changed humanities work and the traditional labor and power structures of the university. Perhaps most important has been the calling into question of the traditional hierarchy of academic labor which placed librarians “in service” to scholars. Time and again, digital humanities projects have succeeded by flattening distinctions and divisions between faculty, librarians, technicians, managers, and students. Time and again, they have failed by maintaining these divisions, by honoring traditional academic labor hierarchies rather than practicing something like the hacker way.

Blowing up the inherited management structures of the university isn’t an easy business. Even projects that understand and appreciate the tensions between these structures and the hacker way find it difficult to accommodate them. A good example of an attempt at such an accommodation has been the “community source” model of software development advanced by some in the academic technology field. Community source’s successes and failures, and the reasons for them, illustrate just how important it is to make room for the hacker way in digital humanities and academic technology projects.

As Brad Wheeler wrote in EDUCAUSE Review in 2007, a community source project is distinguished from more generic open source models by the fact that “many of the investments of developers’ time, design, and project governance come from institutional contributions by colleges, universities, and some commercial firms rather than from individuals.” Funders of open source software in the academic and cultural heritage fields have often preferred the community source model assuming that, because of high level institutional commitments, the projects it generates will be more sustainable than projects that rely mainly on volunteer developers. In these community source projects, foundations and government funding agencies put up major start-up funding on the condition that recipients commit regular staff time—”FTEs”—to work on the project alongside grant funded staff.

The community source model has proven effective in many cases. Among its success stories are Sakai, an open source learning management system, and Kuali, an open source platform for university administration. Just as often, however, community source projects have failed. As I argued in a grant proposal to the Library of Congress for CHNM’s Omeka + Neatline collaboration with UVa’s Scholars’ Lab, community source projects have usually failed in one of two ways: either they become mired in meetings and disagreements between partner institutions and never really get off the ground in the first place, or they stall after the original source of foundation or government funding runs out. In both cases, community source failures lie in the failure to win the “hearts and minds” of the developers working on the project, in the failure to flatten traditional hierarchies of academic labor, in the failure to do it “the hacker way.”

In the first case—projects that never really get off the ground—developers aren’t engaged early enough in the process. Because they rely on administrative commitments of human resources, conversations about community source projects must begin with administrators rather than developers. These collaborations are born out of meetings between administrators located at institutions that are often geographically distant and culturally very different. The conversations that result can frequently end in disagreement. But even where consensus is reached, it can be a fragile basis for collaboration. We often tend to think of collaboration as shared decision making. But as I have said in this space before, shared work and shared accomplishment are more important. As Zuckerberg has it, hacking is “inherently hands-on and active”: “instead of debating for days whether a new idea is possible or what the best way to build something is, hackers would rather just prototype something and see what works,” and “the best idea and implementation should always win—not the person who is best at lobbying for an idea or the person who manages the most people.” That is, the most successful digital work occurs at the level of work, not at the level of discussion, and for this reason hierarchies must be flattened. Everyone has to participate in the building.

In the second case—projects that stall after funding runs out—decisions are made for developers (about platforms, programming languages, communication channels, deadlines) early on in the planning process that may deeply affect their work at the level of code sometimes several months down the road. These decisions can stifle developer creativity or make their work unnecessarily difficult, both of which can lead to developer disinterest. Yet experience both inside and outside of the academy shows us that what sustains an open source project after funding runs out is the personal interest and commitment of developers. In the absence of additional funding, the only thing that will get bugs fixed and forum posts answered is committed developers. Developer interest is often a project’s best sustainability strategy. As Zuckerberg says, “hackers believe that something can always be better, and that nothing is ever complete.” But they have to want to make it better.

When decisions are made for developers (and other “doers” on digital humanities and academic technology projects such as librarians, educational technologists, outreach coordinators, and project managers), they don’t. When they are put in a position of “service,” they don’t. When traditional hierarchies of academic labor are grafted onto digital humanities and academic technology projects that owe their success as much to the culture of the digital age as they do to the culture of the humanities, they don’t.

Facebook understands that the hacker way works best in the digital age. Successful digital humanists and academic technologists do too.

[This post is based on notes for a talk I was scheduled to deliver at a NERCOMP event in Amherst, Massachusetts on Monday, February 11, 2013. The title of that talk was intended to be “‘Not My Job’: Digital Humanities and the Unhelpful Hierarchies of Academic Labor.” Unfortunately, the great Blizzard of 2013 kept me away. Thankfully, I have this blog, so all is not lost.]

[Image credit: Thomas Hawk]

One Week | One Tool: Interim Report

As promised on Twitter, I’m sharing the report (with a few minor copyedits and corrections) I submitted this week to NEH on One Week | One Tool. This is an interim report, which means there are plenty of questions that remain unanswered. The grant continues for another year, during which time the One Week | One Tool crew will be working to support and extend their work together. I’m sure I’ll be blogging about their progress as we go, but at the very least I’ll post again with the final report next summer.

Executive Summary

One Week | One Tool represents a groundbreaking experiment in digital humanities training and digital humanities tool building. Culminating with the institute held July 25 – July 31, 2010 at the Center for History and New Media (CHNM) at George Mason University and the institute participants’ successful construction and release, in seven days, of a completely new open source software product, One Week | One Tool has made significant strides in proving its central claims: 1) that learning by doing is an important and effective part of digital humanities training; 2) that the traditional NEH summer institute can be adapted to accommodate this kind of practical digital humanities pedagogy; and 3) that digital humanities tools can be built more quickly and affordably than common practice would suggest. The second year of the project will seek to reinforce these claims and continue to advance participant learning through ongoing engagement and further development of the tool that was built during their week at CHNM—Anthologize.

Recruitment

The first task of the One Week | One Tool team was to publicize the institute. After dates were chosen, we developed a simple website in the style of an Old West poster to call for proposals. The call for proposals was issued in early February and closed in mid-March. To publicize the call, we leveraged CHNM’s extensive outreach networks among digital humanities bloggers and on Twitter, where the announcement generated significant attention and excitement. Applicants were asked to answer three questions: 1) What skills/experiences/interests do you think are most important to building a successful tool? 2) Which of these skills/experiences/interests will you bring to the barn raising? 3) What do you think you will get out of attending that will help you in future pursuits? From this call, we received 48 formal applications and approximately another dozen serious expressions of interest.

The selection process was remarkably difficult. At least 35 of the 48 applicants were perfectly qualified to attend. To differentiate among them, we stuck to the criteria we outlined in our original proposal and laid out in the call. Applicants were chosen not on the basis of specific qualifications (e.g. a higher degree or particular skill set), but on evidence of teamwork, patience, flexibility, and resourcefulness (such as a history of picking up a programming language on one’s own). We were also looking to assemble a diverse group who together would possess the entire range of skills necessary to conceive, manage, build, and disseminate a tools project. The final group represented a broad range of scholars, students, librarians, museum professionals, developers, user interaction designers, bloggers, and project managers. They were also relatively diverse in terms of gender (4 women, 8 men) and seniority (ranging from a recently graduated college student to a tenured faculty member). Given the importance of intra-team dynamics to the success of software development projects, we took pains to introduce participants to one another via Twitter in the weeks after their acceptance so that they could begin establishing a sense of camaraderie ahead of their intense week of working together in July.

The Institute

Institute participants arrived on Sunday, July 25, 2010 for a kick-off meeting at George Mason University’s brand new Mason Inn and Conference Center, which served as the participants’ home away from home during One Week | One Tool. Participants’ transportation and lodging were arranged and paid ahead of time, with an additional $1000 stipend provided to each participant as an honorarium and to cover any incidentals. After a round of personal and professional introductions, project director Tom Scheinfeldt provided a brief introduction to CHNM and some of the core values CHNM brings to its open source software development work. Particular emphasis was placed on the value of "use": that the best measure of a tool’s success is its uptake and actual use by its intended audience. The first evening concluded with a brief discussion of participant and organizer expectations and an exhortation to get a good night’s sleep.

Monday was originally designed as the "teaching" phase of One Week | One Tool, during which CHNM staff members would provide participants with some lessons in software development best practices. This was ultimately the least successful portion of the program. Although introductory talks on technical aspects of software development (Jeremy Boggs), outreach mechanisms and community building (Trevor Owens), and the state of the art of digital humanities tools (Dan Cohen) were well received, participants were eager to get down to brass tacks, and we made a strategic decision to cut short the lectures and begin brainstorming tool choices a half day early on Monday afternoon.

Brainstorming continued through Monday evening and into Tuesday morning. By mid-morning on Tuesday, the group had proposed dozens of possible directions for the week’s tool building. By lunchtime, it had distilled those ideas down to six possible candidates. Over lunch, the group asked followers on Twitter to vote on the six choices. Well over 50 outside observers recorded their preferences. Taking this outside feedback into account, the group conducted two rounds of voting of its own, making a final consensus decision by mid-afternoon on Tuesday.

Work then turned to choosing roles and responsibilities. With guidance from CHNM staff, participants had arranged themselves by Tuesday evening into three teams: Development, User Experience, and Outreach. Work began immediately, with developers making choices about encoding and architecture, user experience team members outlining potential use cases, and outreach team members researching the competitive marketplace and writing website copy.

Wednesday, Thursday, and Friday were each long days of code, design, and writing. The group quickly established a daily routine, which started with an all-project check-in meeting at CHNM’s offices at 10:00 a.m., was punctuated by a return to the Mason Inn at about 6:00 p.m., and ended with yet more work at the hotel bar, usually until well past closing. Communication, leadership, and even membership among the teams were fluid, with teams spontaneously reconfiguring as particular skills were required for particular tasks. Moreover, throughout this "doing" phase, Tom, Dan, Jeremy, and Trevor encouraged additional CHNM staff—including Sharon Leon, Sheila Brennan, Ammon Shepherd, and John Flatness—to be available to help participants solve problems and make tough decisions. By Saturday morning, the One Week | One Tool team had designed and built the software; established an open code repository, a ticketing and bug tracking system, and a set of open source community development channels; and mounted an outreach and marketing campaign that consisted of an original name, logo, and website, as well as promotional bookmarks and stickers, a CafePress storefront, a formal press release, a media contact sheet, a set of public use cases, an FAQ page, and other end-user support channels.

During the course of the week, several members of the team "live blogged" and tweeted (using the hashtag #oneweek) about this process, but they did so without giving away the final choice of tool, which served to create a buzz about the launch among potential users in the digital humanities community. Between July 25th and August 1st, hundreds of tweets were tagged #oneweek, mostly by outside observers, a measure of the extent to which One Week | One Tool captured the imagination of the digital humanities community. The tool the group produced, Anthologize, was launched on Tuesday, August 3, 2010 with a live video episode of CHNM’s popular Digital Campus podcast. More than 125 people took time out of their work day to tune in to the live announcement. In the days following the announcement, One Week | One Tool and Anthologize were featured by Atlantic.com, Read Write Web, the Chronicle of Higher Education, and dozens of digital humanities blogs. As one external commentator put it, the Anthologize and One Week | One Tool experiment had "set the DH world abuzz."

Anthologize

Anthologize is a free, open-source plugin that transforms WordPress 3.0 into a platform for publishing electronic texts. Anthologize lets users grab posts from their WordPress blog, import feeds from external sites, or create new content directly within Anthologize. Users can then outline, order, and edit their work, crafting it into a single volume for export in several formats, including PDF, ePUB, and TEI. Among Anthologize’s projected users are scholars (to edit anthologies of thematically related blog posts or publish conference proceedings), teachers (to edit student class work into e-portfolios or organize course reading packs), and cultural institutions (to publish exhibit books or donor gift books). Directions for future development include additional export formats, improved importing of external web content, footnote support, and integrating blog comments into the Anthologize editorial workflow.
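
Anthologize itself is a PHP plugin riding on WordPress, so what follows is only a conceptual sketch, in Python, of the pipeline at its heart: gather posts, put them in the author’s chosen order, and serialize them as a single structured volume. The data layout and function name here are invented for illustration—this is not Anthologize’s actual API—and the TEI output is deliberately minimal rather than schema-valid.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for posts pulled from a WordPress blog;
# Anthologize's real (PHP) data model differs.
posts = [
    {"title": "Second Post", "date": "2010-08-02", "body": "More thoughts..."},
    {"title": "First Post", "date": "2010-08-01", "body": "Some thoughts..."},
]

def build_tei_volume(volume_title, posts):
    """Assemble ordered posts into one minimal TEI-style document."""
    tei = ET.Element("TEI", xmlns="http://www.tei-c.org/ns/1.0")
    header = ET.SubElement(tei, "teiHeader")
    title_stmt = ET.SubElement(ET.SubElement(header, "fileDesc"), "titleStmt")
    ET.SubElement(title_stmt, "title").text = volume_title
    body = ET.SubElement(ET.SubElement(tei, "text"), "body")
    # In Anthologize the author outlines and orders parts by hand;
    # this sketch simply sorts posts chronologically.
    for post in sorted(posts, key=lambda p: p["date"]):
        chapter = ET.SubElement(body, "div", type="chapter")
        ET.SubElement(chapter, "head").text = post["title"]
        ET.SubElement(chapter, "p").text = post["body"]
    return ET.tostring(tei, encoding="unicode")

print(build_tei_volume("My Anthology", posts))
```

Swap a different serializer into that final step and you have, conceptually, the other export formats (PDF, ePub) as well.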

Learning Outcomes

We are still collecting survey responses and assessing the results, but it is already clear that One Week | One Tool was a positive learning experience for the participants. Unlike many curriculum-based institutes, the experiences and therefore the learning outcomes at One Week | One Tool varied widely among the participants depending on their team membership and on the skills they brought to the institute. Nevertheless, an initial analysis of participant survey responses reveals some common themes. Among them:

  • Most participants responded that they learned about collaboration, leadership, and team building. Doug Knox pointed to the "self-taught lessons in group dynamics for a team of pragmatic collaborative autodidacts" he took from One Week | One Tool. As one participant responded, those lessons included "the kind of flexibility, trust, humility, and perseverance that is necessary to take on a relatively big project in a short period of time." In a post titled "Unexpected leadership," Boone Gorges echoed these lessons, writing "leadership doesn’t work without humility and trust." Effie Kapsalis described that same lesson plainly on the Smithsonian 2.0 blog: "Trust. Period."
  • Related to this were the lessons in decision making and calculated risk that most participants said they took away. As one participant wrote, "the experience really made me understand how much fear of acting alone can stall development and how trusting the emergent properties of the system can really lead to fantastic results." Jana Remy wrote in a related vein on her blog, "…one of the most important things we’ve learned through this workshop is the importance of calculated risk. It’s risky to forge forward with a digital project without months of planning, coding, and testing. It’s risky to trust near-strangers to deliver on such a tight deadline…. Our team selected one of the more daring choices available for our project. We were ready to be bold and to dream big. Even if the rest of the world doesn’t see our tool’s potential to be as earthshaking as we do, the process of manifesting our vision for this tool has been a valuable lesson in and of itself, and will certainly carry through to our future endeavors."
  • Several participants responded that they learned about outreach methods, marketing, and dissemination of digital humanities products. As one participant said, "I really found the information on how to manage publicity and outreach extremely useful, and I was completely stunned by the results…."
  • Several participants said they learned particular technologies (e.g. WordPress and Git) and practical lessons in open source software development best practices: "I also think I came to understand a bit more about how agile methodologies really work," remarked one participant.
  • Some participants pointed to their improved understanding of "what constitutes a full end-to-end development cycle." Experienced programmers and project managers, even those who entered the week with hundreds of thousands of lines of code under their belts and long track records of successful digital humanities work, responded that they learned about the differences between building a tool "from conception to roll-out" as they did at One Week | One Tool and the ad-hoc or task-specific development they do in the normal course of their work.

Every participant who responded to our survey said that One Week | One Tool met or exceeded his or her expectations, that it met its stated goals, that the experience would advance his or her career objectives, and that the model of learning by doing was effective. There were, however, at least two criticisms repeated by several participants. First, as mentioned above, most participants would have preferred to cut the Monday lecture portion of the program shorter, something we addressed on the spot and will keep in mind for any possible repeat events. Second, at least two people wanted to see more learning from fellow team members. The fast-paced development cycle too often kept participants’ noses to the grindstone, and opportunities for cross-pollination and peer-to-peer learning were sometimes missed. In any repeat performance of One Week | One Tool we will take seriously one participant’s observation that "…the group itself would have benefited from a more concerted effort to mix up knowledge multilaterally. There was some tension between working in small teams to get work done and making sure we had informal chances to share a sense across groups about the state of progress and the development of our sense of purpose. There were in fact essential unplanned lateral communications in all directions, and the bar was especially important in that. It worked, but there could be room to make it a more reflective process. We should have been teaching our strengths as well as contributing them, and apprenticing in our less-strong areas."

Yet in discussing One Week | One Tool’s learning outcomes, it is also important to note that, perhaps unlike more traditional institutes, the learning appears to have extended well beyond the twelve participants who traveled to Fairfax. This possibly unique aspect of One Week | One Tool results from the fact that: 1) it was an event, and 2) it produced a product for widespread distribution and use. As Tim Carmody posted on Snarkmarket, One Week | One Tool was a "generative web event," an "ongoing communal broadcast." Because it took place over a limited and clearly delineated time period, because it posited a provocative, easy-to-grasp, and unlikely challenge (academics build something in a week?), because it maintained an air of secrecy, and because it was widely publicized, One Week | One Tool was able not only to create "buzz" but to provoke conversation and discussion of serious issues among a broad range of digital humanities scholars who could use their common experience of One Week | One Tool as a touchstone for debate. Two examples of this kind of broader discussion are ongoing exchanges about the nature of electronic writing and its relationship to the printed page and about the relative merits of pre- versus post-launch user studies.

Next Steps

Over the next year, the original group of twelve participants will continue to develop the tool from remote locations around the country, forming the core of an open source developer and user community that they will build and support. Indeed, the team has already developed and released a "point release" version 0.4 in the time since they left Fairfax. During the next ten months, the group will fix bugs in the software and add new features. It will continue to disseminate the software through online outreach and in-person conference presentations. As part of the group’s training in digital humanities software sustainability, it will also work with staff at CHNM to write grant proposals for additional funding and implement cost recovery mechanisms where appropriate. In June 2011, the group will travel back to Fairfax for THATCamp 2011 to assess the progress of the past year, gather additional feedback from community members, and make decisions about future development and support of Anthologize.

Conclusion

Overall, we are pleased with the progress of the One Week | One Tool project. The main learning objective of the project was to teach software development and deployment skills to a diverse group of digital humanists. As explained to participants on the first evening of the institute, we believe the best measure of the success of a tool is its use by and usefulness to its intended audience. By extension, the best measure of whether a group of people have learned to build successful tools is whether the tool they built is actually useful to and used by its audience. Early indications are good. In less than ten days the Anthologize website has received 232,000 hits, 75,000 page views, and 13,000 unique visitors, and the Anthologize software has been downloaded more than 2,500 times. We have high hopes for the coming year, and we will continue to keep track of these numbers and follow the projects that use Anthologize.

Lessons from One Week | One Tool – Part 2, Use

For all the emphasis on the tool itself, the primary aim of One Week | One Tool is not tool building, it’s education. One Week | One Tool is funded by NEH under the Institutes for Advanced Topics in Digital Humanities (IATDH) program. IATDH grants “support national or regional (multistate) training programs for scholars and advanced graduate students to broaden and extend their knowledge of digital humanities.” Thus training is the criterion by which One Week | One Tool will ultimately be judged.

A key argument of One Week | One Tool is that learning digital humanities consists primarily in doing digital humanities, that digital humanities is a hands-on kind of thing, that to learn tool building you have to do some tool building. At the same time, we recognize that there’s a place for instruction of the hands-off sort. To that end, the first 18 hours or so of One Week | One Tool (essentially from Sunday night until mid-afternoon on Monday) were reserved for presentations by CHNM staff. Jeremy offered a practical introduction to software development best practices and tools. Trevor described the range of outreach strategies we have employed on projects like Zotero, Omeka, and the National History Education Clearinghouse. Dan provided the view from 30,000 feet with thoughts on the state of the art and near future of digital humanities software development. I kicked things off on Sunday with a brief introduction to CHNM and our tool building philosophy. Several strains of thought and practice inform our work at CHNM—public history, cultural history, radical democracy, dot.com atmospherics, and more—but to keep things simple I summed up our tool building philosophy in one word: use.

Here is more or less what I told the crew.

At CHNM we judge our tools by one key metric above all others: use. Successful tools are tools that are used. The databases of Sourceforge and Google Code are littered with interesting, even useful, but unused open source tools. Academic software projects are no exception. Every year NSF, NIH, and now NEH and IMLS award grants for scholarly software development. In recent years, the funding guidelines have stipulated that this software be made freely available under open source licenses. Much of the software produced by these programs is good and useful code. But little of it is actually used.

There are several reasons for this. Many efforts are focused narrowly on the problems of a particular researcher or lab. The code produced by these researchers proves useful for solving their particular problems, but even when it is released, it hasn’t been designed to be generally applicable to the needs of other researchers in the field. It is, in effect, a one-off tool released as open source. But open source code alone does not constitute an open source project.

Other projects build generalized tools that may be of potential use to other researchers. But few make the necessary investment in outreach and, yes, marketing to make potential users aware of the tool. It is for this reason, among others, that we see so much duplication of effort and functionality in scholarly software projects.

Building a user community is the first prerequisite to building a successful open source software project. The success of software is judged by its use. The universal assessment that iTunes is a hit and Zune is a flop is not based on the quality of the code or even the elegance or potential usefulness of the experience. It’s based on the fact that everybody uses iTunes and nobody uses Zune. This is not to say that software has to have millions of users to be successful. But it is to say that successful software is used by a large swath of its potential users. To be sure, the total population of potential users of cultural heritage mapping tools is much smaller than the total population of potential users of digital media playback software. But any open source software project’s goal should be use by as many of its potential users as possible. In our case, that means aiming to have our software used by as many cultural heritage institutions and digital humanists as possible.

Moreover, a large and enthusiastic user base is key to a successful open source software project’s continued success. If people use a product, they will invest in that product. They will provide valuable user testing. They will support the project in its efforts to secure financial support. They will help market the product, creating a virtuous circle. Sustainability, even for free software, is grounded in a committed customer base.

Related to building a user community is building an open source developer community. Some number of users will have the inclination, the skills, and the commitment to the project to help on the level of code. This percentage will be very small, of course, less than one percent, which is another reason to build a large user base. But this small group of code contributors and volunteer developers forms the core of most successful open source projects. They find and fix bugs. They provide end user support. They write documentation. They add new features and functionality. They provide vision and critical assessment. They constitute a ready-made pool of job candidates if a core paid developer leaves a project.

This developer community is a project’s best chance at sustainability, and collaboration at the developer level, rather than at the institution or administrator level, is usually key to a scholarly open source project’s lasting success. Getting provosts, deans, and directors from partner institutions to commit FTEs and other resources to a project is very welcome—we’d love some commitments of this sort for the tool we built last week. But it’s not where the strength of a collaboration will be located. Individual developers, who commit their time, effort, ideas, code, heart, and soul to a project, are the ones who will keep something going when money and institutional interest run out.

A developer community does not develop on its own, of course. It requires support. First and foremost, a developer community needs open communication channels—an active IRC channel and listserv, for example—which, in the case of a university- or library-based project, means a group of responsive staff developers on the other end. Community developers need ready access to the project’s development roadmap so they know where best to contribute their efforts. They need well-documented and thoughtfully designed APIs. They need technical entry points, things such as a plugin architecture where they can hone their chops on small bits of functionality before digging into the core code base. Most importantly, community developers need a sense of community, a sense of shared purpose, and a sense that their volunteer contributions are valued. All of this has to be planned, managed, and built into the software architecture.
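
To make the "technical entry points" point concrete: the pattern at work in WordPress and Omeka is a hook registry—core code fires named hooks, and community developers attach small, self-contained callbacks without ever touching core. Here is a minimal sketch of that pattern, in Python rather than the PHP those projects actually use, with hook and function names invented for illustration:

```python
from collections import defaultdict

# A minimal hook registry in the spirit of WordPress/Omeka plugin hooks.
_hooks = defaultdict(list)

def add_action(hook_name, callback):
    """A plugin registers a callback to run when a hook fires."""
    _hooks[hook_name].append(callback)

def do_action(hook_name, *args):
    """Core code fires the hook; every registered callback runs."""
    for callback in _hooks[hook_name]:
        callback(*args)

# A community developer's entire "plugin" can be this small:
def log_new_item(item_title):
    print(f"New item saved: {item_title}")

add_action("after_save_item", log_new_item)

# Somewhere in the core application:
do_action("after_save_item", "Letter from Abigail Adams, 1776")
```

The virtue of the design is that a newcomer’s first contribution can be a ten-line callback rather than a patch to the core code base.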

This philosophy of use is core to CHNM’s vision of open source software for scholarship and cultural heritage. The tool the crew of One Week | One Tool developed—like Omeka and Zotero before it—should be a case in point. It was chosen with clear audiences in mind. It was built on approachable technologies and engineered to be extensible. Its outreach plan and feedback channels are designed to encourage broad participation. When it’s released tomorrow, I think you’ll see it is a tool to be used.

#oneweek #buildsomething

Open Source Community and the Omeka Controlled Vocabulary Plugin

I love open source. Why? Here’s a fairly representative example.

Following Patrick Murray-John’s excellent post and bootstrapping of a new AjaxCreate plugin for Omeka, I speculated on the Omeka Dev List about whether some related technologies and methods could be used to power a plugin to handle controlled vocabularies and authority lists, something Omeka currently lacks and our users (including internal CHNM users) really want. After some back and forth among developers at three institutions—and some very important input from a non-technical but very smart (and very brave!) member of Omeka’s end user community—we were able 1) to determine that AjaxCreate probably wasn’t the right vehicle for managing controlled vocabularies, and 2) to lay out some informal specs for a separate, lightweight ControlledVocab plugin. Patrick then set to building it and today introduced an alpha version of ControlledVocab to the dev list.
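
(The real ControlledVocab plugin is PHP code inside Omeka’s admin interface; purely to illustrate the behavior under discussion—restricting a metadata field to an authority list—here is a toy sketch in Python, with field names and vocabularies invented for the example.)

```python
# Toy illustration of a controlled vocabulary check; not Omeka code.
CONTROLLED_VOCABS = {
    # Field name -> permitted values (an "authority list")
    "Rights": ["Public Domain", "All Rights Reserved", "CC BY-NC"],
    "Type": ["Text", "Image", "Sound", "Moving Image"],
}

def validate_field(field, value):
    """Accept a value only if the field is uncontrolled or the
    value appears on the field's authority list."""
    allowed = CONTROLLED_VOCABS.get(field)
    if allowed is None:
        return True  # free-text field, no vocabulary control
    return value in allowed

assert validate_field("Type", "Image")
assert not validate_field("Type", "Imagge")   # typos get caught
assert validate_field("Description", "anything goes here")
```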

All of this happened in less than a week. Through the combined efforts of developers and users, the Omeka community was able to identify, describe, and make some ambitious first steps toward plugging a hole in the software. The moral of this story is: get involved. Whether you’re a developer or an end user, go download some open source software (Omeka would be a nice choice), test it out (how about the ControlledVocab plugin?), post bugs and feature requests to the forums or dev lists, and see what ensues.

Often it’s something marvelous.

Benchmarking Open Source: Measuring Success by "Low End" Adoption

In an article about Kuali adoption, the Chronicle of Higher Education quotes Campus Computing Project director Kenneth C. Green as saying:

With due respect to the elites that are at the core of Sakai and also Kuali, the real issue is not the deployment of Kuali or Sakai at MIT, at Michigan, at Indiana, or at Stanford. It’s really what happens at other institutions, the non-elites.

Indeed, all government- and charity (read, “foundation”)-funded open source projects should measure their success by adoption at the “low end.” That goes for library and museum technology as well; we could easily replace MIT, Michigan, Indiana, and Stanford in Mr. Green’s quote with Beinecke, Huntington, MoMA, and Getty. Though we still have a long way to go—the launch of Omeka.net will help a lot—Omeka aims at just that target.

E-Book Readers: Parables of Closed and Open

During a discussion of e-book readers on a recent episode of Digital Campus, I made a comparison between Amazon’s Kindle and Apple’s iPod which I think more or less holds up. Just as Apple revolutionized a fragmented, immature digital music player market in the early 2000s with an elegant, intuitive new device (the iPod) and a seamless, integrated, but closed interface for using it (iTunes)—and in doing so managed very nearly to corner that market—so too did Amazon hope to corner an otherwise stale e-book market with the introduction last year of its slick, integrated, but closed Kindle device and wireless bookstore. No doubt Amazon would be more than happy to hold the same eighty percent share of the e-book market that Apple now enjoys in the digital music player market.

In recent months, however, there have been a slew of announcements that seem to suggest that Amazon will not be able to get the same kind of jump on the e-book market that Apple got on the digital music market. Several weeks ago, Sony announced that it was revamping its longstanding line of e-book readers with built-in wifi (one of the big selling points of the Kindle) and support for the open EPUB standard (which allows it to display Google Books). Now it appears that Barnes & Noble is entering the market with its own e-book reader, and in more recent news, that its device will run on the open source Android mobile operating platform.

If these entries into the e-book market are successful, they may foretell a more open future for e-books than the one that has befallen digital music. They would also suggest that the iPod model of a closed, end-to-end user experience isn’t the future of computing, handheld or otherwise. Indeed, as successful and transformative as it is, Apple’s iPhone hasn’t been able to achieve the kind of dominance of the “superphone” market that the iPod achieved in the music player market, something borne out by a recent Gartner report, which puts Nokia’s Symbian and Android in first and second place by number of handsets by 2012, with more than fifty percent market share between them. This story of a relatively open hardware and operating system combination winning out over a more closed, more controlled platform is the same one that played out two decades ago, when the combination of the PC and Windows beat the Mac for leadership of the personal computing market. If Sony, Barnes & Noble, and other late entrants into the e-book game finish first, they will have shown the end-to-end iPod experience to be the exception rather than the rule, much to Amazon’s disappointment, I’m sure.