Digital Humanities Pedagogy: Practices, Principles and Politics

9. Programming with Humanists: Reflections on Raising an Army of Hacker-Scholars in the Digital Humanities

Stephen Ramsay

 

Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

—Donald E. Knuth, Literate Programming (1992)

“Program or be programmed.” That is the strong claim made by Douglas Rushkoff in a recent book that eloquently—at times, movingly—articulates an argument often made by those who teach programming:

In the emerging, highly programmed landscape ahead, you will either create the software or you will be the software. It’s really that simple: Program, or be programmed. Choose the former, and you gain access to the control panel of civilization. Choose the latter, and it could be the last real choice you get to make. […] Computers and networks finally offer us the ability to write. And we do write with them on our websites, blogs and social networks. But the underlying capability of the computer era is actually programming—which almost none of us know how to do. We simply use the programs that have been made for us, and enter our text in the appropriate box on the screen. We teach kids how to use software to write, but not how to write software. This means they have access to the capabilities given to them by others, but not the power to determine the value-creating capabilities of these technologies.1

Such language is powerful and attractive (especially for those who already possess the requisite skills), but its highly utilitarian vision of education tends to weaken claims to relevance within the humanities. Anyone teaching a skill or a method—however abstractly we may define such matters—has recourse to arguments that are essentially based on fear and shame. “Learn calculus or be calculated!” is the underlying message of many first-day lectures in that subject, and the argument works equally well in chemistry (“Everything you see is a chemical!”), foreign language classes (“In our increasingly global world…”), and even history (“Those who cannot remember the past are condemned to repeat it”). All such statements presuppose usefulness as the criterion for relevance; the humanities, as our declining budgets make clear, have a harder time making such claims. It is difficult to say what one becomes if left without knowledge of, say, European oil painting or Derek Walcott’s poetry—difficult, because the traditional answers are also unavoidably the most elitist. Santayana’s profound and memorable dictum, when re-presented as a reason to take a course, is reduced to being yet another habit of highly effective people.

It must be frankly admitted, however, that if an English or a history student finds his or her way into a class on programming, it is not because of some perceived continuity between the study of Shakespeare or the French Revolution and the study of for-loops and conditionals. Most students see programming—and not without justice—as a mostly practical subject. Many students, over the years, have freely admitted to me that their primary motivation for studying the subject was linked to their job prospects after graduation. If Rushkoff’s argument is persuasive, it is because the students have already arrived at some less strident version of it independently.

The emphasis on usefulness is deeply encoded in the terminology—or better, the analogy—of “software engineering.” As Pete McBreen reminds us, the term was invented in the late sixties to describe a type of project that rarely appears in contemporary development: extremely large systems that require the fabrication of specialized hardware.2 Engineering has undoubtedly persisted as the term of art because of a desire—only partially revised by the so-called “agile methods” that emerged in the 1990s—to have the development of software behave according to the far more predictable principles of specification and testing used to fabricate things like bridges and automobiles. “Engineering” is a term that allows programming to join with other activities long recognized within the university as being mostly ordered toward building useful things, and for which there are processes and methods that are repeatable and well understood. Many programmers (myself included) entertain a certain skepticism toward the engineering analogy as a practical matter, but the analogy is particularly weak when used as a framework for thinking about computing in the humanities. The suggestion I would like to offer (and the one I regularly offer to my students) is that programming is most of all like writing.

Writing, it should be noted, is one of the areas of education for which strongly utilitarian justifications are most obviously appropriate. “Write or be written” is a true, if overly poetic way of stating the relationship between the skills associated with print literacy and issues of social justice and mobility. Yet writing is also—and for some scholars of rhetoric and composition, even more so—a tool for thinking through a subject. Undergraduate courses regularly assign writing projects to students without any expectation that those essays will be useful to anyone other than the person writing. The task of writing is a part of the normal pedagogy of education in the humanities, because we think of the writing process as the methodology by which the artifacts of the human record are understood, critiqued, problematized and placed into dialogue with one another.

It is possible to make the case for teaching programming in the humanities on utilitarian grounds. In a set of disciplines still primarily concerned with artifacts of communication—both as objects of study and as internally useful discursive media—the ability to participate in the design and creation of new media is at least relevant if not exactly incumbent upon all. But the analogy with writing is, for me, the deeper and more necessary one. Like writing, programming provides a way to think in and through a subject. Alan J. Perlis’s description of programming in his foreword to Structure and Interpretation of Computer Programs could, with only minor modification, describe the writing process:

Every computer program is a model, hatched in the mind, of a real or mental process. These processes, arising from human experience and thought, are huge in number, intricate in detail, and at any time only partially understood. They are modeled to our permanent satisfaction rarely by our computer programs. Thus even though our programs are carefully handcrafted discrete collections of symbols, mosaics of interlocking functions, they continually evolve: we change them as our perception of the model deepens, enlarges, generalizes until the model ultimately attains a metastable place within still another model with which we struggle. The source of the exhilaration associated with computer programming is the continual unfolding within the mind and on the computer of the mechanisms expressed as programs and the explosion of perception they generate.3

Such thoughts lead inexorably to the sort of insight expressed by Knuth in the epigraph to this essay and echoed by Harold Abelson and Gerald Jay Sussman in their preface to Structure and Interpretation of Computer Programs itself; namely, that “programs must be written for people to read, and only incidentally for machines to execute.”4 Communication of that idea has been one of the few constants in my teaching of this subject.

I have taught a class on programming and software design to graduate students and advanced undergraduates in the humanities every year—and sometimes twice a year—since 2002. The idea for the course first arose in a faculty seminar I attended at the University of Virginia (UVA) in 2000–2001. At the time, there was a felt need for a course that could provide concrete technical training to undergraduate students in a prospective media studies program at UVA. The hoped-for synergy between technical and more conventional humanistic study seemed particularly appropriate at the time, since UVA had faculty both in digital humanities (a designation that had only just begun to replace “humanities computing”) and media studies. Yet the subject tended to provoke concern and anxiety. How much technical training was appropriate? Who would teach such classes? Which skills were necessary?

I left UVA (where I had been working as a programmer for the Institute for Advanced Technology in the Humanities) for a professorship at the University of Georgia (UGA) while these debates were still going on, but I was deeply affected by these conversations. And since we were given the specific task of creating a curriculum, I eventually tried to outline what some of the relevant technologies would be for a digital humanist in 2002. I began, therefore, with a list that included XML (and related technologies), UNIX, web design, relational database development and programming. As soon as I arrived at UGA, I encountered graduate students involved with digital humanities who also felt the need for these kinds of competencies. Some had taken classes in the Computer Science Department, but found the choice of examples frustrating (one student told me that after a few weeks of C++, she timidly asked the professor if the class would ever do anything involving words).

All of this set the scene for the creation of a two-semester course, the first semester of which taught UNIX, web design, XML and database design, with the second semester devoted almost exclusively to programming—a pattern that persisted until two years ago, when I decided to reverse the sequence. Today, I teach the programming course first and offer a second, workshop-style course in which I introduce other technologies as needed in the context of particular projects. Since 2006, I have taught these courses at the University of Nebraska-Lincoln (UNL), where I am an associate professor of English. This year, for the first time, I am teaching the material in a one-semester course (cross-listed with various humanities programs) within the Department of Computer Science and Engineering, where it is called CS 1: Humanities. Though it is focused on programming for the humanities, it counts as one of the required entry-level courses for the computer science major.

I offer this brief history to illustrate the way in which the course has contracted its subject matter over the years. During that time, I had had cause (in my own research) to write software in at least half a dozen languages (and as many domain-specific languages), work with three or four different database platforms, and manipulate several different modulations and additions to the ecology of XML. As any programmer knows, the ability to shift easily from one technology to another is made possible by firm knowledge of the mostly invariant concepts that underlie all of these technologies. Eventually, I came to realize that all of these underlying concepts could be had through study of programming alone. This is not to deny that technologies like relational databases and XSLT have instructional value. Indeed, both of those technologies have the additional feature of having endured where other technologies (including many programming languages) have not. But time being of the essence, it is, I think, better to introduce students to things like declarative programming, constraint logic and regular expressions with the idea that they can apply those concepts to particular technologies later. Today, the course has three main components: UNIX, Ruby programming and seminar-style discussion of articles relevant to the creation and use of technologies within the digital humanities.

I continue to introduce students to the UNIX operating system and its command-line interface as an ordinary part of the course. One rationale for this is simply to give students some familiarity with the server upon which the vast majority of web-based projects run. From the beginning, however, the idea has been to start by estranging students from the normal sense of what a computer is and what sort of interface it has. UNIX (in practice, some version of Linux) allows us to “start again” in an environment in which there is no mouse, no icons, no notion of a desktop and no windows, but in which metaphors still abound (for example, the file system “tree,” the notion of a pipeline, the idea that a process “forks” and then returns).

The first couple of weeks are therefore devoted to the survival commands and underlying principles of UNIX from a user’s standpoint. In practical terms, this means teaching students about file system navigation, redirection, file permissions and basic editing. Throughout, however, the emphasis is placed on what some have called “the philosophy of UNIX”.5 I tell students that for many UNIX developers, the goal is to write software that conforms to Doug McIlroy’s injunction: “Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.”6 I return to these ideas again and again over the course of the semester, as I ask students whether their own programs are behaving according to the usual expectations of UNIX software. My purpose, though, is less to turn them into UNIX developers than it is to ask what the “philosophy” of the computers, cell phones and gaming devices they generally use might be, and what it means to conform to a philosophy as a user and as a developer.
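
To make that injunction concrete, here is a minimal sketch (illustrative only, and not drawn from any particular assignment) of what it means for a student's own program to behave like a good UNIX citizen: a short Ruby filter that reads a text stream from standard input or from files named on the command line, transforms it, and writes the result to standard output so that it can take its place in a pipeline.

    #!/usr/bin/env ruby
    # A toy UNIX-style filter: read lines of text, transform them,
    # and write the result to standard output.
    ARGF.each_line do |line|
      puts line.upcase
    end

Because it reads and writes plain text, a program like this composes with the tools the students have just learned; its output can be piped into sort, uniq or another of their own programs.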

The second component of the course—and the one we spend the most time on—involves learning to write software in the Ruby programming language. Language choice is always a contentious subject, and many good choices are possible. From the beginning, however, I felt that a few criteria had to be met:

I would not say that Ruby is the perfect language for teaching, but it meets these requirements. It supports a number of features usually associated with Lisp-like languages (including closures, continuations and anonymous functions) that make it easy to teach concepts like functional programming, recursion and meta-programming. It is also a rigidly object-oriented language that can at the same time masquerade as a procedural programming language.7 Most of all, it allows students to start writing useful programs right away, and remains useful for quite sophisticated projects later on. Many graduates of the course have gone on to proficiency in Java, PHP and Haskell (among others).
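
A small illustration of that flexibility (my own, not a course example): the same filtering task written first in a plain procedural style and then with a block—an anonymous function—handed to an iterator, with the underlying object-orientation visible at the end.

    words = ["love", "death", "the", "a", "sonnet"]

    # Procedural style: an explicit loop and an accumulator.
    long_words = []
    for word in words
      long_words << word if word.length > 3
    end

    # The same task with an iterator and a block (an anonymous function),
    # the idiom Ruby shares with Lisp-like languages.
    long_words = words.select { |word| word.length > 3 }

    # Underneath, everything is an object -- even the literal string.
    puts "sonnet".class                 # => String
    puts "sonnet".respond_to?(:upcase)  # => true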

The course proceeds, as most courses in programming do, through the major constructs and concepts germane to any programming language: statements and variables, loops and conditionals, arrays, hashes, iterators, classes and objects, basic algorithms and elementary data structures. There are, however, a few key differences between the material taught in a standard introduction to programming (as it usually appears in computer science departments) and the humanities-focused version. For one thing, there is a marked emphasis on the manipulation of text data. Computing is not possible without at least basic mathematics, but computing the Fibonacci sequence or the factorial of n is simply not the sort of thing likely to seem pertinent to a humanities student. Mathematical examples have undoubtedly persisted in computer science education because they allow one to introduce concepts—including some quite advanced ones—while using only primitive, scalar objects (integers). Strings, for all their familiarity, are far more complex, and necessarily compound, data types. But this, again, is a strong reason for using the so-called scripting languages, which tend to treat strings as if they were simple, uncomplicated primitives.
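
To show what those constructs look like when the data is text rather than numbers, a word-frequency counter of the kind the course builds toward fits in a dozen lines (the sample sentence is only a placeholder; a real assignment would read a file):

    text = "It was the best of times, it was the worst of times"

    # Normalize case, pull out the words, and tally them in a hash.
    counts = Hash.new(0)
    text.downcase.scan(/[a-z']+/) { |word| counts[word] += 1 }

    # Sort by descending frequency and print a simple table.
    counts.sort_by { |word, n| -n }.each do |word, n|
      puts "#{n}\t#{word}"
    end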

It is increasingly possible to treat XML as a built-in data structure. XML facilities are part of the standard library of Ruby (as they are in most scripting languages), and recent trends suggest that XML is becoming a native data type in more and more new languages (for example, Clojure and Scala). The popularity of XML as a data representation within the digital humanities community would be enough to recommend it for inclusion, but it has the additional feature of allowing for straightforward explanations of basic tree structures (and, should the need arise in a succeeding course, the basics of parsing). I also spend quite a bit of time, several weeks into the course, on regular expressions. This, again, is a matter of practical utility; few technologies are more useful to people working with text. But along the way, it allows me to talk a bit about state machines (thus introducing another layer to the idea of what a computer might be). Clearly, there is not enough time to introduce formal language theory (let alone the mathematics of finite state automata), but it is possible to gesture toward these subjects in a way that can solicit interest in more advanced subjects and courses.
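
A brief sketch of both points, using Ruby's bundled REXML library and an invented scrap of markup (the element names are illustrative, not drawn from any real encoding project): the document becomes a tree that can be walked in a line or two, and a regular expression can then be applied to the text it yields.

    require 'rexml/document'

    xml = <<-XML
    <poem>
      <title>The Tyger</title>
      <line>Tyger Tyger, burning bright,</line>
      <line>In the forests of the night;</line>
    </poem>
    XML

    doc = REXML::Document.new(xml)

    # The document is now a tree; walk it and print the text of each <line>.
    doc.elements.each('poem/line') { |el| puts el.text }

    # A regular expression over the same text: words ending in "ght".
    doc.elements.each('poem/line') do |el|
      puts el.text.scan(/\w+ght\b/).inspect
    end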

As I tell students at the beginning of the semester, this class has no papers, no presentations, no quizzes, no midterm and no final exam. When I first started the course, I did have some of these, but they all gave way in the end to the one assignment type that consistently helped students (as reported, again and again, on student evaluations)—namely, the problem sets. Problem sets (which are only occasionally sets of problems) give students about five days to write a small program or to enhance a previously written one. We start out simply, though I make sure that students write a complete, working program (usually a “mad libs” program) after the very first lecture on Ruby. Other assignments include writing “poetry deformers”, word frequency generators, text-based games, and, eventually, tools for analyzing text corpora (tf-idf analyzers, basic document classifiers, sentence complexity tools and so forth). In direct violation of my university’s syllabus policies, I also try to avoid giving the students a rigid schedule in advance. Some groups simply move at a different pace than others, and I try to follow student interest when I can. I have also had varying cohorts over the years. One year I had a class made up almost exclusively of graduate students in linguistics; another year I had students who were studying composition and rhetoric. The goal, in every case, is to get away from “toy programs” as quickly as possible and into projects that are relevant to the students’ actual interests: for linguistics students, that meant an emphasis on natural language processing; for composition teachers, that (at the time, at least) meant greater emphasis on web-based pedagogical applications. I even tell students that they can get out of having to do problem sets entirely if they can present me with a substantial idea for a project. I then tailor the sequence of development for that individual project according to the sequence of subjects we’re covering in class. In practice, the students who pursue this option are highly motivated and able to work independently.
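
For a sense of where the later problem sets end up, here is a compressed sketch of a tf-idf calculation over a toy "corpus" of strings (in an actual assignment each document would be a file read from disk, and the details would vary):

    # A toy corpus: three "documents," each reduced to a short string.
    corpus = {
      "austen"   => "it is a truth universally acknowledged",
      "melville" => "call me ishmael some years ago",
      "dickens"  => "it was the best of times it was the worst of times"
    }

    # Term frequencies for each document.
    tf = {}
    corpus.each do |name, text|
      counts = Hash.new(0)
      text.split.each { |word| counts[word] += 1 }
      tf[name] = counts
    end

    # Document frequency: the number of documents containing each word.
    df = Hash.new(0)
    tf.each_value { |counts| counts.each_key { |word| df[word] += 1 } }

    # tf-idf: weight words that are frequent here but rare across the corpus.
    def tf_idf(word, doc, tf, df, total_docs)
      tf[doc].fetch(word, 0) * Math.log(total_docs.to_f / df[word])
    end

    puts tf_idf("ishmael", "melville", tf, df, corpus.size)  # distinctive word, high score
    puts tf_idf("it", "dickens", tf, df, corpus.size)        # common word, lower score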

One unusual aspect of the class is the complete “open book” policy with regard to problem sets. I explicitly tell students that they are allowed to use any source of information in pursuit of the solution to a problem set: the books for the course, information available online and (most especially) each other. Such a policy would appear to invite plagiarism, but I am careful to explain that plagiarism is primarily about misrepresenting the nature of one’s use, and not a prohibition against using outside sources. The purpose of this policy is simply to recreate the conditions under which software is actually written. A few students have abused what is in essence an honor system over the years; the overwhelming majority has not.

The third component of this class is the “Friday seminar” (held every third day of a course that meets three times a week). These class meetings involve free-ranging discussions of important articles in the history of computing, theory of new media, digital humanities and sometimes popular works on cyberculture. There can be no question that devoting a third of the course to such material reduces the time available for teaching programming, but I regard these class meetings as completely essential to the course. Originally, this aspect of the course had perhaps more to do with my own insecurities about teaching a highly technical course in an English department than with any organized pedagogy. Over time, however, it began to have the unanticipated effect of putting people at ease. Teaching programming involves complicated, demanding lectures, difficult assignments and the persistent feeling on the part of students that they’re not quite getting it. I tell them (with as much good humor as I can summon) that this feeling never entirely disappears (since to be a technologist is very often to dwell in unknown territory) and that it is more important that we grow accustomed to that state by learning to ask questions, get help, work in groups and try different solutions. But all of this requires that the students feel comfortable with each other and with the professor. The fact that humanities students are accustomed to seminar-style discussions helps with this, but I think it is even more important that students see themselves as becoming the people they read about—programmers, designers, developers and digital humanists.

Every third day, therefore, we have a general discussion on material that students will never be tested on and for which they do not have to produce any kind of written response. These conversations have, for ten years, been among the liveliest and most interesting discussions I have participated in as a teacher. Without the pressure of having to do anything but talk, the students nearly always develop an easy rapport with one another, which carries over into the parts of the class where they often have to risk seeming “stupid” in front of their peers. When they are comfortable enough to risk that, it allows me to assure them that they are where they need to be. This further allows me to help them to imagine themselves as becoming programmers at the end of the class—a motivational technique identified by Duane Shell as key to students’ sense of control and self-regulation within classroom learning environments.8

I am often asked what texts I use in these classes (aside from various technical manuals). I tend to draw on a large pool of articles and book chapters, customizing the reading list in accordance with students’ interests. I try to focus, however, on readings that raise issues for people building and designing software systems. I would say that in some cases, I have a bias toward older works, especially those in which a now familiar technology is being presented (or commented upon) for the first time. A typical sequence might start with Neal Stephenson’s In the Beginning Was the Command Line,9 an essay which, despite its focus on the Microsoft antitrust case over a decade ago, provides an interesting starting point for meditation on interfaces in general (particularly in relationship to identity). We might then move on to early definitions of human-computer interaction in the work of Vannevar Bush, J. C. R. Licklider and Douglas Engelbart. As our discussions of this subject grow more philosophical, I might turn them toward Walter Benjamin’s “The Work of Art in the Age of Mechanical Reproduction” (1936)10 and a few essays by Marshall McLuhan (usually “The Medium is the Message,” “Media Hot and Cold” and “The Gadget Lover: Narcissus as Narcosis”) from Understanding Media.11 Questions about the limits of computation invariably arise during our discussions, and so I nearly always add Alan Turing’s “Computing Machinery and Intelligence” (1950),12 and perhaps John R. Searle’s “Minds, Brains, and Programs” (1980).13 Discussions of gender, the body, and online identity might prompt examinations of essays by Sherry Turkle, Karen Franck, Mark Dery, N. Katherine Hayles and Donna Haraway. I have been known to assign Martin Heidegger’s “The Question Concerning Technology” (1954) and excerpts from Deleuze and Guattari’s A Thousand Plateaus to groups of more advanced students.14 I almost always draw essays from two edited collections specifically focused on digital humanities: A Companion to Digital Humanities (2004) and A Companion to Digital Literary Studies (2007).15 Despite this considerable range of subjects and themes, there is always a central question to which I try to return as often as possible: What might computation mean in the context of humanistic inquiry?

Though I retired the explicit two-course sequence a few years ago, I do offer a sequel to interested students from time to time. This course is invariably a workshop, in which students plan, design, develop, document and test a complete software system. These courses can suffer from some of the usual problems associated with group-based learning (uneven participation and hard-to-define grading criteria, for example), but at their best they work well at exposing students to the complexities of project management and application design. In our own program, it is more likely that exposure to such matters will now occur through focused internships and independent study projects. However, the work that has come out of these classes in recent years is of a very high standard. One group, for example, developed an application that allowed the user to submit a poem that the program then paired with an appropriate image drawn, using basic machine-learning techniques, from Flickr (http://www.flickr.com/). A more recent group wrote a program that generated graph visualizations of sentence complexity metrics over a 250-volume corpus of Victorian novels. It is in the context of such work that I am usually able to introduce matters such as relational database design, web services and web application frameworks (as well as software development tools like profilers, debuggers and revision control systems). In many cases, however, I encourage the students themselves to discover which technologies might be relevant to their project, and to assign team members the task of developing particular expertise on a given subject. Here again, the focus is on emulating the ways in which software projects are actually developed among researchers and project teams in digital humanities.
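
As a minimal stand-in for the kind of metric involved (this is my own sketch, not the students' code), mean words per sentence is the simplest measure of sentence complexity one could compute and plot across such a corpus:

    # A crude sentence-complexity measure: mean words per sentence.
    # A real project would use a proper sentence tokenizer and richer metrics.
    def average_sentence_length(text)
      sentences = text.split(/[.!?]+/).reject { |s| s.strip.empty? }
      return 0.0 if sentences.empty?
      word_counts = sentences.map { |s| s.split.size }
      word_counts.inject(0) { |sum, n| sum + n } / sentences.size.to_f
    end

    puts average_sentence_length("It was a dark and stormy night. The rain fell.")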

In recent years, there have been various attempts to redefine the knowledge gained from the study of computer science to something like “computational thinking”.16 This is an attractive term in many ways, since it expands the range of subjects through which that knowledge might be gained. It seems to me entirely possible to give students an experience of computational thinking using only a relational database system, a domain-specific language like Processing or XSLT, or even the UNIX shell. But I continue to think that what is gained when humanities students learn to think in the context of sophisticated computational tools is not only computational thinking, but also “humanistic thinking.” The center of digital humanities, after all, is not the technology, but the particular form of engagement that characterizes the act of building tools, models, frameworks and representations for the traditional objects of humanistic study. “The emerging, highly programmed landscape ahead,” so often the object of fear and anxiety, can become a new instrument for contemplation if we can help our students to learn to think in and through what Rushkoff rightly calls “the underlying capability of the computer era.”17

Footnotes

1 Douglas Rushkoff, Program or Be Programmed: Ten Commands for the Digital Age (New York: O/R Books, 2010), 7, 3.

2 Pete McBreen, Software Craftsmanship: The New Imperative (Boston: Addison-Wesley, 2002). McBreen traces the term to a NATO conference in 1968 and offers, as a typical example of the sort of thing being described, the SAFEGUARD Ballistic Missile Defense System developed between 1969 and 1975, which took “5,407 staff-years” to build (1–3).

3 Alan J. Perlis, “Foreword,” in Harold Abelson and Gerald Jay Sussman, Structure and Interpretation of Computer Programs, 2nd edn (Cambridge: MIT Press, 1996), xii.

4 Abelson and Sussman, Structure and Interpretation of Computer Programs, xvii.

5 See M. D. McIlroy, E. N. Pinson, and B. A. Tague, “Unix Time-Sharing System: Foreword,” The Bell System Technical Journal 57, no. 6 (1978): 1899–1904; Mike Gancarz, Linux and the Unix Philosophy (Amsterdam: Elsevier-Digital, 2003); and Eric S. Raymond, The Art of UNIX Programming (Boston: Addison-Wesley, 2003).

6 Quoted in Peter H. Salus, A Quarter Century of UNIX (Reading: Addison-Wesley, 1994), 52–53. McIlroy was the inventor of the UNIX pipeline; see Gancarz, Linux and the Unix Philosophy.

7 It is rigidly object-oriented in the sense that everything in the language—including the elements of the symbol table—is an object. It lacks some of the OO features found in other class-based languages like Java, C++, and Scala, but can be made to emulate the prototypal inheritance of languages like JavaScript and Lua.

8 I am paraphrasing extensive and detailed research undertaken by Duane F. Shell and Jenefer Husman, “Control, Motivation, Affect, and Strategic Self-Regulation in the College Classroom: A Multidimensional Phenomenon,” Journal of Educational Psychology 100, no. 2 (2008): 443–59. This work has been a part of the “Renaissance Computing” initiative at UNL (sponsored by the U.S. National Science Foundation), which tries to imagine versions of introductory computer science classes specifically designed for students in the humanities, music, art and the life sciences. On this initiative, see Leen-Kiat Soh, Ashok Samal, Stephen Scott, Stephen Ramsay, Etsuko Moriyama, George Meyer, Brian Moore, William G. Thomas, and Duane F. Shell, “Renaissance Computing: An Initiative for Promoting Student Participation in Computing,” in Proceedings of the 40th ACM Technical Symposium on Computer Science Education, SIGCSE 2009, Chattanooga, 4–7 March 2009, ed. Sue Fitzgerald, Mark Guzdial, Gary Lewandowski, and Steven A. Wolfman (New York: ACM, 2009), 59–63.

9 Neal Stephenson, In the Beginning Was the Command Line (New York: Avon, 1999).

10 Walter Benjamin, “The Work of Art in the Age of Mechanical Reproduction,” in Illuminations: Essays and Reflections, ed. Hannah Arendt, trans. Harry Zohn (New York: Schocken Books, 1969), 217–51.

11 Marshall McLuhan, Understanding Media: The Extensions of Man (London: Routledge & Kegan Paul, 1964).

12 Alan Turing, “Computing Machinery and Intelligence,” Mind 59, no. 236 (1950): 433–60.

13 John R. Searle, “Minds, Brains, and Programs,” Behavioral and Brain Sciences 3, no. 3 (1980): 417–57.

14 Martin Heidegger, “The Question Concerning Technology,” in The Question Concerning Technology and Other Essays, trans. William Lovitt (New York: Harper & Row, 1977), 3–35; Gilles Deleuze and Félix Guattari, A Thousand Plateaus: Capitalism and Schizophrenia, trans. Brian Massumi (Minneapolis: University of Minnesota Press, 1987).

15 Susan Schreibman, Ray Siemens, and John Unsworth, eds., A Companion to Digital Humanities (Malden: Blackwell, 2004); Ray Siemens and Susan Schreibman, eds., A Companion to Digital Literary Studies (Malden: Blackwell, 2007).

16 See Peter J. Denning, “Beyond Computational Thinking,” Communications of the ACM 52, no. 6 (2009): 28–30; Mark Guzdial, “Paving the Way for Computational Thinking,” Communications of the ACM 51, no. 8 (2008): 25–27; Committee for the Workshops on Computational Thinking, Report of a Workshop on The Scope and Nature of Computational Thinking (Washington: National Academies Press, 2010).

17 Rushkoff, Program or Be Programmed, 13.