MLA ‘07: Thursday
Highlights from Thursday:
The first session I went to was “The Challenge of a Million Books.” The title refers to the computational mining of enormous collections of text, in an attempt to discover bird’s-eye-view patterns we’d have trouble seeing with the naked eye. I learned at this session that text mining is also called knowledge discovery. The latter term is a bit too generic, I think: my encoded Roland excerpts also permit, even encourage, knowledge discovery, but what I’ve done with manual encoding and a simple interface is a far cry from sophisticated algorithms and machine learning.
Sara Steger presented her research on sentimentality in nineteenth-century literature. This doctoral dissertation work is one of the test cases for the MONK project (Metadata Offer New Knowledge), one of the coolest collaborative endeavors currently out there. In the words of the project creators themselves, MONK “is a digital environment designed to help humanities scholars discover and analyze patterns in the texts they study.” Sara took a bunch of mid-nineteenth-century English texts and labeled some chapters as sentimental (she brought up Little Nell’s death scene from Dickens’ The Old Curiosity Shop as an example) and others as unsentimental. She used these as the training set for the MONK algorithm, “asking” it to figure out more or less on its own what makes a text sentimental or unsentimental, and then had it classify new chapters automatically. Statistical analysis then revealed some interesting things: some words are clearly associated with sentimentality (words having to do with women, children, death, or love), while others point the opposite way (including titles such as Mr./Mrs., and business- and law-related vocabulary). Sara’s theory is that sentimentality is therefore not just there when we “feel it”: it’s at least in part a formula, used by nineteenth-century writers to political ends. Her research is still in progress, but it is already producing quite cool results.
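For readers curious about the mechanics, here is a rough sketch of the general train-then-classify workflow Sara described. I don’t know exactly which algorithm MONK used, so this swaps in a generic multinomial naive Bayes classifier as a stand-in. The toy “chapters” are invented one-liners, not her corpus, and the names (`tokenize`, `NaiveBayes`, `most_indicative`) are hypothetical. The last method mimics the “which words lean sentimental?” analysis by ranking words on how much more likely they are under one label than the other.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'"

def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

class NaiveBayes:
    """Stand-in classifier (not MONK's): multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}         # label -> Counter of word frequencies
        self.doc_counts = Counter()   # label -> number of training chapters
        self.vocab = set()

    def train(self, labeled_docs):
        for text, label in labeled_docs:
            words = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(words)
            self.doc_counts[label] += 1
            self.vocab.update(words)

    def _log_prob_word(self, word, label):
        # Smoothed log P(word | label); unseen words get a small nonzero probability.
        counts = self.word_counts[label]
        total = sum(counts.values())
        return math.log((counts[word] + 1) / (total + len(self.vocab)))

    def classify(self, text):
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for label in self.doc_counts:
            score = math.log(self.doc_counts[label] / n_docs)  # prior
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += self._log_prob_word(word, label)
            scores[label] = score
        return max(scores, key=scores.get)

    def most_indicative(self, label, other, n=5):
        """Words whose smoothed probability most favors `label` over `other`."""
        ratio = {w: self._log_prob_word(w, label) - self._log_prob_word(w, other)
                 for w in self.vocab}
        return sorted(ratio, key=ratio.get, reverse=True)[:n]

# Invented toy training set (hypothetical, for illustration only).
training = [
    ("the dying child whispered her love to her mother", "sentimental"),
    ("little nell lay in death and the tears of love fell", "sentimental"),
    ("Mr. Smith signed the contract at the bank", "unsentimental"),
    ("the lawyer told Mrs. Jones the business was settled", "unsentimental"),
]

model = NaiveBayes()
model.train(training)
print(model.classify("her mother wept with love over the child"))      # sentimental
print(model.classify("the lawyer and the bank settled the business"))  # unsentimental
```

With real data the training set would be hundreds of chapters rather than four sentences, but the shape is the same: label examples by hand, let the model learn word statistics, then read the word rankings back as evidence about what the labels have in common.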
The other cool project I learned about at the session is SEASR (pronounced Caesar), the Software Environment for the Advancement of Scholarly Research. This project works in tandem with MONK, and seems to aim at “construct[ing] data services that access and normalize unstructured information.” It looks as though the final product will be available not only to large projects but to individual scholars as well, which is exciting.
Later that evening John Unsworth spoke on “Cyberinfrastructure and Open Standards, Methods, and Communities.” As usual with Unsworth’s dense, whirlwind talks, I quickly gave up on taking notes. Luckily, the entire talk is online, albeit a bit difficult to read without margins. But copy and paste it, print it out even, and read it: this powerhouse of digital humanities always impresses with his ability to synthesize large, important topics in an accessible way.