SEASR for Digital Humanities

This week I’m at my second Digital Humanities Summer Institute at the University of Victoria. Last year, I took the large project management course, and it was tremendously useful in managing several projects, including ESTHR. This year I decided to try my hand at SEASR (pron. “Caesar”), or Software Environment for the Advancement of Scholarly Research.

The complex toolkit has great potential. The course has been frustrating, in part (though not wholly) because SEASR’s documentation is not at all geared toward your average digital humanist, or what I know of my diverse kind, anyway. I thought that the best thing I could do with my class time today was to write some documentation. Here it is. It’s in no way complete; just the beginning of an overview of SEASR for digital humanists. Please feel free to repost, augment, comment here with your augmentations and have me edit this post to reflect them, what have you.

At the end of this post, I propose the beginning of a list of categories into which all components and flows might be subdivided, each component/flow probably listed in more than one category. This would help humanities scholars with no prior experience with SEASR, or even some of the functionality it affords, get oriented in using it.

I also propose that we need a lot more detailed information for each component and flow. The SEASR team has already begun this process, but given the project’s maturity and the fact that it’s in its third year of being taught at DHSI, such (again, humanities-scholar-oriented) documentation is sorely lacking.

I should say that, unless ESTHR or another one of my projects decides to pursue use and development of SEASR, I am unlikely to add to its documentation after the end of this week. I ardently encourage SEASR’s developers and managers to devote significant resources to documenting this great project, so that it may be usable by the wide diversity of researchers who stand to benefit from it.

OK, here we go with the overview.

WHAT IS SEASR?

It’s a modular multi-tool for working with word-based texts. You can find the project site at seasr.org. From there, you can try it out (see the Documentation section). Or you can download the relevant software (which includes Meandre, a web-based interface for creating and working with SEASR data flows) and install it on your computer or a server.

WHAT IS SEASR MADE OF?

Meandre’s web interface—the Workbench—reveals that SEASR is made up of components and flows. Flows are little applications that can help you ask questions of your materials.

Components are the building blocks of flows, snippets of code that you can string together. At the time of this writing, there are 180 components available. Ideally, each component does a single thing: fetch a document from a URL you specify, for example, or convert all text to lowercase. The more discrete its function, the more broadly a component is applicable.
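
To make the component/flow relationship concrete, here is a minimal sketch in plain Java. It does not use Meandre’s actual component API; the names FlowSketch, TO_LOWERCASE, NORMALIZE_WHITESPACE and runFlow are all made up for illustration. It only shows the idea of single-purpose pieces whose outputs feed the next piece’s input.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Illustrative only: plain Java functions standing in for SEASR components.
// None of these names come from Meandre's real API.
class FlowSketch {

    // A "component" that does one thing: convert all text to lowercase.
    static final Function<String, String> TO_LOWERCASE =
            text -> text.toLowerCase(Locale.ROOT);

    // Another single-purpose "component": trim and collapse runs of whitespace.
    static final Function<String, String> NORMALIZE_WHITESPACE =
            text -> text.trim().replaceAll("\\s+", " ");

    // A "flow" strings components together: each output becomes the next input.
    static String runFlow(List<Function<String, String>> components, String input) {
        String current = input;
        for (Function<String, String> component : components) {
            current = component.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        String result = runFlow(
                List.of(TO_LOWERCASE, NORMALIZE_WHITESPACE),
                "  Call me   Ishmael. ");
        System.out.println(result); // prints "call me ishmael."
    }
}
```

Because each piece does only one thing, the same lowercase step could be reused in any other flow that needs it, which is the point about discrete functions made above.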

You can use existing flows with your data, or build your own out of existing components. Thanks to the graphical user interface of the Meandre Workbench, building new flows requires only dragging and dropping existing components, connecting outputs to inputs by clicking, and a basic understanding of other information you might need to provide (like a URL from which to fetch text, for example).

You can also write new components. For someone who knows Java, this is generally not difficult (although, of course, it depends on what you want the component to do).

WHAT KINDS OF QUESTIONS CAN I ASK USING SEASR? HOW CAN I DO THIS?

Theoretically, any questions. In practice, however, many of its components were created in order to answer specific questions in the moment. Because of this, the set of functions that can be performed in SEASR is not a cohesive whole. As SEASR’s developers and the user community contribute more components, the tool will become more robust.

In order to ask a question using SEASR, you have to:

  • select a flow that you think might bring you closer to the answer you seek;
  • use data in a format that the flow needs in order to operate;
  • give the flow any other operands it might need (for example, a Google Maps API key for a flow that’s supposed to map data for you)
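
The last step above is easy to overlook, so here is a small sketch of the kind of check you might do by hand before running a flow. It is not part of SEASR or Meandre; the class FlowInputCheck and the operand names text_url and google_maps_api_key are hypothetical, chosen only to mirror the Google Maps example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight check; not part of SEASR or Meandre.
class FlowInputCheck {

    // Returns the names of required operands that are missing or blank.
    static List<String> missingOperands(List<String> required,
                                        Map<String, String> supplied) {
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            String value = supplied.get(name);
            if (value == null || value.trim().isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // What an imagined mapping flow needs (assumed names).
        List<String> required = List.of("text_url", "google_maps_api_key");
        // What we have actually provided so far.
        Map<String, String> supplied =
                Map.of("text_url", "http://example.org/letters.txt");

        System.out.println(missingOperands(required, supplied));
        // prints "[google_maps_api_key]"
    }
}
```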

WHAT KIND OF DATA AND FILE FORMATS CAN SEASR HANDLE?

That depends on what flow(s) you’re using, but right now, for the most part, text-based data. File formats SEASR can handle include [but aren’t limited to; developers, what am I missing?] direct plain text input, PDF (as long as it’s text-based or OCR’d) and anything text-based that a URL might contain.

WHAT KNOWLEDGE DO I NEED TO USE SEASR EFFECTIVELY?

If you intend to build your own workflows (and it’s likely to be necessary for doing deep research), it would be very useful for you or your collaborators to know Java. SEASR’s components are, for the most part, written in it. It’s certainly possible to use SEASR without knowing Java, but that will only get you so far. This is because, although many of the components are sufficiently well documented, in some cases it helps to look at the code to determine whether the component, in fact, does what you need it to do. If it doesn’t, you might be in for creating a new component.

HOW DO I GET STARTED?

The easiest way to get started is to use some of the flows available on seasr.org (see the Documentation section). Then, install your own copy of SEASR/Meandre and look at some of the flows. The manual for Meandre is available at . After installing Meandre, you can expect to gain facility with the basic processes of setting up workflows within a few hours. Understanding the intricacies of flows and components will, of course, take longer.

Other useful URLs:

PROPOSED COMPONENT/FLOW CATEGORIES

Only a starting list. What other categories do we need?

  • Working with XML-encoded texts
  • Working with XML code
  • Working with non-encoded texts
  • Natural language processing
  • Data visualization, graphical
  • Working with time
  • Working with proper nouns
  • Social networks

INFORMATION STILL NEEDED

For each component:

  • categories for DH scholars
  • dependencies (what other components must this one be used with)
  • examples/use cases

For each flow:

  • inputs needed (for example, a Google Maps API key)
  • external tools/specialized concepts referenced
  • categories for DH scholars
  • examples/use cases
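
As a rough illustration of what such documentation could look like in structured form, here is a sketch of a catalog entry for a component, with a lookup by category. Everything in it is an assumption: the classes ComponentDoc and ComponentCatalogSketch, the field names, and the three sample components are invented for this example and do not describe real SEASR components.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical catalog entry; field names are illustrative, not SEASR's.
class ComponentDoc {
    final String name;
    final List<String> categories;   // a component may sit in several categories
    final List<String> dependencies; // other components it must be used with
    final List<String> useCases;     // examples written for DH scholars

    ComponentDoc(String name, List<String> categories,
                 List<String> dependencies, List<String> useCases) {
        this.name = name;
        this.categories = List.copyOf(categories);
        this.dependencies = List.copyOf(dependencies);
        this.useCases = List.copyOf(useCases);
    }
}

class ComponentCatalogSketch {

    // Names of all documented components filed under the given category.
    static List<String> namesInCategory(List<ComponentDoc> docs, String category) {
        List<String> names = new ArrayList<>();
        for (ComponentDoc doc : docs) {
            if (doc.categories.contains(category)) {
                names.add(doc.name);
            }
        }
        return names;
    }

    // Made-up sample entries, using categories from the proposed list above.
    static List<ComponentDoc> sample() {
        return List.of(
            new ComponentDoc("Fetch Text",
                List.of("Working with non-encoded texts"),
                List.of(),
                List.of("Load a plain-text novel from a URL")),
            new ComponentDoc("Extract Dates",
                List.of("Working with time", "Natural language processing"),
                List.of("Fetch Text"),
                List.of("List every date mentioned in a set of letters")),
            new ComponentDoc("Extract Names",
                List.of("Working with proper nouns", "Natural language processing"),
                List.of("Fetch Text"),
                List.of("Find the people named in a diary")));
    }

    public static void main(String[] args) {
        System.out.println(namesInCategory(sample(), "Natural language processing"));
        // prints "[Extract Dates, Extract Names]"
    }
}
```

Listing a component under more than one category, as the two sample extractors are, is what would let a newcomer find it from whichever research question they start with.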

 
