TAPoR on TAPoR.

Ray Siemens of Victoria hosts a session of three papers related to the Text Analysis Portal for Research. First we have Geoffrey Rockwell, with “Text empires: text analysis in excess.” Shawn Day will talk about “The use of the recipe as a guiding metaphor for flexible and efficient self-guided computing instruction.” Finally, Stéfan Sinclair will talk “On data & views in text analysis.” All three presenters are from McMaster University in Hamilton, near Toronto.

ROCKWELL

Information overload: 5 exabytes of information created in 2002. Exabyte = 1,000,000,000,000,000,000 Bytes. It’s a thousand petabytes, or a million terabytes. [Holy wow.] Spam is cheap, but reading has costs. How can text analysis help?

Why this explosion of information?

- growth in population and wealth: more money, more media toys

- multiple-media, from the photograph (1820s) to the iPod

- digitization of information and business practices: cheap creation, storage, reproduction, and transmission

Challenges to the system: what are the effects?

- experience of information overload

- multimedia shock

- narrowing expertise (because nobody can keep up with a broad discipline!)

- archive fever

What can we do?

- understand the problem (literary dimension to it; a problem of scale, a bibliographic problem)

- produce less? [shock! I can hear the internal gasps around the room!]

- file and (not) store smarter

- find smarter (not more) [ooh, I'll quote him in my dissertation work! no, I cannot, in fact, process all the litcrit written to this day]

- learn to read differently

The latter two of the above are opportunities for text analysis.

Problem of scale to text analysis for finding and reading:

- heterogeneous formats and multimedia rich

- closed (“for perfectly reasonable reasons” -GR) information empires (Google) build on existing indexes or build their own

- new questions, research methods (data mining and visualization)

- text analysis tools developed for coherent texts (collaborate with data mining & HPC [high-performance computing] community)

TAPoR.2 model, Beyond Finding and Reading:

- gathering and aggregation function (working with existing empires like Google; create your own study library (myEmpire))

- mining function (clustering and classification; provoking questions, not finding)

- interface and visualization function (effective interactions for research)

DAY

They’re using the recipe metaphor to get people of different backgrounds to use TAPoR.

A recipe for self-guided instruction:

- ingredients

- steps

- glossary

- discussion

- further information

Ingredients:

- ingenuity

- a useful metaphor

- a versatile set of tools

- users desirous or willing to consider using said tools

Steps

- identify objective

- consider users’ needs

- develop case studies that describe how your tools can meet these needs

- apply a familiar metaphorical approach to engage and instruct

- deploy recipes through a wiki

Glossary

- recipe: a useful guiding metaphor that offers optimal flexibility…. [couldn't get it, too fast]

Further Information:

Try the recipes out! (For example.)

Nice, familiar, easy concept. As Shawn is pointing out right now, it makes it super easy to engage a beginner user. This could be very useful, as well, for encouraging folks who are used to traditional humanities research methods to try, say, text encoding.

SINCLAIR

[Stéfan is the creator of HyperPo, the coolest text analysis tool ever so far.]

Generally, there’s a one-to-one mapping between tools and the data views of their results. SS has been thinking more in terms of this progression:

text -> tool -> data (TAML) -> style -> view

Among other things, he wanted to create a framework to use in teaching the development of text analysis tools in a modular way.
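
To make the progression concrete, here is a minimal Python sketch of keeping data and view separate. Everything in it is an assumption for illustration: the function names are hypothetical, and a plain list of dictionaries stands in for the intermediate data, since the actual TAML format was not shown in the talk. This is not how HyperPo or TAPoR implement it.

```python
import re
from collections import Counter


def word_frequency_tool(text):
    """Tool step: turn raw text into plain data (a list of records).

    The list of dicts is a stand-in for the intermediate data format;
    it is NOT real TAML.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Sort by descending count, then alphabetically, so output is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"word": w, "count": c} for w, c in ordered]


def table_style(data):
    """Style step: render the data as tab-separated rows."""
    return "\n".join(f"{row['word']}\t{row['count']}" for row in data)


def bar_style(data):
    """A second style for the same data: a crude text bar chart."""
    return "\n".join(f"{row['word']:<8}{'#' * row['count']}" for row in data)


text = "the cat and the hat and the bat"
data = word_frequency_tool(text)   # text -> tool -> data
print(table_style(data))           # data -> style -> view (table)
print(bar_style(data))             # same data, different view
```

The point of the design is that the tool runs once and any number of styles can be applied to its output, which breaks the one-to-one mapping between a tool and its view.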

It’d also be nice to be able to chain tools together: you run a tool on a text, get the resultant data and feed it to another tool, and so on. This requires tools that can “talk” to each other and output data in the same (or similar enough, or easily translatable) formats.
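
A rough Python sketch of what chaining could look like, assuming every tool accepts and returns the same kind of dictionary. The tool names, the `chain` helper, and the dictionary layout are all hypothetical; they illustrate the idea of a shared format and are not TAPoR's or HyperPo's API.

```python
import re
from collections import Counter
from functools import reduce


def tokenize(data):
    """Add a lowercase token list to the shared data dictionary."""
    return {**data, "tokens": re.findall(r"[a-z]+", data["text"].lower())}


def remove_stopwords(data, stopwords=frozenset({"the", "and", "a", "of"})):
    """Filter tokens produced by an earlier tool in the chain."""
    return {**data, "tokens": [t for t in data["tokens"] if t not in stopwords]}


def count_tokens(data):
    """Add frequency counts for whatever tokens are left."""
    return {**data, "counts": dict(Counter(data["tokens"]))}


def chain(*tools):
    """Compose tools left to right; each output feeds the next tool."""
    def run(data):
        return reduce(lambda acc, tool: tool(acc), tools, data)
    return run


pipeline = chain(tokenize, remove_stopwords, count_tokens)
result = pipeline({"text": "The cat and the hat and the cat"})
print(result["counts"])
```

Because every tool reads and writes the same structure, tools can be reordered or swapped without the others needing to know, which is the "talking to each other" requirement in miniature.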

HyperPo 7.0 is coming soon!
