“Sum” THATCamp possibilities? « THATCamp London 2010


As a participant in the upcoming THATCamp, I was asked to outline a session I’d like to have. Hmm… Well, I think I can brainstorm a few possibilities:

Exploiting R – R is an increasingly popular open source statistical tool and programming language. I’d like to get together with others to discuss how it can be used in the digital humanities.
Graphing texts – There are many ways texts can be “measured”. I can count the number of words, the parts of speech, and the reading level. I’ve begun to measure the “greatness” of a book as described in a number of blog postings. Once these sorts of things are measured, I’d like to discuss with people ways these measurements can be illustrated through charts and graphs. A picture is worth a thousand words.
Integrating digital humanities with libraries – As a librarian, one of my ultimate goals is to figure out ways digital humanities computing techniques can be seamlessly integrated into library collections and services. Instead of a library “catalog” simply pointing a person to a text, I’d like it to offer services allowing the user to… use the text. Maybe we can create a prototype of such a thing.
Reducing ambiguity – In one of my “experiments” I wanted to assess a set of works’ use of the word “being”, as in the thing, but the analysis returned too many false positives because the word was being used as a verb and not a noun. Such a problem is not uncommon, and I’m wondering how it can be resolved.
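The measuring step in “Graphing texts” can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the syllable counter is a crude vowel-group heuristic rather than a dictionary lookup, and the reading level is the Flesch Reading Ease formula (206.835 − 1.015 × words per sentence − 84.6 × syllables per word). A plain-text bar chart stands in for a real plotting library.

```python
import re


def count_syllables(word: str) -> int:
    """Roughly estimate syllables by counting runs of vowels (a heuristic, not a dictionary lookup)."""
    count = len(re.findall(r"[aeiouy]+", word.lower()))
    # A trailing silent "e" (as in "make") usually adds no syllable.
    if word.lower().endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def measure_text(text: str) -> dict:
    """Return word count, sentence count, and Flesch Reading Ease for a text."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words:
        return {"words": 0, "sentences": len(sentences), "reading_ease": None}
    n_words = len(words)
    n_sentences = max(len(sentences), 1)
    syllables = sum(count_syllables(w) for w in words)
    # Flesch Reading Ease: higher scores indicate easier reading.
    ease = 206.835 - 1.015 * (n_words / n_sentences) - 84.6 * (syllables / n_words)
    return {"words": n_words, "sentences": n_sentences, "reading_ease": round(ease, 1)}


def text_bar_chart(values: dict, width: int = 40) -> str:
    """Render a crude horizontal bar chart, one line per label, scaled to the largest value."""
    if not values:
        return ""
    largest = max(values.values())
    lines = []
    for label, value in values.items():
        bar = "#" * round(width * value / largest) if largest else ""
        lines.append(f"{label:<12} {bar} {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample passages; substitute real texts from a corpus.
    texts = {
        "short": "The cat sat. The dog ran!",
        "longer": "Reading level measures depend on sentence length and word length.",
    }
    word_counts = {name: measure_text(t)["words"] for name, t in texts.items()}
    print(text_bar_chart(word_counts))
```

A real project would probably swap the text chart for a plotting library (or for R, as in the first idea) and the syllable heuristic for a pronunciation dictionary; the structure of "measure, then chart" stays the same.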

Just some ideas, and please be gentle with me. I’m a noob.

—
Eric Lease Morgan

Tags: DH2010, digital humanities

This entry was posted on Thursday, June 17th, 2010 at 12:06 pm and is filed under Blog.

One Response to ““Sum” THATCamp possibilities?”

aelang says:

June 18, 2010 at 9:54 am

Hello Eric

On your ‘reducing ambiguity’ point: this is fixed fairly easily. You need to run your corpus through a POS (part of speech) tagger which will automatically tag all words with the part of speech they belong to (or that the computer thinks they belong to). Tag sets for parts of speech differ, but the one used by the British National Corpus would use NN1 to mark ‘being’ as a noun, and VBG to mark it as a present participle verb.

The British National Corpus is tagged using CLAWS (ucrel.lancs.ac.uk/claws/), which apparently has 96-97% accuracy, but there are also some open-source POS taggers out there. I have never used any of them, though, so I’m afraid I can’t help with recommending any specific ones.

Anouk
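Once a corpus has been tagged, separating the noun “being” from the verb is a matter of filtering on the tag. A minimal sketch, assuming the tagger writes each token as word_TAG (a common layout; check your tagger's actual output format) and using the BNC tags NN1 and VBG mentioned in the comment above. The function names are hypothetical, and the sample sentence is hand-tagged for illustration, not real tagger output.

```python
from collections import Counter


def tag_counts(tagged_text: str, target: str) -> Counter:
    """Count which tags a target word receives in text written as word_TAG tokens."""
    counts = Counter()
    for token in tagged_text.split():
        # rpartition splits on the last underscore, so words containing "_" survive.
        word, sep, tag = token.rpartition("_")
        if sep and word.lower() == target.lower():
            counts[tag] += 1
    return counts


def noun_uses(tagged_text: str, target: str, noun_tags=("NN1",)) -> int:
    """Count occurrences of target tagged as a noun; NN1 is the BNC singular-noun tag."""
    counts = tag_counts(tagged_text, target)
    return sum(counts[tag] for tag in noun_tags)


if __name__ == "__main__":
    # Hand-tagged illustration in the word_TAG layout, not actual tagger output.
    sample = "her_DPS whole_AJ0 being_NN1 was_VBD being_VBG tested_VVN ._PUN"
    print(tag_counts(sample, "being"))
    print(noun_uses(sample, "being"))
```

Counting only the NN1-tagged tokens should remove the verb-form false positives, subject to the tagger's own error rate. Taggers built on the Penn Treebank tagset (NLTK's default `pos_tag`, for instance) mark singular nouns as NN instead, so `noun_tags` would need to change accordingly.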
