Before you can start annotating, we need to set up a module of the WebTool for your language. To do that, please provide us with the following:

  1. A UTF-8 encoded version of the transcription of the TED Talk "Do schools kill creativity?"
    The text must contain exactly one sentence per line.
    Also, please remove all time stamps and comments such as (laughs).
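
The cleanup described in item 1 could be scripted roughly as follows. This is only a sketch: the time stamp format (12:34 or 01:02:03), the round-bracket comment format, and the helper names clean_transcript and write_sentences are assumptions made for illustration, not part of the WebTool.

```python
import re

# ASSUMPTION: time stamps look like "12:34" or "01:02:03".
TIMESTAMP = re.compile(r"\b\d{1,2}:\d{2}(?::\d{2})?\b")
# ASSUMPTION: comments are wrapped in round brackets, e.g. "(Laughter)".
COMMENT = re.compile(r"\([^()]*\)")
# Naive sentence boundary: '.', '!' or '?' followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def clean_transcript(raw):
    """Strip time stamps and comments, then return a list of sentences."""
    text = TIMESTAMP.sub(" ", raw)
    text = COMMENT.sub(" ", text)
    text = " ".join(text.split())  # collapse runs of whitespace
    return [s for s in SENTENCE_END.split(text) if s]


def write_sentences(sentences, path):
    """Write one sentence per line, encoded as UTF-8."""
    with open(path, "w", encoding="utf-8") as out:
        out.write("\n".join(sentences) + "\n")
```

The boundary rule is deliberately simple and will split after abbreviations such as "Mr.", so the output should still be checked by hand.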

  2. A UTF-8 encoded txt file listing all the word forms of the lexemes in the TED Talk transcription in your language. The file should be formatted as shown in the WF_POS_LEX.txt file attached to this tutorial. Also, since we are going to use the POS tags used in FrameNet, every team must use this same tagset, which is listed below:

    'A' = Adjective
    'ADV' = Adverb
    'ART' = Article
    'AVP' = Adverbial Preposition
    'C' = Conjunction
    'INTJ' = Interjection
    'N' = Noun
    'NUM' = Number
    'PREP' = Preposition
    'PRON' = Pronoun
    'V' = Verb
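
Before sending the word form file, a quick check against the tagset above may catch stray tags. The sketch below assumes that each line holds three tab-separated fields in the order word form, POS tag, lexeme. That layout is a guess based on the file name WF_POS_LEX.txt, so adjust it to match the attached model. The function names are hypothetical.

```python
# Tagset copied from the list above.
FRAMENET_POS = {"A", "ADV", "ART", "AVP", "C", "INTJ", "N", "NUM", "PREP", "PRON", "V"}


def check_lines(lines):
    """Return (line_number, message) pairs for every problem found.

    ASSUMPTION: three tab-separated fields per line, in the order
    word form, POS tag, lexeme.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            problems.append((number, f"expected 3 fields, found {len(fields)}"))
            continue
        pos = fields[1].strip()
        if pos not in FRAMENET_POS:
            problems.append((number, f"unknown POS tag {pos!r}"))
    return problems


def check_file(path):
    # Opening with encoding="utf-8" raises UnicodeDecodeError while reading
    # if the file is not valid UTF-8.
    with open(path, encoding="utf-8") as handle:
        return check_lines(handle)
```

An empty result means every non-blank line has three fields and a tag from the list.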

  3. A CSV file (see model attached) listing the labels each team would like to use in annotation. For the experiment, each team may use up to 6 layers of annotation, which are:

    Frame Element: this layer will be fed with FN data release 1.7, and teams won't be able to change it.

    Grammatical Functions: we'll need a list of the relevant grammatical functions teams would like to use. Teams must also indicate for which kinds of targets each GF label should be available. For example, if a language features a Direct Object that can only be taken by verbs, this restriction must be stated. Also, if the GFs used are not chosen from UD tags, it is important that teams point to the best-fit UD equivalent for each tag they decide to use.

    Phrase Types: we'll need a list of the relevant phrase types teams would like to use. Once again, if the PTs used are not chosen from UD tags, it is important that teams point to the best-fit UD equivalent for each tag they decide to use.

    POS-specific: FrameNet annotation features a layer that is coindexed with the POS of the target LU. In this layer, we annotate things such as support verbs, copulas, aspectual particles and so on. We'll need a list of the relevant labels to be made available for: Verbs, Nouns, Adjectives, Adverbs and Prepositions.

    Other: in this layer, we usually annotate relative pronouns and their antecedents. We need to know what else teams would like to have here.

    Sentence: this last layer features tags applicable to the whole sentence, and may include notes such as the existence of a metaphor, or how prototypical the sentence is.

    In the layer_label_pair.csv file attached, we've listed the English labels and layers, so that teams can use it as a reference.
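
To review a label list before submitting it, the pairs can be grouped by layer as sketched below. The column layout (layer, then label, no header row) and the exact spelling of the layer names are assumptions, so compare them with the attached layer_label_pair.csv first. The function name group_labels and the sample labels are illustrative only.

```python
import csv
import io
from collections import defaultdict

# Layer names as written in this tutorial.
# ASSUMPTION: the attached CSV may spell them differently.
LAYERS = ["Frame Element", "Grammatical Functions", "Phrase Types",
          "POS-specific", "Other", "Sentence"]


def group_labels(csv_text):
    """Group labels by layer and collect unrecognised layer names.

    ASSUMPTION: two columns per row (layer, label) and no header row.
    """
    by_layer = defaultdict(list)
    unknown = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or incomplete rows
        layer, label = row[0].strip(), row[1].strip()
        if layer in LAYERS:
            by_layer[layer].append(label)
        else:
            unknown.append(layer)
    return dict(by_layer), unknown


# Hypothetical sample rows, for illustration only.
sample = "Grammatical Functions,Direct Object\nSentence,Metaphor\nSyntax,Head\n"
labels, unknown = group_labels(sample)
print(labels)
print(unknown)
```

Any name reported in the second value does not match one of the six layers listed above and should be corrected or dropped.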

After that, your team will gain access to the tool.