Tagset

From Icelandic Parsed Historical Corpus (IcePaHC)
Revision as of 15:10, 2 February 2010 by Anton (Verbs)


This is the head-level tagset used in the Icelandic Treebank. It is based on the IFD tagset. Each head is assigned a tag (N-NSDIC in the example below):

<synttree>[NP[N-NSDIC[barni]]]</synttree>

In the example, barni is the dative form of the Icelandic word for 'child'. The first part of the tag, before the dash, always gives the head type (roughly the word class), e.g. N for noun. The extension of the tag above, NSDIC, means: neuter, singular, dative, indefinite (no suffixed article), common noun. The predashial head type can be one or more characters long, while the postdashial subfeatures always take exactly one character each.
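The split described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_tag` is hypothetical and not part of any IcePaHC tooling, and the sketch assumes the tag contains at most one dash.

```python
def split_tag(tag):
    """Split a tag such as 'N-NSDIC' into its head type and subfeatures.

    The head type is everything before the dash; each character after
    the dash is treated as one single-character subfeature.
    """
    head, _, features = tag.partition("-")
    return head, list(features)


head, features = split_tag("N-NSDIC")
print(head)      # N
print(features)  # ['N', 'S', 'D', 'I', 'C']
```

A tag without a dash, such as CONJ, comes back with an empty feature list.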

Tags with postdashial subfeatures

Nouns

# Category/Feature Symbol – semantics
1 Word class N–noun, NPR–proper noun, NS–plural noun, NPRS–plural proper noun
2 Gender M–masculine, F–feminine, N–neuter, X–unspecified
3 Number S–singular, P–plural
4 Case N–nominative, A–accusative, D–dative, G–genitive
5 Article I–without suffixed article (indefinite), D–with suffixed definite article (definite)
6 Proper/Common C–common noun, P–person name, L–place name, O–other proper name
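As an illustration of how the noun table can be read position by position, the following Python sketch decodes a noun tag into feature names. The lookup tables are transcribed from the table above; the names `NOUN_HEADS`, `NOUN_FEATURES` and `decode_noun_tag` are hypothetical, and the strict length check is an assumption of this sketch rather than a documented rule of the treebank.

```python
# Head types (position 1 of the noun table).
NOUN_HEADS = {
    "N": "noun",
    "NPR": "proper noun",
    "NS": "noun, plural",
    "NPRS": "proper noun, plural",
}

# Subfeatures in tag order (positions 2-6 of the noun table).
NOUN_FEATURES = [
    ("gender", {"M": "masculine", "F": "feminine", "N": "neuter", "X": "unspecified"}),
    ("number", {"S": "singular", "P": "plural"}),
    ("case", {"N": "nominative", "A": "accusative", "D": "dative", "G": "genitive"}),
    ("article", {"I": "indefinite", "D": "definite"}),
    ("proper/common", {"C": "common noun", "P": "person name",
                       "L": "place name", "O": "other proper name"}),
]


def decode_noun_tag(tag):
    """Return a dict describing a noun tag such as 'N-NSDIC'."""
    head, _, features = tag.partition("-")
    if head not in NOUN_HEADS:
        raise ValueError(f"not a noun head type: {head!r}")
    if len(features) != len(NOUN_FEATURES):
        raise ValueError(f"expected {len(NOUN_FEATURES)} subfeatures, got {len(features)}")
    decoded = {"word class": NOUN_HEADS[head]}
    for symbol, (name, symbols) in zip(features, NOUN_FEATURES):
        if symbol not in symbols:
            raise ValueError(f"unknown {name} symbol: {symbol!r}")
        decoded[name] = symbols[symbol]
    return decoded


print(decode_noun_tag("N-NSDIC")["case"])  # dative
```

The same symbol means different things in different positions (N is neuter in the gender slot but nominative in the case slot), which is why the decoder walks the features in order instead of using one flat lookup.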

Adjectives

# Category/Feature Symbol – semantics
1 Word class ADJ–adjective, ADJR–comparative adjective, ADJS–superlative adjective
2 Gender M–masculine, F–feminine, N–neuter, X–unspecified
3 Number S–singular, P–plural
4 Case N–nominative, A–accusative, D–dative, G–genitive
5 Declension S–strong declension, W–weak declension, X–indeclinable
6 Degree P–positive, C–comparative, S–superlative

Pronouns

# Category/Feature Symbol – semantics
1 Word class PRO–pronoun
2 Subcategory D–demonstrative, B–indefinite demonstrative (Icel. 'óákveðið ábendingarfornafn'), Q–possessive, X–indefinite (Icel. 'óákveðið'), P–personal, W–interrogative, R–relative
3 Gender/Person M–masculine, F–feminine, N–neuter; or 1–1st person, 2–2nd person
4 Number S–singular, P–plural
5 Case N–nominative, A–accusative, D–dative, G–genitive

Article (determiner)

# Category/Feature Symbol – semantics
1 Word class D–article (determiner)
2 Gender M–masculine, F–feminine, N–neuter
3 Number S–singular, P–plural
4 Case N–nominative, A–accusative, D–dative, G–genitive

Numbers

# Category/Feature Symbol – semantics
1 Word class NUM–numeral
2 Category P–cardinal number (i.e. not an ordinal?), F–percentage (fraction), O–other
3 Gender M–masculine, F–feminine, N–neuter
4 Number S–singular, P–plural
5 Case N–nominative, A–accusative, D–dative, G–genitive

Verbs

(see table below for present/past participle)

# Category/Feature Symbol – semantics
1 Word class VBP–present tense verb, VBD–past tense verb, VB–infinitive, VBI–imperative, VBN–perfect participle
2 Mood T–infinitive, M–imperative, I–indicative, S–subjunctive, U–supine, P–present participle, D–past participle
3 Voice A–active, M–middle
4 Person 1–1st person, 2–2nd person, 3–3rd person
5 Number S–singular, P–plural
6 Tense P–present, D–past


# Category/Feature Symbol – semantics
1 Word class VBD–past participle, VAN–present participle
2 Tense D–past
3 Voice A–active, M–middle
4 Gender M–masculine, F–feminine, N–neuter
5 Number S–singular, P–plural
6 Case N–nominative, A–accusative, D–dative, G–genitive

Prepositions

# Category/Feature Symbol – semantics
1 Word class P–preposition
2 Case governed A–governs accusative, D–governs dative, G–governs genitive

Adverbs

# Category/Feature Symbol – semantics
1 Word class ADV–adverb
2 Category N–normal, I–exclamation
3 Degree C–comparative, S–superlative

Simple Tags

  • CONJ - conjunction
  • FOREIGN - foreign word
  • NEG - negation (ekki, eigi, ei)
  • TO - infinitival marker ('to')
  • X - unanalyzed word