Treatment of individual words

From Icelandic Parsed Historical Corpus (IcePaHC)

Revision as of 13:07, 29 September 2010

A

A for negation, see AT

AÐEINS, tag as FP (focus particle), cf. English "only"

AF HVERJU 'why', parse (WPP (P af) (WNP (WPRO-D hverju)))

AFTAN is P when it takes a complement and ADV when it does not.

AFTUR, 'again', tag as ADVP, not as ADVP-TMP, cf. PPCME. When AFTUR means 'back', tag as ADVP-DIR. When it is ambiguous, tag as ADVP-DIR.

ALLUR, ALLIR, tagged Q

ALLEINASTA is generally used as a focus particle and accordingly tagged FP, as with English ONLY in focus particle use.

ALLT AF LÉTTA, "af létta" is a PP and the lemma of létta is létti.

ALLS can be an ADV or P. ALLS is always tagged P when it introduces a CP-ADV, in which case it has a meaning akin to "þar sem" or "af því að".

ANNAÐ TVEGGJA, tag as NP-ADV

ANNAR is in most cases tagged OTHER, and AÐRIR OTHERS. However, if ANNAR clearly means the ordinal number "second" in context, it is tagged ADJ, as all ordinal numbers are.

ARNA/ARNI, the swearing-expletive-like element in "skituna þá arna", is simply tagged N.

AT or A, negation suffix (the lemma is -at): "þá verðura honum gagn" 'then will.be-not him use'

Á

Á either RP or P; see RP

ÁÐUR is ADV when it occurs alone, projecting an ADVP-TMP. But when it introduces an adverbial clause alone, it is a P. When ÁÐUR introduces a comparative clause (which has an adverbial function) along with "en", see ÁÐUR EN below.

ÁÐUR EN, ÁÐUR EN AÐ: ÁÐUR is tagged ADVR and projects an ADVP-TMP. The EN is a P and frequently takes a CP-CMP complement; see the documentation there (cf. also FYRR EN, FYRR EN AÐ). Note that this construction sometimes occurs without the EN in older texts.

B

BÁÐIR, tagged as Q.

BÆÐI is tagged CONJ when it is part of a correlative conjunction, but otherwise it is a form of the quantifier BÁÐIR.

BÆÐI OG is a FLOATED CONJUNCTION. Note that bæði can be a neuter form of the quantifier BÁÐIR.

BRAUT undeclined "braut" indicating motion away ("abroad"), either occurring alone or inside a PP, is tagged ADV. Any declined forms or otherwise clearly nominal forms are tagged N.

BRÁTT, tag as ADVP-TMP.

BÚA: in the periphrastic perfect construction VERA BÚINN AÐ X (where X is some VB forming an IP-INF with AÐ), BÚA is tagged VBN, not VAN.

E

EF 'if', tagged as P; it introduces an adverbial clause (like ÞEGAR often does)

EFTIR can be P or ADV. See also discussion in RP and NP-TMP. When "eftir" modifies an NP-TMP, the structure is:

(NP-TMP (OTHER-A Annan-annar) (N-A dag-dagur) (ADV eftir-eftir))

ENGINN, tagged Q

EINHVER, EITTHVAÐ, etc., are tagged ONE+Q (when the meaning is "some"), not ONE+WPRO; see also EINHVERN TÍMA.

EINN is usually tagged ONE. However, if it means "alone" in a copular clause (e.g. "Jón var þar einn"), it is tagged ADJ. Following the English corpora, it can also be tagged FP in the following case:

"When ONE means ONLY, ALONE and follows the noun or pronoun it focuses or when it follows NOT in the meaning NOT ONLY, it is treated as a focus particle (FP)."

EINNIG, EINNINN 'also', tag as ALSO

EINS, meaning 'alike' as in ekki fór eins fyrir honum og henni, tag as ADJ. Otherwise, ADVR. In the EINS OG construction (a type of comparative) or any other comparative construction (see CP-CMP), "eins" is tagged ADVR. See EINS OG in ADJP#ADJ_heads_of_ADJP and ADJP#ADVR_heads_of_ADJP. Also, "eins" is ADVR in "undir eins".

EITTHVAÐ is tagged ONE+Q.

EN can be tagged CONJ or P, much like BUT in English.

ENDA, usually tagged as ADV, but it can be CONJ in cases where it clearly conjoins clauses. ENDA can also be tagged P, but *only* where it *clearly* introduces a subordinate clause of the CP-ADV type; in this latter case ENDA usually means something like "on the condition that", and it introduces a CP-ADV without a C node but with V-to-C movement of a conditional verb [1]: "enda sé hann svo lítillátur..."

ENNÞÁ or ENN ÞÁ, tag as (ADVP-TMP (ADV enn) (ADV þá))

ER: when it means 'which, that', it is the complementizer of a relative clause (CP-REL). When ER means "when", there are two possibilities: (1) if there is no antecedent, we take it to be a C as before, projecting a CP-ADV with no wh-word (as in the CP-ADV complement of "þegar"); (2) if there is a temporal antecedent, it introduces a CP-REL clause. Rarely, ER can also be a complementizer projecting a CP-THT clause. (Check the latter parse when you find it in the corpora, as there may be confusion on this point.)
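The ER entry amounts to a small decision procedure. The Python sketch below restates it as a function, assuming the annotator has already settled what ER means in context and whether a temporal antecedent is present. The name `er_clause` is hypothetical, and the rare CP-THT use is deliberately left for manual checking, as the entry advises.

```python
def er_clause(meaning, has_temporal_antecedent=False):
    """Return the clause label introduced by ER (hypothetical helper).

    meaning: "which" for relative 'which, that', "when" for temporal 'when'.
    has_temporal_antecedent: only consulted when meaning == "when".
    """
    if meaning == "which":
        return "CP-REL"  # ER as the complementizer of a relative clause
    if meaning == "when":
        # (1) no antecedent: C projecting a CP-ADV with no wh-word
        # (2) temporal antecedent: ER introduces a CP-REL clause
        return "CP-REL" if has_temporal_antecedent else "CP-ADV"
    # Rare uses (e.g. a CP-THT complementizer) are left for manual checking.
    raise ValueError(f"unhandled meaning for ER: {meaning!r}")


print(er_clause("when"))                                # CP-ADV
print(er_clause("when", has_temporal_antecedent=True))  # CP-REL
```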

ETC: as in the English corpora, "etc" is tagged FW. In some cases it can appear at the clause level as FW, though there it generally functions as an adverb phrase.

F

can be VB, or MD when it takes a VBN complement (see also GETA).

FJARRI: tag as either ADVR or ADJR and lemmatize as FJÆR. The superlative of FJARRI is FJARST (ADVS) or FJARSTUR (ADJS), both lemmatized as FJÆR; the superlative of FJÆR is FJÆRST (ADVS) or FJÆRSTUR (ADJS).

FJARST, FJARSTUR, FJÆR, FJÆRST, FJÆRSTUR, lemmatized as FJÆR. See FJARRI.

FRAM is tagged RP when it does not take a complement. Note that as with many of the words tagged RP, when FRAM immediately precedes a preposition heading a PP, it is parsed as a specifier of PP.

FRAMAR and FRAMAST are usually ADJR and ADJS respectively, projecting an NP-MSR like English "further/farther", "furthest/farthest". See also LANGT.

FRAMVEGIS, o.s.f., etc.: tagged ADV. These words can project an ADVP that can be coordinated with any category, as in the examples below. Note that when "svo" appears, it is attached at the level of CONJ, not inside the ADVP headed by FRAMVEGIS.

                  (PP-1 (P about)
                        (NP (NP (NS Males))
                            (CONJP (CONJ and)
                                   (NP (NS Females)))
                            (, ,)
                            (CONJP (CONJ and)
                                   (ADVP (ADV so))
                                   (ADVP (ADV forth)))))

(NP-PRN-1 (NP (N-D stöðuglyndi-stöðuglyndi)
              (CONJ og-og)
              (N-D sparsemi-sparsemi)
              (IP-INF-PRP (TO að-að)
                          (VB passa-passa)
                          (RP upp-upp)
                          (PP (P á-á)
                              (NP (N-A heilsu-heilsa)
                                  (NP-POS (PRO-A sína-sinn))))))
          (CONJP (CONJ og-og)
                 (ADVP (ADV svo-svo))
                 (ADVP (ADV framvegis-framvegis))))
(. .-.)))

FREMI, tag as ADV

FULLUR

FRÁ is tagged RP when it does not take a complement, like English "fro" in the English corpora

FYRIR OFAN, parse as recursive PPs

FYRR EN, FYRR EN AÐ: FYRR is tagged ADVR and projects an ADVP-TMP. The EN is a P and frequently takes a CP-CMP complement (cf. ÁÐUR EN, ÁÐUR EN AÐ).

FYRST is tagged P, introducing a CP-ADV, when the meaning is 'since', as in "I will do it since you won't". When it is a temporal adverb, it is tagged ADVS (though the corpus may be inconsistent about whether it is tagged ADV or ADVS).

FYRSTA: unlike in the English corpora, FYRSTA in the PP í fyrstu 'at first' is tagged ADJ (not ADJS, i.e. not superlative) and projects an NP; it is not tagged ADV. A strong argument against following the English corpora here is that FYRSTA can take a determiner, cf. í fyrstunni. For the ordinal number form FYRSTA, as in í fyrsta skipti 'for the first time', see FYRSTI. For the temporal adverb parallel to English "first", see FYRST.

(PP (P í)
    (NP (ADJ-D fyrstu)))
(PP (P í)
    (NP (ADJ-D fyrstu$) (D-D $nni)))

G

GEGNUM can be P when it takes a complement, otherwise ADV. See also PP

GER see GJÖR

GERA is tagged DO, DODI, DOPI, etc., in all meanings. GERAST, however, is not tagged DO; GERAST is VB in the meanings "to happen" and "to become". However, it is wise to include both DO and VB in searches for GERAST. See also Lemmatization.
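Because GERAST may appear under either DO* or VB* tags, a search should accept both tag families. The Python sketch below shows one way to do that over bracketed trees, assuming leaves of the form "(TAG word-lemma)" as in the trees on this page. The helper name `find_lemma` and the lemma string "gerast" are assumptions for illustration only (check the Lemmatization page for the lemma the corpus actually uses), and the two sample trees are invented.

```python
import re

# Hypothetical helper. A leaf such as "(DODI gerðist-gerast)" is a tag plus a
# "word-lemma" token with no nested brackets; this regex captures both parts.
LEAF = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")


def find_lemma(tree_text, lemma, tag_prefixes=("DO", "VB")):
    """Return (tag, word) for every leaf with the given lemma whose tag
    starts with one of tag_prefixes (DO* and VB* by default)."""
    hits = []
    for tag, token in LEAF.findall(tree_text):
        word, _, leaf_lemma = token.partition("-")  # split at the first hyphen
        if leaf_lemma == lemma and tag.startswith(tag_prefixes):
            hits.append((tag, word))
    return hits


sample = """
( (IP-MAT (NP-SBJ (N-N undur-undur)) (DODI gerðist-gerast)))
( (IP-MAT (NP-SBJ (PRO-N hann-hann)) (VBDI gerðist-gerast) (ADJP (ADJ-N ríkur-ríkur))))
"""
print(find_lemma(sample, "gerast"))
# [('DODI', 'gerðist'), ('VBDI', 'gerðist')]
```

Restricting `tag_prefixes` to `("DO",)` or `("VB",)` reproduces a single-tag search, which would miss half of the hits in the sample.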

GJÖR is usually counterpart of the DO verb GERA, but when it means 'better' it is ADVR (comparative, the superlative is GJÖRST/GERST).

GETA when it takes a participle and means "be able to", it is tagged as a modal (MD). See modals MD. When GETA means 'mention' it is a regular verb, VB.

H

HANDA projects NP in 'til handa honum' but PP in 'handa honum'

HÁLFUR usually NUM, following PPCME2 guidelines for HALF.

HÁTTUR, see MEÐ SAMA HÆTTI

HEIM and HEIMA are tagged ADV. This differs from "home" in the English corpora, which is tagged N, because unlike English "home", HEIMA can never be used as a noun.

HELDUR is ADVR; where it means "but", we assume there is a silent "but" or no conjunction. HELDUR can occasionally be tagged FP, especially if it appears to participate in the NEG...BUT construction. In the FP use, HELDUR can be translated as English "only" (in the NEG...BUT construction, NEG and HELDUR together mean "only"). "Allra helst" consists of an ADVS and an NP-POS, projecting an NP-ADV (preliminary version).

HINN tag as D when used as a definite article or demonstrative but PRO when used in the meaning 'other'.

HINUMEGIN meaning "on the other side" is tagged N-D, and it usually takes an NP-POS complement, as in "hinumegin árinnar" (="on the other side of the river"). It projects an NP-ADV.

HÉR 'here' is usually tagged as ADVP-LOC. When it does not have a locative meaning, as in hér eftir 'from now on', it is only tagged as ADVP.

HVAÐ, sometimes WADV, as in HVAÐ ER ÞETTA MIKIÐ? (similar to HVE MIKIÐ ER ÞETTA?).

HVAÐAN AF, WPP like English whereto

HVAR, meaning 'where' tag as WADV

HVATKI

HVE, HVERSU 'how', tag as WADV, like English how.

HVERIGUR is sometimes tagged WD.

HVERNIG, HVERNINN

HVER meaning 'each' (as in HVER ANNAR 'each other') is tagged Q, but WPRO when it means 'who' (interrogative pronoun or relative pronoun).

HVERGI meaning "nowhere" is tagged Q+ADV

HVÍ 'why' introduces CP-QUE

HVÍLÍKUR is tagged SUCH, just like SLÍKUR and ÞVÍLÍKUR.

HVOR, as in hvor hjá öðrum 'each with the other'

HVORT usually tagged WQ, in which case it means "whether" and introduces an indirect question. Occasionally it can be WPRO and project a WNP, as in the case of English "whether" when it means "which of two", and HVORT is tagged WPRO in the expression "hvort sem er" (see CP-FRL).

HVORTVEGGJA and hvorirtveggja, hvorartveggja, etc., are tagged Q+NUM.

HVORUGUR is tagged Q when a quantifier, CONJ if it is part of a correlative conjunction ("hvorugt...né..."). See the documentation in the English corpora for EITHER, NEITHER.

HÆGRI 'right' and VINSTRI 'left', tag ADJ.

I

ITEM the Latin word for "also", used in lists. We tag it FW, like "etc.", and it is similarly attached at the IP level in most cases. (See also ETC).

J

'yes', tag INTJ.

JAFN(T) usually ADVR or ADJR, though it can also be ADJ like SAMUR. For more information about the ADVR and ADJR use, see the page linked to JAFN(T).

K

KRING, KRINGUM 'around, round', as in round the edges of the flowers, is tagged P:

(PP (P kringum)
    (NP (PRO-A hana)))

As in the case of MILLI, KRING(UM) always projects a PP, even if it is sometimes intransitive.

(IP-SUB (ADVP-TMP *T*-2)
        (NP-SBJ (NPR-N Pétur-pétur))
        (VBDI ferðaðist-ferðast)
        (PP (P um-um)
            (PP (P kring-kring)))
        (IP-INF-PRP (NP-OB1 (Q-G allra-allur))
                    (TO að-að)
                    (VB vitja-vitja)))))

( (IP-MAT (CONJ og-og)
          (NP-SBJ (Q-N allar-allur) (NS-N ekkjur$-ekkja) (D-N $nar-hinn))
          (VBDI flykktust-flykkjast)
          (ADVP-LOC (ADV utan-utan))
          (PP (P um-um)
              (PP (P kring-kring)
                  (NP (PRO-A hann-hann))))
          (IP-PPL (VAG grátandi-gráta))))

See KRINGUR for cases like í krók og kring

KRINGUR is a noun, as in í krók og (í) kring

L

LANGUR usually ADJ and LENGRI is usually ADJR, both frequently project NP-MSR like English "further/farther"; see especially NP-MSR#NP-MSR_heading_ADJP. See also FRAMAR.

LENGI is ADV, as in "hann var þar lengi" (lit. he was there long, i.e. 'he stayed there for a long time'). However, LENGI, like LANGUR, frequently projects an NP-MSR. Not to be confused with forms of LANGUR.

LÍKA 'also', tag as ALSO; can exceptionally be ADVR and license a CP-CMP in Oddur Gottskálksson's New Testament.

LÍKT is ADVR when it licenses a comparative.

LÍTILL, LÍTIÐ 'little, not much': tag as ADV in "Þær þekktust lítið", cf. "Þær þekktust vel". Otherwise it is usually Q, QR, or QS, parallel to MIKIÐ, provided it cannot be replaced by "smár" in the given usage. Anywhere that LÍTILL can be replaced by "smár", "smærri", or "smæstur", it is tagged ADJ, ADJR, or ADJS as appropriate. In cases of doubt (as to the precise meaning of the word in context), the default is Q (or QR, QS). See also NP-MSR.

M

MANNGI, MANGI, tagged Q

MARGUR, tagged Q

MEÐ SAMA HÆTTI

MEÐAL is P. See also MILLUM, MILLI.

MEÐAN, Á MEÐAN, MEÐAN Á, Á MEÐAN Á; usually tagged P, projecting a PP and taking a CP-ADV complement, as with English "while". When MEÐAN takes no complement, it is tagged ADV and projects ADVP (this is the parse whether or not MEÐAN is the complement of another preposition). When MEÐAN takes no complement and occurs without Á, it generally projects an ADVP-TMP.
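The MEÐAN entry can be read as a three-way choice. The Python sketch below restates it, assuming the annotator has already decided whether MEÐAN takes a clausal complement and whether Á is present; the function name `medan_parse` is hypothetical, and the ADVP-TMP outcome reflects the entry's "generally", not an absolute rule.

```python
def medan_parse(has_complement, with_a):
    """Return (head tag, phrase label) for MEÐAN (hypothetical helper).

    has_complement: MEÐAN takes a clausal (CP-ADV) complement.
    with_a: MEÐAN occurs with Á (Á MEÐAN, MEÐAN Á, Á MEÐAN Á).
    """
    if has_complement:
        return ("P", "PP")          # the usual case, like English "while"
    if not with_a:
        return ("ADV", "ADVP-TMP")  # only "generally" ADVP-TMP, per the entry
    return ("ADV", "ADVP")          # no complement, with Á


print(medan_parse(has_complement=False, with_a=True))  # ('ADV', 'ADVP')
```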

MIKIÐ, MIKILL: tagged as a quantifier, Q, QR (for MEIRA), or QS, when it cannot be replaced by "stór", "stærri", "stærstur". If it can be replaced by "stór", then it is tagged ADJ, ADJR, or ADJS as appropriate. In cases of doubt (as to the precise meaning of the word in context), the default is Q (or QR, QS). See also NP-MSR.
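MIKILL and LÍTILL follow the same substitution test, with the quantifier tag as the default in cases of doubt. The Python sketch below encodes that test, assuming the annotator supplies the degree and the outcome of the substitution (stór/stærri/stærstur for MIKILL, smár/smærri/smæstur for LÍTILL). The function and table names are hypothetical, and the separate ADV use of LÍTIÐ ("Þær þekktust lítið") is not covered.

```python
# Hypothetical tables: degree -> tag for the quantifier and adjective readings.
Q_TAGS = {"pos": "Q", "comp": "QR", "sup": "QS"}
ADJ_TAGS = {"pos": "ADJ", "comp": "ADJR", "sup": "ADJS"}


def tag_mikill_litill(degree, replaceable):
    """degree: "pos", "comp" or "sup".
    replaceable: True if the stór/smár substitution works, False if it
    does not, None if the meaning in context is in doubt."""
    if replaceable is True:
        return ADJ_TAGS[degree]
    # Not replaceable, or in doubt: fall back to the quantifier tag.
    return Q_TAGS[degree]


print(tag_mikill_litill("comp", None))  # QR
```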

MILLI/MILLUM is tagged P, as in (PP (P millum) (NP bæja)) or (PP (P á) (PP (P millum) (NP bæja))). If MILLI has no complement, the PP may immediately dominate only (P milli).

MITT can be the neuter form of the ADJ "miður", but when it means "in the middle" and does not agree with some argument in number and case, we tag it ADV and it projects an ADVP-LOC.

MJÖG is tagged ADV. When it occurs alone and means "much" or "a lot", it projects NP-MSR.

MÓTI, as in á móti honum, is tagged "N", even when it means "facing", e.g. "hvor á móti öðrum". In this construction, "móti" is considered a dative N taking an NP-COM, analogously to English "side"; see the PPCME2/PPCEME guidelines on Complements of N and NP-COM. When the preposition is missing, parse the whole phrase as a PP with a silent P.

N

NÉ is tagged CONJ, like English "nor"

NEI 'no', tag as INTJ.

NEINN 'no one, (not) any', tag as Q

NEMA tagged as P, analogously to English "except". NEMA can occasionally be tagged FP, especially if it appears to participate in the NEG...BUT construction. In the FP use, NEMA can be translated as English "only" (in the NEG...BUT construction, NEG and NEMA together mean "only"). See also HELDUR and EN.

NOKKUR usually Q. But when NOKKUÐ means "quite" or "somewhat", it is tagged ADV.

NÆR, NÆRRI, NÆSTUM, are either ADV or ADJ. See the discussion in ADVP#Complements_of_ADVP and ADJP#Complements_of_ADJ. Note that NÆR can sometimes mean 'when'.

NÆRINDIS is usually ADJ, but otherwise like NÆRRI above; see the discussion in ADVP#Complements_of_ADVP and ADJP#Complements_of_ADJ.

NÆSTA 'very', tag as ADV; when it means 'almost', see NÆSTUM

O

OF is tagged ADVR when it means 'too' as in 'too much', P when it takes a complement, and RP otherwise (the last case often applies when OF has no obvious meaning in Old Icelandic).

S

SAKIR tagged as NS where possible, and P elsewhere

SAMAN is tagged ADV and is usually attached high (Þeir komu saman). Sometimes, though, it is part of an NP (Ég hitti þau saman). For TIL SAMANS, see the PP documentation.

SAMUR 'same', tag as ADJ

SANNLEGA, two of these, SANNLEGA, SANNLEGA, are parsed as one ADVP, cf. English, TRULY, TRULY (Bible)

SEINN is usually an ADV projecting an ADVP-TMP when it is a clausal modifier denoting the time at which something happened. It can also be tagged ADJ when it modifies a noun. See the PPCME2/PPCEME guidelines on EARLY and LATE.

SEM is always tagged as a complementizer, C. This is true even for comparatives such as feitur sem svín, 'as fat as a pig'; see CP-CMP for a full discussion. This is not the same as the treatment of "as" in the PPCME2, PPCEME, and in general, our treatment of comparatives differs somewhat. All comparatives are treated as clausal, i.e. involving a CP-CMP, in Icelandic.

When SEM introduces an adverbial clause, it is still a complementizer, and it simply projects a CP-ADV in which it occupies the C position.

SEM AÐ is treated as a single C, like: (C (C21 sem-sem) (C22 að-að)).
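The split-word notation numbers each part of a single word written as several tokens. The Python sketch below builds such a node, assuming the digits mean "number of parts, then position" as in the C21/C22 example; that reading is generalized from this one example and is an assumption for words of more than two parts (and it would not work beyond nine parts). The helper name `split_word` is hypothetical.

```python
def split_word(tag, tokens):
    """Build one node of category `tag` from a word written as several
    "word-lemma" tokens, numbering the parts TAG21, TAG22, ... as in
    (C (C21 sem-sem) (C22 að-að))."""
    n = len(tokens)
    parts = " ".join(
        f"({tag}{n}{i} {tok})" for i, tok in enumerate(tokens, start=1)
    )
    return f"({tag} {parts})"


print(split_word("C", ["sem-sem", "að-að"]))  # (C (C21 sem-sem) (C22 að-að))
```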

SÉRHVER is tagged Q, just like the quantifier usage of HVER.

SÍÐAN is usually an ADV projecting ADVP-TMP. It can also be a preposition (Ég hef verið hér síðan í gær 'I have been here since yesterday'), which can introduce a CP-ADV (Ég hef verið þreyttur síðan þú komst heim 'I have been tired since you came home').

SÍÐASTA is tagged ADJ (not ADJS, i.e. not superlative), projecting an NP, in the PP að síðustu(nni) 'at last'

SJALDAN 'seldom', sometimes SKJALDAN (e.g. in Ævisaga Jóns Steingrímssonar), projects ADVP-TMP

SJÁLFUR is tagged PRO, and is parsed as an NP-PRN when it modifies another pronoun, parallel to emphatic "himself", "herself", etc., in the English corpora.

SLÍKUR, 'such' is tagged SUCH, like English such

SODAN, SODDAN, SVODDAN tagged SUCH.

SPYRJA, 'ask' takes ((NP-SBJ NOM) SPYR (NP-OB2 ACC) (NP-OB1 GEN))

SVO is tagged as ADVR when it is a degree adverb, e.g. when it modifies an adjective or another adverb or occurs in a svo...að clause. As with English "so", when "svo" is not used in a degree sense (ADVR) or as a preposition (P), SVO is tagged ADV. In its adverbial (ADV) use, SVO can generally be paraphrased by "þannig" or "á þá leið" (in English, IN THAT WAY).

SVO SEM

STÓRUM when it means "much" is tagged ADV and projects an NP-MSR.

STUNDUM is tagged NS-D, and projects an NP-TMP.

SUNDUR is ADV, projecting an ADVP complement in Í SUNDUR

T

TUGUR is tagged N, and it frequently occurs within an NP-MSR (though this need not always be true).

TVENNUR, TVISVAR, ÞRENNUR, etc. are tagged NUM, analogously to "once", "twice", "thrice" in English.

U

UNDIR is a P. In expressions like "undir eins", "undir hádegi", or "undir eins og CP-CMP", UNDIR is still tagged P and takes a PP complement. See also PP.

UNS 'until', tag as P with CP-ADV as sister.

UTAN is P when it takes a complement and ADV when it does not.

V

VANUR: note that VANUR frequently takes IP-INF complements; see the discussion in IP-INF and ADJP.

VERÐA is tagged with its own tag, RD (RDDI, RDPI, RDDS, RDPS, etc.), in all uses. This is because, like BE, it has auxiliary and non-auxiliary uses in Icelandic. This matches the treatment of "werden" in Caitlin Light's Early New High German corpus.

VINSTRI 'left' and HÆGRI 'right', tag ADJ.

Y

YFIR is tagged P when it takes a complement and projects a PP, and RP when it does not take a complement. It can also be a degree modifier, like English OVER, in which case it is tagged ADVR: "yfir tvær þúsundir manns".

Þ

ÞAÐ The object pronoun ÞAÐ is always tagged PRO. The subject pronoun ÞAÐ is tagged as PRO in any syntactic context where it *never* disappears under subject-finite-verb inversion. In Icelandic, when the finite verb fronts over the subject under V-to-C movement in matrix clauses or embedded topicalizations, truly non-expletive ÞAÐ will still surface in subject position, but truly expletive ÞAÐ will disappear. In any syntactic context in which ÞAÐ would disappear under subject-verb inversion, it is tagged ES (in accordance with Caitlin Light's Early New High German corpus). Such contexts include, at least: weather expressions, impersonals, and existentials.

ÞAÐ is tagged ES even in contexts where there is inter-speaker or inter-text (or diachronic) variation with regard to whether it disappears.

See Expletives also, for *exp*, which is the empty category corresponding to "(ES ÞAÐ)". Note that when ÞAÐ disappears under verb-movement, "(NP-SBJ *exp*)" is only inserted in the sentence if there is no other possible subject (e.g. it is not inserted in subject-postposition constructions).
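The ÞAÐ rules combine two yes/no judgements: whether subject ÞAÐ can disappear under subject-verb inversion (for at least some speakers, texts, or periods), and whether another constituent can serve as subject once it has disappeared. The Python sketch below restates them, with hypothetical function names; the judgements themselves stay with the annotator.

```python
def tag_thad(is_subject, can_disappear_under_inversion):
    """Tag for ÞAÐ (hypothetical helper).

    is_subject: ÞAÐ is the subject; object ÞAÐ is always PRO.
    can_disappear_under_inversion: ÞAÐ would vanish when the finite verb
    moves over the subject, even if only in some varieties or texts.
    """
    if not is_subject:
        return "PRO"
    return "ES" if can_disappear_under_inversion else "PRO"


def insert_exp_subject(thad_disappeared, other_subject_available):
    """Whether to insert (NP-SBJ *exp*): only when expletive ÞAÐ has
    disappeared and no other possible subject exists (so not, for
    example, in subject-postposition constructions)."""
    return thad_disappeared and not other_subject_available


print(tag_thad(is_subject=True, can_disappear_under_inversion=True))  # ES
```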

ÞAÐ ER AÐ SEGJA, ÞAÐ ER, Þ.E. 'that is to say'

ÞANNIG

ÞANGAÐ TIL, ÞAR TIL 'until' and ÞAR TIL AÐ, ÞANGAÐ TIL AÐ, 'until that', introduce an adverbial clause

ÞAR Á MILLI: the second PP immediately dominates a P and a trace of the R-pronoun "þar".

ÞARS, an old form of þar es (= "þar er"), is split as "(ADV þar$-þar) (C $s-er)".
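The split of ÞARS uses the same "$" convention seen elsewhere on this page (e.g. "ekkjur$ ... $nar" in the KRINGUM tree): the host's word ends in "$", the attached part's word begins with "$", and each leaf keeps its own lemma. The Python sketch below produces the two leaves from (tag, word, lemma) triples; the helper name `split_contraction` is hypothetical.

```python
def split_contraction(host, clitic):
    """host and clitic are (tag, word, lemma) triples for the two halves of
    a form written as one word; return both leaves in "$" notation."""
    (htag, hword, hlemma), (ctag, cword, clemma) = host, clitic
    return f"({htag} {hword}$-{hlemma}) ({ctag} ${cword}-{clemma})"


print(split_contraction(("ADV", "þar", "þar"), ("C", "s", "er")))
# (ADV þar$-þar) (C $s-er)
```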

ÞÁ ER, ÞÁ ÞEGAR

ÞEGAR 'when' is tagged P when it introduces an adverbial clause, CP-ADV (i.e. when the "when"-clause has no antecedent), so þegar Norðmenn tóku ... 'when Norwegians took ...' is (PP (P þegar) (CP-ADV (C 0) (IP-SUB (NP-SBJ Norðmenn) (VBD tóku) (...)))).

However, ÞEGAR is tagged WADV when it introduces an indirect question or a relative clause, just as with English WHEN in the PPCME2/PPCEME.

ÞEGAR can also be tagged ADV, projecting an ADVP-TMP or an ADVP where it unambiguously means "immediately" or "promptly".

ÞEIM MUN, ÞESS AÐ

ÞEYGI Icel. '(þó) eigi', '(though) not', tag (ADV þ$) (NEG $eygi).

ÞÓ, ÞÓTT 'while' and ÞÓ AÐ, ÞÓTT AÐ, 'while that', introduce an adverbial clause (cf. PPCME)

ÞVÍ is normally a dative pronoun, PRO. However, when it is a wh-word, meaning "hví" ("why"), it is tagged WADV. Where ÞVÍ means "because" by itself, or appears to function as an adverb by itself roughly meaning "therefore", it is still tagged PRO: see CP-THT for more information on how these constructions are parsed.

ÞVÍLÍKUR is tagged SUCH, in the same way as SLÍKUR.

For þar, þangað, þaðan, as in þar sem (e. where), þangað sem (e. to where), þaðan sem (e. from where), see CP-REL.