CP-CMP

From Icelandic Parsed Historical Corpus (IcePaHC)
Revision as of 12:30, 27 July 2010 by Einarfs (Talk | contribs) (SEM and SVO SEM)

This page is being revised; see CP-CMP-OLD for the old guidelines, which will be removed once these are finished.

IMPORTANT: Nearly all CP-CMPs are sisters of a comparative head (exceptions are implied comparatives; cf. also the SEM+Superlative case below). This is usually ADVR, ADJR, or QR (also sometimes SUCH, OTHER). Note that the comparative "head" may also be ADVP containing a trace in cases where the ADVR has been topicalized or stylistically fronted out of its original clause.

CAVEAT: if you are looking at something that is similar to a CP-CMP but there is no comparative head present, you are probably looking at a CP-ADV. It is especially important to distinguish these two clause types where SEM appears as the C, in which case both a CP-CMP and a CP-ADV will appear with a gap. They are distinguished not by the presence of a gap, but rather by the presence of a comparative head.
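
The sister test described above can be sketched as a small Python check. This is only an illustration, not part of the corpus tooling: the function names are hypothetical, the set of head tags follows the list above (OTHERS is added because that tag appears in the examples on this page), and the check does not follow *ICH* traces or an ADVP that contains the trace of a fronted ADVR. A flagged clause is a candidate for review (for example an implied comparative or the SEM+Superlative case), not necessarily an error.

```python
import re

# Comparative head tags named in the guidelines; OTHERS is an assumption
# based on the plural tag used in this page's examples.
COMPARATIVE_HEADS = {"ADVR", "ADJR", "QR", "SUCH", "OTHER", "OTHERS"}


def parse(text):
    """Parse one balanced bracketed tree into nested lists: [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip "("
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # skip ")"
        return [label] + children

    return node()


def base(label):
    """Strip function tags and indices: 'ADJR-N' -> 'ADJR', 'PP-6' -> 'PP'."""
    return label.split("-")[0]


def cp_cmps_without_head(tree):
    """Return the CP-CMP nodes that have no comparative head among their sisters."""
    flagged = []

    def walk(node, ancestors):
        if node[0].startswith("CP-CMP"):
            # Sisters live in the parent; if the parent is a PP (EN, OG),
            # also look at the PP's own sisters one level up.
            levels = ancestors[-1:]
            if ancestors and base(ancestors[-1][0]) == "PP":
                levels = ancestors[-2:]
            sisters = [c for level in levels for c in level[1:] if isinstance(c, list)]
            if not any(base(s[0]) in COMPARATIVE_HEADS for s in sisters):
                flagged.append(node)
        for child in node[1:]:
            if isinstance(child, list):
                walk(child, ancestors + [node])

    walk(tree, [])
    return flagged


# Trimmed versions of two trees from this page.
headed = parse(
    "(ADJP (ADJR-N HELGARI-HELGUR) (PP (P en-en) (CP-CMP (WADJP-5 0) (C 0) "
    "(IP-SUB (ADJP *T*-5) (NP-SBJ (OTHERS-N aðrir-annar))))))"
)
unheaded = parse(
    "(IP-MAT (VBDI gekk-ganga) (NP-SBJ (NPR-N Ingveldur-ingveldur)) "
    "(CP-CMP (WADVP-1 0) (C sem-sem) (IP-SUB (ADVP *T*-1) "
    "(ADJP (ADJS-N skyndilegast-skyndilegur)))))"
)
print(len(cp_cmps_without_head(headed)))    # 0
print(len(cp_cmps_without_head(unheaded)))  # 1
```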

Full Correlative Comparatives

See the full discussion in the English corpora guidelines. Correlative comparatives are treated as in the example below, which translates as "the fewer the letters, the less writing there is":

                                          (IP-SUB (NP-SBJ-3 (ES það-það))
						  (BEPI er-vera)
						  (NP-3 (N-N rit-rit)
							(NP-MSR (QR-N minna-lítill)))
						  (NP-ADT (CP-CMP (WNP-4 0)
								  (C er-er)
								  (IP-SUB (NP-PRD *T*-4)
									  (NP-SBJ (NS-N stafir-stafur))
									  (BEPI eru-vera)))
							  (ADJR-N færri-fár))))))))
	      (CP-THT-SPE-PRN-1 (C að-að)
				(IP-SUB-SPE (NP-SBJ (N-N hluti-hluti)
						    (NP-POS (PRO-G ykkar-þú)
							    (NP-PRN (NS-G feðga-feðgi))))
					    (VBPS muni-muna)
					    (BE vera-vera)
					    (NP-PRD (D-D því-það) (ADJR-N verri-verri))
					    (NP-MSR (CP-CMP-SPE (WNP-2 0)
								(C er-er)
								(IP-SUB-SPE (NP-MSR *T*-2)
									    (NP-SBJ (NS-N deildir-deild)
										    (NP-POS (PRO-N vorar-ég)))
									    (VBPI standa-standa)))
						    (ADVR lengur-lengi))))
	      (. .-.)))

EN (BETUR EN (CP-CMP) ...)

EINS OG is treated exactly parallel to EN in comparatives, with OG in the same P position that EN occupies.

See also the EINS OG page for expressions like "eins stór eins og Jón" or "svo stór eins og Jón".

Overt clausal structure

BETUR EN
(ADVP (ADVR betur-vel)
    (PP *ICH*-6))
(VAN BÚNIR-BÚA)
(PP (CONJ *ICH*-5)
  (PP (P að-að)
      (NP (N-D viti-vit)))
  (CONJP (CONJ og-og)
	 (PP (P að-að)
	     (NP (N-D máli-mál)))))
(PP-6 (P en-en)
    (CP-CMP (WADJP-7 0)
	    (C 0)
	    (IP-SUB (ADJP *T*-7)
		    (NP-SBJ (PRO-N vér-ég))
		    (BEPS SÉIM-VERA))))))))
HELGARI EN
  (ADJP (ADJR-N helgara-helgari)
	(PP (P en-en)
	    (CP-CMP (WNP-1 0)
		    (C 0)
		    (IP-SUB (IP-SUB-2 (NP-SBJ (NS-N menn-maður))
				      (MDPS megi-mega)
				      (PP (P eftir-eftir)
					  (NP *T*-1))
				      (VB GLÍKJA-GLÍKJA))
			    (CONJP (CONJ eða-eða)
				   (IP-SUB=2 (RP frá-frá) (VB segja-segja)))))))
MEIRA EN
(IP-SUB (NP-SBJ (PRO-N hún-hún))
      (VBDI tókst-taka)
      (NP-MSR (ADJR-A MEIRA-MIKILL)
	      (PP *ICH*-1))
      (PP (P á-á)
	  (NP (NS-A hendur-hönd)))
      (IP-INF (NP-OB1 (NPR-D Guði-guð))
	      (TO að-að)
	      (VB þjóna-þjóna))
      (PP-1 (P en-en)
	    (CP-CMP (WNP-2 0)
		    (C 0)
		    (IP-SUB (IP-SUB-3 (NP-MSR *T*-2)
				      (NP-SBJ (N-N boðorð-boðorð))
				      (BEDS væri-vera)
				      (RP til-til))
			    (CONJP (CONJ eða-eða)
				   (IP-SUB=3 (NP-SBJ (N-N dæmi-dæmi)))))))))))
ÁÐUR EN
	  (IP-MAT-SPE (ADVP-TMP (ADVR Áður-áður)
				(PP (P en-en)
				    (CP-CMP-SPE (WADVP-1 0)
						(C 0)
						(IP-SUB-SPE (ADVP *T*-1)
							    (NP-SBJ (NPR-N Filippus-filippus))
							    (VBDI kallaði-kalla)
							    (PP (P á-á)
								(NP (PRO-A þig-þú)))

(Partly) reconstructed clausal structure

When the comparative clause contains only a nominative subject, we treat it as if the rest of the comparative has been elided. In other words, the comparative is parsed as if it contained a full subordinate clause, even though only the subject of that clause is overt.

Note that this is unlike the parse of sentences like "John is bigger than Bill" in the English corpora (PPCME2, PPCEME), where "than Bill" is parsed as a PP without internal clausal structure. Since there is no prepositional comparative in Icelandic, except possibly the "No clausal structure" case described under that heading below, comparatives almost always contain some clausal structure in the Icelandic corpus.

  (ADJP (ADJR-N HELGARI-HELGUR)
	(PP (P en-en)
	    (CP-CMP (WADJP-5 0)
                    (C 0)
		    (IP-SUB (ADJP *T*-5)
			    (NP-SBJ (OTHERS-N aðrir-annar))))))))

Object comparatives

Treated exactly the same as the previous case, where all but the subject is elided. The difference is that here, the IP-SUB contains only an object, rather than only a subject.

Hann vill banana frekar en (hann vill) appelsínur.

Honum tókst setningafræðin betur en (honum tókst) hljóðkerfisfræðin.

(ADVP (ADVR frekar)
       (PP (P en-en)
           (CP-CMP (WADVP-1 0)
                   (C 0)
                   (IP-SUB (ADVP *T*-1)
                           (NP-OB1 (NS-A appelsínur-appelsína))))))))))
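
The template shared by the subject-only and object-only cases can be sketched as a small Python helper. This is only an illustration, not part of the corpus tooling: the function name `reconstructed_comparative` and its parameters are hypothetical, and the gap category still has to be chosen by the annotator, as in the examples above.

```python
def reconstructed_comparative(gap, index, remnant, p="en-en"):
    """Return the bracketed PP for a comparative elided down to one remnant.

    gap     -- category of the moved phrase, e.g. "ADJP" or "ADVP"
               (function tags on the trace, such as NP-MSR, are not handled)
    index   -- integer shared by the empty operator and its trace
    remnant -- bracketed text of the only overt constituent (subject or object)
    p       -- word-lemma pair for the introducing P
    """
    return (
        f"(PP (P {p}) "
        f"(CP-CMP (W{gap}-{index} 0) (C 0) "
        f"(IP-SUB ({gap} *T*-{index}) {remnant})))"
    )


# Subject remnant, as in the HELGARI example.
subject_case = reconstructed_comparative("ADJP", 5, "(NP-SBJ (OTHERS-N aðrir-annar))")
# Object remnant, as in the "frekar en appelsínur" example.
object_case = reconstructed_comparative("ADVP", 1, "(NP-OB1 (NS-A appelsínur-appelsína))")
print(subject_case)
print(object_case)
```

Apart from whitespace, the two strings match the PP brackets in the HELGARI and "frekar en appelsínur" examples; only the remnant differs between the subject and object cases.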

Elision in Comparative Clauses

Most of the subordinate clause inside the CP-CMP is usually elided. We generally do not represent the elided structure with empty categories, though sometimes there will be a CODE node explaining the interpretation of the elided material. See the Penn English corpora guidelines for general guidelines on comparative elision.

Note that CP-ADVs which contain a gap, i.e. ones with the complementizer "sem", may also have elided material that is not represented in the annotation.

However, where a preposition is elided but its object is overt, we represent the preposition as a P node with a CODE node in the word-lemma position. The CODE node indicates the meaning of the elided preposition, as in the example below. (See also "stranding or chopping" of prepositions in the Penn English corpora guidelines.)

                                    (CP-THT-PRN (C að-að)
                                                (IP-SUB (ADVP-TMP (ADV aldrei-aldrei))
					                (MDPI getur-geta)
							(NP-SBJ (PRO-D mér-ég))
							(RDN orðið-verða)
							(ADJP (ADVR eins-eins)
							    (ADV vel-vel)
							    (PP (P við-við)
							        (NP (Q-A neina-neinn)
								       (NP-POS (PRO-G þeirra-hann))))
								(ADVP (ADVR eins)
								      (PP (P og)
									  (CP-CMP (WADJP-2 0)
										  (C 0)
										  (IP-SUB (ADJP *T*-2)
											  (PP (P (CODE {við}))                        <----- Elided P
											      (NP (D-A þá-sá)
												  (N-A brekku$-brekka)
												  (D-A $na-hinn)
												  (PP (P á-á)
												      (NP (PRO-D honum-hann)
													  (NP-NPR (NPR-D Fagradal-fagradalur)
														  (, ,-,)
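
The shape of the elided-preposition node can be sketched in Python as follows. This is an illustration only: the helper names `make_elided_pp` and `elided_prepositions` are hypothetical, and the regular expression assumes the CODE node is written exactly as in the example above, with the meaning in curly braces.

```python
import re

# Matches a P node whose word-lemma slot is a CODE node, e.g. (P (CODE {við})).
ELIDED_P = re.compile(r"\(P\s+\(CODE\s+\{([^{}]*)\}\)\s*\)")


def make_elided_pp(meaning, np_text):
    """Build a PP whose preposition is elided and recorded in a CODE node."""
    return f"(PP (P (CODE {{{meaning}}})) {np_text})"


def elided_prepositions(tree_text):
    """Return the meanings of all elided prepositions in a bracketed tree."""
    return ELIDED_P.findall(tree_text)


pp = make_elided_pp("við", "(NP (D-A þá-sá) (N-A brekku$-brekka) (D-A $na-hinn))")
print(pp)
print(elided_prepositions(pp))
```

An overt preposition such as `(P á-á)` is not matched, so the second function lists only the elided ones.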

No clausal structure

In all of these cases there is no clausal structure inside the comparative PP, and there is no clearly elided material. In other words, there is no way to make a full clausal comparative out of the expression by adding in some elided material. Also, in all of these cases, the case-marking on the NP within the comparative PP does not come from its function inside a (possibly elided) clausal comparative, but rather matches the case of the NP which contains the comparative PP.

Usually these are expressions with OTHER like "I would hire someone other than John" (* "other than John is"), as in the following example:

                                                              (IP-SUB-SPE (NP-SBJ *T*-3)
                                                                          (NEG ekki-ekki)
                                                                          (BEPI eru-vera)
                                                                          (PP (P fyrir-fyrir)
                                                                              (NP (OTHERS-A aðrar-annar)
                                                                                  (PP (P en-en)
                                                                                      (NP (ADJ-A afgamlar-afgamall) (NS-A kerlingarhrotur-kerlingarhrota)))))))))))

The same parse is used for expressions with ÖÐRUVÍSI 'different', as in: Jón er öðruvísi en Joel (*er). Here EN simply heads a PP with an NP complement.

This parse is also used for cases in which a full clausal comparative is possible, but where the case-marking on the NP inside the comparative PP could not possibly come from such an analysis, as in the case below:

                                            (IP-SUB (ADVP *T*-3)
						    (NP-SBJ (NPR-N Gróa-gróa))
						    (VBDS ætti-eiga)
						    (NP-OB1 (Q-A engan-enginn)
							    (ADJR-A betri-góður)
							    (N-A vin-vinur)
							    (PP *ICH*-4))
						    (ADVP-LOC (ADV hér-hér)
							      (PP (P á-á)
								  (NP (N-D jörðu-jörð))))
						    (PP-4 (P en-en)
							  (NP (PRO-A sig-sig))))))))

SEM and SVO SEM

SEM

Currently under revision, see SVO SEM for older guidelines.

SEM is always tagged C. For the treatment of SEM AÐ see splitting and joining words.

Every clause containing a SEM complementizer also contains a gap.

If the clause also has a comparative head, e.g. QR, ADJR, ADVR, SUCH, or OTHER, then it is a CP-CMP. Otherwise, you are probably looking at a CP-ADV.

SEM comparatives do not have an introducing P head: they are just CP-CMP, with SEM in C and a gap of the appropriate kind in Spec(CP). Note that this differs from the treatment of "as"-comparatives in PPCME2 and PPCEME, even though SEM is sometimes translated like English "as".
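
The SEM rules above (a gap in every SEM clause, and CP-CMP only when a comparative head is present) can be sketched as a Python check. This is a rough illustration under stated assumptions, not part of the corpus tooling: the function names are hypothetical, OTHERS is included because that tag appears in the examples on this page, only direct sisters of the clause are inspected, and the sketch knows nothing about relative clauses introduced by SEM, so its suggestion is a prompt for review rather than a verdict.

```python
import re

# Head tags from the guidelines; OTHERS is an assumption based on the examples.
COMPARATIVE_HEADS = {"ADVR", "ADJR", "QR", "SUCH", "OTHER", "OTHERS"}


def parse(text):
    """Parse one balanced bracketed tree into nested lists: [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip "("
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # skip ")"
        return [label] + children

    return node()


def leaves(node):
    """Yield every terminal string below a node."""
    for child in node[1:]:
        if isinstance(child, list):
            yield from leaves(child)
        else:
            yield child


def has_gap(cp):
    """True if an empty operator (W...-n 0) in the CP has a matching *T*-n trace."""
    operators = set()
    for child in cp[1:]:
        if isinstance(child, list) and child[0].startswith("W") and child[1:] == ["0"]:
            match = re.search(r"-(\d+)$", child[0])
            if match:
                operators.add(match.group(1))
    traces = set()
    for leaf in leaves(cp):
        match = re.fullmatch(r"\*T\*-(\d+)", leaf)
        if match:
            traces.add(match.group(1))
    return bool(operators & traces)


def is_sem(node):
    """True for a C node whose word is sem, e.g. (C sem-sem) or (C SEM-SEM)."""
    return (node[0] == "C" and len(node) == 2 and isinstance(node[1], str)
            and node[1].lower().split("-")[0] == "sem")


def check_sem_clauses(tree):
    """Return (label, suggested label, has gap) for every clause with SEM in C."""
    report = []

    def walk(node, parent):
        if any(isinstance(c, list) and is_sem(c) for c in node[1:]):
            sisters = [c for c in parent[1:] if isinstance(c, list)] if parent else []
            headed = any(c[0].split("-")[0] in COMPARATIVE_HEADS for c in sisters)
            report.append((node[0], "CP-CMP" if headed else "CP-ADV", has_gap(node)))
        for child in node[1:]:
            if isinstance(child, list):
                walk(child, node)

    walk(tree, None)
    return report


# Trimmed versions of two SEM clauses from the examples on this page.
svo_sem = parse(
    "(ADVP-LFD (ADVR svo-svo) (CP-CMP (WADVP-1 0) (C sem-sem) "
    "(IP-SUB (ADVP *T*-1) (NP-SBJ (PRO-N vér-ég)) (VBPI trúum-trúa))))"
)
plain_sem = parse(
    "(IP-SUB (ADVP *T*-1) (NP-SBJ (NS-N hlutir-hlutur)) (RDDI urðu-verða) "
    "(CP-ADV (WADVP-12 0) (C sem-sem) (IP-SUB (ADVP *T*-12) (NP-SBJ (N-N klæði-klæði)))))"
)
print(check_sem_clauses(svo_sem))
print(check_sem_clauses(plain_sem))
```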

SEM comparative with partly reconstructed clausal structure:

   (NP (D-A þá-sá)
       (N-A mállýsku-mállýska)
       (, ,-,)
       (CP-REL (WNP-5 0)
	       (C ER-ER)
	       (IP-SUB-6 (NP-OB1 *T*-5)
			 (NP-SBJ (PRO-N ÉR-ÞÚ))
			 (MDDI kunnuð-kunna)
			 (ADVP (ADV JAMT-JAFNT)
			       (CP-CMP *ICH*-9))
			 (VB skilja-skilja)
			 (IP-SUB-PRN=6 (CONJ og-og)
				       (PP (P um-um)
					   (IP-INF (TO að-að) (VB mæla-mæla))))
			 (CP-CMP-9 (WADVP-8 0)
				   (C sem-sem)
				   (IP-SUB (ADVP *T*-8)
					   (NP-SBJ (PRO-N vér-ég))))))))))))))))
  (ADVP-TMP-RSP (ADV þá-þá))
  (BEPI er-vera)
  (NP-SBJ (PRO-N hún-hún))
  (ADJP (ADJ-N jafnbjört-jafnbjartur)
	(CP-CMP (WADJP-2 0)
		(C sem-sem)
		(IP-SUB (ADJP *T*-2)
			(ADVP-TMP (ADV áður-áður)))))

SVO SEM

Left-dislocated SVO SEM clause:

( (IP-MAT (ADVP (ADV Nú-nú))
	  (ADVP-LFD (ADVR svo-svo)
		    (CP-CMP (WADVP-1 0)
			    (C sem-sem)
			    (IP-SUB (ADVP *T*-1)
				    (NP-SBJ (Q-N allir-allur)
					    (NS-N hlutir-hlutur)
					    (NP-PRN *ICH*-2))
				    (RDDI urðu-verða)
				    (PP (P að-að)
					(NP (ADJ-D helgum-helgur) (NS-D dómum-dómur)))
				    (, ,-,)
				    (NP-PRN-2 (PRO-N þeir-hann)
					      (CP-REL (WNP-3 0)
						      (C ER-ER)
						      (IP-SUB (NP-SBJ *T*-3)
							      (NP-11 (NPR-D Drottni-drottinn)
								     (NP-POS (PRO-D ÓRUM-VOR)))
							      (BEDI VÓRU-VERA)
							      (ADJP (ADJS-N NÁLÆGSTIR-NÁLÆGUR)
								    (NP *ICH*-11)))))
				    (, ,-,)
				    (CP-ADV (WADVP-12 0)
					    (C sem-sem)
					    (IP-SUB (ADVP *T*-12)
						    (NP-SBJ (NP (N-N ETAN-JATA)
								(, ,-,)
								(CP-REL (WNP-5 0)
									(C ER-ER)
									(IP-SUB (NP-SBJ (PRO-N hann-hann))
										(BEDI var-vera)
										(PP (P í-í)
										    (NP *T*-5))
										(VAN lagður-leggja)
										(, ,-,)
										(ADVP-TMP (ADV þá-þá)
											  (CP-ADV (WADVP-6 0)
												  (C ER-ER)
												  (IP-SUB (ADVP-TMP *T*-6)
													  (NP-SBJ (PRO-N hann-hann))
													  (BEDI var-vera)
													  (NP-PRD (N-N barn-barn))))))))
							    (, ,-,)
							    (CONJP (CONJ eða-eða)
								   (NP (N-N klæði-klæði)))
							    (CONJP (CONJ eða-eða)
								   (NP (Q-N margir-margur) (NS-N hlutir-hlutur) (OTHERS-N aðrir-annar)))))))))
	  (, ,-,)
	  (NP-SBJ *exp*)
	  (ADVP-RSP (ADV þá-þá))
	  (MDPI má-mega)
          [...]

SVO SEM with stylistically fronted SVO:

  (CP-REL (WNP-1 0)
	  (C ER-ER)
	  (IP-SUB (NP-SBJ *T*-1)
		  (ADVP-2 (ADVR svo-svo))
		  (BEDS væri-vera)
		  (ADJP (ADVP *ICH*-2)
			(ADJ-N hreinlíf-hreinlíf)
			(CP-CMP (WADJP-4 0)
				(C SEM-SEM)
				(IP-SUB (ADJP *T*-4)
					(NP-SBJ (N-N líkneski-líkneski))
					(RDPI verður-verða)
					(PP (P með-með)
					    (NP (D-D þeim-sá)
						(N-D lit-litur)
						(CONJP (CONJ og-og)
						       (NX (N-D geisla-geisli)))
						(, ,-,)
						(CP-REL (WNP-3 0)
							(C sem-sem)
							(IP-SUB (NP-SBJ *T*-3)
								(PP (P í-í)
								    (ADVP (ADV gegnum-gegnum)))
								(VBPI skín-skína)))))))))))))

SVO SEM (LFD) X SVO (RSP) Y

Left dislocation of SVO SEM 'so as' clause followed by a SVO 'so' resumptive element.

( (IP-MAT (CONJ En-en)
	  (ADVP-LFD (ADVR svo-svo)
		    (CP-CMP (WADVP-1 0)
			    (C sem-sem)
			    (IP-SUB (ADVP *T*-1)
				    (NP-SBJ (PRO-N vér-ég))
				    (VBPI trúum-trúa)
				    (, ,-,)
				    (CP-THT (C að-að)
					    (IP-SUB (NP-SBJ (PRO-N hún-hún))
						    (BEPI er-vera)
						    (ADJP (NP-CMP (Q-D öllum-allur))
							  (ADJ-N helgari-helgari)))))))
	  (, ,-,)
	  (ADVP-RSP (ADVR svo-svo))
	  (MDPI skulum-skulu)
	  (NP-SBJ (PRO-N vér-ég))
	  (ALSO og-og)
	  (NP-OB1 (PRO-D því-það)
		  (CP-THT-PRN *ICH*-2))
	  (VB trúa-trúa)

SEM+SUPERLATIVE

SEM SKYNDILEGAST 'as soon as possible', SEM FYRST 'as soon as possible', SEM NÁKVÆMAST 'as precisely as possible', etc.

( (IP-MAT (ADVP-TMP (ADV Síðan-síðan))
	  (VBDI gekk-ganga)
	  (NP-SBJ (NPR-N Ingveldur-ingveldur))
	  (ADVP-DIR (ADV burt-burt))
	  (CP-CMP (WADVP-1 0)
		  (C sem-sem)
		  (IP-SUB (ADVP *T*-1)
			  (ADJP (ADJS-N skyndilegast-skyndilegur))))
	  (. ;-;)))


( (IP-MAT (CONJ og-og)
	  (NP-SBJ *con*)
	  (VBDI innti-inna)
	  (PP (P að-að)
	      (NP (Q-D öllu-allur)))
	  (CP-CMP (WADVP-1 0)
		  (C sem-sem)
		  (IP-SUB (ADVP *T*-1)
			  (ADJP (ADJS-N nákvæmast-nákvæmur))))
	  (. ,-,)))