Main Page


This is the wiki for the Icelandic Parsed Historical Corpus (IcePaHC). It mainly documents the annotation standard for those constructing and using the corpus. The annotation scheme is meant to be largely compatible with the Penn historical corpora, and the guidelines here are written as a supplement to the Penn guidelines; consult Beatrice Santorini's guidelines for further information.


Go to the download page to download the corpus. The corpus is released under a free and open source license (LGPL), and there is no registration wall. The current release is version 0.1, a preview release of about 31,000 words. We recommend using released versions so that results can be replicated; between releases, you can follow development on GitHub.

Annotation guidelines



Citation for the version 0.1 preview release of July 1st, 2010:

Wallenberg, Joel C., Anton Karl Ingason, Einar Freyr Sigurðsson and Eiríkur Rögnvaldsson. 2010. Icelandic Parsed Historical Corpus (IcePaHC). Version 0.1.

General information

Annotation team stuff:


Treebank team:


The project is funded in part by the following grants:

  • From the Icelandic Research Fund (RANNÍS), grant no. 090662011, Viable Language Technology beyond English – Icelandic as a test case.
  • From the U.S. National Science Foundation (NSF) International Research Fellowship Program (IRFP), grant no. OISE-0853114, Evolution of Language Systems: a comparative study of grammatical change in Icelandic and English.