Difference between revisions of "Icelandic Parsed Historical Corpus (IcePaHC)"
Revision as of 15:10, 4 October 2010
This is the wiki for the Icelandic Parsed Historical Corpus (IcePaHC), also called the Icelandic Treebank. It mainly documents the annotation standard for those constructing and using the corpus. The annotation scheme is meant to be largely compatible with the Penn historical corpora, and the guidelines here are written as a supplement to the Penn guidelines; see Beatrice Santorini's guidelines for further information.
Download
You can download the corpus from the download page. The corpus is released under a free and open source license (LGPL), and there is no registration wall. The current release is version 0.2, a preview release of about 120,000 words from six centuries (12th, 13th, 16th, 17th, 18th and 19th). We recommend using released versions so that results can be replicated, but between releases you can follow development on GitHub.
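As a rough illustration of working with a downloaded release, the sketch below reads one tree written in the labeled-bracketing style used by the Penn historical corpora. It assumes the released files follow that style. The function names `parse_tree` and `leaves` and the toy sentence are invented for this example and are not part of the corpus or its tools; consult the annotation guidelines for the actual tagset.

```python
import re


def parse_tree(text):
    """Parse one labeled-bracketing tree into nested (label, children) tuples."""
    # Tokens are opening brackets, closing brackets, or runs of other characters.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "a node must start with an opening bracket"
        pos += 1
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word under a tag
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return node()


def leaves(tree):
    """Return (tag, word) pairs in sentence order."""
    label, children = tree
    pairs = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(leaves(child))
        else:
            pairs.append((label, child))
    return pairs


# Hypothetical toy tree, written for this example only (not taken from the corpus).
SAMPLE = "(IP-MAT (NP-SBJ (PRO-N hann)) (VBDI kom) (. .))"
tree = parse_tree(SAMPLE)
tagged_words = leaves(tree)
```

Here `tagged_words` holds one (tag, word) pair per terminal, which is usually enough for simple frequency counts. Structural queries are better served by dedicated treebank search tools.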
Annotation guidelines
- Phrase Types
- Head Types
- Conjunction
- Tagset
- Clause level constituents
- Treatment of individual words
- Empty categories
- Lemmatization
- Splitting and joining words
- Index
PPCME/PPCEME documentation: ling.upenn.edu/~beatrice/annotation
Citation
For the version 0.2 release of October 1, 2010:
Wallenberg, Joel C., Anton Karl Ingason, Einar Freyr Sigurðsson and Eiríkur Rögnvaldsson. 2010. Icelandic Parsed Historical Corpus (IcePaHC). Version 0.2. http://www.linguist.is/icelandic_treebank
General information
- Icelandic Syntax Phenomena
- Syntactic definitions
- English-Icelandic Translations of Linguistic Terminology
- Texts
- Genres
Annotation team stuff:
- Annotation Issues ...
- Annotation Process
- Checklist
- MediaWiki Formatting Guide
Resources
- Icelandic Resources for doing Computational Linguistics and Natural Language Processing
- Treebank Resources (language independent)
- Penn Parsed Corpora of Historical English
- Parsed Corpora for other languages
Treebank team:
- Joel Wallenberg (PI) (joel.wallenberg@gmail.com)
- Eiríkur Rögnvaldsson (PI) (eirikur@hi.is)
- Anton Karl Ingason (anton.karl.ingason@gmail.com)
- Einar Freyr Sigurðsson (einarfs@gmail.com)
- Brynhildur Stefánsdóttir (BA research assistant)
- Hulda Óladóttir (BA research assistant)
Grants
The project is funded in part by the following grants:
- From the Icelandic Research Fund (RANNÍS), grant no. 090662011, Viable Language Technology beyond English – Icelandic as a test case.
- From the U.S. National Science Foundation (NSF) International Research Fellowship Program (IRFP), grant #OISE-0853114, Evolution of Language Systems: a comparative study of grammatical change in Icelandic and English.
- From the University of Iceland Research Fund (Rannsóknasjóður Háskóla Íslands), grant Icelandic Diachronic Treebank (Sögulegur íslenskur trjábanki).