Kyoto University Digitization Hub of the Humanities, Social and Cognitive Sciences (KUDH): Kyoto University International Conference on Digital Humanities, KUDH2021

Day 1 Session 1-a
Digital Corpus & Syntactic Annotation through Universal Dependencies

October 2
(Oct. 1 in the US)

“Digital Corpus and Syntactic Annotation through Universal Dependencies:

UD Treebanks for Coptic, Classical Chinese, Old Japanese, and Ainu”

Registration closes at 9:45 AM (JST)

Oct 2 9:50-10:00 Japan
(Oct 1 20:50-21:00 EDT)

Opening

Chigusa Kita and So Miyagawa

Oct 2 10:00-11:00 Japan
(Oct 1 21:00-22:00 EDT)

UD Treebanking for Coptic DH: Low Resource NLP Technologies for NER, Lexicography and Linked Open Data

Amir Zeldes (Georgetown University)

Abstract

The Universal Dependencies project, which provides morphosyntactically analyzed data in over 100 languages, offers homogeneous annotation schemes and workflows for both Big Data languages such as English and Low Resource languages often at the heart of Digital Humanities work. In this talk I will present work on a language from the latter group: Coptic, the language of first-millennium Egypt. Thanks to progress in NLP technologies and the development of UD annotated data, our project, Coptic Scriptorium (https://copticscriptorium.org/), has been able to create fully automatic tools for analyzing Coptic data, including morphological analysis, part-of-speech tagging, lemmatization, parsing and entity recognition. These analyses feed a suite of tools enabling Named Entity Linking to open data such as Wikipedia, as well as automatic generation of lexicographic examples and entity-type based Word Sense Disambiguation in an online dictionary. This work shows that a variety of technologies often assumed to be relevant mainly for Big Data languages, such as Deep Learning, Transformers (BERT) and more, can work well when even modest amounts of richly annotated UD data are available for bootstrapping.

Oct 2 11:00-12:00 Japan
(Oct 1 22:00-23:00 EDT)

UD for lzh (Classical Chinese) ojp (Old Japanese) and ain (Ainu)

Koichi Yasuoka (Kyoto University, Institute for Research in Humanities)

Discussion and Concluding Remarks
