Unsupervised Language Learning
Course duration: 10-12 h
Prerequisites: general knowledge of Algorithms and Data Structures, Statistics
Natural language processing consists of encoding knowledge in formal models. Besides encoding explicit language-specific knowledge (e.g. that Ukrainian “маленька парасолька” means “small umbrella” in English), one can also encode meta-knowledge: assumptions about what kind of knowledge exists (e.g. “phrases in Ukrainian have corresponding translations in English”). These assumptions can then be used (e.g. “phrases and their translations are likely to co-occur”) to automatically discover language-specific knowledge in recorded examples of language use (text corpora), without having to specify it by hand. This approach is called unsupervised language learning, and it is the central topic of this course.
Topics:
- unsupervised morphological segmentation
- unsupervised parsing
- word and phrase alignment
- statistical machine translation
- text clustering
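The co-occurrence assumption from the description above ("phrases and their translations are likely to co-occur") can be illustrated with a minimal sketch. It is not taken from the course materials: the four-sentence parallel corpus is invented for illustration, the function name `dice_lexicon` is hypothetical, and the Dice coefficient is just one simple association score chosen here, whereas real word alignment typically uses probabilistic models trained on far more data.

```python
from collections import Counter
from itertools import product

# Toy Ukrainian-English parallel corpus, invented for illustration only.
corpus = [
    ("маленька парасолька", "small umbrella"),
    ("маленька книга", "small book"),
    ("червона парасолька", "red umbrella"),
    ("червона книга", "red book"),
]


def dice_lexicon(pairs):
    """Map each source word to its best-scoring target word (hypothetical helper).

    Score: Dice coefficient 2 * c(s, t) / (c(s) + c(t)), where the counts
    are the numbers of sentence pairs containing the word(s).
    """
    src_count, tgt_count, joint_count = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        # Every source word co-occurs with every target word of the pair.
        joint_count.update(product(src_words, tgt_words))

    lexicon = {}
    for (s, t), c in joint_count.items():
        score = 2 * c / (src_count[s] + tgt_count[t])
        if s not in lexicon or score > lexicon[s][1]:
            lexicon[s] = (t, score)
    return lexicon


lexicon = dice_lexicon(corpus)
for s, (t, score) in sorted(lexicon.items()):
    print(s, "->", t, round(score, 2))
```

No translation is given to the program explicitly. The pairing of "маленька" with "small" emerges only because the two words appear in exactly the same sentence pairs, which is the kind of meta-knowledge the course description refers to.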
Dr. Mark Fishel
Place of employment: Post-doc, Institute of Computational Linguistics, University of Zurich
Curriculum Vitae: PhD in Computer Science from the University of Tartu, currently a post-doc in statistical machine translation at the University of Zurich.
Areas of interest: machine translation, machine learning, computational natural language learning