X Summer School

Achievements and Applications
of Contemporary Informatics, Mathematics and Physics
August 4-18, 2015, Kyiv (Ukraine)


Creation of a Ukrainian-language NER system

Abstract:

Named-entity recognition (NER), also known as entity identification, entity chunking, or entity extraction, is a subtask of information extraction that seeks to locate named elements in text and classify them into predefined categories such as names of persons, organizations, and locations, time expressions, quantities, monetary values, and percentages.
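
To make the task concrete, here is a minimal sketch of what NER output can look like. It assumes the common BIO labelling scheme (B- starts an entity, I- continues it, O is outside any entity). The scheme, the PER/LOC category names, the example sentence, and the helper bio_to_spans are illustrative assumptions, not something this project prescribes.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity text, category) pairs."""
    spans = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # Close the entity in progress (if any) and open a new one.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            # Continuation of the entity in progress.
            current_tokens.append(token)
        else:
            # "O" tag: close the entity in progress, if any.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Hypothetical hand-labelled sentence: "Taras Shevchenko was born in the village of Moryntsi."
tokens = ["Тарас", "Шевченко", "народився", "в", "селі", "Моринці", "."]
tags = ["B-PER", "I-PER", "O", "O", "O", "B-LOC", "O"]

entities = bio_to_spans(tokens, tags)
print(entities)
```

A NER system has to predict the tags itself; the helper only shows how per-token labels map back to the located and classified entities.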

NER is a popular NLP task, and the main challenge in building a robust NER system is access to a sufficiently large corpus of annotated data. Such data is not available for every language, Ukrainian in particular, but there is potential in unsupervised and semi-supervised approaches.

We will use an unannotated Ukrainian-language corpus (https://github.com/mariana-scorp/lt-project) as a starting point. We will need to develop some of our own datasets and annotations, and either adapt one of the existing NER algorithms or come up with our own variation.
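
As one illustration of how unannotated text plus a small amount of hand-made data might be combined, here is a sketch of a single round of seed-based bootstrapping, a classic semi-supervised idea. It is not the project's chosen algorithm. The function bootstrap_names, the toy sentences, the seed list, and the min_count threshold are all hypothetical, and the sketch ignores Ukrainian inflection, which a real system would have to handle (for example through lemmatization).

```python
from collections import Counter
from typing import List, Set


def bootstrap_names(sentences: List[List[str]], seeds: Set[str],
                    min_count: int = 2) -> Set[str]:
    """One bootstrapping round over tokenized, unannotated sentences.

    Step 1 learns which words tend to directly precede known seed names.
    Step 2 harvests capitalized words that follow those trusted contexts.
    """
    # Step 1: count the left-context word of every seed occurrence.
    contexts = Counter()
    for sent in sentences:
        for prev, word in zip(sent, sent[1:]):
            if word in seeds:
                contexts[prev.lower()] += 1
    trusted = {ctx for ctx, n in contexts.items() if n >= min_count}

    # Step 2: collect new capitalized words seen after a trusted context.
    found = set()
    for sent in sentences:
        for prev, word in zip(sent, sent[1:]):
            if prev.lower() in trusted and word[:1].isupper() and word not in seeds:
                found.add(word)
    return found


# Toy stand-in for an unannotated corpus (already tokenized).
corpus = [
    ["Ми", "відвідали", "місто", "Київ"],
    ["Вони", "люблять", "місто", "Львів"],
    ["Я", "бачив", "місто", "Харків"],
    ["Це", "велике", "місто"],
]
seed_locations = {"Київ", "Львів"}  # small hand-made seed list

new_names = bootstrap_names(corpus, seed_locations)
print(new_names)
```

In practice the newly found names would be reviewed or scored, added to the seed set, and the round repeated. The noise this loop accumulates is one reason some hand annotation is still needed for evaluation.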

Recommended reading:

  • Coursera NLP course - Week 4, Named entity recognition and Maximum Entropy Sequence Models
  • A survey of named entity recognition and classification
  • Learning a Part-of-Speech Tagger from Two Hours of Annotation
  • Design Challenges and Misconceptions in Named Entity Recognition - advanced

Project prerequisites:

  • Basics of natural language processing (ready to present)
  • Basics of machine learning
    • Linear classification models
    • Semi-supervised and unsupervised ML approaches
  • Working with text corpora (ready to present)
  • Programming language: Python

Associated topics:

natural language processing, semi-supervised and unsupervised machine learning

Planned lectures:

  • Basics of NLP
  • Working with text corpora

About the lecturer:

Mr. Vsevolod Dyomkin,
Grammarly Inc.


Copyright © 2006—2015 Student Science Association
