
  • About us
    • What is AACIMP?
    • Organizers
    • Contact us
    • Kyiv
  • Partners
    • How to get involved
    • Financial
    • Media
    • Corporate Sponsorship
    • Travel
    • In kind
  • Program
    • Smart Cities
    • 3D Printing
    • Computational Neuroscience
    • Applied Computer Science
    • Poster Session
    • Plenary Session
    • Student Projects
    • AACIMP 2013
    • AACIMP 2014
  • Participation
    • General schedule
    • How to apply?
    • Eligible students
    • Student responsibilities
  • Tuition
    • Registration fees
    • How to pay?
    • Discounts & scholarships
  • Housing
    • Accommodation
    • Dining services
    • Travel information
    • Visa information
  • Impressions
    • Photo memories
    • Video memories
    • Organizers impressions
    • Alumni opinions
  • Promotion
  • FAQ

X Summer School

Achievements and Applications of Contemporary Informatics, Mathematics and Physics
August 4-18, 2015, Kyiv (Ukraine)



Normalization of Noisy Text

Abstract:

This project is based on the ACL 2015 Shared Task (see http://noisy-text.github.io/norm-shared-task.html). User-generated content (UGC), such as the text of Twitter messages, is notoriously varied in content and composition: it often contains ungrammatical sentence structures, non-standard words and domain-specific entities. The accuracy of many NLP tasks has been observed to drop on UGC data, which motivates methods that normalise the content before NLP tools are applied to it.

This task focuses on text normalisation: converting non-standard words in English Twitter messages to their canonical forms. In particular, we aim to correct non-standard spellings (e.g., toook for took), expand informal abbreviations (e.g., tmrw for tomorrow), and normalise phonetic substitutions (e.g., 4eva for forever).
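To make the three kinds of correction concrete, here is a minimal Python sketch of a common baseline; it is not the shared-task reference system or the lecturer's method. A lexicon of known noisy-to-canonical pairs handles abbreviations and phonetic substitutions, an edit-distance search over a vocabulary handles misspellings, and @mentions, #hashtags and URLs are left untouched. The toy lexicon and vocabulary, the `max_dist=1` threshold, the entity-skipping rule and all function names are illustrative assumptions; a real system would derive the lexicon and vocabulary from the shared-task training data and a large English word list.

```python
import re
from collections import Counter

# Assumption: @mentions, #hashtags and URLs are domain-specific entities
# that should be passed through unchanged.
ENTITY_RE = re.compile(r"^(@\w+|#\w+|https?://\S+)$")


def build_lexicon(pairs):
    """Map each noisy token to its most frequent canonical form.

    `pairs` is an iterable of (noisy, canonical) token pairs, e.g. taken
    from aligned training tweets (hypothetical input format).
    """
    counts = {}
    for noisy, canonical in pairs:
        counts.setdefault(noisy.lower(), Counter())[canonical] += 1
    return {noisy: c.most_common(1)[0][0] for noisy, c in counts.items()}


def edit_distance(a, b):
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def normalise_token(token, lexicon, vocab, max_dist=1):
    """Return the canonical form of `token`, or the token unchanged."""
    low = token.lower()
    if ENTITY_RE.match(token) or low in vocab:
        return token  # entity, or already a standard word
    if low in lexicon:
        return lexicon[low]  # known abbreviation or phonetic substitution
    # Spelling fallback: closest vocabulary word (ties broken alphabetically).
    best = min(vocab, key=lambda w: (edit_distance(low, w), w), default=None)
    if best is not None and edit_distance(low, best) <= max_dist:
        return best
    return token


def normalise(tokens, lexicon, vocab):
    return [normalise_token(t, lexicon, vocab) for t in tokens]


if __name__ == "__main__":
    lexicon = build_lexicon([("tmrw", "tomorrow"), ("4eva", "forever"), ("u", "you")])
    vocab = {"i", "took", "the", "bus", "tomorrow", "forever", "you", "see", "love"}
    print(normalise(["i", "toook", "the", "bus", "tmrw", "@bob", "4eva"], lexicon, vocab))
    # ['i', 'took', 'the', 'bus', 'tomorrow', '@bob', 'forever']
```

The lexicon is consulted before the edit-distance fallback because abbreviations such as tmrw are many edits away from their canonical forms, while near-miss spellings such as toook are only one edit away. Scanning the whole vocabulary costs O(|V|) distance computations per token, so a practical system would index candidate words instead.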

Recommended reading:

  • Full shared task description
  • CMU Tweet NLP

Project prerequisites:

  • Basics of natural language processing (can be covered in a lecture)
  • Basics of machine learning
  • Working with text corpora (can be covered in a lecture)
  • Programming language: Python

Associated topics:

natural language processing, machine learning

Planned lectures:

  • Basics of NLP
  • Working with text corpora

About lecturer:

Mr. Vsevolod Dyomkin,
Grammarly Inc.


Copyright © 2006—2015 Student Science Association