Open Collaboration for Developing and Using Asian Language Treebank

The ASEAN Economic Community has a population of over 600 million people who speak many different languages. Natural language processing (NLP) for this region must therefore cope with many languages.

State-of-the-art NLP technologies are based on treebanks. A treebank is a collection of natural language texts annotated with linguistic knowledge. The basic annotations in a treebank are word segmentation, part-of-speech (POS) tagging, and syntactic parsing. Almost all NLP research and tools are based on treebanks in a broad sense.
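As an illustration only, the three annotation layers can be seen in a single bracketed parse tree. The sketch below assumes a Penn-style bracketed notation and an invented English sentence; neither is taken from ALT, and the helper names `parse_bracketed` and `tagged_words` are hypothetical.

```python
def parse_bracketed(text):
    """Parse a Penn-style bracketed string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "each constituent must start with '('"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())      # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read()


def tagged_words(tree):
    """Collect (word, POS) pairs from the pre-terminal nodes, left to right.

    Assumes every word sits alone under its POS label, as in the example below.
    """
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        pairs.extend(tagged_words(child))
    return pairs


# Invented example sentence (an assumption, not ALT data).
EXAMPLE = "(S (NP (DT The) (NN treebank)) (VP (VBZ covers) (NP (JJ many) (NNS languages))))"

tree = parse_bracketed(EXAMPLE)           # parsing layer: the nested structure
pairs = tagged_words(tree)                # POS layer: one tag per word
words = [word for word, _ in pairs]       # segmentation layer: the word sequence
```

Here the list of leaves gives the word segmentation, the label directly above each leaf gives its POS tag, and the nesting of the remaining labels gives the parse.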

The main problem in creating a treebank is that it requires a great deal of linguistic knowledge about the language. As a result, existing treebanks are limited in size, annotation types, and languages covered. In particular, there are no publicly available treebanks for most Asian languages.

Against this background, we propose this project to develop the Asian Language Treebank (ALT). The objective of ALT is to develop a parallel treebank for Asian languages. ASEAN IVO is an ideal organization for developing ALT because it consists of top-level NLP research institutes for Asian languages. Without ASEAN IVO, it would be impossible to cooperate and cover the main Asian languages when building treebanks.


Project Theme
  • ICT Solutions to the Challenges surrounding Urbanization
  • Social Renovation in Rural Areas and/or Urban Areas

Leveraged Resources and Participants

The development of ALT has already started. NICT and UCSY started building Japanese, English, and Myanmar treebanks in FY 2015. NICT has also finished translating 20,000 English sentences (from Wikinews) into Indonesian, Vietnamese, Thai, Khmer, Lao, Malay, and Filipino.

In this project, BPPT, I2R, IOIT, NIPTICT, UCSY, and NICT will develop ALT for the Indonesian, Malay, Vietnamese, Khmer, Myanmar, and Japanese languages, respectively (NICT will also develop the English ALT). These treebanks will be built from the already translated Wikinews sentences. Once developed, ALT will be used to build NLP tools within this project.

The members of this project are as follows:

  • BPPT
    • Dr. Michael Purwoadi, Director of the ICT Center, who oversees the intelligent computing and language technology activities at BPPT
    • Gunarso, leader of the Language Technology working group
    • Dr. Teduh Uliniansyah, researcher in the Language Technology working group
  • I2R
    • Ms Aw Ai Ti, a senior researcher at I2R, who is an expert in NLP and machine translation
    • Ms Sharifah Mahani Aljunied, a researcher at I2R, who is an expert in NLP and Malay linguistics
  • IOIT
    • Vu Tat Thang, PhD.
    • Luong Chi Mai, Assoc. Prof., PhD.
    • Nguyen Phuong Thais, Assoc. Prof., PhD.
  • NIPTICT
    • Mr. Rapid Sun, director of the Research and Development Center, who supervises NLP projects
    • Mr. Vichet Chea, a researcher at NIPTICT, who is an expert in NLP and machine translation
  • UCSY
    • Dr. Khin Mar Soe, a professor at the NLP lab, UCSY, who is currently doing research in NLP and machine translation
    • Dr. Khin Thandar Nwet, a researcher at the NLP lab, UCSY, who is currently doing research in NLP and machine translation
  • NICT
    • Dr. Masao Utiyama, a senior researcher at NICT, who is an expert in NLP and machine translation
    • Dr. Chenchen Ding, a researcher at NICT, who is an expert in NLP and machine translation

For more information: Asian Language Treebank (ALT) Project
