Natural language processing

We conduct research that helps people through Natural Language Processing.

Natural language processing (NLP) is a subfield of linguistics, computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data.

Natural Language Processing is the driving force behind the following common applications:

  • Language translation applications such as Google Translate.
  • Word processors and writing tools, such as Microsoft Word and Grammarly, that employ NLP to check the grammatical accuracy of text.
  • Interactive Voice Response (IVR) applications used in call centers to respond to certain users’ requests.
  • Personal assistant applications such as OK Google, Siri, Cortana, and Alexa.

About Us

From the Department of Computer Science and Engineering, University of Moratuwa

Aaivu is an open-source organization founded by Dr. Uthayasanker Thayasivam at the Department of Computer Science and Engineering, University of Moratuwa, to enrich the lives of the communities that benefit from NLP.

Our objectives

  • We adapt NLP projects originally built for languages such as English to local languages such as Tamil and Sinhala.
  • We use techniques such as word embeddings to understand and analyse these languages, and we apply the results in useful ways.
  • We carry out data mining to support these efforts.
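
As a rough illustration of the word-embedding idea in the objectives above, the sketch below compares words by the cosine similarity of their vectors. The three-dimensional vectors and the helper names (`EMBEDDINGS`, `cosine_similarity`, `most_similar`) are invented for this example; real embeddings are learned from a corpus and are much larger.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# Real embeddings are learned from text and have many more dimensions.
EMBEDDINGS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, embeddings=EMBEDDINGS):
    """Return the other word whose vector is closest to that of `word`."""
    target = embeddings[word]
    candidates = [w for w in embeddings if w != word]
    return max(candidates, key=lambda w: cosine_similarity(target, embeddings[w]))
```

With these toy vectors, `most_similar("king")` picks "queen", because their vectors point in nearly the same direction.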


Tesseract Quality Checker API

This API checks whether a captured image is of sufficient quality for Tesseract OCR. Tesseract OCR works best when the foreground text is clearly segmented from the background. It is easier to instruct the user to capture a best-fit image than to preprocess poor images afterwards.
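
To make the foreground/background idea concrete, here is a minimal sketch of one possible quality heuristic. It is not the actual API: the function names, the grayscale list-of-rows input format, and the threshold of 100 are all assumptions chosen for illustration.

```python
def foreground_background_contrast(pixels):
    """Estimate text/background separation for a grayscale image.

    `pixels` is a list of rows, each a list of ints in 0..255.
    The pixels are split at the mean intensity, and the result is the
    gap between the average bright pixel and the average dark pixel.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    dark = [p for p in flat if p <= mean]
    bright = [p for p in flat if p > mean]
    if not bright:
        return 0.0  # uniform image: nothing to separate
    return sum(bright) / len(bright) - sum(dark) / len(dark)


def is_good_capture(pixels, min_contrast=100.0):
    """Return True when the contrast gap reaches the (assumed) threshold."""
    return foreground_background_contrast(pixels) >= min_contrast
```

A caller could use `is_good_capture` to ask the user to retake a washed-out photo before any OCR is attempted.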

English to Sinhala Neural Machine Translation

This research develops an NMT system based on the Transformer architecture for the under-resourced, domain-specific English-to-Sinhala translation task. Translation quality is improved by exploring effective ways of incorporating Part-of-Speech (POS) information and subword techniques.
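
One possible way to combine POS information with subword units is to give every subword piece the POS tag of the word it came from. The sketch below shows only that idea; it is an assumption for illustration, not necessarily the method used in this research. The function name, the `|` separator, and the `@@` continuation marker in the example are hypothetical choices.

```python
def attach_pos_to_subwords(word_pieces, pos_tags, separator="|"):
    """Tag every subword piece with the POS tag of its parent word.

    word_pieces: one list of subword pieces per word,
                 e.g. [["trans@@", "lation"], ["works"]]
    pos_tags:    one POS tag per word, e.g. ["NOUN", "VERB"]
    """
    if len(word_pieces) != len(pos_tags):
        raise ValueError("need exactly one POS tag per word")
    tagged = []
    for pieces, tag in zip(word_pieces, pos_tags):
        for piece in pieces:
            tagged.append(f"{piece}{separator}{tag}")
    return tagged
```

For the example in the docstring, the result is `["trans@@|NOUN", "lation|NOUN", "works|VERB"]`.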

Dialogue policy optimization in a low-resource setting

Dialogue policy optimization for task-oriented conversational agents in a low-resource setting is an open research project. We have developed a novel approach to dialogue policy optimization using Reinforcement Learning. The methodology is based on self-play and a novel sampling technique that prioritizes failed dialogues over successful ones.
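
The idea of prioritizing failed dialogues can be sketched with simple weighted sampling. This is a generic illustration, not the project's sampling technique: the function name, the `"success"` key, and the default `failure_weight` of 3.0 are assumptions.

```python
import random


def sample_dialogues(dialogues, k, failure_weight=3.0, rng=None):
    """Sample k dialogues with replacement, favouring failed ones.

    Each dialogue is a dict with a boolean "success" key. Failed
    dialogues get weight `failure_weight`; successful ones get 1.0.
    """
    if rng is None:
        rng = random.Random()
    weights = [1.0 if d["success"] else failure_weight for d in dialogues]
    return rng.choices(dialogues, weights=weights, k=k)
```

With one failed and one successful dialogue and the default weight, roughly three out of four samples are expected to be the failed dialogue.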

Ride Hailing Simulation

A data-driven approach to model a simulation environment.

High School Strategizer

Data-driven instruction strategies for Sri Lankan high schools.

Semantic Table Interpretation

Semantic Table Interpretation is the use of external knowledge bases or ontologies to provide context to tabular data sources. Data usually loses its context when converted into tabular structures, and mapping the data back to its original context is non-trivial. Since tables are one of the most widespread data structures in use, the information loss is significant. In this project, we introduce two novel algorithms, ReleX and STEM, to derive meaningful relationships between table columns and to identify table entities in context using web ontologies.
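
As a toy illustration of giving context to a table column with a knowledge base, the sketch below guesses a column's class by majority vote over its cells. It is not ReleX or STEM; the miniature `TOY_KB` dictionary and the `annotate_column` helper are hypothetical and exist only for this example.

```python
from collections import Counter

# Hypothetical miniature knowledge base: entity label -> ontology class.
TOY_KB = {
    "Colombo": "City",
    "Kandy": "City",
    "Jaffna": "City",
    "Sri Lanka": "Country",
    "India": "Country",
}


def annotate_column(cells, kb=TOY_KB):
    """Guess a column's ontology class by majority vote over its cells.

    Cells that are missing from the knowledge base are ignored.
    Returns None when no cell matches.
    """
    votes = Counter(kb[cell] for cell in cells if cell in kb)
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

For a column containing "Colombo", "Kandy", and an unknown value, the vote yields "City".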

Tamizhi Net OCR

Tamizhi-Net OCR is a tool that extracts text from scanned PDFs and images. The system supports Tamil, Sinhala, and English.

How to contribute?

We welcome all open-source contributors, and we hope you will stay with us throughout our journey.


Visit our GitHub Introduction page to learn how to contribute and to pick up some interesting projects.


Mail us regarding any issues at

Talk forum

If you have any community-related questions, please ask in the Aaivu Talk forum.

Help Desk

Please reach us via this Help Desk form to request edit/write access to the documentation and write access on GitHub.

Our Pioneers

Frequently Asked Questions