Summary of my research :-)
Sentiment Analysis and Opinion Mining
Currently working on improving sentiment analysis by combining sentiment lexicons with linguistic knowledge, such as modality and negation analysis. Contributed (a little bit) to the Hedonometer project, which measures daily happiness on Twitter.
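The following minimal Python sketch shows one way a sentiment lexicon can be combined with negation and modality cues. It is not the method used in this research or in the Hedonometer. The toy lexicon, the cue lists, the three-token scope window and the 0.5 damping factor are all hypothetical values chosen for illustration.

```python
"""Minimal lexicon-based sentiment scorer with negation and modality handling.

LEXICON, the cue lists, SCOPE and MODAL_DAMPING are hypothetical
illustration values, not taken from any real resource.
"""

# Hypothetical toy polarity lexicon: word -> prior polarity score.
LEXICON = {"good": 1.0, "great": 2.0, "happy": 1.5, "bad": -1.0, "terrible": -2.0}

# Hypothetical cue lists.
NEGATION_CUES = {"not", "no", "never"}
MODAL_CUES = {"might", "may", "could", "perhaps"}

SCOPE = 3            # a cue affects at most the next 3 tokens (assumption)
MODAL_DAMPING = 0.5  # hedged statements count half (assumption)


def score(tokens: list[str]) -> float:
    """Sum lexicon polarities, flipping negated words and damping modal ones."""
    total = 0.0
    negate_left = 0  # tokens remaining in the current negation scope
    modal_left = 0   # tokens remaining in the current modality scope
    for tok in tokens:
        word = tok.lower()
        if word in NEGATION_CUES:
            negate_left = SCOPE
            continue
        if word in MODAL_CUES:
            modal_left = SCOPE
            continue
        value = LEXICON.get(word, 0.0)
        if negate_left > 0:
            value = -value  # negation flips the prior polarity
            negate_left -= 1
        if modal_left > 0:
            value *= MODAL_DAMPING  # modality weakens the claim
            modal_left -= 1
        total += value
    return total
```

For example, `score("the movie was not good".split())` returns `-1.0`, and `score("it might be great".split())` returns `1.0` because the modal cue halves the score of "great". A fixed token window is the crudest possible scope model; a syntactic parse would give more accurate negation scopes.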
Scientometrics of Linguistics
Just for fun, I'm working on an analysis of the authorship network in (formal) linguistic publications to identify the main hubs, topics and researchers.
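One simple way to look for hubs in an authorship network is to build a co-authorship graph and rank authors by degree, meaning the number of distinct co-authors. The sketch below assumes each publication is given as a list of author names. The sample data and the choice of degree centrality are assumptions for illustration; other measures, such as betweenness centrality, may single out different hubs.

```python
from itertools import combinations


def coauthor_graph(papers: list[list[str]]) -> dict[str, set[str]]:
    """Undirected co-authorship graph as an adjacency-set dictionary."""
    graph: dict[str, set[str]] = {}
    for authors in papers:
        names = sorted(set(authors))  # ignore duplicate names on one paper
        for name in names:
            graph.setdefault(name, set())  # single-author papers still add a node
        for a, b in combinations(names, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def top_hubs(papers: list[list[str]], k: int = 3) -> list[tuple[str, int]]:
    """Rank authors by degree (distinct co-authors), ties broken alphabetically."""
    graph = coauthor_graph(papers)
    ranked = sorted(graph, key=lambda name: (-len(graph[name]), name))
    return [(name, len(graph[name])) for name in ranked[:k]]


# Hypothetical example data: each inner list is one paper's author list.
papers = [["A", "B", "C"], ["A", "D"], ["B", "C"], ["E"]]
print(top_hubs(papers))  # [('A', 3), ('B', 2), ('C', 2)]
```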
Study of language change and code-switching through Persian blogs.
Women bloggers in Iran
Investigating the formation of blogging communities by Iranian women through Social Network Analysis. Detecting the main trends and issues discussed online, identifying sentiment on those topics, investigating the use of rhetoric and metaphors, and studying in-group language features.
Tajiki Persian Machine Translation
A finite-state transducer that converts Tajiki Persian text (in Cyrillic) to Iranian Persian script (Perso-Arabic) and runs the resulting transliterated document through an existing Persian-to-English MT system. We use this strategy for the rapid prototyping of MT for the low-resource Tajiki language.
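The transliteration step can be pictured as a set of context-sensitive rewrite rules applied left to right. The sketch below is a toy stand-in, not the project's transducer: it uses a small hypothetical rule table, marks word boundaries with `#`, and applies the longest matching rule at each position. It deliberately ignores real ambiguities (for example, Cyrillic с can correspond to س, ص or ث, and short vowels are usually unwritten in Perso-Arabic script), which a full finite-state transducer would have to handle.

```python
"""Toy Tajik Cyrillic -> Perso-Arabic transliterator (illustrative rules only)."""

# Hypothetical rule table. "#" marks a word boundary, so "#о" only
# matches a word-initial о. Longer matches take priority.
RULES = {
    # word-initial vowels are written with alef
    "#а": "ا", "#о": "آ", "#и": "ا",
    # medial vowels: short a/i left unwritten (и is ambiguous: it can
    # also be a long ī, which this toy rule set gets wrong)
    "а": "", "и": "", "о": "ا", "у": "و", "ӯ": "و", "ӣ": "ی",
    # consonants
    "б": "ب", "п": "پ", "т": "ت", "ҷ": "ج", "ч": "چ", "х": "خ",
    "д": "د", "р": "ر", "з": "ز", "ж": "ژ", "с": "س", "ш": "ش",
    "ғ": "غ", "ф": "ف", "қ": "ق", "к": "ک", "г": "گ", "л": "ل",
    "м": "م", "н": "ن", "в": "و", "ҳ": "ه", "й": "ی",
}
MAX_RULE_LEN = max(len(src) for src in RULES)


def transliterate_word(word: str) -> str:
    padded = "#" + word.lower() + "#"
    out = []
    i = 0
    while i < len(padded):
        # Try the longest rule that matches at position i.
        for length in range(min(MAX_RULE_LEN, len(padded) - i), 0, -1):
            chunk = padded[i:i + length]
            if chunk in RULES:
                out.append(RULES[chunk])
                i += length
                break
        else:
            # No rule matched: drop boundary markers, pass other characters through.
            if padded[i] != "#":
                out.append(padded[i])
            i += 1
    return "".join(out)


def transliterate(text: str) -> str:
    return " ".join(transliterate_word(w) for w in text.split())
```

Under these toy rules, `transliterate("китоб об")` yields کتاب آب ("book water"), while a word such as тоҷикӣ comes out without its long ī, which shows why context and lexical knowledge matter in the real system.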
Past Conference Organizations
- Third Workshop on Computational Approaches to Arabic Script-based Languages; MT Summit XII, Ottawa, August 2009
- International Conference on Complex Predicates in Iranian Languages; Paris, July 2008
Shiraz Project (1997-1999)
I was the computational linguist responsible for developing the Shiraz machine translation system at the Computing Research Lab (CRL) at New Mexico State University. Shiraz was an MT prototype, developed at CRL, that translated Persian text into English and used typed feature structures and an underlying unification-based formalism to describe Persian linguistic phenomena. It used an electronic bilingual Persian-English dictionary of approximately 50,000 terms, a complete morphological analyzer, a syntactic parser, and transfer and generation modules. The system components were tested on a bilingual tagged corpus developed from a large Persian corpus of online material (approximately 10 MB). The system was mainly targeted at translating news material.
Coverage: The system performed tokenization and full morphological analysis, and it also recognized compounds and light verbs. The syntactic parser could analyze noun phrases (including relative clauses), prepositional phrases, and basic sentential constructions. The resulting feature structures were transferred into English syntax, and morphological generation was performed on the final translations. The dictionary was built by a team of Persian lexicographers and included single words, compounds, and phrasal expressions. It contained information about the orthography, morphosyntactic category, and syntactic properties of lexical items, as well as their English word-sense equivalents.
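The core operation of a unification-based formalism can be sketched in a few lines. The version below is a simplification of what Shiraz used: feature structures are plain nested dictionaries with string atoms, and the sketch omits types and structure sharing (reentrancy). The feature names and values in the example are hypothetical.

```python
from typing import Optional, Union

# A feature structure is either an atomic value or a dict of features.
FeatureStructure = Union[str, dict]


def unify(a: FeatureStructure, b: FeatureStructure) -> Optional[FeatureStructure]:
    """Return the unification of a and b, or None if they clash.

    Inputs are not modified; when both are dicts the result is a new dict.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, b_value in b.items():
            if feature in result:
                merged = unify(result[feature], b_value)
                if merged is None:
                    return None  # conflicting values for a shared feature
                result[feature] = merged
            else:
                result[feature] = b_value  # feature only in b: just copy it
        return result
    # Atoms (or an atom vs. a structure) unify only if they are identical.
    return a if a == b else None


# Hypothetical example: a singular noun meets an agreement requirement.
noun = {"cat": "N", "agr": {"num": "sg"}}
print(unify(noun, {"agr": {"num": "sg", "pers": "3"}}))
# {'cat': 'N', 'agr': {'num': 'sg', 'pers': '3'}}
print(unify(noun, {"agr": {"num": "pl"}}))  # None
```

Unification succeeds by merging compatible information and fails on any clash, which is what lets a grammar state constraints such as agreement declaratively.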
Detailed write-ups of the Shiraz project can be found in the technical reports section of the publications page. However, CRL no longer exists, and the project components are not available.