Research

Automatic Text Summarization

Under construction...

Acromine

Acronyms result from a highly productive type of term variation that replaces fully expanded terms (e.g., retinoic acid receptor alpha) with shortened term forms (e.g., RARA). No generic rules or exact patterns have been established for acronym creation, and acronyms often appear in documents without the expanded form explicitly stated. Thus, an acronym dictionary is necessary for advanced text-mining tasks to establish associations between acronyms and their expanded forms.

Acromine is a system for building a high-quality acronym dictionary from running text. Treating a word sequence that frequently co-occurs with a parenthetical expression as a potential expanded form, Acromine identifies acronym definitions in a manner similar to statistical term recognition. Applied to the whole of MEDLINE (7,811,582 abstracts) as of March 2006, Acromine extracted 920,425 acronym candidates and recognized 157,803 expanded forms in a reasonable time (ca. 12 hours on a personal computer). The system achieves 99% precision and 82–95% recall on our evaluation corpus, which roughly emulates the whole of MEDLINE.
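
To make the co-occurrence idea concrete, the following is a minimal sketch in Python and not Acromine's actual implementation. It collects the word sequences that directly precede a parenthetical acronym and counts how often each one occurs. The names `PAREN`, `candidate_expansions`, and `max_words`, the regular expression, and the whitespace tokenization are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed pattern: a short alphanumeric token in parentheses, e.g. "(RARA)".
PAREN = re.compile(r"\(([A-Za-z0-9-]{2,10})\)")


def candidate_expansions(sentences, max_words=5):
    """Count word sequences that directly precede each parenthetical acronym.

    Returns a dict mapping an acronym to a Counter of candidate expanded forms.
    """
    counts = {}
    for sentence in sentences:
        for match in PAREN.finditer(sentence):
            acronym = match.group(1)
            # Keep only tokens with at least one uppercase letter (assumption).
            if not any(ch.isupper() for ch in acronym):
                continue
            # Naive tokenization: lowercase the left context, split on whitespace.
            words = sentence[:match.start()].lower().split()
            # Every suffix of the left context, up to max_words, is a candidate.
            for n in range(1, min(max_words, len(words)) + 1):
                phrase = " ".join(words[-n:])
                counts.setdefault(acronym, Counter())[phrase] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "The retinoic acid receptor alpha (RARA) gene was studied.",
        "Expression of retinoic acid receptor alpha (RARA) increased.",
    ]
    for phrase, freq in candidate_expansions(sample)["RARA"].most_common():
        print(freq, phrase)
```

Raw counts alone cannot separate the full expanded form from its nested fragments: in the sample above, "alpha" occurs exactly as often as "retinoic acid receptor alpha". Ranking such nested candidates is the part that a statistical term recognition score would have to handle, and this sketch does not attempt it.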

Acromine Web site