    UMLS-Interface Package

    This package provides a Perl interface to the Unified Medical Language System (UMLS). The UMLS is a knowledge representation framework designed to support broad-scope biomedical research queries. There are three major sources in the UMLS: the Metathesaurus, which is a taxonomy of medical concepts; the Semantic Network, which categorizes concepts in the Metathesaurus; and the SPECIALIST Lexicon, which contains a list of biomedical and general English terms used in the biomedical domain. The UMLS-Interface package is set up to access the Metathesaurus and the Semantic Network stored in a MySQL database.

    UMLS-Similarity Package

    This package is a suite of Perl modules that implement a number of semantic similarity and relatedness measures. The measures use the UMLS-Interface module to access the UMLS and generate similarity scores between concepts. Currently, this package includes programs that implement the path-based similarity measures described by Leacock & Chodorow (1998), Wu & Palmer (1994), Nguyen & Al-Mubaid (2006), and Rada et al. (1989), as well as a simple path-based measure; the information content measures described by Jiang & Conrath (1997), Resnik (1995), and Lin (1998); and the relatedness measures described by Lesk (1994) and Patwardhan (2003).
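
    To give a feel for what the path-based measures compute, here is a minimal Python sketch on a hypothetical toy taxonomy. It does not use UMLS data or the UMLS-Similarity API; the concept names, the PARENT table, and all function names are made up for illustration. The formulas are the commonly cited forms (path = 1 / number of nodes on the shortest path, Wu & Palmer = 2 * depth(LCS) / (depth(c1) + depth(c2)), Leacock & Chodorow = -log(path length / (2 * maximum depth))), and the package's exact conventions may differ.

```python
import math

# Hypothetical toy taxonomy (child -> parent); NOT real UMLS content.
PARENT = {
    "entity": None,
    "disease": "entity",
    "infection": "disease",
    "cardiovascular_disease": "disease",
    "pneumonia": "infection",
    "influenza": "infection",
    "heart_attack": "cardiovascular_disease",
}


def ancestors(concept):
    """Return the chain from the concept up to the root, inclusive."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(concept))


def lcs(a, b):
    """Least common subsumer: the deepest concept that subsumes both a and b."""
    b_chain = set(ancestors(b))
    for node in ancestors(a):  # walks upward, so the first hit is the deepest
        if node in b_chain:
            return node
    raise ValueError("concepts do not share a root")


def path_length(a, b):
    """Number of nodes on the shortest path between a and b through their LCS."""
    return depth(a) + depth(b) - 2 * depth(lcs(a, b)) + 1


def path_similarity(a, b):
    return 1.0 / path_length(a, b)


def wu_palmer(a, b):
    return 2.0 * depth(lcs(a, b)) / (depth(a) + depth(b))


def leacock_chodorow(a, b):
    max_depth = max(depth(c) for c in PARENT)
    return -math.log(path_length(a, b) / (2.0 * max_depth))


if __name__ == "__main__":
    for pair in [("pneumonia", "influenza"), ("pneumonia", "heart_attack")]:
        print(pair, path_similarity(*pair), wu_palmer(*pair), leacock_chodorow(*pair))
```

    In this sketch, sibling concepts under a deep common parent score higher than concepts that only meet near the root, which is the intuition all three measures share.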

    Ngram Statistics Package (NSP)

    This package is a suite of Perl modules that allows you to identify word and character Ngrams that appear in large corpora using standard tests of association such as Fisher's exact test, the log likelihood ratio, Pearson's chi-squared test, the Dice Coefficient, etc. NSP has been designed to allow a user to add their own tests with minimal effort.
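
    As an illustration of what a test of association computes for a bigram, the following Python sketch builds the 2x2 contingency counts for a word pair and derives the Dice coefficient and the log likelihood ratio from them. It is not NSP code and does not reproduce NSP's command-line tools or output format; the function names and the sample sentence are assumptions made for this example.

```python
import math
from collections import Counter


def bigram_counts(tokens):
    """Count adjacent word pairs plus how often each word is first or second."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    first = Counter()
    second = Counter()
    for (w1, w2), n in bigrams.items():
        first[w1] += n
        second[w2] += n
    return bigrams, first, second, sum(bigrams.values())


def dice(n11, n1p, np1):
    """Dice coefficient: 2 * joint count / (sum of the two marginal counts)."""
    return 2.0 * n11 / (n1p + np1)


def log_likelihood(n11, n1p, np1, npp):
    """G^2 statistic over the 2x2 contingency table of observed vs. expected counts."""
    observed = [n11, n1p - n11, np1 - n11, npp - n1p - np1 + n11]
    expected = [
        n1p * np1 / npp,
        n1p * (npp - np1) / npp,
        (npp - n1p) * np1 / npp,
        (npp - n1p) * (npp - np1) / npp,
    ]
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def score(tokens, w1, w2):
    """Return (Dice, G^2) for the bigram (w1, w2) in a token list."""
    bigrams, first, second, total = bigram_counts(tokens)
    n11, n1p, np1 = bigrams[(w1, w2)], first[w1], second[w2]
    return dice(n11, n1p, np1), log_likelihood(n11, n1p, np1, total)


if __name__ == "__main__":
    sample = "the cat sat on the mat the cat ran".split()
    print(score(sample, "the", "cat"))
```

    Any other measure based on the same four counts could be added as one more small function, in the same spirit as a user adding their own test.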


    Xapian-Bigrams is a software package written in Perl. It uses Xapian to find pairs of co-occurring words in text. Here is the readme file.


    Co-occurrence-Matrix is a software package written in Perl. It is designed to construct a co-occurrence matrix. Here is the readme file.
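
    For readers unfamiliar with the term, a co-occurrence matrix records how often each pair of words appears near each other. The Python sketch below shows one simple way to build such a matrix; it is not the Co-occurrence-Matrix package's code, and the window-based definition of co-occurrence, the function name, and the sample input are assumptions made for illustration.

```python
def cooccurrence_matrix(tokens, window=2):
    """Build a symmetric word-by-word count matrix.

    Two tokens co-occur when they are at most `window` positions apart.
    Returns (vocab, matrix), where matrix[i][j] is the count for
    vocab[i] and vocab[j].
    """
    vocab = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]

    for pos, word in enumerate(tokens):
        # Look only forward so each pair of positions is visited once.
        for other in tokens[pos + 1 : pos + 1 + window]:
            i, j = index[word], index[other]
            matrix[i][j] += 1
            if i != j:
                matrix[j][i] += 1  # keep the matrix symmetric
    return vocab, matrix


if __name__ == "__main__":
    vocab, matrix = cooccurrence_matrix("a b a c".split(), window=1)
    print(vocab)
    for row in matrix:
        print(row)
```

    A dense list of lists is fine for a small vocabulary; for large corpora a sparse representation would be the more practical choice.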