Friday, September 9, 2016

Some useful NLP Python libraries

NLTK is my first port of call. It is the core of most NLP work in Python, even production work.
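As a rough illustration of the kind of everyday work NLTK covers, here is a minimal sketch of tokenizing, stemming and counting. The sample sentence is made up for the example, and I have picked pieces of NLTK that (as far as I know) need no extra corpus download.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize
from nltk.util import ngrams

# Made-up sample text, purely for illustration
text = "NLTK is the first port of call. NLTK covers tokenizing, stemming and counting."

# wordpunct_tokenize is regex-based, so it works without downloading any data
tokens = wordpunct_tokenize(text.lower())

# Reduce each alphabetic token to its Porter stem, dropping punctuation
stemmer = PorterStemmer()
stems = [stemmer.stem(tok) for tok in tokens if tok.isalpha()]

# Count stem frequencies and build adjacent stem pairs
freq = FreqDist(stems)
bigrams = list(ngrams(stems, 2))

print(freq.most_common(3))
print(bigrams[:3])
```

The fancier tokenizers and taggers generally need a one-off nltk.download() of their data first.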

TextBlob https://textblob.readthedocs.io/en/dev/
This uses NLTK under the hood and gives you a simpler API on top of it.

Pattern http://www.clips.ua.ac.be/pattern This does not support Python 3 yet. It is a web mining module.

Gensim http://radimrehurek.com/gensim/ Topic modelling. It analyses plain-text documents for semantic structure. Good for unsupervised learning such as topic modelling.

scikit-learn
Useful for supervised learning and classifiers.
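For text, the usual scikit-learn recipe is a vectorizer feeding a classifier. Below is a minimal sketch with a bag-of-words vectorizer and naive Bayes. The tiny training set and the "pos"/"neg" labels are invented for the example, and naive Bayes is just one reasonable choice of classifier.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented training data, purely for illustration
train_texts = [
    "great library, easy to use",
    "really useful and well documented",
    "broken install, terrible docs",
    "slow, buggy and hard to use",
]
train_labels = ["pos", "pos", "neg", "neg"]

# Word counts as features, multinomial naive Bayes as the classifier
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# Classify a new, unseen piece of text
prediction = model.predict(["useful and easy"])[0]
print(prediction)
```

Wrapping both steps in one pipeline means the same vectorizer vocabulary is used for training and prediction.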

MITIE https://github.com/mit-nlp/MITIE Library for information extraction. Written in C++ but callable from R, Python, C, ...

spaCy
A newish project with a good future. Vector models for text only.

I have left out the screen-scraping ones like Beautiful Soup and readability (https://pypi.python.org/pypi/readability-lxml).

There are wrappers for Stanford CoreNLP and also for the Berkeley Parser. I don't think that these parsers are free, though I might be wrong. I've not used them. I remember one of them being free for research purposes only.

