Friday, September 9, 2016

Some useful NLP Python libraries

NLTK is my first port of call. It is the core library for all my NLP work, even production work.
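As a small taste of NLTK, here is a minimal sketch that tokenizes a sentence, stems the words and counts them. It deliberately uses `wordpunct_tokenize` (a regex tokenizer) and `PorterStemmer`, which need no extra data downloads; the sample sentence is made up for illustration.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

text = "Dogs are running. The dog runs and the cats run too!"

# Regex-based tokenizer: splits on word characters vs punctuation, no model files needed
tokens = [t.lower() for t in wordpunct_tokenize(text) if t.isalpha()]

# Reduce inflected forms to a common stem ("running", "runs", "run" -> "run")
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in tokens]

# Frequency distribution over the stems
freq = FreqDist(stems)
print(freq.most_common(3))
```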

TextBlob
This uses NLTK under the hood and provides a simpler, friendlier API on top of it.

Pattern This does not support Python 3 yet. It is a web mining module.

Gensim Topic modelling. Analyses plain-text documents for semantic structure. Good for unsupervised learning and topic modelling.

Useful for supervised learning and classifiers.
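The library for this entry isn't named, so purely as an illustration of supervised text classification, here is a sketch using NLTK's `NaiveBayesClassifier` instead (a stand-in, not necessarily the library meant). The four training sentences and the `features` helper are hypothetical.

```python
from nltk.classify import NaiveBayesClassifier

def features(text):
    # Hypothetical bag-of-words features: each lower-cased word present -> True
    return {word.lower(): True for word in text.split()}

# Tiny labelled training set (made-up examples)
train = [
    ("I love this great film", "pos"),
    ("what a wonderful fun movie", "pos"),
    ("an awful boring film", "neg"),
    ("I hated this terrible movie", "neg"),
]

classifier = NaiveBayesClassifier.train([(features(t), label) for t, label in train])

print(classifier.classify(features("great fun")))        # -> pos
print(classifier.classify(features("boring terrible")))  # -> neg
```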

MITIE A library for information extraction. Written in C++ but callable from R, Python, C, ...

A newish project with a good future. Vector models for text only.

I have left out the screen-scraping libraries such as Beautiful Soup and readability.

There are also wrappers for Stanford CoreNLP and for the Berkeley Parser. I don't think these parsers are free, though I might be wrong; I haven't used them. I remember one of them being free for research purposes only.
