NLTK is my first port of call. It is the core library for most NLP work, even in production.
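To give a feel for the basics, here is a minimal sketch, assuming `nltk` is installed. It only uses parts of NLTK that need no extra corpus downloads (a regex-based tokenizer, the Porter stemmer and `FreqDist`); the sample text and variable names are made up for illustration.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Made-up sample text, purely for illustration.
text = "NLTK is my first port of call. NLTK covers tokenizing, stemming and tagging."

# Regex-based tokenizer: splits into runs of word characters and punctuation.
tokens = wordpunct_tokenize(text.lower())

# Keep alphabetic tokens only, dropping punctuation.
words = [t for t in tokens if t.isalpha()]

# Reduce each word to its stem with the Porter algorithm.
stemmer = PorterStemmer()
stems = [stemmer.stem(w) for w in words]

# Count how often each word occurs.
freq = FreqDist(words)

print(stems)
print(freq.most_common(3))
```

Taggers, parsers and the sentence tokenizer work in the same style, but most of them need data packages fetched with `nltk.download()` first.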
TextBlob - https://textblob.readthedocs.io/en/dev/
It uses NLTK under the hood and offers a simpler API on top of it.
Pattern - http://www.clips.ua.ac.be/pattern It does not support Python 3 yet. It is a web mining module.
Gensim - http://radimrehurek.com/gensim/ Topic modelling: it analyses plain-text documents for semantic structure. Good for unsupervised learning.
scikit-learn
Useful for supervised learning and classifiers.
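A minimal sketch of a supervised text classifier, assuming `scikit-learn` is installed. The training sentences and the "spam"/"ham" labels are invented for illustration, and bag-of-words counts with naive Bayes is just one common choice among many.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up training set: each text comes with a label.
train_texts = [
    "win money now",
    "cheap money offer",
    "win a free prize now",
    "meeting at noon tomorrow",
    "lunch with the team tomorrow",
    "project meeting notes",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Pipeline: turn text into word counts, then fit a naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# Classify unseen text.
predictions = model.predict(["free money prize", "team meeting tomorrow"])
print(list(predictions))
```

Swapping `MultinomialNB` for another estimator, such as `LogisticRegression`, leaves the rest of the pipeline unchanged.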
MITIE - https://github.com/mit-nlp/MITIE Library for information extraction. Written in C++ but callable from R, Python, C, ...
spaCy
A newish project with a good future. Vector models for text only.
I have left out the screen-scraping libraries such as Beautiful Soup and readability (https://pypi.python.org/pypi/readability-lxml).
There are wrappers for Stanford CoreNLP and also for the Berkeley Parser. I don't think these parsers are free, though I might be wrong; I have not used them. I remember one of them being free for research purposes only.