Sunday, September 11, 2016

How to use my own text in NLTK?

The best explanation I have found of this is in the 3-hour presentation by Benjamin Bengfort.

Click on the link above and it should take you to 53:40 in the video, where he talks about how to do this.
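
For a quick taste without watching the video, here is a minimal sketch of one common way to do it (not necessarily the approach shown in the talk): point NLTK's PlaintextCorpusReader at a folder of .txt files. The file names, sample sentences and the load_corpus helper are made up for illustration. I stick to words() here because it uses a regex tokenizer and needs no extra NLTK data downloads, whereas sents() needs the Punkt models.

```python
import tempfile
from pathlib import Path

from nltk.corpus.reader.plaintext import PlaintextCorpusReader
from nltk.probability import FreqDist


def load_corpus(root):
    """Wrap every .txt file under `root` in an NLTK corpus reader.

    `load_corpus` is a hypothetical helper name, not part of NLTK.
    """
    return PlaintextCorpusReader(str(root), r".*\.txt")


# Build a tiny throwaway corpus so the example is self-contained.
# In practice, `corpus_dir` would be the folder holding your own text files.
corpus_dir = Path(tempfile.mkdtemp())
(corpus_dir / "first.txt").write_text("The cat sat on the mat.", encoding="utf-8")
(corpus_dir / "second.txt").write_text("The dog chased the cat.", encoding="utf-8")

corpus = load_corpus(corpus_dir)

# File names discovered under the corpus root, in sorted order.
file_ids = corpus.fileids()

# All tokens across all files (punctuation counts as its own token).
words = corpus.words()

# Case-insensitive word frequencies over the whole corpus.
freq = FreqDist(w.lower() for w in words)

print(file_ids)
print(freq.most_common(3))
```

Once the text is wrapped in a corpus reader like this, it can be used in much the same way as the corpora that ship with NLTK, for example corpus.words("first.txt") to get the tokens of a single file.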

Friday, September 9, 2016

Resources for learning NLTK

Some great resources:

The incredible video series from Harrison.
This takes you through the theory of NLP with NLTK.

And from the same nice chap,

Another very good lecture:

District Data Labs exercises from a workshop.

A talk on product categorisation

Some useful NLP Python libraries

NLTK is the first port of call for me. It is the core for all NLP stuff, even production work.

TextBlob
This uses NLTK under the hood and is a useful API for NLTK.

Pattern This does not support Python 3 yet. It is a web mining module.

Gensim Topic modelling. Analyse plain-text documents for semantic structure. Good for unsupervised or topic modelling.

Useful for supervised learning and classifiers.

MITIE Library for information extraction. Written in C++ but callable from R, Python, C, ...

A newish project with a good future. Vector models for text only.

I have left out the screen scraping ones like Beautiful Soup and readability.

There are wrappers for the Stanford CoreNLP and also for the Berkeley Parser. I don't think that these parsers are free, though I might be wrong. I've not used them. I remember one of them being free for research purposes only.