Sunday, September 11, 2016

How to use my own text in NLTK?

The best explanation I have found of this is in the 3-hour presentation by Benjamin Bengfort.

https://youtu.be/itKNpCPHq3I?list=PLOiJc_waA85o9HpyjRnfsK8slfYRmVICF&t=3220

Click on the link above and it should take you to 53:40 in the video, where he explains how to do this.
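
In short, the trick is to wrap your own files in a corpus reader so they behave like NLTK's built-in corpora. A minimal sketch, assuming your plain-text files sit in a folder called my_corpus (a made-up path):

# Load a folder of your own .txt files as an NLTK corpus.
from nltk.corpus import PlaintextCorpusReader
from nltk.text import Text

corpus_root = 'my_corpus'                            # hypothetical folder of .txt files
corpus = PlaintextCorpusReader(corpus_root, r'.*\.txt')

print(corpus.fileids())       # the files it found
print(corpus.words()[:20])    # tokenised words across the corpus
print(corpus.sents()[:2])     # sentence-segmented text

# Wrap the words in a Text object to get concordance, collocations, etc.
my_text = Text(corpus.words())
my_text.concordance('example')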

Friday, September 9, 2016

Resources for learning NLTK

Some great resources:

The incredible video series from Harrison.
https://www.youtube.com/watch?v=FLZvOKSCkxY&index=1&list=PLQVvvaa0QuDf2JswnfiGkliBInZnIC4HL
This takes you through the theory of NLP with NLTK.

And from the same nice chap,
https://pythonprogramming.net/data-analysis-tutorials/

Another very good lecture:
https://www.youtube.com/watch?v=itKNpCPHq3I

District Data Labs exercises from a workshop.
https://github.com/DistrictDataLabs/intro-to-nltk

A talk on product categorisation
https://www.youtube.com/watch?v=Xg8UtTgziZE

Some useful NLP Python libraries

NLTK http://www.nltk.org/ is the first port of call for me. It is the core of most NLP work in Python, even in production.
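
For example, a few lines are enough to tokenise and tag a sentence. A quick sketch, assuming the punkt and averaged_perceptron_tagger data have already been fetched with nltk.download():

import nltk

sentence = "NLTK makes it easy to get started with NLP."
tokens = nltk.word_tokenize(sentence)   # split into word tokens
tagged = nltk.pos_tag(tokens)           # part-of-speech tags
print(tagged)
# e.g. [('NLTK', 'NNP'), ('makes', 'VBZ'), ...]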

TextBlob https://textblob.readthedocs.io/en/dev/
This uses NLTK under the hood and provides a friendlier API on top of it.
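
For instance, tagging, noun phrases and sentiment are one-liners in TextBlob. A small sketch (the example sentence is made up, and TextBlob needs its corpora downloaded first):

from textblob import TextBlob

blob = TextBlob("The NLTK wrappers in TextBlob are pleasant to use.")
print(blob.tags)            # part-of-speech tags via NLTK
print(blob.noun_phrases)    # noun phrase extraction
print(blob.sentiment)       # Sentiment(polarity=..., subjectivity=...)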

Pattern http://www.clips.ua.ac.be/pattern This does not support Python 3 yet. It is a web mining module.

Gensim http://radimrehurek.com/gensim/ analyses plain-text documents for semantic structure, and is good for unsupervised learning, particularly topic modelling.
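
A tiny topic-modelling sketch with Gensim's LDA, assuming the documents have already been tokenised (the toy documents here are made up):

from gensim import corpora, models

# Toy, pre-tokenised documents (made up for illustration).
texts = [
    ['cat', 'dog', 'pet', 'vet'],
    ['python', 'nltk', 'language', 'text'],
    ['dog', 'vet', 'animal', 'pet'],
]

dictionary = corpora.Dictionary(texts)                 # word <-> id mapping
corpus = [dictionary.doc2bow(text) for text in texts]  # bag-of-words vectors

lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary)
print(lda.print_topics())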

scikit-learn http://scikit-learn.org/
Useful for supervised learning and classifiers.
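
A minimal supervised text classifier in scikit-learn, using a bag-of-words pipeline (the training examples are made up):

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up training data for illustration.
docs = ['great film, loved it', 'terrible plot, awful acting',
        'wonderful and moving', 'boring and dreadful']
labels = ['pos', 'neg', 'pos', 'neg']

clf = Pipeline([('vect', CountVectorizer()),   # bag-of-words features
                ('nb', MultinomialNB())])      # Naive Bayes classifier
clf.fit(docs, labels)
print(clf.predict(['what a wonderful film']))  # -> ['pos']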

MITIE https://github.com/mit-nlp/MITIE A library for information extraction. Written in C++ but callable from R, Python, C, ...

spaCy https://spacy.io/
A newish project with a good future. Vector models for text only.
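
A quick look at spaCy's tagging and built-in word vectors, assuming the English model has been installed (at the time of writing it loads as 'en'; newer versions use different model names):

import spacy

nlp = spacy.load('en')                  # English model, installed separately
doc1 = nlp(u'I like natural language processing.')
doc2 = nlp(u'Text analysis is enjoyable.')

for token in doc1:
    print(token.text, token.pos_)       # tokenisation and part-of-speech tags

print(doc1.similarity(doc2))            # similarity from the built-in word vectors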

I have left out the screen-scraping ones like Beautiful Soup and readability (https://pypi.python.org/pypi/readability-lxml).

There are wrappers for the Stanford CoreNLP tools and also for the Berkeley Parser. I don't think that these parsers are free, though I might be wrong; I've not used them. I remember one of them being free for research purposes only.