The best explanation I have found of this is in the three-hour presentation by Benjamin Bengfort.
https://youtu.be/itKNpCPHq3I?list=PLOiJc_waA85o9HpyjRnfsK8slfYRmVICF&t=3220
Click on the link above and it should take you to 53:40 in the video, where he talks about this.
Sunday, September 11, 2016
Friday, September 9, 2016
Resources for learning NLTK
Some great resources:
The incredible video series from Harrison.
https://www.youtube.com/watch?v=FLZvOKSCkxY&index=1&list=PLQVvvaa0QuDf2JswnfiGkliBInZnIC4HL
This takes you through the theory of NLP with NLTK.
And from the same nice chap,
https://pythonprogramming.net/data-analysis-tutorials/
Another very good lecture:
https://www.youtube.com/watch?v=itKNpCPHq3I
District Data Labs exercises from a workshop.
https://github.com/DistrictDataLabs/intro-to-nltk
A talk on product categorisation
https://www.youtube.com/watch?v=Xg8UtTgziZE
Some useful NLP Python libraries
NLTK is my first port of call. It is the core library for all NLP work, even production work.
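As a small taste of NLTK, here is a minimal sketch that tokenises a made-up sentence and counts word frequencies. It assumes NLTK is installed. `wordpunct_tokenize` is used because it is purely regex-based and needs no extra data downloads. The sample text is invented for illustration.

```python
from nltk import FreqDist
from nltk.tokenize import wordpunct_tokenize

# Hypothetical sample text, purely for illustration
text = "NLTK is the first port of call. NLTK handles tokenising, tagging and more."

# wordpunct_tokenize splits on a regex, so no punkt model download is required
tokens = wordpunct_tokenize(text)

# Keep alphabetic tokens only (drops the punctuation) and lower-case them
words = [t.lower() for t in tokens if t.isalpha()]

# FreqDist is NLTK's frequency counter (a subclass of collections.Counter)
freq = FreqDist(words)
print(freq.most_common(3))
```

`freq.N()` gives the total number of counted words and `freq["nltk"]` the count for a single word.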
Text blob https://textblob.readthedocs.io/en/dev/
This uses NLTK under the hood and provides a useful, simpler API on top of NLTK.
Pattern http://www.clips.ua.ac.be/pattern This is not on Python 3 yet. It is a web mining module.
Gensim - http://radimrehurek.com/gensim/ Topic modelling: analyses plain-text documents for semantic structure. Good for unsupervised learning and topic modelling.
scikit-learn
Useful for supervised learning and classifiers.
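To show what "supervised learning and classifiers" looks like in scikit-learn, here is a minimal sketch of a bag-of-words text classifier. It assumes scikit-learn is installed. The product titles and category labels are hypothetical toy data, not from any real dataset. The model pairs `CountVectorizer` (word counts) with a multinomial Naive Bayes classifier in a single pipeline.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data: product titles and their categories
train_texts = [
    "red cotton t-shirt",
    "blue denim jeans",
    "wireless bluetooth headphones",
    "usb charging cable",
]
train_labels = ["clothing", "clothing", "electronics", "electronics"]

# Turn each title into word counts, then fit a Naive Bayes classifier on them
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# Classify unseen titles built from words the model has already seen
pred = model.predict(["cotton jeans", "bluetooth cable"])
print(pred)
```

With real data you would hold out a test set and check accuracy rather than eyeballing a couple of predictions.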
MITIE - https://github.com/mit-nlp/MITIE Library for information extraction. Written in C++ but callable from R, Python, C, ...
SpaCy
A newish project with a good future. Vector models for text only.
I have left out the screen scraping ones like Beautiful Soup and readability (https://pypi.python.org/pypi/readability-lxml )
There are wrappers for Stanford CoreNLP and also for the Berkeley Parser. I don't think these parsers are free, though I might be wrong, as I've not used them. I remember one of them being free for research purposes only.