Skip to content

pyladies-bcn/spacy-workshop

 
 

Repository files navigation

Python & NLP. Introduction to spaCy with Jupyter notebooks

PyladiesBCN April 2019 meetup.

Original source: https://github.com/explosion/spacy-notebooks

Requirements

See installation example

  • python 3
  • ipdb
  • jupyter
  • spacy
  • scipy
  • nltk
  • pandas
  • matplotlib
  • seaborn
  • wordcloud
  • nltk
  • spaCy English en model: python -m spacy download en
  • spaCy English en_core_web_lg model: python -m spacy download en_core_web_lg

Installation example

For example, we can use miniconda to create an isolated Python environment that does not mess with our system Python.

After installing miniconda, create and activate a new python 3.7 envirinment:

$ conda create -n spacy python=3.7
$ conda activate spacy

Let's make sure that we git everything right by checking which python are we using:

$ which python
.../miniconda3/envs/spacy/bin/python

$ python
Python 3.7.3 (default, Mar 27 2019, 16:54:48) 
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 

Now lets install the required packages and spaCy models in our new environment:

$ pip install ipdb
$ pip install jupyter
$ pip install spacy
$ pip install scipy
$ pip install nltk
$ pip install pandas
$ pip install matplotlib
$ pip install seaborn
$ pip install wordcloud

$ python -m spacy download en
$ python -m spacy download en_core_web_lg

Lastly, clone the repository and run jupyter notebook within the project folder:

$ git clone https://github.com/pyladies-bcn/spacy-workshop
$ cd spacy-workshop
$ jupyter notebook

NotADirectoryError

There seems to be an open issue with Python 3.7 and Mac, raising the following error:

$ jupyter notebook
NotADirectoryError: [Errno 20] Not a directory: 'xdg-settings'

If this is the case, in jupyter/notebook#3746 they propose this fix:

It's a if/else logic bug in webbrowser.py; a simple fix is to edit /usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/webbrowser.py (or wherever Jupyter is pointing you to) and then look for xdg-settings in the code.

Several lines above (15 in my case), you will have if sys.platform[:3] == "win": if you change it to elif sys.platform[:3] == "win": then the Darwin/Windows/Linux will work properly. I'm running python3.7 from Homebrew using macOS 10.14.2.

Alternative, use this workaround and open the url manually in your browser (including the token):

$ jupyter notebook --no-browser
[I 11:51:02.843 NotebookApp] Serving notebooks from local directory: /.../spacy-workshop
[I 11:51:02.843 NotebookApp] The Jupyter Notebook is running at:
[I 11:51:02.843 NotebookApp] http://localhost:8888/?token=f592e3a9f94783dd85068edd65fd6a3246369e287df41a88
[I 11:51:02.844 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 11:51:02.849 NotebookApp] 

spacy.displacy.serve raises TypeError

TypeError: init() got an unexpected keyword argument 'encoding'

Solution source: explosion/spaCy#2810

Current workaround:

pip install "msgpack-numpy<0.4.4.0"

slides

Source

Serve Locally and Iterate

jupyter nbconvert 00_jupyter_intro.ipynb --to slides --post serve --SlidesExporter.reveal_theme=serif --SlidesExporter.reveal_scroll=True --SlidesExporter.reveal_transition=none

Brief overview of configuration used :

  • SlidesExporter.reveal_theme: sets the theme to serif. https://github.com/hakimel/reveal.js/tree/master/css/theme has list of themes that ship by default with reveal.js: night, simple, sky, league, blood...
  • SlidesExporter.reveal_scroll: sets the scrolling option to True. For big images or long cells scrolling options are helpful. It’s also helpful for visualizing dataframes.
  • SlidesExporter.reveal_transition: Sets the transition to None. I don’t like to use any transition effect because adding them creates a sort of jerkiness to the screen which I believe to be unsuitable for code. The optins are : none, fade, slide, convex, concave and zoom.

Generate HTML file with the slides

jupyter nbconvert 00_jupyter_intro.ipynb --to slides

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • HTML 61.0%
  • Jupyter Notebook 39.0%