README

SU DATA-401 Introduction to NLP Spring 2026

The course website is located here. Lecture materials, assignments, quizzes, and other course content can be accessed at that link. You will need an API key to submit notebooks; it will be sent to you by email.

This site contains Jupyter notebooks, data, and other code artifacts associated with this course.

Choosing a Notebook Environment

Most work will not require the use of GPUs. You can probably get away with not using them at all, unless you have a particular desire to do so.
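
If you are unsure whether your environment has a usable GPU, a quick check like the one below can help. This is a minimal sketch, not part of the course package: it assumes PyTorch is the library you would use the GPU from, and it only looks for CUDA devices. The function name gpu_available is made up for illustration.

```python
import importlib.util


def gpu_available() -> bool:
    """Return True if PyTorch is installed and can see a CUDA device."""
    # Without PyTorch installed there is nothing to query, so report False.
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


print("GPU available:", gpu_available())
```

A result of False is fine for most of the work in this course.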

Google Colab - single notebook experience

If you:

  • prefer working within a single notebook
  • are already comfortable with Google Colab
  • don’t mind reinstalling dependencies on restart
  • need access to GPUs

you may prefer Google Colab.

Deepnote

If you prefer:

  • easy install, more persistence of dependencies
  • large number of system integrations
  • Dataframe charts, interactive widgets, dashboards, app deployment
  • realtime collaboration

you may prefer Deepnote.

You will need to create a free account and then request an education plan. To use GPUs or higher-performance machines, you must add a payment method, but you do not need to upgrade the plan.

All students will be given links to Deepnote for labs.

Local JupyterLab / Notebook

If you are already comfortable with Jupyter in your local environment and you:

  • want full control of your machine and environment
  • want your dependencies to persist
  • don’t mind managing your environment yourself

you may prefer local Jupyter. The downside is that there is no GPU access unless you know how to set up something like a remote Modal function that uses a GPU.

Installation

For Students (Google Colab)

To use Colab and submit for credit:

  • Download a notebook from GitHub
  • Upload a local copy of the notebook to Colab
  • Save a copy in Drive
  • Ensure the file name matches the variable NOTEBOOK_NAME in the section “Submit Notebook for Credit”.

Saving to Drive and matching the filename are only required if you are submitting for credit.

You will need to add SUBMIT_API_KEY to your environment variables.
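
Before submitting, it can help to confirm both requirements from inside the notebook. The following is a rough sketch under stated assumptions, not the course's submission code: check_submission_setup is a hypothetical helper, the file names are invented, and it assumes NOTEBOOK_NAME includes the .ipynb extension.

```python
import os
from pathlib import Path
from typing import List


def check_submission_setup(notebook_name: str, uploaded_file: str) -> List[str]:
    """Return a list of problems; an empty list means the setup looks fine."""
    problems = []
    # The key is expected in the environment, never hard-coded in the notebook.
    if not os.environ.get("SUBMIT_API_KEY"):
        problems.append("SUBMIT_API_KEY is not set")
    # Compare only the file name, ignoring any Drive folder prefix.
    actual = Path(uploaded_file).name
    if actual != notebook_name:
        problems.append(f"file is named {actual!r}, expected {notebook_name!r}")
    return problems


# Hypothetical values, for illustration only.
for problem in check_submission_setup("lab01.ipynb", "Copy of lab01.ipynb"):
    print("Problem:", problem)
```

A common cause of a mismatch is Colab or Drive prefixing the saved file with "Copy of", as in the example call.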

For Deepnote

Every week, a new link will be posted for a Deepnote project. At least the first time you click the link, you will be asked to log in or sign up to see the project. If you sign up, you’ll get a free 14-day trial of the Team plan, and from there you can request the education plan.

  • When the project opens, click Duplicate (top right).
  • This creates your own private copy of the lab.
  • You will need to add SUBMIT_API_KEY to your environment variables.

For Local Development

If you want to run notebooks locally:

# Clone the repository
git clone https://github.com/su-dataAI/data401-nlp.git
cd data401-nlp

# If you don't have uv, install it first (macOS/Linux):
#   curl -LsSf https://astral.sh/uv/install.sh | sh
# or, as a fallback:
#   pip install uv

# Create a virtual environment using uv (requires Python 3.11+)
# If you want to use Python 3.13+, you will need to upgrade torch to torch>=2.1,<2.6
uv venv --python 3.11

# Activate the virtual environment
# On macOS/Linux:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate

# Install with all dependencies
uv pip install -e ".[dev,all]"

# Download spaCy model
python -m spacy download en_core_web_sm

# Start Jupyter Lab
jupyter lab

# Add a .env file (in the repository root or the nbs folder)

You will need to git pull when each new lab is posted.
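
For local work, the .env file is where values such as SUBMIT_API_KEY would normally live. The sketch below shows one way such a file could be read using only the standard library. It is an illustration, not the behavior of data401_nlp.helpers.env: the function name load_env_file, the search order (root first, then nbs), and the simple KEY=VALUE format are all assumptions.

```python
import os
from pathlib import Path
from typing import Optional


def load_env_file(search_dirs=(".", "nbs"), filename=".env") -> Optional[Path]:
    """Load KEY=VALUE pairs from the first .env file found into os.environ."""
    for directory in search_dirs:
        env_path = Path(directory) / filename
        if not env_path.is_file():
            continue
        for raw_line in env_path.read_text(encoding="utf-8").splitlines():
            line = raw_line.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Variables already set in the environment win over the file.
            os.environ.setdefault(key.strip(), value.strip().strip("'\""))
        return env_path
    return None


loaded_from = load_env_file()
print("Loaded .env from:", loaded_from)
```

Keep the .env file out of version control so your key is not committed by accident.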

Installation Options

The package supports flexible installation based on your needs:

# Minimal installation (core utilities only)
pip install data401-nlp

# With NLP tools (spaCy, NLTK)
# Quote the extras so shells such as zsh do not expand the brackets
pip install "data401-nlp[nlp]"

# With transformers and PyTorch
pip install "data401-nlp[transformers]"

# With API support (FastAPI, Pydantic)
pip install "data401-nlp[api]"

# Everything (recommended for students)
pip install "data401-nlp[all]"
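
To see which optional pieces are actually present in your environment, you could probe for the underlying packages. This is a hedged sketch: the EXTRAS mapping is inferred only from the comments above (the real dependency lists may differ), and missing_packages is a hypothetical helper, not part of the package.

```python
import importlib.util
from typing import List

# Assumed mapping from each extra to the import names it is expected to provide.
EXTRAS = {
    "nlp": ["spacy", "nltk"],
    "transformers": ["transformers", "torch"],
    "api": ["fastapi", "pydantic"],
}


def missing_packages(extra: str) -> List[str]:
    """Return the assumed packages for an extra that cannot be imported."""
    return [name for name in EXTRAS[extra] if importlib.util.find_spec(name) is None]


for extra in EXTRAS:
    missing = missing_packages(extra)
    status = "ok" if not missing else "missing " + ", ".join(missing)
    print(f"[{extra}] {status}")
```

If an extra you need reports missing packages, reinstall with that extra (or with all).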

Platform Support

✅ Google Colab
✅ Deepnote
✅ Jupyter Lab
✅ Local Python 3.11+
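
If a notebook needs to behave differently per platform, a check along these lines is one possibility. It is a sketch under assumptions rather than the package's own detection logic: it assumes Deepnote exposes a DEEPNOTE_PROJECT_ID environment variable and that the google.colab package is importable only on Colab. Both function names are made up for illustration.

```python
import importlib.util
import os
import sys


def detect_platform() -> str:
    """Guess where the code is running: 'deepnote', 'colab', or 'local'."""
    # Assumption: Deepnote sets DEEPNOTE_PROJECT_ID in its runtime.
    if os.environ.get("DEEPNOTE_PROJECT_ID"):
        return "deepnote"
    # Assumption: the google.colab package is only available on Colab.
    try:
        if importlib.util.find_spec("google.colab") is not None:
            return "colab"
    except ModuleNotFoundError:
        # Raised when the parent "google" package is not installed at all.
        pass
    return "local"


def python_is_supported(minimum=(3, 11)) -> bool:
    """Check the interpreter version against the stated minimum."""
    return sys.version_info >= minimum


print("Platform:", detect_platform())
print("Python 3.11+:", python_is_supported())
```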

Helper Modules

The package includes several helper modules to make your NLP work easier:

  • data401_nlp.helpers.env - Environment detection and API key loading
  • data401_nlp.helpers.spacy - Automatic spaCy model management
  • data401_nlp.helpers.submit - Assignment submission utilities
  • data401_nlp.helpers.llm - LLM integration helpers

The helper libraries may be updated as the course proceeds.