Best Practices (Advanced)

Follow these best practices to write clean, reproducible, and maintainable Jupyter notebooks. Learn about organization, version control, reproducibility, and common pitfalls.

Notebook Organization

  • Title and overview: Start every notebook with a Markdown cell containing the title, author, date, and purpose
  • Imports first: Put all imports in the first code cell so dependencies are clear
  • Constants and configuration: Define constants and paths in a dedicated cell near the top
  • Logical sections: Use Markdown headings (##) to divide the notebook into clear sections
  • Functions before usage: Define helper functions before they are called
  • Clean outputs: Clear all outputs before sharing (Cell → All Output → Clear)
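As a rough sketch of the first two code cells, assume the project keeps a requirements.txt at its root. The configuration cell can then locate that root instead of hardcoding an absolute path. The folder names data/ and outputs/, the constant names, and the helper find_project_root are all hypothetical.

```python
# --- Cell 1: imports (standard library only in this sketch) ---
from pathlib import Path


# --- Cell 2: constants and configuration ---
def find_project_root(start: Path, marker: str = "requirements.txt") -> Path:
    """Walk up from `start` until a directory containing `marker` is found.

    Falls back to `start` itself if no directory on the way up contains the marker.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    return start


PROJECT_ROOT = find_project_root(Path.cwd())
DATA_DIR = PROJECT_ROOT / "data"        # hypothetical folder name
OUTPUT_DIR = PROJECT_ROOT / "outputs"   # hypothetical folder name
RANDOM_SEED = 42
TEST_SIZE = 0.2                         # hypothetical modeling parameter
```

Because every path is built from PROJECT_ROOT, the notebook keeps working when a colleague clones the repository to a different location.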

Cell Ordering Discipline

Golden Rule: Your notebook should always run correctly from top to bottom. Before sharing or committing, restart the kernel and run all cells (Kernel → Restart & Run All in the classic Notebook, Kernel → Restart Kernel and Run All Cells in JupyterLab). If any cell fails, fix the issue before proceeding.
  • Avoid cells that depend on state created by later cells
  • Do not rely on running cells multiple times for correct behavior
  • Delete experimental cells that are no longer needed
  • After Restart & Run All, the execution counters should read In [1], In [2], In [3], ... from top to bottom
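Because an .ipynb file is plain JSON, the execution-counter check can be automated. The following is a minimal sketch using only the standard library; the function names are hypothetical, and it assumes that an unexecuted empty code cell is harmless and can be skipped.

```python
import json


def execution_order_problems(notebook: dict) -> list[str]:
    """Return messages for code cells whose counters break a clean top-to-bottom run.

    After Restart & Run All, executed code cells should be numbered 1, 2, 3, ...
    in document order. Unexecuted cells have execution_count set to None.
    """
    problems = []
    expected = 1
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        count = cell.get("execution_count")
        if count is None:
            # "source" may be a string or a list of lines; join handles both.
            if "".join(cell.get("source", "")).strip():
                problems.append(f"cell {index}: never executed")
            continue
        if count != expected:
            problems.append(f"cell {index}: execution_count {count}, expected {expected}")
        expected = count + 1  # resynchronize so one jump is reported only once
    return problems


def check_notebook_file(path: str) -> list[str]:
    """Load an .ipynb file and check its execution order."""
    with open(path, encoding="utf-8") as f:
        return execution_order_problems(json.load(f))
```

Run it on a notebook saved right after Restart & Run All, for example `for msg in check_notebook_file("analysis.ipynb"): print(msg)`. A notebook that has already been through nbstripout has no counters, so every non-empty code cell would be reported as never executed.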

Documentation in Notebooks

  • Add a Markdown cell before each major code section explaining what and why
  • Use inline comments for non-obvious code logic
  • Include interpretation of results and visualizations
  • Document assumptions and limitations
  • Add a conclusion/summary section at the end

Version Control

Notebooks are stored as JSON files that include outputs, execution counts, and metadata, so even small edits produce noisy git diffs. These tools help:

# nbstripout - Strip output before committing
pip install nbstripout
nbstripout --install  # Adds git filter to strip outputs

# ReviewNB - GitHub app for notebook code review
# Install from github.com/marketplace/review-notebook-app

# nbdime - Notebook-aware diffing and merging
pip install nbdime
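nbdime config-git --enable  # Use nbdime for git diff/merge in this repo (add --global for all repos)

# Compare notebooks in the terminal, or in a browser with rich rendering
nbdiff notebook_v1.ipynb notebook_v2.ipynb
nbdiff-web notebook_v1.ipynb notebook_v2.ipynb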
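Best practice: Set up nbstripout once per repository, either as a git filter (nbstripout --install) or as a hook in the pre-commit framework, so cell outputs are stripped automatically before they reach the repository. This keeps diffs clean and repository sizes small.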
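To see what stripping actually changes, here is a minimal sketch that edits the raw notebook JSON: it empties each code cell's outputs and resets its execution_count. It only illustrates the idea and is not nbstripout's implementation; the real tool has more options, so prefer it in practice. The function names are hypothetical.

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Clear outputs and execution counts from every code cell, in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def strip_file(path: str) -> None:
    """Rewrite an .ipynb file with outputs removed."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    strip_outputs(notebook)
    with open(path, "w", encoding="utf-8") as f:
        # indent=1 and a trailing newline mirror how Jupyter itself saves notebooks
        json.dump(notebook, f, indent=1, ensure_ascii=False)
        f.write("\n")
```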

Reproducibility

  • Pin dependencies: Include a requirements.txt or environment.yml with exact versions
  • Set random seeds: Seed every source of randomness you use, e.g. random.seed(42), np.random.seed(42) (or create a generator with np.random.default_rng(42)), and torch.manual_seed(42)
  • Document data sources: Specify where data comes from and how to obtain it
  • Record environment: Print library versions at the start of the notebook
  • Use relative paths: Avoid hardcoded absolute paths that break on other machines
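A first code cell covering the seeding and environment-recording points might look like the sketch below. It assumes NumPy and PyTorch are optional (each is seeded only if it can be imported), and the package list passed to the hypothetical report_versions helper is just an example.

```python
import random
import sys
from importlib import metadata

SEED = 42


def set_seeds(seed: int = SEED) -> None:
    """Seed Python's random module, plus NumPy and PyTorch if they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)  # legacy global RNG; new code can use np.random.default_rng(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass


def report_versions(packages: list[str]) -> dict[str, str]:
    """Return the Python version and installed versions of the given distributions."""
    versions = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


set_seeds()
for name, version in report_versions(["numpy", "pandas", "torch"]).items():
    print(f"{name}: {version}")
```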

Converting to Scripts

# Convert notebook to Python script
jupyter nbconvert --to script analysis.ipynb

# Better approach: Extract reusable code into modules
# notebook.ipynb imports from utils.py
# Keep notebooks for exploration, scripts for production

# Project structure:
# project/
# ├── notebooks/
# │   ├── 01_eda.ipynb
# │   ├── 02_modeling.ipynb
# │   └── 03_evaluation.ipynb
# ├── src/
# │   ├── data.py
# │   ├── features.py
# │   └── model.py
# ├── requirements.txt
# └── README.md
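To make the conversion step concrete, here is a rough approximation of what `jupyter nbconvert --to script` produces: code cells are concatenated in order and Markdown cells become comments. The helper names are hypothetical and this is not nbconvert's code; unlike nbconvert, it does nothing special with IPython magics such as %matplotlib, so its output only runs for notebooks that contain none.

```python
import json


def notebook_to_script(notebook: dict) -> str:
    """Concatenate code cells into a script; Markdown cells become comment lines."""
    chunks = []
    for cell in notebook.get("cells", []):
        source = "".join(cell.get("source", ""))  # source may be a string or list of lines
        if cell.get("cell_type") == "code":
            chunks.append(source)
        elif cell.get("cell_type") == "markdown":
            chunks.append("\n".join(("# " + line).rstrip() for line in source.splitlines()))
    # Separate cells with a blank line and drop cells that are empty.
    return "\n\n".join(chunk.rstrip() for chunk in chunks if chunk.strip()) + "\n"


def convert_file(ipynb_path: str, py_path: str) -> None:
    """Write the script version of an .ipynb file to `py_path`."""
    with open(ipynb_path, encoding="utf-8") as f:
        notebook = json.load(f)
    with open(py_path, "w", encoding="utf-8") as f:
        f.write(notebook_to_script(notebook))
```

In the layout above, the more durable move is still to lift stable functions from the notebooks into src/ and import them, rather than converting whole notebooks.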

Testing Notebooks

  • nbval: Re-run notebooks and compare fresh outputs against the saved ones (pytest --nbval notebook.ipynb); use --nbval-lax to check only that cells run without errors
  • nbmake: Run notebooks as tests in CI/CD (pytest --nbmake notebooks/)
  • Assertions: Include assert statements to verify intermediate results
  • CI integration: Run notebooks in GitHub Actions or GitLab CI to catch regressions
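Assertions are cheap to add at the end of a processing step. In this sketch the records and checks are hypothetical; the point is that a failed check stops Restart & Run All, and therefore nbval or nbmake in CI, with a clear message instead of letting bad data flow into later cells.

```python
# Hypothetical intermediate result: records before and after a cleaning step
raw = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 51},
]
clean = [row for row in raw if row["age"] is not None]

# Sanity checks on the intermediate result; each message says what went wrong
assert len(clean) > 0, "cleaning removed every row"
assert all(row["age"] is not None for row in clean), "missing ages remain"
assert len({row["id"] for row in clean}) == len(clean), "duplicate ids after cleaning"
assert all(0 <= row["age"] <= 120 for row in clean), "age outside plausible range"
```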

Collaboration

  • Clear outputs before sharing: Reduces file size and avoids merge conflicts
  • Use JupyterHub: Deploy a multi-user hub that gives each team member a Jupyter server on shared infrastructure
  • Google Colab: Share notebooks through Google Drive with Docs-style links, permissions, and comments
  • Code reviews: Use ReviewNB or nbdime for notebook-aware pull request reviews
  • Naming conventions: Use numbered prefixes (01_eda.ipynb, 02_modeling.ipynb) for order

Common Mistakes

| Mistake | Problem | Solution |
|---|---|---|
| Out-of-order execution | Hidden state from deleted or reordered cells | Restart kernel and Run All regularly |
| Giant notebooks | Hard to navigate, slow to load | Split into multiple focused notebooks |
| No documentation | Impossible to understand weeks later | Add Markdown cells explaining each section |
| Hardcoded paths | Breaks on other machines | Use relative paths and config variables |
| Committing outputs | Bloated repo, noisy diffs | Use nbstripout to strip outputs before commit |
| No error handling | Notebook stops on first error | Add try/except for external dependencies |
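For the last row, one pattern is to wrap only the external call (here a download) in try/except and fall back to a local cached copy. This is a sketch under assumptions: the function name, the cache location, and the choice to re-raise when no cache exists are all illustrative.

```python
import urllib.request
from pathlib import Path


def fetch_with_fallback(url: str, cache_path: Path, timeout: float = 10.0) -> bytes:
    """Download `url` and refresh a local cache; use the cache if the download fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            data = response.read()
    except OSError as exc:  # URLError, timeouts and connection resets are all OSError subclasses
        if cache_path.exists():
            print(f"Download failed ({exc}); using cached copy {cache_path}")
            return cache_path.read_bytes()
        raise  # no cache either: fail loudly rather than continue with missing data
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_bytes(data)
    return data
```

Catching errors around the external dependency only, and re-raising when there is no fallback, keeps genuine bugs in your own cells visible while tolerating a flaky network.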

Frequently Asked Questions

Should I use Jupyter Notebook or JupyterLab?

Use JupyterLab. It is the actively developed interface and provides a superior experience with multi-tab editing, built-in terminals, and a modern extension system. Jupyter Notebook 7.0+ is built on JupyterLab technology anyway.

Are Jupyter Notebooks good for production code?

Notebooks are ideal for exploration, prototyping, and communication. For production, extract reusable code into Python modules and scripts. Use notebooks for EDA and experimentation, scripts for pipelines and deployment.

How do I share a notebook with someone who does not have Jupyter?

Convert to HTML (jupyter nbconvert --to html notebook.ipynb) for a static view. Use nbviewer.org to render notebooks from GitHub URLs. Or share via Google Colab, which only requires a browser.

Can I use Jupyter with languages other than Python?

Yes. Jupyter supports over 40 languages through kernels, including R (IRkernel), Julia (IJulia), JavaScript, C++, Scala, and many more. The name "Jupyter" itself comes from Julia, Python, and R.

How do I make my notebook reproducible?

Pin all dependencies with exact versions, set random seeds, document data sources, use relative paths, and always verify your notebook runs with "Kernel → Restart & Run All" before sharing.