Best Practices (Advanced)

Follow these best practices to write clean, reproducible, and maintainable Jupyter notebooks. Learn about organization, version control, reproducibility, and common pitfalls.

Notebook Organization

Title and overview: Start every notebook with a Markdown cell containing the title, author, date, and purpose
Imports first: Put all imports in the first code cell so dependencies are clear
Constants and configuration: Define constants and paths in a dedicated cell near the top
Logical sections: Use Markdown headings (##) to divide the notebook into clear sections
Functions before usage: Define helper functions before they are called
Clean outputs: Clear all outputs before sharing (Cell → All Output → Clear)
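Here is a rough illustration of the first two code cells. It is a minimal sketch that assumes the notebook sits in project/notebooks/ and the data lives in project/data/. The file name sales.csv and the constant names are hypothetical. Jupyter starts the kernel in the notebook's own directory, so paths built from Path("..") keep working when the project is cloned to another machine.

```python
# --- Cell 1: imports, so every dependency is visible at a glance ---
from pathlib import Path

# --- Cell 2: constants and configuration ---
# Hypothetical layout: this notebook lives in project/notebooks/,
# data in project/data/. Relative paths keep the notebook portable.
PROJECT_ROOT = Path("..")
DATA_DIR = PROJECT_ROOT / "data"
RAW_FILE = DATA_DIR / "raw" / "sales.csv"  # hypothetical input file
OUTPUT_DIR = PROJECT_ROOT / "outputs"

# Tunable parameters in one place instead of scattered through later cells
RANDOM_SEED = 42
N_FOLDS = 5
```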

Cell Ordering Discipline


Golden Rule: Your notebook should always run correctly from top to bottom. Before sharing or committing, restart the kernel and run all cells (Kernel → Restart & Run All in the classic Notebook, Kernel → Restart Kernel and Run All Cells in JupyterLab). If any cell fails, fix it before proceeding.

Avoid cells that depend on state created by later cells
Do not rely on running cells multiple times for correct behavior
Delete experimental cells that are no longer needed
After a clean run, the execution counters (In [1], In [2], …) should increase from top to bottom
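You can catch out-of-order runs before committing by scanning the saved .ipynb JSON. The sketch below uses only the standard library and assumes the nbformat 4 layout, where each code cell stores an execution_count. The helper names are hypothetical. Run it on a notebook that still has its counters: tools such as nbstripout reset every count to null, and the check would then report every cell as not executed.

```python
import json
from pathlib import Path


def execution_order_issues(nb: dict) -> list[str]:
    """List code cells that were skipped or run out of top-to-bottom order."""
    issues = []
    previous = 0
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a list of lines or one string
            source = "".join(source)
        count = cell.get("execution_count")
        if count is None:
            if source.strip():  # ignore empty cells; they have nothing to run
                issues.append(f"cell {index}: not executed")
            continue
        if count <= previous:
            issues.append(f"cell {index}: execution_count {count} after {previous}")
        previous = count
    return issues


def check_notebook(path: str) -> list[str]:
    """Load an .ipynb file from disk and report ordering problems (hypothetical helper)."""
    nb = json.loads(Path(path).read_text(encoding="utf-8"))
    return execution_order_issues(nb)
```

After a clean Restart & Run All, check_notebook("01_eda.ipynb") should return an empty list.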

Documentation in Notebooks

Add a Markdown cell before each major code section explaining what and why
Use inline comments for non-obvious code logic
Include interpretation of results and visualizations
Document assumptions and limitations
Add a conclusion/summary section at the end

Version Control

Notebooks are stored as JSON files that mix code with outputs, execution counts, and metadata, so ordinary git diffs are noisy and hard to review. These tools help:

```bash
# nbstripout: strip outputs before committing
pip install nbstripout
nbstripout --install  # Adds a git filter that strips outputs in this repository

# ReviewNB: GitHub app for notebook code review
# Install from github.com/marketplace/review-notebook-app

# nbdime: notebook-aware diffing and merging
pip install nbdime
nbdime config-git --enable  # Git integration

# Compare notebooks
nbdiff notebook_v1.ipynb notebook_v2.ipynb      # diff in the terminal
nbdiff-web notebook_v1.ipynb notebook_v2.ipynb  # rich diff in the browser
```


Best practice: Run nbstripout --install in each repository, or register nbstripout as a hook with the pre-commit framework, so that outputs are stripped automatically whenever you commit. This keeps diffs clean and repository sizes small.
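The sketch below shows what stripping actually removes. It clears outputs and execution counts from a notebook's JSON using only the standard library. This is a simplified illustration, not nbstripout's implementation: the real tool also cleans some metadata and is configurable. The strip_file name is a hypothetical helper.

```python
import json
from pathlib import Path


def strip_outputs(nb: dict) -> dict:
    """Return a copy of a notebook dict with outputs and execution counts removed."""
    stripped = json.loads(json.dumps(nb))  # deep copy of JSON-compatible data
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


def strip_file(path: Path) -> None:
    """Rewrite an .ipynb file in place without outputs (hypothetical helper)."""
    nb = json.loads(path.read_text(encoding="utf-8"))
    # indent=1 mirrors the formatting Jupyter itself uses when saving notebooks
    text = json.dumps(strip_outputs(nb), indent=1, ensure_ascii=False) + "\n"
    path.write_text(text, encoding="utf-8")
```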

Reproducibility

Pin dependencies: Include a requirements.txt or environment.yml with exact versions
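Set random seeds: Seed every generator you use, e.g. random.seed(42), np.random.seed(42), and torch.manual_seed(42); with NumPy's newer Generator API, pass the seed explicitly via np.random.default_rng(42)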
Document data sources: Specify where data comes from and how to obtain it
Record environment: Print library versions at the start of the notebook
Use relative paths: Avoid hardcoded absolute paths that break on other machines
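A first cell along these lines handles both seeding and version recording. It is a sketch with a few assumptions: NumPy and PyTorch are seeded only if they are installed, the package list passed to environment_report is up to you, and the helper names are hypothetical. Seeding fixes the sequence of pseudo-random numbers, but GPU kernels and multithreaded code can still introduce nondeterminism.

```python
import platform
import random
from importlib.metadata import PackageNotFoundError, version

SEED = 42


def set_seeds(seed: int = SEED) -> None:
    """Seed the stdlib RNG, plus NumPy's and PyTorch's global RNGs when available."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)  # legacy global RNG; Generator objects need their own seed
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass


def environment_report(packages: list[str]) -> dict[str, str]:
    """Map the Python version and each package name to its installed version."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


set_seeds()
for name, ver in environment_report(["numpy", "pandas"]).items():
    print(f"{name}: {ver}")
```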

Converting to Scripts

```bash
# Convert notebook to Python script
jupyter nbconvert --to script analysis.ipynb

# Better approach: Extract reusable code into modules
# notebook.ipynb imports from utils.py
# Keep notebooks for exploration, scripts for production

# Project structure:
# project/
# ├── notebooks/
# │   ├── 01_eda.ipynb
# │   ├── 02_modeling.ipynb
# │   └── 03_evaluation.ipynb
# ├── src/
# │   ├── data.py
# │   ├── features.py
# │   └── model.py
# ├── requirements.txt
# └── README.md
```
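Here is a small example of logic moved out of a notebook into src/. It assumes a function named zscore was extracted from 01_eda.ipynb into src/features.py. The function is hypothetical; only the file layout comes from the structure above. Plain functions like this can be imported by notebooks, scripts, and unit tests alike. The import lines at the bottom are comments because they use IPython magics and assume the project root is on sys.path.

```python
# src/features.py: hypothetical module holding logic extracted from a notebook
from statistics import mean, pstdev


def zscore(values: list[float]) -> list[float]:
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant input: every value equals the mean
    return [(v - mu) / sigma for v in values]


# In a notebook under notebooks/, with the project root on sys.path
# (for example via sys.path.insert(0, "..")):
#   %load_ext autoreload
#   %autoreload 2          # re-import edited modules before each cell runs
#   from src.features import zscore
```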

Testing Notebooks

nbval: Run notebooks under pytest and compare fresh outputs with the saved ones (pytest --nbval notebook.ipynb); use --nbval-lax to check only that cells run without errors
nbmake: Run notebooks as tests in CI/CD (pytest --nbmake notebooks/)
Assertions: Include assert statements to verify intermediate results
CI integration: Run notebooks in GitHub Actions or GitLab CI to catch regressions
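Assertions inside a notebook cell double as lightweight tests. When the notebook runs under nbval or nbmake, a failing assert raises an error and fails that test run. The records below are a made-up fixture standing in for the output of an earlier cell.

```python
# Hypothetical intermediate result produced by an earlier cell
records = [
    {"id": 1, "amount": 12.5},
    {"id": 2, "amount": 7.0},
    {"id": 3, "amount": 3.25},
]

# Sanity checks: fail fast if an upstream cell changed behavior
assert len(records) > 0, "no records loaded"
assert len({r["id"] for r in records}) == len(records), "duplicate ids"
assert all(r["amount"] >= 0 for r in records), "negative amounts"

total = sum(r["amount"] for r in records)
assert abs(total - 22.75) < 1e-9, f"unexpected total {total}"  # known value for this fixture
```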

Collaboration

Clear outputs before sharing: Reduces file size and avoids merge conflicts
Use JupyterHub: Deploy shared Jupyter servers for team collaboration
Google Colab: For real-time collaboration with Google Docs-like sharing
Code reviews: Use ReviewNB or nbdime for notebook-aware pull request reviews
Naming conventions: Use numbered prefixes (01_eda.ipynb, 02_modeling.ipynb) for order

Common Mistakes

| Mistake | Problem | Solution |
|---|---|---|
| Out-of-order execution | Hidden state from deleted or reordered cells | Restart kernel and Run All regularly |
| Giant notebooks | Hard to navigate, slow to load | Split into multiple focused notebooks |
| No documentation | Impossible to understand weeks later | Add Markdown cells explaining each section |
| Hardcoded paths | Breaks on other machines | Use relative paths and config variables |
| Committing outputs | Bloated repo, noisy diffs | Use nbstripout to strip outputs before commit |
| No error handling | Notebook stops on first error | Add try/except for external dependencies |
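One way to apply try/except to an external dependency is to fall back to a cached copy when a download fails. The sketch below uses only the standard library. The load_json name, the URL, and the cache location are hypothetical, and a real notebook might prefer to fail loudly instead of using stale data.

```python
import json
import urllib.request
from pathlib import Path


def load_json(url: str, cache: Path, timeout: float = 10.0):
    """Fetch JSON from url, falling back to a local cached copy if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            data = json.load(response)
    except (OSError, json.JSONDecodeError) as exc:  # URLError and timeouts are OSErrors
        if not cache.exists():
            raise RuntimeError(f"{url} unavailable and no cache at {cache}") from exc
        print(f"Warning: {exc}; using cached copy at {cache}")
        return json.loads(cache.read_text(encoding="utf-8"))
    cache.write_text(json.dumps(data), encoding="utf-8")  # refresh cache for next time
    return data
```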

Frequently Asked Questions

Should I use Jupyter Notebook or JupyterLab?

Use JupyterLab. It is the actively developed interface and provides a superior experience with multi-tab editing, built-in terminals, and a modern extension system. Jupyter Notebook 7.0+ is built on JupyterLab technology anyway.

Are Jupyter Notebooks good for production code?

Notebooks are ideal for exploration, prototyping, and communication. For production, extract reusable code into Python modules and scripts. Use notebooks for EDA and experimentation, scripts for pipelines and deployment.

How do I share a notebook with someone who does not have Jupyter?

Convert to HTML (jupyter nbconvert --to html notebook.ipynb) for a static view. Use nbviewer.org to render notebooks from GitHub URLs. Or share via Google Colab, which runs in the browser and needs only a Google account.

Can I use Jupyter with languages other than Python?

Yes. Jupyter supports over 40 languages through kernels, including R (IRkernel), Julia (IJulia), JavaScript, C++, Scala, and many more. The name "Jupyter" itself comes from Julia, Python, and R.

How do I make my notebook reproducible?

Pin all dependencies with exact versions, set random seeds, document data sources, use relative paths, and always verify your notebook runs with "Kernel → Restart & Run All" before sharing.
