Advanced Best Practices
Follow these best practices to write clean, reproducible, and maintainable Jupyter notebooks. Learn about organization, version control, reproducibility, and common pitfalls.
Notebook Organization
- Title and overview: Start every notebook with a Markdown cell containing the title, author, date, and purpose
- Imports first: Put all imports in the first code cell so dependencies are clear
- Constants and configuration: Define constants and paths in a dedicated cell near the top
- Logical sections: Use Markdown headings (##) to divide the notebook into clear sections
- Functions before usage: Define helper functions before they are called
- Clean outputs: Clear all outputs before sharing (Cell → All Output → Clear)
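The organization rules above can be sketched as the first cells of a notebook. This is an illustrative layout only; the names (`DATA_DIR`, `RANDOM_SEED`) and the sample title are made up:

```python
# --- Cell 1 (Markdown): title, author, date, purpose ---
# # Quarterly Sales Analysis
# Author: Jane Doe · Date: 2024-05-01
# Purpose: explore seasonality in the 2023 sales data

# --- Cell 2: all imports in one place, so dependencies are visible at a glance ---
import os
import numpy as np

# --- Cell 3: constants and configuration near the top ---
DATA_DIR = os.path.join("data", "raw")   # relative path, not absolute
RANDOM_SEED = 42
```

Keeping imports and configuration in their own cells means a reader can see every dependency and every tunable value without scrolling through the analysis.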
Cell Ordering Discipline
- Avoid cells that depend on state created by later cells
- Do not rely on running cells multiple times for correct behavior
- Delete experimental cells that are no longer needed
- The execution counter (`In [n]:`) should increase monotonically from top to bottom
Documentation in Notebooks
- Add a Markdown cell before each major code section explaining what and why
- Use inline comments for non-obvious code logic
- Include interpretation of results and visualizations
- Document assumptions and limitations
- Add a conclusion/summary section at the end
Version Control
Notebooks are JSON files, which makes git diffs noisy. Use these tools:
# nbstripout - Strip output before committing
pip install nbstripout
nbstripout --install # Adds git filter to strip outputs
# ReviewNB - GitHub app for notebook code review
# Install from github.com/marketplace/review-notebook-app
# nbdime - Notebook-aware diffing and merging
pip install nbdime
nbdime config-git --enable # Git integration
# Compare notebooks
nbdiff notebook_v1.ipynb notebook_v2.ipynb
nbdiff-web notebook_v1.ipynb notebook_v2.ipynb
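If your team uses the pre-commit framework, the nbstripout hook can be registered once per repository instead of per clone. A minimal `.pre-commit-config.yaml` sketch (pin `rev` to whichever nbstripout release you actually use):

```yaml
repos:
  - repo: https://github.com/kynan/nbstripout
    rev: 0.7.1        # pin to a released tag
    hooks:
      - id: nbstripout
```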
Tip: Use nbstripout as a git pre-commit hook to automatically strip output cells before committing. This keeps diffs clean and repository sizes small.
Reproducibility
- Pin dependencies: Include a `requirements.txt` or `environment.yml` with exact versions
- Set random seeds: Use `np.random.seed(42)` and `torch.manual_seed(42)` for reproducible results
- Document data sources: Specify where data comes from and how to obtain it
- Record environment: Print library versions at the start of the notebook
- Use relative paths: Avoid hardcoded absolute paths that break on other machines
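The seeding and environment-recording points above can be combined in one cell at the top of the notebook. A minimal sketch (the seed value 42 is just a convention):

```python
# Fix randomness and record the environment at the top of the notebook
import random
import sys

import numpy as np

print(f"Python {sys.version.split()[0]} | NumPy {np.__version__}")

SEED = 42
random.seed(SEED)
np.random.seed(SEED)

# Re-seeding reproduces the same draws, so results are stable across runs
first = np.random.rand(3)
np.random.seed(SEED)
again = np.random.rand(3)
assert np.allclose(first, again)
```

Printing versions up front means anyone re-running the notebook can immediately compare their environment against the one that produced the original results.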
Converting to Scripts
# Convert notebook to Python script
jupyter nbconvert --to script analysis.ipynb
# Better approach: Extract reusable code into modules
# notebook.ipynb imports from utils.py
# Keep notebooks for exploration, scripts for production
# Project structure:
# project/
# ├── notebooks/
# │   ├── 01_eda.ipynb
# │   ├── 02_modeling.ipynb
# │   └── 03_evaluation.ipynb
# ├── src/
# │   ├── data.py
# │   ├── features.py
# │   └── model.py
# ├── requirements.txt
# └── README.md
Testing Notebooks
- nbval: Validate that notebooks run without errors (`pytest --nbval notebook.ipynb`)
- nbmake: Run notebooks as tests in CI/CD (`pytest --nbmake notebooks/`)
- Assertions: Include `assert` statements to verify intermediate results
- CI integration: Run notebooks in GitHub Actions or GitLab CI to catch regressions
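In-notebook assertions can be as simple as the following sketch (the records are made up for illustration):

```python
# Assert properties of intermediate results so bad data fails fast and loudly
rows = [{"id": 1, "price": 9.5}, {"id": 2, "price": 12.0}]

assert rows, "dataset is empty"
assert all(r["price"] >= 0 for r in rows), "negative prices found"

total = sum(r["price"] for r in rows)
assert total > 0, "total revenue should be positive"
```

Because tools like nbval and nbmake execute every cell, a failing assertion turns a silent data problem into a visible test failure in CI.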
Collaboration
- Clear outputs before sharing: Reduces file size and avoids merge conflicts
- Use JupyterHub: Deploy shared Jupyter servers for team collaboration
- Google Colab: For real-time collaboration with Google Docs-like sharing
- Code reviews: Use ReviewNB or nbdime for notebook-aware pull request reviews
- Naming conventions: Use numbered prefixes (`01_eda.ipynb`, `02_modeling.ipynb`) to make execution order explicit
Common Mistakes
| Mistake | Problem | Solution |
|---|---|---|
| Out-of-order execution | Hidden state from deleted or reordered cells | Restart kernel and Run All regularly |
| Giant notebooks | Hard to navigate, slow to load | Split into multiple focused notebooks |
| No documentation | Impossible to understand weeks later | Add Markdown cells explaining each section |
| Hardcoded paths | Breaks on other machines | Use relative paths and config variables |
| Committing outputs | Bloated repo, noisy diffs | Use nbstripout to strip outputs before commit |
| No error handling | Notebook stops on first error | Add try/except for external dependencies |
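For the last row of the table, one minimal pattern (assuming `seaborn` is the optional external dependency, chosen here purely as an example) is to guard the import and degrade gracefully:

```python
# Guard an optional dependency so the notebook degrades instead of crashing
try:
    import seaborn as sns            # optional plotting nicety
    HAS_SEABORN = True
except ImportError:
    HAS_SEABORN = False
    print("seaborn not installed; falling back to plain defaults")


def style_name() -> str:
    """Pick a plot style based on what is actually installed."""
    return "seaborn" if HAS_SEABORN else "default"
```

The same try/except shape works for network downloads or database connections: catch the specific failure, report it, and fall back to a cached or simplified path so the rest of the notebook still runs.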
Frequently Asked Questions
Should I use Jupyter Notebook or JupyterLab?
Use JupyterLab. It is the actively developed interface and provides a superior experience with multi-tab editing, built-in terminals, and a modern extension system. Jupyter Notebook 7.0+ is built on JupyterLab technology anyway.
Are Jupyter Notebooks good for production code?
Notebooks are ideal for exploration, prototyping, and communication. For production, extract reusable code into Python modules and scripts. Use notebooks for EDA and experimentation, scripts for pipelines and deployment.
How do I share a notebook with someone who does not have Jupyter?
Convert to HTML (jupyter nbconvert --to html notebook.ipynb) for a static view. Use nbviewer.org to render notebooks from GitHub URLs. Or share via Google Colab, which only requires a browser.
Can I use Jupyter with languages other than Python?
Yes. Jupyter supports over 40 languages through kernels, including R (IRkernel), Julia (IJulia), JavaScript, C++, Scala, and many more. The name "Jupyter" itself comes from Julia, Python, and R.
How do I make my notebook reproducible?
Pin all dependencies with exact versions, set random seeds, document data sources, use relative paths, and always verify your notebook runs with "Kernel → Restart & Run All" before sharing.