Machine learning repositories quickly become chaotic without proper organization. Jupyter notebooks multiply as teams explore data, experiment with features, train models, and analyze results. Within weeks, a repository can contain dozens of notebooks with names like `notebook_final_v2_actually_final.ipynb`, `test123.ipynb`, and `Untitled47.ipynb`, making it nearly impossible to understand the project's structure or reproduce past results. This organizational debt compounds over time, slowing development and creating barriers for new team members.
Effective notebook organization transforms repositories from confusing collections into well-structured projects where anyone can quickly locate relevant analyses, understand the workflow, and build upon previous work. This comprehensive guide presents battle-tested strategies for organizing Jupyter notebooks in machine learning repositories, covering directory structures, naming conventions, notebook lifecycle management, and documentation practices that scale from solo projects to large team environments.
Establishing a Clear Directory Structure
A well-designed directory structure provides the foundation for notebook organization. The structure should separate concerns, indicate workflow progression, and make it obvious where different types of notebooks belong. Rather than dumping all notebooks in a single directory, create a hierarchy that reflects your ML workflow stages.
The Core Directory Layout: Start with a top-level structure that separates notebooks from code, data, and models:
ml-project/
├── notebooks/ # All Jupyter notebooks
├── src/ # Production Python code
├── data/ # Data storage (often gitignored)
├── models/ # Trained models
├── reports/ # Generated reports and figures
├── tests/ # Unit and integration tests
├── configs/ # Configuration files
├── docs/ # Documentation
├── requirements.txt # Dependencies
└── README.md # Project overview
This separation prevents notebooks from cluttering code directories while keeping all related materials accessible. The `notebooks/` directory becomes the dedicated space for exploratory and experimental work, clearly distinguished from production code in `src/`.
Workflow-Based Notebook Organization: Within the `notebooks/` directory, organize by workflow stages rather than by person or date. Machine learning projects typically follow a progression from data exploration through model deployment:
notebooks/
├── 01_exploration/ # Initial data exploration
├── 02_preprocessing/ # Data cleaning and preparation
├── 03_features/ # Feature engineering experiments
├── 04_modeling/ # Model training and selection
├── 05_evaluation/ # Model evaluation and analysis
├── 06_experiments/ # Specific experiments and ablations
├── archive/ # Deprecated or superseded notebooks
└── templates/ # Reusable notebook templates
This structure immediately communicates project progression. New team members can start at `01_exploration/` to understand the data, then move through subsequent directories following the analysis journey. The numbering prefix enforces ordering while keeping directory names descriptive.
Each workflow directory should contain notebooks focused on that specific stage. The `02_preprocessing/` directory might include `01_missing_values.ipynb`, `02_outlier_detection.ipynb`, and `03_normalization.ipynb`. This granular organization makes it easy to find specific analyses without wading through unrelated content.
The Archive Strategy: The `archive/` directory provides a home for outdated notebooks without deleting them. When you supersede an approach or find a better solution, move the old notebook to the archive rather than deleting it. This preserves institutional knowledge: you might need to reference why certain approaches didn't work or recover specific visualizations from earlier analyses.
Add a brief README in the archive explaining what’s there and why notebooks were archived. This context prevents future confusion and helps people avoid repeating failed experiments.
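To make archiving painless, you can script the move and the README note together. The helper below is a minimal sketch, not part of the original workflow; the notebook path and reason in the usage example are hypothetical:
from datetime import date
from pathlib import Path
import shutil

def archive_notebook(nb_path, reason, archive_dir="notebooks/archive"):
    """Move a superseded notebook into archive/ and log why it was archived."""
    nb = Path(nb_path)
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    shutil.move(str(nb), str(archive / nb.name))  # move the notebook into the archive
    with open(archive / "README.md", "a") as f:   # append a dated note
        f.write(f"\n### {nb.name}\n{reason}\n\nArchived: {date.today()}\n")

# Hypothetical usage:
archive_notebook(
    "notebooks/03_features/02_tfidf_features.ipynb",
    "Superseded by embedding-based features; kept for reference.",
)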
Implementing Effective Naming Conventions
Consistent naming conventions transform random collections of notebooks into organized libraries where you can find what you need by name alone. Good names communicate content, sequence, and status without requiring you to open files.
The Numbered Prefix Pattern: Use numbered prefixes to indicate sequence within directories. This ensures notebooks appear in logical order when listed alphabetically:
03_features/
├── 01_baseline_features.ipynb
├── 02_temporal_features.ipynb
├── 03_interaction_features.ipynb
├── 04_feature_importance_analysis.ipynb
└── 05_final_feature_set.ipynb
The numbering makes the intended sequence obvious and prevents tools from listing notebooks in unhelpful orders. When notebooks have dependencies—one requires outputs from another—the numeric ordering documents these relationships.
Descriptive Names with Context: After the numeric prefix, use descriptive names that communicate the notebook's purpose. Avoid generic names like `analysis.ipynb` or `test.ipynb`. Instead, be specific:
- `01_explore_customer_churn_data.ipynb` (not `exploration.ipynb`)
- `03_random_forest_hyperparameter_tuning.ipynb` (not `model.ipynb`)
- `02_handle_missing_timestamps.ipynb` (not `preprocessing.ipynb`)
Specific names enable quick searches and make it immediately clear what each notebook contains. If you can’t create a descriptive name, the notebook might be trying to do too much—consider splitting it.
Status Indicators in Names: For experimental or work-in-progress notebooks, include status indicators:
04_modeling/
├── 01_baseline_logistic_regression.ipynb
├── 02_random_forest_WIP.ipynb
├── 03_xgboost_DRAFT.ipynb
└── 04_neural_network_BROKEN.ipynb
Status indicators (`WIP`, `DRAFT`, `BROKEN`, `EXPERIMENTAL`) signal that notebooks aren't finalized and might not run successfully. Once a notebook is completed and verified, remove the indicator. This prevents confusion about which notebooks represent validated work.
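Because stale flags defeat the purpose, it helps to scan for them periodically. A minimal sketch, assuming the directory layout from this guide:
from pathlib import Path

STATUS_FLAGS = ("WIP", "DRAFT", "BROKEN", "EXPERIMENTAL")

# List every notebook that still carries a status indicator in its name.
for nb in sorted(Path("notebooks").rglob("*.ipynb")):
    if any(flag in nb.stem for flag in STATUS_FLAGS):
        print(f"Not yet finalized: {nb}")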
Date Prefixes for Time-Series Work: When notebooks represent repeated analyses over time—like monthly reports or periodic retraining—include dates in names:
05_evaluation/
├── 2024_01_model_performance_report.ipynb
├── 2024_02_model_performance_report.ipynb
└── 2024_03_model_performance_report.ipynb
The YYYY_MM format sorts chronologically and makes it easy to identify the latest analysis. For weekly reports, use YYYY_MM_DD format.
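If the reports are generated on a schedule, the name can be derived from the current date rather than typed by hand. An illustrative snippet, not prescribed by the convention itself:
from datetime import date

# YYYY_MM for monthly reports; switch to %Y_%m_%d for weekly ones.
name = f"{date.today():%Y_%m}_model_performance_report.ipynb"
print(name)  # e.g. 2024_03_model_performance_report.ipynb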
Notebook Naming Best Practices
Avoid names like:
- `Untitled12.ipynb`
- `test.ipynb`
- `notebook_final.ipynb`
- `analysis_copy_2.ipynb`
- `stuff.ipynb`
Prefer names like:
- `01_explore_churn_data.ipynb`
- `03_test_lstm_architecture.ipynb`
- `05_final_model_evaluation.ipynb`
- `02_feature_correlation_v2.ipynb`
- `04_hyperparameter_search.ipynb`
Managing Notebook Lifecycle and Evolution
Notebooks evolve as projects progress—early exploratory notebooks become outdated as better approaches emerge, experiments succeed or fail, and analyses get refined. Managing this lifecycle prevents repositories from becoming graveyards of obsolete work.
The Promotion Pattern: Establish a clear path for moving notebook code into production. As notebook code matures and stabilizes, extract it into reusable Python modules in the `src/` directory:
# src/preprocessing.py
def clean_customer_data(df):
    """Clean customer dataset, handling missing values and outliers.

    This function implements the cleaning strategy developed in
    notebooks/02_preprocessing/01_missing_values.ipynb.
    """
    df = df.dropna(subset=['customer_id', 'signup_date'])
    df = df[df['age'].between(18, 100)]
    df = df[df['revenue'] >= 0]
    return df
Notebooks then import these functions rather than containing the implementation:
# In notebook
from src.preprocessing import clean_customer_data
df_clean = clean_customer_data(df_raw)
This pattern keeps notebooks focused on exploration and visualization while building a library of production-ready code. Document in notebook markdown cells which functions correspond to which analysis, creating a clear connection between exploration and production code.
Versioning Exploratory Work: When iterating on an approach, create new versions rather than overwriting previous notebooks. If you’re experimenting with different feature engineering strategies:
03_features/
├── 01_temporal_features_v1.ipynb
├── 02_temporal_features_v2.ipynb
├── 03_temporal_features_v3.ipynb
└── 04_temporal_features_final.ipynb
Version numbers preserve the evolution of your thinking. The `v3` notebook might refer back to why the `v1` approach failed, and having all versions accessible supports that kind of documentation. Once you've settled on a final approach, mark it clearly in the name.
Alternatively, use Git branches for significant experimental divergences. Create a branch, conduct experiments in notebooks, and if successful, merge back to main. This approach works well for substantial explorations but adds Git complexity that version-suffixed files avoid.
Cleaning Up Dead Ends: Not every notebook leads somewhere useful. When experiments fail or approaches prove unworkable, don't leave abandoned notebooks in active directories. Move them to the `archive/` directory with a brief note explaining what was tried and why it didn't work:
# notebooks/archive/README.md
## Archived Notebooks
### neural_network_embeddings.ipynb
Attempted to use neural network embeddings for categorical features.
Abandoned because:
- Required too much training time (>8 hours per epoch)
- Didn't improve performance over target encoding
- Added significant complexity for minimal gain
Archived: 2024-03-15
This documentation prevents others from repeating failed experiments while preserving institutional knowledge about what doesn’t work.
Creating Documentation and Navigation
Even well-organized repositories need documentation that helps people navigate and understand the structure. Good documentation serves as onboarding material for new team members and refreshes memory when returning to old projects.
The Master README: Create a comprehensive README in the `notebooks/` directory that serves as a map:
# Notebooks Overview
This directory contains all Jupyter notebooks for the customer churn prediction project.
## Directory Structure
- **01_exploration/**: Initial data exploration and understanding
- **02_preprocessing/**: Data cleaning and preparation pipelines
- **03_features/**: Feature engineering experiments
- **04_modeling/**: Model training and selection
- **05_evaluation/**: Performance analysis and validation
## Getting Started
1. Start with `01_exploration/01_data_overview.ipynb` for dataset introduction
2. Review preprocessing in `02_preprocessing/` to understand data cleaning
3. See `03_features/04_feature_importance_analysis.ipynb` for final feature set
4. Current best model is in `04_modeling/05_final_xgboost_model.ipynb`
## Key Findings
- Customer tenure is the strongest predictor of churn
- Contract type significantly impacts retention
- XGBoost outperforms logistic regression by 12% on F1 score
## Running Notebooks
All notebooks assume data is available in `../data/processed/`.
Run notebooks in sequence within each directory for best results.
This README provides entry points, highlights key results, and explains prerequisites. Update it as the project evolves to keep it current.
Individual Notebook Headers: Each notebook should begin with a standard header cell explaining its purpose and context:
# Temporal Feature Engineering (Version 2)
**Purpose**: Engineer time-based features from customer transaction history
**Inputs**:
- `data/processed/transactions.csv`
- `data/processed/customers.csv`
**Outputs**:
- `data/processed/temporal_features.csv`
**Key Changes from v1**:
- Added rolling window features (7, 30, 90 days)
- Fixed bug in recency calculation
- Added seasonality indicators
**Status**: Complete and validated
**Last Updated**: 2024-03-20
This header makes notebooks self-documenting. Anyone opening the notebook immediately understands its role without reading through code.
Dependency Documentation: When notebooks depend on outputs from other notebooks, document these dependencies explicitly:
## Dependencies
This notebook requires:
1. Run `01_exploration/02_data_quality_check.ipynb` first
2. Outputs from `02_preprocessing/03_normalization.ipynb`
3. Feature definitions in `configs/feature_config.yaml`
If any of these inputs are missing, run the listed notebooks to generate them.
Clear dependency documentation prevents frustration from missing inputs and enables proper workflow sequencing.
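A practical complement to this documentation is a "guard cell" near the top of the notebook that fails fast when upstream outputs are missing. A minimal sketch, with illustrative file names:
from pathlib import Path

# Files produced by the upstream notebooks listed in the dependency section.
required = [
    Path("../data/processed/normalized.csv"),   # illustrative name
    Path("../configs/feature_config.yaml"),
]
missing = [p for p in required if not p.exists()]
if missing:
    raise FileNotFoundError(
        f"Missing inputs: {missing}. Run the upstream notebooks first."
    )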
Implementing Notebook Templates
Templates standardize structure across notebooks, reducing cognitive load and ensuring consistency. When all notebooks follow the same pattern, team members can quickly orient themselves regardless of which notebook they open.
Creating a Standard Template: Develop a template that includes common sections:
# Cell 1: Header (Markdown)
"""
# [Notebook Title]
**Purpose**: [Clear description]
**Created**: [Date]
**Author**: [Name]
"""
# Cell 2: Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
# Set styles
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
# Cell 3: Configuration
DATA_DIR = Path("../data/processed")
OUTPUT_DIR = Path("../reports/figures")
RANDOM_SEED = 42
# Cell 4: Data Loading
# Load data here
# Cell 5+: Analysis sections with markdown headers
# Final Cell: Summary (Markdown)
"""
## Summary
- Key finding 1
- Key finding 2
- Next steps
"""
Store templates in `notebooks/templates/` so team members can copy them when creating new notebooks. This standardization makes notebooks more professional and easier to review.
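Copying the template by hand works, but a small script keeps the step uniform. A minimal sketch; the template and target names here are assumptions for illustration:
import shutil
from pathlib import Path

template = Path("notebooks/templates/standard_analysis.ipynb")  # hypothetical template
target = Path("notebooks/03_features/06_lag_features.ipynb")    # hypothetical new notebook

# Refuse to clobber an existing notebook, then stamp out a copy.
if target.exists():
    raise FileExistsError(f"{target} already exists; choose a different name")
shutil.copyfile(template, target)
print(f"Created {target} from {template}")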
Parameterization Strategy: Design templates to support parameterization using tools like Papermill. Place all configuration in clearly marked cells that can be programmatically modified:
# Parameters (this cell is tagged 'parameters' for Papermill)
start_date = "2024-01-01"
end_date = "2024-03-31"
model_type = "random_forest"
test_size = 0.2
Parameterized notebooks can be executed with different settings, enabling automated reporting and experimentation without manual editing.
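With the parameters cell tagged, Papermill can execute the template end to end through its Python API. A sketch under the assumption that `papermill` is installed; the notebook paths are illustrative:
import papermill as pm

# Execute the parameterized template, writing a fully-run copy alongside
# the other monthly reports. Values passed here override the tagged defaults.
pm.execute_notebook(
    "notebooks/templates/report_template.ipynb",  # hypothetical template
    "notebooks/05_evaluation/2024_04_model_performance_report.ipynb",
    parameters={
        "start_date": "2024-04-01",
        "end_date": "2024-04-30",
        "model_type": "random_forest",
        "test_size": 0.2,
    },
)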
Handling Large Notebooks and Code Reuse
As analyses grow complex, notebooks become unwieldy. Strategies for managing complexity keep individual notebooks focused and maintainable.
The Notebook Size Guideline: Keep notebooks under 50-75 cells when possible. Beyond this threshold, notebooks become difficult to navigate and slow to load. If a notebook exceeds this size, consider splitting it:
- Extract helper functions into `src/` modules
- Split the work across multiple notebooks covering different aspects
- Move detailed exploratory work to appendix notebooks
Shared Utilities Module: Create a `src/utils.py` module for functions used across multiple notebooks:
# src/utils.py
import pandas as pd
from pathlib import Path

def load_dataset(name, data_dir="../data/processed"):
    """Load a dataset with consistent error handling."""
    path = Path(data_dir) / f"{name}.csv"
    if not path.exists():
        raise FileNotFoundError(f"Dataset {name} not found at {path}")
    return pd.read_csv(path)

def save_figure(fig, name, output_dir="../reports/figures"):
    """Save figure with consistent settings."""
    path = Path(output_dir)
    path.mkdir(parents=True, exist_ok=True)
    fig.savefig(path / f"{name}.png", dpi=300, bbox_inches='tight')
    print(f"Saved figure to {path / f'{name}.png'}")
Import these utilities in notebooks, keeping notebook cells focused on analysis rather than boilerplate:
from src.utils import load_dataset, save_figure
df = load_dataset("customers")
# Analysis code...
save_figure(fig, "customer_distribution")
Cross-Notebook Communication: When notebooks need to share intermediate results, use a consistent approach for saving and loading:
# In notebook A: Save intermediate results
processed_df.to_parquet("../data/interim/processed_features.parquet")
# In notebook B: Load intermediate results
processed_df = pd.read_parquet("../data/interim/processed_features.parquet")
Document these intermediate artifacts in README files so others understand what’s available and how to use it. Consider using a manifest file that lists all intermediate outputs with descriptions:
# data/interim/manifest.yaml
processed_features:
file: processed_features.parquet
created_by: notebooks/03_features/04_final_feature_set.ipynb
description: Customer features after all engineering steps
updated: 2024-03-20
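Keeping the manifest machine-readable also lets notebooks discover what's available programmatically. A minimal sketch, assuming PyYAML is installed and the schema shown above:
import yaml

with open("data/interim/manifest.yaml") as f:
    manifest = yaml.safe_load(f)

# Print what each intermediate artifact contains and which notebook produced it.
for name, entry in manifest.items():
    print(f"{name}: {entry['description']} (created by {entry['created_by']})")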
Version Control Considerations
Organizing notebooks for version control requires additional considerations beyond directory structure and naming.
Clearing Outputs Before Commits: Notebook outputs—especially images and large dataframes—bloat repository size and create unnecessary merge conflicts. Configure automatic output clearing using nbstripout:
pip install nbstripout
nbstripout --install --attributes .gitattributes
This ensures all committed notebooks have cleared outputs, keeping the repository clean and focused on code rather than results.
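If your team already uses pre-commit, the same stripping can be enforced as a hook on every commit. A sketch of the configuration, assuming the hook id published in the nbstripout repository:
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/kynan/nbstripout
    rev: 0.7.1  # pin to whatever version your team standardizes on
    hooks:
      - id: nbstripout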
The .gitignore Strategy: Create a comprehensive `.gitignore` that excludes appropriate files:
# Jupyter
.ipynb_checkpoints/
*/.ipynb_checkpoints/*
# Data (large files shouldn't be in Git)
data/raw/
data/processed/
data/interim/
# Models (use Git LFS or external storage)
models/*.pkl
models/*.h5
# Outputs
reports/figures/
*.png
*.jpg
This prevents accidentally committing large binary files that don’t belong in version control.
Commit Message Standards: Use descriptive commit messages for notebook changes:
- “Add temporal feature engineering notebook”
- “Fix bug in preprocessing pipeline (notebook 02)”
- “Update model evaluation with cross-validation”
Generic messages like “Update notebook” provide no value. Specific messages help understand project evolution through Git history.
Conclusion
Organizing Jupyter notebooks in machine learning repositories transforms chaotic collections into well-structured projects that facilitate collaboration, reproducibility, and knowledge sharing. By implementing workflow-based directory structures, consistent naming conventions, clear lifecycle management, comprehensive documentation, and standardized templates, teams can maintain order even as projects grow in complexity and scope.
The investment in organization pays dividends throughout a project’s lifetime—onboarding becomes faster, analyses are easier to find and understand, and the repository serves as documentation of the entire ML development process. Start with these organizational principles early in your project, and you’ll avoid the technical debt and confusion that plague poorly organized repositories.