How to Convert Jupyter Notebook to Python Script for Production

Jupyter notebooks are phenomenal for exploration, prototyping, and communicating results. But when it’s time to move your work to production, that beautifully interactive notebook becomes a liability. Production systems need reliable, testable, modular code that can run without a browser interface—and notebooks simply weren’t designed for that. I’ve seen too many teams struggle with this transition, either running notebooks in production (a maintenance nightmare) or spending weeks manually rewriting code that could have been systematically converted.

The challenge isn’t just about file conversion—it’s about transforming exploratory code into production-quality software. This means removing cell-by-cell execution dependencies, eliminating visualizations that clutter logs, extracting hardcoded values into configurations, and restructuring code into reusable functions. The good news is that with a systematic approach, you can make this transition efficiently while maintaining the logic you’ve carefully developed in your notebook.

Understanding the Fundamental Differences

Before diving into conversion techniques, you need to understand what makes notebook code different from production scripts. This awareness guides how you approach the conversion process.

Execution Model Differences

Notebooks execute in cells, often out of order, with state persisting between executions. This creates hidden dependencies that break when you convert to a linear script:

python

# In a notebook, these cells might be executed in any order
# Cell 1
data = load_data()

# Cell 3 (executed before Cell 2)
results = process_data(cleaned_data)

# Cell 2 (executed after Cell 3)
cleaned_data = clean_data(data)

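This code “works” in a notebook only because cleaned_data is still in the kernel's memory from an earlier run of Cell 2. Exported as a linear script in the order shown, it raises a NameError at process_data(cleaned_data). Production scripts must have clear, top-to-bottom execution flow with explicit dependencies.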
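
One way to catch this before converting is to look at the execution counts Jupyter stores with each code cell. The sketch below assumes the standard .ipynb JSON layout (a top-level "cells" list whose code cells carry an "execution_count"); the function names are illustrative, not part of any Jupyter API.

```python
import json


def find_out_of_order_cells(notebook: dict) -> list[int]:
    """Return positions of code cells whose execution count is lower than
    that of an earlier code cell, i.e. cells run out of document order."""
    out_of_order = []
    highest_seen = 0
    for position, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        count = cell.get("execution_count")
        if count is None:  # cell was never executed in the saved session
            continue
        if count < highest_seen:
            out_of_order.append(position)
        highest_seen = max(highest_seen, count)
    return out_of_order


def check_notebook_file(path: str) -> list[int]:
    """Load an .ipynb file from disk and run the check on it."""
    with open(path, "r", encoding="utf-8") as f:
        return find_out_of_order_cells(json.load(f))


# Three cells in document order; the last one was executed before the middle one.
example = {
    "cells": [
        {"cell_type": "code", "execution_count": 1, "source": "data = load_data()"},
        {"cell_type": "code", "execution_count": 3, "source": "results = process_data(cleaned_data)"},
        {"cell_type": "code", "execution_count": 2, "source": "cleaned_data = clean_data(data)"},
    ]
}
print(find_out_of_order_cells(example))
```

An empty result does not prove the notebook is safe (counts reset when the kernel restarts), but a non-empty one is a strong hint to use "Restart & Run All" before exporting.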

Interactive Elements That Don’t Translate

Notebooks contain elements that have no place in production:

  • Inline visualizations: plt.show(), df.head(), interactive plots
  • Progress bars: Widgets and tqdm displays designed for visual feedback
  • Print debugging: Excessive print() statements for exploration
  • Magic commands and shell escapes: %matplotlib inline, %load_ext, !pip install
  • Display helpers: display(), IPython.display objects

These need removal or transformation for production deployment.

Code Organization Philosophy

Notebooks encourage a linear narrative structure—you tell a story from data loading through results. Production code demands modular organization with clear separations of concerns, making code testable and reusable.

Notebook vs Production Script: Key Differences

❌ Notebook Code

  • Cell-by-cell execution
  • Global state management
  • Hardcoded parameters
  • Inline visualizations
  • Exploratory print statements
  • Linear narrative flow
  • No formal error handling

✓ Production Script

  • Top-to-bottom execution
  • Encapsulated functions
  • Configuration files/arguments
  • Plots saved to files
  • Structured logging
  • Modular organization
  • Robust error handling

Basic Conversion: Using Built-in Tools

Let’s start with the mechanical conversion—turning a .ipynb file into a .py file. Several tools handle this, each with different strengths.

Using Jupyter’s nbconvert

The most straightforward method uses Jupyter’s built-in conversion tool:

bash

# Basic conversion
jupyter nbconvert --to script notebook.ipynb

# This creates notebook.py with all code cells

By default, this creates a script with all code cells concatenated, including markdown as comments. While quick, the output needs significant cleanup.

Enhanced conversion with options:

bash

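# Strip the "# In[ ]:" prompt comments
jupyter nbconvert --to script notebook.ipynb --no-prompt

# Leave markdown cells out of the script entirely
jupyter nbconvert --to script notebook.ipynb --TemplateExporter.exclude_markdown=True

# Specify the output base name (nbconvert adds the .py extension)
jupyter nbconvert --to script notebook.ipynb --output production_script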

# Convert multiple notebooks
jupyter nbconvert --to script *.ipynb

The --no-prompt flag removes the "# In[1]:" style prompt comments that nbconvert otherwise writes before each cell, making the code cleaner.

Using nbconvert Programmatically

For more control, use nbconvert as a Python library:

python

import nbformat
from nbconvert import PythonExporter

def convert_notebook(notebook_path, output_path):
    """Convert notebook to Python script programmatically"""
    # Read the notebook
    with open(notebook_path, 'r', encoding='utf-8') as f:
        notebook = nbformat.read(f, as_version=4)
    
    # Convert to Python
    exporter = PythonExporter()
    python_code, resources = exporter.from_notebook_node(notebook)
    
    # Write to file
    with open(output_path, 'w', encoding='utf-8') as f:
        f.write(python_code)
    
    print(f"Converted {notebook_path} to {output_path}")

convert_notebook('analysis.ipynb', 'analysis.py')

This approach allows you to customize the conversion process, filtering cells or adding preprocessing steps.
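
As a minimal sketch of cell filtering, the helper below drops cells that carry a given tag before the notebook is handed to the exporter. The tag name "skip-export" is a hypothetical convention, not something Jupyter defines. The function uses only dict-style access, so it should work both on a plain dict loaded with json and on the notebook object returned by nbformat.read.

```python
def drop_tagged_cells(notebook, skip_tag="skip-export"):
    """Remove cells tagged with skip_tag, in place; return how many were dropped."""
    kept = [
        cell
        for cell in notebook["cells"]
        if skip_tag not in cell.get("metadata", {}).get("tags", [])
    ]
    dropped = len(notebook["cells"]) - len(kept)
    notebook["cells"] = kept
    return dropped


# Small in-memory example using the raw notebook layout.
example = {
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["skip-export"]}, "source": "df.head()"},
        {"cell_type": "code", "metadata": {}, "source": "df = df.dropna()"},
    ]
}
print(drop_tagged_cells(example), "cell(s) dropped")
```

In convert_notebook, the call would go between nbformat.read(...) and exporter.from_notebook_node(notebook).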

Using p2j and Other Tools

Alternative tools offer different features:

bash

# Install p2j (converts Python scripts to notebooks; -r reverses the direction)
pip install p2j

# Convert a notebook back to a script with p2j
p2j -r notebook.ipynb

Each tool has quirks—experiment to find what works best for your workflow.

Cleaning the Converted Code

The raw converted script needs substantial cleanup before it’s production-ready. Here’s a systematic approach to transformation.

Remove Notebook-Specific Code

First, eliminate code that only makes sense in notebooks:

python

# BEFORE: Notebook code with display elements
import matplotlib.pyplot as plt
%matplotlib inline  # Magic command - remove

df = load_data()
df.head()  # Interactive display - remove or modify
print(df.describe())  # Exploratory print - remove or convert to logging

plt.figure(figsize=(10, 6))
plt.plot(data)
plt.show()  # Remove or save to file instead

python

# AFTER: Production-ready code
import matplotlib.pyplot as plt
import logging

logger = logging.getLogger(__name__)

df = load_data()
logger.info(f"Loaded {len(df)} rows")

# Save visualization instead of showing
plt.figure(figsize=(10, 6))
plt.plot(data)
plt.savefig('output/plot.png')
plt.close()

Systematic removal checklist:

  • Delete all magic commands (%, %%) and shell escapes (!); nbconvert exports them as get_ipython() calls, which fail outside IPython
  • Replace display(), .head(), .describe() with logging
  • Convert or remove plt.show() calls
  • Remove jupyter-specific imports (IPython.display, etc.)
  • Clean up excessive print statements
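
The checklist can be partly automated. The following is a rough, regex-based sketch that flags lines in an exported script that still look notebook-specific. The pattern list and the function name are assumptions made for illustration; a heuristic like this will miss some cases and flag some false positives.

```python
import re

# Patterns for constructs that only make sense inside a notebook session.
NOTEBOOK_PATTERNS = {
    "magic command": re.compile(r"^\s*%{1,2}\w+"),
    "shell escape": re.compile(r"^\s*!\w+"),
    "get_ipython call": re.compile(r"\bget_ipython\(\)"),
    "display call": re.compile(r"\bdisplay\("),
    "plt.show call": re.compile(r"\bplt\.show\("),
    "IPython import": re.compile(r"^\s*(from|import)\s+IPython\b"),
}


def find_notebook_leftovers(source: str) -> list[tuple[int, str]]:
    """Return (line_number, label) pairs for lines matching a notebook-only pattern."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        code = line.split("#", 1)[0]  # crude: ignore everything after a '#'
        for label, pattern in NOTEBOOK_PATTERNS.items():
            if pattern.search(code):
                findings.append((line_number, label))
    return findings


sample = "%matplotlib inline\nimport pandas as pd\nplt.show()\n"
for line_number, label in find_notebook_leftovers(sample):
    print(f"line {line_number}: {label}")
```

Running a check like this in CI is one way to keep notebook residue from creeping back into converted scripts.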

Refactor Into Functions

Notebook code is typically a series of script-level statements. Production code needs function encapsulation:

python

# BEFORE: Script-level code
data = pd.read_csv('data.csv')
data = data.dropna()
data['new_col'] = data['col1'] * data['col2']
model = train_model(data)
predictions = model.predict(test_data)
save_results(predictions)

python

# AFTER: Organized functions
def load_and_clean_data(filepath):
    """Load and preprocess data from CSV"""
    data = pd.read_csv(filepath)
    data = data.dropna()
    return data

def engineer_features(data):
    """Create derived features"""
    data = data.copy()
    data['new_col'] = data['col1'] * data['col2']
    return data

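def run_pipeline(data_path, test_path, output_path):
    """Execute complete ML pipeline"""
    # Load data
    data = load_and_clean_data(data_path)
    logger.info(f"Loaded {len(data)} samples")
    
    # Engineer features
    data = engineer_features(data)
    
    # Train, then predict on test data that went through the same
    # preparation steps (passed in explicitly instead of read from a global)
    model = train_model(data)
    test_data = engineer_features(load_and_clean_data(test_path))
    predictions = model.predict(test_data)
    
    # Save results
    save_results(predictions, output_path)
    logger.info("Pipeline completed successfully")

if __name__ == '__main__':
    # 'data/test.csv' is a placeholder path for the held-out data
    run_pipeline('data/input.csv', 'data/test.csv', 'output/predictions.csv')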

Benefits of function encapsulation:

  • Each function has a clear, testable purpose
  • State management is explicit through parameters and returns
  • Code becomes reusable across different scripts
  • Error handling can be localized
  • Documentation via docstrings becomes natural

Extract Configuration and Hardcoded Values

Notebooks often contain hardcoded parameters scattered throughout cells. Centralize these:

python

# BEFORE: Scattered hardcoded values
data = pd.read_csv('data/train.csv')  # Hardcoded path
model = RandomForestClassifier(n_estimators=100, max_depth=10)  # Hardcoded params
threshold = 0.5  # Hardcoded threshold
output_file = 'results_2024.csv'  # Hardcoded output

python

# AFTER: Centralized configuration
import yaml
from dataclasses import dataclass

@dataclass
class Config:
    """Configuration for ML pipeline"""
    data_path: str = 'data/train.csv'
    n_estimators: int = 100
    max_depth: int = 10
    threshold: float = 0.5
    output_path: str = 'output/results.csv'
    
    @classmethod
    def from_yaml(cls, path):
        """Load configuration from YAML file"""
        with open(path, 'r') as f:
            config_dict = yaml.safe_load(f)
        return cls(**config_dict)

def run_pipeline(config: Config):
    """Run pipeline with provided configuration"""
    data = pd.read_csv(config.data_path)
    model = RandomForestClassifier(
        n_estimators=config.n_estimators,
        max_depth=config.max_depth
    )
    # ... rest of pipeline
    results.to_csv(config.output_path)

Create a separate config.yaml:

yaml

data_path: 'data/train.csv'
n_estimators: 100
max_depth: 10
threshold: 0.5
output_path: 'output/results.csv'

This separation makes your code flexible and environment-agnostic.
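
Deployment environments often inject settings through environment variables rather than files. As a sketch of how the same Config could pick those up, the helper below overrides fields from variables named with a prefix. The PIPELINE_ prefix and the function name apply_env_overrides are assumptions for illustration; the Config fields are repeated so the example runs on its own.

```python
import os
from dataclasses import dataclass, fields, replace


@dataclass
class Config:
    """Same fields as the Config above, repeated so this sketch is self-contained."""
    data_path: str = "data/train.csv"
    n_estimators: int = 100
    max_depth: int = 10
    threshold: float = 0.5
    output_path: str = "output/results.csv"


def apply_env_overrides(config: Config, environ=None, prefix: str = "PIPELINE_") -> Config:
    """Return a copy of config with fields overridden by PREFIX + FIELD_NAME variables."""
    environ = os.environ if environ is None else environ
    overrides = {}
    for field in fields(config):
        raw = environ.get(prefix + field.name.upper())
        if raw is None:
            continue
        # Coerce the string to the type of the current value (str, int or float).
        current = getattr(config, field.name)
        overrides[field.name] = type(current)(raw)
    return replace(config, **overrides)


config = apply_env_overrides(Config(), {"PIPELINE_N_ESTIMATORS": "250"})
print(config.n_estimators, config.max_depth)
```

The type coercion here is deliberately simple and would need extra handling for boolean or nested fields.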

Adding Production-Grade Features

Once you have clean, functional code, add features that make it production-ready.

Implement Proper Logging

Replace print statements with structured logging:

python

import logging
import sys

def setup_logging(log_level=logging.INFO):
    """Configure logging for production"""
    logging.basicConfig(
        level=log_level,
        format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
        handlers=[
            logging.FileHandler('pipeline.log'),
            logging.StreamHandler(sys.stdout)
        ]
    )

logger = logging.getLogger(__name__)

def process_data(data):
    """Process data with proper logging"""
    logger.info("Starting data processing")
    
    initial_rows = len(data)
    data = data.dropna()
    logger.info(f"Removed {initial_rows - len(data)} rows with missing values")
    
    try:
        data = transform_features(data)
        logger.info("Feature transformation completed")
    except Exception:
        # logger.exception logs at ERROR level and includes the traceback
        logger.exception("Feature transformation failed")
        raise
    
    return data

Logging best practices:

  • Use appropriate log levels (DEBUG, INFO, WARNING, ERROR, CRITICAL)
  • Log to both file and console
  • Include context in log messages
  • Log exceptions with full tracebacks
  • Never log sensitive information
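
To illustrate the point about tracebacks, here is a small self-contained sketch using logger.exception from the standard library, which logs at ERROR level and appends the active traceback. The function names and the in-memory stream are stand-ins chosen for the demonstration.

```python
import io
import logging


def risky_transform(values):
    """Stand-in for a real transformation step; fails on a zero divisor."""
    return [10 / v for v in values]


def transform_with_logging(values, logger):
    """Run the transform, logging any failure with its traceback before re-raising."""
    try:
        return risky_transform(values)
    except Exception:
        # Must be called from inside an except block to pick up the traceback.
        logger.exception("Transformation failed for %d values", len(values))
        raise


# Capture log output in memory so it can be inspected afterwards.
stream = io.StringIO()
demo_logger = logging.getLogger("pipeline.demo")
demo_logger.setLevel(logging.INFO)
demo_logger.addHandler(logging.StreamHandler(stream))
demo_logger.propagate = False

try:
    transform_with_logging([1, 0], demo_logger)
except ZeroDivisionError:
    pass

print(stream.getvalue())
```

Passing the values as arguments ("%d") instead of formatting the message up front lets the logging module skip the string formatting when the level is disabled.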

Add Command-Line Interface

Make your script executable from the command line with arguments:

python

import argparse
from pathlib import Path

def parse_args():
    """Parse command-line arguments"""
    parser = argparse.ArgumentParser(
        description='ML Pipeline for production'
    )
    parser.add_argument(
        '--config',
        type=Path,
        default='config.yaml',
        help='Path to configuration file'
    )
    parser.add_argument(
        '--data',
        type=Path,
        required=True,
        help='Path to input data'
    )
    parser.add_argument(
        '--output',
        type=Path,
        default=None,
        help='Path for output results (defaults to the value in the config file)'
    )
    parser.add_argument(
        '--log-level',
        choices=['DEBUG', 'INFO', 'WARNING', 'ERROR'],
        default='INFO',
        help='Logging level'
    )
    return parser.parse_args()

def main():
    """Main execution function"""
    args = parse_args()
    
    # Setup logging
    setup_logging(getattr(logging, args.log_level))
    logger.info("Starting pipeline")
    
    # Load configuration
    config = Config.from_yaml(args.config)
    
    # Override with command-line arguments; only replace the configured
    # output path when --output was actually passed
    config.data_path = str(args.data)
    if args.output is not None:
        config.output_path = str(args.output)
    
    # Run pipeline
    run_pipeline(config)
    logger.info("Pipeline completed successfully")

if __name__ == '__main__':
    main()

Now you can run your script flexibly:

bash

python pipeline.py --data data/new_data.csv --output results/predictions.csv --log-level DEBUG
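
A small design tweak makes the CLI itself testable: let parse_args accept an optional argument list. argparse falls back to sys.argv[1:] when it receives None, so command-line behavior is unchanged. The sketch below uses a trimmed-down version of the parser for brevity; build_parser is a name introduced here.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Shortened version of the parser above, kept small for illustration."""
    parser = argparse.ArgumentParser(description="ML Pipeline for production")
    parser.add_argument("--data", type=Path, required=True, help="Path to input data")
    parser.add_argument(
        "--log-level",
        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
        default="INFO",
        help="Logging level",
    )
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse argv if given; argparse reads sys.argv[1:] when argv is None."""
    return build_parser().parse_args(argv)


# In a test, pass the arguments directly instead of patching sys.argv.
args = parse_args(["--data", "data/new_data.csv", "--log-level", "DEBUG"])
print(args.data, args.log_level)
```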

Implement Error Handling and Validation

Add robust error handling that makes debugging easier:

python

class DataValidationError(Exception):
    """Custom exception for data validation failures"""
    pass

def validate_data(data, required_columns):
    """Validate input data meets requirements"""
    missing_cols = set(required_columns) - set(data.columns)
    if missing_cols:
        raise DataValidationError(
            f"Missing required columns: {missing_cols}"
        )
    
    if data.empty:
        raise DataValidationError("Input data is empty")
    
    if data.isnull().all().any():
        null_cols = data.columns[data.isnull().all()].tolist()
        raise DataValidationError(
            f"Columns entirely null: {null_cols}"
        )
    
    logger.info("Data validation passed")

def run_pipeline(config: Config):
    """Run pipeline with comprehensive error handling"""
    try:
        # Load data
        logger.info(f"Loading data from {config.data_path}")
        data = pd.read_csv(config.data_path)
        
        # Validate
        required_cols = ['feature1', 'feature2', 'target']
        validate_data(data, required_cols)
        
        # Process
        data = process_data(data)
        results = train_and_predict(data)
        
        # Save
        results.to_csv(config.output_path, index=False)
        logger.info(f"Results saved to {config.output_path}")
        
    except FileNotFoundError as e:
        logger.error(f"File not found: {e}")
        raise
    except DataValidationError as e:
        logger.error(f"Data validation failed: {e}")
        raise
    except Exception as e:
        logger.error(f"Unexpected error: {e}", exc_info=True)
        raise
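
Schedulers and shell scripts usually react to a process exit code rather than to a Python exception. As one possible pattern, the sketch below wraps a pipeline call and maps each failure type to a distinct code. The specific code values and the name run_with_exit_code are arbitrary choices made for this example.

```python
import logging

logger = logging.getLogger(__name__)


class DataValidationError(Exception):
    """Repeated from the example above so this sketch is self-contained."""


# Exit codes chosen for this sketch; only 0 = success is a firm convention.
EXIT_OK = 0
EXIT_UNEXPECTED = 1
EXIT_MISSING_FILE = 2
EXIT_BAD_DATA = 3


def run_with_exit_code(pipeline, *args, **kwargs) -> int:
    """Call pipeline(*args, **kwargs) and translate failures into exit codes."""
    try:
        pipeline(*args, **kwargs)
    except FileNotFoundError:
        return EXIT_MISSING_FILE
    except DataValidationError:
        return EXIT_BAD_DATA
    except Exception:
        logger.exception("Unexpected pipeline failure")
        return EXIT_UNEXPECTED
    return EXIT_OK


def failing_pipeline():
    """Stand-in pipeline that always fails validation."""
    raise DataValidationError("Input data is empty")


print(run_with_exit_code(failing_pipeline))
```

In main(), this would be used as sys.exit(run_with_exit_code(run_pipeline, config)), since run_pipeline already logs and re-raises the specific errors.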

Production-Ready Script Checklist

  • Code Organization: Functions, classes, clear separation of concerns
  • Configuration: Externalized params, YAML/JSON configs, CLI args
  • Logging: Structured logging, appropriate levels, file output
  • Error Handling: Try-except blocks, custom exceptions, validation
  • Documentation: Docstrings, README, usage examples
  • Testing: Unit tests, integration tests, test data

Organizing Multiple Notebooks Into a Package

Real projects often involve multiple notebooks. Converting these into a cohesive package structure is essential for maintainability.

Package Structure Design

Organize your converted code into a proper Python package:

ml_project/
├── ml_project/
│   ├── __init__.py
│   ├── config.py          # Configuration management
│   ├── data/
│   │   ├── __init__.py
│   │   ├── loader.py      # Data loading functions
│   │   └── preprocessor.py # Data cleaning/preprocessing
│   ├── features/
│   │   ├── __init__.py
│   │   └── engineer.py    # Feature engineering
│   ├── models/
│   │   ├── __init__.py
│   │   ├── train.py       # Model training
│   │   └── predict.py     # Inference
│   └── utils/
│       ├── __init__.py
│       └── logging.py     # Logging utilities
├── scripts/
│   ├── train_pipeline.py  # Training script
│   └── predict_pipeline.py # Inference script
├── tests/
│   ├── test_data.py
│   ├── test_features.py
│   └── test_models.py
├── configs/
│   ├── default.yaml
│   └── production.yaml
├── notebooks/
│   └── archive/           # Original notebooks for reference
├── requirements.txt
├── setup.py
└── README.md

This structure separates concerns, makes code reusable, and enables proper testing.

Creating Entry Point Scripts

Create clean entry points that orchestrate your package:

python

# scripts/train_pipeline.py
from ml_project.config import load_config
from ml_project.data.loader import load_data
from ml_project.data.preprocessor import preprocess
from ml_project.features.engineer import engineer_features
from ml_project.models.train import train_model
from ml_project.utils.logging import setup_logging
import argparse
import logging

def main():
    """Training pipeline entry point"""
    parser = argparse.ArgumentParser()
    parser.add_argument('--config', required=True)
    args = parser.parse_args()
    
    # Setup
    config = load_config(args.config)
    setup_logging(config.log_level)
    logger = logging.getLogger(__name__)
    
    # Execute pipeline
    logger.info("Starting training pipeline")
    
    data = load_data(config.data_path)
    data = preprocess(data, config.preprocessing)
    data = engineer_features(data, config.features)
    model = train_model(data, config.model)
    
    logger.info("Training complete")

if __name__ == '__main__':
    main()

Each script imports from your package, keeping the entry point clean and focused on orchestration.

Testing Your Converted Code

Production code requires tests. Here’s how to make your converted notebook code testable:

Write Unit Tests for Core Functions

python

# tests/test_features.py
import pytest
import pandas as pd
from ml_project.features.engineer import create_interaction_features

def test_create_interaction_features():
    """Test feature engineering creates expected interactions"""
    # Arrange
    data = pd.DataFrame({
        'feature1': [1, 2, 3],
        'feature2': [4, 5, 6]
    })
    
    # Act
    result = create_interaction_features(data)
    
    # Assert
    assert 'feature1_x_feature2' in result.columns
    assert result['feature1_x_feature2'].iloc[0] == 4
    assert result['feature1_x_feature2'].iloc[1] == 10
    assert result['feature1_x_feature2'].iloc[2] == 18

def test_create_interaction_features_handles_missing():
    """Test function handles missing values correctly"""
    data = pd.DataFrame({
        'feature1': [1, None, 3],
        'feature2': [4, 5, None]
    })
    
    result = create_interaction_features(data)
    
    assert result['feature1_x_feature2'].isna().sum() == 2
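
The tests above import create_interaction_features without showing it. A minimal implementation that would satisfy both tests might look like the following; this is an assumed version written to match the column name used in the assertions, not code taken from the original notebook.

```python
import pandas as pd


def create_interaction_features(data: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of data with a feature1 * feature2 interaction column."""
    result = data.copy()  # leave the caller's DataFrame untouched
    # Missing values propagate: NaN in either input gives NaN in the product.
    result["feature1_x_feature2"] = result["feature1"] * result["feature2"]
    return result


frame = pd.DataFrame({"feature1": [1, 2, 3], "feature2": [4, 5, 6]})
print(create_interaction_features(frame))
```

Returning a copy keeps the function free of side effects, which is what makes the Arrange/Act/Assert tests straightforward to write.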

Integration Tests for Pipeline

python

# tests/test_integration.py
import pytest
from pathlib import Path
from ml_project.config import Config
from scripts.train_pipeline import main

def test_full_training_pipeline(tmp_path):
    """Test complete training pipeline runs successfully"""
    # Create test config
    config_path = tmp_path / 'test_config.yaml'
    config_path.write_text("""
    data_path: 'tests/data/test_data.csv'
    output_path: 'output/test_model.pkl'
    model:
      type: 'random_forest'
      n_estimators: 10
    """)
    
    # Run pipeline (should not raise exceptions)
    # In real implementation, you'd mock or use test data
    # and verify outputs

Testing makes your code reliable and gives you confidence when making changes.

Practical Example: Complete Conversion

Let’s walk through a complete example converting a real notebook to production code.

Original Notebook (analysis.ipynb):

python

# Cell 1
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Cell 2
df = pd.read_csv('data.csv')
df.head()

# Cell 3
df = df.dropna()
df['feature_ratio'] = df['feature1'] / df['feature2']

# Cell 4
X = df.drop('target', axis=1)
y = df['target']

# Cell 5
model = RandomForestClassifier(n_estimators=100)
model.fit(X, y)

# Cell 6
predictions = model.predict(X)
print(f"Accuracy: {accuracy_score(y, predictions)}")

Converted Production Script (pipeline.py):

python

"""
Production ML Pipeline
Converted from analysis.ipynb
"""
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import logging
import argparse
from pathlib import Path
import joblib

# Setup logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

def load_and_validate_data(filepath):
    """Load data and perform basic validation"""
    logger.info(f"Loading data from {filepath}")
    df = pd.read_csv(filepath)
    
    required_cols = ['feature1', 'feature2', 'target']
    missing = set(required_cols) - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    
    logger.info(f"Loaded {len(df)} rows")
    return df

def preprocess_data(df):
    """Clean and prepare data"""
    initial_rows = len(df)
    df = df.dropna()
    logger.info(f"Removed {initial_rows - len(df)} rows with missing values")
    
    # Feature engineering
    df['feature_ratio'] = df['feature1'] / df['feature2']
    logger.info("Created engineered features")
    
    return df

def split_features_target(df, target_col='target'):
    """Separate features and target"""
    X = df.drop(target_col, axis=1)
    y = df[target_col]
    return X, y

def train_model(X, y, n_estimators=100):
    """Train random forest model"""
    logger.info(f"Training model with {n_estimators} estimators")
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=42)
    model.fit(X, y)
    logger.info("Model training completed")
    return model

def evaluate_model(model, X, y):
    """Evaluate model performance"""
    predictions = model.predict(X)
    accuracy = accuracy_score(y, predictions)
    logger.info(f"Model accuracy: {accuracy:.4f}")
    return accuracy

def save_model(model, filepath):
    """Save trained model"""
    filepath = Path(filepath)
    filepath.parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, filepath)
    logger.info(f"Model saved to {filepath}")

def run_pipeline(data_path, model_path, n_estimators=100):
    """Execute complete ML pipeline"""
    try:
        # Load and prepare data
        df = load_and_validate_data(data_path)
        df = preprocess_data(df)
        X, y = split_features_target(df)
        
        # Train and evaluate
        model = train_model(X, y, n_estimators)
        accuracy = evaluate_model(model, X, y)
        
        # Save model
        save_model(model, model_path)
        
        return model, accuracy
        
    except Exception as e:
        logger.error(f"Pipeline failed: {str(e)}", exc_info=True)
        raise

def main():
    """Command-line interface"""
    parser = argparse.ArgumentParser(description='ML Training Pipeline')
    parser.add_argument('--data', required=True, help='Path to training data')
    parser.add_argument('--output', default='model.pkl', help='Output model path')
    parser.add_argument('--n-estimators', type=int, default=100, help='Number of trees')
    
    args = parser.parse_args()
    
    run_pipeline(args.data, args.output, args.n_estimators)
    logger.info("Pipeline completed successfully")

if __name__ == '__main__':
    main()

This conversion demonstrates all key principles: function encapsulation, proper logging, error handling, configuration via CLI, and clear execution flow.

Conclusion

Converting Jupyter notebooks to production scripts is more than a mechanical transformation—it’s about elevating exploratory code to production standards. The process involves systematic cleanup of notebook-specific elements, refactoring into modular functions, externalizing configuration, implementing robust logging and error handling, and adding command-line interfaces. Each step transforms brittle, interactive code into reliable, maintainable software that runs smoothly in production environments.

The investment in proper conversion pays enormous dividends. You gain testable, reusable code that’s easy to debug, deploy, and maintain. Start with the mechanical conversion using nbconvert, then systematically apply the refactoring patterns outlined here. Your future self—and your team—will thank you for taking the time to do it right.
