How to Plot a ROC Curve in Matplotlib

The ROC (Receiver Operating Characteristic) curve is one of the most widely used visualization tools in machine learning for evaluating binary classification models. Matplotlib lets you plot ROC curves as professional, publication-ready figures that help you understand your model’s performance across different classification thresholds.

Understanding ROC Curves: The Foundation

Before diving into matplotlib implementation, understanding what ROC curves represent is crucial for creating meaningful visualizations. The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at various classification thresholds, providing a comprehensive view of your model’s discrimination ability.

The True Positive Rate, also known as sensitivity or recall, measures the proportion of actual positive cases correctly identified by the model. Mathematically, TPR = TP / (TP + FN), where TP represents true positives and FN represents false negatives. The False Positive Rate measures the proportion of actual negative cases incorrectly classified as positive, calculated as FPR = FP / (FP + TN), where FP is false positives and TN is true negatives.
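As a quick worked example, the sketch below applies these two formulas at a single threshold. The labels, the scores, and the 0.5 cutoff are made up for illustration and are not taken from the dataset used later in this guide.

```python
import numpy as np

# Hypothetical ground-truth labels and model scores (illustration only)
toy_labels = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
toy_scores = np.array([0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05])

# Turn scores into hard predictions at one assumed threshold
toy_threshold = 0.5
toy_pred = (toy_scores >= toy_threshold).astype(int)

# Count the four confusion-matrix cells
tp = int(np.sum((toy_pred == 1) & (toy_labels == 1)))
fn = int(np.sum((toy_pred == 0) & (toy_labels == 1)))
fp = int(np.sum((toy_pred == 1) & (toy_labels == 0)))
tn = int(np.sum((toy_pred == 0) & (toy_labels == 0)))

# Apply the definitions: TPR = TP / (TP + FN), FPR = FP / (FP + TN)
toy_tpr_value = tp / (tp + fn)
toy_fpr_value = fp / (fp + tn)

print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")
print(f"TPR = {toy_tpr_value:.3f}, FPR = {toy_fpr_value:.3f}")
```

Each threshold yields one (FPR, TPR) pair like this; the ROC curve is the set of such pairs over all thresholds.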

The diagonal line from (0,0) to (1,1) represents random chance performance. Models performing better than random chance will have ROC curves that bow upward and to the left of this diagonal. The area under the ROC curve (AUC) quantifies overall model performance, with values closer to 1.0 indicating better discrimination ability.
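To show where the AUC number comes from, the sketch below recomputes it by hand. scikit-learn’s auc function integrates the (FPR, TPR) points with the trapezoidal rule, so the manual sum should match it. The labels and scores are made up for illustration.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

# Hypothetical labels and scores (illustration only)
toy_labels = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
toy_scores = np.array([0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05])

toy_fpr, toy_tpr, _ = roc_curve(toy_labels, toy_scores)

# Trapezoidal rule: width of each FPR step times the average TPR over that step
manual_auc = float(np.sum(np.diff(toy_fpr) * (toy_tpr[1:] + toy_tpr[:-1]) / 2))
sklearn_auc = float(auc(toy_fpr, toy_tpr))

print(f"Manual trapezoidal AUC: {manual_auc:.3f}")
print(f"sklearn auc():          {sklearn_auc:.3f}")
```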

Essential Libraries and Data Preparation

Matplotlib works seamlessly with scikit-learn’s metrics module, which provides the functions needed to calculate ROC curve coordinates. Here’s the essential setup:

import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# Generate sample binary classification data
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, 
                          n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, 
                                                    random_state=42)

# Train a simple logistic regression model
model = LogisticRegression(random_state=42)
model.fit(X_train, y_train)

# Get prediction probabilities
y_pred_proba = model.predict_proba(X_test)[:, 1]

The key insight here is that ROC curves require continuous scores rather than hard classifications, because the curve is traced by sweeping a threshold across those scores. The predict_proba method returns probability estimates, and we typically use the probability of the positive class (index 1) for binary classification problems. Confidence scores from decision_function work as well.
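The difference is easy to check. The sketch below, which rebuilds the same synthetic dataset under separate variable names (the names are illustrative), passes hard labels, probabilities, and decision_function scores to roc_curve. Hard labels collapse the curve to a single operating point, while the two score types should rank samples identically for logistic regression and so give the same AUC.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

# Same synthetic setup as above, under separate names
X_demo, y_demo = make_classification(n_samples=1000, n_features=20, n_classes=2,
                                     n_redundant=0, random_state=42)
X_demo_train, X_demo_test, y_demo_train, y_demo_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42)
demo_model = LogisticRegression(random_state=42).fit(X_demo_train, y_demo_train)

# Hard 0/1 labels: only one real threshold exists, so the "curve" has 3 points
hard_fpr, hard_tpr, _ = roc_curve(y_demo_test, demo_model.predict(X_demo_test))

# Probabilities: many distinct thresholds, so many points
soft_fpr, soft_tpr, _ = roc_curve(y_demo_test,
                                  demo_model.predict_proba(X_demo_test)[:, 1])

# decision_function scores are a monotone transform of the probabilities here
dec_fpr, dec_tpr, _ = roc_curve(y_demo_test,
                                demo_model.decision_function(X_demo_test))

auc_proba = auc(soft_fpr, soft_tpr)
auc_decision = auc(dec_fpr, dec_tpr)

print(f"Points from hard labels:   {len(hard_fpr)}")
print(f"Points from probabilities: {len(soft_fpr)}")
print(f"AUC (predict_proba):       {auc_proba:.4f}")
print(f"AUC (decision_function):   {auc_decision:.4f}")
```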

Creating Basic ROC Curves with Matplotlib

The basic process has two steps: calculate the false positive rates, true positive rates, and thresholds using scikit-learn’s roc_curve function, then plot these coordinates with Matplotlib:

# Calculate ROC curve coordinates
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)
roc_auc = auc(fpr, tpr)

# Create the basic ROC plot
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='darkorange', lw=2, 
         label=f'ROC curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--', 
         label='Random chance')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc="lower right")
plt.grid(True, alpha=0.3)
plt.show()

This basic implementation creates a professional-looking ROC curve visualization. The lw parameter controls line width, while the alpha parameter in the grid function creates subtle background gridlines that aid interpretation without cluttering the visualization.

Advanced ROC Curve Visualizations

Matplotlib offers extensive customization options for more sophisticated analyses. Comparing multiple models on the same plot provides valuable insights for model selection:

# Train multiple models for comparison
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

models = {
    'Logistic Regression': LogisticRegression(random_state=42),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
    'SVM': SVC(probability=True, random_state=42)
}

plt.figure(figsize=(10, 8))
colors = ['darkorange', 'green', 'red']

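for i, (name, clf) in enumerate(models.items()):
    # Train each candidate under its own name so the earlier `model` is not overwritten
    clf.fit(X_train, y_train)
    # Prefer probabilities; fall back to decision_function scores if unavailable
    if hasattr(clf, "predict_proba"):
        clf_scores = clf.predict_proba(X_test)[:, 1]
    else:
        clf_scores = clf.decision_function(X_test)
    
    # Calculate ROC curve (loop-local names keep the earlier fpr, tpr, and thresholds aligned)
    clf_fpr, clf_tpr, _ = roc_curve(y_test, clf_scores)
    clf_auc = auc(clf_fpr, clf_tpr)
    
    # Plot ROC curve
    plt.plot(clf_fpr, clf_tpr, color=colors[i], lw=2.5,
             label=f'{name} (AUC = {clf_auc:.3f})')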

# Add random chance line
plt.plot([0, 1], [0, 1], 'k--', lw=2, alpha=0.8, label='Random chance')

# Customize the plot
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate', fontsize=12)
plt.ylabel('True Positive Rate', fontsize=12)
plt.title('ROC Curves Comparison - Multiple Models', fontsize=14, fontweight='bold')
plt.legend(loc="lower right", fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

This advanced visualization allows direct comparison of model performance, making it easy to identify which model provides the best discrimination across different operating points.

Customizing ROC Curve Aesthetics

Professional data visualization requires attention to aesthetic details. Matplotlib provides numerous styling options for creating publication-quality ROC figures:

• Color schemes: Use colorblind-friendly palettes like 'viridis', 'plasma', or manually specify colors using hex codes
• Line styles: Vary line styles ('-', '--', '-.', ':') to distinguish multiple curves when printing in grayscale
• Markers: Add markers ('o', 's', '^') at specific threshold points to highlight important operating characteristics
• Fill areas: Use fill_between() to highlight the area under specific curves for emphasis
• Annotations: Add text annotations to mark optimal operating points or specific threshold values
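As a small illustration of the line-style and marker options, the sketch below draws three curves in black only, so they stay distinguishable in grayscale. The curves are hypothetical shapes of the form TPR = FPR^k, chosen for illustration rather than produced by real models.

```python
import numpy as np
import matplotlib.pyplot as plt

fpr_grid = np.linspace(0, 1, 101)

# Hypothetical curves: TPR = FPR**k with k < 1 bows above the diagonal
curve_exponents = {'Model A': 0.2, 'Model B': 0.4, 'Model C': 0.7}
line_styles = ['-', '--', '-.']
marker_shapes = ['o', 's', '^']

fig, ax = plt.subplots(figsize=(8, 6))
for (name, k), ls, mk in zip(curve_exponents.items(), line_styles, marker_shapes):
    # markevery keeps markers sparse so they do not hide the line
    ax.plot(fpr_grid, fpr_grid ** k, color='black', linestyle=ls,
            marker=mk, markevery=20, markersize=7, label=name)

# Dotted diagonal for random chance
ax.plot([0, 1], [0, 1], color='gray', linestyle=':', label='Random chance')

ax.set_xlabel('False Positive Rate')
ax.set_ylabel('True Positive Rate')
ax.set_title('Grayscale-Friendly ROC Curves (Illustrative Shapes)')
ax.legend(loc='lower right')
ax.grid(True, alpha=0.3)
fig.tight_layout()
plt.show()
```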

# Enhanced aesthetic ROC curve
plt.figure(figsize=(9, 7))

# Plot main ROC curve with enhanced styling
plt.plot(fpr, tpr, color='#2E86AB', linewidth=3, 
         label=f'Model ROC (AUC = {roc_auc:.3f})')
plt.fill_between(fpr, tpr, alpha=0.2, color='#2E86AB')

# Add random chance line
plt.plot([0, 1], [0, 1], color='#A23B72', linewidth=2, 
         linestyle='--', alpha=0.8, label='Random chance')

# Find and mark optimal threshold point (Youden's index)
optimal_idx = np.argmax(tpr - fpr)
optimal_threshold = thresholds[optimal_idx]
plt.plot(fpr[optimal_idx], tpr[optimal_idx], marker='o', 
         markersize=10, color='red', label=f'Optimal threshold = {optimal_threshold:.3f}')

# Styling enhancements
plt.xlim([-0.01, 1.01])
plt.ylim([-0.01, 1.01])
plt.xlabel('False Positive Rate', fontsize=13, fontweight='bold')
plt.ylabel('True Positive Rate', fontsize=13, fontweight='bold')
plt.title('Enhanced ROC Curve Visualization', fontsize=15, fontweight='bold', pad=20)
plt.legend(loc="lower right", fontsize=11, framealpha=0.9)
plt.grid(True, alpha=0.4, linestyle='-', linewidth=0.5)
plt.tight_layout()
plt.show()

💡 Pro Tip

Include confidence intervals when presenting ROC curves in research papers or professional reports. Bootstrap resampling of the test set can generate these intervals, which show how much uncertainty surrounds each AUC estimate and help you judge whether differences between models are meaningful.
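One common way to do this is a percentile bootstrap over the test set, sketched below. The dataset is rebuilt under separate names, and the number of resamples (1000) and the 95% level are assumptions chosen for illustration, not recommendations from a specific reference.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Same synthetic setup as earlier, under separate names
X_boot, y_boot = make_classification(n_samples=1000, n_features=20, n_classes=2,
                                     n_redundant=0, random_state=42)
X_boot_train, X_boot_test, y_boot_train, y_boot_test = train_test_split(
    X_boot, y_boot, test_size=0.3, random_state=42)
boot_model = LogisticRegression(random_state=42).fit(X_boot_train, y_boot_train)
boot_scores = boot_model.predict_proba(X_boot_test)[:, 1]

rng = np.random.default_rng(42)
n_bootstraps = 1000  # assumed number of resamples
boot_aucs = []
for _ in range(n_bootstraps):
    # Resample test indices with replacement
    idx = rng.integers(0, len(y_boot_test), size=len(y_boot_test))
    # AUC is undefined if a resample contains only one class, so skip it
    if len(np.unique(y_boot_test[idx])) < 2:
        continue
    boot_aucs.append(roc_auc_score(y_boot_test[idx], boot_scores[idx]))

# Percentile interval: middle 95% of the bootstrap AUC distribution
ci_lower, ci_upper = np.percentile(boot_aucs, [2.5, 97.5])
point_auc = roc_auc_score(y_boot_test, boot_scores)

print(f"AUC = {point_auc:.3f} (95% bootstrap CI: {ci_lower:.3f} to {ci_upper:.3f})")
```

This resamples only the test predictions, so it captures test-set sampling variability but not the variability from retraining the model.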

Interpreting ROC Curve Results

Understanding how to interpret ROC curves is essential for making informed decisions about model performance and threshold selection. The shape and position of the ROC curve reveal crucial information about your classifier’s behavior across different operating conditions.

A ROC curve that hugs the top-left corner indicates excellent model performance, with high true positive rates achieved while maintaining low false positive rates. Conversely, a curve that closely follows the diagonal suggests poor discrimination ability, performing only slightly better than random guessing.

The optimal threshold selection depends on your specific application requirements. For medical diagnosis applications where missing positive cases (false negatives) carries high cost, you might prefer operating points toward the upper-right portion of the curve, accepting higher false positive rates to minimize false negatives. In spam detection systems where false positives (legitimate emails marked as spam) are costly, you might choose operating points toward the lower-left portion.
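The two scenarios can be sketched as constraints on the roc_curve output. The targets used here (TPR of at least 0.95 for the screening-style case, FPR of at most 0.05 for the spam-style case) are assumed values for illustration, and the dataset is rebuilt under separate names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Same synthetic setup as earlier, under separate names
X_op, y_op = make_classification(n_samples=1000, n_features=20, n_classes=2,
                                 n_redundant=0, random_state=42)
X_op_train, X_op_test, y_op_train, y_op_test = train_test_split(
    X_op, y_op, test_size=0.3, random_state=42)
op_model = LogisticRegression(random_state=42).fit(X_op_train, y_op_train)
op_scores = op_model.predict_proba(X_op_test)[:, 1]

op_fpr, op_tpr, op_thresholds = roc_curve(y_op_test, op_scores)

# High-sensitivity case: first point (lowest FPR) whose TPR reaches the target.
# TPR is non-decreasing along the curve and ends at 1.0, so a match always exists.
min_tpr = 0.95  # assumed requirement
sens_idx = int(np.argmax(op_tpr >= min_tpr))

# Low-false-alarm case: last point (highest TPR) whose FPR stays within budget.
# FPR starts at 0.0, so a match always exists.
max_fpr = 0.05  # assumed requirement
spec_idx = int(np.where(op_fpr <= max_fpr)[0][-1])

print(f"TPR >= {min_tpr}: threshold={op_thresholds[sens_idx]:.3f}, "
      f"FPR={op_fpr[sens_idx]:.3f}, TPR={op_tpr[sens_idx]:.3f}")
print(f"FPR <= {max_fpr}: threshold={op_thresholds[spec_idx]:.3f}, "
      f"FPR={op_fpr[spec_idx]:.3f}, TPR={op_tpr[spec_idx]:.3f}")
```

In practice the threshold would be chosen on a validation set rather than on the data used for the final performance report.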

When comparing multiple models, the model with the ROC curve that dominates others (positioned higher and to the left) generally provides better performance. However, consider the specific operating region relevant to your application rather than focusing solely on overall AUC values.
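One way to focus on a specific operating region is the max_fpr argument of scikit-learn’s roc_auc_score, which returns a standardized partial AUC over the range from 0 to max_fpr. The sketch below compares full and partial AUC for two models; the 0.1 cutoff and the variable names are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Same synthetic setup as earlier, under separate names
X_cmp, y_cmp = make_classification(n_samples=1000, n_features=20, n_classes=2,
                                   n_redundant=0, random_state=42)
X_cmp_train, X_cmp_test, y_cmp_train, y_cmp_test = train_test_split(
    X_cmp, y_cmp, test_size=0.3, random_state=42)

candidates = {
    'Logistic Regression': LogisticRegression(random_state=42),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
}

region_max_fpr = 0.1  # assumed region of interest: FPR between 0 and 0.1
results = {}
for name, candidate in candidates.items():
    candidate.fit(X_cmp_train, y_cmp_train)
    cmp_scores = candidate.predict_proba(X_cmp_test)[:, 1]
    full_auc = roc_auc_score(y_cmp_test, cmp_scores)
    # Standardized partial AUC restricted to the low-FPR region
    partial_auc = roc_auc_score(y_cmp_test, cmp_scores, max_fpr=region_max_fpr)
    results[name] = (full_auc, partial_auc)
    print(f"{name}: full AUC = {full_auc:.3f}, "
          f"partial AUC (FPR <= {region_max_fpr}) = {partial_auc:.3f}")
```

The two rankings do not have to agree: a model can lead on overall AUC and still trail in the region that matters for a given application.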

Handling Class Imbalance and ROC Limitations

ROC curves can be misleading when dealing with highly imbalanced datasets. In scenarios where positive cases represent a small fraction of the total dataset, ROC curves may present an overly optimistic view of model performance: because the false positive rate denominator includes many true negatives, even a large number of false positives produces only a small FPR.

For imbalanced datasets, precision-recall curves often provide more informative model evaluation. However, a ROC curve can still be valuable if you understand these limitations and interpret results accordingly:

# Example with imbalanced data
from sklearn.datasets import make_classification

# Create highly imbalanced dataset
X_imbal, y_imbal = make_classification(n_samples=10000, n_features=20, 
                                       n_classes=2, n_redundant=0, 
                                       weights=[0.95, 0.05], random_state=42)

# Split and train model
X_train_imbal, X_test_imbal, y_train_imbal, y_test_imbal = train_test_split(
    X_imbal, y_imbal, test_size=0.3, random_state=42)

model_imbal = LogisticRegression(random_state=42)
model_imbal.fit(X_train_imbal, y_train_imbal)
y_pred_proba_imbal = model_imbal.predict_proba(X_test_imbal)[:, 1]

# Plot ROC for imbalanced data
fpr_imbal, tpr_imbal, _ = roc_curve(y_test_imbal, y_pred_proba_imbal)
roc_auc_imbal = auc(fpr_imbal, tpr_imbal)

plt.figure(figsize=(8, 6))
plt.plot(fpr_imbal, tpr_imbal, color='purple', lw=2,
         label=f'Imbalanced Data ROC (AUC = {roc_auc_imbal:.3f})')
plt.plot([0, 1], [0, 1], 'k--', lw=2, alpha=0.8)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate') 
plt.title('ROC Curve for Highly Imbalanced Dataset (5% Positive Class)')
plt.legend(loc="lower right")
plt.grid(True, alpha=0.3)
plt.show()
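For comparison, here is a minimal sketch of the precision-recall curve mentioned above, built on the same kind of 5%-positive dataset using scikit-learn’s precision_recall_curve and average_precision_score. The dataset is regenerated under separate names, and the horizontal baseline is the positive-class prevalence in the test set, which is the precision a no-skill classifier would be expected to achieve.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split

# Same imbalanced setup as above, under separate names
X_pr, y_pr = make_classification(n_samples=10000, n_features=20,
                                 n_classes=2, n_redundant=0,
                                 weights=[0.95, 0.05], random_state=42)
X_pr_train, X_pr_test, y_pr_train, y_pr_test = train_test_split(
    X_pr, y_pr, test_size=0.3, random_state=42)
pr_model = LogisticRegression(random_state=42).fit(X_pr_train, y_pr_train)
pr_scores = pr_model.predict_proba(X_pr_test)[:, 1]

# Precision and recall at every threshold, plus a single-number summary
precision, recall, _ = precision_recall_curve(y_pr_test, pr_scores)
avg_precision = average_precision_score(y_pr_test, pr_scores)
prevalence = float(y_pr_test.mean())  # no-skill baseline precision

plt.figure(figsize=(8, 6))
plt.plot(recall, precision, color='purple', lw=2,
         label=f'Precision-recall (AP = {avg_precision:.3f})')
plt.axhline(prevalence, color='k', linestyle='--', lw=2, alpha=0.8,
            label=f'No-skill baseline ({prevalence:.3f})')
plt.xlabel('Recall (True Positive Rate)')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve for Highly Imbalanced Dataset')
plt.legend(loc="upper right")
plt.grid(True, alpha=0.3)
plt.show()
```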

Cross-Validation and ROC Curve Stability

Robust model evaluation requires assessing ROC curve stability across different data splits. Cross-validation provides a framework for understanding the variability in your model’s performance and for estimating the spread of AUC scores across folds:

from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.metrics import roc_auc_score

# Perform stratified k-fold cross-validation
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_auc_scores = cross_val_score(model, X, y, cv=cv, scoring='roc_auc')

print(f"Cross-validated AUC scores: {cv_auc_scores}")
print(f"Mean AUC: {cv_auc_scores.mean():.3f} (+/- {cv_auc_scores.std() * 2:.3f})")

# Plot distribution of AUC scores
plt.figure(figsize=(8, 5))
plt.hist(cv_auc_scores, bins=10, alpha=0.7, color='skyblue', edgecolor='black')
plt.axvline(cv_auc_scores.mean(), color='red', linestyle='--', 
            label=f'Mean AUC: {cv_auc_scores.mean():.3f}')
plt.xlabel('AUC Score')
plt.ylabel('Frequency')
plt.title('Distribution of Cross-Validated AUC Scores')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

🎯 Key Insight

When evaluating model performance, always consider both the mean AUC and its variability across cross-validation folds. A model with slightly lower mean AUC but consistent performance might be preferable to one with higher mean AUC but high variability.
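Fold-to-fold variability can also be shown on the ROC plot itself. The sketch below computes one curve per fold, interpolates each onto a shared FPR grid with np.interp, and plots the mean curve with a band of one standard deviation. The 100-point grid and the variable names are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import StratifiedKFold

X_cv, y_cv = make_classification(n_samples=1000, n_features=20, n_classes=2,
                                 n_redundant=0, random_state=42)
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

mean_fpr = np.linspace(0, 1, 100)  # shared FPR grid for averaging
fold_tprs, fold_aucs = [], []
for train_idx, test_idx in folds.split(X_cv, y_cv):
    fold_model = LogisticRegression(random_state=42)
    fold_model.fit(X_cv[train_idx], y_cv[train_idx])
    fold_scores = fold_model.predict_proba(X_cv[test_idx])[:, 1]
    fold_fpr, fold_tpr, _ = roc_curve(y_cv[test_idx], fold_scores)
    # Interpolate this fold's TPR onto the shared grid; pin the start at (0, 0)
    interp_tpr = np.interp(mean_fpr, fold_fpr, fold_tpr)
    interp_tpr[0] = 0.0
    fold_tprs.append(interp_tpr)
    fold_aucs.append(auc(fold_fpr, fold_tpr))

mean_tpr = np.mean(fold_tprs, axis=0)
mean_tpr[-1] = 1.0  # pin the end at (1, 1)
std_tpr = np.std(fold_tprs, axis=0)

plt.figure(figsize=(8, 6))
plt.plot(mean_fpr, mean_tpr, color='darkorange', lw=2,
         label=f'Mean ROC (AUC = {np.mean(fold_aucs):.3f} '
               f'+/- {np.std(fold_aucs):.3f})')
plt.fill_between(mean_fpr, np.clip(mean_tpr - std_tpr, 0, 1),
                 np.clip(mean_tpr + std_tpr, 0, 1),
                 color='darkorange', alpha=0.2, label='+/- 1 std. dev.')
plt.plot([0, 1], [0, 1], 'k--', lw=2, alpha=0.8, label='Random chance')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Mean ROC Curve Across 5 Cross-Validation Folds')
plt.legend(loc="lower right")
plt.grid(True, alpha=0.3)
plt.show()
```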

Saving and Exporting ROC Curve Visualizations

Professional workflows require saving ROC curve visualizations in various formats for reports, presentations, and publications. Matplotlib provides extensive options for exporting high-quality figures:

# Save ROC curve in multiple formats
plt.figure(figsize=(10, 8))

# Create publication-quality ROC curve
plt.plot(fpr, tpr, color='#1f77b4', linewidth=3, 
         label=f'Model ROC (AUC = {roc_auc:.3f})')
plt.plot([0, 1], [0, 1], 'k--', linewidth=2, alpha=0.8, label='Random chance')

plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate', fontsize=14, fontweight='bold')
plt.ylabel('True Positive Rate', fontsize=14, fontweight='bold')
plt.title('ROC Curve - Model Performance Evaluation', fontsize=16, fontweight='bold')
plt.legend(loc="lower right", fontsize=12)
plt.grid(True, alpha=0.3)

# Save in multiple formats
plt.savefig('roc_curve.png', dpi=300, bbox_inches='tight', facecolor='white')
plt.savefig('roc_curve.pdf', bbox_inches='tight', facecolor='white')
plt.savefig('roc_curve.svg', bbox_inches='tight', facecolor='white')
plt.show()

print("ROC curve saved in PNG, PDF, and SVG formats")

Conclusion

Mastering how to plot ROC curves in Matplotlib enables you to create powerful visualizations that effectively communicate model performance to stakeholders and guide critical decision-making processes. The combination of Matplotlib’s flexible plotting capabilities with scikit-learn’s robust metrics functions provides a comprehensive toolkit for ROC curve analysis.

From basic single-model visualizations to advanced multi-model comparisons with custom styling, the techniques covered in this guide will help you create professional-quality ROC curve plots that enhance your machine learning workflow. Remember to consider your specific application context when interpreting ROC curves and selecting optimal operating thresholds for your classification models.