How to Draw ROC AUC Curve in Python

When working on classification problems in machine learning, it’s essential to evaluate the performance of your models accurately. Among many metrics, the ROC AUC curve stands out for its ability to illustrate how well a model distinguishes between classes. In this article, we’ll explore how to draw an ROC AUC curve in Python, step by step, using real code examples and practical tips.

Whether you’re a beginner in machine learning or an experienced practitioner aiming to improve your model evaluation workflow, this guide has you covered.


Table of Contents

  1. What is ROC and AUC?
  2. Why Use ROC AUC Curve?
  3. Prerequisites and Setup
  4. ROC AUC Curve in Python: Step-by-Step
  5. Multi-class ROC AUC Curve
  6. Tips for Interpreting the ROC Curve
  7. Limitations of ROC AUC
  8. Conclusion

1. What is ROC and AUC?

ROC (Receiver Operating Characteristic) Curve

The ROC curve is a graphical plot that illustrates the diagnostic ability of a binary classifier system. It plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at various threshold levels.

  • True Positive Rate (Recall or Sensitivity):
\[\text{TPR} = \frac{\text{TP}}{\text{TP} + \text{FN}}\]
  • False Positive Rate:
\[\text{FPR} = \frac{\text{FP}}{\text{FP} + \text{TN}}\]

Each point on the ROC curve represents a TPR/FPR pair corresponding to a particular decision threshold.
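To make these definitions concrete, here is a small sketch that computes TPR and FPR at a single threshold; the labels and scores are made up purely for illustration:

```python
import numpy as np

# Hypothetical true labels and predicted probabilities
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.6])

# Pick one decision threshold; each threshold yields one ROC point
threshold = 0.5
y_pred = (y_score >= threshold).astype(int)

tp = np.sum((y_pred == 1) & (y_true == 1))
fn = np.sum((y_pred == 0) & (y_true == 1))
fp = np.sum((y_pred == 1) & (y_true == 0))
tn = np.sum((y_pred == 0) & (y_true == 0))

tpr = tp / (tp + fn)  # True Positive Rate (recall)
fpr = fp / (fp + tn)  # False Positive Rate

print(tpr, fpr)  # 0.75 0.25
```

Sweeping the threshold from 1 down to 0 and collecting these (FPR, TPR) pairs traces out the full ROC curve.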

AUC (Area Under the Curve)

AUC quantifies the overall ability of the model to discriminate between positive and negative classes. An AUC of:

  • 1.0 = perfect classifier
  • 0.5 = random guess
  • < 0.5 = worse than random (the model ranks the classes in the wrong order; flipping its predictions would actually improve it)
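These reference values are easy to verify with scikit-learn’s `roc_auc_score` on toy data (the labels and scores below are made up for illustration):

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]

# Perfectly separated scores: every positive outranks every negative
print(roc_auc_score(y_true, [0.1, 0.2, 0.8, 0.9]))  # 1.0

# Perfectly inverted scores: worse than random guessing
print(roc_auc_score(y_true, [0.9, 0.8, 0.2, 0.1]))  # 0.0
```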

2. Why Use ROC AUC Curve?

There are several reasons why the ROC AUC curve is a popular metric:

  • Threshold-agnostic: Unlike accuracy, it evaluates model performance across all classification thresholds.
  • Visual interpretation: You can compare multiple models by overlaying ROC curves.
  • Less sensitive to class imbalance: AUC depends less on the class distribution than accuracy does, though it can still be misleading under extreme imbalance (see the limitations below).
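As a sketch of the second point, you can fit two classifiers on the same split and overlay their ROC curves on one plot; the model choices here are illustrative, not prescriptive:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

plt.figure(figsize=(8, 6))
for name, model in [('Logistic Regression', LogisticRegression(max_iter=5000)),
                    ('Random Forest', RandomForestClassifier(random_state=42))]:
    # Fit each model and plot its ROC curve on the same axes
    y_proba = model.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, y_proba)
    plt.plot(fpr, tpr, label=f'{name} (AUC = {auc(fpr, tpr):.3f})')
plt.plot([0, 1], [0, 1], color='gray', linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc='lower right')
plt.show()
```

Whichever curve sits closer to the top-left corner (and has the larger AUC) discriminates better on this test set.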

3. Prerequisites and Setup

Before we draw an ROC AUC curve in Python, make sure you have the necessary libraries installed:

pip install scikit-learn matplotlib numpy

We’ll use:

  • Scikit-learn for modeling and metrics
  • Matplotlib for visualization
  • NumPy for numerical operations

4. ROC AUC Curve in Python: Step-by-Step

Let’s walk through how to draw an ROC AUC curve in Python with a practical example using the breast cancer dataset.

Step 1: Import Libraries

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve, auc

Step 2: Load Dataset

data = load_breast_cancer()
X = data.data
y = data.target

Step 3: Train-Test Split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

Step 4: Train a Classifier

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

Step 5: Predict Probabilities

y_proba = clf.predict_proba(X_test)[:, 1]  # Probability for positive class

Step 6: Calculate ROC Curve and AUC

fpr, tpr, thresholds = roc_curve(y_test, y_proba)
roc_auc = auc(fpr, tpr)

Step 7: Plot the ROC Curve

plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='blue', label=f'ROC Curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='gray', linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC AUC Curve')
plt.legend(loc='lower right')
plt.grid(True)
plt.show()

This will display a curve, typically bowing toward the top-left corner — the higher the curve, the better the model performance.


5. Multi-class ROC AUC Curve

While ROC AUC is traditionally used for binary classification, it can be extended to multi-class classification using the One-vs-Rest (OvR) strategy.

Example:

from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Simulated multi-class target
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=1000, n_features=20, n_classes=3, n_informative=5, random_state=42)
y_bin = label_binarize(y, classes=[0, 1, 2])

X_train, X_test, y_train, y_test = train_test_split(X, y_bin, test_size=0.3, random_state=42)

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

y_score = clf.decision_function(X_test)

# y_test is already binarized, so roc_auc_score averages one-vs-rest AUCs per class
macro_auc = roc_auc_score(y_test, y_score, average='macro')
print(f'Macro-averaged AUC: {macro_auc:.3f}')

This computes an averaged AUC score across classes.
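A single averaged number hides per-class behavior; you can also plot one one-vs-rest ROC curve per class. This sketch repeats the simulated setup above (with a pinned `random_state`, an assumption added for reproducibility):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize

# Repeat the simulated multi-class setup from above
X, y = make_classification(n_samples=1000, n_features=20, n_classes=3,
                           n_informative=5, random_state=42)
y_bin = label_binarize(y, classes=[0, 1, 2])
X_train, X_test, y_train, y_test = train_test_split(X, y_bin, test_size=0.3, random_state=42)

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_train, y_train)
y_score = clf.decision_function(X_test)

plt.figure(figsize=(8, 6))
for i in range(3):
    # One-vs-rest ROC curve for class i: column i of the binarized labels and scores
    fpr, tpr, _ = roc_curve(y_test[:, i], y_score[:, i])
    plt.plot(fpr, tpr, label=f'Class {i} (AUC = {auc(fpr, tpr):.2f})')
plt.plot([0, 1], [0, 1], color='gray', linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc='lower right')
plt.show()
```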


6. Tips for Interpreting the ROC Curve

  • Closer to top-left = better: This means high TPR and low FPR.
  • Diagonal line: A model that predicts at random.
  • Steep curve at the beginning: Strong at distinguishing between classes.

Threshold Tuning

The ROC curve also gives insight into optimal threshold selection. For instance:

optimal_idx = np.argmax(tpr - fpr)
optimal_threshold = thresholds[optimal_idx]

This gives the point where the difference between TPR and FPR is maximized, known as Youden’s J statistic, which is a common heuristic for picking a threshold.
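To actually use the tuned threshold, convert probabilities to labels with it instead of the default 0.5; the sketch below repeats the binary setup above so it runs standalone:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Same setup as the binary example
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
y_proba = clf.predict_proba(X_test)[:, 1]

fpr, tpr, thresholds = roc_curve(y_test, y_proba)

# Youden's J statistic: the threshold maximizing TPR - FPR
optimal_idx = np.argmax(tpr - fpr)
optimal_threshold = thresholds[optimal_idx]

# Classify with the tuned threshold instead of the default 0.5
y_pred_tuned = (y_proba >= optimal_threshold).astype(int)
print(f'Optimal threshold: {optimal_threshold:.3f}')
```

Whether this threshold is truly "optimal" depends on your application; it only balances TPR against FPR with equal weight.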


7. Limitations of ROC AUC

Despite its usefulness, there are a few caveats:

  • Misleading with highly imbalanced data: Even poor models can have high AUC when the majority class dominates.
  • Ignores decision threshold: AUC summarizes all thresholds, but in real applications, you often care about specific thresholds.
  • Doesn’t reflect real-world cost: False positives and false negatives may have different costs, which AUC doesn’t account for.

For imbalanced datasets, consider Precision-Recall curves as an alternative.
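Here is a minimal Precision-Recall sketch on a simulated imbalanced dataset; the roughly 95/5 class split is an assumption chosen for illustration:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split

# Imbalanced binary problem: roughly 95% negatives, 5% positives
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_proba = clf.predict_proba(X_test)[:, 1]

precision, recall, _ = precision_recall_curve(y_test, y_proba)
ap = average_precision_score(y_test, y_proba)  # area under the PR curve, roughly

plt.plot(recall, precision, label=f'PR Curve (AP = {ap:.2f})')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.legend(loc='lower left')
plt.show()
```

Because precision ignores true negatives, the PR curve focuses on performance for the rare positive class, which is often what matters in imbalanced problems.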


8. Conclusion

In this guide, we walked through how to draw an ROC AUC curve in Python using scikit-learn. Understanding and visualizing ROC AUC curves is a powerful skill for any data scientist or machine learning practitioner. It provides an intuitive way to evaluate classification models, compare their performance, and fine-tune decision thresholds.

From binary to multi-class setups, ROC AUC remains one of the most valuable tools in your model evaluation arsenal. So next time you build a classifier, don’t just check the accuracy — draw the ROC curve and interpret the AUC for a deeper understanding.
