How to Visualize Data in Jupyter Notebook Using Matplotlib and Seaborn

Data visualization turns raw numbers into pictures that reveal patterns and trends that are hard to see in a spreadsheet. Combining Matplotlib and Seaborn with Jupyter Notebook’s interactive environment gives you a workspace where you can try a visualization, see the result immediately, and refine it until your data’s story is clear. This guide walks through practical techniques for both libraries, from basic plots to statistical graphics, multi-plot layouts, and export.

Understanding Matplotlib and Seaborn: Two Libraries, Complementary Strengths

Matplotlib serves as Python’s foundational plotting library, offering complete control over every visual element. You can customize axes, colors, labels, legends, and layouts down to the pixel level. This flexibility makes it perfect when you need precise control or when creating custom visualization types.

Seaborn builds on Matplotlib’s foundation, providing a high-level interface for statistical graphics. It excels at creating attractive, informative plots with minimal code. Seaborn’s built-in themes, color palettes, and statistical functions make it ideal for exploratory data analysis and producing publication-quality graphics quickly. The beauty of using both libraries together is that Seaborn handles the heavy lifting for common statistical plots while Matplotlib lets you fine-tune every detail.

Setting Up Your Visualization Environment

Start by installing the necessary libraries and configuring Jupyter Notebook for optimal visualization:

pip install jupyter matplotlib seaborn pandas numpy

Launch Jupyter Notebook and create a new notebook. Your first cell should import libraries and configure display settings:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

# Display plots inline
%matplotlib inline

# Set default figure size
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['figure.dpi'] = 100

# Set Seaborn style
sns.set_style('whitegrid')
sns.set_palette('husl')

The %matplotlib inline magic command displays plots directly in your notebook. The rcParams settings define a default figure size and resolution, so you don’t have to specify dimensions for every plot. Seaborn’s style system changes the look of every subsequent plot: whitegrid provides a clean background with subtle gridlines that aid reading without cluttering the visual.
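When a single plot needs settings that differ from the notebook-wide defaults, a temporary override avoids editing rcParams back and forth. The sketch below uses Matplotlib’s plt.rc_context for this; the variable names and the specific sizes are illustrative and not part of the setup above.

```python
import matplotlib.pyplot as plt
import numpy as np

# Remember the current notebook-wide default for comparison
default_size = tuple(plt.rcParams['figure.figsize'])

# Settings passed to rc_context apply only inside the with-block
with plt.rc_context({'figure.figsize': (4, 3), 'figure.dpi': 150}):
    fig_small, ax_small = plt.subplots()
    ax_small.plot(np.arange(5), np.arange(5) ** 2)
    ax_small.set_title('Temporary settings')

# After the block, the defaults are restored automatically
print('This figure:', fig_small.get_size_inches())
print('Default size:', plt.rcParams['figure.figsize'])
```

The figure created inside the block keeps its size, while any figure created afterwards falls back to the defaults.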

Creating Basic Plots with Matplotlib

Matplotlib’s object-oriented interface provides precise control over your visualizations. Understanding this interface is crucial for creating professional-quality plots.

The Figure and Axes Architecture

Every Matplotlib plot consists of a Figure (the overall container) and one or more Axes (individual plot areas):

# Create figure and axes explicitly
fig, ax = plt.subplots(figsize=(10, 6))

# Sample data
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Plot on the axes
ax.plot(x, y, linewidth=2, color='blue', label='sin(x)')
ax.set_xlabel('X Values', fontsize=12)
ax.set_ylabel('Y Values', fontsize=12)
ax.set_title('Sine Wave Visualization', fontsize=14, fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

This approach separates plot creation from customization, making your code more readable and maintainable. The fig, ax = plt.subplots() pattern becomes second nature once you understand its power.
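To make the Figure and Axes relationship concrete, here is a minimal sketch with one Figure holding two Axes. The names ax_left and ax_right are arbitrary choices for this example.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# One Figure (the container) holding two Axes (the plot areas)
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

# Each Axes is drawn on and labeled independently
ax_left.plot(x, np.sin(x))
ax_left.set_title('sin(x)')
ax_right.plot(x, np.cos(x), color='orange')
ax_right.set_title('cos(x)')

# Figure-level elements belong to the container, not to a single Axes
fig.suptitle('One Figure, Two Axes')
plt.tight_layout()

print(len(fig.axes))           # number of Axes the Figure holds
print(ax_left.figure is fig)   # each Axes knows its parent Figure
```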

Line Plots for Trends and Time Series

Line plots excel at showing trends over continuous data:

# Multiple lines on same plot
fig, ax = plt.subplots(figsize=(12, 6))

# Generate sample time series data
dates = pd.date_range('2024-01-01', periods=365)
revenue = np.cumsum(np.random.randn(365)) + 100
expenses = np.cumsum(np.random.randn(365)) + 80

ax.plot(dates, revenue, linewidth=2, label='Revenue', color='green')
ax.plot(dates, expenses, linewidth=2, label='Expenses', color='red')
ax.fill_between(dates, revenue, expenses, where=(revenue > expenses), 
                 alpha=0.3, color='green', label='Profit')
ax.set_xlabel('Date', fontsize=12)
ax.set_ylabel('Amount ($)', fontsize=12)
ax.set_title('Revenue vs Expenses Over Time', fontsize=14, fontweight='bold')
ax.legend(loc='upper left')
ax.grid(True, alpha=0.3)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

The fill_between() method adds visual context by highlighting the profit area. This technique makes comparisons immediate and intuitive—viewers instantly see when revenue exceeded expenses.
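The same idea extends to shading losses as well as profits. The following sketch is one way to do it, using a plain day index and synthetic random-walk data (both assumptions for illustration) and a second fill_between() call with the opposite condition.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: a day index and two random-walk series
rng = np.random.default_rng(0)
days = np.arange(365)
revenue = np.cumsum(rng.standard_normal(365)) + 100
expenses = np.cumsum(rng.standard_normal(365)) + 98

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(days, revenue, color='green', label='Revenue')
ax.plot(days, expenses, color='red', label='Expenses')

# interpolate=True extends each shaded region to the exact crossing point
ax.fill_between(days, revenue, expenses, where=(revenue > expenses),
                interpolate=True, alpha=0.3, color='green', label='Profit')
ax.fill_between(days, revenue, expenses, where=(revenue <= expenses),
                interpolate=True, alpha=0.3, color='red', label='Loss')

ax.set_xlabel('Day')
ax.set_ylabel('Amount ($)')
ax.legend(loc='upper left')
plt.tight_layout()
```

Without interpolate=True, small unshaded gaps can appear where the two lines cross between data points.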

Scatter Plots for Relationships

Scatter plots reveal relationships between two continuous variables:

# Generate correlated data
np.random.seed(42)
x = np.random.randn(200)
y = 2 * x + np.random.randn(200) * 0.5

fig, ax = plt.subplots(figsize=(10, 6))
scatter = ax.scatter(x, y, c=y, cmap='viridis', s=50, alpha=0.6, edgecolors='black')

# Add colorbar
cbar = plt.colorbar(scatter, ax=ax)
cbar.set_label('Y Value', fontsize=10)

# Add trend line
z = np.polyfit(x, y, 1)
p = np.poly1d(z)
ax.plot(x, p(x), "r--", linewidth=2, alpha=0.8, label='Trend Line')

ax.set_xlabel('X Variable', fontsize=12)
ax.set_ylabel('Y Variable', fontsize=12)
ax.set_title('Scatter Plot with Correlation', fontsize=14, fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

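Color mapping (the c and cmap parameters) can add a third dimension to a scatter plot. In this example the color encodes y itself, so it only reinforces the vertical axis; pass a different variable to c to reveal patterns that position alone would hide. The trend line shows the direction and slope of the relationship, while the spread of points around it indicates how strong the correlation is.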
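A trend line is easier to interpret with numbers attached. The sketch below fits the line with np.polyfit and computes the Pearson correlation coefficient with np.corrcoef, then puts both in the legend. It regenerates similar synthetic data with NumPy’s default_rng, so the exact values will differ from the plot above.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
x = rng.standard_normal(200)
y = 2 * x + rng.standard_normal(200) * 0.5

# Fit a straight line and compute the Pearson correlation coefficient
slope, intercept = np.polyfit(x, y, 1)
r = np.corrcoef(x, y)[0, 1]

fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(x, y, s=50, alpha=0.6, edgecolors='black')

# Two endpoints are enough to draw a straight trend line
x_line = np.array([x.min(), x.max()])
ax.plot(x_line, slope * x_line + intercept, 'r--', linewidth=2,
        label=f'y = {slope:.2f}x + {intercept:.2f} (r = {r:.2f})')

ax.set_xlabel('X Variable')
ax.set_ylabel('Y Variable')
ax.legend()
plt.tight_layout()
```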

📊 Choosing the Right Plot Type

  • Line Plot: Trends over time or continuous data progression
  • Bar Chart: Comparing quantities across categories
  • Scatter Plot: Relationships between two variables
  • Box Plot: Distributions and outlier detection
  • Heatmap: Correlation matrices and patterns in 2D data
  • Histogram: Distribution of a single variable’s values

Bar Charts for Categorical Comparisons

Bar charts compare quantities across categories:

# Sample categorical data
categories = ['Product A', 'Product B', 'Product C', 'Product D', 'Product E']
values = [45, 67, 38, 92, 56]
colors = ['#FF6B6B', '#4ECDC4', '#45B7D1', '#FFA07A', '#98D8C8']

fig, ax = plt.subplots(figsize=(10, 6))
bars = ax.bar(categories, values, color=colors, edgecolor='black', linewidth=1.5)

# Add value labels on bars
for bar in bars:
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'{height}',
            ha='center', va='bottom', fontsize=11, fontweight='bold')

ax.set_xlabel('Product Category', fontsize=12)
ax.set_ylabel('Sales Count', fontsize=12)
ax.set_title('Product Sales Comparison', fontsize=14, fontweight='bold')
ax.set_ylim(0, max(values) * 1.15)  # Add space for labels
ax.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

Adding value labels directly on bars eliminates the need for viewers to estimate values from the axis. The set_ylim() adjustment creates space above bars for these labels, preventing them from being cut off.
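Recent Matplotlib versions can also place these labels for you. The sketch below assumes Matplotlib 3.4 or newer, where Axes.bar_label is available; the single bar color is an arbitrary choice for the example.

```python
import matplotlib.pyplot as plt

categories = ['Product A', 'Product B', 'Product C', 'Product D', 'Product E']
values = [45, 67, 38, 92, 56]

fig, ax = plt.subplots(figsize=(10, 6))
bars = ax.bar(categories, values, color='#4ECDC4', edgecolor='black')

# bar_label places one text label per bar; padding is the gap in points
labels = ax.bar_label(bars, padding=3, fontsize=11, fontweight='bold')

ax.set_ylabel('Sales Count')
ax.margins(y=0.15)  # leave headroom above the tallest bar for its label
plt.tight_layout()
```

This replaces the manual loop over bars, at the cost of requiring a newer Matplotlib.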

Leveraging Seaborn for Statistical Visualizations

Seaborn simplifies creating sophisticated statistical graphics. Its integration with Pandas DataFrames makes it especially powerful for data analysis workflows.

Distribution Plots: Understanding Your Data

Distribution plots reveal how values spread across your dataset:

# Generate sample data
np.random.seed(42)
data = {
    'normal': np.random.normal(100, 15, 1000),
    'skewed': np.random.exponential(2, 1000),
    'bimodal': np.concatenate([np.random.normal(80, 10, 500), 
                                np.random.normal(120, 10, 500)])
}
df = pd.DataFrame(data)

# Create subplot grid
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Histogram with KDE
sns.histplot(data=df, x='normal', kde=True, ax=axes[0], color='blue', bins=30)
axes[0].set_title('Normal Distribution', fontsize=12, fontweight='bold')

# KDE plot only
sns.kdeplot(data=df, x='skewed', ax=axes[1], color='red', fill=True, alpha=0.6)
axes[1].set_title('Skewed Distribution', fontsize=12, fontweight='bold')

# Violin plot
sns.violinplot(data=df, y='bimodal', ax=axes[2], color='green')
axes[2].set_title('Bimodal Distribution', fontsize=12, fontweight='bold')

plt.tight_layout()
plt.show()

The histogram with kernel density estimation (KDE) overlay provides both the raw count distribution and a smooth estimate of the underlying probability distribution. Violin plots combine box plots and KDE, showing distribution shape alongside quartiles—perfect for spotting bimodal distributions.
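It can help to cross-check what the plots suggest against a few summary statistics. The following sketch rebuilds comparable synthetic columns (using default_rng, so the values are not identical to those plotted above) and compares mean, median, and skewness with pandas.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame({
    'normal': rng.normal(100, 15, 1000),
    'skewed': rng.exponential(2, 1000),
    'bimodal': np.concatenate([rng.normal(80, 10, 500),
                               rng.normal(120, 10, 500)]),
})

# Skewness near 0 suggests symmetry; a clearly positive value
# indicates a long right tail
summary = pd.DataFrame({
    'mean': df.mean(),
    'median': df.median(),
    'skewness': df.skew(),
})
print(summary.round(2))
```

Note that the bimodal column is roughly symmetric, so its skewness is also close to zero. Summary statistics alone would miss the two peaks, which is exactly what the violin plot makes visible.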

Relationship Plots: Exploring Correlations

Seaborn excels at visualizing relationships between multiple variables:

# Create sample dataset
np.random.seed(42)
df = pd.DataFrame({
    'age': np.random.randint(20, 65, 200),
    'income': np.random.randint(30000, 120000, 200),
    'spending': np.random.randint(10000, 60000, 200),
    'category': np.random.choice(['A', 'B', 'C'], 200)
})

# Regression plot with confidence interval
fig, ax = plt.subplots(figsize=(10, 6))
sns.regplot(data=df, x='income', y='spending', scatter_kws={'alpha':0.5}, 
            line_kws={'color':'red', 'linewidth':2}, ax=ax)
ax.set_title('Income vs Spending with Regression Line', fontsize=14, fontweight='bold')
ax.set_xlabel('Annual Income ($)', fontsize=12)
ax.set_ylabel('Annual Spending ($)', fontsize=12)
plt.tight_layout()
plt.show()

regplot() fits a linear regression line and shades a confidence interval around it (95% by default), so you can judge both the direction of a relationship and the uncertainty of the fit at a glance.
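The income and spending columns above are drawn independently, so their regression line comes out nearly flat. To see what regplot() looks like when a relationship actually exists, here is a sketch with one built in. The 40% spending ratio, the noise level, and the name linked are assumptions chosen purely for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(42)
income = rng.integers(30000, 120000, 200)
# Assumed relationship for illustration: about 40% of income plus noise
spending = 0.4 * income + rng.normal(0, 8000, 200)
linked = pd.DataFrame({'income': income, 'spending': spending})

fig, ax = plt.subplots(figsize=(10, 6))
sns.regplot(data=linked, x='income', y='spending',
            scatter_kws={'alpha': 0.5},
            line_kws={'color': 'red', 'linewidth': 2}, ax=ax)
ax.set_title('Income vs Spending (synthetic, correlated)')
ax.set_xlabel('Annual Income ($)')
ax.set_ylabel('Annual Spending ($)')
plt.tight_layout()

# Pearson correlation between the two columns
corr_value = linked['income'].corr(linked['spending'])
print(round(corr_value, 2))
```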

Categorical Plots: Comparing Groups

Seaborn’s categorical plots make group comparisons intuitive:

# Box plot with grouped categories
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Box plot
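# seaborn 0.13+ deprecates palette without hue, so hue repeats the x variable
sns.boxplot(data=df, x='category', y='spending', hue='category', legend=False,
            ax=axes[0], palette='Set2')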
axes[0].set_title('Spending Distribution by Category', fontsize=12, fontweight='bold')
axes[0].set_xlabel('Category', fontsize=11)
axes[0].set_ylabel('Spending ($)', fontsize=11)

# Violin plot
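# seaborn 0.13+ deprecates palette without hue, so hue repeats the x variable
sns.violinplot(data=df, x='category', y='income', hue='category', legend=False,
               ax=axes[1], palette='muted')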
axes[1].set_title('Income Distribution by Category', fontsize=12, fontweight='bold')
axes[1].set_xlabel('Category', fontsize=11)
axes[1].set_ylabel('Income ($)', fontsize=11)

plt.tight_layout()
plt.show()

Box plots immediately reveal medians, quartiles, and outliers for each category. The violin plot adds distribution shape information, showing whether data clusters at certain values or spreads evenly.
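The quantities a box plot draws can also be computed directly, which is useful when you need to quote them. This sketch uses a pandas groupby on a comparable synthetic dataset; the column labels Q1, median, Q3, and IQR are names chosen for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame({
    'spending': rng.integers(10000, 60000, 200),
    'category': rng.choice(['A', 'B', 'C'], 200),
})

# Quartiles per category: the values a box plot shows as the
# box edges and the median line
quartiles = (df.groupby('category')['spending']
               .quantile([0.25, 0.5, 0.75])
               .unstack())
quartiles.columns = ['Q1', 'median', 'Q3']
quartiles['IQR'] = quartiles['Q3'] - quartiles['Q1']
print(quartiles.round(0))
```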

Heatmaps for Correlation Matrices

Heatmaps transform correlation matrices into intuitive visual representations:

# Create correlation matrix
numeric_df = df[['age', 'income', 'spending']]
correlation_matrix = numeric_df.corr()

# Create heatmap
fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm', 
            center=0, square=True, linewidths=1, cbar_kws={'shrink': 0.8},
            ax=ax)
ax.set_title('Correlation Matrix Heatmap', fontsize=14, fontweight='bold', pad=20)
plt.tight_layout()
plt.show()

The annot=True parameter displays correlation coefficients directly on the heatmap. The coolwarm colormap uses blue for negative correlations and red for positive ones, and center=0 pins its neutral light gray to zero correlation, so the sign and strength of each relationship are readable from color alone.
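A correlation matrix is symmetric, so half of the heatmap repeats the other half. One common way to reduce the clutter is to pass a boolean mask to sns.heatmap. The sketch below hides the cells above the diagonal; it rebuilds a comparable synthetic dataset, and fixing vmin and vmax to the full correlation range is a choice made here, not something the example above does.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(42)
df = pd.DataFrame({
    'age': rng.integers(20, 65, 200),
    'income': rng.integers(30000, 120000, 200),
    'spending': rng.integers(10000, 60000, 200),
})
corr = df.corr()

# True marks cells to hide: everything strictly above the diagonal
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

fig, ax = plt.subplots(figsize=(8, 6))
sns.heatmap(corr, mask=mask, annot=True, fmt='.2f', cmap='coolwarm',
            vmin=-1, vmax=1, center=0, square=True, linewidths=1, ax=ax)
ax.set_title('Correlation Matrix (lower triangle)')
plt.tight_layout()
```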

Creating Multi-Plot Layouts

Complex analyses often require multiple plots displayed together. Matplotlib’s subplot system provides flexible layout options:

# Create 2x2 grid of plots
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Plot 1: Line plot
x = np.linspace(0, 10, 100)
axes[0, 0].plot(x, np.sin(x), label='sin(x)')
axes[0, 0].plot(x, np.cos(x), label='cos(x)')
axes[0, 0].set_title('Trigonometric Functions')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Plot 2: Bar chart
categories = ['A', 'B', 'C', 'D']
values = [25, 40, 30, 55]
axes[0, 1].bar(categories, values, color='skyblue', edgecolor='black')
axes[0, 1].set_title('Category Comparison')

# Plot 3: Scatter plot
x_scatter = np.random.randn(100)
y_scatter = 2 * x_scatter + np.random.randn(100)
axes[1, 0].scatter(x_scatter, y_scatter, alpha=0.6)
axes[1, 0].set_title('Correlation Analysis')

# Plot 4: Histogram
data_hist = np.random.normal(100, 15, 1000)
axes[1, 1].hist(data_hist, bins=30, color='green', alpha=0.7, edgecolor='black')
axes[1, 1].set_title('Distribution')

plt.tight_layout()
plt.show()

The axes array lets you reference each subplot using row and column indices. This organizational structure keeps your code clean even when creating complex dashboard-style visualizations.
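When every panel shows the same kind of plot, looping over axes.flat avoids repeating the [row, column] indexing. Here is a minimal sketch; the frequencies and titles are arbitrary example values.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 200)
frequencies = [1, 2, 3, 4]

# sharex/sharey keep the scales identical so panels are directly comparable
fig, axes = plt.subplots(2, 2, figsize=(12, 8), sharex=True, sharey=True)

# axes.flat walks the 2x2 grid row by row
for ax, freq in zip(axes.flat, frequencies):
    ax.plot(x, np.sin(freq * x))
    ax.set_title(f'sin({freq}x)')
    ax.grid(True, alpha=0.3)

fig.suptitle('Same Function, Different Frequencies')
plt.tight_layout()
```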

Customizing Plot Aesthetics

Professional visualizations require attention to aesthetic details. Both libraries offer extensive customization options:

Color Palettes and Themes

# Seaborn color palettes
fig, axes = plt.subplots(2, 3, figsize=(15, 8))

data_sample = [np.random.normal(loc, 1, 100) for loc in range(5)]

# Different palettes
palettes = ['deep', 'muted', 'pastel', 'bright', 'dark', 'colorblind']

for ax, palette in zip(axes.flat, palettes):
    sns.violinplot(data=data_sample, ax=ax, palette=palette)
    ax.set_title(f'Palette: {palette}', fontsize=11, fontweight='bold')
    ax.set_xticklabels([])

plt.tight_layout()
plt.show()

Seaborn’s built-in palettes give you coordinated colors without hand-picking hex codes. The colorblind palette, for instance, uses colors distinguishable by people with common forms of color blindness, which is an important accessibility consideration.
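Seaborn palettes are not limited to Seaborn plots. sns.color_palette() returns a list of RGB tuples that plain Matplotlib calls accept as colors. A short sketch, with made-up category values:

```python
import matplotlib.pyplot as plt
import seaborn as sns

categories = ['A', 'B', 'C', 'D', 'E']
values = [25, 40, 30, 55, 35]

# Ask Seaborn for five colors from the colorblind palette
colors = sns.color_palette('colorblind', n_colors=5)
print(colors.as_hex())  # hex codes, handy for reuse in other tools

fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(categories, values, color=colors, edgecolor='black')
ax.set_title('Matplotlib Bars with a Seaborn Palette')
plt.tight_layout()
```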

Style Contexts

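Seaborn’s plotting contexts scale fonts, line widths, and other elements for different presentation settings. Both plotting_context() and axes_style() work as context managers, so the settings apply only inside the with block:

# Different contexts
contexts = ['paper', 'notebook', 'talk', 'poster']

fig = plt.figure(figsize=(14, 10))
x = np.linspace(0, 10, 100)

for i, context in enumerate(contexts, start=1):
    # Create and draw on each axes inside the context managers;
    # axes created earlier would not pick up the temporary settings
    with sns.axes_style('darkgrid'), sns.plotting_context(context):
        ax = fig.add_subplot(2, 2, i)
        # No explicit linewidth or fontsize, so the context's scaling shows
        ax.plot(x, np.sin(x))
        ax.set_title(f'Context: {context}', fontweight='bold')
        ax.set_xlabel('X Values')
        ax.set_ylabel('Y Values')

plt.tight_layout()
plt.show()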

The paper context uses smaller fonts and thinner lines for publication. The poster context dramatically increases all elements for visibility from a distance. This automatic scaling saves tremendous time when adapting visualizations for different media.
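To see what each context changes, you can inspect the parameters directly. Called with a context name, sns.plotting_context() returns the Matplotlib rc parameters that the context would set; the sketch below prints two of them.

```python
import seaborn as sns

context_names = ['paper', 'notebook', 'talk', 'poster']
font_sizes = []

# Called with a name, plotting_context returns that context's rc parameters
for name in context_names:
    params = sns.plotting_context(name)
    font_sizes.append(params['font.size'])
    print(f"{name:>8}: font.size={params['font.size']}, "
          f"lines.linewidth={params['lines.linewidth']}")
```

The values grow from paper to poster, which is the scaling described above.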

✨ Visualization Best Practices

  • Label Everything Clearly: Axes, titles, and legends should leave no ambiguity about what data represents
  • Choose Appropriate Scales: Start bar chart axes at zero; use log scales for exponential data
  • Limit Color Use: Too many colors create confusion; use color purposefully to highlight important data
  • Consider Accessibility: Use colorblind-friendly palettes and sufficient contrast
  • Avoid Chart Junk: Remove unnecessary gridlines, borders, and decorative elements that distract from data
  • Match Plot to Data: Continuous data uses line plots; categorical data uses bar charts; distributions use histograms
  • Tell a Story: Every visualization should answer a specific question or support a particular insight
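As an illustration of the scale advice in the list above, the following sketch plots the same exponentially growing series on a linear and a logarithmic axis. The data (a quantity that doubles every step) is hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: a quantity that doubles every step
steps = np.arange(0, 20)
users = 100 * 2.0 ** steps

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(12, 5))

ax_lin.plot(steps, users, marker='o')
ax_lin.set_title('Linear Scale')

ax_log.plot(steps, users, marker='o')
ax_log.set_yscale('log')  # exponential growth becomes a straight line
ax_log.set_title('Log Scale')

for ax in (ax_lin, ax_log):
    ax.set_xlabel('Step')
    ax.set_ylabel('Users')
    ax.grid(True, alpha=0.3)
plt.tight_layout()
```

On the linear axis the early values are squashed against zero; on the log axis every step is readable.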

Saving and Exporting Your Visualizations

Creating great visualizations is only valuable if you can share them. Because Seaborn draws onto Matplotlib figures, the same savefig() call exports plots from either library:

# Create a sample plot
fig, ax = plt.subplots(figsize=(10, 6))
x = np.linspace(0, 10, 100)
ax.plot(x, np.sin(x), linewidth=2)
ax.set_title('Sample Visualization', fontsize=14, fontweight='bold')
ax.set_xlabel('X Values')
ax.set_ylabel('Y Values')
ax.grid(True, alpha=0.3)

# Save in different formats
plt.savefig('visualization.png', dpi=300, bbox_inches='tight')
plt.savefig('visualization.pdf', bbox_inches='tight')
plt.savefig('visualization.svg', bbox_inches='tight')

plt.show()

The dpi=300 parameter ensures print-quality resolution. PNG works for web use and presentations, PDF provides vector graphics for publications, and SVG allows further editing in design software. The bbox_inches='tight' parameter removes unnecessary whitespace, making your saved files cleaner and more professional.
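In a notebook with several figures, calling savefig() on the Figure object makes it explicit which one is written. The sketch below does that and reports the resulting file sizes; writing to a temporary directory and the names out_dir and saved are choices made for this example.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(6, 4))
x = np.linspace(0, 10, 100)
ax.plot(x, np.sin(x), linewidth=2)
ax.set_title('Sample Visualization')

# A temporary directory keeps the example from cluttering the working folder
out_dir = Path(tempfile.mkdtemp())
saved = {}
for ext in ('png', 'pdf', 'svg'):
    path = out_dir / f'visualization.{ext}'
    # fig.savefig targets this specific figure rather than the "current" one
    fig.savefig(path, dpi=300, bbox_inches='tight')
    saved[ext] = path.stat().st_size

for ext, size in saved.items():
    print(f'{ext}: {size / 1024:.1f} KiB')
```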

Conclusion

Mastering data visualization with Matplotlib and Seaborn in Jupyter Notebook transforms you from someone who analyzes data to someone who communicates insights effectively. You’ve learned to create fundamental plot types, leverage Seaborn’s statistical visualizations, customize aesthetics, build complex multi-plot layouts, and export professional-quality graphics. These skills enable you to explore data interactively, discover patterns quickly, and present findings that drive decisions.

The journey from basic plots to compelling visualizations is iterative. Start with simple plots to understand your data, then refine them with customization as your analytical questions become more focused. Experiment with different plot types, color schemes, and layouts until your visualizations communicate insights clearly and immediately. Remember that the best visualization is the one that makes your data’s story impossible to misunderstand.
