Machine learning projects fail more often from dependency conflicts than from model performance issues. A colleague’s training script crashes with cryptic NumPy errors. Your production deployment breaks because PyTorch installed a different CUDA version. A model that worked perfectly last month refuses to train after updating a single package. These scenarios plague ML teams daily because dependency management in machine learning is uniquely complex, involving tight coupling between frameworks, system libraries, and numerical precision requirements that general Python projects don’t encounter.
The challenge extends beyond simply tracking package versions. ML dependencies form intricate webs where PyTorch depends on specific NumPy versions, Transformers requires particular PyTorch builds, CUDA libraries must align with GPU drivers, and numerical computation libraries interact in ways that break reproducibility with subtle version mismatches. Understanding how to navigate this complexity—choosing the right dependency management approach, avoiding common pitfalls, and building resilient dependency specifications—separates ML projects that deploy reliably from those trapped in perpetual environment debugging.
The ML Dependency Challenge
Machine learning projects face dependency complexity that exceeds typical Python development in several critical ways.
The System Dependency Layer
ML frameworks require system-level dependencies that pip alone can’t manage. PyTorch and TensorFlow need CUDA libraries for GPU acceleration. These libraries have specific version requirements that change between framework versions. Installing PyTorch 2.0 might require CUDA 11.8, but PyTorch 2.1 works better with CUDA 12.1. JAX needs even more specific cuDNN versions.
The problem: System dependencies live outside Python package managers. You can install PyTorch with pip, but pip won’t install CUDA. Mismatched CUDA versions cause subtle bugs—training works but runs slowly, or numerical results differ from expected, or worse, everything seems fine until production where a different CUDA version exists.
Example failure scenario: Developer has CUDA 11.8 installed globally. They pip install torch which downloads PyTorch binaries compiled for CUDA 11.8. Works perfectly. Deployment server has CUDA 12.1. PyTorch tries using CUDA 12.1 libraries it wasn’t compiled for. Result: crashes, wrong computation results, or mysterious performance degradation.
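One way to catch this early is to fail fast at startup when the installed build's CUDA version isn't the one the project expects. A minimal sketch, assuming a hypothetical project pin of CUDA 11.8:
import sys
import torch

EXPECTED_CUDA = "11.8"  # hypothetical pin: the CUDA version this project's wheels target

build_cuda = torch.version.cuda  # None for CPU-only builds
if build_cuda != EXPECTED_CUDA:
    sys.exit(f"PyTorch built for CUDA {build_cuda}, expected {EXPECTED_CUDA}")
Run this before training starts and a mismatched deployment server fails with a clear message instead of producing wrong results.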
Version Interdependencies
ML packages have tighter version coupling than typical software. NumPy is the foundational numerical computing library. PyTorch, TensorFlow, pandas, scikit-learn, and dozens of other packages depend on it. But they depend on specific NumPy versions or ranges.
Conflict scenario: PyTorch 2.0 works with NumPy 1.23-1.25. The transformers library requires NumPy >=1.22. Scikit-learn 1.3 needs NumPy >=1.21. This seems compatible: NumPy 1.24 satisfies all requirements. But then you add scipy, which needs NumPy <1.23. Now no NumPy version satisfies all constraints.
The dependency resolver must navigate these constraints, and different tools (pip, conda, poetry) resolve them differently. Pip might install incompatible versions that technically meet stated requirements but break at runtime. Conda might refuse to install entirely. Poetry might downgrade multiple packages to find a compatible set.
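The arithmetic behind such conflicts is easy to verify yourself with the packaging library. A small sketch using the hypothetical constraints from the scenario above:
from packaging.specifiers import SpecifierSet

# NumPy ranges accepted by each package in the hypothetical scenario
constraints = {
    "torch": SpecifierSet(">=1.23,<1.26"),
    "transformers": SpecifierSet(">=1.22"),
    "scikit-learn": SpecifierSet(">=1.21"),
    "scipy": SpecifierSet("<1.23"),
}

for candidate in ["1.22.4", "1.23.5", "1.24.3", "1.25.2"]:
    blockers = [pkg for pkg, spec in constraints.items() if candidate not in spec]
    print(candidate, "OK" if not blockers else f"rejected by {blockers}")
# Every candidate is rejected by at least one package: the constraint set is unsatisfiable.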
Binary Distribution Complexity
ML packages ship compiled binaries with platform-specific builds. PyTorch for Linux with CUDA 11.8 is a different binary than PyTorch for macOS or PyTorch with CUDA 12.1. TensorFlow has separate GPU and CPU builds. These binaries aren’t interchangeable.
Package index challenges: PyPI (the standard Python package index) hosts many PyTorch versions, but specialized builds live on PyTorch’s custom index. Installing from the wrong index gets you the wrong binary. Your requirements.txt might specify torch==2.1.0, but without specifying the index and platform, you might get CPU-only PyTorch when you need GPU support.
Requirements.txt: Understanding the Baseline
The simplest dependency specification has significant limitations for ML projects but remains widely used.
What Requirements.txt Does Well
Basic version pinning is straightforward:
torch==2.1.0
transformers==4.35.2
numpy==1.24.3
scikit-learn==1.3.1
pandas==2.1.1
This specification works when:
- All packages are available on PyPI
- You’re okay with the default PyPI builds
- Platform differences don’t matter
- System dependencies are handled separately
Generating requirements is simple:
pip freeze > requirements.txt
This captures everything currently installed, including transitive dependencies with exact versions.
Where Requirements.txt Falls Short
No dependency resolution across installs. Pip's resolver has improved and now rejects conflicts it can see within a single install command, but it does not re-check packages installed in earlier invocations (pip itself warns that its resolver "does not take into account all the packages that are installed"). If you specify numpy>=1.20 while a previously installed package requires numpy<1.24, pip can leave you with NumPy 1.25 (the latest), and the conflict only surfaces as runtime errors unless you run pip check.
No platform specification. Requirements.txt doesn’t indicate “this needs Linux” or “this is for CUDA 11.8”. The same file used on different platforms installs different binaries, causing subtle incompatibilities.
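A partial mitigation requirements.txt does support: PEP 508 environment markers can select packages per platform or Python version, though they still cannot express CUDA variants. A sketch with hypothetical pins:
# Environment markers: hypothetical per-platform pins
torch==2.1.0; sys_platform == "linux"
tensorflow-macos==2.14.0; sys_platform == "darwin"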
No environment isolation from requirements alone. The file lists packages but doesn’t create the environment. You need virtual environments separately.
Transitive dependency drift. Say you specify torch==2.1.0 without pinning its dependencies. Today, pip installs NumPy 1.24.3 (current latest compatible version). Next month, NumPy 1.26 releases. New installation gets NumPy 1.26, which might have breaking changes. Your requirements file is “the same” but installs differently.
Making Requirements.txt Better
Pin everything:
pip freeze > requirements-frozen.txt
This captures exact versions of all packages, including transitive dependencies. More verbose but much more reliable.
Specify platform and Python version in comments:
# Python 3.10
# Platform: linux_x86_64
# CUDA: 11.8
--extra-index-url https://download.pytorch.org/whl/cu118
torch==2.1.0+cu118
torchvision==0.16.0+cu118
numpy==1.24.3
The +cu118 suffix pins the CUDA build of PyTorch, and the --extra-index-url line tells pip where those builds live; plain PyPI does not host them.
Use constraints files for complex scenarios:
pip install -r requirements.txt -c constraints.txt
A constraints file caps versions without requiring installation, which is useful for enforcing compatible versions across multiple requirements files.
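A minimal sketch of such a constraints file, with hypothetical caps:
# constraints.txt: caps only; nothing here is installed unless a requirements file requests it
numpy<2.0.0
protobuf<5.0.0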
Dependency Management Tool Comparison
pip + requirements.txt
Weaknesses: Manual resolution, no system deps, limited conflict detection
Best for: Quick prototypes, small projects, experienced users
Conda
Weaknesses: Slow, complex, large environments, cross-platform limitations
Best for: GPU projects, teams, complex dependencies
Poetry
Weaknesses: No system deps, slower than pip, learning curve
Best for: Production code, packages, teams wanting reproducibility
Pipenv
Weaknesses: Slower than pip, less ML-focused, smaller community
Best for: General Python projects transitioning to ML
Conda: The ML Standard
Conda dominates ML dependency management for good reasons, despite its drawbacks.
Why Conda Excels for ML
Conda packages system dependencies alongside Python packages. When you install PyTorch with conda, it installs CUDA libraries too—automatically matched to compatible versions. This eliminates the most common source of ML dependency problems.
Example conda installation:
conda install pytorch torchvision pytorch-cuda=11.8 -c pytorch -c nvidia
This single command installs PyTorch, torchvision, CUDA 11.8 libraries, and all dependencies with compatible versions. The conda resolver ensures everything works together.
Binary compatibility is handled transparently. Conda builds include all necessary compiled libraries, avoiding issues where PyPI packages expect system libraries that don’t exist.
Creating Conda Environments
Environment specification (environment.yml):
name: ml-project
channels:
  - pytorch
  - nvidia
  - conda-forge
  - defaults
dependencies:
  - python=3.10
  - pytorch=2.1.0
  - pytorch-cuda=11.8
  - torchvision=0.16.0
  - numpy=1.24.3
  - pandas=2.1.1
  - scikit-learn=1.3.1
  - jupyter
  - pip:
      - transformers==4.35.2
      - datasets==2.14.6
Creating the environment:
conda env create -f environment.yml
conda activate ml-project
Key elements:
- Channels: Package sources, ordered by priority
- Python version: Explicitly specified
- CUDA version: Managed as a package (pytorch-cuda)
- Pip section: For packages not available in conda
Conda’s Limitations
Slow dependency resolution is conda’s notorious weakness. Creating or updating environments can take 5-20 minutes for complex dependency trees. The SAT solver that ensures compatibility explores enormous solution spaces.
Mamba solves this by reimplementing conda’s resolver in C++:
# Install mamba
conda install mamba -c conda-forge
# Use mamba instead of conda (10-100x faster)
mamba env create -f environment.yml
mamba install numpy pandas
Mamba is drop-in compatible—same commands, same environment files, just faster.
Environment size can balloon. Conda environments commonly reach 5-10GB because they include entire CUDA toolchains and duplicate system libraries. This consumes disk space and slows environment creation.
Cross-platform reproducibility is imperfect. An environment.yml created on Linux might not recreate identically on macOS or Windows because platform-specific packages differ. Export with --from-history to capture only explicitly installed packages, improving cross-platform compatibility at the cost of less precise version locking.
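The two export modes side by side:
conda env export > environment-lock.yml            # exact versions and builds, platform-specific
conda env export --from-history > environment.yml  # only explicitly requested packages, portable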
Poetry: Modern Dependency Management
Poetry brings JavaScript/Rust-style dependency management to Python, with strong benefits for ML projects despite CUDA limitations.
Poetry’s Core Advantages
Lock files guarantee reproducibility. Poetry generates poetry.lock containing exact versions of all dependencies with cryptographic hashes. Two people running poetry install get identical environments.
Dependency resolution is sophisticated. Poetry’s resolver exhaustively searches for compatible versions, refusing to install if conflicts exist. This prevents broken environments that “kind of work” but fail mysteriously.
Project structure is standardized. Poetry manages package metadata, builds, and publishing alongside dependencies. For projects becoming packages (MLOps tools, model serving libraries), this integration is valuable.
Setting Up Poetry for ML
pyproject.toml for ML project:
[tool.poetry]
name = "ml-pipeline"
version = "0.1.0"
description = "ML training pipeline"
[tool.poetry.dependencies]
python = "^3.10"
torch = {version = "^2.1.0", source = "pytorch"}
transformers = "^4.35.0"
numpy = "^1.24.0"
scikit-learn = "^1.3.0"
pandas = "^2.1.0"
[tool.poetry.group.dev.dependencies]
pytest = "^7.4.0"
black = "^23.10.0"
jupyter = "^1.0.0"
[tool.poetry.group.gpu.dependencies]
# GPU-specific dependencies
nvidia-ml-py = "^12.535.0"
[[tool.poetry.source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu118"
priority = "supplemental"
Using custom package indexes enables installing CUDA-specific PyTorch builds. The source field tells Poetry where to find PyTorch wheels.
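The same source can also be configured from the command line instead of editing pyproject.toml by hand (the --priority flag requires Poetry 1.5+):
poetry source add --priority=supplemental pytorch https://download.pytorch.org/whl/cu118
poetry add torch --source pytorch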
Installing the project:
poetry install # Install all dependencies
poetry install --with dev # Include dev dependencies
poetry install --with gpu # Include GPU dependencies
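Two companion commands help maintain the lock file:
poetry lock          # re-resolve and rewrite poetry.lock after editing pyproject.toml
poetry show --tree   # inspect the resolved dependency tree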
Poetry’s ML Limitations
No system dependency management. Poetry handles Python packages exclusively. CUDA, cuDNN, and system libraries must be installed separately—via conda, system package managers, or Docker.
Slower than pip for large ML packages. Installing multi-GB PyTorch or TensorFlow packages takes longer through Poetry’s dependency resolution process.
Workaround for CUDA: Use Poetry for Python dependencies, Docker for system dependencies:
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04
# Base CUDA image ships without Python; install it first
RUN apt-get update && apt-get install -y python3 python3-pip && rm -rf /var/lib/apt/lists/*
# Install Poetry and have it install into the system environment (no nested venv)
RUN pip3 install poetry
ENV POETRY_VIRTUALENVS_CREATE=false
WORKDIR /app
# Copy dependency manifests first to leverage Docker layer caching
COPY pyproject.toml poetry.lock ./
# Install dependencies
RUN poetry install --no-root
# Copy application code
COPY . .
This combines Poetry’s Python dependency management with Docker’s system dependency handling.
Handling CUDA and GPU Dependencies
GPU dependencies deserve special attention as they cause the most ML-specific dependency issues.
CUDA Version Management
PyTorch CUDA variants:
# CPU only
pip install torch torchvision
# CUDA 11.8
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
# CUDA 12.1
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
PyTorch builds for different CUDA versions are entirely different binaries. Two CUDA variants can't coexist in the same environment.
Conda handles this elegantly:
# Specify CUDA version as a package
conda install pytorch pytorch-cuda=11.8 -c pytorch -c nvidia
Conda installs CUDA libraries alongside PyTorch, ensuring compatibility.
Detecting and Validating GPU Setup
After installation, verify GPU access:
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"GPU count: {torch.cuda.device_count()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
else:
    print("WARNING: CUDA not available!")
Add this to project setup scripts to catch GPU configuration issues early.
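The same check works as a standalone guard for CI. A sketch (the file name is arbitrary), exiting non-zero so pipelines fail loudly:
# check_gpu.py: hypothetical setup guard; exits non-zero when the GPU stack is broken
import sys
import torch

if not torch.cuda.is_available():
    sys.exit(f"CUDA unavailable (torch {torch.__version__}, built for CUDA {torch.version.cuda})")
print(f"OK: {torch.cuda.device_count()} GPU(s), first is {torch.cuda.get_device_name(0)}")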
Multiple CUDA Versions
Projects using different CUDA versions require separate conda environments:
# Environment for CUDA 11.8 projects
conda create -n ml-cuda118 python=3.10
conda activate ml-cuda118
conda install pytorch pytorch-cuda=11.8 -c pytorch -c nvidia
# Environment for CUDA 12.1 projects
conda create -n ml-cuda121 python=3.10
conda activate ml-cuda121
conda install pytorch pytorch-cuda=12.1 -c pytorch -c nvidia
Never try mixing CUDA versions in one environment. It won’t work.
Common Dependency Problems and Solutions
Problem: Imports fail or the wrong interpreter runs despite torch being "installed"
Solution: Verify the environment with `which python` and reinstall torch
Problem: CUDA out-of-memory errors during training
Solution: Reduce batch size, use gradient accumulation, or use a smaller model
Problem: Incompatible package versions after installing or upgrading
Solution: Use pip check, conda list, or poetry show to find conflicts. Reinstall with compatible versions.
Problem: Results differ across machines or reruns
Solution: Pin all versions with lock files, document the CUDA version, set random seeds
Best Practices for ML Dependency Management
Proven practices prevent most dependency issues before they occur.
Version Pinning Strategy
Pin major ML frameworks exactly:
torch==2.1.0
tensorflow==2.14.0
transformers==4.35.2
Allow minor version flexibility for utilities:
pandas>=2.1.0,<3.0.0
numpy>=1.24.0,<2.0.0
This balances reproducibility (exact ML framework versions) with flexibility (utilities can update for bug fixes).
Always pin in production:
# Production: pin everything
torch==2.1.0
pandas==2.1.1
numpy==1.24.3
# Development: some flexibility acceptable
torch==2.1.0
pandas>=2.1.0,<3.0.0
numpy>=1.24.0,<2.0.0
Separate Development and Production Dependencies
Development dependencies (testing, debugging, notebooks) shouldn’t pollute production:
With Poetry:
[tool.poetry.dependencies]
torch = "^2.1.0"
numpy = "^1.24.0"
[tool.poetry.group.dev.dependencies]
pytest = "^7.4.0"
jupyter = "^1.0.0"
black = "^23.10.0"
With requirements files:
requirements.txt # Production dependencies
requirements-dev.txt # Development additions
Install with:
# Production
pip install -r requirements.txt
# Development
pip install -r requirements.txt -r requirements-dev.txt
Dependency Update Strategy
Never update all dependencies simultaneously. Update systematically:
- Update one package at a time
- Run full test suite after each update
- Document any behavioral changes
- Commit each update separately
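One such cycle, sketched with hypothetical version numbers:
pip install --upgrade transformers==4.36.0   # one package, one explicit target version
pytest tests/                                # run the full suite before committing
pip freeze > requirements-frozen.txt         # re-capture the resolved environment
git commit -am "Bump transformers 4.35.2 -> 4.36.0"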
Check for security vulnerabilities:
pip install safety
safety check
# Or with poetry
poetry export -f requirements.txt | safety check --stdin
Use Dependabot or a similar service for automated dependency-update PRs in CI/CD.
Documentation Requirements
Document dependency choices in README:
## Dependencies
**Python:** 3.10 (required, 3.11 not yet supported)
**CUDA:** 11.8 (for GPU support)
**Key packages:**
- PyTorch 2.1.0 (training framework)
- Transformers 4.35.2 (model library)
- NumPy 1.24.3 (pinned for reproducibility)
**Installation:**
```bash
# GPU (recommended)
conda env create -f environment.yml
# CPU only
pip install -r requirements-cpu.txt
```
This helps new team members understand requirements and make informed setup decisions.
Troubleshooting Dependency Issues
When problems arise, systematic debugging resolves them faster than trial-and-error.
Diagnostic Commands
Check installed versions:
pip list # pip-managed packages
conda list # conda-managed packages
poetry show # poetry-managed packages
Detect conflicts:
pip check # Find incompatible packages
conda list --explicit # Show exact package specs
Verify package sources:
pip show torch # Shows installation location and version
Resolving Conflicts
When pip check reports conflicts:
- Identify the conflicting packages
- Check their version requirements
- Find compatible versions
- Reinstall in the right order:
# Remove conflicting packages
pip uninstall numpy torch pandas
# Reinstall in dependency order
pip install numpy==1.24.3
pip install torch==2.1.0
pip install pandas==2.1.1
For conda conflicts:
# Let conda resolve
conda install package_a package_b package_c
# If conda can't resolve, try mamba
mamba install package_a package_b package_c
Clean Installation
When conflicts are intractable, rebuild the environment:
# Conda
conda deactivate
conda env remove -n ml-project
conda env create -f environment.yml
# venv
deactivate
rm -rf venv
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Poetry
poetry env remove python3.10
poetry install
Fresh environments eliminate hidden state causing conflicts.
Conclusion
Managing Python dependencies for ML projects requires understanding that standard Python practices are insufficient for the unique challenges of deep learning frameworks, GPU libraries, and numerical computing dependencies. The choice between pip, conda, and poetry depends on project needs: conda excels for GPU work and teams, poetry provides superior reproducibility for production deployments, while pip remains viable for simple projects with careful version management. Success requires explicit version pinning, systematic update strategies, comprehensive documentation, and recognizing that dependency management is critical infrastructure deserving the same attention as model architecture.
The investment in robust dependency management—whether through conda environments with pinned versions, poetry lock files, or Docker containers—prevents the countless hours wasted debugging environment issues that plague poorly-managed ML projects. Start with clear dependency specifications, maintain them rigorously through project evolution, and leverage tools appropriate to your complexity level. The goal is spending time on machine learning, not on resolving dependency conflicts or debugging environment-specific failures. Build dependency management practices that scale with your project, and you’ll avoid the dependency hell that derails so many ML initiatives.