Machine learning projects fail more often from dependency conflicts than from model performance issues. A colleague’s training script crashes with cryptic NumPy errors. Your production deployment breaks because PyTorch installed a different CUDA version. A model that worked perfectly last month refuses to train after updating a single package. These scenarios plague ML teams daily because dependency management in machine learning is uniquely complex, involving tight coupling between frameworks, system libraries, and numerical precision requirements that general Python projects don’t encounter.
The challenge extends beyond simply tracking package versions. ML dependencies form intricate webs where PyTorch depends on specific NumPy versions, Transformers requires particular PyTorch builds, CUDA libraries must align with GPU drivers, and numerical computation libraries interact in ways that break reproducibility with subtle version mismatches. Understanding how to navigate this complexity—choosing the right dependency management approach, avoiding common pitfalls, and building resilient dependency specifications—separates ML projects that deploy reliably from those trapped in perpetual environment debugging.
The ML Dependency Challenge
Machine learning projects face dependency complexity that exceeds typical Python development in several critical ways.
The System Dependency Layer
ML frameworks require system-level dependencies that pip alone can’t manage. PyTorch and TensorFlow need CUDA libraries for GPU acceleration. These libraries have specific version requirements that change between framework versions. Installing PyTorch 2.0 might require CUDA 11.8, but PyTorch 2.1 works better with CUDA 12.1. JAX needs even more specific cuDNN versions.
The problem: System dependencies live outside Python package managers. You can install PyTorch with pip, but pip won’t install CUDA. Mismatched CUDA versions cause subtle bugs—training works but runs slowly, or numerical results differ from expected, or worse, everything seems fine until production where a different CUDA version exists.
Example failure scenario: Developer has CUDA 11.8 installed globally. They pip install torch which downloads PyTorch binaries compiled for CUDA 11.8. Works perfectly. Deployment server has CUDA 12.1. PyTorch tries using CUDA 12.1 libraries it wasn’t compiled for. Result: crashes, wrong computation results, or mysterious performance degradation.
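One way to catch this early is to fail fast at startup when the installed build's CUDA version isn't the one the project expects. A minimal sketch, assuming a hypothetical project pin of CUDA 11.8:
import sys
import torch

EXPECTED_CUDA = "11.8"  # hypothetical pin: the CUDA version this project's wheels target

build_cuda = torch.version.cuda  # None for CPU-only builds
if build_cuda != EXPECTED_CUDA:
    sys.exit(f"PyTorch built for CUDA {build_cuda}, expected {EXPECTED_CUDA}")
Run this before training starts and a mismatched deployment server fails with a clear message instead of producing wrong results.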
Version Interdependencies
ML packages have tighter version coupling than typical software. NumPy is the foundational numerical computing library. PyTorch, TensorFlow, pandas, scikit-learn, and dozens of other packages depend on it. But they depend on specific NumPy versions or ranges.
Conflict scenario: PyTorch 2.0 works with NumPy 1.23-1.25. The transformers library requires NumPy >=1.22. Scikit-learn 1.3 needs NumPy >=1.21. This seems compatible: NumPy 1.24 satisfies all requirements. But then you add scipy, which needs NumPy <1.23. Now no NumPy version satisfies all constraints.
The dependency resolver must navigate these constraints, and different tools (pip, conda, poetry) resolve them differently. Pip might install incompatible versions that technically meet stated requirements but break at runtime. Conda might refuse to install entirely. Poetry might downgrade multiple packages to find a compatible set.
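The arithmetic behind such conflicts is easy to verify yourself with the packaging library. A small sketch using the hypothetical constraints from the scenario above:
from packaging.specifiers import SpecifierSet

# NumPy ranges accepted by each package in the hypothetical scenario
constraints = {
    "torch": SpecifierSet(">=1.23,<1.26"),
    "transformers": SpecifierSet(">=1.22"),
    "scikit-learn": SpecifierSet(">=1.21"),
    "scipy": SpecifierSet("<1.23"),
}

for candidate in ["1.22.4", "1.23.5", "1.24.3", "1.25.2"]:
    blockers = [pkg for pkg, spec in constraints.items() if candidate not in spec]
    print(candidate, "OK" if not blockers else f"rejected by {blockers}")
# Every candidate is rejected by at least one package: the constraint set is unsatisfiable.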
Binary Distribution Complexity
ML packages ship compiled binaries with platform-specific builds. PyTorch for Linux with CUDA 11.8 is a different binary than PyTorch for macOS or PyTorch with CUDA 12.1. TensorFlow has separate GPU and CPU builds. These binaries aren’t interchangeable.
Package index challenges: PyPI (the standard Python package index) hosts many PyTorch versions, but specialized builds live on PyTorch’s custom index. Installing from the wrong index gets you the wrong binary. Your requirements.txt might specify torch==2.1.0, but without specifying the index and platform, you might get CPU-only PyTorch when you need GPU support.
Requirements.txt: Understanding the Baseline
The simplest dependency specification has significant limitations for ML projects but remains widely used.
What Requirements.txt Does Well
Basic version pinning is straightforward:
torch==2.1.0
transformers==4.35.2
numpy==1.24.3
scikit-learn==1.3.1
pandas==2.1.1
This specification works when:
- All packages are available on PyPI
- You’re okay with the default PyPI builds
- Platform differences don’t matter
- System dependencies are handled separately
Generating requirements is simple:
pip freeze > requirements.txt
This captures everything currently installed, including transitive dependencies with exact versions.
Where Requirements.txt Falls Short
No dependency resolution across installs. Pip's resolver has improved and now rejects conflicts it can see within a single install command, but it does not re-check packages installed in earlier invocations (pip itself warns that its resolver "does not take into account all the packages that are installed"). If you specify numpy>=1.20 while a previously installed package requires numpy<1.24, pip can leave you with NumPy 1.25 (the latest), and the conflict only surfaces as runtime errors unless you run pip check.
No platform specification. Requirements.txt doesn’t indicate “this needs Linux” or “this is for CUDA 11.8”. The same file used on different platforms installs different binaries, causing subtle incompatibilities.
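A partial mitigation requirements.txt does support: PEP 508 environment markers can select packages per platform or Python version, though they still cannot express CUDA variants. A sketch with hypothetical pins:
# Environment markers: hypothetical per-platform pins
torch==2.1.0; sys_platform == "linux"
tensorflow-macos==2.14.0; sys_platform == "darwin"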
No environment isolation from requirements alone. The file lists packages but doesn’t create the environment. You need virtual environments separately.
Transitive dependency drift. Say you specify torch==2.1.0 without pinning its dependencies. Today, pip installs NumPy 1.24.3 (current latest compatible version). Next month, NumPy 1.26 releases. New installation gets NumPy 1.26, which might have breaking changes. Your requirements file is “the same” but installs differently.
Making Requirements.txt Better
Pin everything:
pip freeze > requirements-frozen.txt
This captures exact versions of all packages, including transitive dependencies. More verbose but much more reliable.
Specify platform and Python version in comments:
# Python 3.10
# Platform: linux_x86_64
# CUDA: 11.8
--extra-index-url https://download.pytorch.org/whl/cu118
torch==2.1.0+cu118
torchvision==0.16.0+cu118
numpy==1.24.3
The +cu118 suffix pins the CUDA build of PyTorch, and the --extra-index-url line tells pip where those builds live; plain PyPI does not host them.
Use constraints files for complex scenarios:
pip install -r requirements.txt -c constraints.txt
A constraints file caps versions without requiring installation, which is useful for enforcing compatible versions across multiple requirements files.
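A minimal sketch of such a constraints file, with hypothetical caps:
# constraints.txt: caps only; nothing here is installed unless a requirements file requests it
numpy<2.0.0
protobuf<5.0.0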
Dependency Management Tool Comparison
pip + requirements.txt
Weaknesses: Manual resolution, no system deps, limited conflict detection
Best for: Quick prototypes, small projects, experienced users
Conda
Weaknesses: Slow, complex, large environments, cross-platform limitations
Best for: GPU projects, teams, complex dependencies
Poetry
Weaknesses: No system deps, slower than pip, learning curve
Best for: Production code, packages, teams wanting reproducibility
Pipenv
Weaknesses: Slower than pip, less ML-focused, smaller community
Best for: General Python projects transitioning to ML
Conda: The ML Standard
Conda dominates ML dependency management for good reasons, despite its drawbacks.
Why Conda Excels for ML
Conda packages system dependencies alongside Python packages. When you install PyTorch with conda, it installs CUDA libraries too—automatically matched to compatible versions. This eliminates the most common source of ML dependency problems.
Example conda installation:
conda install pytorch torchvision pytorch-cuda=11.8 -c pytorch -c nvidia
This single command installs PyTorch, torchvision, CUDA 11.8 libraries, and all dependencies with compatible versions. The conda resolver ensures everything works together.
Binary compatibility is handled transparently. Conda builds include all necessary compiled libraries, avoiding issues where PyPI packages expect system libraries that don’t exist.
Creating Conda Environments
Environment specification (environment.yml):
name: ml-project
channels:
  - pytorch
  - nvidia
  - conda-forge
  - defaults
dependencies:
  - python=3.10
  - pytorch=2.1.0
  - pytorch-cuda=11.8
  - torchvision=0.16.0
  - numpy=1.24.3
  - pandas=2.1.1
  - scikit-learn=1.3.1
  - jupyter
  - pip:
      - transformers==4.35.2
      - datasets==2.14.6
Creating the environment:
conda env create -f environment.yml
conda activate ml-project
Key elements:
- Channels: Package sources, ordered by priority
- Python version: Explicitly specified
- CUDA version: Managed as a package (pytorch-cuda)
- Pip section: For packages not available in conda
Conda’s Limitations
Slow dependency resolution is conda’s notorious weakness. Creating or updating environments can take 5-20 minutes for complex dependency trees. The SAT solver that ensures compatibility explores enormous solution spaces.
Mamba solves this by reimplementing conda’s resolver in C++:
# Install mamba
conda install mamba -c conda-forge
# Use mamba instead of conda (10-100x faster)
mamba env create -f environment.yml
mamba install numpy pandas
Mamba is drop-in compatible—same commands, same environment files, just faster.
Environment size can balloon. Conda environments commonly reach 5-10GB because they include entire CUDA toolchains and duplicate system libraries. This consumes disk space and slows environment creation.
Cross-platform reproducibility is imperfect. An environment.yml created on Linux might not recreate identically on macOS or Windows because platform-specific packages differ. Export with --from-history to capture only explicitly installed packages, improving cross-platform compatibility at the cost of less precise version locking.
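The two export modes side by side:
conda env export > environment-lock.yml            # exact versions and builds, platform-specific
conda env export --from-history > environment.yml  # only explicitly requested packages, portable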
Poetry: Modern Dependency Management
Poetry brings JavaScript/Rust-style dependency management to Python, with strong benefits for ML projects despite CUDA limitations.
Poetry’s Core Advantages
Lock files guarantee reproducibility. Poetry generates poetry.lock containing exact versions of all dependencies with cryptographic hashes. Two people running poetry install get identical environments.
Dependency resolution is sophisticated. Poetry’s resolver exhaustively searches for compatible versions, refusing to install if conflicts exist. This prevents broken environments that “kind of work” but fail mysteriously.
Project structure is standardized. Poetry manages package metadata, builds, and publishing alongside dependencies. For projects becoming packages (MLOps tools, model serving libraries), this integration is valuable.
Setting Up Poetry for ML
pyproject.toml for ML project:
[tool.poetry]
name = "ml-pipeline"
version = "0.1.0"
description = "ML training pipeline"
[tool.poetry.dependencies]
python = "^3.10"
torch = {version = "^2.1.0", source = "pytorch"}
transformers = "^4.35.0"
numpy = "^1.24.0"
scikit-learn = "^1.3.0"
pandas = "^2.1.0"
[tool.poetry.group.dev.dependencies]
pytest = "^7.4.0"
black = "^23.10.0"
jupyter = "^1.0.0"
[tool.poetry.group.gpu.dependencies]
# GPU-specific dependencies
nvidia-ml-py = "^12.535.0"
[[tool.poetry.source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu118"
priority = "supplemental"
Using custom package indexes enables installing CUDA-specific PyTorch builds. The source field tells Poetry where to find PyTorch wheels.
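The same source can also be configured from the command line instead of editing pyproject.toml by hand (the --priority flag requires Poetry 1.5+):
poetry source add --priority=supplemental pytorch https://download.pytorch.org/whl/cu118
poetry add torch --source pytorch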
Installing the project:
poetry install # Install all dependencies
poetry install --with dev # Include dev dependencies
poetry install --with gpu # Include GPU dependencies
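Two companion commands help maintain the lock file:
poetry lock          # re-resolve and rewrite poetry.lock after editing pyproject.toml
poetry show --tree   # inspect the resolved dependency tree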
Poetry’s ML Limitations
No system dependency management. Poetry handles Python packages exclusively. CUDA, cuDNN, and system libraries must be installed separately—via conda, system package managers, or Docker.
Slower than pip for large ML packages. Installing multi-GB PyTorch or TensorFlow packages takes longer through Poetry’s dependency resolution process.
Workaround for CUDA: Use Poetry for Python dependencies, Docker for system dependencies:
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04
# Base CUDA image ships without Python; install it first
RUN apt-get update && apt-get install -y python3 python3-pip && rm -rf /var/lib/apt/lists/*
# Install Poetry and have it install into the system environment (no nested venv)
RUN pip3 install poetry
ENV POETRY_VIRTUALENVS_CREATE=false
WORKDIR /app
# Copy dependency manifests first to leverage Docker layer caching
COPY pyproject.toml poetry.lock ./
# Install dependencies
RUN poetry install --no-root
# Copy application code
COPY . .
This combines Poetry’s Python dependency management with Docker’s system dependency handling.
Handling CUDA and GPU Dependencies
GPU dependencies deserve special attention as they cause the most ML-specific dependency issues.
CUDA Version Management
PyTorch CUDA variants:
# CPU only
pip install torch torchvision
# CUDA 11.8
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
# CUDA 12.1
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
PyTorch builds for different CUDA versions are entirely different binaries. Two CUDA variants can't coexist in the same environment.
Conda handles this elegantly:
# Specify CUDA version as a package
conda install pytorch pytorch-cuda=11.8 -c pytorch -c nvidia
Conda installs CUDA libraries alongside PyTorch, ensuring compatibility.
Detecting and Validating GPU Setup
After installation, verify GPU access:
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
print(f"GPU count: {torch.cuda.device_count()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
else:
    print("WARNING: CUDA not available!")
Add this to project setup scripts to catch GPU configuration issues early.
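The same check works as a standalone guard for CI. A sketch (the file name is arbitrary), exiting non-zero so pipelines fail loudly:
# check_gpu.py: hypothetical setup guard; exits non-zero when the GPU stack is broken
import sys
import torch

if not torch.cuda.is_available():
    sys.exit(f"CUDA unavailable (torch {torch.__version__}, built for CUDA {torch.version.cuda})")
print(f"OK: {torch.cuda.device_count()} GPU(s), first is {torch.cuda.get_device_name(0)}")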
Multiple CUDA Versions
Projects using different CUDA versions require separate conda environments:
# Environment for CUDA 11.8 projects
conda create -n ml-cuda118 python=3.10
conda activate ml-cuda118
conda install pytorch pytorch-cuda=11.8 -c pytorch -c nvidia
# Environment for CUDA 12.1 projects
conda create -n ml-cuda121 python=3.10
conda activate ml-cuda121
conda install pytorch pytorch-cuda=12.1 -c pytorch -c nvidia
Never try mixing CUDA versions in one environment. It won’t work.
Common Dependency Problems and Solutions
Problem: Imports fail or the wrong interpreter runs despite torch being "installed"
Solution: Verify the environment with `which python` and reinstall torch
Problem: CUDA out-of-memory errors during training
Solution: Reduce batch size, use gradient accumulation, or use a smaller model
Problem: Incompatible package versions after installing or upgrading
Solution: Use pip check, conda list, or poetry show to find conflicts. Reinstall with compatible versions.
Problem: Results differ across machines or reruns
Solution: Pin all versions with lock files, document the CUDA version, set random seeds
Best Practices for ML Dependency Management
Proven practices prevent most dependency issues before they occur.
Version Pinning Strategy
Pin major ML frameworks exactly:
torch==2.1.0
tensorflow==2.14.0
transformers==4.35.2
Allow minor version flexibility for utilities:
pandas>=2.1.0,<3.0.0
numpy>=1.24.0,<2.0.0
This balances reproducibility (exact ML framework versions) with flexibility (utilities can update for bug fixes).
Always pin in production:
# Production: pin everything
torch==2.1.0
pandas==2.1.1
numpy==1.24.3
# Development: some flexibility acceptable
torch==2.1.0
pandas>=2.1.0,<3.0.0
numpy>=1.24.0,<2.0.0
Separate Development and Production Dependencies
Development dependencies (testing, debugging, notebooks) shouldn’t pollute production:
With Poetry:
[tool.poetry.dependencies]
torch = "^2.1.0"
numpy = "^1.24.0"
[tool.poetry.group.dev.dependencies]
pytest = "^7.4.0"
jupyter = "^1.0.0"
black = "^23.10.0"
With requirements files:
requirements.txt # Production dependencies
requirements-dev.txt # Development additions
Install with:
# Production
pip install -r requirements.txt
# Development
pip install -r requirements.txt -r requirements-dev.txt
Dependency Update Strategy
Never update all dependencies simultaneously. Update systematically:
- Update one package at a time
- Run full test suite after each update
- Document any behavioral changes
- Commit each update separately
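One such cycle, sketched with hypothetical version numbers:
pip install --upgrade transformers==4.36.0   # one package, one explicit target version
pytest tests/                                # run the full suite before committing
pip freeze > requirements-frozen.txt         # re-capture the resolved environment
git commit -am "Bump transformers 4.35.2 -> 4.36.0"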
Check for security vulnerabilities:
pip install safety
safety check
# Or with poetry
poetry export -f requirements.txt | safety check --stdin
Use Dependabot or a similar service for automated dependency-update PRs in CI/CD.
Documentation Requirements
Document dependency choices in README:
## Dependencies
**Python:** 3.10 (required, 3.11 not yet supported)
**CUDA:** 11.8 (for GPU support)
**Key packages:**
- PyTorch 2.1.0 (training framework)
- Transformers 4.35.2 (model library)
- NumPy 1.24.3 (pinned for reproducibility)
**Installation:**
```bash
# GPU (recommended)
conda env create -f environment.yml
# CPU only
pip install -r requirements-cpu.txt
```
This helps new team members understand requirements and make informed setup decisions.
Troubleshooting Dependency Issues
When problems arise, systematic debugging resolves them faster than trial-and-error.
Diagnostic Commands
Check installed versions:
pip list # pip-managed packages
conda list # conda-managed packages
poetry show # poetry-managed packages
Detect conflicts:
pip check # Find incompatible packages
conda list --explicit # Show exact package specs
Verify package sources:
pip show torch # Shows installation location and version
Resolving Conflicts
When pip check reports conflicts:
- Identify the conflicting packages
- Check their version requirements
- Find compatible versions
- Reinstall in the right order:
# Remove conflicting packages
pip uninstall numpy torch pandas
# Reinstall in dependency order
pip install numpy==1.24.3
pip install torch==2.1.0
pip install pandas==2.1.1
For conda conflicts:
# Let conda resolve
conda install package_a package_b package_c
# If conda can't resolve, try mamba
mamba install package_a package_b package_c
Clean Installation
When conflicts are intractable, rebuild the environment:
# Conda
conda deactivate
conda env remove -n ml-project
conda env create -f environment.yml
# venv
deactivate
rm -rf venv
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Poetry
poetry env remove python3.10
poetry install
Fresh environments eliminate hidden state causing conflicts.
Conclusion
Managing Python dependencies for ML projects requires understanding that standard Python practices are insufficient for the unique challenges of deep learning frameworks, GPU libraries, and numerical computing dependencies. The choice between pip, conda, and poetry depends on project needs: conda excels for GPU work and teams, poetry provides superior reproducibility for production deployments, while pip remains viable for simple projects with careful version management. Success requires explicit version pinning, systematic update strategies, comprehensive documentation, and recognizing that dependency management is critical infrastructure deserving the same attention as model architecture.
The investment in robust dependency management—whether through conda environments with pinned versions, poetry lock files, or Docker containers—prevents the countless hours wasted debugging environment issues that plague poorly-managed ML projects. Start with clear dependency specifications, maintain them rigorously through project evolution, and leverage tools appropriate to your complexity level. The goal is spending time on machine learning, not on resolving dependency conflicts or debugging environment-specific failures. Build dependency management practices that scale with your project, and you’ll avoid the dependency hell that derails so many ML initiatives.