## 🚀 Quick Start
### Create Environment from Scratch
```bash
# Create with specific Python version
conda create -n myproject python=3.11
# Create with packages
conda create -n myproject python=3.11 numpy pandas matplotlib
# Activate environment
conda activate myproject
```
### Create from Environment File
```bash
# Create from environment.yml
conda env create -f environment.yml
# Create with custom name
conda env create -f environment.yml -n custom-name
```
## 📋 Understanding Environment Files
### What are Channels?
Channels are **package repositories** where conda looks for packages. Think of them as different "stores" for software packages.
```yaml
channels:
  - conda-forge  # 1st priority - check here first
  - defaults     # 2nd priority - check here if not found above
```
**Channel Priority (Top to Bottom):** When conda looks for a package like `pandas`, it will:
1. Check `conda-forge` first
2. Fall back to `defaults` only if the package isn't found above (with the default "flexible" priority, a newer version in a lower channel can still win)
3. Install the best match it finds
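If you never want a lower-priority channel to win on version, you can make the ordering strict in `~/.condarc` (a minimal sketch; the same setting is available via `conda config --set channel_priority strict`):

```yaml
# ~/.condarc - strict priority: a package is always taken from the
# highest-priority channel that carries it, regardless of version
channel_priority: strict
channels:
  - conda-forge
  - defaults
```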
**Common Channels:**
- **`conda-forge`**: Community-maintained, most packages, frequent updates (preferred for data science)
- **`defaults`**: Anaconda's official channel, more stable but sometimes older versions
- **`bioconda`**: Specialized for bioinformatics packages
- **`pytorch`**: Official PyTorch channel for deep learning
### Dependencies vs Pip Packages
The placement of packages matters for how they're installed and managed:
#### Dependencies Section (Conda Packages)
```yaml
dependencies:
  - numpy
  - pandas
  - scikit-learn
```
**Installed via conda from specified channels:**
- ✅ **Binary packages** - pre-compiled, faster installation
- ✅ **Dependency resolution** - conda handles all dependencies automatically
- ✅ **Environment isolation** - better integration with conda's environment system
- ✅ **Platform optimized** - often optimized for your specific OS/architecture
#### Pip Section (PyPI Packages)
```yaml
dependencies:
  - pip  # Need this first!
  - pip:
      - streamlit
      - wandb
      - shap
```
**Installed via pip from PyPI:**
- ⚠️ **May build from source** - many packages ship pre-built wheels, but some compile during install
- ⚠️ **Separate dependency resolution** - pip handles dependencies independently
- ✅ **Broader package selection** - PyPI has more packages than conda channels
- ✅ **Latest versions** - often has newer versions than conda channels
### When to Use Which?
**Use Conda (`dependencies`) when:**
- Package is available on conda-forge or defaults
- You want the most stable, optimized version
- It's a core data science library (numpy, pandas, matplotlib)
- You need specific platform optimizations
**Use Pip (`pip:`) when:**
- Package only exists on PyPI
- You need the absolute latest version
- It's a newer/experimental package
- The conda version is significantly outdated
**Installation Order:**
1. Conda processes all conda packages first
2. Then installs pip packages
3. This is why `pip` itself needs to be in conda dependencies!
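The conda/pip split above can also be checked programmatically. Here is a minimal stdlib sketch (the parser and its layout assumptions — two-space-indented entries with a nested `pip:` list — are illustrative, not part of conda):

```python
def split_deps(yml_text):
    """Split an environment.yml dependencies list into (conda, pip) package lists.

    Minimal string-based parser so no PyYAML is required; assumes the
    conventional layout shown above.
    """
    conda, pip_pkgs = [], []
    in_deps = in_pip = False
    for line in yml_text.splitlines():
        text = line.split("#")[0].rstrip()      # drop comments
        if text.strip() == "dependencies:":
            in_deps = True
            continue
        if not in_deps or not text.strip():
            continue
        item = text.strip()
        if not item.startswith("- "):           # left the dependencies block
            in_deps = False
            continue
        indent = len(text) - len(text.lstrip())
        name = item[2:].strip()
        if name == "pip:":
            in_pip = True
        elif in_pip and indent > 2:             # nested under "- pip:"
            pip_pkgs.append(name)
        else:                                   # top-level conda package
            in_pip = False
            conda.append(name)
    return conda, pip_pkgs
```

Handy for a quick sanity check of a file before running `conda env update`.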
### Updating Environments
After adding new libraries to your `environment.yml`:
```bash
# Update existing environment (RECOMMENDED)
conda env update -f environment.yml --prune
# The --prune flag removes packages no longer in your file
```
**Other Update Options:**
```bash
# Update with specific name
conda env update -n your-env-name -f environment.yml --prune
# Nuclear option (complete rebuild)
conda env remove -n datascience-env
conda env create -f environment.yml
# Quick addition without editing file (but update your file too!)
conda install new-package-name
```
## 📋 Complete Data Science Template
```yaml
name: datascience-env
channels:
  - conda-forge
  - defaults
dependencies:
  # Python
  - python=3.11
  # Core data science libraries
  - numpy
  - pandas
  - scipy
  - scikit-learn
  # Visualization
  - matplotlib
  - seaborn
  - plotly
  - bokeh
  # Jupyter ecosystem
  - jupyter
  - jupyterlab
  - ipykernel
  - ipywidgets
  # Statistical analysis
  - statsmodels
  - pingouin
  # Machine learning extras
  - xgboost
  - lightgbm
  - catboost
  # Deep learning (CPU versions - uncomment GPU lines if needed)
  - pytorch
  - torchvision
  - torchaudio
  - tensorflow
  # - pytorch-cuda=11.8  # Uncomment for GPU support (requires the pytorch channel)
  # - tensorflow-gpu     # Uncomment for GPU support
  # Data manipulation and I/O
  - openpyxl
  - xlrd
  - h5py
  - pytables
  - sqlalchemy
  - pymongo
  # Web scraping and APIs
  - requests
  - beautifulsoup4
  - selenium
  # Image processing
  - pillow
  - opencv
  - scikit-image
  # Natural language processing
  - nltk
  - spacy
  - textblob
  # Development tools
  - black
  - flake8
  - pytest
  - mypy
  # Utilities
  - tqdm
  - joblib
  - dask
  - numba
  # AWS and Cloud
  - boto3
  - botocore
  - s3fs
  # Additional packages via pip
  - pip
  - pip:
      # Web apps and dashboards
      - streamlit
      - dash
      - gradio
      # ML experiment tracking
      - wandb
      - mlflow
      - optuna
      # ML interpretability and EDA
      - shap
      - yellowbrick
      - missingno
      # AWS services (PyPI has more recent versions)
      - awswrangler  # AWS Data Wrangler for S3, Athena, Glue, etc.
      - redshift-connector
      - awscli
      # Database connectors
      - psycopg2-binary  # PostgreSQL
      - pymysql          # MySQL
      - snowflake-connector-python
```
## 🔧 Daily Environment Commands
### Basic Operations
```bash
# List all environments
conda env list
# Activate environment
conda activate myproject
# Deactivate current environment
conda deactivate
# Remove environment
conda env remove -n myproject
```
### Package Management
```bash
# Install packages in active environment
conda install package-name
conda install -c conda-forge package-name
# Install from pip
pip install package-name
# List installed packages
conda list
# Search for packages
conda search package-name
```
## 📝 Maintaining Environments
### Update Environment from File
```bash
# Update existing environment (RECOMMENDED)
conda env update -f environment.yml --prune
# The --prune flag removes packages not in the file
```
### Export Current Environment
```bash
# Export to file (exact versions)
conda env export > environment.yml
# Export without builds (more portable)
conda env export --no-builds > environment.yml
# Export only explicitly requested packages (no transitive deps; omits pip packages)
conda env export --from-history > environment.yml
```
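`conda env export` also writes a machine-specific `prefix:` line (the absolute path of your environment); drop it before committing so the file works on other machines. A small sketch (the helper name is mine):

```python
import pathlib

def strip_prefix(path="environment.yml"):
    """Remove the machine-specific 'prefix:' line that conda env export
    appends, so the exported file is portable across machines."""
    p = pathlib.Path(path)
    lines = [ln for ln in p.read_text().splitlines(keepends=True)
             if not ln.startswith("prefix:")]
    p.write_text("".join(lines))
```

The shell equivalent is `conda env export | grep -v '^prefix:' > environment.yml`.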
### Clone Environment
```bash
# Clone existing environment
conda create -n new-env --clone existing-env
```
## 🎯 Best Practices
### 1. One Environment Per Project
- **DO**: Create separate environments for each project
- **WHY**: Avoid dependency conflicts, easier to manage
### 2. Always Use Environment Files
- **DO**: Keep `environment.yml` in your project root
- **WHY**: Reproducible environments, easy collaboration
### 3. Channel Priority
```yaml
channels:
  - conda-forge  # Check here first
  - defaults     # Fallback option
```
### 4. Package Source Strategy
- **Use conda for**: Core libraries (numpy, pandas, matplotlib)
- **Use pip for**: Packages only on PyPI, latest versions
- **Never mix**: Don't install the same package via both conda and pip
### 5. Version Pinning
```yaml
dependencies:
  - python=3.11   # Pin minor version (allows 3.11.x patches)
  - numpy>=1.20   # Minimum version
  - pandas=1.5.3  # Exact version (when needed)
```
## 🔄 Workflow Examples
### Starting a New Project
```bash
# 1. Create environment
conda create -n myproject python=3.11
# 2. Activate it
conda activate myproject
# 3. Install packages as needed
conda install numpy pandas matplotlib
# 4. Export to file
conda env export > environment.yml
# 5. Add to version control
git add environment.yml
```
### Collaborating on a Project
```bash
# 1. Clone repo
git clone project-repo
# 2. Create environment from file
conda env create -f environment.yml
# 3. Activate environment
conda activate project-name
# 4. Start working!
```
### Adding New Dependencies
```bash
# 1. Edit environment.yml (add new packages)
# 2. Update environment
conda env update -f environment.yml --prune
# 3. Commit changes
git add environment.yml
git commit -m "Add new dependencies"
```
## 🚨 Troubleshooting
### Environment Issues
```bash
# Environment conflicts
conda env remove -n myproject
conda env create -f environment.yml
# Package conflicts
conda install package-name --force-reinstall
# Clear conda cache
conda clean --all
```
### Common Problems
- **"Package not found"**: Check channel spelling, try conda-forge
- **"Dependency conflict"**: Pin specific versions or use pip
- **"Environment activation fails"**: Restart terminal, check conda init
## 🎨 Environment File Recipes
### Data Science Stack
```yaml
name: datascience
channels:
  - conda-forge
dependencies:
  - python=3.11
  - numpy
  - pandas
  - matplotlib
  - seaborn
  - scikit-learn
  - jupyter
  - pip
  - pip:
      - streamlit
```
### Web Development
```yaml
name: webapp
channels:
  - conda-forge
dependencies:
  - python=3.11
  - flask
  - requests
  - pip
  - pip:
      - fastapi
      - uvicorn
```
### AWS Data Stack
```yaml
name: aws-datascience
channels:
  - conda-forge
dependencies:
  - python=3.11
  - numpy
  - pandas
  - boto3
  - s3fs
  - jupyter
  - pip
  - pip:
      - awswrangler
      - redshift-connector
      - awscli
```
## 💡 Pro Tips
1. **Use descriptive names**: `customer-analysis` not `project1`
2. **Keep environments small**: Only install what you need
3. **Regular cleanup**: Remove unused environments
4. **Document requirements**: Comment your environment.yml
5. **Version control**: Always commit environment files
6. **Test environments**: Verify after updates with `conda list`
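Tip 6 can be automated with a short import smoke test run inside the updated environment (the package names are whatever your `environment.yml` declares; the ones shown are illustrative):

```python
import importlib

def smoke_test(modules):
    """Try importing each module; return the ones that fail to import.

    Run after `conda env update`, e.g. smoke_test(["numpy", "pandas", "sklearn"]).
    """
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```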
## 📚 Quick Reference
|Command|Purpose|
|---|---|
|`conda env list`|List all environments|
|`conda activate name`|Switch to environment|
|`conda env create -f file.yml`|Create from file|
|`conda env update -f file.yml --prune`|Update from file|
|`conda env export > file.yml`|Export current env|
|`conda env remove -n name`|Delete environment|
|`conda install package`|Install package|
|`conda list`|Show installed packages|
---
_Keep this cheat sheet handy and you'll be a conda environment pro! 🐍_