Installation Guide

This guide provides detailed instructions for installing highentDCA on your system.

Prerequisites

Before installing highentDCA, ensure you have the following prerequisites:

System Requirements

Operating System: Linux, macOS, or Windows with WSL2
Python: Version 3.10 or higher
GPU (recommended): NVIDIA GPU with CUDA support for optimal performance
Memory: At least 8GB RAM (16GB+ recommended for large datasets)

Python Environment

We strongly recommend using a virtual environment to avoid dependency conflicts:

# Using conda (recommended)
conda create -n highentdca python=3.10
conda activate highentdca

# Or using venv
python -m venv highentdca_env
source highentdca_env/bin/activate  # On Windows: highentdca_env\Scripts\activate

Installation Methods

Method 1: Install from Source (Recommended)

This is currently the only installation method since the package is not yet on PyPI.

Clone the repository:

git clone https://github.com/robertonetti/highentropyDCA.git
cd highentropyDCA

Install the package:

pip install .

This will automatically install all required dependencies.

Verify installation:

highentDCA --help

You should see the help message with available commands.

Method 2: Development Installation

If you plan to modify the code or contribute to the project:

git clone https://github.com/robertonetti/highentropyDCA.git
cd highentropyDCA
pip install -e .

The -e flag installs the package in "editable" mode, so changes to the source code are immediately reflected without reinstalling.

Dependencies

highentDCA requires the following packages, which are installed automatically:

Core Dependencies

adabmDCA (== 0.5.0): Base DCA framework with core algorithms
PyTorch (>= 2.1.0): Deep learning framework for GPU acceleration
NumPy (>= 1.26.4): Numerical computing library
Pandas (>= 2.2.2): Data manipulation and analysis

Additional Dependencies

Matplotlib (>= 3.8.0): Plotting and visualization
tqdm (>= 4.66.6): Progress bars for long-running operations
BioPython (>= 1.85): Biological sequence file handling
wandb (>= 0.12.0): Experiment tracking (optional, for --wandb flag)

GPU Support

CUDA Installation

For optimal performance, install PyTorch with CUDA support:

Check CUDA version:

nvidia-smi

Look for the CUDA version in the output (e.g., CUDA 11.8, 12.1).

Install PyTorch with CUDA:

Visit PyTorch's installation page and select your configuration. For example:

# For CUDA 11.8
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Verify GPU support:

python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

Should output: CUDA available: True

CPU-Only Installation

If you don't have a GPU or prefer CPU-only execution:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

Note: Training will be significantly slower on CPU.

Installing adabmDCA

highentDCA depends on adabmDCA version 0.5.0. This will be installed automatically, but you can also install it manually:

From PyPI

pip install adabmDCA==0.5.0

From Source

git clone https://github.com/spqb/adabmDCApy.git
cd adabmDCApy
git checkout v0.5.0  # Ensure correct version
pip install .

Troubleshooting

Common Issues

Issue: `command not found: highentDCA`

Solution: Ensure the installation directory is in your PATH:

# Check if pip bin directory is in PATH
pip show highentDCA | grep Location

Add to PATH if needed:

export PATH="$HOME/.local/bin:$PATH"  # Add to ~/.bashrc or ~/.zshrc

Issue: CUDA out of memory errors

Solutions: - Reduce the number of chains: --nchains 5000 (instead of default 10000) - Use smaller batch sizes for sampling - Monitor GPU memory: nvidia-smi -l 1

Issue: ImportError for adabmDCA modules

Solution: Ensure adabmDCA is correctly installed:

python -c "import adabmDCA; print(adabmDCA.__version__)"

Should output: 0.5.0

If not, reinstall:

pip uninstall adabmDCA
pip install adabmDCA==0.5.0

Solution: Initialize wandb (only needed if using --wandb flag):

wandb login

Enter your API key from wandb.ai.

Platform-Specific Notes

macOS

On macOS with Apple Silicon (M1/M2/M3):

# Use conda for better compatibility
conda create -n highentdca python=3.10
conda activate highentdca
conda install pytorch::pytorch -c pytorch
pip install adabmDCA==0.5.0
pip install .

Note: MPS (Metal Performance Shaders) acceleration is supported but may not be as optimized as CUDA.

Windows

On Windows, use WSL2 (Windows Subsystem for Linux) for the best experience:

Install WSL2 with Ubuntu
Follow the Linux installation instructions inside WSL2
For GPU support, install CUDA on WSL2

Linux (Docker)

For a containerized environment:

FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04

RUN apt-get update && apt-get install -y \
    python3.10 \
    python3-pip \
    git

RUN pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

RUN git clone https://github.com/robertonetti/highentropyDCA.git && \
    cd highentropyDCA && \
    pip3 install .

CMD ["bash"]

Build and run:

docker build -t highentdca .
docker run --gpus all -it highentdca

Verifying Installation

After installation, verify everything works:

1. Check CLI Access

highentDCA train --help

Should display the training command help.

2. Test Python Import

import highentDCA
from highentDCA.models.edDCA import fit
from highentDCA.checkpoint import Checkpoint
print("highentDCA successfully imported!")

3. Run a Quick Test

# Download test data
cd highentropyDCA/example_data

# Train a small model
highentDCA train \
    --data TEST/chains.fasta \
    --output test_output \
    --model edDCA \
    --density 0.98 \
    --drate 0.01 \
    --nchains 1000 \
    --nepochs 100 \
    --seed 42

This should complete without errors and create output in test_output/.

Updating highentDCA

To update to the latest version:

cd highentropyDCA
git pull origin main  # or develop
pip install --upgrade .

For development installations (pip install -e .), just pull the latest code:

git pull origin main

Uninstalling

To remove highentDCA:

pip uninstall highentDCA

To also remove dependencies:

pip uninstall highentDCA adabmDCA torch numpy pandas matplotlib tqdm biopython wandb

Next Steps

Now that you have highentDCA installed:

Quick Start: Train your first model
Usage Guide: Explore all features and options
API Reference: Use highentDCA in Python scripts
Examples: Learn from practical examples

Getting Help

If you encounter issues not covered here:

Check the GitHub Issues
Review the adabmDCA documentation
Contact: robertonetti3@gmail.com