
DependencyGraph Architectural Incompatibility #519

@DevLenn

Description


DependencyGraph Architectural Incompatibility with Transformers CausalLM and MistralForCausalLM Models

Issue Overview

The DependencyGraph.build_dependency() method is systematically incompatible with Hugging Face Transformers causal language models: the two libraries differ fundamentally in how the computational graph is constructed and in how model inputs and outputs are passed and returned. This mismatch prevents structural pruning from succeeding on any LLaMA, Mistral, or derivative model family.

Application Context

This incompatibility emerged during development of an interactive neural network pruning pipeline designed for production model compression workflows. The intended system architecture encompasses:

Model Management Layer:

  • Interactive selection interface for locally cached models (argilla/CapybaraHermes-2.5-Mistral-7B, TinyLlama/TinyLlama-1.1B-Chat-v1.0)
  • Hierarchical module enumeration with comprehensive layer identification and indexing
  • Multi-target layer selection via space-delimited numerical interface

Pruning Configuration System:

  • Fine-grained retention ratio specification (0.0-100.0% with arbitrary decimal precision)
  • Retention semantics: 100% preserves the layer unchanged, 0% removes it entirely, and intermediate values proportionally reduce channels/dimensions
  • Per-layer granular control supporting heterogeneous pruning strategies across network topology
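
The retention semantics above map onto torch-pruning's pruning_ratio (the fraction to *remove*) with a one-line conversion; a minimal sketch (the helper name is illustrative, not part of the pipeline):

```python
def keep_to_pruning_ratio(keep_percent: float) -> float:
    """Map a retention percentage (0-100) to a pruning ratio (fraction removed)."""
    if not 0.0 <= keep_percent <= 100.0:
        raise ValueError("keep_percent must be in [0, 100]")
    return 1.0 - keep_percent / 100.0

# 100% keep -> prune nothing; 0% keep -> prune everything; 80% keep -> prune 20%.
```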

Resource Management Framework:

  • Memory-constrained execution with 90% GPU memory allocation ceiling
  • Automatic CPU offloading for overflow allocation with transparent fallback mechanisms
  • CUDA availability detection with seamless CPU-only execution mode

Model Serialization Pipeline:

  • Intermediate checkpoint persistence in HuggingFace-compatible format
  • Integration with llama.cpp quantization toolchain for production deployment
  • Target quantization: Q6_K precision with GGUF container format

Directory Structure Specifications:

  • Local model repository: ../full_models/<model-identifier-without-namespace>/
  • Cross-platform path resolution with consistent behavior across execution environments

Environment

  • Python: 3.13.3
  • PyTorch: 2.8.0 (tested with both CPU and CUDA 12.6)
  • torch-pruning: latest (pip)
  • transformers: latest (pip)
  • accelerate: latest (pip)
  • OS: Windows 11

Affected Models

  • argilla/CapybaraHermes-2.5-Mistral-7B
  • TinyLlama/TinyLlama-1.1B-Chat-v1.0
  • All LlamaForCausalLM and MistralForCausalLM based models

Error Sequence 1: Tuple Input to Embedding Layer

Script Snippet

# Initial torch-pruning approach: raw input_ids tensor passed directly
enc = tokenizer("Hallo Welt", return_tensors="pt")
example_inputs = enc["input_ids"].to(next(model.parameters()).device)

DG = tp.DependencyGraph().build_dependency(
    model, 
    example_inputs=example_inputs  # Direct tensor input
)

Error

TypeError: embedding(): argument 'indices' (position 2) must be Tensor, not tuple
    at torch.nn.functional.py:2551, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

What Triggers This Error

The error occurs when the dependency graph attempts to trace through the model's embedding layer. The build_dependency method internally creates a forward function that passes inputs as positional arguments, but Transformers models expect keyword arguments like input_ids=tensor.

Reproduction

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch_pruning as tp

# Load local model
model = AutoModelForCausalLM.from_pretrained("../full_models/TinyLlama-1.1B-Chat-v1.0")
tokenizer = AutoTokenizer.from_pretrained("../full_models/TinyLlama-1.1B-Chat-v1.0")

enc = tokenizer("Test", return_tensors="pt")
example_inputs = enc["input_ids"]

# This triggers the TypeError
DG = tp.DependencyGraph().build_dependency(model, example_inputs=example_inputs)

Expected vs Actual Behavior

  • Expected: The dependency graph should build successfully with standard transformer model inputs
  • Actual: TypeError occurs because the embedding layer receives incompatible argument format

Error Explanation

Transformers models require keyword argument calls (model(input_ids=tensor)) rather than positional calls (model(tensor)). The default dependency graph construction assumes vision model patterns where inputs can be passed positionally, leading to argument mismatch at the embedding layer level.
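
The argument mismatch can be reproduced without any large model; a minimal sketch with a toy keyword-only module (KwargLM and the forward_fn lambda are illustrative stand-ins, not Transformers or torch-pruning code), showing that a forward function which unpacks keyword arguments sidesteps the problem:

```python
import torch
import torch.nn as nn

# Minimal stand-in for a HF causal LM: forward expects input_ids as a keyword.
class KwargLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(10, 4)

    def forward(self, input_ids=None):
        return self.embed(input_ids)

m = KwargLM()
ids = torch.tensor([[1, 2, 3]])

# Passing a tuple positionally reproduces the embedding() TypeError:
try:
    m((ids,))
except TypeError as e:
    print("TypeError:", e)  # embedding(): argument 'indices' ... not tuple

# A forward function that unpacks a dict as keyword arguments avoids it:
forward_fn = lambda model, inputs: model(**inputs)
out = forward_fn(m, {"input_ids": ids})
print(out.grad_fn)  # a real tensor the graph tracer can follow
```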


Error Sequence 2: grad_fn AttributeError in Computational Graph

Script Snippet

# Attempted fix using tuple format and forward function
enc = tokenizer("Hallo Welt", return_tensors="pt")
example_inputs = enc["input_ids"].to(next(model.parameters()).device)

model.train()
dg = tp.DependencyGraph()
dg.build_dependency(
    model,
    example_inputs=(example_inputs,),            # tuple format
    forward_fn=lambda m, x: m(x).logits.float()  # extract logits
)

Error

Traceback (most recent call last):
  File "torch_pruning/dependency/graph.py", line 514, in _trace_computational_graph
    grad_fn_root = output.grad_fn
AttributeError: 'tuple' object has no attribute 'grad_fn'

What Triggers This Error

The error occurs in _trace_computational_graph() when the method attempts to access the grad_fn attribute of the forward function output. However, the forward function lambda m, x: m(x).logits.float() fails because m(x) with tuple x causes the same embedding layer issue, resulting in a tuple output rather than a tensor with gradient information.

Reproduction

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch_pruning as tp

model = AutoModelForCausalLM.from_pretrained("../full_models/TinyLlama-1.1B-Chat-v1.0")
tokenizer = AutoTokenizer.from_pretrained("../full_models/TinyLlama-1.1B-Chat-v1.0")

enc = tokenizer("Test", return_tensors="pt")
example_inputs = enc["input_ids"]
model.train()

# This will trigger the AttributeError
dg = tp.DependencyGraph()
dg.build_dependency(
    model,
    example_inputs=(example_inputs,),
    forward_fn=lambda m, x: m(x).logits.float()
)

Expected vs Actual Behavior

  • Expected: The dependency graph should successfully trace the computational graph and extract gradient information
  • Actual: AttributeError because the forward function fails, preventing gradient graph construction

Error Explanation

The combination of tuple input format and incompatible forward function signature creates a cascade failure. The forward function cannot execute successfully due to the underlying embedding layer argument mismatch, which prevents the dependency graph from obtaining a valid tensor output with gradient information.
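
The tracer's failure mode is easy to see in isolation; a tiny sketch (the `output = ...` line mimics what a failed forward call hands back, it is not torch-pruning code):

```python
import torch

# _trace_computational_graph effectively runs: grad_fn_root = output.grad_fn.
# If the forward call hands back a tuple instead of a tensor, that access fails:
output = (torch.ones(1),)
try:
    output.grad_fn
except AttributeError as e:
    print("AttributeError:", e)  # 'tuple' object has no attribute 'grad_fn'
```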


Error Sequence 3: Integer Tensor Gradient Requirement

Script Snippet

# Attempt to manually enable gradients on input tokens
enc = tokenizer("Hallo Welt", return_tensors="pt")
example_inputs = enc["input_ids"].to(next(model.parameters()).device)
example_inputs.requires_grad_(True)  # This line causes the error

model.train()
dg = tp.DependencyGraph()
dg.build_dependency(model, example_inputs=(example_inputs,))

Error

RuntimeError: only Tensors of floating point dtype can require gradients

What Triggers This Error

The error occurs when attempting to call requires_grad_(True) on input_ids tensors, which have dtype=torch.int64. PyTorch's automatic differentiation system only supports gradient computation on floating-point tensors.

Reproduction

import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("../full_models/TinyLlama-1.1B-Chat-v1.0")
enc = tokenizer("Test", return_tensors="pt")
input_ids = enc["input_ids"]

print(f"input_ids dtype: {input_ids.dtype}")  # torch.int64
input_ids.requires_grad_(True)  # RuntimeError occurs here

Expected vs Actual Behavior

  • Expected: The dependency graph construction should handle input tensor types appropriately without requiring manual gradient enablement
  • Actual: RuntimeError when attempting to enable gradients on discrete token indices

Error Explanation

Token IDs represent discrete vocabulary indices (torch.int64) rather than continuous values. Gradient computation requires continuous, differentiable parameters. The error occurs when code attempts to enable gradient tracking on these discrete input tokens, which is mathematically invalid for automatic differentiation.
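
The dtype constraint, and where gradients actually belong, can be verified in a few lines (a minimal sketch using a toy nn.Embedding, not the model's real embedding):

```python
import torch

ids = torch.tensor([[1, 2, 3]])          # token IDs, dtype torch.int64
print(ids.dtype)                          # torch.int64

# Autograd rejects gradient tracking on integer tensors:
try:
    ids.requires_grad_(True)
except RuntimeError as e:
    print("RuntimeError:", e)

# Gradients belong on the continuous embedding weights instead:
embed = torch.nn.Embedding(10, 4)
embed(ids).sum().backward()
print(embed.weight.grad is not None)      # gradients flow to weights, not IDs
```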


Error Sequence 4: BasePruner Integration Failure

Script Snippet

# Direct BasePruner usage attempt
modules = list(model.modules())
layer = modules[64]  # Linear layer selection
ratio = 0.2

imp = tp.importance.GroupMagnitudeImportance(p=2)
pruner = tp.pruner.BasePruner(
    model,
    {"input_ids": example_inputs},  # Dictionary format attempt
    importance=imp,
    pruning_ratio=ratio,
    pruning_ratio_dict={layer: ratio}
)

Error

Traceback (most recent call last):
  File "torch_pruning/pruner/algorithms/base_pruner.py", line 137, in __init__
    self.DG = dependency.DependencyGraph().build_dependency(...)
  File "torch_pruning/dependency/graph.py", line 514, in _trace_computational_graph
    grad_fn_root = output.grad_fn
AttributeError: 'tuple' object has no attribute 'grad_fn'

What Triggers This Error

The error occurs during BasePruner initialization when it internally constructs a DependencyGraph using default parameters. Even with dictionary input format, the internal dependency graph construction encounters the same computational graph tracing failure described in Error Sequence 2.

Reproduction

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch_pruning as tp

model = AutoModelForCausalLM.from_pretrained("../full_models/TinyLlama-1.1B-Chat-v1.0")
tokenizer = AutoTokenizer.from_pretrained("../full_models/TinyLlama-1.1B-Chat-v1.0")

enc = tokenizer("Test", return_tensors="pt")
example_inputs = enc["input_ids"]
modules = list(model.modules())
target_layer = modules[64]

imp = tp.importance.GroupMagnitudeImportance(p=2)
pruner = tp.pruner.BasePruner(
    model,
    {"input_ids": example_inputs},
    importance=imp,
    pruning_ratio=0.2,
    pruning_ratio_dict={target_layer: 0.2}
)  # Fails during initialization

Expected vs Actual Behavior

  • Expected: BasePruner should initialize successfully and perform structural pruning on Transformers models
  • Actual: Initialization fails due to underlying dependency graph construction incompatibilities

Error Explanation

BasePruner internally creates a dependency graph during initialization using default forward functions and output transforms. Since these defaults are designed for vision models with tensor outputs, they fail when applied to Transformers models that return structured CausalLMOutput objects, preventing the pruner from completing initialization.


Complete Script Evolution and Error Logs


Initial Script Implementation

#!/usr/bin/env python3
from pathlib import Path
import subprocess
import sys
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch_pruning as tp

BASE_DIR = Path(__file__).resolve().parent

CHOICES = {
    "1": "argilla/CapybaraHermes-2.5-Mistral-7B",
    "2": "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
}

def choose_model():
    print("Choose a model:")
    print("1) argilla/CapybaraHermes-2.5-Mistral-7B")
    print("2) TinyLlama/TinyLlama-1.1B-Chat-v1.0")
    c = input("Selection (1 or 2): ").strip()
    if c not in CHOICES:
        print("Invalid selection. Exiting.")
        sys.exit(1)
    return CHOICES[c]

def get_cache_dir(repo_id: str) -> Path:
    model_name = repo_id.split("/")[-1]
    return (BASE_DIR / ".." / "full_models" / model_name).resolve()

def device_and_mem_settings():
    if torch.cuda.is_available():
        return torch.device("cuda"), "auto", {0: "90%", "cpu": "10%"}
    else:
        return torch.device("cpu"), None, {"cpu": "100%"}

def load_model_and_tokenizer(repo_id: str, cache_dir: Path, device_map, max_memory):
    print(f"Loading local model from {cache_dir} ...")
    if not cache_dir.exists():
        print(f"Error: Local model folder {cache_dir} not found.")
        sys.exit(1)
    load_kwargs = {}
    if device_map is not None:
        load_kwargs.update({"device_map": device_map, "max_memory": max_memory})
    model = AutoModelForCausalLM.from_pretrained(str(cache_dir), **load_kwargs)
    tokenizer = AutoTokenizer.from_pretrained(str(cache_dir))
    return model, tokenizer

def list_all_layers(model):
    mods = list(model.modules())
    print("\n--- All Modules / Layers (numbered) ---")
    for i, m in enumerate(mods):
        print(f"[{i}] {m.__class__.__name__}")
    return mods

def ask_layer_selection():
    s = input("\nEnter layer numbers (space-separated): ").strip()
    return [int(x) for x in s.split()] if s else []

def ask_percentage_for_layer(idx, module):
    raw = input(f"For layer [{idx}] {module.__class__.__name__} -> Keep what percentage? (0-100): ").strip()
    raw = raw.replace(",", ".")
    v = float(raw)
    if v < 0 or v > 100: 
        sys.exit("Invalid value")
    return v

def perform_pruning(model, tokenizer, modules, selected_idxs):
    enc = tokenizer("Hello World", return_tensors="pt")
    example_inputs = enc["input_ids"]
    print("Building DependencyGraph...")
    tp.DependencyGraph().build_dependency(model, example_inputs=example_inputs)
    
    for idx in selected_idxs:
        layer = modules[idx]
        keep = ask_percentage_for_layer(idx, layer)
        if keep == 100:
            print("No change.")
            continue
        ratio = 1 - keep / 100.0
        print(f"Prune {layer.__class__.__name__} ({idx}) with {ratio*100:.2f}% ...")
        imp = tp.importance.GroupMagnitudeImportance(p=2)
        pruner = tp.pruner.BasePruner(
            model,
            example_inputs,
            importance=imp,
            pruning_ratio=ratio,
            pruning_ratio_dict={layer: ratio}
        )
        pruner.step()
    return model

def main():
    repo_id = choose_model()
    model_name = repo_id.split("/")[-1]
    cache_dir = get_cache_dir(repo_id)
    device, device_map, max_memory = device_and_mem_settings()

    model, tokenizer = load_model_and_tokenizer(repo_id, cache_dir, device_map, max_memory)
    mods = list_all_layers(model)
    sel = ask_layer_selection()
    if not sel: 
        sys.exit("No layers selected")

    pruned = perform_pruning(model, tokenizer, mods, sel)

if __name__ == "__main__":
    main()

First Execution and Error Log

Choose a model:
1) argilla/CapybaraHermes-2.5-Mistral-7B
2) TinyLlama/TinyLlama-1.1B-Chat-v1.0
Selection (1 or 2): 2
Loading local model from /path/to/full_models/TinyLlama-1.1B-Chat-v1.0 ...

--- All Modules / Layers (numbered) ---
[0] LlamaForCausalLM
[1] LlamaModel
[2] Embedding
[3] ModuleList
[4] LlamaDecoderLayer
[5] LlamaAttention
[6] Linear
...
[292] Linear

Enter layer numbers (space-separated): 8 16 32 64 128 200 201 202 203 204 205 206 207 208 209 210
Building DependencyGraph...

UserWarning: Unwrapped parameters detected: ['model.layers.0.post_attention_layernorm.weight', ...]
 Torch-Pruning will prune the last non-singleton dimension of these parameters.

Traceback (most recent call last):
  File "script.py", line 156, in <module>
    main()
  File "script.py", line 151, in main
    pruned = perform_pruning(model, tokenizer, mods, sel)
  File "script.py", line 88, in perform_pruning
    tp.DependencyGraph().build_dependency(model, example_inputs=example_inputs)
  File "torch_pruning/dependency/graph.py", line 514, in _trace_computational_graph
    grad_fn_root = output.grad_fn
AttributeError: 'tuple' object has no attribute 'grad_fn'

Second Iteration with Tuple Format

def perform_pruning(model, tokenizer, modules, selected_idxs):
    enc = tokenizer("Hello World", return_tensors="pt")
    example_inputs = enc["input_ids"].to(next(model.parameters()).device)
    example_inputs.requires_grad_(True)

    print("Building DependencyGraph...")
    model.train()
    dg = tp.DependencyGraph()
    dg.build_dependency(
        model,
        example_inputs=(example_inputs,),
        forward_fn=lambda m, x: m(x).logits
    )
    # ... rest unchanged

Second Error Log

Enter layer numbers (space-separated): 64
Building DependencyGraph...

Traceback (most recent call last):
  File "script.py", line 86, in perform_pruning
    example_inputs.requires_grad_(True)
RuntimeError: only Tensors of floating point dtype can require gradients

Third Iteration without requires_grad_

def perform_pruning(model, tokenizer, modules, selected_idxs):
    enc = tokenizer("Hello World", return_tensors="pt")
    example_inputs = enc["input_ids"].to(next(model.parameters()).device)

    print("Building DependencyGraph...")
    model.train()
    dg = tp.DependencyGraph()
    dg.build_dependency(
        model,
        example_inputs=(example_inputs,),
        forward_fn=lambda m, x: m(x).logits.float()
    )
    # ... rest unchanged

Third Error Log

Enter layer numbers (space-separated): 64
Building DependencyGraph...

Traceback (most recent call last):
  File "script.py", line 93, in <lambda>
    forward_fn=lambda m, x: m(x).logits.float()
  File "torch/nn/functional.py", line 2551, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
TypeError: embedding(): argument 'indices' (position 2) must be Tensor, not tuple

Fourth Iteration with Dictionary Format

def perform_pruning(model, tokenizer, modules, selected_idxs):
    enc = tokenizer("Hello World", return_tensors="pt")
    example_inputs = enc["input_ids"].to(next(model.parameters()).device)

    print("Building DependencyGraph...")
    model.train()
    dg = tp.DependencyGraph()
    dg.build_dependency(
        model,
        example_inputs={"input_ids": example_inputs},
        forward_fn=lambda m, x: m(**x).logits.float()
    )

    for idx in selected_idxs:
        layer = modules[idx]
        keep = ask_percentage_for_layer(idx, layer)
        if keep == 100:
            print("No change.")
            continue
        ratio = 1 - keep / 100.0
        print(f"Prune {layer.__class__.__name__} ({idx}) with {ratio*100:.2f}% ...")
        imp = tp.importance.GroupMagnitudeImportance(p=2)
        pruner = tp.pruner.BasePruner(
            model,
            {"input_ids": example_inputs},
            importance=imp,
            pruning_ratio=ratio,
            pruning_ratio_dict={layer: ratio}
        )
        pruner.step()
    return model

Final Error Log

Enter layer numbers (space-separated): 64
Building DependencyGraph...
For layer [64] Linear -> Keep what percentage? (0-100): 80
Prune Linear (64) with 20.00% ...

Traceback (most recent call last):
  File "script.py", line 106, in perform_pruning
    pruner = tp.pruner.BasePruner(
        model,
        {"input_ids": example_inputs},
        importance=imp,
        pruning_ratio=ratio,
        pruning_ratio_dict={layer: ratio}
    )
  File "torch_pruning/pruner/algorithms/base_pruner.py", line 137, in __init__
    self.DG = dependency.DependencyGraph().build_dependency(...)
  File "torch_pruning/dependency/graph.py", line 514, in _trace_computational_graph
    grad_fn_root = output.grad_fn
AttributeError: 'tuple' object has no attribute 'grad_fn'
