DependencyGraph Architectural Incompatibility with Transformers LlamaForCausalLM and MistralForCausalLM Models
Issue Overview
The DependencyGraph.build_dependency() method fails consistently on Hugging Face Transformers causal language models: torch-pruning's graph tracing assumes positional tensor inputs and raw tensor outputs, while these models expect keyword arguments (input_ids=...) and return structured output objects. This mismatch prevents successful structural pruning across all LLaMA, Mistral, and derivative model families.
Application Context
This incompatibility emerged during development of an interactive neural network pruning pipeline designed for production model compression workflows. The intended system architecture encompasses:
Model Management Layer:
- Interactive selection interface for locally cached models (argilla/CapybaraHermes-2.5-Mistral-7B, TinyLlama/TinyLlama-1.1B-Chat-v1.0)
- Hierarchical module enumeration with comprehensive layer identification and indexing
- Multi-target layer selection via space-delimited numerical interface
Pruning Configuration System:
- Fine-grained retention ratio specification (0.0-100.0% with arbitrary decimal precision)
- Semantic pruning interpretations: 100% indicates structural preservation, 0% denotes complete removal, intermediate values represent proportional channel/dimension reduction
- Per-layer granular control supporting heterogeneous pruning strategies across network topology
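The retention-to-ratio mapping described above can be sketched as a small helper. This is a minimal sketch; the function name keep_to_ratio and its validation behavior are illustrative assumptions, not part of the actual pipeline:

```python
def keep_to_ratio(keep_percent: float) -> float:
    """Map a retention percentage (0-100) to a torch-pruning pruning_ratio.

    100% keep -> ratio 0.0 (structural preservation);
    0% keep   -> ratio 1.0 (complete removal);
    intermediate values -> proportional channel/dimension reduction.
    """
    if not 0.0 <= keep_percent <= 100.0:
        raise ValueError(f"keep_percent out of range: {keep_percent}")
    return 1.0 - keep_percent / 100.0

# Keeping 80% of a layer means pruning a 0.2 fraction of its channels.
print(keep_to_ratio(80.0))
print(keep_to_ratio(100.0))
```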
Resource Management Framework:
- Memory-constrained execution with 90% GPU memory allocation ceiling
- Automatic CPU offloading for overflow allocation with transparent fallback mechanisms
- CUDA availability detection with seamless CPU-only execution mode
Model Serialization Pipeline:
- Intermediate checkpoint persistence in HuggingFace-compatible format
- Integration with llama.cpp quantization toolchain for production deployment
- Target quantization: Q6_K precision with GGUF container format
Directory Structure Specifications:
- Local model repository: ../full_models/<model-identifier-without-namespace>/
- Cross-platform path resolution with consistent behavior across execution environments
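The layout above implies stripping the Hugging Face namespace before resolving the local path. A minimal sketch of that resolution (the helper name local_model_dir is hypothetical; pathlib's resolve() gives the consistent cross-platform behavior mentioned above):

```python
from pathlib import Path

def local_model_dir(base_dir: Path, repo_id: str) -> Path:
    # "argilla/CapybaraHermes-2.5-Mistral-7B" -> "CapybaraHermes-2.5-Mistral-7B"
    model_name = repo_id.split("/")[-1]
    # resolve() normalizes ".." the same way on Windows and POSIX
    return (base_dir / ".." / "full_models" / model_name).resolve()

print(local_model_dir(Path("scripts"), "TinyLlama/TinyLlama-1.1B-Chat-v1.0"))
```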
Environment
- Python: 3.13.3
- PyTorch: 2.8.0 (tested with both CPU and CUDA 12.6)
- torch-pruning: latest (pip)
- transformers: latest (pip)
- accelerate: latest (pip)
- OS: Windows 11
Affected Models
- argilla/CapybaraHermes-2.5-Mistral-7B
- TinyLlama/TinyLlama-1.1B-Chat-v1.0
- All LlamaForCausalLM and MistralForCausalLM based models
Error Sequence 1: Tuple Input to Embedding Layer
Script Snippet
# Initial torch-pruning approach with a direct tensor input
enc = tokenizer("Hallo Welt", return_tensors="pt")
example_inputs = enc["input_ids"].to(next(model.parameters()).device)
DG = tp.DependencyGraph().build_dependency(
    model,
    example_inputs=example_inputs  # plain tensor, no wrapping
)
Error
TypeError: embedding(): argument 'indices' (position 2) must be Tensor, not tuple
  at torch/nn/functional.py:2551, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
What Triggers This Error
The error occurs when the dependency graph attempts to trace through the model's embedding layer. The build_dependency method internally creates a forward function that passes inputs as positional arguments, but Transformers models expect keyword arguments like input_ids=tensor.
Reproduction
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch_pruning as tp
# Load local model
model = AutoModelForCausalLM.from_pretrained("../full_models/TinyLlama-1.1B-Chat-v1.0")
tokenizer = AutoTokenizer.from_pretrained("../full_models/TinyLlama-1.1B-Chat-v1.0")
enc = tokenizer("Test", return_tensors="pt")
example_inputs = enc["input_ids"]
# This triggers the TypeError
DG = tp.DependencyGraph().build_dependency(model, example_inputs=example_inputs)
Expected vs Actual Behavior
- Expected: The dependency graph should build successfully with standard transformer model inputs
- Actual: TypeError occurs because the embedding layer receives incompatible argument format
Error Explanation
Transformers models require keyword argument calls (model(input_ids=tensor)) rather than positional calls (model(tensor)). The default dependency graph construction assumes vision model patterns where inputs can be passed positionally, leading to argument mismatch at the embedding layer level.
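The mismatch can be reproduced without any model weights. The toy class below mimics the calling convention described above (keyword input_ids, structured output); the adapter lambda shows the general shape a custom forward_fn workaround would need. ToyCausalLM/ToyOutput are illustrative stand-ins, not Transformers classes, and the adapter is a sketch, not a verified torch-pruning fix:

```python
class ToyOutput:
    """Stands in for transformers' structured CausalLMOutput."""
    def __init__(self, logits):
        self.logits = logits

class ToyCausalLM:
    """Mimics the behavior described above: inputs must arrive as input_ids."""
    def __call__(self, *args, input_ids=None):
        if args:
            # Positional tensors end up mis-routed inside the model
            # (the embedding layer ultimately receives a tuple).
            raise TypeError("expected keyword argument 'input_ids'")
        return ToyOutput(logits=[x * 2 for x in input_ids])

model = ToyCausalLM()

# Vision-style positional call: fails, as in Error Sequence 1.
try:
    model([1, 2, 3])
except TypeError as e:
    print(e)

# Keyword-style adapter: the shape a custom forward_fn must take.
forward_fn = lambda m, inputs: m(**inputs).logits
print(forward_fn(model, {"input_ids": [1, 2, 3]}))  # [2, 4, 6]
```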
Error Sequence 2: grad_fn AttributeError in Computational Graph
Script Snippet
# Attempted fix using tuple format and forward function
enc = tokenizer("Hallo Welt", return_tensors="pt")
example_inputs = enc["input_ids"].to(next(model.parameters()).device)
model.train()
dg = tp.DependencyGraph()
dg.build_dependency(
    model,
    example_inputs=(example_inputs,),            # tuple format
    forward_fn=lambda m, x: m(x).logits.float()  # extract logits
)
Error
Traceback (most recent call last):
  File "torch_pruning/dependency/graph.py", line 514, in _trace_computational_graph
    grad_fn_root = output.grad_fn
AttributeError: 'tuple' object has no attribute 'grad_fn'
What Triggers This Error
The error occurs in _trace_computational_graph() when the method attempts to access the grad_fn attribute of the forward function output. However, the forward function lambda m, x: m(x).logits.float() fails because m(x) with tuple x causes the same embedding layer issue, resulting in a tuple output rather than a tensor with gradient information.
Reproduction
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch_pruning as tp
model = AutoModelForCausalLM.from_pretrained("../full_models/TinyLlama-1.1B-Chat-v1.0")
tokenizer = AutoTokenizer.from_pretrained("../full_models/TinyLlama-1.1B-Chat-v1.0")
enc = tokenizer("Test", return_tensors="pt")
example_inputs = enc["input_ids"]
model.train()
# This will trigger the AttributeError
dg = tp.DependencyGraph()
dg.build_dependency(
    model,
    example_inputs=(example_inputs,),
    forward_fn=lambda m, x: m(x).logits.float()
)
Expected vs Actual Behavior
- Expected: The dependency graph should successfully trace the computational graph and extract gradient information
- Actual: AttributeError because the forward function fails, preventing gradient graph construction
Error Explanation
The combination of tuple input format and incompatible forward function signature creates a cascade failure. The forward function cannot execute successfully due to the underlying embedding layer argument mismatch, which prevents the dependency graph from obtaining a valid tensor output with gradient information.
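A defensive output transform avoids handing a tuple (or a structured output object) straight to grad_fn tracing: it reduces whatever the model returns to the single tensor the tracer expects. A minimal sketch (to_logits is a hypothetical helper, not a torch-pruning API):

```python
def to_logits(output):
    """Reduce a model output to the single tensor the tracer expects.

    Handles CausalLMOutput-style objects (.logits), tuple outputs
    (logits conventionally first), and plain tensors.
    """
    if hasattr(output, "logits"):
        return output.logits
    if isinstance(output, tuple):
        return output[0]
    return output

class FakeOutput:
    """Stand-in for a structured CausalLMOutput."""
    def __init__(self, logits):
        self.logits = logits

# Tuples and wrapper objects carry no grad_fn themselves; the tensor
# inside them does, so it must be extracted before tracing.
assert to_logits(FakeOutput("L")) == "L"
assert to_logits(("L", "hidden_states")) == "L"
assert to_logits("L") == "L"
```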
Error Sequence 3: Integer Tensor Gradient Requirement
Script Snippet
# Attempt to manually enable gradients on input tokens
enc = tokenizer("Hallo Welt", return_tensors="pt")
example_inputs = enc["input_ids"].to(next(model.parameters()).device)
example_inputs.requires_grad_(True) # This line causes the error
model.train()
dg = tp.DependencyGraph()
dg.build_dependency(model, example_inputs=(example_inputs,))
Error
RuntimeError: only Tensors of floating point dtype can require gradients
What Triggers This Error
The error occurs when attempting to call requires_grad_(True) on input_ids tensors, which have dtype=torch.int64. PyTorch's automatic differentiation system only supports gradient computation on floating-point tensors.
Reproduction
import torch
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("../full_models/TinyLlama-1.1B-Chat-v1.0")
enc = tokenizer("Test", return_tensors="pt")
input_ids = enc["input_ids"]
print(f"input_ids dtype: {input_ids.dtype}") # torch.int64
input_ids.requires_grad_(True) # RuntimeError occurs here
Expected vs Actual Behavior
- Expected: The dependency graph construction should handle input tensor types appropriately without requiring manual gradient enablement
- Actual: RuntimeError when attempting to enable gradients on discrete token indices
Error Explanation
Token IDs represent discrete vocabulary indices (torch.int64) rather than continuous values. Gradient computation requires continuous, differentiable parameters. The error occurs when code attempts to enable gradient tracking on these discrete input tokens, which is mathematically invalid for automatic differentiation.
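The constraint is easy to verify in isolation, and it also points at the standard escape hatch: gradients can start from the embedding output (a float tensor on the autograd tape) even though they cannot start at the integer token IDs. A minimal sketch with a throwaway embedding table (the table sizes are arbitrary; real models expose this path via the inputs_embeds argument):

```python
import torch

input_ids = torch.tensor([[1, 2, 3]])  # dtype torch.int64, discrete indices
try:
    input_ids.requires_grad_(True)     # invalid: not a floating-point tensor
except RuntimeError as e:
    print(e)

# Differentiate from the embedding output instead: it is float32 and
# already on the autograd tape via the embedding weight.
emb = torch.nn.Embedding(num_embeddings=10, embedding_dim=4)
inputs_embeds = emb(input_ids)
print(inputs_embeds.dtype, inputs_embeds.requires_grad)
```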
Error Sequence 4: BasePruner Integration Failure
Script Snippet
# Direct BasePruner usage attempt
modules = list(model.modules())
layer = modules[64] # Linear layer selection
ratio = 0.2
imp = tp.importance.GroupMagnitudeImportance(p=2)
pruner = tp.pruner.BasePruner(
    model,
    {"input_ids": example_inputs},  # dictionary format attempt
    importance=imp,
    pruning_ratio=ratio,
    pruning_ratio_dict={layer: ratio}
)
Error
Traceback (most recent call last):
  File "torch_pruning/pruner/algorithms/base_pruner.py", line 137, in __init__
    self.DG = dependency.DependencyGraph().build_dependency(...)
  File "torch_pruning/dependency/graph.py", line 514, in _trace_computational_graph
    grad_fn_root = output.grad_fn
AttributeError: 'tuple' object has no attribute 'grad_fn'
What Triggers This Error
The error occurs during BasePruner initialization when it internally constructs a DependencyGraph using default parameters. Even with dictionary input format, the internal dependency graph construction encounters the same computational graph tracing failure described in Error Sequence 2.
Reproduction
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch_pruning as tp
model = AutoModelForCausalLM.from_pretrained("../full_models/TinyLlama-1.1B-Chat-v1.0")
tokenizer = AutoTokenizer.from_pretrained("../full_models/TinyLlama-1.1B-Chat-v1.0")
enc = tokenizer("Test", return_tensors="pt")
example_inputs = enc["input_ids"]
modules = list(model.modules())
target_layer = modules[64]
imp = tp.importance.GroupMagnitudeImportance(p=2)
pruner = tp.pruner.BasePruner(
    model,
    {"input_ids": example_inputs},
    importance=imp,
    pruning_ratio=0.2,
    pruning_ratio_dict={target_layer: 0.2}
)  # fails during initialization
Expected vs Actual Behavior
- Expected: BasePruner should initialize successfully and perform structural pruning on Transformers models
- Actual: Initialization fails due to underlying dependency graph construction incompatibilities
Error Explanation
BasePruner internally creates a dependency graph during initialization using default forward functions and output transforms. Since these defaults are designed for vision models with tensor outputs, they fail when applied to Transformers models that return structured CausalLMOutput objects, preventing the pruner from completing initialization.
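The failure mode can be reproduced with a ten-line torch module that mirrors the structured-output convention: the object returned by forward has no grad_fn attribute, only its .logits tensor does, which is exactly what default tensor-oriented tracing trips over. ToyLM/ToyOutput below are illustrative stand-ins, not Transformers classes:

```python
import torch

class ToyOutput:
    """Stand-in for transformers' CausalLMOutput."""
    def __init__(self, logits):
        self.logits = logits

class ToyLM(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = torch.nn.Embedding(16, 8)
        self.head = torch.nn.Linear(8, 16)

    def forward(self, input_ids=None):
        return ToyOutput(logits=self.head(self.emb(input_ids)))

model = ToyLM()
out = model(input_ids=torch.tensor([[1, 2, 3]]))

print(hasattr(out, "grad_fn"))         # False: the wrapper object breaks tracing
print(out.logits.grad_fn is not None)  # True: the tensor inside is traceable
```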
Complete Script Evolution and Error Logs
Initial Script Implementation
#!/usr/bin/env python3
from pathlib import Path
import subprocess
import sys

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch_pruning as tp

BASE_DIR = Path(__file__).resolve().parent
CHOICES = {
    "1": "argilla/CapybaraHermes-2.5-Mistral-7B",
    "2": "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
}

def choose_model():
    print("Choose a model:")
    print("1) argilla/CapybaraHermes-2.5-Mistral-7B")
    print("2) TinyLlama/TinyLlama-1.1B-Chat-v1.0")
    c = input("Selection (1 or 2): ").strip()
    if c not in CHOICES:
        print("Invalid selection. Exiting.")
        sys.exit(1)
    return CHOICES[c]

def get_cache_dir(repo_id: str) -> Path:
    model_name = repo_id.split("/")[-1]
    return (BASE_DIR / ".." / "full_models" / model_name).resolve()

def device_and_mem_settings():
    if torch.cuda.is_available():
        return torch.device("cuda"), "auto", {0: "90%", "cpu": "10%"}
    else:
        return torch.device("cpu"), None, {"cpu": "100%"}

def load_model_and_tokenizer(repo_id: str, cache_dir: Path, device_map, max_memory):
    print(f"Loading local model from {cache_dir} ...")
    if not cache_dir.exists():
        print(f"Error: Local model folder {cache_dir} not found.")
        sys.exit(1)
    load_kwargs = {}
    if device_map is not None:
        load_kwargs.update({"device_map": device_map, "max_memory": max_memory})
    model = AutoModelForCausalLM.from_pretrained(str(cache_dir), **load_kwargs)
    tokenizer = AutoTokenizer.from_pretrained(str(cache_dir))
    return model, tokenizer

def list_all_layers(model):
    mods = list(model.modules())
    print("\n--- All Modules / Layers (numbered) ---")
    for i, m in enumerate(mods):
        print(f"[{i}] {m.__class__.__name__}")
    return mods

def ask_layer_selection():
    s = input("\nEnter layer numbers (space-separated): ").strip()
    return [int(x) for x in s.split()] if s else []

def ask_percentage_for_layer(idx, module):
    raw = input(f"For layer [{idx}] {module.__class__.__name__} -> Keep what percentage? (0-100): ").strip()
    raw = raw.replace(",", ".")
    v = float(raw)
    if v < 0 or v > 100:
        sys.exit("Invalid value")
    return v

def perform_pruning(model, tokenizer, modules, selected_idxs):
    enc = tokenizer("Hello World", return_tensors="pt")
    example_inputs = enc["input_ids"]
    print("Building DependencyGraph...")
    tp.DependencyGraph().build_dependency(model, example_inputs=example_inputs)
    for idx in selected_idxs:
        layer = modules[idx]
        keep = ask_percentage_for_layer(idx, layer)
        if keep == 100:
            print("No change.")
            continue
        ratio = 1 - keep / 100.0
        print(f"Prune {layer.__class__.__name__} ({idx}) with {ratio*100:.2f}% ...")
        imp = tp.importance.GroupMagnitudeImportance(p=2)
        pruner = tp.pruner.BasePruner(
            model,
            example_inputs,
            importance=imp,
            pruning_ratio=ratio,
            pruning_ratio_dict={layer: ratio}
        )
        pruner.step()
    return model

def main():
    repo_id = choose_model()
    model_name = repo_id.split("/")[-1]
    cache_dir = get_cache_dir(repo_id)
    device, device_map, max_memory = device_and_mem_settings()
    model, tokenizer = load_model_and_tokenizer(repo_id, cache_dir, device_map, max_memory)
    mods = list_all_layers(model)
    sel = ask_layer_selection()
    if not sel:
        sys.exit("No layers selected")
    pruned = perform_pruning(model, tokenizer, mods, sel)

if __name__ == "__main__":
    main()
First Execution and Error Log
Choose a model:
1) argilla/CapybaraHermes-2.5-Mistral-7B
2) TinyLlama/TinyLlama-1.1B-Chat-v1.0
Selection (1 or 2): 2
Loading local model from /path/to/full_models/TinyLlama-1.1B-Chat-v1.0 ...
--- All Modules / Layers (numbered) ---
[0] LlamaForCausalLM
[1] LlamaModel
[2] Embedding
[3] ModuleList
[4] LlamaDecoderLayer
[5] LlamaAttention
[6] Linear
...
[292] Linear
Enter layer numbers (space-separated): 8 16 32 64 128 200 201 202 203 204 205 206 207 208 209 210
Building DependencyGraph...
UserWarning: Unwrapped parameters detected: ['model.layers.0.post_attention_layernorm.weight', ...]
Torch-Pruning will prune the last non-singleton dimension of these parameters.
Traceback (most recent call last):
  File "script.py", line 156, in <module>
    main()
  File "script.py", line 151, in main
    pruned = perform_pruning(model, tokenizer, mods, sel)
  File "script.py", line 88, in perform_pruning
    tp.DependencyGraph().build_dependency(model, example_inputs=example_inputs)
  File "torch_pruning/dependency/graph.py", line 514, in _trace_computational_graph
    grad_fn_root = output.grad_fn
AttributeError: 'tuple' object has no attribute 'grad_fn'
Second Iteration with Tuple Format
def perform_pruning(model, tokenizer, modules, selected_idxs):
    enc = tokenizer("Hello World", return_tensors="pt")
    example_inputs = enc["input_ids"].to(next(model.parameters()).device)
    example_inputs.requires_grad_(True)
    print("Building DependencyGraph...")
    model.train()
    dg = tp.DependencyGraph()
    dg.build_dependency(
        model,
        example_inputs=(example_inputs,),
        forward_fn=lambda m, x: m(x).logits
    )
    # ... rest unchanged
Second Error Log
Enter layer numbers (space-separated): 64
Building DependencyGraph...
Traceback (most recent call last):
  File "script.py", line 86, in perform_pruning
    example_inputs.requires_grad_(True)
RuntimeError: only Tensors of floating point dtype can require gradients
Third Iteration without requires_grad_
def perform_pruning(model, tokenizer, modules, selected_idxs):
    enc = tokenizer("Hello World", return_tensors="pt")
    example_inputs = enc["input_ids"].to(next(model.parameters()).device)
    print("Building DependencyGraph...")
    model.train()
    dg = tp.DependencyGraph()
    dg.build_dependency(
        model,
        example_inputs=(example_inputs,),
        forward_fn=lambda m, x: m(x).logits.float()
    )
    # ... rest unchanged
Third Error Log
Enter layer numbers (space-separated): 64
Building DependencyGraph...
Traceback (most recent call last):
  File "script.py", line 93, in <lambda>
    forward_fn=lambda m, x: m(x).logits.float()
  File "torch/nn/functional.py", line 2551, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
TypeError: embedding(): argument 'indices' (position 2) must be Tensor, not tuple
Fourth Iteration with Dictionary Format
def perform_pruning(model, tokenizer, modules, selected_idxs):
    enc = tokenizer("Hello World", return_tensors="pt")
    example_inputs = enc["input_ids"].to(next(model.parameters()).device)
    print("Building DependencyGraph...")
    model.train()
    dg = tp.DependencyGraph()
    dg.build_dependency(
        model,
        example_inputs={"input_ids": example_inputs},
        forward_fn=lambda m, x: m(**x).logits.float()
    )
    for idx in selected_idxs:
        layer = modules[idx]
        keep = ask_percentage_for_layer(idx, layer)
        if keep == 100:
            print("No change.")
            continue
        ratio = 1 - keep / 100.0
        print(f"Prune {layer.__class__.__name__} ({idx}) with {ratio*100:.2f}% ...")
        imp = tp.importance.GroupMagnitudeImportance(p=2)
        pruner = tp.pruner.BasePruner(
            model,
            {"input_ids": example_inputs},
            importance=imp,
            pruning_ratio=ratio,
            pruning_ratio_dict={layer: ratio}
        )
        pruner.step()
    return model
Final Error Log
Enter layer numbers (space-separated): 64
Building DependencyGraph...
For layer [64] Linear -> Keep what percentage? (0-100): 80
Prune Linear (64) with 20.00% ...
Traceback (most recent call last):
  File "script.py", line 106, in perform_pruning
    pruner = tp.pruner.BasePruner(
        model,
        {"input_ids": example_inputs},
        importance=imp,
        pruning_ratio=ratio,
        pruning_ratio_dict={layer: ratio}
    )
  File "torch_pruning/pruner/algorithms/base_pruner.py", line 137, in __init__
    self.DG = dependency.DependencyGraph().build_dependency(...)
  File "torch_pruning/dependency/graph.py", line 514, in _trace_computational_graph
    grad_fn_root = output.grad_fn
AttributeError: 'tuple' object has no attribute 'grad_fn'