TPUGraphs dataset layout training error #9

@fwu11

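I hit the following error while training on the TPUGraphs layout dataset with the default settings. The first epoch's train and validation stats are logged, then the run crashes while computing the ranking metrics.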

Start from epoch 0
train: {'epoch': 0, 'time_epoch': 86.18243, 'eta': 1723562.42292, 'eta_hours': 478.76734, 'loss': 49.52224776, 'lr': 0.0001, 'params': 546197, 'time_iter': 12.31178, 'opa': 0.51881, 'spearmanr': 0.05149}
...computing epoch stats took: 0.11s
val: {'epoch': 0, 'time_epoch': 6.01468, 'loss': 49.34476471, 'lr': 0, 'params': 546197, 'time_iter': 6.01468, 'opa': 0.62399, 'spearmanr': 0.35255}
...computing epoch stats took: 0.01s
Traceback (most recent call last):
  File "/home/clc/workspace/ai-model-runtime/layout_train.py", line 172, in <module>
    train_dict[cfg.train.mode](loggers, loaders, model, optimizer,
  File "/home/clc/workspace/ai-model-runtime/graphgps/train/custom_tpu_train.py", line 313, in custom_train
    perf[i].append(loggers[i].write_epoch(cur_epoch))
  File "/home/clc/workspace/ai-model-runtime/graphgps/logger.py", line 264, in write_epoch
    task_stats = self.ranking()
  File "/home/clc/workspace/ai-model-runtime/graphgps/logger.py", line 192, in ranking
    opas.append(eval_opa(true[i], pred[i]))
  File "/home/clc/workspace/ai-model-runtime/graphgps/logger.py", line 341, in eval_opa
    opa_acc = float((opa_preds > 0).sum()) / opa_preds.shape[0]
ZeroDivisionError: float division by zero
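
The traceback shows `opa_preds.shape[0]` being 0 at the division, so at least one graph produced no pairs to score. A minimal sketch of a guarded ordered-pair-accuracy function follows. It is not the repository's `eval_opa`: the name `safe_opa` and the way pairs are built (every pair whose true runtimes differ) are assumptions made for illustration, and the real pair construction may differ.

```python
import numpy as np


def safe_opa(true, pred):
    """Ordered pair accuracy that returns NaN instead of dividing by zero.

    Hypothetical stand-in for eval_opa in graphgps/logger.py; the pair
    construction used here is an assumption, not the repository's code.
    """
    true = np.asarray(true, dtype=float)
    pred = np.asarray(pred, dtype=float)

    # Pairs (i, j) where config i has a strictly larger true runtime than j.
    mask = true[:, None] > true[None, :]
    n_pairs = int(mask.sum())
    if n_pairs == 0:
        # Fewer than two configs, or all true runtimes tied: nothing to rank.
        return float("nan")

    # A pair is correct when the prediction preserves the true ordering.
    correct = (pred[:, None] > pred[None, :]) & mask
    return float(correct.sum()) / n_pairs
```

With a guard like this, the caller could average per-graph scores with `np.nanmean` so that graphs without comparable pairs are skipped rather than crashing the epoch. Whether skipping is the right behavior depends on why those graphs have no pairs in the first place, which the log above does not show.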
