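I encountered the following error while training on the TPUGraphs dataset with the default settings. The run crashes at the end of epoch 0, right after the validation statistics are logged.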
Start from epoch 0
train: {'epoch': 0, 'time_epoch': 86.18243, 'eta': 1723562.42292, 'eta_hours': 478.76734, 'loss': 49.52224776, 'lr': 0.0001, 'params': 546197, 'time_iter': 12.31178, 'opa': 0.51881, 'spearmanr': 0.05149}
...computing epoch stats took: 0.11s
val: {'epoch': 0, 'time_epoch': 6.01468, 'loss': 49.34476471, 'lr': 0, 'params': 546197, 'time_iter': 6.01468, 'opa': 0.62399, 'spearmanr': 0.35255}
...computing epoch stats took: 0.01s
Traceback (most recent call last):
  File "/home/clc/workspace/ai-model-runtime/layout_train.py", line 172, in <module>
    train_dict[cfg.train.mode](loggers, loaders, model, optimizer,
  File "/home/clc/workspace/ai-model-runtime/graphgps/train/custom_tpu_train.py", line 313, in custom_train
    perf[i].append(loggers[i].write_epoch(cur_epoch))
  File "/home/clc/workspace/ai-model-runtime/graphgps/logger.py", line 264, in write_epoch
    task_stats = self.ranking()
  File "/home/clc/workspace/ai-model-runtime/graphgps/logger.py", line 192, in ranking
    opas.append(eval_opa(true[i], pred[i]))
  File "/home/clc/workspace/ai-model-runtime/graphgps/logger.py", line 341, in eval_opa
    opa_acc = float((opa_preds > 0).sum()) / opa_preds.shape[0]
ZeroDivisionError: float division by zero
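The failing line divides by opa_preds.shape[0], so it raises whenever a sample produces zero ordered pairs. Presumably that happens for a graph with a single config, or one whose configs all share the same true runtime, but I have not confirmed which case triggers it here. Below is a minimal sketch of a guarded variant, assuming OPA is the fraction of truly ordered pairs (i, j) with y_true[i] > y_true[j] that the model also orders correctly. The name eval_opa_guarded is hypothetical and this is not the graphgps implementation.

```python
import numpy as np

def eval_opa_guarded(y_true, y_pred):
    """Hypothetical ordered pair accuracy (OPA) with a guard for empty pair sets.

    Assumes OPA = fraction of pairs (i, j) with y_true[i] > y_true[j]
    for which y_pred[i] > y_pred[j]. Not the graphgps eval_opa.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Indices of every pair (i, j) where the ground truth ranks i above j.
    i, j = np.nonzero(y_true[:, None] > y_true[None, :])
    # A positive difference means the model ordered that pair correctly.
    opa_preds = y_pred[i] - y_pred[j]
    if opa_preds.shape[0] == 0:
        # Fewer than two distinct true values: no pairs to score,
        # so return NaN instead of dividing by zero.
        return float("nan")
    return float((opa_preds > 0).sum()) / opa_preds.shape[0]
```

Returning NaN rather than 0 keeps such samples from pulling the average down; aggregating the per-sample values with np.nanmean would then skip them. Filtering these samples out in the data loader instead is an equally valid choice.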