Pretrained checkpoint hardcodes data path, causing NotImplementedError when directory structure is different #22
Description
Issue Description:
I'm trying to use the provided pretrained model (the PointGroup-PAPER.pt checkpoint) to run prediction on my own point cloud dataset, but I encountered a NotImplementedError during execution. This appears to be caused by a hardcoded dataroot embedded in the checkpoint config, which assumes the dataset is located at data_set1_5classes.
Since I had reorganized the data directory into a different structure (e.g., ./data/data_set1_5classes/), the dataset loader failed to find the data and fell back to download(), which is not implemented, resulting in a crash.
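To illustrate the fallback I think I am hitting, here is a minimal sketch. It is a simplified stand-in, not the actual torch_geometric or torch_points3d code; the class name `RawFileDataset`, the helper `try_load`, and the file name `plot_1.ply` are all made up for illustration. It assumes the base class checks for its raw files under `<root>/raw` and calls `download()` when they are missing.

```python
import os
import tempfile


class RawFileDataset:
    """Simplified stand-in for a dataset base class that downloads missing raw files."""

    raw_file_names = ["plot_1.ply"]  # hypothetical raw file name

    def __init__(self, root):
        self.raw_dir = os.path.join(root, "raw")
        paths = [os.path.join(self.raw_dir, name) for name in self.raw_file_names]
        # If any expected raw file is missing, create the raw directory
        # and hand over to download().
        if not all(os.path.exists(path) for path in paths):
            os.makedirs(self.raw_dir, exist_ok=True)
            self.download()

    def download(self):
        # No download source is defined, so a wrong root ends up here.
        raise NotImplementedError


def try_load(root):
    """Return a short status string instead of crashing."""
    try:
        RawFileDataset(root)
        return "loaded"
    except NotImplementedError:
        return "NotImplementedError"


with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "data_set1_5classes")

    # Raw files are not where the stored root points: download() is reached.
    result_missing = try_load(root)
    raw_dir_created = os.path.isdir(os.path.join(root, "raw"))

    # Once the expected raw file exists under the root, loading goes through.
    open(os.path.join(root, "raw", "plot_1.ply"), "w").close()
    result_present = try_load(root)
```

In this sketch the missing-file case also leaves an empty raw directory behind, even though nothing was downloaded.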
Error Trace
[2025-06-11 14:35:17,201][torch_points3d.trainer][INFO] - DEVICE : cuda
[2025-06-11 14:35:18,132][torch_points3d.metrics.model_checkpoint][INFO] - Loading checkpoint from /home/cisong/ForAINet/outputs/pretrained/PointGroup-PAPER.pt
Traceback (most recent call last):
File "PointCloudSegmentation/predict.py", line 13, in main
trainer = Trainer(cfg)
File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/trainer.py", line 48, in __init__
self._initialize_trainer()
File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/trainer.py", line 90, in _initialize_trainer
self._dataset: BaseDataset = instantiate_dataset(self._checkpoint.data_config)
File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/dataset_factory.py", line 46, in instantiate_dataset
dataset = dataset_cls(dataset_config)
File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/panoptic/treeins_set1.py", line 633, in __init__
self.test_dataset = dataset_cls(
File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/segmentation/treeins_set1.py", line 599, in __init__
super().__init__(root, grid_size, *args, **kwargs)
File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/segmentation/treeins_set1.py", line 239, in __init__
super(TreeinsOriginalFused, self).__init__(root, transform, pre_transform, pre_filter)
...
File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/segmentation/treeins_set1.py", line 625, in download
super().download()
File "/usr/local/lib/python3.8/dist-packages/torch_geometric/data/in_memory_dataset.py", line 50, in download
raise NotImplementedError
NotImplementedError
My Analysis
I stepped through the code in the VS Code debugger. After the checkpoint is loaded, near line 90 of trainer.py, self._checkpoint.data_config.dataroot shows data_set1_5classes, and a new ./data_set1_5classes/treeinsfused/raw directory was created. So I assume this fixed dataset configuration (data_config) prevents the model from being easily ported to other data directory structures.
This is probably also the cause of issue #1 (comment), and the reason why the workaround of changing the file path proposed in prs-eth/PanopticSegForLargeScalePointCloud#10 (comment) works.
Possible Improvements
If my analysis above is correct (excuse me if not), then ideally checkpoints should store only model weights and architecture-related configs, not absolute or relative paths to training datasets.
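As a rough sketch of what I have in mind, the stored dataset config could be patched with a user-supplied root before the dataset is instantiated. This is only an illustration under assumptions: the helper `with_dataroot` is hypothetical, a plain dict stands in for whatever config object the checkpoint actually holds, and the `grid_size` value is a placeholder.

```python
import copy


def with_dataroot(data_config, new_root):
    """Return a copy of data_config whose dataroot is replaced by new_root.

    Hypothetical helper: the original config object is left untouched.
    """
    patched = copy.deepcopy(data_config)
    patched["dataroot"] = new_root
    return patched


# Placeholder for the config stored in the checkpoint (values are made up).
stored_config = {"dataroot": "data_set1_5classes", "grid_size": 0.2}

# Point the dataset at the local directory layout instead.
local_config = with_dataroot(stored_config, "./data/data_set1_5classes")
```

With something like this, the checkpoint could keep its training-time settings while the path comes from the user's own configuration.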
Current Workaround
For now, I put the /data_set1_5classes/treeinsfused/raw dataset folder back where it was, which solved this issue. What I don't quite understand is why the pretrained model needs to see the treeinsfused dataset when I am deploying its weights on my own dataset. Or am I doing something wrong in the configuration?
Another Issue
After the workaround resolved the NotImplementedError, another error arose:
[2025-06-12 16:04:32,709][torch_points3d.trainer][INFO] - DEVICE : cuda
[2025-06-12 16:04:33,722][torch_points3d.metrics.model_checkpoint][INFO] - Loading checkpoint from /home/cisong/ForAINet/outputs/pretrained/PointGroup-PAPER.pt
Processing...
Traceback (most recent call last):
File "/home/cisong/ForAINet/PointCloudSegmentation/predict.py", line 13, in main
trainer = Trainer(cfg)
File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/trainer.py", line 48, in __init__
self._initialize_trainer()
File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/trainer.py", line 90, in _initialize_trainer
self._dataset: BaseDataset = instantiate_dataset(self._checkpoint.data_config)
File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/dataset_factory.py", line 46, in instantiate_dataset
dataset = dataset_cls(dataset_config)
File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/panoptic/treeins_set1.py", line 633, in __init__
self.test_dataset = dataset_cls(
File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/segmentation/treeins_set1.py", line 599, in __init__
super().__init__(root, grid_size, *args, **kwargs)
File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/segmentation/treeins_set1.py", line 239, in __init__
super(TreeinsOriginalFused, self).__init__(root, transform, pre_transform, pre_filter)
File "/usr/local/lib/python3.8/dist-packages/torch_geometric/data/in_memory_dataset.py", line 60, in __init__
super().__init__(root, transform, pre_transform, pre_filter)
File "/usr/local/lib/python3.8/dist-packages/torch_geometric/data/dataset.py", line 86, in __init__
self._process()
File "/usr/local/lib/python3.8/dist-packages/torch_geometric/data/dataset.py", line 165, in _process
self.process()
File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/panoptic/treeins_set1.py", line 557, in process
super().process()
File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/segmentation/treeins_set1.py", line 622, in process
super().process_test(self.test_area)
File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/segmentation/treeins_set1.py", line 492, in process_test
xyz, semantic_labels, instance_labels = read_treeins_format(
File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/segmentation/treeins_set1.py", line 73, in read_treeins_format
semantic_labels = data['semantic_seg'].astype(np.int64)-1
ValueError: no field of name semantic_seg
Can you help me with this? Why is Python looking for a 'semantic_seg' field in the data? My main Python script is predict.py and the configuration file I use is predict.yaml.
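My guess is that the reader expects labelled points, while my files only contain coordinates. Below is a minimal sketch of that situation. It assumes the point data ends up in a NumPy structured array; the helper `read_labels_or_default` is hypothetical, not part of the repository, and it leaves out the label offset that the real reader applies.

```python
import numpy as np


def read_labels_or_default(data, field, default=0):
    """Return data[field] as int64, or an array filled with `default`
    when the structured array has no such field (hypothetical helper)."""
    names = data.dtype.names or ()
    if field in names:
        return data[field].astype(np.int64)
    return np.full(data.shape[0], default, dtype=np.int64)


# Unlabelled points: only coordinates, like my own dataset.
unlabelled = np.zeros(4, dtype=[("x", "f4"), ("y", "f4"), ("z", "f4")])
labels_missing = read_labels_or_default(unlabelled, "semantic_seg")

# Labelled points: the field exists and is returned as int64.
labelled = np.zeros(2, dtype=[("x", "f4"), ("semantic_seg", "i4")])
labelled["semantic_seg"] = [1, 2]
labels_present = read_labels_or_default(labelled, "semantic_seg")
```

Indexing `unlabelled["semantic_seg"]` directly is what I believe fails in read_treeins_format; checking `dtype.names` first would avoid that for unlabelled inputs.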