
Pretrained checkpoint hardcodes data path, causing NotImplementedError when directory structure is different #22

@CiSong10

Issue Description

I'm trying to use the provided pretrained model (the PointGroup-PAPER.pt checkpoint) to run prediction on my own point cloud dataset, but I encountered a NotImplementedError during execution. This appears to be caused by a hardcoded dataroot embedded in the checkpoint config, which assumes the dataset is located at data_set1_5classes.

Since I reorganized the data directory into a different structure (e.g., ./data/data_set1_5classes/), the dataset loader failed to find the data and fell back to download(), which is not implemented — so the run crashed.
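A minimal sketch of the behaviour I believe is happening (this is a simplified stand-in, not the project's actual code): torch_geometric's dataset base class checks whether the expected raw files exist under the configured root, and if any is missing it falls back to download(), which the base class deliberately leaves unimplemented.

```python
import os

class SketchDataset:
    # Mirrors torch_geometric's InMemoryDataset.download, which raises
    # NotImplementedError when a subclass does not override it.
    def download(self):
        raise NotImplementedError

    def _download(self, raw_dir, raw_file_names):
        # If any expected raw file is missing under raw_dir, fall back to
        # download() -- which crashes unless a subclass implements it.
        if not all(os.path.exists(os.path.join(raw_dir, f)) for f in raw_file_names):
            self.download()
```

So the crash is really a symptom: the loader looked for raw data under the dataroot baked into the checkpoint, found nothing, and tried to "download" it instead.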

Error Trace

[2025-06-11 14:35:17,201][torch_points3d.trainer][INFO] - DEVICE : cuda
[2025-06-11 14:35:18,132][torch_points3d.metrics.model_checkpoint][INFO] - Loading checkpoint from /home/cisong/ForAINet/outputs/pretrained/PointGroup-PAPER.pt
Traceback (most recent call last):
  File "PointCloudSegmentation/predict.py", line 13, in main
    trainer = Trainer(cfg)
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/trainer.py", line 48, in __init__
    self._initialize_trainer()
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/trainer.py", line 90, in _initialize_trainer
    self._dataset: BaseDataset = instantiate_dataset(self._checkpoint.data_config)
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/dataset_factory.py", line 46, in instantiate_dataset
    dataset = dataset_cls(dataset_config)
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/panoptic/treeins_set1.py", line 633, in __init__
    self.test_dataset = dataset_cls(
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/segmentation/treeins_set1.py", line 599, in __init__
    super().__init__(root, grid_size, *args, **kwargs)
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/segmentation/treeins_set1.py", line 239, in __init__
    super(TreeinsOriginalFused, self).__init__(root, transform, pre_transform, pre_filter)

...

  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/segmentation/treeins_set1.py", line 625, in download
    super().download()
  File "/usr/local/lib/python3.8/dist-packages/torch_geometric/data/in_memory_dataset.py", line 50, in download
    raise NotImplementedError
NotImplementedError

My Analysis

I debugged this in VS Code. After the checkpoint is loaded, around line 90 of trainer.py, self._checkpoint.data_config.dataroot shows data_set1_5classes, and a new ./data_set1_5classes/treeinsfused/raw directory is created. So I assume this fixed dataset configuration (data_config) prevents the model from being easily ported to other data directory structures.
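If the dataroot really is stored inside the checkpoint, one hypothetical workaround is to load the checkpoint, override that value, and save a patched copy. The key names ("data_config", "dataroot") below are assumptions inferred from the debugging above, not a verified checkpoint layout, and in practice the file would be read and written with torch.load / torch.save; pickle is used here only to keep the sketch self-contained.

```python
import pickle

def patch_dataroot(ckpt_path, out_path, new_root):
    """Load a checkpoint, override the stored dataroot, save a patched copy."""
    with open(ckpt_path, "rb") as f:
        ckpt = pickle.load(f)
    ckpt["data_config"]["dataroot"] = new_root  # assumed key layout
    with open(out_path, "wb") as f:
        pickle.dump(ckpt, f)
    return ckpt
```

This keeps the original checkpoint intact while letting prediction point at a relocated dataset directory.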

This is also probably the cause of issue #1 (comment), and why the workaround of changing the file path proposed in prs-eth/PanopticSegForLargeScalePointCloud#10 (comment) works.

Possible Improvements

If my analysis above is correct (excuse me if not), checkpoints should ideally store only model weights and architecture-related configs, not absolute or relative paths to training datasets.

Current Workaround

For now I moved the /data_set1_5classes/treeinsfused/raw dataset folder back to its expected location, which resolved this error. What I don't understand is: if I am deploying the pretrained model weights on my own dataset, why does it need to see the treeinsfused dataset at all? Or am I doing something wrong in the configuration?

Follow-up Issue

After the workaround, the NotImplementedError was resolved, but another error arose:

[2025-06-12 16:04:32,709][torch_points3d.trainer][INFO] - DEVICE : cuda
[2025-06-12 16:04:33,722][torch_points3d.metrics.model_checkpoint][INFO] - Loading checkpoint from /home/cisong/ForAINet/outputs/pretrained/PointGroup-PAPER.pt
Processing...
Traceback (most recent call last):
  File "/home/cisong/ForAINet/PointCloudSegmentation/predict.py", line 13, in main
    trainer = Trainer(cfg)
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/trainer.py", line 48, in __init__
    self._initialize_trainer()
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/trainer.py", line 90, in _initialize_trainer
    self._dataset: BaseDataset = instantiate_dataset(self._checkpoint.data_config)
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/dataset_factory.py", line 46, in instantiate_dataset
    dataset = dataset_cls(dataset_config)
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/panoptic/treeins_set1.py", line 633, in __init__
    self.test_dataset = dataset_cls(
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/segmentation/treeins_set1.py", line 599, in __init__
    super().__init__(root, grid_size, *args, **kwargs)
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/segmentation/treeins_set1.py", line 239, in __init__
    super(TreeinsOriginalFused, self).__init__(root, transform, pre_transform, pre_filter)
  File "/usr/local/lib/python3.8/dist-packages/torch_geometric/data/in_memory_dataset.py", line 60, in __init__
    super().__init__(root, transform, pre_transform, pre_filter)
  File "/usr/local/lib/python3.8/dist-packages/torch_geometric/data/dataset.py", line 86, in __init__
    self._process()
  File "/usr/local/lib/python3.8/dist-packages/torch_geometric/data/dataset.py", line 165, in _process
    self.process()
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/panoptic/treeins_set1.py", line 557, in process
    super().process()
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/segmentation/treeins_set1.py", line 622, in process
    super().process_test(self.test_area)
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/segmentation/treeins_set1.py", line 492, in process_test
    xyz, semantic_labels, instance_labels = read_treeins_format(
  File "/home/cisong/ForAINet/PointCloudSegmentation/torch_points3d/datasets/segmentation/treeins_set1.py", line 73, in read_treeins_format
    semantic_labels = data['semantic_seg'].astype(np.int64)-1
ValueError: no field of name semantic_seg

Can you help me with this? Why is Python looking for a ['semantic_seg'] column in the data? My main Python script is predict.py and the configuration file I use is predict.yaml.
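For context, here is a sketch of what I think the error means: read_treeins_format indexes the loaded point record array by a per-point "semantic_seg" field, and my file does not contain one. Indexing a NumPy structured array by a missing field name raises exactly this kind of error ("no field of name ..." on older NumPy, KeyError on newer versions). The field names below are illustrative, not taken from my actual data.

```python
import numpy as np
import numpy.lib.recfunctions as rfn

# A point record array with only coordinate fields, like prediction-only data.
points = np.zeros(4, dtype=[("x", "f4"), ("y", "f4"), ("z", "f4")])
print(points.dtype.names)  # no 'semantic_seg' field, so indexing it fails

# One possible workaround for prediction-only data: append a dummy
# semantic_seg field so the loader can index it (values are placeholders).
points = rfn.append_fields(points, "semantic_seg",
                           np.ones(4, dtype=np.int64), usemask=False)
```

Whether a dummy label field is acceptable presumably depends on whether the prediction path actually uses the labels; I have not verified that.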
