
It caused an error when getting ‘resized_depth_mono’ #47

@Maple-geekZhu

Description


This happened when I used your method to train on another dataset (with 9 input views).
First, I used your code DNGaussian/dpt/get_depth_map_for_llff_dtu.py to get the depth maps, and the results looked fine.
Second, I customized DNGaussian/scene/dataset_readers.py to read my own data.
The error was raised when I ran the training command:

python mytrain.py \
    --source_path "./new_datasets/omni3d/backpack_016" \
    --model_path "./Data/test/backpack_016" \
    --images "images_2" \
    --dataset "DTU" \
    --data_device "cuda:1" \
    --n_sparse 9 \
    --eval \
    --test_iterations -1 \
    --save_iterations 10000 \
    --iterations 10000

As you can see in the bash output below, all the cam_infos are read successfully, and the error seems to be raised while processing the depth map:

Reading camera 200/200 [20/02 19:37:14]
Dataset Type:  DTU [20/02 19:37:14]
train ['/data/zrt/DNGaussian/new_datasets/omni3d/backpack_016/images_2/00001.jpg', '/data/zrt/DNGaussian/new_datasets/omni3d/backpack_016/images_2/00019.jpg', '/data/zrt/DNGaussian/new_datasets/omni3d/backpack_016/images_2/00036.jpg', '/data/zrt/DNGaussian/new_datasets/omni3d/backpack_016/images_2/00060.jpg', '/data/zrt/DNGaussian/new_datasets/omni3d/backpack_016/images_2/00082.jpg', '/data/zrt/DNGaussian/new_datasets/omni3d/backpack_016/images_2/00097.jpg', '/data/zrt/DNGaussian/new_datasets/omni3d/backpack_016/images_2/00122.jpg', '/data/zrt/DNGaussian/new_datasets/omni3d/backpack_016/images_2/00143.jpg', '/data/zrt/DNGaussian/new_datasets/omni3d/backpack_016/images_2/00167.jpg'] [20/02 19:37:14]
Loading Training Cameras [1.0] [20/02 19:37:14]
Traceback (most recent call last):
  File "/data/zrt/DNGaussian/mytrain.py", line 370, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
  File "/data/zrt/DNGaussian/mytrain.py", line 42, in training
    scene = Scene(dataset, gaussians)
  File "/data/zrt/DNGaussian/scene/__init__.py", line 79, in __init__
    self.train_cameras[resolution_scale] = cameraList_from_camInfos(scene_info.train_cameras, resolution_scale, args)
  File "/data/zrt/DNGaussian/utils/camera_utils.py", line 93, in cameraList_from_camInfos
    camera_list.append(loadCam(args, id, c, resolution_scale))
  File "/data/zrt/DNGaussian/utils/camera_utils.py", line 45, in loadCam
    resized_depth_mono = PILtoTorch(cam_info.depth_mono, resolution)
  File "/data/zrt/DNGaussian/utils/general_utils.py", line 23, in PILtoTorch
    resized_image = torch.from_numpy(np.array(resized_image_PIL)) / 255.0
TypeError: can't convert np.ndarray of type numpy.uint16. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
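
For context, the failure is reproducible outside the training loop. The depth map comes back from PIL as a uint16 array (a 16-bit image), and on this PyTorch build torch.from_numpy rejects that dtype. A minimal sketch (the array here is just a placeholder):

import numpy as np
import torch

# torch.from_numpy (on this build) rejects uint16 arrays, which is what
# np.array() yields for a 16-bit PIL depth image.
depth = np.zeros((4, 4), dtype=np.uint16)  # placeholder depth values
try:
    torch.from_numpy(depth)
except TypeError as e:
    print(e)  # can't convert np.ndarray of type numpy.uint16 ...

# Casting to a supported dtype first avoids the error:
t = torch.from_numpy(depth.astype(np.float32))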

Have you ever encountered this problem?
I tried to work around it by modifying DNGaussian/utils/general_utils.py:

def PILtoTorch(pil_image, resolution):
    resized_image_PIL = pil_image.resize(resolution)
    # force uint8 so torch.from_numpy accepts the array
    resized_image = torch.from_numpy(np.array(resized_image_PIL).astype(np.uint8)) / 255.0
    if len(resized_image.shape) == 3:
        return resized_image.permute(2, 0, 1)
    else:
        return resized_image.unsqueeze(dim=-1).permute(2, 0, 1)
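
I suspect this cast is lossy, though: astype(np.uint8) keeps only the low byte of each uint16 value, so the depth values wrap modulo 256. A sketch of a variant that keeps the full 16-bit range instead (depth_pil_to_torch is a hypothetical helper for illustration, not part of DNGaussian):

import numpy as np
import torch

def depth_pil_to_torch(pil_image, resolution):
    # Cast to float32 so torch.from_numpy accepts the array, and
    # normalize by the dtype's max value instead of truncating to uint8.
    resized = pil_image.resize(resolution)
    arr = np.array(resized)
    scale = 65535.0 if arr.dtype == np.uint16 else 255.0
    tensor = torch.from_numpy(arr.astype(np.float32)) / scale
    if tensor.dim() == 3:
        return tensor.permute(2, 0, 1)  # HWC -> CHW for RGB images
    return tensor.unsqueeze(dim=-1).permute(2, 0, 1)  # HW -> 1HW for depth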

With this change the training starts, but at iteration 5800/10000 it hit a CUDA OOM error:

Traceback (most recent call last):
  File "/data/zrt/DNGaussian/mytrain.py", line 370, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
  File "/data/zrt/DNGaussian/mytrain.py", line 185, in training
    loss.backward()
  File "/data/zrt/anaconda3/envs/DNG/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/data/zrt/anaconda3/envs/DNG/lib/python3.9/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.38 GiB (GPU 1; 23.55 GiB total capacity; 17.99 GiB already allocated; 977.19 MiB free; 21.75 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Training progress:  58%|######################                | 5800/10000 [11:04<08:01,  8.72it/s, Loss=0.2403810]
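
The allocator message itself suggests trying max_split_size_mb to avoid fragmentation. A sketch of how I could set that before training (the value 128 is just a guess; PYTORCH_CUDA_ALLOC_CONF has to be read before the CUDA allocator initializes, so it must be set before importing torch, or exported in the shell before launching mytrain.py):

import os

# Must run before torch initializes its CUDA caching allocator.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # 128 is a guess

import torch  # imported only after setting the env var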

How can I make it work?
