Is this a transformers version problem? TypeError: _get_train_sampler() takes 1 positional argument but 2 were given #6

@QtyIUDSL

[INFO|trainer.py:749] 2025-11-18 21:49:36,652 >> Using auto half precision backend
[WARNING|trainer.py:982] 2025-11-18 21:49:36,653 >> The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 151645, 'pad_token_id': 151643}.
[rank0]: Traceback (most recent call last):
[rank0]: File "/root/autodl-tmp/SGL-main/internvl/train/internvl_chat_pretrain.py", line 892, in <module>
[rank0]: main()
[rank0]: File "/root/autodl-tmp/SGL-main/internvl/train/internvl_chat_pretrain.py", line 877, in main
[rank0]: train_result = trainer.train(resume_from_checkpoint=checkpoint)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/lib/python3.12/site-packages/transformers/trainer.py", line 2325, in train
[rank0]: return inner_training_loop(
[rank0]: ^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/lib/python3.12/site-packages/transformers/trainer.py", line 2375, in _inner_training_loop
[rank0]: train_dataloader = self.get_train_dataloader()
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/lib/python3.12/site-packages/transformers/trainer.py", line 1140, in get_train_dataloader
[rank0]: return self._get_dataloader(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/lib/python3.12/site-packages/transformers/trainer.py", line 1109, in _get_dataloader
[rank0]: dataloader_params["sampler"] = sampler_fn(dataset)
[rank0]: ^^^^^^^^^^^^^^^^^^^

[rank0]: TypeError: _get_train_sampler() takes 1 positional argument but 2 were given

[rank0]:[W1118 21:49:37.788983626 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
E1118 21:49:39.131000 5322 site-packages/torch/distributed/elastic/multiprocessing/api.py:874] failed (exitcode: 1) local_rank: 0 (pid: 5342) of binary: /root/miniconda3/bin/python
Traceback (most recent call last):
File "/root/miniconda3/bin/torchrun", line 8, in <module>
sys.exit(main())
^^^^^^
File "/root/miniconda3/lib/python3.12/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 357, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.12/site-packages/torch/distributed/run.py", line 901, in main
run(args)
File "/root/miniconda3/lib/python3.12/site-packages/torch/distributed/run.py", line 892, in run
elastic_launch(
File "/root/miniconda3/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 143, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 277, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
internvl/train/internvl_chat_pretrain.py FAILED
Failures:
<NO_OTHER_FAILURES>
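
A possible reading of the traceback above: the installed transformers calls `sampler_fn(dataset)` inside `_get_dataloader`, so whatever function is bound as `_get_train_sampler` has to accept a dataset argument in addition to `self`. The error would then mean that a `_get_train_sampler` taking only `self` is in use, for example one that the project overrides or patches in. That is an assumption; the log does not show where the one-argument version comes from. The sketch below reproduces the mismatch with a stand-in class (no transformers import) and shows one way to make an override tolerate both call styles. `NewStyleTrainer`, `old_get_train_sampler`, `compat_get_train_sampler`, and `try_patch` are hypothetical names used only for illustration.

```python
class NewStyleTrainer:
    """Stand-in for a Trainer whose dataloader code calls sampler_fn(dataset),
    as the traceback shows. This is NOT the real transformers class."""

    def __init__(self, train_dataset):
        self.train_dataset = train_dataset

    def _get_train_sampler(self, train_dataset=None):
        return None

    def get_train_dataloader(self):
        sampler_fn = self._get_train_sampler
        # Mirrors the failing line: the dataset is passed positionally.
        return sampler_fn(self.train_dataset)


def old_get_train_sampler(self):
    """Hypothetical override written for a caller that passes no dataset."""
    return ("sampler", len(self.train_dataset))


def compat_get_train_sampler(self, train_dataset=None):
    """Hypothetical override that works with either call style."""
    if train_dataset is None:
        train_dataset = self.train_dataset
    return ("sampler", len(train_dataset))


def try_patch(sampler_fn):
    """Bind sampler_fn as the sampler method and build a dataloader."""

    class Patched(NewStyleTrainer):
        pass

    Patched._get_train_sampler = sampler_fn
    try:
        return Patched([1, 2, 3]).get_train_dataloader()
    except TypeError as exc:
        return f"TypeError: {exc}"


if __name__ == "__main__":
    print(try_patch(old_get_train_sampler))     # same kind of TypeError as in the log
    print(try_patch(compat_get_train_sampler))  # accepts the extra argument
```

If this reading is right, there are two options: give the project's `_get_train_sampler` an optional dataset parameter as in `compat_get_train_sampler`, or install the transformers version the project was written against (check its requirements file; no specific version is claimed here).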
