errors with DELF-MLP #23

@duncanmcelfresh

Errors with DELF-MLP. These occurred on a GCP n1-highmem-2 node with one Tesla T4 GPU.

MemoryError during model initialization on these datasets:

  • AmazonMoviesTV
  • GoogleLocalReviews
  • Gowalla
  • YahooMusicReader
DELF_MLP_RecommenderWrapper: Init model DELF-MLP...
Traceback (most recent call last):
  File "/home/shared/reczilla/RecSys2019_DeepLearning_Evaluation/ParameterTuning/SearchAbstractClass.py", line 402, in _objective_function
    result_dict, result_string, recommender_instance, train_time, evaluation_time = self._evaluate_on_validation(current_fit_parameters_dict)
  File "/home/shared/reczilla/RecSys2019_DeepLearning_Evaluation/ParameterTuning/RandomSearch.py", line 50, in _evaluate_on_validation
    current_fit_parameters
  File "/home/shared/reczilla/RecSys2019_DeepLearning_Evaluation/ParameterTuning/SearchAbstractClass.py", line 293, in _evaluate_on_validation
    recommender_instance, train_time = self._fit_model(current_fit_parameters)
  File "/home/shared/reczilla/RecSys2019_DeepLearning_Evaluation/ParameterTuning/SearchAbstractClass.py", line 283, in _fit_model
    **current_fit_parameters)
  File "/home/shared/reczilla/RecSys2019_DeepLearning_Evaluation/Conferences/IJCAI/DELF_our_interface/DELFWrapper.py", line 104, in fit
    self.train_arr = self.train.toarray()
  File "/home/shared/miniconda3/envs/reczilla/lib/python3.6/site-packages/scipy/sparse/base.py", line 881, in toarray
    return self.tocoo(copy=False).toarray(order=order, out=out)
  File "/home/shared/miniconda3/envs/reczilla/lib/python3.6/site-packages/scipy/sparse/coo.py", line 317, in toarray
    B = self._process_toarray_args(order, out)
  File "/home/shared/miniconda3/envs/reczilla/lib/python3.6/site-packages/scipy/sparse/base.py", line 1187, in _process_toarray_args
    return np.zeros(self.shape, dtype=self.dtype, order=order)
MemoryError
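
The traceback above fails inside `self.train.toarray()`, which asks NumPy for one dense block of `n_rows * n_cols` elements. A minimal sketch, assuming the training matrix is a SciPy sparse matrix, of how to estimate that allocation up front and densify only a slice of rows at a time. The helper names `dense_nbytes` and `iter_dense_rows` and the toy matrix are hypothetical and not part of DELFWrapper.py:

```python
import numpy as np
import scipy.sparse as sp


def dense_nbytes(matrix):
    """Bytes that matrix.toarray() would have to allocate in one block."""
    n_rows, n_cols = matrix.shape
    return n_rows * n_cols * matrix.dtype.itemsize


def iter_dense_rows(matrix, batch_size):
    """Yield (start_row, dense_block), densifying batch_size rows at a time."""
    csr = matrix.tocsr()  # CSR supports efficient row slicing
    for start in range(0, csr.shape[0], batch_size):
        yield start, csr[start:start + batch_size].toarray()


# Toy stand-in for the training matrix (hypothetical sizes).
urm = sp.random(1000, 500, density=0.01, format="csr",
                dtype=np.float32, random_state=0)

full_bytes = dense_nbytes(urm)
blocks = list(iter_dense_rows(urm, batch_size=256))
print(f"full dense copy: {full_bytes} bytes, {len(blocks)} row blocks")
```

Whether the model can consume row blocks instead of one dense array depends on how `train_arr` is used later in `fit`, which the traceback does not show.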

The following MemoryError occurred during training on:

  • Dating
  • NetflixPrize
DELF_MLP_RecommenderWrapper: Init model... done!
DELF_MLP_RecommenderWrapper: Training...
initial result file: /home/shared/result_20220728_222141_metadata.zip
renaming to: /home/shared/result.zip
Traceback (most recent call last):
  File "/home/shared/reczilla/RecSys2019_DeepLearning_Evaluation/ParameterTuning/SearchAbstractClass.py", line 402, in _objective_function
    result_dict, result_string, recommender_instance, train_time, evaluation_time = self._evaluate_on_validation(current_fit_parameters_dict)
  File "/home/shared/reczilla/RecSys2019_DeepLearning_Evaluation/ParameterTuning/RandomSearch.py", line 50, in _evaluate_on_validation
    current_fit_parameters
  File "/home/shared/reczilla/RecSys2019_DeepLearning_Evaluation/ParameterTuning/SearchAbstractClass.py", line 293, in _evaluate_on_validation
    recommender_instance, train_time = self._fit_model(current_fit_parameters)
  File "/home/shared/reczilla/RecSys2019_DeepLearning_Evaluation/ParameterTuning/SearchAbstractClass.py", line 283, in _fit_model
    **current_fit_parameters)
  File "/home/shared/reczilla/RecSys2019_DeepLearning_Evaluation/Conferences/IJCAI/DELF_our_interface/DELFWrapper.py", line 135, in fit
    **earlystopping_kwargs)
  File "/home/shared/reczilla/RecSys2019_DeepLearning_Evaluation/Base/Incremental_Training_Early_Stopping.py", line 177, in _train_with_early_stopping
    self._run_epoch(epochs_current)
  File "/home/shared/reczilla/RecSys2019_DeepLearning_Evaluation/Conferences/IJCAI/DELF_our_interface/DELFWrapper.py", line 159, in _run_epoch
    user_input, item_input, labels = unison_shuffled_copies(np.asarray(user_input), np.asarray(item_input), np.asarray(labels))
  File "/home/shared/reczilla/RecSys2019_DeepLearning_Evaluation/Conferences/IJCAI/DELF_our_interface/DELFWrapper.py", line 294, in unison_shuffled_copies
    return a[p], b[p], c[p]
MemoryError
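
Here the failure is in `return a[p], b[p], c[p]`: indexing with a full permutation builds a complete shuffled copy of each array while the originals are still alive. A minimal sketch, assuming the training loop can consume mini-batches, that draws one permutation and copies only one batch at a time. The function `iter_shuffled_batches` and the toy arrays are hypothetical and not taken from DELFWrapper.py:

```python
import numpy as np


def iter_shuffled_batches(arrays, batch_size, seed=None):
    """Yield aligned mini-batches from equally long arrays in shuffled order.

    Only one batch per array is copied at a time, instead of a full
    shuffled copy of every array.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same length")
    perm = np.random.default_rng(seed).permutation(n)
    for start in range(0, n, batch_size):
        idx = perm[start:start + batch_size]
        yield tuple(a[idx] for a in arrays)


# Toy data standing in for user_input, item_input and labels.
user_input = np.arange(10)
item_input = np.arange(10) * 10
labels = np.arange(10) % 2

batches = list(iter_shuffled_batches((user_input, item_input, labels),
                                     batch_size=4, seed=0))
```

The permutation itself still takes 8 bytes per sample, so this reduces peak memory but does not remove the dependence on dataset size.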

The following TensorFlow out-of-memory error occurred on:

  • RecipesReader
[2022-07-28 23:14:05,790] [RandomSearch.py:_log_info] : RandomSearch: Starting parameter set

WARNING:tensorflow:From /home/shared/miniconda3/envs/reczilla/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
[2022-07-28 23:14:33,842] [deprecation.py:new_func] : From /home/shared/miniconda3/envs/reczilla/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/shared/reczilla/RecSys2019_DeepLearning_Evaluation/Conferences/IJCAI/DELF_original/Model/NMF_attention_MLP.py:210: calling reduce_sum_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
[2022-07-28 23:14:34,055] [deprecation.py:new_func] : From /home/shared/reczilla/RecSys2019_DeepLearning_Evaluation/Conferences/IJCAI/DELF_original/Model/NMF_attention_MLP.py:210: calling reduce_sum_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From /home/shared/miniconda3/envs/reczilla/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
[2022-07-28 23:14:34,559] [deprecation.py:new_func] : From /home/shared/miniconda3/envs/reczilla/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Traceback (most recent call last):
  File "/home/shared/miniconda3/envs/reczilla/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/shared/miniconda3/envs/reczilla/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/shared/miniconda3/envs/reczilla/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[64034,1,231637] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
         [[{{node predict/GatherV2}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
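
For scale, the tensor named in the error can be sized directly from its shape. A quick check, assuming TensorFlow's `float` here means 32-bit (4 bytes per element), puts the single allocation at roughly 55 GiB, which no n1-highmem-2 node could satisfy from host memory:

```python
import numpy as np

# Shape copied from the ResourceExhaustedError message above.
shape = (64034, 1, 231637)
# Assumption: TensorFlow "float" is float32, i.e. 4 bytes per element.
bytes_per_element = np.dtype(np.float32).itemsize

n_elements = 1
for dim in shape:
    n_elements *= dim  # Python ints, so no overflow

n_bytes = n_elements * bytes_per_element
n_gib = n_bytes / 2**30
print(f"{n_elements:,} elements, {n_bytes:,} bytes, {n_gib:.1f} GiB")
```

The failing node is `predict/GatherV2`, so the allocation happens at prediction time rather than during training.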

Failed for an unknown reason on:

  • Anime
  • BookCrossingReader
  • Epinions
  • Jester2
  • MovieTweetings
  • Movielens10M
