In graphium/features/featurizer.py, line 634:
def mol_to_adj_and_features(
    mol: Union[str, dm.Mol],
    atom_property_list_onehot: List[str] = [],
    atom_property_list_float: List[Union[str, Callable]] = [],
    conformer_property_list: List[str] = [],
    edge_property_list: List[str] = [],
    add_self_loop: bool = False,
    explicit_H: bool = False,
    use_bonds_weights: bool = False,
    pos_encoding_as_features: Dict[str, Any] = None,
    dtype: np.dtype = np.float16,
    mask_nan: Union[str, float, type(None)] = "raise",
) -> Union[
    coo_matrix,
    Union[Tensor, None],
    Union[Tensor, None],
    Dict[str, Tensor],
    Union[Tensor, None],
    Dict[str, Tensor],
]:
graphium seems to use np.float16 as the default dtype for this method. However, mol_to_adj_and_features calls
def mol_to_adjacency_matrix(
    mol: dm.Mol,
    use_bonds_weights: bool = False,
    add_self_loop: bool = False,
    dtype: np.dtype = np.float32,
) -> coo_matrix:
(line 791), which has a default dtype of np.float32.
The problem is that in mol_to_adjacency_matrix, the adjacency matrix is converted to a sparse array:
if len(adj_val) > 0:  # ensure tensor is not empty
    adj = coo_matrix(
        (torch.as_tensor(adj_val), torch.as_tensor(adj_idx).T.reshape(2, -1)),
        shape=(mol.GetNumAtoms(), mol.GetNumAtoms()),
        dtype=dtype,
    )
In my environment, this raises:
ValueError: scipy.sparse does not support dtype float16. The only supported types are: bool, int8, uint8, int16, uint16, int32, uint32, int64, uint64, longlong, ulonglong, float32, float64, longdouble, complex64, complex128, clongdouble.
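For reproduction, a call as simple as the following should be enough to hit it (a minimal sketch, assuming all defaults and any valid SMILES string):

```python
from graphium.features.featurizer import mol_to_adj_and_features

# With all defaults, dtype is np.float16; it is forwarded to
# mol_to_adjacency_matrix and ultimately to scipy's coo_matrix.
mol_to_adj_and_features("CCO")  # raises the ValueError above
```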
As far as I know, this has been discussed in scipy (scipy/scipy#7408) and in recent versions the checks have become stronger (scipy/scipy#20207).
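The restriction is easy to confirm independently of graphium; with a recent scipy, constructing a sparse matrix with dtype=np.float16 fails the same way (a standalone check, not graphium code):

```python
import numpy as np
from scipy.sparse import coo_matrix

data = np.array([1.0, 1.0])
row = np.array([0, 1])
col = np.array([1, 0])

# On recent scipy versions this raises:
# ValueError: scipy.sparse does not support dtype float16. ...
coo_matrix((data, (row, col)), shape=(2, 2), dtype=np.float16)
```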
I believe this can be fixed simply by defaulting to np.float32 instead. However, if keeping small dtypes for memory efficiency is important, a workaround would be more involved; see the sketch below.
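For illustration only, one shape such a workaround could take (the helper name `_coo_safe_dtype` is hypothetical, not graphium's actual code) is to substitute a scipy-supported dtype just for the sparse adjacency while keeping the requested dtype for the dense feature tensors:

```python
import numpy as np

def _coo_safe_dtype(dtype) -> np.dtype:
    # scipy.sparse rejects float16; fall back to float32 for the sparse
    # adjacency only, leaving the caller's dtype in place elsewhere.
    dtype = np.dtype(dtype)
    return np.dtype(np.float32) if dtype == np.float16 else dtype

# Inside mol_to_adjacency_matrix, the construction would then become
# something like:
#   adj = coo_matrix(
#       (adj_val, adj_idx.T.reshape(2, -1)),
#       shape=(mol.GetNumAtoms(), mol.GetNumAtoms()),
#       dtype=_coo_safe_dtype(dtype),
#   )
```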