Define proper data types for training with PyTorch #4

Description

@DePoli97

We need to finalize the column data types before saving .pkl files to ensure compatibility and efficiency during training.

Decisions:

  • userID and profileID → int64 (torch.long). Open question: how to embed gender together with them?
    Required by nn.Embedding, which only accepts integer index tensors. float32 indices raise a runtime error; recent PyTorch releases also accept int32, but int64 is the conventional choice. Still need to a
  • rating → float32
    Compatible with common losses like MSELoss. Using float16 is risky unless the entire model is converted and the hardware supports it.
  • gender → categorical (category in pandas; an object column has to be converted with .astype("category") first)
    Should be encoded as int64 via .cat.codes if used as input to an embedding.
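
A minimal sketch of how the decisions above could be applied in pandas before pickling. The sample rows and the `gender_code` column name are assumptions made for illustration; only the column names `userID`, `profileID`, `rating` and `gender` come from this issue.

```python
import pandas as pd

# Hypothetical sample rows; the real data is not shown in this issue.
df = pd.DataFrame({
    "userID": [1, 2, 3],
    "profileID": [10, 20, 10],
    "rating": [7, 3, 10],
    "gender": ["F", "M", "F"],
})

# IDs as int64 (maps to torch.long), rating as float32.
df = df.astype({"userID": "int64", "profileID": "int64", "rating": "float32"})

# gender as a pandas categorical, plus an integer code column for embeddings.
df["gender"] = df["gender"].astype("category")
# .cat.codes picks the smallest signed integer type that fits (int8 here),
# so the cast to int64 has to be explicit.
df["gender_code"] = df["gender"].cat.codes.astype("int64")

print(df.dtypes)
```

Note that `.cat.codes` assigns -1 to missing values, which is not a valid embedding index, so missing genders would need their own category before encoding.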

Notes:

  • float32 is not valid for ID columns, even if values would numerically fit.
  • Avoid int8/uint32 for training — they must be cast anyway and offer no gain.
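
The dtype requirements can be checked with a small PyTorch sketch. The embedding size, the linear head and the sample values are assumptions chosen only to make the example run; they are not the project's model.

```python
import torch
import torch.nn as nn

# Hypothetical, already zero-based IDs and ratings.
user_ids = torch.tensor([0, 2, 1], dtype=torch.long)
ratings = torch.tensor([7.0, 3.0, 10.0], dtype=torch.float32)

embedding = nn.Embedding(num_embeddings=3, embedding_dim=4)
head = nn.Linear(4, 1)

# Long indices go into the embedding; the float32 prediction matches the
# float32 target expected by MSELoss.
pred = head(embedding(user_ids)).squeeze(-1)
loss = nn.MSELoss()(pred, ratings)

# Float indices are rejected by nn.Embedding.
try:
    embedding(user_ids.to(torch.float32))
    float_ids_rejected = False
except (RuntimeError, TypeError):
    float_ids_rejected = True

print(pred.dtype, loss.item(), float_ids_rejected)
```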
