We need to finalize the column data types before saving .pkl files to ensure compatibility and efficiency during training.
Decisions:
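- userID and profileID → int64 (torch.long). Open question: how to embed gender together with them?
Required by nn.Embedding. float32 indices raise a runtime error, and int32 indices are rejected by older PyTorch releases, so int64 is the safe choice. Still need to a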
- rating → float32
Compatible with common losses like MSELoss. Using float16 is risky unless the entire model is converted and hardware supports it.
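- gender → categorical (object or category in pandas)
If used as input to an embedding, encode it with .cat.codes and cast the result to int64. .cat.codes returns a small integer dtype (int8 when there are few categories), so the cast must be explicit.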
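The decisions above could be applied in pandas roughly as follows. This is a minimal sketch, not the project's actual pipeline: the toy rows, the `finalize_dtypes` helper, the `gender_code` column, and the `ratings.pkl` file name are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data; the real column contents are assumptions.
df = pd.DataFrame({
    "userID": [1, 2, 3],
    "profileID": [10, 20, 10],
    "rating": [5, 3, 8],
    "gender": ["F", "M", "F"],
})


def finalize_dtypes(frame: pd.DataFrame) -> pd.DataFrame:
    """Cast columns to the dtypes chosen in the decisions list (illustrative helper)."""
    out = frame.copy()
    # ID columns feed nn.Embedding, so store them as int64.
    out["userID"] = out["userID"].astype("int64")
    out["profileID"] = out["profileID"].astype("int64")
    # Ratings are regression targets, so store them as float32.
    out["rating"] = out["rating"].astype("float32")
    # Keep gender as a pandas categorical and add an int64 code column.
    out["gender"] = out["gender"].astype("category")
    # .cat.codes is int8 for a handful of categories, hence the explicit cast.
    out["gender_code"] = out["gender"].cat.codes.astype("int64")
    return out


typed = finalize_dtypes(df)
print(typed.dtypes)
# typed.to_pickle("ratings.pkl")  # hypothetical file name
```

Keeping both `gender` and `gender_code` preserves the readable labels in the saved file while giving the model an index column it can consume directly.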
Notes:
- float32 is not valid for ID columns, even if the values would numerically fit.
- Avoid int8/uint32 for training; they must be cast anyway and offer no gain.
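A quick check of the dtype claims on the PyTorch side might look like this. It is a sketch under assumptions: the embedding sizes, the sample IDs, and the way a prediction is formed from the embedding are arbitrary and only serve to exercise the dtypes.

```python
import torch
import torch.nn as nn

# Arbitrary sizes, chosen only for the demonstration.
emb = nn.Embedding(num_embeddings=10, embedding_dim=4)

# int64 (torch.long) indices are the standard input for nn.Embedding.
ids_long = torch.tensor([1, 2, 3], dtype=torch.int64)
vectors = emb(ids_long)  # shape (3, 4), dtype float32

# float32 indices are rejected, even though the values are whole numbers.
ids_float = ids_long.to(torch.float32)
try:
    emb(ids_float)
    float_ok = True
except (RuntimeError, TypeError):
    float_ok = False

# float32 targets match the default float32 model output for MSELoss.
pred = vectors.sum(dim=1)  # stand-in for a real prediction head
target = torch.tensor([5.0, 3.0, 8.0], dtype=torch.float32)
loss = nn.MSELoss()(pred, target)

print(vectors.dtype, float_ok, loss.dtype)
```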