Re #77 (comment), I would like to propose a new driver that handles this slightly differently. It requires more work up front, but I think it could allow for more customization and future-proofing in the long run. @richfitz, if you like the idea, please let me know and I will write a more thorough design document.
Initialization
The proposed multiformat driver accepts a custom read/write protocol on initialization. The default format is RDS, and storr_multiformat() with an empty formats argument should behave like storr_rds().
s <- storr_multiformat(
  path,
  formats = storr_format_protocol(
    storr_format(
      class = c("keras.engine.sequential.Sequential"),
      extension = "keras",
      hash = "object", # Serialize and hash the in-memory object.
      serialize = function(object) {
        keras::serialize_model(object)
      },
      unserialize = function(raw) {
        keras::unserialize_model(raw)
      },
      read = function(path) {
        readRDS(file = path)
      },
      write = function(object, path) {
        saveRDS(object = object, file = path)
      }
    ),
    storr_format(
      extension = "fst",
      class = "data.frame",
      hash = "file", # Hash the file, not the in-memory data. Avoids serialization.
      read = function(path) {
        # Read in fst format.
        fst::read_fst(path)
      },
      write = function(object, path) {
        # Write in fst format.
        fst::write_fst(object, path)
      }
    )
  )
)
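To make the shape of the protocol concrete, here is a minimal sketch of what the two constructors could look like. Neither function exists in storr today; the argument names follow the call above, and the `"ANY"` catch-all class and the automatically appended RDS fallback are assumptions of mine, not settled design.

```r
# Hypothetical constructor: bundles everything the driver needs for one format.
storr_format <- function(class, extension, hash = c("object", "file"),
                         read, write, serialize = NULL, unserialize = NULL) {
  hash <- match.arg(hash)
  stopifnot(
    is.character(class), length(class) >= 1L,
    is.character(extension), length(extension) == 1L,
    is.function(read), is.function(write)
  )
  structure(
    list(
      class = class, extension = extension, hash = hash,
      read = read, write = write,
      serialize = serialize, unserialize = unserialize
    ),
    class = "storr_format"
  )
}

# Hypothetical protocol: user formats come first, the RDS default comes last,
# so an empty call leaves only RDS (assumed to mirror storr_rds()).
storr_format_protocol <- function(...) {
  formats <- list(...)
  stopifnot(all(vapply(formats, inherits, logical(1), what = "storr_format")))
  rds <- storr_format(
    class = "ANY", # assumed catch-all marker, not a real S3 class
    extension = "rds",
    hash = "object",
    read = function(path) readRDS(path),
    write = function(object, path) saveRDS(object, path)
  )
  formats <- c(formats, list(rds))
  extensions <- vapply(formats, function(f) f$extension, character(1))
  # Extensions identify the format on retrieval, so they must be unique.
  stopifnot(!anyDuplicated(extensions))
  structure(formats, class = "storr_format_protocol")
}
```

Keeping the RDS entry last means a class-based lookup can simply take the first match and still always find something.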
We could store the format protocol in an R script that gets source()d when we call storr_multiformat() on an existing storr.
path/
├── config/
├───── formats.R
├───── hash_algorithm
├───── mangle_key
├───── version
├── data/
├── keys/
└── scratch/
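One way the `formats.R` round trip could work is sketched below. `save_formats()` and `load_formats()` are hypothetical helper names, and the approach assumes the read/write functions are self-contained: `deparse()` keeps the function code but drops the enclosing environment, so closures over local variables would not survive.

```r
# Hypothetical helper: write the protocol to config/formats.R as R code.
save_formats <- function(formats, path) {
  config <- file.path(path, "config")
  dir.create(config, recursive = TRUE, showWarnings = FALSE)
  writeLines(deparse(formats), file.path(config, "formats.R"))
  invisible(formats)
}

# Hypothetical helper: source() the script and return the value of its
# last (and only) expression, which is the protocol itself.
load_formats <- function(path) {
  script <- file.path(path, "config", "formats.R")
  source(script, local = new.env(parent = globalenv()))$value
}
```

Because the script is executed on load, opening a storr from an untrusted source would run arbitrary code, which may be worth weighing in the design document.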
If a multiformat storr already exists at the given path, the user should not be allowed to set the formats argument.
s <- storr_multiformat(
  path,
  formats = storr_format_protocol(storr_format(...))
)
#> Error: cannot set formats of an existing multiformat storr.
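The check itself could be as small as the sketch below. `check_formats_arg()` is a hypothetical name, and treating the presence of `config/formats.R` as the marker of an existing multiformat storr is an assumption.

```r
# Hypothetical guard, to be called at the top of storr_multiformat().
check_formats_arg <- function(path, formats = NULL) {
  exists_already <- file.exists(file.path(path, "config", "formats.R"))
  if (exists_already && !is.null(formats)) {
    stop("cannot set formats of an existing multiformat storr.", call. = FALSE)
  }
  # TRUE tells the caller to load the stored protocol instead.
  invisible(exists_already)
}
```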
Storage
s$set(key, value) could
- Choose the most appropriate format for value given its S3 class.
- If hash is equal to "object" for the given format, serialize and hash value in memory.
- Save the object to a temporary file in scratch/.
- If hash is equal to "file", hash the temporary file without having serialized anything.
- Move the file to HASH.EXT, where EXT is the file extension we gave in the protocol.
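The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the driver: `choose_format()` and `set_value()` are hypothetical names, formats are treated as plain lists with the fields from the protocol, `"ANY"` is an assumed catch-all class, the digest package is assumed to be installed, and the key-to-hash bookkeeping in keys/ is left out.

```r
# Pick the first format whose class matches the value's S3 class.
choose_format <- function(value, formats) {
  for (format in formats) {
    if (identical(format$class, "ANY") || inherits(value, format$class)) {
      return(format)
    }
  }
  stop("no format matches class ", paste(class(value), collapse = "/"))
}

set_value <- function(path, value, formats, algo = "md5") {
  format <- choose_format(value, formats)
  scratch <- file.path(path, "scratch")
  data <- file.path(path, "data")
  dir.create(scratch, recursive = TRUE, showWarnings = FALSE)
  dir.create(data, recursive = TRUE, showWarnings = FALSE)

  hash <- NULL
  if (identical(format$hash, "object")) {
    # Serialize in memory (custom serializer if given) and hash the bytes.
    bytes <- if (is.null(format$serialize)) {
      serialize(value, NULL)
    } else {
      format$serialize(value)
    }
    hash <- digest::digest(bytes, algo = algo, serialize = FALSE)
  }

  # Write to scratch/ first so a failed write never leaves a partial data file.
  tmp <- tempfile(tmpdir = scratch)
  format$write(value, tmp)

  if (identical(format$hash, "file")) {
    # Hash the file on disk; the value itself is never serialized.
    hash <- digest::digest(tmp, algo = algo, file = TRUE)
  }

  # Move the finished file to data/HASH.EXT.
  file.rename(tmp, file.path(data, paste0(hash, ".", format$extension)))
  hash
}
```

Writing to scratch/ and then renaming keeps data/ free of half-written files, at the cost of requiring both directories to sit on the same filesystem.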
Retrieval
s$get(key) could
- Get the file extension of the data file.
- Identify the format in which it was originally saved.
- Read the data using the read function in the protocol.
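A matching sketch of the retrieval side, with the same caveats: `get_value()` is a hypothetical name, formats are plain lists, and the hash is assumed to have been resolved from the key already.

```r
get_value <- function(path, hash, formats) {
  # Locate data/HASH.EXT without knowing EXT in advance.
  file <- list.files(
    file.path(path, "data"),
    pattern = paste0("^", hash, "\\."),
    full.names = TRUE
  )
  stopifnot(length(file) == 1L)

  # The extension identifies the format the value was saved in.
  extension <- tools::file_ext(file)
  extensions <- vapply(formats, function(f) f$extension, character(1))
  i <- match(extension, extensions)
  if (is.na(i)) {
    stop("no format registered for extension '", extension, "'")
  }

  formats[[i]]$read(file)
}
```

This lookup is the reason extensions need to be unique within a protocol.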