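The current implementation of KryoCoder writes the class for every object to the output stream (`val bytes = kryoPool.toBytesWithClass(value)` in scalding/scalding-beam/src/main/scala/com/twitter/scalding/beam_backend/KryoCoder.scala, line 16 at b0ba993).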
This was done because Beam can split the stream at an arbitrary point: if the registration appears only at the beginning of the stream, decoding the later part of the stream will fail. However, we don't want to write the class name for classes that are already registered.
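We can call `setRegistrationRequired(true)` when creating the Instantiator (`implicit val kryoCoder: KryoCoder = new KryoCoder(defaultKryoCoderConfiguration(config))` in scalding/scalding-beam/src/main/scala/com/twitter/scalding/beam_backend/BeamBackend.scala, line 22 at b0ba993).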
Then KryoCoder can keep a map of which classes have a registration available (we can call `Try { pool.hasRegistration }` once per class and cache the result for later lookups). For those classes we use `kryoPool.toBytesWithoutClass`, and for all others we use `kryoPool.toBytesWithClass`.
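A minimal sketch of the encode-side choice described above, not the actual KryoCoder code. `PoolLike` and `RegistrationAwareEncoder` are hypothetical names; `PoolLike` is a stand-in trait for the three pool methods mentioned in this issue so that the sketch compiles without Kryo on the classpath. It only shows the cached lookup and the branch between the two write methods; how the decoding side learns which class to read for objects written without a class is left open here.

```scala
import scala.collection.concurrent.TrieMap
import scala.util.Try

// Hypothetical stand-in for the pool methods named in this issue
// (hasRegistration, toBytesWithClass, toBytesWithoutClass).
trait PoolLike {
  def hasRegistration(cls: Class[_]): Boolean
  def toBytesWithClass(value: AnyRef): Array[Byte]
  def toBytesWithoutClass(value: AnyRef): Array[Byte]
}

// Hypothetical helper: picks the write method based on a cached registration lookup.
final class RegistrationAwareEncoder(pool: PoolLike) {
  // Thread-safe cache so the lookup result is reused for each class.
  private val registered = TrieMap.empty[Class[_], Boolean]

  // A failed lookup is treated as "not registered", which falls back to writing the class.
  def isRegistered(cls: Class[_]): Boolean =
    registered.getOrElseUpdate(cls, Try(pool.hasRegistration(cls)).getOrElse(false))

  def encode(value: AnyRef): Array[Byte] =
    if (isRegistered(value.getClass)) pool.toBytesWithoutClass(value)
    else pool.toBytesWithClass(value)
}
```

Treating a failed lookup as "not registered" keeps the fallback safe: the worst case is the current behavior of writing the class.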
Is there a better way to achieve this?