Two options exist for the memory manager:
- Static memory allocator - if we know the schedule of the graph computations, we can determine at graph finalization exactly when each node goes out of scope. That tells us when its memory can be recycled and used for other nodes. All of this can be done statically, and the memory manager only has to map an id: usize to an offset in an internal preallocated buffer. The size of the buffer is fully known before the function is called, and the buffer can be persistent.
- Example: f = tanh(a + MatrixMul(b, c) + d). In graph or SSA form we will have something like:
n0 = MatrixMul(b, c), n1 = a + n0 + d, n2 = tanh(n1). If all of the tensors are the same size, we can use a single buffer instead, that is, n0, n1 and n2 all point to the same memory location. Since we know f, this can be planned before f is ever called (much like what a standard compiler does when assigning registers).
- Dynamic memory allocator - at run time the function "requests" memory of certain sizes from the memory manager, which keeps its own buffer, probably divided into buckets, and hands out free slots. This, however, requires calling back into the memory manager to free slots when they are no longer needed.
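A minimal Rust sketch of the dynamic option, to make the request/free round trip concrete. It assumes power-of-two size classes as the buckets. `BucketPool`, `request` and `release` are hypothetical names, not taken from any framework, and for brevity the pool grows lazily instead of carving up one preallocated buffer.

```rust
use std::collections::HashMap;

/// Hypothetical run-time pool: one free list ("bucket") per power-of-two size class.
struct BucketPool {
    /// Size class in bytes -> buffers currently free in that class.
    free: HashMap<usize, Vec<Vec<u8>>>,
    /// Total bytes obtained from the system so far; only known as execution proceeds.
    allocated_bytes: usize,
}

impl BucketPool {
    fn new() -> Self {
        BucketPool { free: HashMap::new(), allocated_bytes: 0 }
    }

    /// Hand out a buffer of at least `size` bytes, reusing a free slot if one exists.
    /// A reused buffer keeps its old contents; it is not zeroed again.
    fn request(&mut self, size: usize) -> Vec<u8> {
        let class = size.max(1).next_power_of_two();
        if let Some(buf) = self.free.get_mut(&class).and_then(|bucket| bucket.pop()) {
            return buf;
        }
        self.allocated_bytes += class;
        vec![0u8; class]
    }

    /// The caller has to call back to free the slot once the tensor is dead.
    /// Assumes `buf` came from `request` and was not resized.
    fn release(&mut self, buf: Vec<u8>) {
        self.free.entry(buf.len()).or_default().push(buf);
    }
}
```

Requesting 100 bytes, releasing them, and then requesting 120 bytes reuses the same 128-byte slot. A forgotten `release` simply makes the pool grow, which is the bookkeeping cost the dynamic option carries.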
Of the two, the static allocator (1.) is preferable: it is much more efficient than the dynamic one (2.), and it can tell you before execution how much memory is needed. 1. has been used in MXNet, which is why it achieves the best memory footprints, while 2. is what most other frameworks use.
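The static option can be sketched as a small planning pass over the scheduled graph. This is only a sketch under assumptions: `Node`, `Plan`, `plan` and `tanh_example` are hypothetical names, a first-fit free list stands in for whatever strategy a real framework uses, and the `in_place` flag marks elementwise ops whose output may overwrite an input that dies at that node. That flag is what lets n0, n1 and n2 share one slot in the tanh example.

```rust
/// One node of the finalized graph, listed in execution order (hypothetical layout).
struct Node {
    size: usize,        // bytes needed for this node's output
    inputs: Vec<usize>, // ids of earlier nodes that this node reads
    in_place: bool,     // output may overwrite an input that dies here
}

/// Result of planning: `offsets[id]` is the node's offset into one shared buffer.
struct Plan {
    offsets: Vec<usize>,
    total: usize, // buffer size, known before the function ever runs
}

fn plan(nodes: &[Node]) -> Plan {
    let n = nodes.len();
    // last_use[j] = index of the last node reading j (j itself if nobody reads it).
    let mut last_use: Vec<usize> = (0..n).collect();
    for (i, node) in nodes.iter().enumerate() {
        for &j in &node.inputs {
            last_use[j] = i;
        }
    }

    let mut offsets = vec![0; n];
    let mut free: Vec<(usize, usize)> = Vec::new(); // recyclable (offset, size) regions
    let mut total = 0;
    for (i, node) in nodes.iter().enumerate() {
        // Inputs that go out of scope once node i has executed.
        let mut dying: Vec<usize> =
            node.inputs.iter().copied().filter(|&j| last_use[j] == i).collect();
        dying.sort();
        dying.dedup();

        if node.in_place {
            // Recycle before allocating, so the output can land on a dying input.
            for &j in &dying {
                free.push((offsets[j], nodes[j].size));
            }
        }
        // First fit; any leftover space in a larger region is wasted in this sketch.
        let slot = free.iter().position(|&(_, size)| size >= node.size);
        offsets[i] = match slot {
            Some(k) => free.swap_remove(k).0,
            None => {
                let offset = total;
                total += node.size;
                offset
            }
        };
        if !node.in_place {
            // Inputs must stay intact while node i runs; recycle them afterwards.
            for &j in &dying {
                free.push((offsets[j], nodes[j].size));
            }
        }
    }
    Plan { offsets, total }
}

/// f = tanh(a + MatrixMul(b, c) + d); a, b, c, d are caller-owned inputs.
fn tanh_example() -> Vec<Node> {
    let size = 64; // assumed: every tensor occupies 64 bytes
    vec![
        Node { size, inputs: vec![], in_place: false }, // n0 = MatrixMul(b, c)
        Node { size, inputs: vec![0], in_place: true }, // n1 = a + n0 + d
        Node { size, inputs: vec![1], in_place: true }, // n2 = tanh(n1)
    ]
}
```

For `tanh_example()`, `plan` assigns offset 0 to all three nodes and reports a total of 64 bytes. With `in_place` set to false on every node, the same graph needs two slots (128 bytes), since n1 cannot overwrite n0 while reading it.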