After doing some profiling, I think the solution to #109 will not run sufficiently quickly. For ASO, the covariance matrix is expected to be O(80k x 80k), or about 50 GB. On one Tiger node (112 cores), direct inversion of such an array takes ~8.5 minutes, but that resource allocation is probably not what will be used for running a chain (e.g., a single chain task typically has 1-10 cores). Dedicating 112 cores to such a task may waste a lot of resources, but waiting O(2 hours) on 10 cores to invert the matrix before starting a chain is prohibitive, and also wasteful if done repeatedly.
The motivation for passing the covariance to mflike is that cuts can be applied dynamically. However, in the vast majority of cases two things hold: the cuts are repeated over many instances of mflike (usually they are decided on as a group and then set as a standard for many runs), and/or the number of elements cut is much smaller than the total data vector (if we restrict to TT/TE/EE to begin with, i.e., we aren't also cutting B crosses). This suggests two solutions:
1. mflike should be able to load an inverse covariance that has been "pre-cut" for some standard cuts, along with the info necessary to apply the same cuts to the forward-modeled data vector. This assumes the inverse covariance has also been precomputed, which may be challenging for a public user. It also doesn't support dynamic cuts: the cuts are fixed to whatever went into the precomputed inverse covariance.
2. mflike can load the entire, uncut inverse covariance matrix and, assuming the number of elements to be cut is small, use the Schur complement to compute the inverse covariance of the cut data vector on the fly. This requires inverting only the subsection of the full inverse covariance that corresponds to the cut elements; if that subsection is small, the computation should be fast. Of course, if we are always cutting B crosses, that subsection won't be small, so there should be a flag to signal that we are restricting ourselves to a universe without B crosses in the first place. Overall, I think this solution is better in the long term.
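To make solution (2) concrete, here is a minimal NumPy sketch of the Schur-complement update. The function name and call signature are illustrative, not mflike's actual interface; the only input assumed is the full inverse covariance and the indices being cut.

```python
import numpy as np

def cut_inverse_covariance(icov, cut):
    """Inverse covariance of the kept data vector via the Schur complement.

    If P = C^{-1} is partitioned over kept (A) and cut (B) indices, then
        (C_AA)^{-1} = P_AA - P_AB P_BB^{-1} P_BA,
    so only the small |B| x |B| block P_BB needs to be factorized,
    never the full O(80k x 80k) matrix.
    """
    n = icov.shape[0]
    keep = np.setdiff1d(np.arange(n), cut)
    P_AA = icov[np.ix_(keep, keep)]
    P_AB = icov[np.ix_(keep, cut)]
    P_BB = icov[np.ix_(cut, cut)]
    # icov is symmetric, so P_BA = P_AB.T; use solve() rather than an
    # explicit inverse of P_BB for numerical stability.
    return P_AA - P_AB @ np.linalg.solve(P_BB, P_AB.T)
```

The cost is dominated by the |B| x |B| solve and the (n - |B|) x |B| matrix products, which is why this only pays off when the cut is small relative to the data vector.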
This should be accompanied by an investigation into the numerical stability of large matrix inversion, and the additional computation required for solution (2).