The vector mask load instruction does not work correctly. With VLEN=256 and SEW=8, LMUL=8 configured beforehand, a 32-byte mask should be loaded into the mask register, but only 4 bytes are actually loaded. The cause is the way micro_vl is computed in the Vlm_v constructor:
int32_t micro_vl = (std::min(this->vl, micro_vlmax) + 7) / 8;
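Although this->vl is 256, micro_vlmax is calculated to be 32, so std::min clamps the value to 32 and micro_vl becomes (32 + 7) / 8 = 4 instead of the expected (256 + 7) / 8 = 32.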
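
The arithmetic can be reproduced outside the simulator. The following is a minimal standalone sketch, not the actual constructor code: the helper names maskBytesClamped and maskBytesExpected are hypothetical, and the value 32 for micro_vlmax is simply taken from the observation above. The "expected" helper assumes the mask load should transfer ceil(vl / 8) bytes, which is what the 32-byte figure in this report corresponds to.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: mirrors the expression quoted from the constructor,
// where vl is clamped to micro_vlmax before converting bits to bytes.
constexpr int32_t maskBytesClamped(int32_t vl, int32_t micro_vlmax)
{
    return (std::min(vl, micro_vlmax) + 7) / 8;
}

// Hypothetical helper: byte count if vl is converted to bytes directly,
// i.e. ceil(vl / 8), without clamping to micro_vlmax first.
constexpr int32_t maskBytesExpected(int32_t vl)
{
    return (vl + 7) / 8;
}

// Values from this report: vl = 256, micro_vlmax = 32.
static_assert(maskBytesClamped(256, 32) == 4, "clamped form yields 4 bytes");
static_assert(maskBytesExpected(256) == 32, "unclamped form yields 32 bytes");
```

The two forms agree whenever vl does not exceed micro_vlmax, which may be why the problem only shows up with a large LMUL.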