
Assess performance bottlenecks of the gather instruction in JVector #632

@r-devulap


Intel processors up to the Ice Lake generation are affected by a security vulnerability (Gather Data Sampling, CVE-2022-40982). The microcode patch that mitigates it makes the gather instruction significantly slower. Intel advisory: https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00828.html. JVector uses gather instructions in multiple places that are worth looking into:

float assemble_and_sum_f32_512(const float* data, int dataBase, const unsigned char* baseOffsets, int baseOffsetsOffset, int baseOffsetsLength) {

float pq_decoded_cosine_similarity_f32_512(const unsigned char* baseOffsets, int baseOffsetsOffset, int baseOffsetsLength, int clusterCount, const float* partialSums, const float* aMagnitude, float bMagnitude) {

Ref: other libraries (e.g., NumPy’s x86-simd-sort) have improved performance by replacing gather with scalar loads: numpy/x86-simd-sort#65
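
To make the scalar-load idea concrete, here is a rough, portable C sketch of what a gather-free version of assemble_and_sum_f32_512 might look like. It is not JVector's code: the function name assemble_and_sum_scalar_loads is hypothetical, and the indexing (data[dataBase * i + baseOffsets[baseOffsetsOffset + i]]) is an assumption about what the gather computes. The 16 accumulators mirror the 16 float lanes of a 512-bit register, so each lane is filled by an ordinary scalar load and the adds can still be vectorized by the compiler.

```c
#define LANES 16 /* float lanes in one 512-bit register */

/* Hypothetical gather-free variant: every element is fetched with a plain
 * scalar load instead of a vector gather. The index formula is an assumption
 * about the original kernel, not taken from JVector's source. */
float assemble_and_sum_scalar_loads(const float *data, int dataBase,
                                    const unsigned char *baseOffsets,
                                    int baseOffsetsOffset, int baseOffsetsLength)
{
    float acc[LANES] = {0.0f}; /* one partial sum per lane */
    int i = 0;

    /* Full blocks of LANES elements: scalar loads, lane-wise accumulation. */
    for (; i + LANES <= baseOffsetsLength; i += LANES) {
        for (int lane = 0; lane < LANES; ++lane) {
            int idx = dataBase * (i + lane)
                    + baseOffsets[baseOffsetsOffset + i + lane];
            acc[lane] += data[idx];
        }
    }

    /* Horizontal reduction of the lane accumulators. */
    float sum = 0.0f;
    for (int lane = 0; lane < LANES; ++lane) {
        sum += acc[lane];
    }

    /* Tail: remaining elements that do not fill a whole block. */
    for (; i < baseOffsetsLength; ++i) {
        sum += data[dataBase * i + baseOffsets[baseOffsetsOffset + i]];
    }
    return sum;
}
```

Whether this beats the patched gather on affected CPUs would need to be benchmarked. The summation order differs from a purely sequential loop, so results may differ in the last bits for non-integer data.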
