Segfault when querying dense array with variable-length attributes when indices access tiles out of order #2305

@chanjd

Description

Querying a dense TileDB array with variable-length string attributes causes a segfault (SIGSEGV, exit 139) when the index list causes tiles to be accessed out of order.

Confirmed affected: TileDB-Py 0.35.1, 0.36.0, 0.36.1

Workaround: sort the indices before querying, e.g. np.sort(indices).
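
For callers that need the results back in their original order, the workaround could be wrapped roughly as below. This is only a sketch: `query_sorted` and `fetch` are hypothetical names, not part of TileDB-Py. `fetch` stands in for whatever performs the real query (for example a small wrapper around arr.multi_index[...]) and is assumed to return one value per requested index, in the order requested.

```python
def query_sorted(fetch, indices):
    """Call fetch() with ascending indices, then restore the caller's order.

    `fetch` is a hypothetical callable: it takes a list of indices and is
    assumed to return one value per index, in the same order.
    """
    # Positions of `indices`, ordered by the index value they point at.
    order = sorted(range(len(indices)), key=lambda pos: indices[pos])
    sorted_indices = [indices[pos] for pos in order]

    # The underlying query only ever sees ascending indices.
    sorted_values = fetch(sorted_indices)

    # Scatter the values back to the positions the caller asked for.
    values = [None] * len(indices)
    for sorted_pos, original_pos in enumerate(order):
        values[original_pos] = sorted_values[sorted_pos]
    return values


# Stand-in for a real query, used here only to show the reordering.
def fake_fetch(sorted_indices):
    return [f"val_{i}" for i in sorted_indices]


print(query_sorted(fake_fetch, [1001, 999, 1500]))
# ['val_1001', 'val_999', 'val_1500']
```

With NumPy arrays the same idea is usually written with np.argsort and an inverse permutation.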


Conditions required to trigger

All three must be met:

  1. Dense array
  2. Variable-length attributes
  3. At least one index falls in a lower tile than a preceding index (tile access order not monotonically non-decreasing)

Note: descending indices within a single tile are safe. The trigger is stepping back into a lower-numbered tile, not the global sort order of the indices.


Verified cases (tile extent = 1000)

Unsafe:

  • [1001, 999] — tile 1 → tile 0
  • [1500, 500] — tile 1 → tile 0
  • [2500, 500] — tile 2 → tile 0
  • [500, 1500, 300] — tile 0 → tile 1 → tile 0

Safe:

  • [999, 1001] — tile 0 → tile 1 (ascending cross-tile)
  • [500, 1500] — tile 0 → tile 1 (ascending cross-tile)
  • [1500, 1200] — tile 1 → tile 1 (descending, same tile)
  • [500, 200] — tile 0 → tile 0 (descending, same tile)
  • [999, 1001, 1000] — tile 0 → tile 1 → tile 1 (descending within tile 1, but the tile number never decreases)
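
The tile-order condition can be checked before issuing a query. A minimal sketch, assuming a 1-D dimension where the tile number of an index is (index - domain_start) // tile_extent. `tile_order_is_safe` is a hypothetical helper, not a TileDB-Py function, and it only encodes the rule observed in the cases above.

```python
def tile_order_is_safe(indices, tile_extent=1000, domain_start=0):
    """Return True if the tile numbers visited never decrease.

    Assumes a 1-D dimension; the tile number of an index is taken to be
    (index - domain_start) // tile_extent.
    """
    tiles = [(i - domain_start) // tile_extent for i in indices]
    # Safe only when each tile number is >= the one before it.
    return all(a <= b for a, b in zip(tiles, tiles[1:]))


# The verified cases listed above (tile extent = 1000).
unsafe = [[1001, 999], [1500, 500], [2500, 500], [500, 1500, 300]]
safe = [[999, 1001], [500, 1500], [1500, 1200], [500, 200], [999, 1001, 1000]]

assert not any(tile_order_is_safe(case) for case in unsafe)
assert all(tile_order_is_safe(case) for case in safe)
```

A check like this could guard a query and fall back to sorted indices when it returns False.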

MRE — multi_index

import tiledb
import numpy as np
import tempfile

temp_dir = tempfile.mkdtemp()
uri = f'{temp_dir}/dense_varlen'

# tile=1000: tile 0 covers indices 0-999, tile 1 covers 1000-1999
dim = tiledb.Dim(name='idx', domain=(0, 99999), tile=1000, dtype=np.uint32)
attr = tiledb.Attr(name='value', dtype=str)
schema = tiledb.ArraySchema(domain=tiledb.Domain(dim), sparse=False, attrs=[attr])
tiledb.Array.create(uri, schema)

with tiledb.open(uri, 'w') as arr:
    arr[0:10000] = {'value': [f'val_{i}' for i in range(10000)]}

with tiledb.open(uri, 'r') as arr:
    result = arr.multi_index[np.array([1001, 999], dtype=np.uint32)]  # SEGFAULT: tile 1 before tile 0

MRE — .df[]

import tiledb
import numpy as np
import tempfile

temp_dir = tempfile.mkdtemp()
uri = f'{temp_dir}/dense_varlen'

# tile=1000: tile 0 covers indices 0-999, tile 1 covers 1000-1999
dim = tiledb.Dim(name='idx', domain=(0, 99999), tile=1000, dtype=np.uint32)
attr = tiledb.Attr(name='value', dtype=str)
schema = tiledb.ArraySchema(domain=tiledb.Domain(dim), sparse=False, attrs=[attr])
tiledb.Array.create(uri, schema)

with tiledb.open(uri, 'w') as arr:
    arr[0:10000] = {'value': [f'val_{i}' for i in range(10000)]}

with tiledb.open(uri, 'r') as arr:
    result = arr.df[[1001, 999]]  # SEGFAULT: tile 1 before tile 0
