Skip to content

[CBRD-26537] supports an API that collects OOS OIDs from heap RECDES, even without class_repr#6829

Merged
vimkim merged 17 commits intoCUBRID:feat/oosfrom
vimkim:oos-vot-last-element
Mar 19, 2026
Merged

[CBRD-26537] supports an API that collects OOS OIDs from heap RECDES, even without class_repr#6829
vimkim merged 17 commits intoCUBRID:feat/oosfrom
vimkim:oos-vot-last-element

Conversation

@vimkim
Copy link
Copy Markdown
Contributor

@vimkim vimkim commented Feb 4, 2026

http://jira.cubrid.org/browse/CBRD-26537

Purpose

RECDES만으로 OOS OID 목록을 조회할 수 있는 API(heap_recdes_get_oos_oids)를 추가한다.

기존에는 OOS OID를 읽기 위해 attrinfo(즉, classrepr)가 반드시 필요했다.
attrinfo 없이 raw RECDES만 다루는 로깅·복제 경로에서는 이 정보를 얻기 어렵기 때문에,
RECDES만으로도 가변 오프셋 테이블의 끝을 알 수 있는 방법이 필요했다.

Implementation

  • object_representation.hOR_VAR_BIT_LAST_ELEMENT (0x2) 플래그 및 관련 매크로를 추가한다.
    • OR_VAR_BIT_RESERVEDOR_VAR_BIT_LAST_ELEMENT로 명칭 변경.
    • OR_SET_VAR_LAST_ELEMENT, OR_IS_LAST_ELEMENT 매크로 추가.
  • heap_attrinfo_transform_variable_to_disk에서 마지막 가변 속성을 직렬화할 때 OR_SET_VAR_LAST_ELEMENT로 해당 플래그를 설정한다.
    • 이를 통해 attrinfo->num_values를 몰라도 가변 오프셋 테이블의 끝을 판단할 수 있다.
  • heap_recdes_get_oos_oids(const RECDES *) 함수를 새로 추가한다.
    • RECDES의 MVCC 헤더에 OR_MVCC_FLAG_HAS_OOS가 없으면 빈 벡터를 반환한다.
    • 가변 오프셋 테이블을 순회하며 OR_IS_OOS 플래그가 있는 엔트리에서 OID를 읽어 벡터에 추가한다.
    • OR_IS_LAST_ELEMENT 플래그를 만나면 순회를 종료하고 수집된 OID 목록을 반환한다.

Remarks

  • LAST_ELEMENT 플래그는 OOS 여부와 무관하게 항상 마지막 가변 속성에 설정된다. 즉, 모든 레코드 쓰기에 영향을 준다.
  • 이 플래그가 없는 이전 포맷 레코드(업그레이드 전 데이터)에 대한 하위 호환성 처리는 아직 미구현이다.

검증

image

위와 같은 테스트를 통과해야 한다.

-- Test: CBRD-26537 - OR_VAR_BIT_LAST_ELEMENT correctness for OOS VOT traversal
--
-- Validates heap_recdes_get_oos_oids() can correctly traverse the variable offset
-- table (VOT) using the LAST_ELEMENT flag, without needing attrinfo->num_values.
--
-- OOS is triggered when:
--   (1) total record size > DB_PAGESIZE / 8  (typically > 2048 B)
--   (2) the variable column value is > 512 B
--
-- Each varchar below uses 1000 B to comfortably satisfy both conditions.

-- ============================================================
-- Scenario 1: 2 varchar -> 3 varchar  (ADD COLUMN)
--
-- Before ALTER: VOT = [A, B(LAST_ELEMENT)]
-- After  ALTER: new rows have VOT = [A, B, C(LAST_ELEMENT)]
--              old rows keep VOT = [A, B(LAST_ELEMENT)] until rewritten
-- ============================================================

DROP TABLE IF EXISTS tbl_s1;
CREATE TABLE tbl_s1 (id INT, vc_a VARCHAR(10000), vc_b VARCHAR(10000));

-- Insert with 2 varchar cols; each 1000 B -> OOS triggered for both cols.
INSERT INTO tbl_s1 VALUES (1, REPEAT('A', 1000), REPEAT('B', 1000));
INSERT INTO tbl_s1 VALUES (2, REPEAT('C', 1000), REPEAT('D', 1000));

-- Sanity check before schema change.
SELECT id, LENGTH(vc_a) AS len_a, LENGTH(vc_b) AS len_b FROM tbl_s1 ORDER BY id;
-- Expected: 1 | 1000 | 1000
--           2 | 1000 | 1000

-- Schema change: add a third varchar column.
-- Existing rows retain their 2-col on-disk layout (lazy schema upgrade).
-- LAST_ELEMENT must still be correctly detected on the 2nd VOT entry.
ALTER TABLE tbl_s1 ADD COLUMN vc_c VARCHAR(10000);

-- Read old (2-col layout) records under the new (3-col) schema.
-- heap_recdes_get_oos_oids must stop at LAST_ELEMENT in the 2-entry VOT.
SELECT id, LENGTH(vc_a) AS len_a, LENGTH(vc_b) AS len_b, vc_c FROM tbl_s1 ORDER BY id;
-- Expected: 1 | 1000 | 1000 | NULL
--           2 | 1000 | 1000 | NULL

-- Insert a new row that will be written with the 3-col schema.
-- LAST_ELEMENT is now on the 3rd VOT entry.
INSERT INTO tbl_s1 VALUES (3, REPEAT('E', 1000), REPEAT('F', 1000), REPEAT('G', 1000));

SELECT id, LENGTH(vc_a) AS len_a, LENGTH(vc_b) AS len_b, LENGTH(vc_c) AS len_c
FROM tbl_s1 ORDER BY id;
-- Expected: 1 | 1000 | 1000 | NULL
--           2 | 1000 | 1000 | NULL
--           3 | 1000 | 1000 | 1000

-- Force a rewrite of an old row to exercise the 3-col write path.
UPDATE tbl_s1 SET vc_c = REPEAT('Z', 1000) WHERE id = 1;

SELECT id, LENGTH(vc_a) AS len_a, LENGTH(vc_b) AS len_b, LENGTH(vc_c) AS len_c
FROM tbl_s1 ORDER BY id;
-- Expected: 1 | 1000 | 1000 | 1000
--           2 | 1000 | 1000 | NULL
--           3 | 1000 | 1000 | 1000

DROP TABLE IF EXISTS tbl_s1;

-- ============================================================
-- Scenario 2: 3 varchar -> 2 varchar  (DROP COLUMN)
--
-- Before ALTER: VOT = [A, B, C(LAST_ELEMENT)]
-- After  ALTER: new rows have VOT = [A, B(LAST_ELEMENT)]
--              old rows keep VOT = [A, B, C(LAST_ELEMENT)] until rewritten
-- ============================================================

DROP TABLE IF EXISTS tbl_s2;
CREATE TABLE tbl_s2 (id INT, vc_a VARCHAR(10000), vc_b VARCHAR(10000), vc_c VARCHAR(10000));

-- Insert with 3 varchar cols; each 1000 B -> OOS triggered for all cols.
INSERT INTO tbl_s2 VALUES (1, REPEAT('A', 1000), REPEAT('B', 1000), REPEAT('C', 1000));
INSERT INTO tbl_s2 VALUES (2, REPEAT('D', 1000), REPEAT('E', 1000), REPEAT('F', 1000));

-- Sanity check before schema change.
SELECT id, LENGTH(vc_a) AS len_a, LENGTH(vc_b) AS len_b, LENGTH(vc_c) AS len_c
FROM tbl_s2 ORDER BY id;
-- Expected: 1 | 1000 | 1000 | 1000
--           2 | 1000 | 1000 | 1000

-- Schema change: drop the third varchar column.
-- Existing rows retain their 3-col on-disk layout (lazy schema upgrade).
-- heap_recdes_get_oos_oids must still detect LAST_ELEMENT on the 3rd VOT entry
-- in those old records and stop there.
ALTER TABLE tbl_s2 DROP COLUMN vc_c;

-- Read old (3-col layout) records under the new (2-col) schema.
SELECT id, LENGTH(vc_a) AS len_a, LENGTH(vc_b) AS len_b FROM tbl_s2 ORDER BY id;
-- Expected: 1 | 1000 | 1000
--           2 | 1000 | 1000

-- Insert a new row that will be written with the 2-col schema.
-- LAST_ELEMENT is now on the 2nd VOT entry.
INSERT INTO tbl_s2 VALUES (3, REPEAT('G', 1000), REPEAT('H', 1000));

SELECT id, LENGTH(vc_a) AS len_a, LENGTH(vc_b) AS len_b FROM tbl_s2 ORDER BY id;
-- Expected: 1 | 1000 | 1000
--           2 | 1000 | 1000
--           3 | 1000 | 1000

-- Force a rewrite of an old row to exercise the 2-col write path on it.
UPDATE tbl_s2 SET vc_a = REPEAT('Z', 1000) WHERE id = 1;

SELECT id, LENGTH(vc_a) AS len_a, LENGTH(vc_b) AS len_b FROM tbl_s2 ORDER BY id;
-- Expected: 1 | 1000 | 1000
--           2 | 1000 | 1000
--           3 | 1000 | 1000

DROP TABLE IF EXISTS tbl_s2;

결과

image

@vimkim vimkim changed the title feat(heap_file): api to get vector of oos_oids from heap recdes [CBRD-26537] supports an API that collects OOS OIDs from heap RECDES, even without class_repr Mar 11, 2026
@vimkim
Copy link
Copy Markdown
Contributor Author

vimkim commented Mar 11, 2026

/run sql medium shell

@vimkim
Copy link
Copy Markdown
Contributor Author

vimkim commented Mar 11, 2026

/run sql medium shell

@vimkim vimkim requested review from a team, H2SU, InChiJun, YeunjunLee, hgryoo, hornetmj, hyahong and youngjun9072 and removed request for a team March 11, 2026 11:56
@vimkim vimkim self-assigned this Mar 11, 2026
@vimkim vimkim marked this pull request as ready for review March 11, 2026 12:00
@vimkim vimkim requested a review from beyondykk9 as a code owner March 11, 2026 12:00
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 11, 2026

Last reviewed commit: 51087f5

Comment thread src/storage/heap_file.c
Comment thread src/storage/heap_file.c Outdated
Comment thread src/storage/heap_file.c Outdated
Comment thread src/storage/heap_file.h Outdated
@vimkim
Copy link
Copy Markdown
Contributor Author

vimkim commented Mar 12, 2026

/run sql medium shell

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 12, 2026

Last reviewed commit: 3ec5830

Comment thread src/storage/heap_file.c Outdated
@vimkim
Copy link
Copy Markdown
Contributor Author

vimkim commented Mar 12, 2026

모든 greptile 피드백 반영 완료했습니다.

● All 5 review issues have been applied:

  1. Missing return (UB fix) — added return oos_oids; at line 27752 after the final assert(false).
  2. Loop upper bound — replaced for (;;) with for (int index = 0; index <= max_var_count; ++index) where max_var_count = recdes->length / offset_size, with an early return when the bound is hit.
  3. Fragile LAST_ELEMENT condition — changed index == attr_info->num_values - 1 to value->last_attrepr->location == attr_info->last_classrepr->n_variable - 1 (uses VOT position directly, not overall attr index).
  4. oid_vector namespace pollution — removed the using oid_vector = std::vector<OID> alias from the header; both the declaration and definition now use std::vector<OID> directly.
  5. OID bounds check — added explicit check that oid_ptr + OR_OID_SIZE <= recdes->data + recdes->length before reading the OID.

@vimkim
Copy link
Copy Markdown
Contributor Author

vimkim commented Mar 12, 2026

/run medium sql

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 12, 2026

Last reviewed commit: a921293

Comment thread src/storage/heap_file.c Outdated
Comment thread src/storage/heap_file.c Outdated
Comment thread src/storage/heap_file.c Outdated
Comment thread src/storage/heap_file.h
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 13, 2026

Last reviewed commit: 8101801

Comment thread src/storage/heap_file.c
@vimkim
Copy link
Copy Markdown
Contributor Author

vimkim commented Mar 13, 2026

/run sql medium

@vimkim
Copy link
Copy Markdown
Contributor Author

vimkim commented Mar 13, 2026

/run sql medium

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 13, 2026

Last reviewed commit: 8674481

Comment thread src/storage/heap_file.c
Comment thread src/base/object_representation.h
@vimkim
Copy link
Copy Markdown
Contributor Author

vimkim commented Mar 13, 2026

8674481 커밋에서 test_sql, test_medium 통과

@vimkim
Copy link
Copy Markdown
Contributor Author

vimkim commented Mar 13, 2026

/run shell

Comment thread src/storage/heap_file.c Outdated
Comment thread src/storage/heap_file.c
Comment thread src/storage/heap_file.c Outdated
Comment thread src/storage/heap_file.c Outdated
Comment thread src/base/object_representation.h Outdated
Comment thread src/transaction/log_applier.c Outdated
return ER_OUT_OF_VIRTUAL_MEMORY;
}
offset = or_get_offset_internal (buf, &rc, offset_size);
offset = OR_GET_VAR_LENGTH (or_get_offset_internal (buf, &rc, offset_size));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. offset을 구할 때 OR_GET_VAR_LENGTH 매크로 사용은 어색합니다. or_get_offset_internal() 함수에서 ~OR_VAR_FALG_MASK 처리를 해주는게 어떨까요?
  2. offset 연산 시 OR_VAR_FALG_MASK 처리가 필요해 짐에 따라서 다른 offset 함수들을 정리해주는게 어떨까요? or_get_big_var_offset(), or_get_offset() 등 사용되지 않는 함수들을 제거 -> or_get_offset_internal()를 or_get_offset() 함수로 랩핑하고, 외부에서는 or_get_offset()를 사용

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

반영했습니다.

  1. or_get_offset_internal() 내부에서 ~OR_VAR_FLAG_MASK 처리하도록 변경하고, 모든 caller에서 OR_GET_VAR_LENGTH() 래핑 제거했습니다. OR_GET_OFFSET_INTERNAL 매크로도 동일하게 적용.
  2. or_get_big_var_offset(), or_get_offset() (get 계열) — caller가 없어 삭제했습니다. or_put_offset()transform_cl.c에서 많이 사용 중이라 유지했습니다.

@vimkim
Copy link
Copy Markdown
Contributor Author

vimkim commented Mar 17, 2026

/run sql medium

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 17, 2026

Last reviewed commit: 020d7f4

Comment thread src/storage/heap_file.c
Comment thread src/base/object_representation.h Outdated
#define OR_GET_VAR_FLAG(length) ((int) (length) & OR_VAR_FALG_MASK)
#define OR_GET_VAR_LENGTH(length) ((int) (length) & (~OR_VAR_FALG_MASK))
#define OR_GET_VAR_FLAG(length) ((int) (length) & OR_VAR_FLAG_MASK)
#define OR_GET_VAR_LENGTH(length) ((int) (length) & (~OR_VAR_FLAG_MASK))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#define OR_GET_VAR_LENGTH(length) ((int) (length) & (~OR_VAR_FLAG_MASK))
#define OR_GET_VAR_OFFSET(length) ((int) (length) & (~OR_VAR_FLAG_MASK))

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

대응 완료했습니다.

@vimkim
Copy link
Copy Markdown
Contributor Author

vimkim commented Mar 19, 2026

/run sql medium

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 19, 2026

Last reviewed commit: "review(or): OR_GET_V..."

Comment thread src/storage/heap_file.c
Comment on lines +27733 to +27734
oos_debug ("there exists an OOS with OID %hd|%d|%hd at offset %d index %d", OID_AS_ARGS (&oid), offset,
index);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 디버그 로그에서 offset이 플래그 포함 원시값으로 기록됨

offset 변수는 OR_GET_BYTE/SHORT/INT로 읽은 원시 VOT 엔트리 값으로, OR_VAR_BIT_OOS (0x1)OR_VAR_BIT_LAST_ELEMENT (0x2) 플래그 비트가 포함된 상태입니다. 이 값을 그대로 "offset"으로 로깅하면 실제 바이트 오프셋과 다른 값(최대 3 차이)이 출력되어 디버깅 시 혼란을 줄 수 있습니다.

예: offset = 0x401이면 실제 오프셋은 0x400이지만 로그에는 0x401로 찍힙니다.

OR_GET_VAR_OFFSET(offset)을 사용하거나 oid_ptr - recdes->data로 실제 오프셋을 출력하는 것을 권장합니다:

Suggested change
oos_debug ("there exists an OOS with OID %hd|%d|%hd at offset %d index %d", OID_AS_ARGS (&oid), offset,
index);
oos_debug ("there exists an OOS with OID %hd|%d|%hd at offset %d index %d", OID_AS_ARGS (&oid),
OR_GET_VAR_OFFSET (offset), index);

Comment thread src/storage/heap_file.c
Comment on lines +27716 to +27735
{
assert (false && "OID read would exceed record bounds");
return ER_FAILED;
}
OR_BUF buf;
or_init (&buf, (char *) oid_ptr, OR_OID_SIZE);
int err = or_get_oid (&buf, &oid);
if (err != NO_ERROR)
{
assert (false && "or_get_oid failed unexpectedly");
return ER_FAILED;
}
if (OID_ISNULL (&oid))
{
assert (false && "OID read from OOS slot is null — corrupted record?");
return ER_FAILED;
}
oos_debug ("there exists an OOS with OID %hd|%d|%hd at offset %d index %d", OID_AS_ARGS (&oid), offset,
index);
oos_oids.emplace_back (oid);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 에러 반환 시 oos_oids가 부분 채워진 상태로 남음

oos_oids.clear()는 함수 진입 시 한 번만 호출됩니다. 이후 루프 중간에 ER_FAILED를 반환하는 경우(예: 27717, 27725, 27729행의 경계 초과·or_get_oid 실패·null OID 검출), 이미 emplace_back된 OID들이 oos_oids에 남아있는 채로 반환됩니다.

현재는 호출자가 반환값을 항상 확인한다고 가정하지만, 향후 실수로 반환값을 무시하는 코드가 생기면 부분 OID 목록이 조용히 사용될 수 있습니다. DBMS 커널에서 복제·로깅 경로에 잘못된 OID 목록이 전달되면 데이터 불일치 위험이 있습니다.

에러 반환 직전에 oos_oids.clear()를 호출하여 호출자에게 안전한 상태를 보장하는 것을 권장합니다:

if (oid_ptr + OR_OID_SIZE > (char *) recdes->data + recdes->length)
  {
    assert (false && "OID read would exceed record bounds");
    oos_oids.clear ();
    return ER_FAILED;
  }
// ... (or_get_oid, OID_ISNULL 에러 경로도 동일하게 처리)

@vimkim vimkim merged commit c79cdca into CUBRID:feat/oos Mar 19, 2026
10 checks passed
@vimkim vimkim deleted the oos-vot-last-element branch March 19, 2026 09:26
@github-actions
Copy link
Copy Markdown

TC Branch Finalized for cubrid-testcases-private-ex

Engine PR was merged.

Cleanup Results:

TC develop branch is ready for the next PR.

@github-actions
Copy link
Copy Markdown

TC Branch Finalized for cubrid-testcases

Engine PR was merged.

Cleanup Results:

TC develop branch is ready for the next PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants