Skip to content

[debug] Support 32-bit column number in DILocation#201269

Open
topolarity wants to merge 1 commit into
llvm:mainfrom
topolarity:ct/dwarf-32bit-col-trunk
Open

[debug] Support 32-bit column number in DILocation#201269
topolarity wants to merge 1 commit into
llvm:mainfrom
topolarity:ct/dwarf-32bit-col-trunk

Conversation

@topolarity

Copy link
Copy Markdown

This adds support for up to 32-bit column numbers in DILocation.

The downstream use case is in Julia, where we'd like to use this as a side channel to encode a PC-like index in DWARF, which works well except for this ~16-bit limitation.

DWARF supports this (DW_LNS_set_column takes a ULEB128) but CodeView does not, so in the case of overflow, I opted to drop the column number (similar to the existing adjustColumn) rather than the entire DILocation.

@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown

Hello @topolarity 👋

Thank you for submitting a Pull Request (PR) to the LLVM Project. Since this is your first PR, here are a few useful links covering our main contribution policies and review practices.

  • All contributions to LLVM must follow our LLVM AI Tool Use Policy. In particular, if you used AI while working on this PR, remember to add a note to the PR description.
  • The LLVM Code-Review Policy and Practices document contains practical information about the PR process, including how patches are reviewed and accepted, and who can review a PR.
  • Our LLVM Developer Policy describes our expectations for code quality, commit summaries and contains notes on our CI system.

Please reply to this message to confirm that you have read these policies, especially the LLVM AI Tool Use Policy, and that any AI tool usage has been noted in the PR description.


Frequently asked questions

How do I add reviewers?

This PR will be automatically labeled, and the relevant teams will be notified. For some parts of the project, reviewers may also be added automatically.

You can also add reviewers manually using the Reviewers section on this page. If you cannot use that section, it is probably because you do not have write permissions for the repository. In that case, you can request a review by tagging reviewers in a comment using @ followed by their GitHub username.

What if there are no comments?

If you have not received any comments on your PR after a week, you can request a review by pinging the PR with a comment such as “Ping”. The common courtesy ping rate is once a week. Please remember that you are asking for volunteer time from other developers.

Are any special GitHub settings required to contribute to LLVM?

We only require contributors to have a public email address associated with their GitHub commits, see this section of LLVM Developer Policy for details.


If you have questions, feel free to leave a comment on this PR, or ask on LLVM Discord or LLVM Discourse.

Thank you,
The LLVM Community

@topolarity topolarity changed the title DILocation: Support 32-bit column number [debug] Support 32-bit column number in DILocation Jun 3, 2026
@llvmorg-github-actions

llvmorg-github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-llvm-mc

Author: Cody Tapscott (topolarity)

Changes

This adds support for up to 32-bit column numbers in DILocation.

The downstream use case is in Julia, where we'd like to use this as a side channel to encode a PC-like index in DWARF, which works well except for this ~16-bit limitation.

DWARF supports this (DW_LNS_set_column takes a ULEB128) but CodeView does not, so in the case of overflow, I opted to drop the column number (similar to the existing adjustColumn) rather than the entire DILocation.


Full diff: https://github.com/llvm/llvm-project/pull/201269.diff

7 Files Affected:

  • (modified) llvm/include/llvm/IR/DebugInfoMetadata.h (+5-6)
  • (modified) llvm/include/llvm/MC/MCDwarf.h (+2-5)
  • (modified) llvm/lib/AsmParser/LLParser.cpp (+1-1)
  • (modified) llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp (+12-5)
  • (modified) llvm/lib/IR/DebugInfoMetadata.cpp (+5-18)
  • (modified) llvm/lib/IR/LLVMContextImpl.h (+7-8)
  • (modified) llvm/unittests/IR/MetadataTest.cpp (+15)
diff --git a/llvm/include/llvm/IR/DebugInfoMetadata.h b/llvm/include/llvm/IR/DebugInfoMetadata.h
index 29d2de7a58884..9ea7b099b459a 100644
--- a/llvm/include/llvm/IR/DebugInfoMetadata.h
+++ b/llvm/include/llvm/IR/DebugInfoMetadata.h
@@ -2643,13 +2643,13 @@ class DISubprogram : public DILocalScope {
 ///
 /// A debug location in source code, used for debug info and otherwise.
 ///
-/// Uses the SubclassData1, SubclassData16 and SubclassData32
-/// Metadata slots.
+/// Uses the SubclassData1 and SubclassData32 Metadata slots.
 
 class DILocation : public MDNode {
   friend class LLVMContextImpl;
   friend class MDNode;
-  uint64_t AtomGroup : 61;
+  uint64_t Column : 32;
+  uint64_t AtomGroup : 29;
   uint64_t AtomRank : 3;
 
   DILocation(LLVMContext &C, StorageType Storage, unsigned Line,
@@ -2710,7 +2710,7 @@ class DILocation : public MDNode {
   TempDILocation clone() const { return cloneImpl(); }
 
   unsigned getLine() const { return SubclassData32; }
-  unsigned getColumn() const { return SubclassData16; }
+  unsigned getColumn() const { return Column; }
   DILocalScope *getScope() const { return cast<DILocalScope>(getRawScope()); }
 
   /// Return the linkage name of Subprogram. If the linkage name is empty,
@@ -2965,14 +2965,13 @@ class DILexicalBlock : public DILexicalBlockBase {
   friend class LLVMContextImpl;
   friend class MDNode;
 
-  uint16_t Column;
+  uint32_t Column;
 
   DILexicalBlock(LLVMContext &C, StorageType Storage, unsigned Line,
                  unsigned Column, ArrayRef<Metadata *> Ops)
       : DILexicalBlockBase(C, DILexicalBlockKind, Storage, Ops),
         Column(Column) {
     SubclassData32 = Line;
-    assert(Column < (1u << 16) && "Expected 16-bit column");
   }
   ~DILexicalBlock() = default;
 
diff --git a/llvm/include/llvm/MC/MCDwarf.h b/llvm/include/llvm/MC/MCDwarf.h
index 168eba59c11db..c7e12458192d0 100644
--- a/llvm/include/llvm/MC/MCDwarf.h
+++ b/llvm/include/llvm/MC/MCDwarf.h
@@ -107,7 +107,7 @@ struct MCDwarfFile {
 class MCDwarfLoc {
   uint32_t FileNum;
   uint32_t Line;
-  uint16_t Column;
+  uint32_t Column;
   // Flags (see #define's below)
   uint8_t Flags;
   uint8_t Isa;
@@ -159,10 +159,7 @@ class MCDwarfLoc {
   void setLine(unsigned line) { Line = line; }
 
   /// Set the Column of this MCDwarfLoc.
-  void setColumn(unsigned column) {
-    assert(column <= UINT16_MAX);
-    Column = column;
-  }
+  void setColumn(unsigned column) { Column = column; }
 
   /// Set the Flags of this MCDwarfLoc.
   void setFlags(unsigned flags) {
diff --git a/llvm/lib/AsmParser/LLParser.cpp b/llvm/lib/AsmParser/LLParser.cpp
index 44d81ad852688..d16bfe4a9b910 100644
--- a/llvm/lib/AsmParser/LLParser.cpp
+++ b/llvm/lib/AsmParser/LLParser.cpp
@@ -5000,7 +5000,7 @@ struct LineField : public MDUnsignedField {
 };
 
 struct ColumnField : public MDUnsignedField {
-  ColumnField() : MDUnsignedField(0, UINT16_MAX) {}
+  ColumnField() : MDUnsignedField(0, UINT32_MAX) {}
 };
 
 struct DwarfTagField : public MDUnsignedField {
diff --git a/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp b/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
index ccec0b5910658..d27ef852cda8f 100644
--- a/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
@@ -246,9 +246,15 @@ CodeViewDebug::getInlineSite(const DILocation *InlinedAt,
               .SiteFuncId;
 
     Site->SiteFuncId = NextFuncId++;
+
+    // Set to unknown on overflow. We only have 16 bits to play with here.
+    unsigned IAColumn = InlinedAt->getColumn();
+    if (IAColumn > UINT16_MAX)
+      IAColumn = 0;
+
     OS.emitCVInlineSiteIdDirective(
         Site->SiteFuncId, ParentFuncId, maybeRecordFile(InlinedAt->getFile()),
-        InlinedAt->getLine(), InlinedAt->getColumn(), SMLoc());
+        InlinedAt->getLine(), IAColumn, SMLoc());
     Site->Inlinee = Inlinee;
     InlinedSubprograms.insert(Inlinee);
     auto InlineeIdx = getFuncIdForSubprogram(Inlinee);
@@ -523,9 +529,10 @@ void CodeViewDebug::maybeRecordLocation(const DebugLoc &DL,
       LI.isNeverStepInto())
     return;
 
-  ColumnInfo CI(DL.getCol(), /*EndColumn=*/0);
-  if (CI.getStartColumn() != DL.getCol())
-    return;
+  // Set to unknown on overflow. We only have 16 bits to play with here.
+  unsigned Column = DL.getCol();
+  if (Column > UINT16_MAX)
+    Column = 0;
 
   if (!CurFn->HaveLineInfo)
     CurFn->HaveLineInfo = true;
@@ -558,7 +565,7 @@ void CodeViewDebug::maybeRecordLocation(const DebugLoc &DL,
     addLocIfNotPresent(CurFn->ChildSites, Loc);
   }
 
-  OS.emitCVLocDirective(FuncId, FileId, DL.getLine(), DL.getCol(),
+  OS.emitCVLocDirective(FuncId, FileId, DL.getLine(), Column,
                         /*PrologueEnd=*/false, /*IsStmt=*/false,
                         DL->getFilename(), SMLoc());
 }
diff --git a/llvm/lib/IR/DebugInfoMetadata.cpp b/llvm/lib/IR/DebugInfoMetadata.cpp
index 79b7ce040b552..e553dcd45395a 100644
--- a/llvm/lib/IR/DebugInfoMetadata.cpp
+++ b/llvm/lib/IR/DebugInfoMetadata.cpp
@@ -61,37 +61,27 @@ DebugVariableAggregate::DebugVariableAggregate(const DbgVariableRecord *DVR)
 DILocation::DILocation(LLVMContext &C, StorageType Storage, unsigned Line,
                        unsigned Column, uint64_t AtomGroup, uint8_t AtomRank,
                        ArrayRef<Metadata *> MDs, bool ImplicitCode)
-    : MDNode(C, DILocationKind, Storage, MDs), AtomGroup(AtomGroup),
-      AtomRank(AtomRank) {
+    : MDNode(C, DILocationKind, Storage, MDs), Column(Column),
+      AtomGroup(AtomGroup), AtomRank(AtomRank) {
   assert(AtomRank <= 7 && "AtomRank number should fit in 3 bits");
+  assert(AtomGroup < (1ULL << 29) && "AtomGroup number should fit in 29 bits");
   if (AtomGroup)
     C.updateDILocationAtomGroupWaterline(AtomGroup + 1);
 
   assert((MDs.size() == 1 || MDs.size() == 2) &&
          "Expected a scope and optional inlined-at");
-  // Set line and column.
-  assert(Column < (1u << 16) && "Expected 16-bit column");
-
+  // Set line. The column is stored in the dedicated 32-bit Column field via the
+  // member initializer above.
   SubclassData32 = Line;
-  SubclassData16 = Column;
 
   setImplicitCode(ImplicitCode);
 }
 
-static void adjustColumn(unsigned &Column) {
-  // Set to unknown on overflow.  We only have 16 bits to play with here.
-  if (Column >= (1u << 16))
-    Column = 0;
-}
-
 DILocation *DILocation::getImpl(LLVMContext &Context, unsigned Line,
                                 unsigned Column, Metadata *Scope,
                                 Metadata *InlinedAt, bool ImplicitCode,
                                 uint64_t AtomGroup, uint8_t AtomRank,
                                 StorageType Storage, bool ShouldCreate) {
-  // Fixup column.
-  adjustColumn(Column);
-
   if (Storage == Uniqued) {
     if (auto *N = getUniqued(Context.pImpl->DILocations,
                              DILocationInfo::KeyTy(Line, Column, Scope,
@@ -1521,9 +1511,6 @@ DILexicalBlock *DILexicalBlock::getImpl(LLVMContext &Context, Metadata *Scope,
                                         Metadata *File, unsigned Line,
                                         unsigned Column, StorageType Storage,
                                         bool ShouldCreate) {
-  // Fixup column.
-  adjustColumn(Column);
-
   assert(Scope && "Expected scope");
   DEFINE_GETIMPL_LOOKUP(DILexicalBlock, (Scope, File, Line, Column));
   Metadata *Ops[] = {File, Scope};
diff --git a/llvm/lib/IR/LLVMContextImpl.h b/llvm/lib/IR/LLVMContextImpl.h
index 57ff622b08813..a80c993f1bc29 100644
--- a/llvm/lib/IR/LLVMContextImpl.h
+++ b/llvm/lib/IR/LLVMContextImpl.h
@@ -297,13 +297,13 @@ template <> struct MDNodeKeyImpl<MDTuple> : MDNodeOpsKey {
 template <> struct MDNodeKeyImpl<DILocation> {
   Metadata *Scope;
   Metadata *InlinedAt;
-  uint64_t AtomGroup : 61;
+  uint64_t AtomGroup : 29;
   uint64_t AtomRank : 3;
   unsigned Line;
-  uint16_t Column;
+  unsigned Column;
   bool ImplicitCode;
 
-  MDNodeKeyImpl(unsigned Line, uint16_t Column, Metadata *Scope,
+  MDNodeKeyImpl(unsigned Line, unsigned Column, Metadata *Scope,
                 Metadata *InlinedAt, bool ImplicitCode, uint64_t AtomGroup,
                 uint8_t AtomRank)
       : Scope(Scope), InlinedAt(InlinedAt), AtomGroup(AtomGroup),
@@ -324,8 +324,7 @@ template <> struct MDNodeKeyImpl<DILocation> {
   }
 
   unsigned getHashValue() const {
-    uint64_t LineColumnAndImplicitCode =
-        Line | (uint64_t(Column) << 32) | (uint64_t(ImplicitCode) << 48);
+    uint64_t LineAndColumn = uint64_t(Line) | (uint64_t(Column) << 32);
     // Hashing AtomGroup and AtomRank substantially impacts performance whether
     // Key Instructions is enabled or not. We can't detect whether it's enabled
     // here cheaply; avoiding hashing zero values is a good approximation. This
@@ -334,9 +333,9 @@ template <> struct MDNodeKeyImpl<DILocation> {
     // outweighed by the overall compile time savings by performing this check.
     // * (hash_combine(x) != hash_combine(x, 0))
     if (AtomGroup || AtomRank)
-      return hash_combine(LineColumnAndImplicitCode, Scope, InlinedAt,
-                          AtomGroup | (uint64_t(AtomRank) << 61));
-    return hash_combine(LineColumnAndImplicitCode, Scope, InlinedAt);
+      return hash_combine(LineAndColumn, ImplicitCode, Scope, InlinedAt,
+                          AtomGroup | (uint64_t(AtomRank) << 29));
+    return hash_combine(LineAndColumn, ImplicitCode, Scope, InlinedAt);
   }
 };
 
diff --git a/llvm/unittests/IR/MetadataTest.cpp b/llvm/unittests/IR/MetadataTest.cpp
index 7222c885548e0..d0a35720d99a0 100644
--- a/llvm/unittests/IR/MetadataTest.cpp
+++ b/llvm/unittests/IR/MetadataTest.cpp
@@ -1627,6 +1627,21 @@ TEST_F(DILocationTest, getDistinct) {
   EXPECT_EQ(L1, DILocation::get(Context, 2, 7, N));
 }
 
+TEST_F(DILocationTest, WideColumn) {
+  MDNode *N = getSubprogram();
+  // Column numbers wider than 16 bits are preserved (up to 32 bits) rather
+  // than being clamped to zero.
+  DILocation *L = DILocation::get(Context, 2, 0x10000, N);
+  EXPECT_EQ(0x10000u, L->getColumn());
+
+  DILocation *Max = DILocation::get(Context, 2, UINT32_MAX, N);
+  EXPECT_EQ(UINT32_MAX, Max->getColumn());
+
+  // Distinct columns produce distinct (and uniqued) nodes.
+  EXPECT_EQ(L, DILocation::get(Context, 2, 0x10000, N));
+  EXPECT_NE(L, Max);
+}
+
 TEST_F(DILocationTest, getTemporary) {
   MDNode *N = MDNode::get(Context, {});
   auto L = DILocation::getTemporary(Context, 2, 7, N);

@llvmorg-github-actions

Copy link
Copy Markdown

@llvm/pr-subscribers-debuginfo

Author: Cody Tapscott (topolarity)

Changes

This adds support for up to 32-bit column numbers in DILocation.

The downstream use case is in Julia, where we'd like to use this as a side channel to encode a PC-like index in DWARF, which works well except for this ~16-bit limitation.

DWARF supports this (DW_LNS_set_column takes a ULEB128) but CodeView does not, so in the case of overflow, I opted to drop the column number (similar to the existing adjustColumn) rather than the entire DILocation.


Full diff: https://github.com/llvm/llvm-project/pull/201269.diff

7 Files Affected:

  • (modified) llvm/include/llvm/IR/DebugInfoMetadata.h (+5-6)
  • (modified) llvm/include/llvm/MC/MCDwarf.h (+2-5)
  • (modified) llvm/lib/AsmParser/LLParser.cpp (+1-1)
  • (modified) llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp (+12-5)
  • (modified) llvm/lib/IR/DebugInfoMetadata.cpp (+5-18)
  • (modified) llvm/lib/IR/LLVMContextImpl.h (+7-8)
  • (modified) llvm/unittests/IR/MetadataTest.cpp (+15)
diff --git a/llvm/include/llvm/IR/DebugInfoMetadata.h b/llvm/include/llvm/IR/DebugInfoMetadata.h
index 29d2de7a58884..9ea7b099b459a 100644
--- a/llvm/include/llvm/IR/DebugInfoMetadata.h
+++ b/llvm/include/llvm/IR/DebugInfoMetadata.h
@@ -2643,13 +2643,13 @@ class DISubprogram : public DILocalScope {
 ///
 /// A debug location in source code, used for debug info and otherwise.
 ///
-/// Uses the SubclassData1, SubclassData16 and SubclassData32
-/// Metadata slots.
+/// Uses the SubclassData1 and SubclassData32 Metadata slots.
 
 class DILocation : public MDNode {
   friend class LLVMContextImpl;
   friend class MDNode;
-  uint64_t AtomGroup : 61;
+  uint64_t Column : 32;
+  uint64_t AtomGroup : 29;
   uint64_t AtomRank : 3;
 
   DILocation(LLVMContext &C, StorageType Storage, unsigned Line,
@@ -2710,7 +2710,7 @@ class DILocation : public MDNode {
   TempDILocation clone() const { return cloneImpl(); }
 
   unsigned getLine() const { return SubclassData32; }
-  unsigned getColumn() const { return SubclassData16; }
+  unsigned getColumn() const { return Column; }
   DILocalScope *getScope() const { return cast<DILocalScope>(getRawScope()); }
 
   /// Return the linkage name of Subprogram. If the linkage name is empty,
@@ -2965,14 +2965,13 @@ class DILexicalBlock : public DILexicalBlockBase {
   friend class LLVMContextImpl;
   friend class MDNode;
 
-  uint16_t Column;
+  uint32_t Column;
 
   DILexicalBlock(LLVMContext &C, StorageType Storage, unsigned Line,
                  unsigned Column, ArrayRef<Metadata *> Ops)
       : DILexicalBlockBase(C, DILexicalBlockKind, Storage, Ops),
         Column(Column) {
     SubclassData32 = Line;
-    assert(Column < (1u << 16) && "Expected 16-bit column");
   }
   ~DILexicalBlock() = default;
 
diff --git a/llvm/include/llvm/MC/MCDwarf.h b/llvm/include/llvm/MC/MCDwarf.h
index 168eba59c11db..c7e12458192d0 100644
--- a/llvm/include/llvm/MC/MCDwarf.h
+++ b/llvm/include/llvm/MC/MCDwarf.h
@@ -107,7 +107,7 @@ struct MCDwarfFile {
 class MCDwarfLoc {
   uint32_t FileNum;
   uint32_t Line;
-  uint16_t Column;
+  uint32_t Column;
   // Flags (see #define's below)
   uint8_t Flags;
   uint8_t Isa;
@@ -159,10 +159,7 @@ class MCDwarfLoc {
   void setLine(unsigned line) { Line = line; }
 
   /// Set the Column of this MCDwarfLoc.
-  void setColumn(unsigned column) {
-    assert(column <= UINT16_MAX);
-    Column = column;
-  }
+  void setColumn(unsigned column) { Column = column; }
 
   /// Set the Flags of this MCDwarfLoc.
   void setFlags(unsigned flags) {
diff --git a/llvm/lib/AsmParser/LLParser.cpp b/llvm/lib/AsmParser/LLParser.cpp
index 44d81ad852688..d16bfe4a9b910 100644
--- a/llvm/lib/AsmParser/LLParser.cpp
+++ b/llvm/lib/AsmParser/LLParser.cpp
@@ -5000,7 +5000,7 @@ struct LineField : public MDUnsignedField {
 };
 
 struct ColumnField : public MDUnsignedField {
-  ColumnField() : MDUnsignedField(0, UINT16_MAX) {}
+  ColumnField() : MDUnsignedField(0, UINT32_MAX) {}
 };
 
 struct DwarfTagField : public MDUnsignedField {
diff --git a/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp b/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
index ccec0b5910658..d27ef852cda8f 100644
--- a/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
@@ -246,9 +246,15 @@ CodeViewDebug::getInlineSite(const DILocation *InlinedAt,
               .SiteFuncId;
 
     Site->SiteFuncId = NextFuncId++;
+
+    // Set to unknown on overflow. We only have 16 bits to play with here.
+    unsigned IAColumn = InlinedAt->getColumn();
+    if (IAColumn > UINT16_MAX)
+      IAColumn = 0;
+
     OS.emitCVInlineSiteIdDirective(
         Site->SiteFuncId, ParentFuncId, maybeRecordFile(InlinedAt->getFile()),
-        InlinedAt->getLine(), InlinedAt->getColumn(), SMLoc());
+        InlinedAt->getLine(), IAColumn, SMLoc());
     Site->Inlinee = Inlinee;
     InlinedSubprograms.insert(Inlinee);
     auto InlineeIdx = getFuncIdForSubprogram(Inlinee);
@@ -523,9 +529,10 @@ void CodeViewDebug::maybeRecordLocation(const DebugLoc &DL,
       LI.isNeverStepInto())
     return;
 
-  ColumnInfo CI(DL.getCol(), /*EndColumn=*/0);
-  if (CI.getStartColumn() != DL.getCol())
-    return;
+  // Set to unknown on overflow. We only have 16 bits to play with here.
+  unsigned Column = DL.getCol();
+  if (Column > UINT16_MAX)
+    Column = 0;
 
   if (!CurFn->HaveLineInfo)
     CurFn->HaveLineInfo = true;
@@ -558,7 +565,7 @@ void CodeViewDebug::maybeRecordLocation(const DebugLoc &DL,
     addLocIfNotPresent(CurFn->ChildSites, Loc);
   }
 
-  OS.emitCVLocDirective(FuncId, FileId, DL.getLine(), DL.getCol(),
+  OS.emitCVLocDirective(FuncId, FileId, DL.getLine(), Column,
                         /*PrologueEnd=*/false, /*IsStmt=*/false,
                         DL->getFilename(), SMLoc());
 }
diff --git a/llvm/lib/IR/DebugInfoMetadata.cpp b/llvm/lib/IR/DebugInfoMetadata.cpp
index 79b7ce040b552..e553dcd45395a 100644
--- a/llvm/lib/IR/DebugInfoMetadata.cpp
+++ b/llvm/lib/IR/DebugInfoMetadata.cpp
@@ -61,37 +61,27 @@ DebugVariableAggregate::DebugVariableAggregate(const DbgVariableRecord *DVR)
 DILocation::DILocation(LLVMContext &C, StorageType Storage, unsigned Line,
                        unsigned Column, uint64_t AtomGroup, uint8_t AtomRank,
                        ArrayRef<Metadata *> MDs, bool ImplicitCode)
-    : MDNode(C, DILocationKind, Storage, MDs), AtomGroup(AtomGroup),
-      AtomRank(AtomRank) {
+    : MDNode(C, DILocationKind, Storage, MDs), Column(Column),
+      AtomGroup(AtomGroup), AtomRank(AtomRank) {
   assert(AtomRank <= 7 && "AtomRank number should fit in 3 bits");
+  assert(AtomGroup < (1ULL << 29) && "AtomGroup number should fit in 29 bits");
   if (AtomGroup)
     C.updateDILocationAtomGroupWaterline(AtomGroup + 1);
 
   assert((MDs.size() == 1 || MDs.size() == 2) &&
          "Expected a scope and optional inlined-at");
-  // Set line and column.
-  assert(Column < (1u << 16) && "Expected 16-bit column");
-
+  // Set line. The column is stored in the dedicated 32-bit Column field via the
+  // member initializer above.
   SubclassData32 = Line;
-  SubclassData16 = Column;
 
   setImplicitCode(ImplicitCode);
 }
 
-static void adjustColumn(unsigned &Column) {
-  // Set to unknown on overflow.  We only have 16 bits to play with here.
-  if (Column >= (1u << 16))
-    Column = 0;
-}
-
 DILocation *DILocation::getImpl(LLVMContext &Context, unsigned Line,
                                 unsigned Column, Metadata *Scope,
                                 Metadata *InlinedAt, bool ImplicitCode,
                                 uint64_t AtomGroup, uint8_t AtomRank,
                                 StorageType Storage, bool ShouldCreate) {
-  // Fixup column.
-  adjustColumn(Column);
-
   if (Storage == Uniqued) {
     if (auto *N = getUniqued(Context.pImpl->DILocations,
                              DILocationInfo::KeyTy(Line, Column, Scope,
@@ -1521,9 +1511,6 @@ DILexicalBlock *DILexicalBlock::getImpl(LLVMContext &Context, Metadata *Scope,
                                         Metadata *File, unsigned Line,
                                         unsigned Column, StorageType Storage,
                                         bool ShouldCreate) {
-  // Fixup column.
-  adjustColumn(Column);
-
   assert(Scope && "Expected scope");
   DEFINE_GETIMPL_LOOKUP(DILexicalBlock, (Scope, File, Line, Column));
   Metadata *Ops[] = {File, Scope};
diff --git a/llvm/lib/IR/LLVMContextImpl.h b/llvm/lib/IR/LLVMContextImpl.h
index 57ff622b08813..a80c993f1bc29 100644
--- a/llvm/lib/IR/LLVMContextImpl.h
+++ b/llvm/lib/IR/LLVMContextImpl.h
@@ -297,13 +297,13 @@ template <> struct MDNodeKeyImpl<MDTuple> : MDNodeOpsKey {
 template <> struct MDNodeKeyImpl<DILocation> {
   Metadata *Scope;
   Metadata *InlinedAt;
-  uint64_t AtomGroup : 61;
+  uint64_t AtomGroup : 29;
   uint64_t AtomRank : 3;
   unsigned Line;
-  uint16_t Column;
+  unsigned Column;
   bool ImplicitCode;
 
-  MDNodeKeyImpl(unsigned Line, uint16_t Column, Metadata *Scope,
+  MDNodeKeyImpl(unsigned Line, unsigned Column, Metadata *Scope,
                 Metadata *InlinedAt, bool ImplicitCode, uint64_t AtomGroup,
                 uint8_t AtomRank)
       : Scope(Scope), InlinedAt(InlinedAt), AtomGroup(AtomGroup),
@@ -324,8 +324,7 @@ template <> struct MDNodeKeyImpl<DILocation> {
   }
 
   unsigned getHashValue() const {
-    uint64_t LineColumnAndImplicitCode =
-        Line | (uint64_t(Column) << 32) | (uint64_t(ImplicitCode) << 48);
+    uint64_t LineAndColumn = uint64_t(Line) | (uint64_t(Column) << 32);
     // Hashing AtomGroup and AtomRank substantially impacts performance whether
     // Key Instructions is enabled or not. We can't detect whether it's enabled
     // here cheaply; avoiding hashing zero values is a good approximation. This
@@ -334,9 +333,9 @@ template <> struct MDNodeKeyImpl<DILocation> {
     // outweighed by the overall compile time savings by performing this check.
     // * (hash_combine(x) != hash_combine(x, 0))
     if (AtomGroup || AtomRank)
-      return hash_combine(LineColumnAndImplicitCode, Scope, InlinedAt,
-                          AtomGroup | (uint64_t(AtomRank) << 61));
-    return hash_combine(LineColumnAndImplicitCode, Scope, InlinedAt);
+      return hash_combine(LineAndColumn, ImplicitCode, Scope, InlinedAt,
+                          AtomGroup | (uint64_t(AtomRank) << 29));
+    return hash_combine(LineAndColumn, ImplicitCode, Scope, InlinedAt);
   }
 };
 
diff --git a/llvm/unittests/IR/MetadataTest.cpp b/llvm/unittests/IR/MetadataTest.cpp
index 7222c885548e0..d0a35720d99a0 100644
--- a/llvm/unittests/IR/MetadataTest.cpp
+++ b/llvm/unittests/IR/MetadataTest.cpp
@@ -1627,6 +1627,21 @@ TEST_F(DILocationTest, getDistinct) {
   EXPECT_EQ(L1, DILocation::get(Context, 2, 7, N));
 }
 
+TEST_F(DILocationTest, WideColumn) {
+  MDNode *N = getSubprogram();
+  // Column numbers wider than 16 bits are preserved (up to 32 bits) rather
+  // than being clamped to zero.
+  DILocation *L = DILocation::get(Context, 2, 0x10000, N);
+  EXPECT_EQ(0x10000u, L->getColumn());
+
+  DILocation *Max = DILocation::get(Context, 2, UINT32_MAX, N);
+  EXPECT_EQ(UINT32_MAX, Max->getColumn());
+
+  // Distinct columns produce distinct (and uniqued) nodes.
+  EXPECT_EQ(L, DILocation::get(Context, 2, 0x10000, N));
+  EXPECT_NE(L, Max);
+}
+
 TEST_F(DILocationTest, getTemporary) {
   MDNode *N = MDNode::get(Context, {});
   auto L = DILocation::getTemporary(Context, 2, 7, N);

@topolarity topolarity force-pushed the ct/dwarf-32bit-col-trunk branch from 54aadb7 to 89fe48a Compare June 3, 2026 06:04
@SLTozer

SLTozer commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Could you explain the purpose of this in a bit more detail? It's likely that a different solution will be needed, because there are some upcoming plans to change the representation of DILocations, and a 32-bit column field won't be feasible under those changes; to be more exact, it could be done, but it would be fairly costly to do so, and so it would probably have to be enabled only behind a CMake flag, at which point it would probably be easier to just add a new field rather than using column as a side-channel.

@topolarity

Copy link
Copy Markdown
Author

Could you explain the purpose of this in a bit more detail?

Sure thing! And thanks for the heads up.

The downstream application is that we have a language-native encoding of debug information, but it requires an "IR PC" as an index for lookups. That "IR PC" is an index from the original Julia source IR, which the LLVM IR is generated from.

End-to-end our native debug format maps "IR PC" to "debug information", and the column number in DWARF is co-opted to give us a map from "machine PC" to "IR PC". We don't actually rely on this second mapping being in DWARF (we control the full pipeline from codegen to stacktrace generation, etc. so any such map would be fine), but it's a convenient way to extract the information from LLVM.

The only alternative I can think of would be to somehow track the DILocation identity itself and query it at the end of the pipeline? Perhaps it would be enough to be able to annotate a DILocation as "having identity" so that it is not merged / deduplicated structurally? We would not be opposed to generating the equivalent of DWARF's line / col number tables ourselves, but it doesn't seem that there's any way to easily recover that information at the end of the pipeline unless the DILocations were stamped with some ~large enough ID.

@jmorse

jmorse commented Jun 9, 2026

Copy link
Copy Markdown
Member

Hmmm -- if I understand right, what you need overall is a key/index to access more Julia IR information at the end of compilation, that gets preserved like source locations do. Would it be feasible to instead just adopt the line-number field of DILocations to store that index? That would force you to recover the true line-number from the index / Julia IR at the end of compilation, but it'd be compatible with the future source-location changes and avoids the need for abnormal layouts of classes / custom builds of LLVM.

@SLTozer pointed out offline that you would lose some information when source locations get merged and line numbers replaced with zero; but I imagine that's something you already need to address if you're putting the index in the column number. There are also one or two places where the line-number in DISubprogram is used to generate DILocations, which again I imagine is something you'd have to deal with anyway.

@topolarity

Copy link
Copy Markdown
Author

Would it be feasible to instead just adopt the line-number field of DILocations to store that index?

We could do that, but it'd require some end-of-pipeline mangling.

We're in an awkward position requirements-wise, because we want to preserve the "important" fields for native DWARF users (perf, etc.) incl. true line number, file, etc. At the same time, we also carry our custom debuginfo format that needs this "IR PC" to index into. The column was a compromise solution to leave the important bits in native DWARF while still supporting the new format.

Anyway it might be possible to do a post-MC DWARF filtering step that allows us to extract our "Machine PC -> IR PC" map and then swap back in the "native" DWARF info in place of the IR-PC-as-line-number we seeded the pipeline with.

you would lose some information when source locations get merged and line numbers replaced with zero

Yeah we're prepared to lose some information in those cases

@jmorse

jmorse commented Jun 25, 2026

Copy link
Copy Markdown
Member

NB, this proposal likely aligns with your interests: https://discourse.llvm.org/t/rfc-multi-level-line-table-support-in-llvm/91149

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants