SOLR-18071 Support Stored Fields in Export Writer #4053

kotman12 · 2026-01-15T22:57:22Z

https://issues.apache.org/jira/browse/SOLR-18071

Description

Adds support for exporting stored-only fields (fields without docValues) in the /export request handler via a new includeStoredFields parameter. Previously, all fields in the field list (fl) were required to have docValues enabled. This change allows users to include stored fields that don't have docValues, which can be useful, i.e. when exporting fields which don't support docValues or trying to export data that has already been indexed without DVs.

Solution

If fl explicitly names a stored-only field and includeStoredFields is not enabled, the request fails with a 400 and a hint to add includeStoredFields=true. For glob patterns (e.g., fl=* or fl=intdv,*), stored-only fields are skipped unless includeStoredFields=true, to preserve backward compatibility. The current implementation fetches from StoredFields DV-enabled fields when some stored fields have already been requested. This avoids DV-lookup for a field, which makes sense since we have to parse the StoredFields anyway. My (somewhat limited) benchmarks appear to corroborate that this is the best choice for performance.

A quirky thing about this implementation is that the very much internal FieldWriter API was changed to support more than one field. This makes it more interchangeable with the existing StoredFieldVisitor interface, which assumes one visitor per many fields. I landed on this rather than creating an adapter to bridge the two as it appeared to be simpler. It's worth stressing again that the FieldWriter is very much internal to the export package and the boolean it returned was effectively discarded (the local fieldIndex it drives isn't even used anywhere). It could be argued that FieldWriter::write could be void and I'd also be open to such a change.

Tests

Explicit stored-only field export succeeds with includeStoredFields=true (single-valued and multi-valued).
Explicit stored-only field export fails without the parameter and includes the includeStoredFields=true hint and field name.
Glob fl skips stored-only fields without the parameter (request succeeds, stored-only fields not present).
Glob fl includes stored-only fields with the parameter.
Coverage for stored field types: string, int, long, float, double, boolean, date.

Also have some performance comparisons of exporting vs not exporting stored fields:
stored-fields-export-writer-1k-doc-benchmark.txt
stored-fields-export-writer-140k-doc-benchmark.txt

Checklist

Please review the following and check all that apply:

I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
I have created a Jira issue and added the issue ID to my pull request title.
I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
I have developed this patch against the main branch.
I have run ./gradlew check.
I have added tests for my changes.
I have added documentation for the Reference Guide
I have added a changelog entry for my change

epugh

I think this will scratch some of my own issues!

epugh · 2026-01-16T16:41:29Z

solr/core/src/java/org/apache/solr/handler/export/ExportWriter.java

      }
      SchemaField schemaField = req.getSchema().getField(field);
-      if (!schemaField.hasDocValues()) {
-        throw new IOException(schemaField + " must have DocValues to use this feature.");


Love this! I've been burned so many times on wanting to do this and discovering missing DocValues.

epugh · 2026-01-16T16:44:22Z

solr/core/src/java/org/apache/solr/handler/export/DoubleFieldWriter.java


  @Override
-  public boolean write(
+  public int write(


I went down thorugh the code, why the boolean -> int? Looks like we only ever return 0 or 1?

I cover this a little in the second paragraph of #Solution. To elaborate, I introduced StoredFieldsWriter which is a special type of FieldWriter that writes many fields. I did this because it was the easiest way I saw to hook into the output of a StoredFieldVisitor , which writes multiple fields. Up until this point FieldWriter would only write one field at a time. It would return a boolean representing whether it successfully wrote the field. If the boolean was true an internal loop would increment the fieldIndex which then ... did absolutely nothing. So not entirely clear what the purpose is of keeping track of how many fields were written. But in order to maintain this potentially useful (although not currently useful) feature, I decided to expand the API to return an int which returns how many fields a particular FieldWriter wrote so we can increment by an arbitrary amount to support the new StoredFieldsWriter . I would be equally happy to just make FieldWriter::write be void instead of tracking this unused index.

I also realize you may have missed StoredFieldsWriter because I put the class inside the ExportWriter. I think I am going to put it in a standalone file like the other FieldWriters so it is more consistent and clear.

I would be equally happy to just make FieldWriter::write be void instead of tracking this unused index.

+1

Then you could retract my comment on javadocs for write

epugh · 2026-01-16T16:46:24Z

solr/core/src/test/org/apache/solr/handler/export/TestExportWriter.java

+  @Test
+  public void testIncludeStoredFieldsExplicitRequest() throws Exception {
+    // Test that stored-only fields are returned when includeStoredFields=true
+    clearIndex();


maybe not speciifc to your goal, but should clearIndex() be in a before method/setup step? If it's on every unit test?

I leave it to your judgement, should this be in this PR or a follow-up? Honestly, I just copied the pattern of the existing tests.

probably should be in a follow up, and I suspect that this pattern is in many tests, that all should be cleaned up to fix this bad pattern and establish a better new one. (off topic, but having a prompts.md that captures this community knowlege for AI is also good for humans).

being consistent with nearby tests in a single source file (aka suite) makes sense.

epugh · 2026-01-16T16:48:28Z

solr/solr-ref-guide/modules/query-guide/pages/exporting-result-sets.adoc

 An optional parameter `batchSize` determines the size of the internal buffers for partial results.
 The default value is `30000` but users may want to specify smaller values to limit the memory use (at the cost of degraded performance) or higher values to improve export performance (the relationship is not linear and larger values don't bring proportionally larger performance increases).

+An optional parameter `includeStoredFields` (default `false`) enables exporting fields that only have stored values (no docValues).


some day we'll have a way of thinking about should this be "true" ;-). Challenge of open source is we struggle to know what hsould be a default yes even when it has knock on impacts until a lot of time passes. ;-)

Good point. I will add an explanation.

Actually, can I put the explanation in the CHANGELOG? The docs already warn:

Note that retrieving stored fields may significantly impact export performance compared to docValues fields, as stored fields require additional I/O operations.

dsmiley · 2026-01-17T18:10:24Z

Just a quick observation -- I see a force push that came after Eric's review. I strongly recommend against force pushing because it resets the review state for anyone who has already reviewed -- Eric in this case.

dsmiley

I really like how overall simple the implementation turned out to be. This wasn't a heavy lift. Nice job!

Curious... at this point, how do you think /export contrasts with /select + cursorMark? Pros/cons... (this could be documented in the ref guide as well to durably record these insights)

dsmiley · 2026-01-17T18:13:46Z

solr/core/src/java/org/apache/solr/handler/export/FieldWriter.java

 import org.apache.lucene.index.LeafReaderContext;
 import org.apache.solr.common.MapWriter;

 abstract class FieldWriter {


deserves javadoc, at least for clarifying it's return semantics

dsmiley · 2026-01-17T18:19:14Z

solr/core/src/java/org/apache/solr/handler/export/StoredFieldsWriter.java

+    if (map == null) {
+      map = new WeakHashMap<>();
+      storedFieldsMap.set(map);
+    }


this pattern can be improved to basically be handled at the ThreadLocal declaration to provide an initializer.

But moreover I'm concerned about the use of ThreadLocal in the first place -- it's typically a tool of last resort. And further true with use of weak references.

BTW I'm fine with your use of it here but was speaking out-loud, maybe a little too out-loud :-)

dsmiley · 2026-01-17T18:22:29Z

solr/core/src/java/org/apache/solr/handler/export/StoredFieldsWriter.java

+
+class StoredFieldsWriter extends FieldWriter {
+
+  private final Map<String, SchemaField> fields;


nit: I'd prefer this be named schema or something including that word

dsmiley · 2026-01-17T18:24:23Z

solr/core/src/java/org/apache/solr/handler/export/StoredFieldsWriter.java

+      if (fieldType instanceof BoolField) {
+        // Convert "T"/"F" stored value to boolean true/false
+        addField(fieldInfo.name, Boolean.valueOf(fieldType.indexedToReadable(value)));
+      } else {
+        addField(fieldInfo.name, value);
+      }


when I see special cases like this, I ask myself... is there a fieldType method that should be handling this? If not do we need to add one?
CC @hossman if you are interested in review; you've looked at this topic in a nearby issue lately

dsmiley · 2026-01-17T18:30:02Z

solr/core/src/java/org/apache/solr/handler/export/StoredFieldsWriter.java

+class StoredFieldsWriter extends FieldWriter {
+
+  private final Map<String, SchemaField> fields;
+  private final ThreadLocal<WeakHashMap<IndexReader.CacheKey, StoredFields>> storedFieldsMap =


ThreadLocals are typically static. In this case, I think it should be because the information in it isn't specific to this instance of StoredFieldsWriter in any way that I can see. Java static analyzers including built into IntelliJ will generally advise you on this point.

dsmiley · 2026-01-17T19:04:22Z

solr/core/src/test/org/apache/solr/handler/export/TestExportWriter.java

+
+    assertJsonEquals(
+        resp,
+        "{\n"


if you didn't get the memo -- we're on Java 21 now :-). multi-line string please

dsmiley · 2026-01-17T19:07:05Z

solr/core/src/java/org/apache/solr/handler/export/ExportWriter.java

+      // Check if field can use DocValues
+      boolean canUseDocValues =
+          schemaField.hasDocValues()
+              && (!(fieldType instanceof SortableTextField) || schemaField.useDocValuesAsStored());


is there precedent for this special case RE instanceof SortableTextField?

dsmiley · 2026-01-17T22:51:33Z

solr/core/src/java/org/apache/solr/handler/export/StoredFieldsWriter.java

+      return fields.containsKey(fieldInfo.name) ? Status.YES : Status.NO;
+    }
+
+    private <T> void addField(String fieldName, T value) throws IOException {


the logic flow is confusing... I started to write why it wouldn't work but end up finally contorting myself to see how it could work. Could you try to reflow it somehow? Or if you like it's flow then add comments? (or both)

You've never seen this pattern before where you have to track state of multi-valued fields in a callback-style API where the caller sends one element at a time? It's not the greatest but it's what I'm dealing with here. The flush here can be thought of as being akin to, say, Jackson's JsonGenerator::writeEndObject. The look-behind is because I need to buffer and I don't get a signal on the last element that it is the last element. I don't really control that. Nota bene something similar happens in Solr here although I suppose it is less-contortion inducing because it happens all in one function. But this is hard to do here given the API.

One thing I can do is buffer into a multi-map-like thing instead of rolling my own StoredFieldVisitor, i.e. I can buffer into a Document (basically a multi-map), i.e. maybe with DocumentStoredFieldVisitor. I'd need to convert the multi-map thing (i.e. Document) to a collection-property-aware thing like SolrDocument, perhaps using the same convertLuceneDocToSolrDoc I just linked. But that is a lot of transformation so I am pretty neutral about it.

Maybe you can elaborate on what is confusing you? Also, any thoughts on the alternatives I just suggested would be welcome.

I have; I'm saying nonetheless I had some trouble reading it how you coded it. It's subjective; maybe "i'm the problem". In particular, I was surprised to see the very first bit of logic was to check if the current field is multi-valued... instead I expected it to check if we're adding to an existing field or starting a new field. There's no right answer here... it's subjective as to who finds what easier to read. So updated the logic and added a bit more comments. LMK what you think. If you don't like it than I'll refer it back and maybe you add more comments.

(I don't like the alternative suggestion of incorporating use of Document; seems quite unnecessary).

dsmiley · 2026-01-17T22:52:37Z

solr/core/src/java/org/apache/solr/handler/export/StoredFieldsWriter.java

+    }
+
+    private int flush() throws IOException {
+      if (lastFieldName != null && multiValue != null && !multiValue.isEmpty()) {


I think the logic should be improved to read super simple here. Like, if multiValue is not null then put it -- that's it. No lastFieldName check, no isEmpty check.

dsmiley · 2026-01-17T22:54:46Z

solr/core/src/java/org/apache/solr/handler/export/DoubleFieldWriter.java


  @Override
-  public boolean write(
+  public int write(


I would be equally happy to just make FieldWriter::write be void instead of tracking this unused index.

+1

Then you could retract my comment on javadocs for write

kotman12 added 10 commits January 10, 2026 20:49

opus assisted first pass

c7c2703

clean up cruft

7a41252

change expected exception message and remove duplication

a16594d

allow arbitrary index shift in field writer

ec14338

add udvas test

58e9a5b

avoid DV lookup if we're reading StoredFields

9d25470

revert ... overridden by mistake

cad9607

clean up comment

8f08d35

better comment

e064f6e

actually test glob pattern

815892a

github-actions bot added documentation Improvements or additions to documentation tests labels Jan 15, 2026

changelog entry

093af42

kotman12 force-pushed the stored-fields-export-writer branch from bf813b7 to 093af42 Compare January 15, 2026 23:01

epugh reviewed Jan 16, 2026

View reviewed changes

kotman12 added 2 commits January 16, 2026 14:55

add unreleased changelog entry

ac7966a

move StoredFieldsWriter to its own file

d510972

kotman12 force-pushed the stored-fields-export-writer branch from 9ff0b20 to d510972 Compare January 16, 2026 20:22

dsmiley reviewed Jan 17, 2026

View reviewed changes

simplify logic (subjective)

c30ed95


		class StoredFieldsWriter extends FieldWriter {

		private final Map<String, SchemaField> fields;

SOLR-18071 Support Stored Fields in Export Writer #4053

Are you sure you want to change the base?

SOLR-18071 Support Stored Fields in Export Writer #4053

Conversation

kotman12 commented Jan 15, 2026

Description

Solution

Tests

Checklist

Uh oh!

epugh left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kotman12 Jan 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dsmiley commented Jan 17, 2026

Uh oh!

dsmiley left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

kotman12 Jan 16, 2026 •

edited

Loading