Skip to content

GXflow FullText Search Migration to GitHub#1079

Open
LucasLazogue wants to merge 9 commits intomasterfrom
feat/move-gxflow-full-text-search-to-gxclasses
Open

GXflow FullText Search Migration to GitHub#1079
LucasLazogue wants to merge 9 commits intomasterfrom
feat/move-gxflow-full-text-search-to-gxclasses

Conversation

@LucasLazogue
Copy link

Issue: 107158

Move the GXflow FullText Search External Object implementation into GxClasses.
As previously done for GAM, these binaries are no longer included in the GeneXus installation and are provided through packages.
Add lucene.version (2.2.0) and pdfbox.version (3.0.3) to pom.xml, aligning with the version used in the gxsearch module.

#GXSEC

@genexusbot
Copy link
Collaborator

Cherry pick to beta failed, 2 conflicted files in commit caad503
  • gxflowfulltextsearch/pom.xml
  • pom.xml

@genexusbot
Copy link
Collaborator

Manual cherry pick to beta success

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR migrates the GXflow FullText Search External Object implementation from the GeneXus installation into the GxClasses repository as a standalone module, following the pattern previously used for GAM. The implementation uses Lucene 2.2.0 for full-text indexing and search capabilities, supporting document formats including PDF, DOCX, TXT, and HTML.

Changes:

  • Added new gxflowfulltextsearch module with full-text search implementation using Lucene 2.2.0
  • Centralized version management for Lucene (2.2.0), PDFBox (3.0.3), commons-collections4 (4.1), and commons-logging (1.2) in parent POM
  • Updated existing modules to use centralized version properties instead of hardcoded versions

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
pom.xml Added version properties for lucene, pdfbox, commons-collections4, and commons-logging; added gxflowfulltextsearch module
java/pom.xml Updated commons-collections4 and pdfbox dependencies to use centralized version properties
gxsearch/pom.xml Updated lucene dependencies to use centralized version property
gxflowfulltextsearch/pom.xml New module POM with dependencies for Lucene, PDFBox, POI, and logging
gxflowfulltextsearch/src/main/java/com/genexus/CA/search/Searcher.java New class implementing search functionality with XML result format
gxflowfulltextsearch/src/main/java/com/genexus/CA/search/Indexer.java New class implementing document indexing with support for multiple file formats
gxflowfulltextsearch/src/main/java/com/genexus/CA/search/IndexManager.java New class managing indexer instances per directory using ConcurrentHashMap
gxflowfulltextsearch/src/main/java/com/genexus/CA/search/AnalyzerManager.java New class managing Lucene analyzers with language support

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

logger.error("Error closing IndexReader", e);
}
}
}
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing break statement: The switch case for OPERATION_DELETE (line 128-156) is missing a break statement. While this may work because it's the last case, it's a maintenance risk and violates conventional switch statement patterns. Add an explicit break statement after line 156.

Suggested change
}
}
break;

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implemented at 02e49d8

Comment on lines 119 to 123
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential resource leak: If an exception occurs between writer creation (line 120) and writer.close() (line 123), the IndexWriter will not be closed, potentially leaving the index in a locked state. Use a try-finally block or try-with-resources to ensure the writer is always closed.

Suggested change
try {
IndexWriter writer = new IndexWriter(this.getIndexDirectory(), AnalyzerManager.getAnalyzer(lang), false);
writer.addDocument(doc);
// writer.optimize(); // This is a costly operation and should not be done for every document.
writer.close();
try (IndexWriter writer = new IndexWriter(this.getIndexDirectory(), AnalyzerManager.getAnalyzer(lang), false)) {
writer.addDocument(doc);
// writer.optimize(); // This is a costly operation and should not be done for every document.

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implemented at 02e49d8

Comment on lines 44 to 46
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential resource leak: If an exception occurs after the IndexWriter is created (line 45) but before it's closed (line 46), the writer will not be closed. Use a try-finally block to ensure the writer is always closed even if an exception occurs.

Suggested change
try {
IndexWriter writer = new IndexWriter(this.indexDirectory, new StandardAnalyzer(), true);
writer.close();
try (IndexWriter writer = new IndexWriter(this.indexDirectory, new StandardAnalyzer(), true)) {
// IndexWriter is created to initialize the index directory and closed automatically.

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implemented at 02e49d8

Comment on lines 173 to 177
try {
new IndexSearcher(dir);
return true;
} catch (IOException e) {
return false;
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resource leak: The IndexSearcher created in the try block is never closed. In Lucene 2.2.0, IndexSearcher should be explicitly closed to release file handles. Store the searcher in a variable and close it in a finally block.

Suggested change
try {
new IndexSearcher(dir);
return true;
} catch (IOException e) {
return false;
IndexSearcher searcher = null;
try {
searcher = new IndexSearcher(dir);
return true;
} catch (IOException e) {
return false;
} finally {
if (searcher != null) {
try {
searcher.close();
} catch (IOException e) {
logger.error("Error closing IndexSearcher", e);
}
}

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implemented at 02e49d8

Comment on lines +17 to +131
public class Searcher {
private static final Logger logger = LogManager.getLogger(Searcher.class);

private static String escapeXml(String text) {
if (text == null) {
return "";
}
return text.replace("&", "&")
.replace("<", "&lt;")
.replace(">", "&gt;")
.replace("\"", "&quot;")
.replace("'", "&apos;");
}

public static String search(String dir, String lang, String query, int maxResults, int from) {
StringBuilder buff = new StringBuilder();
long startTime = System.currentTimeMillis();

if (from < 0) {
logger.warn("Search 'from' cannot be negative. Using 0 instead. from={}", from);
from = 0;
}
if (maxResults < 0) {
logger.warn("Search 'maxResults' cannot be negative. Using 0 instead. maxResults={}", maxResults);
maxResults = 0;
}

if (!indexExists(dir)) {
buff.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?><Results hits=\"0\" time=\"0ms\"></Results>");
return buff.toString();
}

IndexSearcher searcher = null;
try {
searcher = new IndexSearcher(dir);
String[] fields = new String[]{"title", "content", "summary"};
Occur[] clauses = new Occur[]{Occur.SHOULD, Occur.SHOULD, Occur.SHOULD};

Query q;
try {
q = MultiFieldQueryParser.parse(query, fields, clauses, AnalyzerManager.getAnalyzer(lang));
} catch (ParseException e) {
try {
String escapedQuery = QueryParser.escape(query);
q = MultiFieldQueryParser.parse(escapedQuery, fields, clauses, AnalyzerManager.getAnalyzer(lang));
logger.warn("Query had invalid syntax. Escaped version was used: {}", escapedQuery, e);
} catch (ParseException escapedException) {
logger.warn("Could not parse query, falling back to TermQuery: " + query, escapedException);
q = new TermQuery(new Term("content", query));
}
}

if (lang != null && !lang.trim().isEmpty() && !"IND".equalsIgnoreCase(lang)) {
Query q2 = new TermQuery(new Term("language", lang));
BooleanQuery bq = new BooleanQuery();
bq.add(q, Occur.MUST);
bq.add(q2, Occur.MUST);
q = bq;
}

Hits hits = searcher.search(q);
int totalHits = hits.length();

long endTime = System.currentTimeMillis();
String time = String.valueOf(endTime - startTime);

buff.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
buff.append("<Results hits=\"").append(totalHits).append("\" time=\"").append(time).append("ms\">");

int end = Math.min(totalHits, from + maxResults);
for (int i = from; i < end; i++) {
buff.append("<Result>");
Document doc = hits.doc(i);
String uri = doc.getField("uri").stringValue();
buff.append("<URI>").append(escapeXml(uri)).append("</URI>");
buff.append("</Result>");
}
} catch (Exception e) {
logger.error("Error during search", e);
// Return an empty but valid XML in case of error
buff.setLength(0); // Clear buffer
buff.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?><Results hits=\"0\" time=\"0ms\"></Results>");
return buff.toString();
} finally {
if (searcher != null) {
try {
searcher.close();
} catch (Exception e) {
logger.error("Error closing IndexSearcher", e);
}
}
}

buff.append("</Results>");
return buff.toString();
}

private static boolean indexExists(String dir) {
IndexSearcher searcher = null;
try {
searcher = new IndexSearcher(dir);
return true;
} catch (Exception e) {
return false;
} finally {
if (searcher != null) {
try {
searcher.close();
} catch (Exception e) {
logger.warn("Error closing IndexSearcher during indexExists check", e);
}
}
}
}
}
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing test coverage: This new module lacks test coverage while other similar modules in the repository (such as gxsearch, gamutils, gxcryptography) have comprehensive test suites. Consider adding tests for core functionality like indexing, searching, document parsing, and error handling.

Copilot uses AI. Check for mistakes.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented at e5b8c3b

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments