GXflow FullText Search Migration to GitHub#1079
Conversation
Cherry pick to beta failed, 2 conflicted files in commit caad503
|
Manual cherry pick to beta success |
There was a problem hiding this comment.
Pull request overview
This PR migrates the GXflow FullText Search External Object implementation from the GeneXus installation into the GxClasses repository as a standalone module, following the pattern previously used for GAM. The implementation uses Lucene 2.2.0 for full-text indexing and search capabilities, supporting document formats including PDF, DOCX, TXT, and HTML.
Changes:
- Added new
gxflowfulltextsearchmodule with full-text search implementation using Lucene 2.2.0 - Centralized version management for Lucene (2.2.0), PDFBox (3.0.3), commons-collections4 (4.1), and commons-logging (1.2) in parent POM
- Updated existing modules to use centralized version properties instead of hardcoded versions
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| pom.xml | Added version properties for lucene, pdfbox, commons-collections4, and commons-logging; added gxflowfulltextsearch module |
| java/pom.xml | Updated commons-collections4 and pdfbox dependencies to use centralized version properties |
| gxsearch/pom.xml | Updated lucene dependencies to use centralized version property |
| gxflowfulltextsearch/pom.xml | New module POM with dependencies for Lucene, PDFBox, POI, and logging |
| gxflowfulltextsearch/src/main/java/com/genexus/CA/search/Searcher.java | New class implementing search functionality with XML result format |
| gxflowfulltextsearch/src/main/java/com/genexus/CA/search/Indexer.java | New class implementing document indexing with support for multiple file formats |
| gxflowfulltextsearch/src/main/java/com/genexus/CA/search/IndexManager.java | New class managing indexer instances per directory using ConcurrentHashMap |
| gxflowfulltextsearch/src/main/java/com/genexus/CA/search/AnalyzerManager.java | New class managing Lucene analyzers with language support |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| logger.error("Error closing IndexReader", e); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Missing break statement: The switch case for OPERATION_DELETE (line 128-156) is missing a break statement. While this may work because it's the last case, it's a maintenance risk and violates conventional switch statement patterns. Add an explicit break statement after line 156.
| } | |
| } | |
| break; |
There was a problem hiding this comment.
Potential resource leak: If an exception occurs between writer creation (line 120) and writer.close() (line 123), the IndexWriter will not be closed, potentially leaving the index in a locked state. Use a try-finally block or try-with-resources to ensure the writer is always closed.
| try { | |
| IndexWriter writer = new IndexWriter(this.getIndexDirectory(), AnalyzerManager.getAnalyzer(lang), false); | |
| writer.addDocument(doc); | |
| // writer.optimize(); // This is a costly operation and should not be done for every document. | |
| writer.close(); | |
| try (IndexWriter writer = new IndexWriter(this.getIndexDirectory(), AnalyzerManager.getAnalyzer(lang), false)) { | |
| writer.addDocument(doc); | |
| // writer.optimize(); // This is a costly operation and should not be done for every document. |
There was a problem hiding this comment.
Potential resource leak: If an exception occurs after the IndexWriter is created (line 45) but before it's closed (line 46), the writer will not be closed. Use a try-finally block to ensure the writer is always closed even if an exception occurs.
| try { | |
| IndexWriter writer = new IndexWriter(this.indexDirectory, new StandardAnalyzer(), true); | |
| writer.close(); | |
| try (IndexWriter writer = new IndexWriter(this.indexDirectory, new StandardAnalyzer(), true)) { | |
| // IndexWriter is created to initialize the index directory and closed automatically. |
| try { | ||
| new IndexSearcher(dir); | ||
| return true; | ||
| } catch (IOException e) { | ||
| return false; |
There was a problem hiding this comment.
Resource leak: The IndexSearcher created in the try block is never closed. In Lucene 2.2.0, IndexSearcher should be explicitly closed to release file handles. Store the searcher in a variable and close it in a finally block.
| try { | |
| new IndexSearcher(dir); | |
| return true; | |
| } catch (IOException e) { | |
| return false; | |
| IndexSearcher searcher = null; | |
| try { | |
| searcher = new IndexSearcher(dir); | |
| return true; | |
| } catch (IOException e) { | |
| return false; | |
| } finally { | |
| if (searcher != null) { | |
| try { | |
| searcher.close(); | |
| } catch (IOException e) { | |
| logger.error("Error closing IndexSearcher", e); | |
| } | |
| } |
| public class Searcher { | ||
| private static final Logger logger = LogManager.getLogger(Searcher.class); | ||
|
|
||
| private static String escapeXml(String text) { | ||
| if (text == null) { | ||
| return ""; | ||
| } | ||
| return text.replace("&", "&") | ||
| .replace("<", "<") | ||
| .replace(">", ">") | ||
| .replace("\"", """) | ||
| .replace("'", "'"); | ||
| } | ||
|
|
||
| public static String search(String dir, String lang, String query, int maxResults, int from) { | ||
| StringBuilder buff = new StringBuilder(); | ||
| long startTime = System.currentTimeMillis(); | ||
|
|
||
| if (from < 0) { | ||
| logger.warn("Search 'from' cannot be negative. Using 0 instead. from={}", from); | ||
| from = 0; | ||
| } | ||
| if (maxResults < 0) { | ||
| logger.warn("Search 'maxResults' cannot be negative. Using 0 instead. maxResults={}", maxResults); | ||
| maxResults = 0; | ||
| } | ||
|
|
||
| if (!indexExists(dir)) { | ||
| buff.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?><Results hits=\"0\" time=\"0ms\"></Results>"); | ||
| return buff.toString(); | ||
| } | ||
|
|
||
| IndexSearcher searcher = null; | ||
| try { | ||
| searcher = new IndexSearcher(dir); | ||
| String[] fields = new String[]{"title", "content", "summary"}; | ||
| Occur[] clauses = new Occur[]{Occur.SHOULD, Occur.SHOULD, Occur.SHOULD}; | ||
|
|
||
| Query q; | ||
| try { | ||
| q = MultiFieldQueryParser.parse(query, fields, clauses, AnalyzerManager.getAnalyzer(lang)); | ||
| } catch (ParseException e) { | ||
| try { | ||
| String escapedQuery = QueryParser.escape(query); | ||
| q = MultiFieldQueryParser.parse(escapedQuery, fields, clauses, AnalyzerManager.getAnalyzer(lang)); | ||
| logger.warn("Query had invalid syntax. Escaped version was used: {}", escapedQuery, e); | ||
| } catch (ParseException escapedException) { | ||
| logger.warn("Could not parse query, falling back to TermQuery: " + query, escapedException); | ||
| q = new TermQuery(new Term("content", query)); | ||
| } | ||
| } | ||
|
|
||
| if (lang != null && !lang.trim().isEmpty() && !"IND".equalsIgnoreCase(lang)) { | ||
| Query q2 = new TermQuery(new Term("language", lang)); | ||
| BooleanQuery bq = new BooleanQuery(); | ||
| bq.add(q, Occur.MUST); | ||
| bq.add(q2, Occur.MUST); | ||
| q = bq; | ||
| } | ||
|
|
||
| Hits hits = searcher.search(q); | ||
| int totalHits = hits.length(); | ||
|
|
||
| long endTime = System.currentTimeMillis(); | ||
| String time = String.valueOf(endTime - startTime); | ||
|
|
||
| buff.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>"); | ||
| buff.append("<Results hits=\"").append(totalHits).append("\" time=\"").append(time).append("ms\">"); | ||
|
|
||
| int end = Math.min(totalHits, from + maxResults); | ||
| for (int i = from; i < end; i++) { | ||
| buff.append("<Result>"); | ||
| Document doc = hits.doc(i); | ||
| String uri = doc.getField("uri").stringValue(); | ||
| buff.append("<URI>").append(escapeXml(uri)).append("</URI>"); | ||
| buff.append("</Result>"); | ||
| } | ||
| } catch (Exception e) { | ||
| logger.error("Error during search", e); | ||
| // Return an empty but valid XML in case of error | ||
| buff.setLength(0); // Clear buffer | ||
| buff.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?><Results hits=\"0\" time=\"0ms\"></Results>"); | ||
| return buff.toString(); | ||
| } finally { | ||
| if (searcher != null) { | ||
| try { | ||
| searcher.close(); | ||
| } catch (Exception e) { | ||
| logger.error("Error closing IndexSearcher", e); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| buff.append("</Results>"); | ||
| return buff.toString(); | ||
| } | ||
|
|
||
| private static boolean indexExists(String dir) { | ||
| IndexSearcher searcher = null; | ||
| try { | ||
| searcher = new IndexSearcher(dir); | ||
| return true; | ||
| } catch (Exception e) { | ||
| return false; | ||
| } finally { | ||
| if (searcher != null) { | ||
| try { | ||
| searcher.close(); | ||
| } catch (Exception e) { | ||
| logger.warn("Error closing IndexSearcher during indexExists check", e); | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Missing test coverage: This new module lacks test coverage while other similar modules in the repository (such as gxsearch, gamutils, gxcryptography) have comprehensive test suites. Consider adding tests for core functionality like indexing, searching, document parsing, and error handling.
Issue: 107158
Move the GXflow FullText Search External Object implementation into GxClasses.
As previously done for GAM, these binaries are no longer included in the GeneXus installation and are provided through packages.
Add
lucene.version(2.2.0) andpdfbox.version(3.0.3) topom.xml, aligning with the version used in the gxsearch module.#GXSEC