Conversation
d763e66 to
e520bf8
Compare
|
There was a problem hiding this comment.
Pull request overview
This pull request introduces a new endpoint to retrieve the status of reindex jobs by their execution ID. The change shifts the codebase from using job instance IDs to job execution IDs for tracking batch jobs, which provides more granular tracking since multiple executions can exist for the same job instance.
Changes:
- Added GET
/linked-data/reindex/statusendpoint that returns detailed job status including timestamps, user info, reindex type, and processing statistics - Changed reindex endpoints to return job execution IDs instead of job instance IDs
- Introduced entity mappings for Spring Batch tables (BatchJobExecution, BatchStepExecution, BatchJobExecutionParams) to enable direct database queries
- Reorganized reindex endpoints into a separate interface in the module descriptor
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| src/main/resources/swagger.api/mod-linked-data.yaml | Added new status endpoint definition and updated response descriptions |
| src/main/resources/swagger.api/schema/reindexJobStatusDto.json | Defined schema for reindex job status response |
| src/main/java/org/folio/linked/data/model/entity/batch/*.java | Created entity mappings for Spring Batch tables |
| src/main/java/org/folio/linked/data/repo/BatchJobExecutionRepository.java | Added repository for querying job executions |
| src/main/java/org/folio/linked/data/mapper/dto/ReindexJobStatusMapper.java | Implemented mapper to convert batch entities to status DTO |
| src/main/java/org/folio/linked/data/service/batch/ReindexJobService*.java | Added getStatus method to service interface and implementation |
| src/main/java/org/folio/linked/data/controller/ReindexingController.java | Implemented new status endpoint handler |
| src/main/java/org/folio/linked/data/configuration/batch/BatchConfig.java | Modified thread pool configuration and extracted step name constant |
| src/test/java/**/*Test.java | Added comprehensive unit tests for mapper and service |
| src/test/java/org/folio/linked/data/e2e/ReIndexControllerIT.java | Added integration test for status endpoint |
| README.md | Updated documentation with new endpoint usage examples |
| descriptors/ModuleDescriptor-template.json | Reorganized reindex endpoints into separate interface with new permission |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @Entity | ||
| public class BatchJobExecution { |
There was a problem hiding this comment.
The entity classes for Spring Batch tables (BatchJobExecution, BatchStepExecution, BatchJobExecutionParams) are missing explicit @Table annotations. While JPA's default naming strategy should work (converting class names to lowercase with underscores), it's a best practice to explicitly specify the table names for external tables like Spring Batch's schema. Consider adding @Table annotations for clarity and to prevent issues if the naming strategy changes.
| } | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
There's an extra blank line at the end of the file. While this doesn't affect functionality, removing it would maintain consistency with other files in the codebase.
| @ParameterizedTest | ||
| @CsvSource({"true, FULL", "false, INCREMENTAL"}) | ||
| void getReindexJobStatus_shouldReturnStatus_whenJobExists(boolean isFullReindex, | ||
| ReindexTypeEnum expectedType) throws Exception { | ||
| // given | ||
| resourceTestService.saveGraph(getSampleWork()); | ||
| var reindexUrl = isFullReindex ? FULL_REINDEX_URL : INCREMENTAL_REINDEX_URL; | ||
| var jobExecutionId = parseLong(mockMvc.perform(post(reindexUrl).headers(defaultHeaders(env))) | ||
| .andExpect(status().isOk()) | ||
| .andReturn().getResponse().getContentAsString()); | ||
| awaitJobCompletion(jobExecutionId); | ||
|
|
||
| // when / then | ||
| mockMvc.perform(get(REINDEX_STATUS_URL) | ||
| .param("jobExecutionId", String.valueOf(jobExecutionId)) | ||
| .headers(defaultHeaders(env))) | ||
| .andExpect(status().isOk()) | ||
| .andExpect(content().contentType(APPLICATION_JSON)) | ||
| .andExpect(jsonPath("$.status").value("COMPLETED")) | ||
| .andExpect(jsonPath("$.reindexType").value(expectedType.getValue())) | ||
| .andExpect(jsonPath("$.startDate").isNotEmpty()) | ||
| .andExpect(jsonPath("$.endDate").isNotEmpty()) | ||
| .andExpect(jsonPath("$.startedBy").isNotEmpty()) | ||
| .andExpect(jsonPath("$.linesRead").isNumber()) | ||
| .andExpect(jsonPath("$.linesSent").isNumber()); | ||
| } |
There was a problem hiding this comment.
Consider adding an integration test for the 404 case when a job execution ID doesn't exist. While there's a unit test for this scenario in ReindexJobServiceImplTest, an integration test would verify the full request/response flow and ensure proper error handling through the controller layer. For example, a test that calls the endpoint with a non-existent ID and expects a 404 response.
| @Entity | ||
| public class BatchStepExecution { |
| @Entity | ||
| @IdClass(BatchJobExecutionParams.BatchJobExecutionParamsId.class) |
| + " WHERE p.job_execution_id = job_execution_id AND p.parameter_name = 'isFullReindex')") | ||
| private String isFullReindex; | ||
|
|
||
| @OneToMany(mappedBy = "jobExecution", fetch = FetchType.EAGER) |
There was a problem hiding this comment.
The stepExecutions relationship uses FetchType.EAGER, which means all step executions will be loaded immediately when fetching a job execution. While this is convenient for the current use case (status endpoint always needs step data), it could become a performance issue if job executions have many steps or if this entity is used in other contexts where step data isn't needed. Consider whether LAZY loading with an explicit join in the repository query would be more appropriate.
| @OneToMany(mappedBy = "jobExecution", fetch = FetchType.EAGER) | |
| @OneToMany(mappedBy = "jobExecution", fetch = FetchType.LAZY) |
|
|
||
| @Formula("(SELECT p.parameter_value FROM batch_job_execution_params p" | ||
| + " WHERE p.job_execution_id = job_execution_id AND p.parameter_name = 'startedBy')") | ||
| private String startedBy; | ||
|
|
There was a problem hiding this comment.
The @Formula subquery references 'job_execution_id' without a table qualifier in the WHERE clause. While Hibernate should interpret this as the current entity's column, it would be clearer and safer to use an alias for the outer table. For example: '@formula("(SELECT p.parameter_value FROM batch_job_execution_params p WHERE p.job_execution_id = this_.job_execution_id AND p.parameter_name = 'startedBy')")'. However, since Hibernate doesn't support 'this_' in formulas by default, the current approach is acceptable but could be documented with a comment explaining the implicit reference.
| @Formula("(SELECT p.parameter_value FROM batch_job_execution_params p" | |
| + " WHERE p.job_execution_id = job_execution_id AND p.parameter_name = 'startedBy')") | |
| private String startedBy; | |
| // Note: In this @Formula, the unqualified column name 'job_execution_id' is | |
| // intentionally used to refer to the current entity's column. Hibernate resolves | |
| // it against the outer BatchJobExecution table. Using an alias like 'this_' is | |
| // not supported in @Formula expressions. | |
| @Formula("(SELECT p.parameter_value FROM batch_job_execution_params p" | |
| + " WHERE p.job_execution_id = job_execution_id AND p.parameter_name = 'startedBy')") | |
| private String startedBy; | |
| // See comment above: 'job_execution_id' is intentionally unqualified so that | |
| // Hibernate interprets it as the column of the outer BatchJobExecution entity. |
| The returned job instance ID can be used to monitor the job's status. | ||
|
|
||
| ### Reindex job status | ||
| The status of a reindex job can be retrieved by making a GET request to the `/linked-data/reindex/status` endpoint with the `jobExecutionId` query parameter: |
There was a problem hiding this comment.
The documentation still refers to "job instance ID" but the code now returns a job execution ID. This should be updated throughout for consistency. For example, line 320 says "The returned job instance ID can be used to monitor the job's status" but it should say "job execution ID".
| ) { | ||
| var exec = new ThreadPoolTaskExecutor(); | ||
| exec.setCorePoolSize(poolSize); | ||
| exec.setMaxPoolSize(poolSize); |
There was a problem hiding this comment.
The thread pool configuration has been changed from using a bounded queue to an unbounded queue, and setCorePoolSize has been added. While this change makes the pool create all threads immediately (corePoolSize = maxPoolSize), the unbounded queue could potentially lead to memory issues if tasks are submitted faster than they can be processed. Consider documenting this change in the PR description or adding a comment explaining the rationale. Alternatively, if the bounded queue behavior was intentional, this might be an unintended change.
| exec.setMaxPoolSize(poolSize); | |
| exec.setMaxPoolSize(poolSize); | |
| exec.setQueueCapacity(poolSize); |
| @EqualsAndHashCode(of = {"jobExecutionId", "parameterName"}) | ||
| @Entity | ||
| @IdClass(BatchJobExecutionParams.BatchJobExecutionParamsId.class) | ||
| public class BatchJobExecutionParams { |
There was a problem hiding this comment.
Do we need this class? Seems like this is not used anywhere.



No description provided.