Skip to content

MODLD-907: Reindex job status endpoint introduced#462

Open
PBobylev wants to merge 2 commits intomasterfrom
MODLD-907
Open

MODLD-907: Reindex job status endpoint introduced#462
PBobylev wants to merge 2 commits intomasterfrom
MODLD-907

Conversation

@PBobylev
Copy link
Contributor

No description provided.

@PBobylev PBobylev self-assigned this Feb 27, 2026
@PBobylev PBobylev force-pushed the MODLD-907 branch 2 times, most recently from d763e66 to e520bf8 Compare February 27, 2026 09:04
@sonarqubecloud
Copy link

@PBobylev PBobylev changed the title MODLD-907: Reindex job status endpoint introduced + ResourceReader switched to JDBC MODLD-907: Reindex job status endpoint introduced Feb 27, 2026
@pkjacob pkjacob requested a review from Copilot February 27, 2026 17:38
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request introduces a new endpoint to retrieve the status of reindex jobs by their execution ID. The change shifts the codebase from using job instance IDs to job execution IDs for tracking batch jobs, which provides more granular tracking since multiple executions can exist for the same job instance.

Changes:

  • Added GET /linked-data/reindex/status endpoint that returns detailed job status including timestamps, user info, reindex type, and processing statistics
  • Changed reindex endpoints to return job execution IDs instead of job instance IDs
  • Introduced entity mappings for Spring Batch tables (BatchJobExecution, BatchStepExecution, BatchJobExecutionParams) to enable direct database queries
  • Reorganized reindex endpoints into a separate interface in the module descriptor

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
src/main/resources/swagger.api/mod-linked-data.yaml Added new status endpoint definition and updated response descriptions
src/main/resources/swagger.api/schema/reindexJobStatusDto.json Defined schema for reindex job status response
src/main/java/org/folio/linked/data/model/entity/batch/*.java Created entity mappings for Spring Batch tables
src/main/java/org/folio/linked/data/repo/BatchJobExecutionRepository.java Added repository for querying job executions
src/main/java/org/folio/linked/data/mapper/dto/ReindexJobStatusMapper.java Implemented mapper to convert batch entities to status DTO
src/main/java/org/folio/linked/data/service/batch/ReindexJobService*.java Added getStatus method to service interface and implementation
src/main/java/org/folio/linked/data/controller/ReindexingController.java Implemented new status endpoint handler
src/main/java/org/folio/linked/data/configuration/batch/BatchConfig.java Modified thread pool configuration and extracted step name constant
src/test/java/**/*Test.java Added comprehensive unit tests for mapper and service
src/test/java/org/folio/linked/data/e2e/ReIndexControllerIT.java Added integration test for status endpoint
README.md Updated documentation with new endpoint usage examples
descriptors/ModuleDescriptor-template.json Reorganized reindex endpoints into separate interface with new permission

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +17 to +18
@Entity
public class BatchJobExecution {
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entity classes for Spring Batch tables (BatchJobExecution, BatchStepExecution, BatchJobExecutionParams) are missing explicit @Table annotations. While JPA's default naming strategy should work (converting class names to lowercase with underscores), it's a best practice to explicitly specify the table names for external tables like Spring Batch's schema. Consider adding @Table annotations for clarity and to prevent issues if the naming strategy changes.

Copilot uses AI. Check for mistakes.
}
}
}

Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an extra blank line at the end of the file. While this doesn't affect functionality, removing it would maintain consistency with other files in the codebase.

Suggested change

Copilot uses AI. Check for mistakes.
Comment on lines +93 to +118
@ParameterizedTest
@CsvSource({"true, FULL", "false, INCREMENTAL"})
void getReindexJobStatus_shouldReturnStatus_whenJobExists(boolean isFullReindex,
ReindexTypeEnum expectedType) throws Exception {
// given
resourceTestService.saveGraph(getSampleWork());
var reindexUrl = isFullReindex ? FULL_REINDEX_URL : INCREMENTAL_REINDEX_URL;
var jobExecutionId = parseLong(mockMvc.perform(post(reindexUrl).headers(defaultHeaders(env)))
.andExpect(status().isOk())
.andReturn().getResponse().getContentAsString());
awaitJobCompletion(jobExecutionId);

// when / then
mockMvc.perform(get(REINDEX_STATUS_URL)
.param("jobExecutionId", String.valueOf(jobExecutionId))
.headers(defaultHeaders(env)))
.andExpect(status().isOk())
.andExpect(content().contentType(APPLICATION_JSON))
.andExpect(jsonPath("$.status").value("COMPLETED"))
.andExpect(jsonPath("$.reindexType").value(expectedType.getValue()))
.andExpect(jsonPath("$.startDate").isNotEmpty())
.andExpect(jsonPath("$.endDate").isNotEmpty())
.andExpect(jsonPath("$.startedBy").isNotEmpty())
.andExpect(jsonPath("$.linesRead").isNumber())
.andExpect(jsonPath("$.linesSent").isNumber());
}
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding an integration test for the 404 case when a job execution ID doesn't exist. While there's a unit test for this scenario in ReindexJobServiceImplTest, an integration test would verify the full request/response flow and ensure proper error handling through the controller layer. For example, a test that calls the endpoint with a non-existent ID and expects a 404 response.

Copilot uses AI. Check for mistakes.
Comment on lines +15 to +16
@Entity
public class BatchStepExecution {
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing explicit @Table annotation. Consider adding @table(name = "batch_step_execution") for clarity and to ensure compatibility with Spring Batch's table schema.

Copilot uses AI. Check for mistakes.
Comment on lines +15 to +16
@Entity
@IdClass(BatchJobExecutionParams.BatchJobExecutionParamsId.class)
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing explicit @Table annotation. Consider adding @table(name = "batch_job_execution_params") for clarity and to ensure compatibility with Spring Batch's table schema.

Copilot uses AI. Check for mistakes.
+ " WHERE p.job_execution_id = job_execution_id AND p.parameter_name = 'isFullReindex')")
private String isFullReindex;

@OneToMany(mappedBy = "jobExecution", fetch = FetchType.EAGER)
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stepExecutions relationship uses FetchType.EAGER, which means all step executions will be loaded immediately when fetching a job execution. While this is convenient for the current use case (status endpoint always needs step data), it could become a performance issue if job executions have many steps or if this entity is used in other contexts where step data isn't needed. Consider whether LAZY loading with an explicit join in the repository query would be more appropriate.

Suggested change
@OneToMany(mappedBy = "jobExecution", fetch = FetchType.EAGER)
@OneToMany(mappedBy = "jobExecution", fetch = FetchType.LAZY)

Copilot uses AI. Check for mistakes.
Comment on lines +28 to +32

@Formula("(SELECT p.parameter_value FROM batch_job_execution_params p"
+ " WHERE p.job_execution_id = job_execution_id AND p.parameter_name = 'startedBy')")
private String startedBy;

Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The @Formula subquery references 'job_execution_id' without a table qualifier in the WHERE clause. While Hibernate should interpret this as the current entity's column, it would be clearer and safer to use an alias for the outer table. For example: '@formula("(SELECT p.parameter_value FROM batch_job_execution_params p WHERE p.job_execution_id = this_.job_execution_id AND p.parameter_name = 'startedBy')")'. However, since Hibernate doesn't support 'this_' in formulas by default, the current approach is acceptable but could be documented with a comment explaining the implicit reference.

Suggested change
@Formula("(SELECT p.parameter_value FROM batch_job_execution_params p"
+ " WHERE p.job_execution_id = job_execution_id AND p.parameter_name = 'startedBy')")
private String startedBy;
// Note: In this @Formula, the unqualified column name 'job_execution_id' is
// intentionally used to refer to the current entity's column. Hibernate resolves
// it against the outer BatchJobExecution table. Using an alias like 'this_' is
// not supported in @Formula expressions.
@Formula("(SELECT p.parameter_value FROM batch_job_execution_params p"
+ " WHERE p.job_execution_id = job_execution_id AND p.parameter_name = 'startedBy')")
private String startedBy;
// See comment above: 'job_execution_id' is intentionally unqualified so that
// Hibernate interprets it as the column of the outer BatchJobExecution entity.

Copilot uses AI. Check for mistakes.
Comment on lines 320 to +323
The returned job instance ID can be used to monitor the job's status.

### Reindex job status
The status of a reindex job can be retrieved by making a GET request to the `/linked-data/reindex/status` endpoint with the `jobExecutionId` query parameter:
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation still refers to "job instance ID" but the code now returns a job execution ID. This should be updated throughout for consistency. For example, line 320 says "The returned job instance ID can be used to monitor the job's status" but it should say "job execution ID".

Copilot uses AI. Check for mistakes.
) {
var exec = new ThreadPoolTaskExecutor();
exec.setCorePoolSize(poolSize);
exec.setMaxPoolSize(poolSize);
Copy link

Copilot AI Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thread pool configuration has been changed from using a bounded queue to an unbounded queue, and setCorePoolSize has been added. While this change makes the pool create all threads immediately (corePoolSize = maxPoolSize), the unbounded queue could potentially lead to memory issues if tasks are submitted faster than they can be processed. Consider documenting this change in the PR description or adding a comment explaining the rationale. Alternatively, if the bounded queue behavior was intentional, this might be an unintended change.

Suggested change
exec.setMaxPoolSize(poolSize);
exec.setMaxPoolSize(poolSize);
exec.setQueueCapacity(poolSize);

Copilot uses AI. Check for mistakes.
@EqualsAndHashCode(of = {"jobExecutionId", "parameterName"})
@Entity
@IdClass(BatchJobExecutionParams.BatchJobExecutionParamsId.class)
public class BatchJobExecutionParams {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this class? Seems like this is not used anywhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants