Skip to content

AJDA-594 (input-mapping) job-based workspace loading for async input mapping#414

Merged
ErikZigo merged 22 commits intomainfrom
erik-AJDA-594-part1
Sep 11, 2025
Merged

AJDA-594 (input-mapping) job-based workspace loading for async input mapping#414
ErikZigo merged 22 commits intomainfrom
erik-AJDA-594-part1

Conversation

@ErikZigo
Copy link
Copy Markdown
Contributor

@ErikZigo ErikZigo commented Sep 1, 2025

Fixes https://keboola.atlassian.net/browse/AJDA-594


Job-based table loading architecture for workspace operations, laying the foundation for asynchronous input mapping.

  • WorkspaceLoadJob - Encapsulates table loading operations with unique job IDs
  • WorkspaceLoadQueue - Manages collections of load jobs for batch processing
  • WorkspaceLoadPlan - Orchestrates loading instructions with clone/copy strategies
  • WorkspaceTableLoadInstruction - Configuration for individual table loads

Public API Enhancements

  • prepareAndExecuteTableLoads() - New method enabling separation of preparation and execution phases
  • prepareTableLoadsToWorkspace() - Validates and plans table loads to workspace before execution
  • executeTableLoadsToWorkspace() - Submits prepared workspace load jobs to Storage API for asynchronous execution

Code Organization

  • Complex logic extracted into focused private methods across Reader and Strategy classes
  • Unified manifest and data path building logic
  • Clear separation of concerns between preparation, decision-making, and execution phases

The changes maintain full backward compatibility

@ErikZigo ErikZigo force-pushed the erik-AJDA-594-part1 branch from b677284 to e8643cb Compare September 5, 2025 07:34
Comment thread libs/input-mapping/src/Reader.php
Comment thread libs/input-mapping/src/Table/Strategy/AbstractStrategy.php Outdated
Comment thread libs/input-mapping/src/Reader.php
Comment thread libs/input-mapping/src/Reader.php
Comment on lines +254 to +265
if (!$plan->preserve) {
$this->logger->info('Cleaning workspace before loading tables.');
$cleanJobId = $workspaces->queueWorkspaceCloneInto($workspaceId, [
'input' => [], // workspace will be only cleaned
'preserve' => 0,
]);

// Wait for clean job to complete before proceeding
$this->clientWrapper->getBranchClient()->handleAsyncTasks([$cleanJobId]);

// Don't add CLEAN job to queue since it's already completed
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tohle je odlisnost od puvodni implementace.

Tam to bylo synchronne za sebou. Clone a pak Copy. Pokud nemel byt preserve, tak prvni job i promazal workspace.

Tady ale chceme aby to bylo async. A zafrontovaly se oba joby.
Takze cleanup se musi vyresit predtim. Proto tenhle job. Zaroven se ceka na nej, nez se dokonci

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No to je prekerní tohle, vůbec se mě to neliíbí, protože to je zase hroznej overhead s jobama, ale jako lepší řešní nemám (pochopitelně bez toho, aby to podporovalo connection a týhle věci jsme se úplně zbavili).

Napadá mě jedině ten job založit a podívat se jestli se switchnul na processing a pak teprv založit ten druhej, čimž by se zařídilo, že jdou po sobě, jenže by to současně znamenalo, že když se zafrontují, tak se to celý zasekne a vytimoutuje, což je naprd.
Jenže stejně tak se to celý zafrontuje a vytimoutuje i když se bude čekat na ten cleanup job.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pro betu mi to přijde legitimní a směroplatný takhle

@ErikZigo ErikZigo changed the title feat (input-mapping) workspace cleanup before load if needed AJDA-594 (input-mapping) job-based workspace loading for async input mapping Sep 6, 2025
@ErikZigo ErikZigo marked this pull request as ready for review September 6, 2025 11:31
@ErikZigo
Copy link
Copy Markdown
Contributor Author

ErikZigo commented Sep 6, 2025

Pokracovani by mohlo byt nejak takhle #417

@odinuv odinuv requested a review from Copilot September 8, 2025 18:38
$this->logger->info(sprintf('Table "%s" will be cloned.', $table->getSource()));
if ($loadType === WorkspaceLoadType::CLONE) {
return [
'table' => $table,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jako moc nechápu, proč se tady neposílají ty options taky, oni jsou samozřejmě všechny zbytečný (kromě overwrite - to se validuje v canClone, kterej se tam asi moc nepoužívá), ale ani tak by jako nemělo ničemu vadit to tam poslat

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a ano, vidím, že to tak bylo už předtím :)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do tohodle bych nehrabal. Ted ten endpoint v KBC ty ostatni parametry ignoruje. Ale az se to mergne do jednoho nedpointu, tak by to zaclo asi hazet chyby.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vtip je v tom, že ty ostatní parametry se tam nepošlou, ale klidně to tam nech, bylo to tam, může to tam zůstat

'overwrite' => $table->getOverwrite(),
'dropTimestampColumn' => !$table->keepInternalTimestampColumn(),
];
// ?????? sourceBranchId is alreeady set above and from same method, why check again?
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No jo, to jsem psal já a vůbec nevím proč to tam je, podle mě je to úplná pitomost. Na svou obhajobu můžu říct jen to, že to tam bylo kvůli SOXu a řekl bych, že to vzniklo tak, že tam byl nějakej bug na connection což zmiňuje ten komentář a pak to někdo fixnul a dal jsem to nahoru a zapomněl už oddělat tu podmínku

dal bych to pryč

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$loadOptions = $table->getStorageApiLoadOptions($this->tablesState);
$loadType = $this->decideTableLoadMethod($table, $loadOptions);

$instructionLoadOptions = $loadType === WorkspaceLoadType::CLONE ? null : $loadOptions;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tady mi přijde, že by mit taky nemuselo být to větvení


public function hasCloneInstructions(): bool
{
return !empty($this->getCloneInstructions());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ještě, že je to pole malý a můžem si dovolit takovýhle plejtvání

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to psalo ai :)


namespace Keboola\InputMapping\Table\Strategy;

enum WorkspaceJobType: string
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nemůžem použít WorkspaceLoadType ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To by vlastně šlo 👍

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$copyTableOptions,
['overwrite' => true],
),
new WorkspaceTableLoadInstruction(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jako tohle je celkem unreal kombinace

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jako muzu to rozdelit do vice testu, ale ty jsi prvni kdo je spis pro jejic mergovani do jednoho. Tak jsem ti chtel udelat radost :)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nejde o oddělení testů, ale o to, že ty takovýhle instrukce se v reálu neobjeví

self::assertInstanceOf(WorkspaceLoadQueue::class, $result);
self::assertCount(1, $result->jobs);
self::assertTrue($this->testHandler->hasInfoThatContains('Cleaning workspace before loading tables.'));
self::assertTrue($this->testHandler->hasInfoThatContains('Cloning 1 tables to workspace.'));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jako procházet tyhle vygenerovaný testy je fakt na sebevraždu, půlka z nich se překryvá s něčím jiným

);

$reflection = new ReflectionClass($local);
$method = $reflection->getMethod('getDataFilePath');
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toto taky duplikuje getManifestPathDataProvider, jako kdyby se to vytáhlo, tak aby to šlo otestovat, tak by k tomu nemuselo bejt 1500 řádků testů

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


$plan = new WorkspaceLoadPlan([$copyInstruction, $viewInstruction], false);

self::assertFalse($plan->hasCloneInstructions());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jako zase, tohle se dá přidat k předchozímu testu a dál je to zbytečný, stačí jich tady polovina


$workspaceStrategy = $this->createMock(AbstractWorkspaceStrategy::class);
$workspaceStrategy
->expects(self::once())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tady má bejt zase $this

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment was marked as outdated.

Copy link
Copy Markdown
Member

@odinuv odinuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jako může se to mergnout - ve bude to jistý zlepšení, i když to pořád neřeší ten asynchronní load - buede potřeba řešit tohle https://keboolaglobal.slack.com/archives/C0553EM2RGX/p1757331962719179

a bylo by super probrat ty testy a nechat tam jen ty užitečný, jako víc řádků testů neznamená, že je to líp otestovaný, sice jsou to mocky většinou a moc to nezpomalí to, ale udržovat se budou muset

@ErikZigo ErikZigo requested a review from Copilot September 9, 2025 09:48

This comment was marked as outdated.

@ErikZigo ErikZigo requested a review from Copilot September 9, 2025 10:21

This comment was marked as outdated.

@ErikZigo ErikZigo requested a review from Copilot September 9, 2025 10:24

This comment was marked as outdated.

@ErikZigo ErikZigo requested a review from Copilot September 9, 2025 13:49
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

The PR introduces a job-based table loading architecture for workspace operations, enabling asynchronous input mapping in the future. The changes lay the foundation by introducing structured job management with unique IDs, load queues for batch processing, and orchestration plans with clone/copy strategies.

  • New job-based architecture with WorkspaceLoadJob, WorkspaceLoadQueue, and WorkspaceLoadPlan classes
  • Enhanced AbstractWorkspaceStrategy with separate preparation and execution phases
  • New public API methods: prepareAndExecuteTableLoads(), prepareTableLoadsToWorkspace(), and executeTableLoadsToWorkspace()

Reviewed Changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated no comments.

Show a summary per file
File Description
WorkspaceLoadPlanTest.php Tests for workspace load plan functionality including instruction filtering
WorkspaceLoadJobTest.php Tests for workspace load job validation and type constraints
TestWorkspaceStrategy.php Test helper class extending AbstractWorkspaceStrategy
SnowflakeTest.php Added test for getWorkspaceType() method
LocalTest.php Tests for data file path building logic
BigQueryTest.php Added test for getWorkspaceType() method
AbstractWorkspaceStrategyTest.php Comprehensive tests for workspace strategy preparation and execution
AbstractStrategyTest.php Tests for manifest path building logic
AbstractFileStrategyTest.php Tests for file strategy validation and getters
ReaderTest.php Tests for new prepareAndExecuteTableLoads() method
WorkspaceTableLoadInstruction.php Value object for table load instructions
WorkspaceLoadType.php Enum defining clone, copy, and view load types
WorkspaceLoadQueue.php Container for workspace load jobs
WorkspaceLoadPlan.php Orchestration plan with instruction filtering
WorkspaceLoadJob.php Job representation with validation
Snowflake.php Made getWorkspaceType() public
S3.php Refactored to use getManifestPath() helper
Local.php Refactored to use getDataFilePath() and getManifestPath() helpers
BigQuery.php Made getWorkspaceType() public
AbstractWorkspaceStrategy.php Major refactoring with new job-based methods and extracted logic
AbstractStrategy.php Added getManifestPath() helper and abstract methods
AbstractFileStrategy.php Added getter methods for metadata storage and destination
ABS.php Refactored to use getManifestPath() helper
Reader.php Added prepareAndExecuteTableLoads() method and refactored table processing

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@ErikZigo ErikZigo requested a review from odinuv September 9, 2025 15:25
@ErikZigo
Copy link
Copy Markdown
Contributor Author

Release Notes

The changes maintain full backward compatibility:

  • Extracted complex logic into focused private methods.
  • Code deduplication

New feature:

  • Implementing job-based table loading architecture for workspaces

Change Type

Minor

Impact Analysis

  • New functionality will be used only by AI Service/SQL editor.
  • Input Mapping behaviour stays same as is, with no changes.

Customer Communication Plans

Not needed

Justification

  • Requirement for AI Service/SQL editor. Allow users to start working with editor while table loading continues in background, eliminating wait times.
  • Eliminates code duplication in path management.

Deployment Plan

Release lib as minor version. Primary bump its version in AI service.

Rollback Plan

Fix version of input mapping in AI service to previous version.

Post-Release Support Plan

No special post-release support needed

@ErikZigo ErikZigo requested a review from ujovlado September 11, 2025 12:16
@ErikZigo ErikZigo merged commit d1234d2 into main Sep 11, 2025
14 checks passed
@ErikZigo ErikZigo deleted the erik-AJDA-594-part1 branch September 11, 2025 14:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants