Skip to content

AUDR analysis updates RFC#91

Open
justincc wants to merge 14 commits intomasterfrom
justincc-rfc-audr-analysis-updates
Open

AUDR analysis updates RFC#91
justincc wants to merge 14 commits intomasterfrom
justincc-rfc-audr-analysis-updates

Conversation

@justincc
Copy link

@justincc justincc commented Jul 26, 2019

August 12: Last call for community review
August 23: Last day for oversight review

During community review, reviewers

  • Discussed requirements around updating/deleting files in envelopes through ingest api as a part of the process of updating analysis bundles
  • Small updates to wording and titles

Copy link
Contributor

@diekhans diekhans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking really good; clear and to the point. images are great. just minor updates requested.

6. For files that are in both envelopes but don’t need updating, do nothing.

![](../images/0000-update-file-relationships.png)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be good to describe that updated files are not really updated but deleted and then an independent new file is created and why. Also that this is a short term compromise to simplify implementation at the expense of a more complex data mode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it appears this is no longer the case and the figure just needs updated.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For an updated analysis the figure tries to show the possible scenarios of a) not changing a file, b) updating a file, c) adding a file, d) deleting a file. But it sounds like this isn't easy to see and I should improve the diagram?

On the content, yes on further consideration deleting and re-adding all the other files seems a worse idea since it means some special case work and I'm not sure it ultimately helps analysis much. This is a change from what was discussed so I need feedback from @rexwangcc and @samanehsan.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justincc can you clarify why the old files need to be deleted in order to add analysis files that have new uuids?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@samanehsan Because everything in the previous version of that bundle will automatically carry forward.

Copy link

@samanehsan samanehsan Aug 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you confirm that 2 analyses triggering in quick succession is a scenario that's not sufficiently theoretical for us to ignore temporarily?

@justincc I don't think the scenario is unrealistic because it could happen if a primary submitter makes/updates a submission even an hour or two apart. Looking at this production workflow as an example, after analysis creates a submission envelope in ingest it takes 2 hours to upload the files and confirm the submission. So if an update came in less than 2 hours later, we would have this "simultaneous" submission problem.

Also in this case, the confirm submission step only took 4 minutes but the workflow can wait up to 2 hours for the submission to validate. And then there is additional time for the submission to go through ingest and appear as a results bundle in the data store. So in the worst-case scenario, "quick succession" actually means 2+ hours.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justincc, this is the concern that @rexwangcc and I mentioned in tech-arch today ^

Copy link
Author

@justincc justincc Aug 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think the answer here is that ingest may need to do complex engineering to ensure non-interference between submissions. Does that address the issue?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify what that will look like in ingest? Is the idea that analysis would still specify which files we think are new/updated/deleted and then ingest confirms? That seems like we would be duplicating effort.

Copy link
Author

@justincc justincc Aug 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment there's no plan for ingest to confirm anything, it just does what you say or some known interpretation thereof. If it's an update then the new update (which is given as the file, not a delta) will replace whatever the old version was. If it's addition of a file that already exists then it could just replace what is already there if this is the easiest behaviour. Or if it's a deletion of something that's already deleted then we could ignore that if this is the easiest behaviour.

@parthshahva parthshahva self-requested a review July 26, 2019 20:28
mweiden added a commit that referenced this pull request Jul 30, 2019
Many people are confused about usage here:
* #91 (comment)
* #90 (comment)
* ... have observed others
justincc and others added 2 commits July 31, 2019 17:30
Co-Authored-By: Matt Weiden <538456+mweiden@users.noreply.github.com>
Co-Authored-By: Matt Weiden <538456+mweiden@users.noreply.github.com>
mweiden added a commit that referenced this pull request Jul 31, 2019
* Hint that the title name should be replaced

Many people are confused about usage here:
* #91 (comment)
* #90 (comment)
* ... have observed others

* Update rfc-template.md

* Update rfc-template.md
6. For files that are in both envelopes but don’t need updating, do nothing.

![](../images/0000-update-file-relationships.png)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justincc I'm a little confused as to why the old files from the original analysis bundle need to be deleted. Why would you copy forward files from the old analysis bundle to the new analysis bundle? Shouldn't the copy forward be from the new primary bundle to the new analysis bundle?

Copy link
Contributor

@diekhans diekhans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great work!

diekhans pushed a commit to barkasn/dcp-community that referenced this pull request Aug 16, 2019
* Hint that the title name should be replaced

Many people are confused about usage here:
* HumanCellAtlas#91 (comment)
* HumanCellAtlas#90 (comment)
* ... have observed others

* Update rfc-template.md

* Update rfc-template.md
@samanehsan
Copy link

@justincc there's an "rfc-paused" label you can add as well

diekhans pushed a commit to diekhans/dcp-community that referenced this pull request Oct 31, 2019
* Hint that the title name should be replaced

Many people are confused about usage here:
* HumanCellAtlas#91 (comment)
* HumanCellAtlas#90 (comment)
* ... have observed others

* Update rfc-template.md

* Update rfc-template.md
@hannes-ucsc hannes-ucsc removed their request for review May 4, 2021 00:39
@samanehsan samanehsan removed their request for review October 7, 2025 15:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants