asserting that ingest exports valid provenance metadata #509
base: master
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -9,7 +9,7 @@ before_script: | |
| - apt-get -y update | ||
| - apt-get -y install jq | ||
| - pip install -r requirements.txt | ||
| - export DEPLOYMENT_ENV=$CI_COMMIT_REF_NAME | ||
| - export DEPLOYMENT_ENV=integration | ||
| - export AWS_DEFAULT_REGION=us-east-1 | ||
| - export SWAGGER_URL="https://dss.$DEPLOYMENT_ENV.data.humancellatlas.org/v1/swagger.json" | ||
| - mkdir -p ~/.config/hca | ||
|
|
@@ -25,6 +25,7 @@ dcp_wide_test_SS2: | |
| only: | ||
| - integration | ||
| - staging | ||
| - validate-schema-versions | ||
|
Contributor
I think I'm a little confused why
||
| script: | ||
| - python -m unittest tests.integration.test_end_to_end_dcp.TestSmartSeq2Run.test_smartseq2_run | ||
|
|
||
|
|
@@ -33,6 +34,7 @@ dcp_wide_test_metadata_update: | |
| only: | ||
| - integration | ||
| - staging | ||
| - validate-schema-versions | ||
| script: | ||
| - python -m unittest tests.integration.test_end_to_end_dcp.TestSmartSeq2Run.test_update | ||
|
|
||
|
|
@@ -41,5 +43,6 @@ dcp_wide_test_optimus: | |
| only: | ||
| - integration | ||
| - staging | ||
| - validate-schema-versions | ||
| script: | ||
| - python -m unittest tests.integration.test_end_to_end_dcp.TestOptimusRun.test_optimus_run | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -7,3 +7,4 @@ awscli | |
| hca-ingest | ||
| cromwell-tools>=1.1.2 | ||
| termcolor | ||
| jsonschema | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -5,6 +5,7 @@ | |
|
|
||
| from urllib.parse import urlparse | ||
| from datetime import datetime | ||
| from jsonschema import validate | ||
| import boto3 | ||
|
|
||
| from .azul_agent import AzulAgent | ||
|
|
@@ -98,6 +99,7 @@ def run(self, dataset_fixture, run_name_prefix="test"): | |
| else: | ||
| # == Non-scaling Logic == | ||
| self.wait_for_primary_bundles() | ||
| self.assert_valid_schema_versions_in_provenance() | ||
| self.wait_for_analysis_workflows() | ||
| self.wait_for_secondary_bundles() | ||
|
|
||
|
|
@@ -228,6 +230,20 @@ def wait_for_primary_bundles(self): | |
| raise RuntimeError(f'Expected {self.expected_bundle_count} primary bundles, but only ' | ||
| f'got {primary_bundles_count}') | ||
|
|
||
| def assert_valid_schema_versions_in_provenance(self): | ||
|
Contributor
Could we please move the assert function down to join the group of other assert functions below? (Roughly line 387).
Contributor
Perhaps also write a small docstring for this function so that folks who aren't familiar with the schema versions in the provenance can understand what is being validated here?
||
| primary_bundles = [self.data_store.bundle_manifest(bundle_uuid, "aws") for bundle_uuid in self.submission_envelope.bundles()] | ||
|
|
||
| metadata_files = [] | ||
| for bundle in primary_bundles: | ||
| metadata_file_manifests = filter(lambda file: "metadata" in file["content-type"], bundle["bundle"]["files"]) | ||
| metadata_files.extend([self.data_store.get_file(file["uuid"], "aws") for file in metadata_file_manifests]) | ||
|
|
||
| for metadata_file in metadata_files: | ||
| schema_url = metadata_file["describedBy"] | ||
| schema = requests.get(schema_url).json() | ||
| validate(metadata_file, schema=schema) | ||
|
Contributor
This validate function won't actually check to make sure that At least this is what I think. Have you verified to make sure this test fails if there is a mismatch?
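One way to answer the question above is to exercise the check with a deliberately invalid document and confirm that it raises. The following is only a minimal sketch under stated assumptions, not the PR's implementation: the fetchers and the validator are injected so the logic runs without network access, `required_keys_validator` is a hypothetical stand-in for `jsonschema.validate` that checks nothing beyond `required` keys, and the schema URL, file UUIDs, and `content-type` strings are invented fixtures.

```python
from typing import Callable, Iterable


def metadata_file_uuids(bundle_manifest: dict) -> list:
    """Return UUIDs of files whose content-type marks them as metadata."""
    return [
        f["uuid"]
        for f in bundle_manifest["bundle"]["files"]
        if "metadata" in f["content-type"]
    ]


def assert_metadata_valid(
    bundle_manifests: Iterable[dict],
    get_file: Callable[[str], dict],
    fetch_schema: Callable[[str], dict],
    validate: Callable[[dict, dict], None],
) -> int:
    """Validate each metadata file against the schema named in its describedBy.

    Returns the number of files checked; `validate` must raise on failure.
    """
    checked = 0
    for manifest in bundle_manifests:
        for uuid in metadata_file_uuids(manifest):
            document = get_file(uuid)
            schema = fetch_schema(document["describedBy"])
            validate(document, schema)
            checked += 1
    return checked


def required_keys_validator(document: dict, schema: dict) -> None:
    """Hypothetical stand-in for jsonschema.validate: checks 'required' keys only."""
    missing = [key for key in schema.get("required", []) if key not in document]
    if missing:
        raise ValueError(f"missing required fields: {missing}")


# Hypothetical fixtures; none of these values come from the PR.
SCHEMA_URL = "https://example.org/schema/1.0.0/thing"
SCHEMAS = {SCHEMA_URL: {"required": ["describedBy", "name"]}}
FILES = {
    "good": {"describedBy": SCHEMA_URL, "name": "x"},
    "bad": {"describedBy": SCHEMA_URL},  # lacks the required "name" field
}


def make_manifest(*uuids: str) -> dict:
    """Build a manifest with the given metadata files plus one non-metadata file."""
    files = [
        {"uuid": u, "content-type": 'application/json; dcp-type="metadata/example"'}
        for u in uuids
    ]
    files.append({"uuid": "data", "content-type": "application/gzip; dcp-type=data"})
    return {"bundle": {"files": files}}
```

Calling `assert_metadata_valid([make_manifest("good", "bad")], FILES.__getitem__, SCHEMAS.__getitem__, required_keys_validator)` should raise on the second file, which is the negative case the reviewer asks about. With the real `jsonschema.validate` passed in as `validate`, a failure would surface as `jsonschema.ValidationError` instead of `ValueError`.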
||
|
|
||
|
|
||
|
Contributor
Delete extra line break here.
||
| def wait_for_analysis_workflows(self): | ||
| if not self.analysis_agent: | ||
| Progress.report("NO CREDENTIALS PROVIDED FOR ANALYSIS AGENT, SKIPPING WORKFLOW(s) CHECK...") | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -8,11 +8,11 @@ | |
| from ingest.importer.submission import Submission | ||
|
|
||
| from tests.wait_for import WaitFor | ||
| from ..utils import Progress, Timeout | ||
| from ..cloudwatch_handler import CloudwatchHandler | ||
| from ..data_store_agent import DataStoreAgent | ||
| from ..dataset_fixture import DatasetFixture | ||
| from ..dataset_runner import DatasetRunner | ||
| from tests.utils import Progress, Timeout | ||
|
Contributor
I think these changes are not needed once you are done testing in your own environment. Maybe remove them from the PR and keep these changes local/untracked?
||
| from tests.cloudwatch_handler import CloudwatchHandler | ||
| from tests.data_store_agent import DataStoreAgent | ||
| from tests.dataset_fixture import DatasetFixture | ||
| from tests.dataset_runner import DatasetRunner | ||
|
|
||
| cloudwatch_handler = CloudwatchHandler() | ||
| DEPLOYMENTS = ('dev', 'staging', 'integration', 'prod') | ||
|
|
||
`$CI_COMMIT_REF_NAME` resolves to the deployment being tested in the scheduled dcp-wide integration tests, e.g. `integration`, `staging`, and `prod`. Shouldn't this change be reverted to continue testing in the other environments?
Yes, this is a temporary change just to get the test running and passing in GitLab. It should be reverted once the changes are ready to merge.
An intermittent issue unrelated to the changes caused the previous test failure for this branch. Will re-run and post a link to the completed pipeline.