Fix conf.has_option not respects default provider metadata#64209
Fix conf.has_option not respects default provider metadata#64209jason810496 wants to merge 16 commits intoapache:mainfrom
Conversation
c4c72ef to
ab2a76f
Compare
6fe99cb to
191986a
Compare
There was a problem hiding this comment.
Pull request overview
Refactors Airflow’s shared configuration parser so provider metadata/cfg fallback defaults are respected by conf.has_option/conf.get_default_value, while removing deprecated “load/restore provider configuration” pathways that created conf↔providers-manager recursion.
Changes:
- Reworks shared
AirflowConfigParserprovider-default handling by introducing a cachedconfiguration_description“view” and adjusting fallback/sentinel behavior forhas_optionand provider metadata lookup. - Removes deprecated provider-config load/restore hooks from core + SDK providers managers and updates unit tests accordingly.
- Adds new test coverage (core + task SDK) asserting provider metadata vs cfg fallback precedence, and tweaks default config lint script diagnostics.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
shared/configuration/src/airflow_shared/configuration/parser.py |
Core logic changes for provider fallbacks, configuration_description, has_option, get_default_value, and write/as_dict behavior. |
shared/configuration/tests/configuration/test_parser.py |
Updates shared parser tests for the new provider-fallback behavior. |
airflow-core/src/airflow/configuration.py |
Adapts core config wrapper to new internal _configuration_description and cache invalidation. |
airflow-core/src/airflow/providers_manager.py |
Removes deprecated conf-loading side effect from provider initialization. |
airflow-core/tests/unit/core/test_configuration.py |
Adds/updates tests for provider fallback precedence and provider toggle behavior. |
task-sdk/src/airflow/sdk/configuration.py |
Mirrors core configuration changes for SDK config wrapper. |
task-sdk/src/airflow/sdk/providers_manager_runtime.py |
Removes deprecated conf-loading side effect and unused property. |
task-sdk/tests/task_sdk/test_configuration.py |
New SDK tests validating provider metadata/cfg fallback priority and has_option. |
task-sdk/tests/task_sdk/test_providers_manager_runtime.py |
Updates runtime providers manager tests to use cache invalidation. |
devel-common/src/tests_common/test_utils/config.py |
Adds shared test data for provider-metadata vs cfg-fallback defaults. |
airflow-core/src/airflow/config_templates/provider_config_fallback_defaults.cfg |
Updates a celery default import path in cfg fallback defaults. |
shared/providers_discovery/src/airflow_shared/providers_discovery/providers_discovery.py |
Switches logging from structlog to stdlib logging. |
scripts/in_container/run_check_default_configuration.py |
Adjusts env/logging and failure diagnostics for default config linting. |
shared/configuration/src/airflow_shared/configuration/parser.py
Outdated
Show resolved
Hide resolved
shared/configuration/src/airflow_shared/configuration/parser.py
Outdated
Show resolved
Hide resolved
| # For cfg-only options with no provider metadata, infer sensitivity from name. | ||
| if "sensitive" not in opt_dict and option.endswith(("password", "secret")): | ||
| opt_dict["sensitive"] = True |
There was a problem hiding this comment.
Not sure do we need _secrets_masker().should_hide_value_for_key(option) or not?
We have secret masker for CLI, API, etc (anyplace that might expose the sensitive conf to user). The secret masker will handle the masking for us at higher level so perhaps we don't even need to handle the sensitive case here.
Thoughts?
There was a problem hiding this comment.
But secrets masker has to know if its sensitive? Some field names might not be deemed sensitive from name, like remote_task_handler_kwargs
There was a problem hiding this comment.
The case here only handle provider_default_fallback_values.cfg as we couldn't specify which [section/option] is sensitive or not in .cfg.
We have sensitive: bool field in provider.yaml though, so I'm trying to make the .cfg case here compatible with the sensitive field of provider.yaml by inferring from the option name.
But maybe it's not worthwhile to handle at all as you mention even we use secret masker here is still not accurate enough to make sure the option is sensitive or not.
|
I want to take a look at this one tomorrow. |
Thank you Amogh! |
Lee-W
left a comment
There was a problem hiding this comment.
A few nits, but overall looks good.
shared/providers_discovery/src/airflow_shared/providers_discovery/providers_discovery.py
Show resolved
Hide resolved
Simplifies and centralizes provider configuration loading by removing redundant methods and properties, ensuring consistent state management. Updates fallback logic to use a sentinel value instead of None, improving accuracy in config lookups. Improves test coverage for provider metadata fallbacks and clarifies lazy initialization behavior. Streamlines provider manager interfaces to reduce duplication and potential import cycles.
Ensures provider configuration sections do not overlap with core sections and gives priority to provider metadata when merging configuration options. Refactors tests to verify correct toggling of provider configuration and proper separation of provider and core config sections. Updates fallback lookup order and cleans up unused properties for clarity and maintainability.
Root cause: providers_discovery.py used structlog.getLogger() directly. Before structlog.configure() is called (which happens later in settings.py:726), structlog's default PrintLogger writes to stdout with no level filtering. So debug logs during early provider discovery pollute the stdout of airflow config list --default, corrupting the generated config file. Fix: Switched to logging.getLogger() (stdlib). stdlib logging defaults to WARNING level and writes to stderr, so debug logs are suppressed and stdout stays clean. This is also the correct pattern for shared library code — structlog configuration is the application's responsibility.
89206ea to
00e6a84
Compare
|
Thanks @Lee-W for the review and all the CI pass after addressing the comments. It's ready for next review round, thanks! |
amoghrajesh
left a comment
There was a problem hiding this comment.
Some comments, nothing too major but worth bringing to your notice
| # For cfg-only options with no provider metadata, infer sensitivity from name. | ||
| if "sensitive" not in opt_dict and option.endswith(("password", "secret")): | ||
| opt_dict["sensitive"] = True |
There was a problem hiding this comment.
But secrets masker has to know if its sensitive? Some field names might not be deemed sensitive from name, like remote_task_handler_kwargs
| if value is not VALUE_NOT_FOUND_SENTINEL: | ||
| return value | ||
| return VALUE_NOT_FOUND_SENTINEL |
There was a problem hiding this comment.
Just noticed this, shouldn't it be inverted?
There was a problem hiding this comment.
The VALUE_NOT_FOUND_SENTINEL here is more accurate than existing None if I understand correct.
The whole method looks like the following now:
def _get_option_from_provider_metadata_config_fallbacks(
self,
deprecated_key: str | None,
deprecated_section: str | None,
key: str,
section: str,
issue_warning: bool = True,
extra_stacklevel: int = 0,
**kwargs,
) -> str | ValueNotFound:
"""Get config option from provider metadata fallback defaults."""
value = self.get_from_provider_metadata_config_fallback_defaults(section, key, **kwargs)
if value is not VALUE_NOT_FOUND_SENTINEL:
return value
return VALUE_NOT_FOUND_SENTINEL
shared/configuration/src/airflow_shared/configuration/parser.py
Outdated
Show resolved
Hide resolved
restore_core_default_configuration instead of directly remove
|
Thanks @amoghrajesh for the review and good catch on compatibility, thread-safe issue for test cases! |
What
TL;DR;
Cleanup for deprecated path for the new "make conf respect provider metadata" mechanism.
The
write(include_providers=True),with conf.make_sure_configuration_loaded(with_providers=False)or anyconf.as_dictare the caller that introduce the recursive call betweenconfmodule and Provider Manager, so it would be better to clean them up in case they cause more regression just because we need to maintain those interfaces.Before the clean up:
configuration_descriptionas stateful attributesload_providers_configuration,restore_core_default_configurationto change the state ofconfiguration_descriptionjust to reflect the sources of theconfconfiguration_descriptionbefore the cleanup will not respect_provider_cfg_config_fallback_default_valuesor_provider_metadata_config_fallback_default_values.After the current clean up:
configuration_descriptionas viewconfiguration_descriptionis now a cache_property that can reflect_provider_cfg_config_fallback_default_valuesand_provider_metadata_config_fallback_default_valuesconfig sourcesWas generative AI tooling used to co-author this PR?