schedule: zephyr_dma_domain: Fix various issues#7672

Closed
LaurentiuM1234 wants to merge 5 commits into thesofproject:main from LaurentiuM1234:zephyr_dma_domain_agg_irq_fix

Conversation

@LaurentiuM1234
Contributor

Issues fixed by this series of patches:

  1. The Zephyr DMA domain doesn't support DMACs with aggregated IRQs. This led to the DMA IRQ handler giving multiple semaphore resources, since the handler is registered multiple times for the same IRQ but with different data.
  2. Each time a DMA IRQ was triggered, all pipeline tasks were executed, which, in the case of i.MX8MP, would lead to warnings such as "no bytes to copy". To fix this, a pipeline task is now executed only if a DMA IRQ was triggered for the channel used by said task.

@LaurentiuM1234 LaurentiuM1234 force-pushed the zephyr_dma_domain_agg_irq_fix branch 3 times, most recently from efdd18d to 2389f13 Compare May 23, 2023 10:51
In the case of DMACs having a single IRQ for all their channels,
dma_irq_handler() is registered for each of these channels, resulting
in it giving the semaphore more than one resource at a time, which is
wrong.

This commit fixes the issue by checking if the IRQ was triggered for
the channel passed as argument to the handler before giving the
semaphore resources.

Signed-off-by: Laurentiu Mihalcea <laurentiu.mihalcea@nxp.com>
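The gist of the fix described in this commit can be sketched as follows. This is a minimal illustration, not the actual SOF code: the struct layout, function name, and the status-mask parameter (standing in for reading the DMAC's per-channel interrupt status register) are all assumptions made so the sketch stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>

/* illustrative per-registration data; in SOF this would also carry the
 * semaphore and scheduling information the handler was registered with */
struct dma_domain_data {
	int channel;    /* DMA channel this handler instance serves */
	bool triggered; /* set once the IRQ really was for this channel */
};

/* Returns how many semaphore resources this handler instance would give.
 * irq_status_mask is a hypothetical stand-in for the DMAC's per-channel
 * interrupt status bits. */
static int dma_irq_handle(struct dma_domain_data *data,
			  unsigned int irq_status_mask)
{
	/* With an aggregated IRQ, the same handler fires for every channel
	 * it was registered for; only give the semaphore when the status
	 * bit of *this* channel is set, so each IRQ yields one resource. */
	if (!(irq_status_mask & (1u << data->channel)))
		return 0; /* IRQ belongs to a different channel */

	data->triggered = true;
	return 1; /* the real handler would k_sem_give() once here */
}
```

Without the channel check, every registered instance of the handler would return 1 for a single hardware interrupt, which is the multiple-resource bug the commit fixes.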
This field will be used for Zephyr DMA domain internal housekeeping.

Signed-off-by: Laurentiu Mihalcea <laurentiu.mihalcea@nxp.com>
Since domain_is_pending is not used by all domains (e.g. Intel's
Zephyr timer domain), and we don't want to force those domains to
pointlessly define their own domain_is_pending that would just return
true, we handle this inside ll_schedule_domain.h. If the
domain_is_pending() operation is not defined, the function will simply
return true, thus not affecting the flow for domains which don't need
it.

Signed-off-by: Laurentiu Mihalcea <laurentiu.mihalcea@nxp.com>
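A sketch of the header-level fallback this commit describes. The op and struct names follow the snippet quoted in the review below, but the exact signatures are assumptions, not the actual SOF definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* opaque types, as the wrapper never dereferences them */
struct ll_schedule_domain;
struct task;
struct comp_dev;

struct ll_schedule_domain_ops {
	/* optional op: only domains needing per-task gating define it */
	bool (*domain_is_pending)(struct ll_schedule_domain *domain,
				  struct task *task,
				  struct comp_dev **comp);
};

struct ll_schedule_domain {
	const struct ll_schedule_domain_ops *ops;
};

/* header-level wrapper: domains that don't implement the op keep the
 * old behaviour, i.e. every task is treated as pending */
static inline bool domain_is_pending(struct ll_schedule_domain *domain,
				     struct task *task,
				     struct comp_dev **comp)
{
	if (domain->ops->domain_is_pending)
		return domain->ops->domain_is_pending(domain, task, comp);

	return true;
}

/* example op for a domain that does gate tasks; here it always says no */
static bool example_never_pending(struct ll_schedule_domain *domain,
				  struct task *task, struct comp_dev **comp)
{
	(void)domain; (void)task; (void)comp;
	return false;
}
```

A domain like Intel's Zephyr timer domain would simply leave the op NULL and get `true` from the wrapper, while the Zephyr DMA domain supplies its own implementation.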
Currently, tasks are scheduled whenever an IRQ comes. This means that,
for example, if there are two pipeline tasks (capture and playback =>
2 channels) and the DMA IRQ is triggered for the capture pipeline
task, the playback task will also be scheduled, resulting in warnings
such as "no bytes to copy".

This commit fixes this issue by making use of the domain_is_pending()
function which will return true if the task can be executed.

This only affects the Zephyr DMA domain.

Currently, this approach will not work for the mixer.

Signed-off-by: Laurentiu Mihalcea <laurentiu.mihalcea@nxp.com>
Currently, because only registrable pipeline tasks can be executed,
the mixer no longer works, since the pipeline tasks going into the
mixer are non-registrable.

In order to overcome this issue, non-registrable pipeline tasks
will point to the same struct zephyr_dma_domain_data as the
registrable pipeline task, so when the DMA IRQ comes and marks
the registrable pipeline task as executable, these non-registrable
pipeline tasks can also be executed. This is done by keeping track
of how many pipeline tasks point to the same struct
zephyr_dma_domain_data and of how many times domain_is_pending() has
been called for said struct zephyr_dma_domain_data. When pending_count
reaches 0, data->triggered can be set to false, meaning both the
registrable and non-registrable pipeline tasks have been executed.

Let's assume the following scenario with 4 pipeline tasks:

1)
	Host -------> Volume --------> DAI

2)
	Host -------> Mixer ----------> DAI
			^
			|
	Host -----------|

This fix assumes that the mixer pipeline tasks will be
registered as a group, meaning the first pipeline task
will not be registered in between the pipeline tasks
associated with the mixer.

Signed-off-by: Laurentiu Mihalcea <laurentiu.mihalcea@nxp.com>
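The usage_count/pending_count bookkeeping described in this commit message, reduced to a runnable sketch. The struct name follows the commit message; the helper function names are illustrative and do not correspond to actual SOF functions:

```c
#include <assert.h>
#include <stdbool.h>

struct zephyr_dma_domain_data_sketch {
	bool triggered;    /* set by the DMA IRQ handler */
	int usage_count;   /* pipeline tasks sharing this structure */
	int pending_count; /* remaining domain_is_pending() hits this IRQ */
};

/* Called from the DMA IRQ handler: mark the shared data executable and
 * arm the countdown so each sharing task runs exactly once. */
static void sketch_irq_mark(struct zephyr_dma_domain_data_sketch *d)
{
	d->triggered = true;
	d->pending_count = d->usage_count;
}

/* Per-task pending check: every task pointing at the same data
 * decrements the counter; when it reaches 0, both the registrable and
 * the non-registrable tasks have run, so triggered is reset for the
 * next IRQ. */
static bool sketch_is_pending(struct zephyr_dma_domain_data_sketch *d)
{
	if (!d->triggered)
		return false;

	if (--d->pending_count == 0)
		d->triggered = false;

	return true;
}
```

With e.g. three tasks sharing the structure (one registrable, two non-registrable mixer inputs), a single DMA IRQ lets all three run once, after which the structure is quiescent until the next IRQ.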

tr_info(&ll_tr, "zephyr_dma_domain_unregister()");

data->pending_count--;
Collaborator

I'm not sure that you should do this unconditionally. What if we're unregistering a domain that isn't pending? Can this lead to a negative value for pending_count?

Contributor Author

In theory this shouldn't happen, since domain_is_pending() always resets its value when it reaches 0. Since cancelling a task is synchronous, domain_is_pending() will always be called.

Contributor Author

Still need to stress test; if this does happen, then we have a really big problem.

Collaborator
@paulstelian97 paulstelian97 May 23, 2023

Yeah I feel like this very decrement can get it below zero, before domain_is_pending() has a chance to reset it.

I don't want this to be even possible (even if it takes a long while to actually happen -- see my recent submission which had a bug that took 3 years to manifest)

Contributor Author

Do you have a particular scenario in mind?

Collaborator
@paulstelian97 paulstelian97 May 23, 2023

Say you're somehow stopping two pipelines quickly, which leads to unregistering things in between them. Say it's playback and capture (which work with different scheduling components). The following order may cause issues:

  1. IRQ arrives for playback; it is handled, pending = 1
  2. Playback pipeline is torn down, pending = 0
  3. Capture pipeline is torn down before IRQ ever arrives for capture this period, pending = -1

It's pretty hard to actually have this happen, I might say, but having "pending" not correspond to an actual pending pipeline walk simply doesn't sit well with me (the variable name hints at one thing while it technically isn't that thing).

Edit: Or is "pending_count" separate between scheduling components, and thus more of a thing to help order things? That would make the scenario impossible, in which case my complaint becomes one of naming.

Contributor Author

Edit: Or is "pending_count" separate between scheduling components, and thus more of a thing to help order things? That would make the scenario impossible, in which case my complaint becomes one of naming.

Yep, pretty much. The purpose of pending_count is to allow non-registrable pipeline tasks to also be executed since, in the case of mixers, the only task that would be executed without it would be the registrable one. In the case of registrable pipeline tasks (like in your case) this value isn't shared. We can also consider that we have a different pending_count for each scheduling component like you said.

Collaborator

Okay then my only remaining complaint is that the name can send you down the wrong train of thought. I can accept it if nobody else agrees with me though.

Collaborator

maybe add an assert() at least

@dbaluta
Collaborator

dbaluta commented May 23, 2023

I think this is good enough to be removed from Draft state.

/* private data used by Zephyr DMA domain for internal
* housekeeping - don't touch.
*/
void *priv;
Collaborator

I wouldn't mention the Zephyr DMA domain here. Others might also use this in the future. This is common enough inside structures, so a generic comment like "private data for internal housekeeping" would be enough.

Collaborator

cannot you just add your ctrl_data pointer to struct zephyr_ll_pdata stored in task.priv_data?

if (domain->ops->domain_is_pending)
	return domain->ops->domain_is_pending(domain, task, comp);
else
	return true;
Collaborator

The usual pattern here is to remove the else and say something like this.

if (foo)
     return foo();

return true;

* zephyr_dma_domain_is_pending to decide
* whether there was a DMA IRQ for the channel
* associated with this task.
*/
Collaborator

The chars-per-line limit is 100, so we can compact the comments here onto fewer lines.


*/
bool triggered;
/* used to keep track of how many pipeline tasks point
* to the same data structure.
Collaborator

is it the same as just "number of tasks sharing the same data structure"?

Contributor Author

Yep.

* to the same data structure.
*/
int usage_count;
/* used to keep track of how many times data->triggered
Collaborator

can we similarly simplify this and remove "used to keep track of?"

Contributor Author

Sure

!crt_data->sched_comp) && !pipe_task->registrable) {
/* attach crt_data to current pipeline
* task only if there's a match between
* the scheduling components or i
Collaborator

"if"


	return 0;
} else if (!pipe_task->registrable) {
	return 0;
Collaborator

can we do this:

if (!pipe_task->registrable) {
	if (crt_data->sched_comp && crt_data->sched_comp != pipe_task->sched_comp)
		return 0;

	/* attach crt_data to current pipeline
	...
}

return ret;
}

/* don't let non-registrable tasks go further */
Collaborator

this comment doesn't add any information - it simply restates the next statement. If a comment is needed here, it should explain something that isn't obvious from the code

Contributor Author

You're right. Will remove or rework it.



@LaurentiuM1234
Contributor Author

Closing this as it will come into conflict with #7923. Will eventually re-open.
