Skip to content

smart_amp: revamp to two-layer modular design structure#8036

Merged
lgirdwood merged 1 commit intothesofproject:mainfrom
johnylin76:smart_amp_revamp
Sep 26, 2023
Merged

smart_amp: revamp to two-layer modular design structure#8036
lgirdwood merged 1 commit intothesofproject:mainfrom
johnylin76:smart_amp_revamp

Conversation

@johnylin76
Copy link
Contributor

The motivation is depicted in #7801

This PR revamps smart_amp component design to two-layer structure, i.e. generic layer and inner model layer.

Generic layer is the common part of smart amp process which can be regarded as the glue code interacting between SOF component ops and inner model. While inner model may have various implementations respectively for solution suppliers in a modular way.

This PR revamps Smart Amplifier component design to a two-layer structured design, i.e. generic layer and inner model layer. The latter can have various implementations respectively for Amplifier solution suppliers, while the former is the common part of Smart Amp process adaptable for all solutions.

In structural aspect, one can regard generic layer as the glue code that wraps inner model in a SOF component. Ops are defined for interaction between two layers. Inner model is the solution-specific modular code, which may have static libraries linked as needed. The structure is figured below:

                                  SRC(FF)   SINK(OUT)  +-SRC(FB)  bytectl
  +- SMART_AMP         |^ comp ops    |         ^      |           ^|
  | +------------------v|-------------v---------|------v-----------|v---------+
  | | Generic Layer                 (chan remap/fmt conv)          ||         |
  | |    (memory mgr)--+------> :::::::::BUFFERS:::::::::::::      |+> CONFIG |
  | +------------------|-|^-----------|---------^------|-----------^|---------+
  |                    | || mod ops   |         |      |           ||
  | +------------------v-v|-----------v---------|------v-----------|v---------+
  | | Inner Model   :::::::::::::::::::::BUFFERS:::::::::::::::::::::::       |
  | |                  (solution-specific impl/wrapper)            |+> MODEL  |
  | +------------------------------|^------------------------------^----------+
  +---                             v| lib ops                      | CALDATA
                             Static Libs (as needed)     ----------+
  Note:
  - FF(Feed-Forward): un-processed playback frame source
  - FB(Feedback): feedback reference frame source (from the capture pipeline)

As illustrated above, generic layer handles the cross-communication for inner model and SOF pipeline flow, as well as the smart-amp common tasks including:

  1. Channel remapping for input/output frames
  2. Frame format conversion for input/output frames. It allows inner model to work with different format from SOF audio stream. (Now it only allows the precision of inner model format >= SOF stream, e.g. inner model: S32_LE, SOF stream: S16_LE)
  3. Runtime memory full-management. The dynamic memory required by inner model will be allocated/owned/released by generic layer.

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff @johnylin76 ! Just some minor comments.
@mengdonglin @alex-cri are testing pass thru smart amp in CI today ?

}

/* set frame formats to inner model */
ret = mod->mod_ops->set_fmts(mod, sad->ff_mod.frame_fmt, sad->fb_mod.frame_fmt);
Copy link
Member

@lgirdwood lgirdwood Aug 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing we are missing a module_set_fmt() wrapper for this in module core ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I get the idea. Do you mean we need the ops set_fmt() to set the format of core?

The design now has ops set_fmts() to set the format of feed-forward (FF) data path and feedback (FB) path respectively. Indeed the core format can be independent of input/ouput data format, however I wonder if that has the real need (in fact I think FF and FB formats could align each other (as well as core format) then set_fmts() could be replaced with set_fmt()). Since the design now is proposed to handle input/output data format conversion in generic layer, it should be fair to require the format alignment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I meant an inline static wrapper func around the API ops dereference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Anyway it triggered my re-consideration of the format for module core. I revised to unify the module formats which include core, ff-input and fb-input previously. Now the module ops will only set one format .

Comment on lines +23 to +57
for (fr = 0 ; fr < frames ; fr++) {
for (ch = 0 ; ch < src_mod->channels; ch++) {
if (chan_map[ch] == -1) {
*mod_ptr = 0;
} else {
src_idx = src_frag + chan_map[ch];
src_ptr = audio_stream_read_frag_s32(src, src_idx);
*mod_ptr = *src_ptr;
}
mod_ptr++;
}
src_frag += src_ch;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@singalsu @andrula-song is this using the optimal method now ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to write with frames without wrap helper to avoid audio_stream_read_frag_s32() usage for every sample read.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the most waste part is if else, take 48 frame, 2 channel as example, if need 96 check for if/else, this is big cost.
better to use channel in first loop, and then check, chan_map[ch], if == -1, with one loop, else with another loop, like below:

for (ch = 0 ; ch < src_mod->channels; ch++) {
	if (chan_map[ch] == -1) {
		for (fr = 0 ; fr < frames ; fr++) {
			*mod_ptr = 0;
			mod_ptr += src_mod->channels;
		}
	} else {
		for (fr = 0 ; fr < frames ; fr++) {
			src_idx = src_frag + chan_map[ch];
			src_ptr = audio_stream_read_frag_s16(src, src_idx);
			*mod_ptr = *src_ptr;
                            mod_ptr += src_mod->channels;
		}
		src_frag += src_ch;
	}
}

this will have only 2 if/else, will save cycles a lot.
Note: this is just reference code, please have more investigation to get correct offset calculation.

Copy link
Collaborator

@lyakh lyakh Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@btian1 I don't think that's equivalent to the above: in the original code src_frag is incremented for each frame and kept the same across channels.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the advice. I will do the follow-up optimization

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is just for demo purpose for save if/else, as I said in the note, "need more investigation to get correct offset"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just updated the commit for better buffer read/write strategy.

} else {
src_idx = src_frag + chan_map[ch];
src_ptr = audio_stream_read_frag_s32(src, src_idx);
*mod_ptr = *src_ptr << 8;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here too, use audio_stream_read_frag_s32(), there is an example in aria_algo_get_data() in aria_generic.c

@cujomalainey
Copy link
Contributor

@lgirdwood how much power (and would it be even worth it) if userspace opened the PCM in non-blocking so we can just never yield samples to userspace allowing both CRAS and the DMA to sit idle?

@lgirdwood
Copy link
Member

@lgirdwood how much power (and would it be even worth it) if userspace opened the PCM in non-blocking so we can just never yield samples to userspace allowing both CRAS and the DMA to sit idle?

@cujomalainey not sure - ideally if we do nothing then we should not consume power. However, this proposal may enable DMA channels and maybe other IPs in preparation for stream propagation (and consume some power, but I would think less than the full stream).

I assume you want to minimize the number of PCMs for smart amp when not needed ?I think @ranj063 has recently been looking at a way to get rid of the second PCM when its not needed (IIRC for AEC?).

@lgirdwood lgirdwood added this to the v2.8 milestone Aug 31, 2023
Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johnylin76 This looks ready to land, I think we just need the optimal copy() code and a few minor comments and we are good.

The selection for Smart Amplifier component implementation
will depend on the Amplifier solution supplier. It is fair
to treat the supported solutions as mutually exclusive ones.
There should be no more than one solution selected per build
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we setup a system similar to modules and have support for all devices at once? You could also do composition and just have this by a stand in for default implementation in a module?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are able to do it for sure. The code implementation will be a bit more complicated (as how module_adapter is implemented https://github.com/thesofproject/sof/blob/main/src/include/sof/audio/module_adapter/module/generic.h

I didn't implement that way due to the proposed plan in #7801 said the long-term goal is to use module API for all components, so I would defer that to the future task for converting smart_amp to use module API.


/* available formats: SOF_IPC_FRAME_S16_LE, SOF_IPC_FRAME_S24_4LE, SOF_IPC_FRAME_S32_LE */
for (i = 0; i < num_ff_fmts; i++) {
if (ff_fmts[i] >= ff_src_fmt && ff_fmts[i] <= SOF_IPC_FRAME_S32_LE) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comparing formats for greater or less looks a bit suspicious... Currently we don't do that, do we?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep I agree with that. The comparison will be easy for audio_format in IPC4 by comparing the depth attribute.
The design now (in IPC3 format) relies on the background knowledge that the enum value is the same order as the depth among {S16_LE=0, S24_4LE=1, S32_LE=2}, which is implicit.

To elaborate the current design, I would propose to wrap the implicit comparison into helper functions with comments for the background. Or maybe add get_depth(fmt) helper and compare by get_depth(fmt). get_depth(fmt) can also help smart_amp_generic implementation where we need to specify the shift bit for each conversion pair, e.g. S16-to-S24, S16-to-S32.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the commit. I added a helper function get_sample_bitdepth() in include/sof/audio/format.h. I'm not sure if it's appropriate, we can also put it inside of smaty_amp if that is the case. WDYT?

buffer_release(buf_c);
} else {
/* should not be used anyway */
fb_src_fmt = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 is SOF_IPC_FRAME_S16_LE. Maybe -1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just revised the code to prevent the redundant code like this.

@johnylin76 johnylin76 requested a review from kv2019i as a code owner September 7, 2023 02:24
Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first-pass review, a few comments inline but this seems solid. With 2.8 opened for development, this would be a good time to get bigger changes like this in. We do have a couple of big changes ongoing (one is the move to module adapter) and now with 2.8 we will revamp the buffer structure (#8106 ), so some coordination is needed to avoid too many rebase hassles.

size = SMART_AMP_FB_BUF_DB_SZ * sizeof(int32_t);
ret = smart_amp_buf_alloc(&sad->fb_mod.buf, size);
if (ret < 0)
return ret;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit suspicious how ff_mod.buf memory is freed if error happens here. It seems this is ok, you free all on free_data_buffers, but it's a bit unconventional to have the caller free partial resources, so a comment would be nice.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually make this even a bit stronger: I find it significantly better to process all function-internal errors within that same function. In this case this would result in a duplicated call to the clean up function, but I think it would be better than relying on the caller to clean up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ret = smart_amp_buf_alloc(&sad->ff_mod.buf, size);
if (ret < 0)
return ret;
total_size += size;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: you can remove initialisation and just do total_size = size here

size = SMART_AMP_FB_BUF_DB_SZ * sizeof(int32_t);
ret = smart_amp_buf_alloc(&sad->fb_mod.buf, size);
if (ret < 0)
return ret;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually make this even a bit stronger: I find it significantly better to process all function-internal errors within that same function. In this case this would result in a duplicated call to the clean up function, but I think it would be better than relying on the caller to clean up.

return NULL;
if (frames == 0) {
comp_warn(dev, "smart_amp_copy(): feedback frame size zero warning.");
return 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returning 0 means no error, and a warning message is issued. And this is a processing function. Can it happen that this warning will flood the log?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this warning message is as it was actually (just moved here due to the revamp.

Based on my previous debugging experience, this can be reproduced every now and then with no harm (well... at least as far as I've met), but is not an expected behavior.

{
enum DSM_API_MESSAGE retcode;
struct dsm_api_memory_size_ext_t memsize;
int *circularbuffersize = hspk->circularbuffersize;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a general comment: such names would be easier to read with underscores, e.g. circular_buffer_size. Can be a follow up

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the same as the original code (authored by Maxim) before the revamp (just moved from elsewhere). So I would keep it as is. And I think it is probably named that way on purposes to be aligned with the pre-built libraries for Maxim

@kv2019i
Copy link
Collaborator

kv2019i commented Sep 21, 2023

@johnylin76 if you have time now to do a rebase after #8217 is merged, we could now push this in.

@lgirdwood
Copy link
Member

@johnylin76 if you have time now to do a rebase after #8217 is merged, we could now push this in.

@johnylin76 #827 now merged, good to go ?

@johnylin76
Copy link
Contributor Author

@johnylin76 if you have time now to do a rebase after #8217 is merged, we could now push this in.

@johnylin76 #827 now merged, good to go ?

Thanks. I just addressed the 2nd-round comments and rebased the commit.

The motivation is depicted in
#7801

This commit revamps smart_amp component design to two-layer structure, i.e.
generic layer and inner model layer.

Generic layer is the common part of smart amp process which can be regarded
as the glue code interacting between SOF component ops and inner model.
While inner model may have various implementations respectively for
solution suppliers in a modular way.

Signed-off-by: Pin-chih Lin <johnylin@google.com>
Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @johnylin76 , looks good now!

@lgirdwood lgirdwood merged commit 9ca86c9 into thesofproject:main Sep 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants