[PATCH v15 00/16] Add audio support in v4l2 framework

Wed May 8 18:00:59 AEST 2024

On 06/05/2024 10:49, Shengjiu Wang wrote:
> On Fri, May 3, 2024 at 4:42 PM Mauro Carvalho Chehab <mchehab at kernel.org> wrote:
>>
>> Em Fri, 3 May 2024 10:47:19 +0900
>> Mark Brown <broonie at kernel.org> escreveu:
>>
>>> On Thu, May 02, 2024 at 10:26:43AM +0100, Mauro Carvalho Chehab wrote:
>>>> Mauro Carvalho Chehab <mchehab at kernel.org> escreveu:
>>>
>>>>> There are still time control associated with it, as audio and video
>>>>> needs to be in sync. This is done by controlling the buffers size
>>>>> and could be fine-tuned by checking when the buffer transfer is done.
>>>
>>> ...
>>>
>>>> Just complementing: on media, we do this per video buffer (or
>>>> per half video buffer). A typical use case on cameras is to have
>>>> buffers transferred 30 times per second, if the video was streamed
>>>> at 30 frames per second.
>>>
>>> IIRC some big use case for this hardware was transcoding so there was a
>>> desire to just go at whatever rate the hardware could support as there
>>> is no interactive user consuming the output as it is generated.
>>
>> Indeed, codecs could be used to just do transcoding, but I would
>> expect it to be a border use case. See, as the chipsets implementing
>> codecs are typically the ones used on mobiles, I would expect that
>> the major use cases to be to watch audio and video and to participate
>> on audio/video conferences.
>>
>> Going further, the codec API may end supporting not only transcoding
>> (which is something that CPU can usually handle without too much
>> processing) but also audio processing that may require more
>> complex algorithms - even deep learning ones - like background noise
>> removal, echo detection/removal, volume auto-gain, audio enhancement
>> and such.
>>
>> On other words, the typical use cases will either have input
>> or output being a physical hardware (microphone or speaker).
>>
> 
> All, thanks for spending time to discuss, it seems we go back to
> the start point of this topic again.
> 
> Our main request is that there is a hardware sample rate converter
> on the chip, so users can use it in user space as a component like
> software sample rate converter. It mostly may run as a gstreamer plugin.
> so it is a memory to memory component.
> 
> I didn't find such API in ALSA for such purpose, the best option for this
> in the kernel is the V4L2 memory to memory framework I found.
> As Hans said it is well designed for memory to memory.
> 
> And I think audio is one of 'media'.  As I can see that part of Radio
> function is in ALSA, part of Radio function is in V4L2. part of HDMI
> function is in DRM, part of HDMI function is in ALSA...
> So using V4L2 for audio is not new from this point of view.
> 
> Even now I still think V4L2 is the best option, but it looks like there
> are a lot of rejects.  If develop a new ALSA-mem2mem, it is also
> a duplication of code (bigger duplication that just add audio support
> in V4L2 I think).

After reading this thread I still believe that the mem2mem framework is
a reasonable option, unless someone can come up with a method that is
easy to implement in the alsa subsystem. From what I can tell from this
discussion no such method exists.

>From the media side there are arguments that it adds extra maintenance
load, which is true, but I believe that it is quite limited in practice.

That said, perhaps we should make a statement that while we support the
use of audio m2m drivers, this is only for simple m2m audio processing like
this driver, specifically where there is a 1-to-1 mapping between input and
output buffers. At this point we do not want to add audio codec support or
similar complex audio processing.

Part of the reason is that codecs are hard, and we already have our hands
full with all the video codecs. Part of the reason is that the v4l2-mem2mem
framework probably needs to be forked to make a more advanced version geared
towards codecs since the current framework is too limiting for some of the
things we want to do. It was really designed for scalers, deinterlacers, etc.
and the codec support was added later.

If we ever allow such complex audio processing devices, then we would have
to have another discussion, and I believe that will only be possible if
most of the maintenance load would be on the alsa subsystem where the audio
experts are.

So my proposal is to:

1) add a clear statement to dev-audio-mem2mem.rst (patch 08/16) that only
   simple audio devices with a 1-to-1 mapping of input to output buffer are
   supported. Perhaps also in videodev2.h before struct v4l2_audio_format.

2) I will experiment a bit trying to solve the main complaint about creating
   new audio fourcc values and thus duplicating existing SNDRV_PCM_FORMAT_
   values. I have some ideas for that.

But I do not want to spend time on 2 until we agree that this is the way
forward.

Regards,

	Hans