[PATCH v15 00/16] Add audio support in v4l2 framework

Fri May 17 00:50:39 AEST 2024

On 15. 05. 24 22:33, Nicolas Dufresne wrote:
> Hi,
> 
> GStreamer hat on ...
> 
> Le mercredi 15 mai 2024 à 12:46 +0200, Jaroslav Kysela a écrit :
>> On 15. 05. 24 12:19, Takashi Iwai wrote:
>>> On Wed, 15 May 2024 11:50:52 +0200,
>>> Jaroslav Kysela wrote:
>>>>
>>>> On 15. 05. 24 11:17, Hans Verkuil wrote:
>>>>> Hi Jaroslav,
>>>>>
>>>>> On 5/13/24 13:56, Jaroslav Kysela wrote:
>>>>>> On 09. 05. 24 13:13, Jaroslav Kysela wrote:
>>>>>>> On 09. 05. 24 12:44, Shengjiu Wang wrote:
>>>>>>>>>> mem2mem is just like the decoder in the compress pipeline, which is
>>>>>>>>>> one of the components in the pipeline.
>>>>>>>>>
>>>>>>>>> I was thinking of loopback with endpoints using compress streams,
>>>>>>>>> without physical endpoint, something like:
>>>>>>>>>
>>>>>>>>> compress playback (to feed data from userspace) -> DSP (processing) ->
>>>>>>>>> compress capture (send data back to userspace)
>>>>>>>>>
>>>>>>>>> Unless I'm missing something, you should be able to process data as fast
>>>>>>>>> as you can feed it and consume it in such case.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Actually, I tried this at the beginning, but it did not work well.
>>>>>>>> ALSA needs timing control for playback and capture, and playback and
>>>>>>>> capture need to be synchronized. Usually the playback and capture
>>>>>>>> pipelines are independent in the ALSA design, but in this case they
>>>>>>>> must be synchronized; they are not independent.
>>>>>>>
>>>>>>> The compress API core has no strict timing constraints. You can also
>>>>>>> have two half-duplex compress devices if you want a truly independent
>>>>>>> mechanism. If something is missing in the API, you can extend it (for
>>>>>>> example, to inform user space that this is producer/consumer processing
>>>>>>> without any relation to real time). I like this idea.
>>>>>>
>>>>>> I was thinking more about this. If I am right, the mentioned use in gstreamer
>>>>>> is supposed to run the conversion (DSP) job in "one shot" (which can be handled
>>>>>> by a single system call, such as a blocking ioctl). The goal is just to offload
>>>>>> the CPU work to the DSP (co-processor). If there are no queuing requirements,
>>>>>> we can easily implement this ioctl in the compress ALSA API, using the
>>>>>> dma-buf API for data management. We could also define a new
>>>>>> direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to
>>>>>> handle this new data scheme. The API may, of course, be extended later
>>>>>> if there is real demand.
>>>>>>
>>>>>> All other pieces are already in the current ALSA compress API
>>>>>> (capabilities, params, enumeration). The realtime controls may be created
>>>>>> using the ALSA control API.
>>>>>
>>>>> So does this mean that Shengjiu should attempt to use this ALSA approach first?
>>>>
>>>> I've not seen any compelling argument for forcing the v4l2 mem2mem buffer
>>>> scheme onto this data conversion. It looks like a simple job, and the ALSA
>>>> APIs may be extended for this simple purpose.
>>>>
>>>> Shengjiu, what are your requirements for gstreamer support? Would a
>>>> new blocking ioctl in the compress ALSA API be enough for the initial
>>>> support?
>>>
>>> If it works with compress API, it'd be great, yeah.
>>> So, your idea is to open compress-offload devices for read and write,
>>> and then let them convert data as batch jobs without timing control?
>>>
>>> For full-duplex usage, we might need some more extensions so that
>>> both read and write parameters can be synchronized. (So far, a
>>> compress stream is unidirectional, with a runtime buffer for a
>>> single stream.)
>>>
>>> And the buffer management is based on fixed-size fragments. I
>>> hope this doesn't matter much for the intended operation?
>>
>> It's an open question whether standard I/O is really required for this case.
>> My quick idea was to just implement a new "direction" for this job, supporting
>> only one ioctl for the data processing, which would execute the job in "one
>> shot" for now. The I/O may be handled through the dma-buf API (which seems
>> to be the standard for this purpose nowadays and allows future chaining).
>>
>> So something like:
>>
>> struct dsp_job {
>>      int source_fd;     /* dma-buf FD with source data - for dma_buf_get() */
>>      int target_fd;     /* dma-buf FD for target data - for dma_buf_get() */
>>      /* ... maybe some extra data size members here ... */
>>      /* ... maybe some special parameters here ... */
>> };
>>
>> #define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job)
>>
>> This ioctl would be blocking (thus synchronous). My question is whether it's
>> feasible for gstreamer. In this particular case, if the rate conversion is
>> implemented in software, it will block the gstreamer data processing, too.
> 
> Yes, GStreamer threading uses a push-back model, so blocking for the duration of
> the processing is fine. Note that this simplicity comes at the cost of ioctl()
> latency.
> 
> In GFX, they solve this issue with fences, which allow setting up the next
> operation in the chain before the data has been produced.

Fences look really nice and seem more modern. With the dma-buf/sync_file.c
interface, it should be possible to handle multiple jobs simultaneously and share
the state between user space and the kernel driver.
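
As a rough illustration of the sync_file side, user space can wait on a fence fd with poll(): a sync_file fd reports POLLIN once its fence has signaled. The helper below is a minimal sketch under that assumption; the name wait_fence() is hypothetical, and any fd with the same poll behaviour can be passed in (for testing, an eventfd can stand in for a real fence).

```c
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper: wait until a fence fd signals.
 * Assumes the fd behaves like a sync_file fd, i.e. poll() reports
 * POLLIN once the fence has signaled.
 * Returns 1 if signaled, 0 on timeout, -1 on error (errno set).
 */
static int wait_fence(int fence_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
    int ret;

    /* Retry if interrupted by a signal; note that a retry restarts
     * the full timeout, which is acceptable for a sketch. */
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret <= 0)
        return ret;                 /* 0: timeout, -1: poll() error */
    if (pfd.revents & (POLLERR | POLLNVAL)) {
        errno = EINVAL;             /* bad or errored fence fd */
        return -1;
    }
    return (pfd.revents & POLLIN) ? 1 : 0;
}
```

Without a DSP driver, an eventfd created with eventfd(0, 0) can play the fence: wait_fence(fd, 0) returns 0 until a nonzero value is written to it, and 1 afterwards.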

In this case, I think two non-blocking ioctls should be enough: one to add a
new job with source/target dma buffers guarded by one fence, and one to abort
(flush) all active jobs.
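
To make the two proposed operations concrete, here is a user-space model of their semantics only: a non-blocking "add job" that queues a source/target dma-buf pair guarded by a fence, and a "flush" that aborts every job still active. All names (struct dsp_queue, dsp_job_add(), dsp_job_complete(), dsp_jobs_flush(), the job states) are hypothetical, and a plain state field stands in for the real dma_fence. In an actual driver, aborted jobs would presumably signal their fences with an error rather than just change state.

```c
/* Hypothetical job states; JOB_ACTIVE means the fence is still unsignaled. */
enum job_state { JOB_FREE = 0, JOB_ACTIVE, JOB_DONE, JOB_ABORTED };

struct dsp_job_slot {
    int source_fd;          /* dma-buf FD with source data */
    int target_fd;          /* dma-buf FD for target data */
    enum job_state state;   /* stands in for the fence guarding the job */
};

#define MAX_JOBS 8

struct dsp_queue {
    struct dsp_job_slot jobs[MAX_JOBS];
};

/* Model of the non-blocking "add job" ioctl: returns a job id (standing in
 * for the returned fence) or -1 if every slot is busy. Slots that are not
 * active are reused; this assumes user space has already consumed the
 * results of completed or aborted jobs. */
static int dsp_job_add(struct dsp_queue *q, int source_fd, int target_fd)
{
    for (int i = 0; i < MAX_JOBS; i++) {
        if (q->jobs[i].state != JOB_ACTIVE) {
            q->jobs[i].source_fd = source_fd;
            q->jobs[i].target_fd = target_fd;
            q->jobs[i].state = JOB_ACTIVE;
            return i;
        }
    }
    return -1;
}

/* Model of the DSP finishing a job and signaling its fence. */
static void dsp_job_complete(struct dsp_queue *q, int id)
{
    if (id >= 0 && id < MAX_JOBS && q->jobs[id].state == JOB_ACTIVE)
        q->jobs[id].state = JOB_DONE;
}

/* Model of the "abort (flush)" ioctl: cancels every still-active job and
 * returns how many were aborted. Already completed jobs are untouched. */
static int dsp_jobs_flush(struct dsp_queue *q)
{
    int aborted = 0;

    for (int i = 0; i < MAX_JOBS; i++) {
        if (q->jobs[i].state == JOB_ACTIVE) {
            q->jobs[i].state = JOB_ABORTED;
            aborted++;
        }
    }
    return aborted;
}
```

Because neither operation waits for the DSP, user space can queue several jobs back to back and only block later on the individual fences, which is the latency advantage over the single blocking ioctl.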

I'll try to propose an API extension for ALSA's compress API on the
linux-sound mailing list soon.

					Jaroslav

-- 
Jaroslav Kysela <perex at perex.cz>
Linux Sound Maintainer; ALSA Project; Red Hat, Inc.