FW: [Openal-devel] Faster buffer model, and more random stuff
Sherief N. Farouk
sherief at mganin.com
Mon Feb 25 15:29:21 PST 2008
You want there to be a constant shadow memory buffer in the AL buffer that the
app can't control? That's even more of a problem for kernel modules/sound
card drivers, I think, since it's more persistent. And the dynamic
allocation problem may still be present anyway, for implementations that try
to keep unused memory down.
IMO, an app should be able to have control over such memory use. It shouldn't
have to guess whether a function call will create an associated shadow
buffer it has no control over. If an app has trouble with not enough memory
because of AL, the implementation should be worked on to have a smaller
memory footprint, not given extra memory the app can't control.
>>: We're still having communication problems. There exists, at most, one memory buffer at any given time under the map buffer model. Think of this: when I call alBufferData, where does the implementation store the data? It MUST either transfer it immediately to HW or make its own copy, because I might (and most likely will) follow the BufferData call with a delete. So, if I'm decoding OGG and transferring it to OpenAL under the current model, the flow is like this:
ALbyte* Data = allocate_space_for_pcm(); // hypothetical decode-target allocation
alBufferData(Buffer, Format, Data, Size, Freq); // a copy or stall must ensue here
delete[] Data; // safe only because of that copy/stall
Now, a few performance considerations: First, the copy/stall is UNAVOIDABLE. Second, the buffer size isn't fixed, so a new alloc, or at least some if statements, goes through every time BufferData is called. When I'm using all my buffers to stream OGG, this code is executed a lot. So my main game loop has too many allocations/deallocations; most of the audio-related ones can be avoided via the immutable buffer model.
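To make the churn concrete, here's a minimal sketch of that streaming pattern under the current model. The decode_ogg_chunk helper, the format, and the sample rate are hypothetical placeholders; only the al* calls are real API:

#include <AL/al.h>
#include <vector>

std::vector<ALbyte> decode_ogg_chunk(); // hypothetical decoder, defined elsewhere

void refill_processed_buffers(ALuint source)
{
    ALint processed = 0;
    alGetSourcei(source, AL_BUFFERS_PROCESSED, &processed);
    while (processed-- > 0)
    {
        ALuint buffer;
        alSourceUnqueueBuffers(source, 1, &buffer);
        // Each iteration decodes into freshly allocated client memory...
        std::vector<ALbyte> pcm = decode_ogg_chunk();
        // ...and alBufferData must copy (or stall) before that memory dies;
        // since the size varies, the implementation may also reallocate.
        alBufferData(buffer, AL_FORMAT_STEREO16, pcm.data(),
                     (ALsizei)pcm.size(), 44100);
        alSourceQueueBuffers(source, 1, &buffer);
    } // pcm's storage is freed here, which is what forces the copy above
}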
Now for the shadow buffer: IT DOESN'T EXIST. IT NEEDN'T BE THERE. Simply put, here's what a well-optimized implementation of MapBuffer should do:
- If the buffer is on the card, the mapping is write-only, and the buffer has no pending render commands, return a pointer to the card-mapped memory space.
- If the buffer is on the card, the mapping is write-only, and the buffer has pending render commands, allocate a temporary buffer-sized block, return a pointer to it, and enqueue its transfer upon completion of the last render command pending on the buffer.
- If the buffer is on the client, regardless of the r/w access requested, and there are no pending render commands, return a pointer to it.
- If the buffer is on the client, regardless of the r/w access requested, and there are pending render commands, allocate a temporary buffer to be moved into place upon execution of the pending renders.
Note that all the cases where a second buffer is allocated are done merely for the speed factor, so that the function returns as fast as possible. I'm assuming an implementation will be as async as possible and benefit from multi-threaded operation, leaving the application thread non-blocked. You can always wait till pending render commands are done, if you're THAT memory tight. And since buffer memory consumption is known upfront, you can make the decision whether to copy and work in parallel or wait, depending on available scratch pool memory. You can't simply do that with BufferData.
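For concreteness, here's how those four cases might look from the implementation side. Everything here (BufferState, card_map, scratch_alloc, the enqueue helper) is a hypothetical sketch of the proposed model, not actual OpenAL Soft code:

#include <cstddef>

struct BufferState                 // hypothetical internal bookkeeping
{
    bool   on_card;                // storage currently lives in card memory
    bool   pending_renders;        // render commands still queued on this buffer
    void*  client_memory;          // client-side storage, if any
    size_t size;                   // fixed and known upfront (immutable buffers)
};

void* card_map(BufferState& buf);                       // map card memory
void* scratch_alloc(size_t size);                       // scratch-pool alloc
void  enqueue_copy_after_renders(BufferState&, void*);  // deferred transfer/move

void* MapBufferForWrite(BufferState& buf)
{
    if (buf.on_card)
    {
        if (!buf.pending_renders)
            return card_map(buf);             // case 1: direct card mapping
        void* tmp = scratch_alloc(buf.size);  // case 2: temp block now,
        enqueue_copy_after_renders(buf, tmp); //   transferred when renders finish
        return tmp;
    }
    if (!buf.pending_renders)
        return buf.client_memory;             // case 3: hand out client storage
    void* tmp = scratch_alloc(buf.size);      // case 4: temp block, moved into
    enqueue_copy_after_renders(buf, tmp);     //   place once pending renders run
    return tmp;
}

Note that no second buffer ever persists: the temporaries exist only so the call returns without blocking.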
> > >>>: I fail to see why there should be a distinction. Why can't it simply
> > >>>: be one of the proposed buffer usage flags?
> The format the mixer writes to may not be compatible with its buffer
> storage format (eg. interleaved vs. de-interleaved). In that case, a render
> buffer could be used like a mappable buffer object, and alBufferData could
> be called with it using a special format identifier to avoid a
> card->cpu->card round trip.
> >>: If the format isn't compatible with the buffer storage format, then it
> >>: should fail, plain and simple.
So an implementation that uses an esoteric internal buffer storage format but
mixes to a more conventional 16-bit interleaved format shouldn't be able to do
render-to-buffer at all? And how would this work, eg. with 5.1 formats? In
AL_EXT_MCFORMATS, the channel order on Windows is different than on other
systems. For example, on Windows OpenAL Soft writes fl/fr/cntr/lfe/bl/br,
whereas on other systems it writes fl/fr/bl/br/cntr/lfe. A 5.1 buffer always
has to be in the Windows-style order, but the mixer doesn't necessarily write
it out that way. Such OS dependency isn't nice.
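As an illustration of the order mismatch (assuming 16-bit samples; the remap table is mine, derived from the two orders just listed):

#include <cstddef>

// dst channel i takes src channel kRemap51[i]:
// mixer order fl/fr/bl/br/cntr/lfe -> Windows order fl/fr/cntr/lfe/bl/br
static const int kRemap51[6] = { 0, 1, 4, 5, 2, 3 };

void remap_5dot1(const short* src, short* dst, size_t frames)
{
    for (size_t f = 0; f < frames; ++f)
        for (int c = 0; c < 6; ++c)
            dst[f*6 + c] = src[f*6 + kRemap51[c]];
}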
In comparison, a render buffer wouldn't need to use any specific format. It
could be given a hint when created, but the actual rendering can be done in a
mixer-efficient way. An app could then read from it and specify the format it
should be presented as (converting as needed).
>>>: And if you had quoted the sentence after "plain and simple.", you'd have found that it says "Although I recommend a more forgiving strategy of convert-on-mapping-to-read. Keep in mind that this is how GL does it for RGBA8 (cards store natively in ABGR, for nvidia, IIRC).". That's what I recommend: render to whatever you prefer internally, convert on read.
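A sketch of what convert-on-read could look like from the app side. alRenderBufferReadEXT is an invented name purely for illustration; no such function exists in any AL spec:

#include <AL/al.h>
#include <vector>

// Invented entry point: read a render buffer's contents, converted from the
// implementation's internal storage into the format the app names here.
extern "C" void alRenderBufferReadEXT(ALuint buffer, ALenum format,
                                      ALsizei size, ALvoid* out);

void grab_rendered_audio(ALuint renderBuffer, ALsizei byteCount)
{
    std::vector<ALbyte> pcm(byteCount);
    // The mixer renders in whatever layout is efficient for it; the
    // conversion cost is paid only here, when the app actually reads.
    alRenderBufferReadEXT(renderBuffer, AL_FORMAT_STEREO16,
                          byteCount, pcm.data());
}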
> I don't see why it's not possible to have both. There'd be a queryable
> number of buffer units per source that the fixed-function pipeline can
> blend, and the blending is subsumed when a listener shader is active (which
> can sample from all buffers on a source and do the blending itself).
> >>: I'll look into that. So which pipeline are you suggesting now?
The first, just with the fixed-function bypassed when the corresponding shader
is active, and the 3D positioning being calculated after the source shader
(eg. a vector -> array of gains, to be applied to the result of the listener
shader). Being able to change the 3D position per sample, while potentially
nice, is hardly practical in software, or even in today's sound hardware I'd
imagine. It can probably be revisited if hardware shaders ever become a
reality.
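A minimal sketch of that vector -> array-of-gains step, assuming a simple dot-product panner over a hypothetical quad speaker layout (not OpenAL Soft's actual positioning code):

#include <cmath>

struct Vec3 { float x, y, z; };

static const Vec3 kSpeakerDirs[4] = {   // hypothetical quad layout
    { -0.7071f, 0.0f, -0.7071f },       // front-left
    {  0.7071f, 0.0f, -0.7071f },       // front-right
    { -0.7071f, 0.0f,  0.7071f },       // back-left
    {  0.7071f, 0.0f,  0.7071f },       // back-right
};

// dir is the normalized direction from listener to source.
void position_to_gains(Vec3 dir, float gains[4])
{
    for (int i = 0; i < 4; ++i)
    {
        const Vec3& s = kSpeakerDirs[i];
        float dot = dir.x*s.x + dir.y*s.y + dir.z*s.z;
        gains[i] = dot > 0.0f ? dot : 0.0f; // speakers facing away get zero
    }
}

The gains would be computed once per source update and applied to the listener shader's output, rather than recomputed per sample.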
>>: I'll look into that and let you know asap.