Vista Audio Stack and API

Posted by Charles // Tue, Dec 13, 2005 9:56 PM

Charles recently caught up with seasoned Niner Larry Osterman, an SDE and 20-year Microsoft veteran, and Elliot H Omiya, a Software Architect and audio guru, to dig into the inner workings of Vista's updated audio stack and new user-mode API. Much of the guts of Windows audio have been moved up into the land of the user, and this has consequences both for Windows audio developers at the API level and for Windows overall in terms of programmability, reliability, and stability. Great stuff.

Enjoy

Show: Going Deep


Video Length: 00:45:10 Replies: 46 // Views: 58,916
  Zeo
  Channel 9 :)
 
  Tue, Dec 13 2005 10:29 PM
Yea!!!! More Vista Videos!!!!

  Manip
  Life's too short for chess.
 
  Tue, Dec 13 2005 11:54 PM
I have no questions; good video.

  bonk
  Ich bin der Wurstfachverkäuferin! [= I am the sausage saleswoman!]
 
  Wed, Dec 14 2005 8:58 AM
Very interesting video. I don't know if I missed it, but is MF a COM API? If not, what style of API is it?

More Vista Videos !! Please !!


  papavb
 
 
  Wed, Dec 14 2005 7:24 AM

Cool vid. Can't wait for more of these Going Deep interviews. You're a dev's best friend, Charles.



  Chadk
  Mooo!
 
  Wed, Dec 14 2005 10:09 AM
It's Larry! He rocks.

We want lots more of this kind.
A Going Deep on the network stack in Vista would also be great.


  Cyonix
  Me
 
  Wed, Dec 14 2005 10:19 AM
Yeah, Charles is king.

  LarryOsterman
 
 
  Wed, Dec 14 2005 11:43 AM
bonk wrote:
Very interesting video. I don't know if I missed it, but is MF a COM API? If not, what style of API is it?

More Vista Videos !! Please !!


MF is a COM-ish API, but you don't activate objects with CoCreateInstance.
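
Roughly, the pattern looks like this - just a sketch using the Media Foundation names in the current SDK headers (details could change before we ship): you initialize MF once, then get the COM interfaces from MFCreate* factory functions instead of CoCreateInstance.

#include <windows.h>
#include <mfapi.h>
#include <mfidl.h>

int wmain()
{
    // COM still gets initialized as usual...
    CoInitializeEx(NULL, COINIT_MULTITHREADED);

    // ...but Media Foundation has its own startup call...
    HRESULT hr = MFStartup(MF_VERSION);
    if (FAILED(hr)) return 1;

    // ...and its COM interfaces come from MFCreate* factory functions,
    // not from CoCreateInstance.
    IMFMediaSession *session = NULL;
    hr = MFCreateMediaSession(NULL, &session);
    if (SUCCEEDED(hr))
    {
        // ... build a topology, hand it to the session, play it ...
        session->Shutdown();
        session->Release();
    }

    MFShutdown();
    CoUninitialize();
    return 0;
}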


  Minh
  Does this make my head look fat?
 
  Wed, Dec 14 2005 1:06 PM
I'm just curious about how you provide per-app volume control for "legacy" apps that don't talk directly to WASAPI. Do you create a session for an app on its first request for sound service? How do you differentiate between two instances of the same .EXE?

BTW, it's a bit funny that a video about audio has such a low audio level.


  GRiNSER
  fenster wien [=windows vienna] - austria rocks ;-)
 
  Wed, Dec 14 2005 3:18 PM
Is there a way in Vista to set which speakers an application plays through? For example, I want the media player on the two front speakers and the game on the two back speakers. If that's possible, can you also set the volume for each application on a per-speaker basis?


  LarryOsterman
 
 
  Wed, Dec 14 2005 3:39 PM
Minh wrote:
I'm just curious about how you provide per-app volume control for "legacy" apps that don't talk directly to WASAPI. Do you create a session for an app on its first request for sound service? How do you differentiate between two instances of the same .EXE?

BTW, it's a bit funny that a video about audio has such a low audio level.


First off, with a couple of exceptions (DirectKS, ASIO, etc.) every app that plays audio interacts with WASAPI.  You simply can't play audio without it.  So the legacy apps are talking to WASAPI, even if they don't know it.

Having said that, every audio stream is associated with an audio session; it is meaningless to talk about streams without sessions.

If you don't do anything special, each stream in a process is associated with the same session: the first stream creates the session, and all subsequent streams (that don't do anything special) are associated with that session.

Each session has an identifier that contains "interesting" things about the session.  Among the "interesting" things are the name of the endpoint on which the session is activated, the executable name, and the executable's process ID.  The last is how you differentiate between sessions with the same executable.

When we save the volume for an app, we use all the "interesting" things except the process ID.
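
If you want to see what that looks like from code, here's a rough sketch (error handling omitted, and the interface names are the ones in the current headers, so they may change): you activate the session manager off an endpoint and ask for the volume control of your process's default session.

#include <windows.h>
#include <mmdeviceapi.h>
#include <audiopolicy.h>

void SetMyDefaultSessionVolume(float level)
{
    IMMDeviceEnumerator *enumerator = NULL;
    IMMDevice *endpoint = NULL;
    IAudioSessionManager *manager = NULL;
    ISimpleAudioVolume *volume = NULL;

    CoInitializeEx(NULL, COINIT_MULTITHREADED);

    // The device enumerator is the one object you do CoCreateInstance.
    CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL, CLSCTX_ALL,
                     __uuidof(IMMDeviceEnumerator), (void **)&enumerator);
    enumerator->GetDefaultAudioEndpoint(eRender, eConsole, &endpoint);

    // Everything else is activated off the endpoint.
    endpoint->Activate(__uuidof(IAudioSessionManager), CLSCTX_ALL, NULL,
                       (void **)&manager);

    // A NULL session GUID means the process's default session - the one
    // created by the first stream in the process, as described above.
    manager->GetSimpleAudioVolume(NULL, 0, &volume);
    volume->SetMasterVolume(level, NULL);

    volume->Release();
    manager->Release();
    endpoint->Release();
    enumerator->Release();
    CoUninitialize();
}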


  gaelhatchue
 
 
  Wed, Dec 14 2005 3:41 PM
GRiNSER wrote:
Is there a way in Vista to set which speakers an application plays through? For example, I want the media player on the two front speakers and the game on the two back speakers. If that's possible, can you also set the volume for each application on a per-speaker basis?

My guess is that you can't do that. Everything gets mixed in the mix buffer, and the result goes to the sound card endpoint. This could only be possible with multiple sound cards. Am I right, Larry?



  LarryOsterman
 
 
  Wed, Dec 14 2005 3:42 PM
gaelhatchue wrote:
GRiNSER wrote:
Is there a way in Vista to set which speakers an application plays through? For example, I want the media player on the two front speakers and the game on the two back speakers. If that's possible, can you also set the volume for each application on a per-speaker basis?

My guess is that you can't do that. Everything gets mixed in the mix buffer, and the result goes to the sound card. This could only be possible with multiple sound cards. Am I right, Larry?


Close.   You can't do that, but you CAN differentiate which device gets which audio output.  So while you can't split audio to different channels on a single adapter, if you have a set of USB headphones and a 5.1 surround system, you could redirect the output of your IM application to the USB headphones and the media player to the 5.1 system.

Doing this requires some cooperation from the application, unfortunately - it doesn't come for free.  Applications that use the existing APIs to find the preferred voice-communications device will work correctly, but apps that just look for the default won't.
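
The difference is literally one parameter. A sketch, using the device enumerator as it appears in the current headers:

#include <mmdeviceapi.h>

void GetRenderTargets(IMMDeviceEnumerator *enumerator,
                      IMMDevice **defaultDevice, IMMDevice **voiceDevice)
{
    // What most apps do today: ask for "the" default render device.
    // In the example above, that's the 5.1 system.
    enumerator->GetDefaultAudioEndpoint(eRender, eConsole, defaultDevice);

    // What a voice/IM app should do: ask for the endpoint the user picked
    // for communications - the USB headset.
    enumerator->GetDefaultAudioEndpoint(eRender, eCommunications, voiceDevice);
}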


  GRiNSER
  fenster wien [=windows vienna] - austria rocks ;-)
 
  Wed, Dec 14 2005 4:08 PM
Oh, that's a pity. So I have to use a USB headset if I want to have the music on my speakers and the phone calls on my headset? Why isn't every sound card channel an endpoint? More granularity would be fine.


  Tzim
 
 
  Wed, Dec 14 2005 7:13 PM
If I understand correctly, the app streams are mixed in user mode by the Windows Audio Service. But what if my sound card does support hardware mixing?

Am I losing those kinds of features, or does the service use those hardware capabilities by sending the streams 'unmixed' to the sound card?

As for FP processing instead of 16/24-bit int... isn't it a waste of CPU cycles to convert to float, just to convert back to int to pass the stream to the sound card (most of them don't support FP samples), when the user has poor low-end speakers? Or maybe you leave the choice not to use FP?


  LarryOsterman
 
 
  Wed, Dec 14 2005 8:00 PM
GRiNSER wrote:
Oh, that's a pity. So I have to use a USB headset if I want to have the music on my speakers and the phone calls on my headset? Why isn't every sound card channel an endpoint? More granularity would be fine.


Because people like to listen to audio in stereo, not mono.

An endpoint is an address.  If each channel had its own address, you'd not be able to render stereo audio.


  LarryOsterman
 
 
  Wed, Dec 14 2005 8:02 PM
Tzim wrote:
If I understand correctly, the app streams are mixed in user mode by the Windows Audio Service. But what if my sound card does support hardware mixing?

Am I losing those kinds of features, or does the service use those hardware capabilities by sending the streams 'unmixed' to the sound card?

As for FP processing instead of 16/24-bit int... isn't it a waste of CPU cycles to convert to float, just to convert back to int to pass the stream to the sound card (most of them don't support FP samples), when the user has poor low-end speakers? Or maybe you leave the choice not to use FP?


If your sound card supports hardware mixing, it means you can play non-PCM audio at the same time the OS is playing PCM audio.  So there's value in that.

This software mixing thingy isn't new for Vista - it's essentially the same way that XP and Win2K (and Win98) worked.

As far as the int-float thingy goes, many audio solutions available today support float rendering, and more and more will in the future.

And the DSP is many orders of magnitude more accurate when working with floating point numbers.
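
To put the int-float thing in concrete terms, here's a toy illustration (not our actual mixer code): summing two loud 16-bit samples overflows in integer arithmetic, while a float pipeline keeps the intermediate headroom and only clips once, at the very end.

#include <algorithm>

short MixTwoSamples(short a, short b)
{
    // Convert to float: plenty of precision and effectively unlimited headroom.
    float mixed = (a / 32768.0f) + (b / 32768.0f);

    // ... volume and effects processing happens here, staying in float ...

    // Clip exactly once, when converting back to the hardware's format.
    mixed = std::min(1.0f, std::max(-1.0f, mixed));
    return (short)(mixed * 32767.0f);
}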


  ZippyV
  Fired Up
 
  Wed, Dec 14 2005 9:20 PM

Does this new API make it easier to do sound stuff from a .NET application?



  Charles
  Welcome Change
 
  Wed, Dec 14 2005 9:34 PM
ZippyV wrote:

Does this new API make it easier to do sound stuff from a .NET application?



Did you watch the video?  

All of this API is targeted at unmanaged C++ developers. I'd imagine the COM stuff could be interop'd with in the same way you interact with COM today from the managed world.

C

  staceyw
  Bouncin'
 
  Thu, Dec 15 2005 12:25 AM
"All of this API is targeted at unmanaged C++ developers. I'd imagine the COM stuff could be interoped with in the way you interact with COM today from the managed world."

First, thanks Charles and Audio guys!

I did not follow the "no managed" reasons either - other than understanding the time constraints.

It would seem they could just wrap WASAPI in a managed wrapper.  I mean, C# can still use pointers and unmanaged buffers if you have to.  And it would seem that with audio, you would be handing off most stuff directly to Win32 anyway, letting it do its thing async. So why would the CLR even come into play there?  I am surely missing something basic here.

Even if it glitched, I would rather have some kind of managed support than zero.  Also, things like enumerating and listing devices (i.e. management) would not seem to have any negative effect on audio perf, as you're not doing any audio.  Seems a bit strange: Avalon is managed from the ground up; Vista audio - zero managed.  Don't get me wrong - love the new designs and APIs from what you showed.  Fine work.

P.S. I wonder if this "kind" of design would work as a SIP in Singularity? I mean lifting the audio shared memory location(s) into the SIP.  I guess that could be inside the kernel component they lift into each SIP.  Interesting stuff.

Thanks again!!
--
William

  LaBomba
  Does that come with 2.0?
 
  Thu, Dec 15 2005 1:29 AM

Nice!

More vista videos!

- LB



  androidi
 
 
  Thu, Dec 15 2005 6:39 PM
LarryOsterman wrote:

Close.   You can't do that, but you CAN differentiate which device gets which audio output.  So while you can't split audio to different channels on a single adapter, if you have a set of USB headphones and a 5.1 surround system, you could redirect the output of your IM application to the USB headphones and the media player to the 5.1 system.

Doing this requires some cooperation from the application, unfortunately - it doesn't come for free.  Applications that use the existing APIs to find the preferred voice-communications device will work correctly, but apps that just look for the default won't.

GRiNSER wrote:
Oh, that's a pity. So I have to use a USB headset if I want to have the music on my speakers and the phone calls on my headset? Why isn't every sound card channel an endpoint? More granularity would be fine.


Because people like to listen to audio in stereo, not mono.

An endpoint is an address.  If each channel had its own address, you'd not be able to render stereo audio.


Last time I checked, most audio cards had plenty of stereo outputs: front, rear, side, etc. Are you saying that for every "user endpoint" one should have one "sound card", so to speak? Many motherboards come with onboard audio and 3-4 stereo output pairs, and if I want to connect three user stereo endpoints (pairs of speakers), I have to get three sound cards? Some motherboards and PCI cards have multiple network adapters, and those aren't "bundled" by default, so why should this be different for audio? If the bundled mode (7.1) is the default, fine, but I may want to use a pair of loudspeakers and headphones, and that still leaves one or more stereo endpoints free on the card. If you can mix eight distinct streams of data, you ought to be able to mix four pairs of stereo too?

All in all, great improvements, but I guess the audio hardware vendors would need to have a separate endpoint for every stereo output pair on the sound card; otherwise, from the sound of it, the system is not flexible enough. All that is needed is changing the drivers, right?


  Flavio
 
 
  Thu, Dec 15 2005 10:25 AM
Thanks for a very informative video. It's great that you have added explicit support for applications that need to directly access the audio hardware without KMixer-like processing. I hope there will be support for specifying the buffering settings and for programmatically controlling the sampling frequency at which the hardware operates. When writing directly to the sound card's DMA buffers, the sample format will be that of the hardware, not necessarily 32-bit floats, right?
Will applications that talk directly to WDM drivers continue to work on Vista?
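
To make the direct-access part of the question concrete, this is roughly the kind of thing I'd hope to be able to write - just my guess at how an exclusive-mode format check might look, not something I've actually compiled against the new headers:

#include <mmdeviceapi.h>
#include <audioclient.h>

// Ask whether the device will accept a given format when bypassing the shared
// mixing path. With no system mixing or conversion in the way, the format has
// to be one the hardware itself accepts - not necessarily 32-bit float.
bool HardwareAcceptsFormat(IMMDevice *endpoint, const WAVEFORMATEX *format)
{
    IAudioClient *client = NULL;
    if (FAILED(endpoint->Activate(__uuidof(IAudioClient), CLSCTX_ALL, NULL,
                                  (void **)&client)))
        return false;

    HRESULT hr = client->IsFormatSupported(AUDCLNT_SHAREMODE_EXCLUSIVE,
                                           format, NULL);
    client->Release();
    return hr == S_OK;
}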

Flavio.


  gillsr
 
 
  Thu, Dec 15 2005 6:16 PM
I have been wondering if MS has been investigating an IP speaker/sound card system that can be detected with UPnP?

So you can have stand-alone speakers for the kitchen, bedrooms, etc. (a la Apple Airport Express), then be able to name them in Windows and redirect audio to one or all of them for a home sound system, and then share them with other devices like laptops, media centres, etc.
It would be quite interesting and would not need a big multichannel system that is wired to speakers all over the house.

  bluvg
 
 
  Fri, Dec 16 2005 5:56 AM
I have two questions for Larry:

The first is related to the business impact of audio and somewhat related to mixing: can you talk to Bill G. or Steve and have Microsoft buy this company?

http://www.holosonics.com/

If the processing could be done either through a driver or through a sound card, and if the price point could drop to the $100-200/node range, this would be awesome for businesses for things like conference calls (no feedback!), IP telephony, LiveMeeting/Webex, online training, etc.  I've seen issues with users who want to watch a training video online, but because of their open cubicle position, it is too disruptive to other users.  They can't use headphones, because they need to be able to hear their phone and respond to other users walking by, plus it looks somewhat "unprofessional." 

One other question--as a hobby, I do some composing work, and sometimes I do this remotely via RDP.  Are there going to be changes in how audio is processed over RDP?  My current problem is that MIDI playback apparently doesn't transmit over RDP.  I use a program called Sibelius, and playback unfortunately does not work remotely.
