Saturday, August 30, 2008
How Pixar Fosters Collective Creativity
I just read this article by Ed Catmull on the business principles that have driven Pixar to their great success. Link. The first reason you should be interested is that Mr. Catmull is one of the pioneers of computer graphics. The second is Pixar, the company he created and still runs, is one of the most consistently successful companies around that creates artistic products. The type of work Pixar does is very close to the work we do in games and there is plenty to learn from his experiences.
Wednesday, August 13, 2008
Global Illumination
Talking about precalculated lighting reminded me of this awesome paper I just read from Pixar Point Based Color Bleeding. They got a 10x speed up on GI over raytracing. I did a bunch of research on this topic and it's funny I was just one small insight away from what they are doing. Not doing it this way required me to have to go down a completely different path. Sometimes it's small things that change success to failure.
Tuesday, August 12, 2008
Deferred rendering 2
I'll start off by saying check out the new papers from siggraph posted here. I was really surprised with the one on Starcraft II. Blizzard in the past has stayed behind the curve purposely to keep their requirements low and audience large. It seems this time they have kept the low range while expanding more into the high end. I was also surprised due to nature of the visuals in the game. It's part adventure game? Count me in. It's looking great. It also has an interesting deferred rendering architecture which leads me to my next thing.
Deferred rendering part II. Perhaps I should have just waited and made one monster post but now you'll just have to live with it.
Light Pre-Pass
post
This was recently proposed by Wolfgang Engel. The main idea is to split material rendering into two parts. First part is writing out depth and normal to a small G-buffer. It's possible this can even all fit in one render target. With this information you can get all that is important from the lights which is N dot L and R dot V or N dot H whichever you want. The buffer is as follows:
LightColor.r * N.L * Att
LightColor.g * N.L * Att
LightColor.b * N.L * Att
R.V^n * N.L * Att
With this information standard forward rendering can be done just once. This comprises the second part of the material rendering.
He explains that R.V^n can be derived later by dividing out the N.L * Att but I don't understand any reason to do this. This also means a divide by the color that is just wrong. There's also the mysterious exponent that must be a global or something meaning no surface changeable exponent.
There are really a number of issues here. Specular doesn't have any color at all, not even from the lights. If you instead store R.V in the forth channel and try to apply the power and multiply by LightColor * N.L * Att in the forward pass the multiplications have been shuffled with additions and it doesn't work out. There is no specular color or exponent and it is dependent on everything being the phong lighting equation. It has solved the deep framebuffer problem but it is a lot more restrictive than traditional deferred rendering. All in all it's nice for a demo but not for production.
Naughty Dog's Pre-Lighting
presentation
I have to admit when I sat through this talk I didn't really understand why they were doing what they were doing. It seemed overly complicated to me. After reading the slides afterwards the brilliance started to show through. The slides are pretty confusing so I will at least explain what I think they mean from it. Insomniac has since adopted this method as well but I can't seem to find that presentation. The idea is very similar to the Pre-pass lighting method. It is likely what you would get if you take Light Pre-Pass to it's logical conclusion.
Surface rendering is split in 2 parts. First pass it renders out depth, normal and specular exponent. Second, the lights are drawn additively into two HDR buffers, diffuse and specular. The materials specular exponent has been saved out so this can all be done correctly. These two buffers can then be used in the second surface pass as the accumulated lighting and material attributes such as diffuse color and spec color can be applied. They apply some extra trickery that complicates the slides that is combining light drawing in quads so a single pixel on screen never gets drawn during light drawing more than once.
This is completely usable in a production environment as proven by Uncharted having shipped and looking gorgeous. Lights can be handled one at a time (even though they don't) so multiple shadows pose no problems. The size of the framebuffer is smaller. HDR obviously works fine.
It doesn't solve all the problems though. Most are small and without testing it myself I can't say whether they are significant or not. The one nagging problem of being stuck with phong lighting still remains. This time it's just a different part of Phong that has been exposed and is rigid in the system.
Light Pass Combined Forward Rendering
I am going to propose another alternative that I haven't really seen talked about. The idea is similar to Light indexed deferred. The idea there was forward rendering style but with all the lights that hit that pixel rendering in one pass. This can be handled far simpler if when drawing that surface the light parameters were merely passed in when drawing the surface and more than one light is applied at a time. This is nothing new. Crysis can apply up to 4 lights at a time. What I haven't seen discussed is what to do when a light only hits part of a surface. Light indexed rendering handles this on a per pixel basis so it is a non issue. If the lights are "indexed" per surface then there can be many more lights that have to affect every pixel than is needed.
We can solve this problem in another way other than screen space. For instance, splitting the world geometry at the bounds of static lights will get you pixel perfect light coverage for any mesh you wish to split. The surfaces with the worst problems are the largest, being hit with the most lights. These are almost always large walls, floors and ceilings. Splitting this type of geometry is not typically very expensive and is rarely instanced. For objects that don't fall in this category they are typically instanced, relatively contained meshes that do not have very smooth transitions with other geometry. I suggest keeping only a fixed number of real affecting lights to render these surfaces by combining any less significant lights into a spherical harmonic. For more details see Tom Forsyth's post on it. In my experience the light count hasn't posed an issue.
The one remaining issue is shadows. Because all lights for a surface are applied at once shadows can't be done a light at a time. This is the same issue as light indexed rendering and the solution will be the same as well. All shadows have to be calculated and stored, likely as a screen space buffer. The obvious choice is 4 shadowing lights using 4 components of a RGBA8 render target. This is the same solution Crytek is using. That doesn't mean only 4 shadowing lights are allowed on screen at a time. There is nothing stopping you from rendering a surface again after you've completed everything using those 4 lights.
Given the limit of 4 shadowing lights this turns into a forward rendering architecture that is only one pass. It gets rid of all the redundant work from draws, tris, and material setup. It also gives you all the power of a forward renderer such as changing the light equation to be whatever you want it to. It doesn't rely in any way on screen space buffers for doing the lighting besides the shadow buffer. This means no additional memory and 360 edram headaches.
There are plenty of problems with this. Splitting meshes only works with static lights. In all of the games I've referenced so far this poses no problems. Most environmental lighting does not move (at least the bounds), nor does the scenery to a large extent. Splitting a mesh adds more triangles, vertices, and draw calls than before. In the cases where you split this it is typically not a major issue.
You do not get one of the cool things from deferred rendering and that is independence from the number of lights. In the Starcraft II paper that came out today they had a scene with over 50 lights in it including every bulb on a string of Xmas lights. This is not a major issue for a standard deferred renderer but it is for pass combined forward rendering. It is really cool to be able to do that but in my opinion it is not very important. The impact on the scene from those Xmas lights actually casting light is minimal and there are likely other ways of doing it besides tiny dynamic lights.
Summary
That is my round up of dynamic lighting architectures. I left out any kind of precalculated lighting such as lightmaps, environment maps or Carmack's baked lighting into a unique virtual texture as it's pretty much just a different topic.
Deferred rendering part II. Perhaps I should have just waited and made one monster post but now you'll just have to live with it.
Light Pre-Pass
post
This was recently proposed by Wolfgang Engel. The main idea is to split material rendering into two parts. First part is writing out depth and normal to a small G-buffer. It's possible this can even all fit in one render target. With this information you can get all that is important from the lights which is N dot L and R dot V or N dot H whichever you want. The buffer is as follows:
LightColor.r * N.L * Att
LightColor.g * N.L * Att
LightColor.b * N.L * Att
R.V^n * N.L * Att
With this information standard forward rendering can be done just once. This comprises the second part of the material rendering.
He explains that R.V^n can be derived later by dividing out the N.L * Att but I don't understand any reason to do this. This also means a divide by the color that is just wrong. There's also the mysterious exponent that must be a global or something meaning no surface changeable exponent.
There are really a number of issues here. Specular doesn't have any color at all, not even from the lights. If you instead store R.V in the forth channel and try to apply the power and multiply by LightColor * N.L * Att in the forward pass the multiplications have been shuffled with additions and it doesn't work out. There is no specular color or exponent and it is dependent on everything being the phong lighting equation. It has solved the deep framebuffer problem but it is a lot more restrictive than traditional deferred rendering. All in all it's nice for a demo but not for production.
Naughty Dog's Pre-Lighting
presentation
I have to admit when I sat through this talk I didn't really understand why they were doing what they were doing. It seemed overly complicated to me. After reading the slides afterwards the brilliance started to show through. The slides are pretty confusing so I will at least explain what I think they mean from it. Insomniac has since adopted this method as well but I can't seem to find that presentation. The idea is very similar to the Pre-pass lighting method. It is likely what you would get if you take Light Pre-Pass to it's logical conclusion.
Surface rendering is split in 2 parts. First pass it renders out depth, normal and specular exponent. Second, the lights are drawn additively into two HDR buffers, diffuse and specular. The materials specular exponent has been saved out so this can all be done correctly. These two buffers can then be used in the second surface pass as the accumulated lighting and material attributes such as diffuse color and spec color can be applied. They apply some extra trickery that complicates the slides that is combining light drawing in quads so a single pixel on screen never gets drawn during light drawing more than once.
This is completely usable in a production environment as proven by Uncharted having shipped and looking gorgeous. Lights can be handled one at a time (even though they don't) so multiple shadows pose no problems. The size of the framebuffer is smaller. HDR obviously works fine.
It doesn't solve all the problems though. Most are small and without testing it myself I can't say whether they are significant or not. The one nagging problem of being stuck with phong lighting still remains. This time it's just a different part of Phong that has been exposed and is rigid in the system.
Light Pass Combined Forward Rendering
I am going to propose another alternative that I haven't really seen talked about. The idea is similar to Light indexed deferred. The idea there was forward rendering style but with all the lights that hit that pixel rendering in one pass. This can be handled far simpler if when drawing that surface the light parameters were merely passed in when drawing the surface and more than one light is applied at a time. This is nothing new. Crysis can apply up to 4 lights at a time. What I haven't seen discussed is what to do when a light only hits part of a surface. Light indexed rendering handles this on a per pixel basis so it is a non issue. If the lights are "indexed" per surface then there can be many more lights that have to affect every pixel than is needed.
We can solve this problem in another way other than screen space. For instance, splitting the world geometry at the bounds of static lights will get you pixel perfect light coverage for any mesh you wish to split. The surfaces with the worst problems are the largest, being hit with the most lights. These are almost always large walls, floors and ceilings. Splitting this type of geometry is not typically very expensive and is rarely instanced. For objects that don't fall in this category they are typically instanced, relatively contained meshes that do not have very smooth transitions with other geometry. I suggest keeping only a fixed number of real affecting lights to render these surfaces by combining any less significant lights into a spherical harmonic. For more details see Tom Forsyth's post on it. In my experience the light count hasn't posed an issue.
The one remaining issue is shadows. Because all lights for a surface are applied at once shadows can't be done a light at a time. This is the same issue as light indexed rendering and the solution will be the same as well. All shadows have to be calculated and stored, likely as a screen space buffer. The obvious choice is 4 shadowing lights using 4 components of a RGBA8 render target. This is the same solution Crytek is using. That doesn't mean only 4 shadowing lights are allowed on screen at a time. There is nothing stopping you from rendering a surface again after you've completed everything using those 4 lights.
Given the limit of 4 shadowing lights this turns into a forward rendering architecture that is only one pass. It gets rid of all the redundant work from draws, tris, and material setup. It also gives you all the power of a forward renderer such as changing the light equation to be whatever you want it to. It doesn't rely in any way on screen space buffers for doing the lighting besides the shadow buffer. This means no additional memory and 360 edram headaches.
There are plenty of problems with this. Splitting meshes only works with static lights. In all of the games I've referenced so far this poses no problems. Most environmental lighting does not move (at least the bounds), nor does the scenery to a large extent. Splitting a mesh adds more triangles, vertices, and draw calls than before. In the cases where you split this it is typically not a major issue.
You do not get one of the cool things from deferred rendering and that is independence from the number of lights. In the Starcraft II paper that came out today they had a scene with over 50 lights in it including every bulb on a string of Xmas lights. This is not a major issue for a standard deferred renderer but it is for pass combined forward rendering. It is really cool to be able to do that but in my opinion it is not very important. The impact on the scene from those Xmas lights actually casting light is minimal and there are likely other ways of doing it besides tiny dynamic lights.
Summary
That is my round up of dynamic lighting architectures. I left out any kind of precalculated lighting such as lightmaps, environment maps or Carmack's baked lighting into a unique virtual texture as it's pretty much just a different topic.
Sunday, August 10, 2008
Deferred rendering
I remember back to the intro data structure and algorithm class I took in college. The thing the professor kept trying to hammer home was not how a red and black tree works. It was data structures and algorithms have strengths and weaknesses. This point affects everything we do in graphics. The majority of things we implement have already been done before, either by other game developers, offline graphics years ago, academics, etc. There is very few places that we come up with something brand new. Most of the inventions are even small variations to existing techniques. So, given this fact, the most important skill we can have is the ability to get knowledge of all the available options, the strengths and weaknesses of each and apply the one best for the current job. And at every chance we get, add our own little tweaks and flavor to make it better than what has come before.
Forward Rendering
Forward rendering means for every light to surface interaction the surface is drawn with that lighting information. Every one of these light surface interactions is drawn additively to the screen. There are many problems with this method. Since each surface needs to be drawn again for every light that hits it there are many redundant draws, triangles, and material pixel operations such as texture fetches or adding in detail maps. With all these disadvantages it is still very popular. For instance it is the way Doom 3 and Unreal 3 engines work.
Deferred Rendering
Deferred rendering was invented to solve these problems. Traditional deferred rendering is drawing all needed surface and material attributes to a deep framebuffer called the G-buffer. For each light seen the light geometry can be drawn to the screen with a shader that reads from the G buffer and can add up the light interaction to the color buffer. No direct interaction with the surface and the light is needed. This is only possible because anything that shader would need is put in the G buffer. Any surfaces to be drawn only need to fill in the G buffer with their attributes once meaning no redundant draws, triangles, or material pixel cost.
Deferred rendering is not without its disadvantages either. The G buffer can take up quite a bit of space so trying to pack the attributes to the smallest space possible is the goal. Since the G buffer is always quite fat the attribute drawing pass is almost always ROP bound. In my opinion the worst problem is special materials that do something non standard can't work. Exactly what custom materials can do is defined by what is in the G buffer. Usually only the common attributes are packed for space reasons.
Gurialla's deferred system for Killzone 2 is explained in this great presentation.
In their system the attributes stored in the G buffer are:
There's a few things immediately obvious that this can't do. There is no floating point color buffer so real HDR is not possible nor things like gamma correct lighting that require higher precision color. Because only spec intensity is stored only grayscale specularity is possible. Since the game looks pretty gray this is likely not a problem for them but it is for other people.
What isn't obvious is the lighting equation has to be the same across all materials. It is likely Phong based. This rules out cool things like hair shaders, anisotropic brushed metal, fake subsurface scattering, fresnel, roughness, fuzz, cloth shaders, etc.
So, how about some alternative methods? A few have been popping up over the past year.
Light Indexed Deferred Rendering
paper
This builds a screen buffer of light indexes that interact with that pixel. The base implementation ignores depth so the light may not hit the front most surface at that pixel. It can be extended to do so. The advantages is a large number of lights and only one pass of surface drawing. It also solves the problem of custom materials because the lighting happens when the surface is drawn. No attributes of either the light nor the surface have to be picked to store out. It also solves the ROP problem because no real deep framebuffer is ever drawn.
There are a number of problems with it though. First off to pass in light data that is indexable dx9 does not support dynamic indexed uniform access. This means the data needs to be passed in with textures for all the light data. This is a major pain in the ass and can be a performance problem depending on how often it is updated and how many textures are required to pass the data. To pass everything multiple floating point textures may be needed. Another problem is that applying a light does not happen one at a time. This means calculating shadows for a light need to stay around until all lights are applied. So for 4 shadowing lights on screen, you will likely need a 4 channel screen texture to composite them because stencil wont work and the number of shadow maps will likely be high. Now if you have 5 you'll need another screen texture. Other methods apply one light at a time so there is no need to save the shadows from multiple lights at once. You're "G-buffer" is now based on # of light indexes per pixel + # of shadows on screen. You can change this to # of shadows per pixel if you read the light index buffer when writing out the shadows.
Next Time
I didn't think this post was going to be so massive so I've decided to split it up and post this part now. I will go through the other deferred rendering alternatives as well as another option I haven't heard people talking about that I particularly like in my next post.
Forward Rendering
Forward rendering means for every light to surface interaction the surface is drawn with that lighting information. Every one of these light surface interactions is drawn additively to the screen. There are many problems with this method. Since each surface needs to be drawn again for every light that hits it there are many redundant draws, triangles, and material pixel operations such as texture fetches or adding in detail maps. With all these disadvantages it is still very popular. For instance it is the way Doom 3 and Unreal 3 engines work.
Deferred Rendering
Deferred rendering was invented to solve these problems. Traditional deferred rendering is drawing all needed surface and material attributes to a deep framebuffer called the G-buffer. For each light seen the light geometry can be drawn to the screen with a shader that reads from the G buffer and can add up the light interaction to the color buffer. No direct interaction with the surface and the light is needed. This is only possible because anything that shader would need is put in the G buffer. Any surfaces to be drawn only need to fill in the G buffer with their attributes once meaning no redundant draws, triangles, or material pixel cost.
Deferred rendering is not without its disadvantages either. The G buffer can take up quite a bit of space so trying to pack the attributes to the smallest space possible is the goal. Since the G buffer is always quite fat the attribute drawing pass is almost always ROP bound. In my opinion the worst problem is special materials that do something non standard can't work. Exactly what custom materials can do is defined by what is in the G buffer. Usually only the common attributes are packed for space reasons.
Gurialla's deferred system for Killzone 2 is explained in this great presentation.
In their system the attributes stored in the G buffer are:
- RGBA8 for color
- standard depth/stencil buffer (can be used to derive world position)
- normal
- XY motion vectors
- spec exponent
- spec intensity
- diffuse color
There's a few things immediately obvious that this can't do. There is no floating point color buffer so real HDR is not possible nor things like gamma correct lighting that require higher precision color. Because only spec intensity is stored only grayscale specularity is possible. Since the game looks pretty gray this is likely not a problem for them but it is for other people.
What isn't obvious is the lighting equation has to be the same across all materials. It is likely Phong based. This rules out cool things like hair shaders, anisotropic brushed metal, fake subsurface scattering, fresnel, roughness, fuzz, cloth shaders, etc.
So, how about some alternative methods? A few have been popping up over the past year.
Light Indexed Deferred Rendering
paper
This builds a screen buffer of light indexes that interact with that pixel. The base implementation ignores depth so the light may not hit the front most surface at that pixel. It can be extended to do so. The advantages is a large number of lights and only one pass of surface drawing. It also solves the problem of custom materials because the lighting happens when the surface is drawn. No attributes of either the light nor the surface have to be picked to store out. It also solves the ROP problem because no real deep framebuffer is ever drawn.
There are a number of problems with it though. First off to pass in light data that is indexable dx9 does not support dynamic indexed uniform access. This means the data needs to be passed in with textures for all the light data. This is a major pain in the ass and can be a performance problem depending on how often it is updated and how many textures are required to pass the data. To pass everything multiple floating point textures may be needed. Another problem is that applying a light does not happen one at a time. This means calculating shadows for a light need to stay around until all lights are applied. So for 4 shadowing lights on screen, you will likely need a 4 channel screen texture to composite them because stencil wont work and the number of shadow maps will likely be high. Now if you have 5 you'll need another screen texture. Other methods apply one light at a time so there is no need to save the shadows from multiple lights at once. You're "G-buffer" is now based on # of light indexes per pixel + # of shadows on screen. You can change this to # of shadows per pixel if you read the light index buffer when writing out the shadows.
Next Time
I didn't think this post was going to be so massive so I've decided to split it up and post this part now. I will go through the other deferred rendering alternatives as well as another option I haven't heard people talking about that I particularly like in my next post.
Friday, August 8, 2008
First post
I finally decided to start a blog on graphics stuff. I've always wanted to get more out there in the graphics / game dev community but I hardly ever post on forums. I talk plenty with my fellow colleagues but I never get to communicate outside my tiny sphere so I'm changing that all now. Hopefully I won't be just talking to myself. Classic first post with nothing to say other than exclaiming that I'm saying something. Next post I will be talking about deferred vs forward rendering and some options available beyond the standard implementations but that I will have to leave for tomorrow because it is getting late.