Monday, May 14, 2012

Sparse shadows through tracing

The system I described last time allows specular highlights to reach large distances but only requires calculating them on the tiles where they will show up. This is great, but it means we now must calculate shadows over these very large distances. Growing the shadow maps to include geometry at a much greater distance is hugely wasteful. Fortunately, there is a solution.

Before I get to that though, I want to talk about a concept I think is going to be very important for next gen renderers: having more than one representation for scene geometry. Matt Swoboda talked about this in his GDC presentation this year [1] and I am in complete agreement with him. We will need geometry in formats similar to what we've had in the past for efficient rasterization (vertex buffers, index buffers, displacement maps). These will be used whenever the rays are coherent, simply because HW rasterization is currently much faster than any other algorithm for coherent rays. Examples of use are primary rays, and shadow rays in the form of shadow maps.

Incoherent rays will also be very important for next gen renderers, but tracing them efficiently requires a different representation. Representations that support tracing cones will likely be more useful than ones which can only trace rays. Possibilities are signed distance fields [2][1][9], SVOs [3], surfel trees [4], and billboard cloud trees [5][9]. I'll also include screen space representations, although these don't store the whole scene: mip chains of min/max depth maps [6], variance depth maps [7], and adaptive transparency screen buffers [8]. Examples of use for these trace-friendly data structures are indirect diffuse (radiosity), indirect specular (reflections), and sparse shadowing of direct specular. The last one is what helps with our current issue.

The Samaritan demo [9] from Epic had a very similar issue, which they solved in the same way I am suggesting. They had many point lights which generated specular highlights at any distance. To shadow them, they did a cone trace in the direction of the reflection vector against a signed distance field stored in a volume texture. Since this was already being done for other reflections, using that data to shadow the point lights doesn't come at much cost. The signed distance field could be swapped with any of the other data structures I listed; what is important is that the shadowing is calculated with a cone trace.


What I propose as the solution to our problem is to use traditional shadow maps only within the diffuse radius, and to do a cone trace down the reflection vector. The cone trace returns a visibility function that any specular outside the range of a shadow map can cheaply use for shadowing.
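To make that concrete, here is a rough C++ sketch of such a cone trace against a signed distance field. The coverage estimate (the ratio of the SDF sample to the cone radius at each step) is the classic soft-shadow approximation from [2]; the placeholder SDF, step count, and parameter names are just illustrative, not any specific implementation.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 Add(Vec3 a, Vec3 b)    { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
static Vec3 Scale(Vec3 a, float s) { return { a.x * s, a.y * s, a.z * s }; }

// Placeholder scene SDF for illustration: a single sphere of radius 1 at the
// origin. A real renderer would sample a signed distance volume texture here.
static float SampleSDF(Vec3 p)
{
    return std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z) - 1.0f;
}

// Cone trace from point P down the reflection vector R. Returns a visibility
// term in [0,1] that specular outside the shadow map range can cheaply reuse.
// coneSpread is the cone's radius growth per unit distance (derived from gloss).
static float ConeTraceVisibility(Vec3 P, Vec3 R, float coneSpread, float maxDist)
{
    float visibility = 1.0f;
    float t = 0.1f;                                   // offset to avoid self-occlusion
    for (int i = 0; i < 64 && t < maxDist; ++i)
    {
        float d = SampleSDF(Add(P, Scale(R, t)));     // distance to nearest occluder
        float coneRadius = coneSpread * t;            // cone width at this step
        // Approximate fraction of the cone left unoccluded at this step.
        visibility = std::min(visibility, std::max(d, 0.0f) / std::max(coneRadius, 1e-4f));
        if (visibility <= 0.0f)
            break;                                    // fully occluded, stop early
        t += std::max(d, 0.01f);                      // sphere-trace step
    }
    return visibility;
}
```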

Actually, having shadowing data that is independent of the lights means it can be used for culling as well. The maximum unoccluded ray distance can be accumulated per tile, which puts a cap on the culling cone for light sources. I anticipate this form of occlusion culling will actually be a very significant optimization.
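As a hypothetical sketch of that cull, assuming the maximum unoccluded trace distance has already been accumulated for the tile (names here are placeholders):

```cpp
// Hypothetical per-tile occlusion cull using the cone trace results.
// maxUnoccludedDist is the farthest any pixel's cone trace in this tile got
// before being fully occluded (accumulated with, say, an atomic max).
static bool SpecularLightVisibleToTile(float maxUnoccludedDist,
                                       float distTileToLight,
                                       float lightRadius)
{
    // If every pixel's trace was blocked before reaching the light's sphere of
    // influence, the light cannot contribute unshadowed specular to this tile.
    return distTileToLight - lightRadius <= maxUnoccludedDist;
}
```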

This shadowing piece of the puzzle means the changes I suggested in my last post come, in theory, at a fairly low cost, assuming you already do cone tracing for indirect specular. That may seem like a large assumption, but to demonstrate how practical cone tracing is, a very simple, approximate form of it can be done purely against the depth buffer. This is what I do with screen space reflections on current gen hardware. I don't do cone tracing exactly; instead I reduce the trace distance with low glossiness and fade out the samples at the end of the trace. This behaves as if occlusion coverage were faded by the radius of the cone at the point of impact, which is a visually acceptable approximation. In other words, the crudest form of cone tracing can already be done on current gen. It is fairly straightforward to extend this to true cone tracing on faster hardware using one of the screen space methods I listed. Replacing screen space with global is much more complex, but doable.
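Boiled down, the current gen approximation looks roughly like the sketch below, which only shows the two tricks, gloss-scaled trace distance and an end-of-trace fade; the depth buffer ray march itself is assumed and elided:

```cpp
#include <algorithm>

struct Vec3 { float x, y, z; };

// Assumed helper, elided here: marches the depth buffer from a screen space
// origin along the reflected direction up to traceDist. On a hit it returns
// true and fills in the distance travelled and the color at the hit point.
bool MarchDepthBuffer(Vec3 originSS, Vec3 reflectDirSS, float traceDist,
                      float* hitT, Vec3* hitColor);

// Crude screen space "cone trace": low gloss shortens the trace, and hits near
// the end of the trace fade out, which acts like coverage falling off with the
// cone radius at the point of impact.
Vec3 ScreenSpaceReflection(Vec3 originSS, Vec3 reflectDirSS, float gloss,
                           float maxTraceDist, Vec3 fallbackColor)
{
    float traceDist = maxTraceDist * gloss;           // rougher surface, shorter trace
    float hitT = 0.0f;
    Vec3  hitColor = fallbackColor;
    if (!MarchDepthBuffer(originSS, reflectDirSS, traceDist, &hitT, &hitColor))
        return fallbackColor;

    // Fade the reflection out over the last quarter of the trace distance.
    float fadeStart = 0.75f * traceDist;
    float fade = 1.0f - std::clamp((hitT - fadeStart) / std::max(traceDist - fadeStart, 1e-4f),
                                   0.0f, 1.0f);

    return { fallbackColor.x + (hitColor.x - fallbackColor.x) * fade,
             fallbackColor.y + (hitColor.y - fallbackColor.y) * fade,
             fallbackColor.z + (hitColor.z - fallbackColor.z) * fade };
}
```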

The result, hopefully, is that point light specularity “just works”. The problem then shifts to determining which lights in the world to attempt to draw. Considering we have >10000 in one map in Prey 2, this may not be easy :). Honestly, I haven't thought about how to solve this yet.

I, like everyone else who has talked about tiled light culling, am leaving out an important part: how to efficiently meld shadow maps with tiled culling for the diffuse portion. I will cover ideas on how to handle that next time.

Finally, a request to everyone who has read these posts: if you have an idea for how the cone based culling can be adapted to a Blinn distribution, please let me know.


[1] http://directtovideo.wordpress.com/2012/03/15/get-my-slides-from-gdc2012/
[2] http://iquilezles.org/www/material/nvscene2008/rwwtt.pdf
[3] http://maverick.inria.fr/Publications/2011/CNSGE11b/GIVoxels-pg2011-authors.pdf
[4] http://www.mpi-inf.mpg.de/~ritschel/Papers/Microrendering.pdf
[5] http://graphics.cs.yale.edu/julie/pubs/bc03.pdf
[6] http://www.drobot.org/pub/M_Drobot_Programming_Quadtree%20Displacement%20Mapping.pdf
[7] http://www.punkuser.net/vsm/vsm_paper.pdf
[8] http://software.intel.com/en-us/articles/adaptive-transparency/
[9] http://www.nvidia.com/content/PDF/GDC2011/GDC2011EpicNVIDIAComposite.pdf

Sunday, April 29, 2012

Tiled Light Culling

First off, I'm sorry that I haven't updated this blog in so long. Much of what I have wanted to talk about here, but couldn't, was going to be covered in my GDC talk, but that talk was cancelled due to forces outside my control. If you follow me on twitter (@BrianKaris) you probably heard all about it. My comments were picked up by the press and quoted in every story about Prey 2 since. That was not my intention, but oh well. So I will go back to what I was doing, which is talking here about things I am not directly working on.

Tiled lighting

There has been a lot of talk and excitement recently concerning tiled deferred [1][2] and tiled forward [3] rendering.

I’d like to talk about an idea I’ve had on how to do tile culled lighting a little differently.

The core idea behind either tiled forward or tiled deferred is to cull lights per tile. In other words, for each tile, calculate which of the lights on screen affect it. The base level of culling is done by calculating a min and max depth for the tile and using these to construct a frustum. This frustum is intersected with each light's bounding sphere to determine which lights hit solid geometry in that tile. More complex culling can be layered on top of this, such as back face culling using a normal cone.
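A minimal sketch of that base-level sphere vs frustum test; the plane convention and structures are illustrative, and building the six planes from the tile rectangle and min/max depth is omitted:

```cpp
struct Vec3  { float x, y, z; };
struct Plane { Vec3 n; float d; };        // dot(n, p) + d >= 0 means "inside"; n is unit length

struct TileFrustum { Plane planes[6]; };  // 4 side planes plus near/far from the tile's min/max depth
struct SphereLight { Vec3 center; float radius; };

static float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Base-level per-tile cull: does the light's bounding sphere touch the tile's
// frustum? More aggressive tests (normal cones, the specular cones discussed
// later in this post) can be layered on top of this.
static bool SphereIntersectsTileFrustum(const SphereLight& light, const TileFrustum& tile)
{
    for (const Plane& p : tile.planes)
    {
        if (Dot(p.n, light.center) + p.d < -light.radius)
            return false;                 // entirely outside one plane
    }
    return true;                          // conservatively counted as intersecting
}
```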

This very basic level of culling, sphere vs frustum, only works with the addition of an artificial construct: the radius of the light. Physically correct light falloff is inverse squared.

Light falloff

A small tangent I've been meaning to talk about for a while. To calculate the correct falloff from a sphere or disk light, you should use these two equations [4]:

Falloff:
$$Sphere = \frac{r^2}{d^2}$$
$$Disk = \frac{r^2}{r^2+d^2}$$

If you are dealing with light values in lumens you can replace the r^2 factor with 1. For a sphere light this gives you 1/d^2, which is what you expect. The reason I bring this up is that I found it very helpful in understanding why the radiance appears to approach infinity as the distance to the light approaches zero. Put a light bulb on the ground and this obviously isn't true. The truth, from the above equation, is that the falloff approaches 1 as the distance to the sphere's surface approaches zero. This gets hidden when the units change from lux to lumens and the surface area gets factored out. The moral of the story is: don't allow surfaces to penetrate the shape of a light, because the math will no longer be correct.
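Transcribed directly into code (function and parameter names are mine), the limit behavior is easy to see:

```cpp
#include <algorithm>

// Falloff from a sphere light of radius r at distance d, as in [4]. As the
// shaded point approaches the sphere's surface (d -> r) this tends to 1, not
// infinity; the familiar 1/d^2 only appears once r^2 is folded into lumens.
float SphereFalloff(float r, float d)
{
    return (r * r) / std::max(d * d, 1e-6f);
}

// Falloff from a disk light of radius r at distance d, as in [4].
float DiskFalloff(float r, float d)
{
    return (r * r) / (r * r + d * d);
}
```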

Culling inverse squared falloff

Back to tiled culling. Inverse squared falloff means there is no distance at which the light contributes zero illumination. This is very inconvenient for a game world filled with lights. There are two possibilities: the first is to subtract a constant term from the falloff and max the result with 0; the second is to window the falloff with something like (1-d^2/a^2)^2. The first loses energy over the entire influence of the light; the second loses energy only away from the source. I should note the tolerance should be proportional to the light's intensity. For simplicity I will use the following for this post:
$$Falloff = max( 0, \frac{1}{d^2}-tolerance)$$
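That tolerance implies a finite radius of influence, which is exactly what the per tile cull can use. A small sketch:

```cpp
#include <algorithm>
#include <cmath>

// Distance falloff with the tolerance subtracted so it reaches exactly zero at
// a finite distance, matching the equation above (intensity factored out).
float CulledFalloff(float d, float tolerance)
{
    return std::max(0.0f, 1.0f / (d * d) - tolerance);
}

// The radius of influence implied by that tolerance: the distance at which
// 1/d^2 equals the tolerance. This is the sphere the per-tile cull should use.
float CullRadius(float tolerance)
{
    return 1.0f / std::sqrt(tolerance);
}
```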

The distance cutoff can be thought of as an error tolerance per light. Unfortunately, glossy specular doesn't work well in this framework at all. The intensity of a glossy, energy conserving specular highlight, even for a dielectric, will be WAY higher than the Lambert diffuse. This spoils the idea of the distance falloff working as an error tolerance for both diffuse and specular, because they are at completely different scales. In other words, for glossy specular the distance would have to be very large for even a moderate tolerance, compared to diffuse.

This points to there being two different tolerances, one for diffuse and one for specular. If both of these only affected the radius of influence, we might as well set both radii to the larger of the two, since diffuse doesn't take any more work to calculate than specular. Fortunately, the maximum intensity of the specular highlight scales inversely with its size. This, of course, is the entire point of energy conservation, but here energy conservation also helps us cull: the higher the gloss, the larger the radius of influence but the tighter the cone of influencing normals.

If it isn’t clear what I mean, think of a chrome ball. With a mirror finish, a light source, even as dim as a candle, is visible at really large distances. The important area on the ball is very small, just the size of the candle flame’s reflection. The less glossy the ball, the less distance the light source is visible but the more area on the ball the specular highlight covers.

Before we can cull using this information, we need specular to go to zero past a tolerance, just like the distance falloff. The easiest way is to subtract the tolerance from the specular distribution and max it with zero. For simplicity I will use Phong for this post:
$$Phong = max( 0, \frac{n+2}{2}dot(L,R)^n-tolerance)$$
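In code this is just (clamping the dot product at zero, which the equation leaves implicit):

```cpp
#include <algorithm>
#include <cmath>

// Normalized Phong with the tolerance subtracted and clamped at zero, matching
// the equation above. dotLR is dot(L, R) and n is the Phong exponent (gloss).
float ThresholdedPhong(float dotLR, float n, float tolerance)
{
    float phong = 0.5f * (n + 2.0f) * std::pow(std::max(dotLR, 0.0f), n);
    return std::max(0.0f, phong - tolerance);
}
```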

Specular cone culling

This nicely maps to a cone of L vectors per pixel that will give a non-zero specular highlight.

Cone axis:
$$R = 2 N dot( N, V ) - V$$

Cone angle:
$$Angle = acos \left( \sqrt[n]{\frac{2 tolerance}{n+2}} \right)$$

Just as a normal cone can be generated for back face culling, these specular cones can be unioned over the tile and used to cull. We can now cull specular on a per tile basis, which is what is exciting about tiled light culling.
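Here is a sketch of how the per pixel cones could be built, merged over a tile, and tested against a light. The union and the sphere vs cone test are deliberately simple, conservative placeholders rather than the only way to do it:

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

static float Dot(Vec3 a, Vec3 b)    { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3  Sub(Vec3 a, Vec3 b)    { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static Vec3  Scale(Vec3 a, float s) { return { a.x * s, a.y * s, a.z * s }; }
static float Length(Vec3 a)         { return std::sqrt(Dot(a, a)); }
static Vec3  Normalize(Vec3 a)      { return Scale(a, 1.0f / Length(a)); }

struct Cone { Vec3 axis; float angle; };   // unit axis, half-angle in radians

// Per pixel: the cone of L directions that give a non-zero thresholded Phong,
// using the axis and angle equations above.
static Cone SpecularCone(Vec3 N, Vec3 V, float n, float tolerance)
{
    Vec3 R = Sub(Scale(N, 2.0f * Dot(N, V)), V);                 // reflection vector
    float cosCutoff = std::pow(2.0f * tolerance / (n + 2.0f), 1.0f / n);
    return { Normalize(R), std::acos(std::min(cosCutoff, 1.0f)) };
}

// Merge two cones into one that conservatively contains both. Keeping a's axis
// and growing the angle is crude but safe; a tighter fit is possible.
static Cone UnionCones(const Cone& a, const Cone& b)
{
    float between = std::acos(std::clamp(Dot(a.axis, b.axis), -1.0f, 1.0f));
    return { a.axis, std::max(a.angle, between + b.angle) };
}

// Per tile: does any direction toward the light's sphere of influence fall
// inside the tile's merged specular cone? tilePos is a representative point.
static bool LightTouchesSpecularCone(const Cone& tileCone, Vec3 tilePos,
                                     Vec3 lightPos, float lightRadius)
{
    Vec3 toLight = Sub(lightPos, tilePos);
    float dist = Length(toLight);
    if (dist <= lightRadius)
        return true;                                             // tile is inside the light's sphere
    float angleToLight = std::acos(std::clamp(Dot(tileCone.axis, Scale(toLight, 1.0f / dist)),
                                              -1.0f, 1.0f));
    float lightAngularRadius = std::asin(lightRadius / dist);
    return angleToLight <= tileCone.angle + lightAngularRadius;
}
```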

I should mention that the two culling factors actually need to be combined for specular: the sphere used for falloff culling needs to expand based on gloss, and the (n+2)/2 should be rolled into the distance falloff, which leaves the angle as just acos(tolerance^(1/n)). I'll leave these details as an exercise for the reader. Now, to be clear, I'm not advocating separate diffuse and specular light lists. I'm suggesting culling a light only if diffuse is below tolerance AND spec is below tolerance.

This leaves us with a scheme much like biased importance sampling. I haven't tried this, so I can't comment on how practical it is, but it has the potential to produce much livelier reflective surfaces, due to having more specular highlights, for a minimal increase in cost. It is also nice to know your image is off from ground truth by a known error tolerance (per light, with respect to shading).

The way I handle this light falloff business for current gen in Prey 2 is by precalculating all lighting beyond the artist-set bounds of the deferred light. For diffuse falloff I take what was truncated from the deferred light and add it to the lightmap (and SH probes). For specular I add it to the environment map. This means I can maintain the inverse squared light falloff and not lose any energy; I just split it into runtime and precalculated portions. Probably most important, light sources that are distant still show up in glossy reflections. This new culling idea may achieve that without the slop that comes from baking it into fixed representations.

I intended to also talk about how to add shadows but this is getting long. I'll save it for the next post.

References:
[1] http://visual-computing.intel-research.net/art/publications/deferred_rendering/
[2] http://www.slideshare.net/DICEStudio/spubased-deferred-shading-in-battlefield-3-for-playstation-3
[3] http://aras-p.info/blog/2012/03/27/tiled-forward-shading-links/
[4] http://www.iquilezles.org/www/articles/sphereao/sphereao.htm