Sunday, April 29, 2012

Tiled Light Culling

First off I'm sorry that I haven't updated this blog in so long. Much of what I have wanted to talk about on this blog, but couldn't, was going to be covered in my GDC talk but that was cancelled due to forces outside my control. If you follow me on twitter (@BrianKaris) you probably heard all about it. My comments were picked up by the press and quoted in every story about Prey 2 since. That was not my intention but oh, well. So, I will go back to what I was doing which is to talk here about things I am not directly working on.

Tiled lighting

There has been a lot of talk and excitement recently concerning tiled deferred [1][2] and tiled forward [3] rendering.

I’d like to talk about an idea I’ve had on how to do tile culled lighting a little differently.

The core idea behind either tiled forward or tiled deferred is to cull lights per tile. In other words, for each tile, calculate which of the lights on screen affect it. The base level of culling is done by calculating a min and max depth for the tile and using these to construct a frustum. This frustum is intersected with a sphere from the light to determine which lights hit solid geometry in that tile. More complex culling can be done in addition to this, such as back-face culling using a normal cone.
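
As a concrete reference, here is a minimal sketch of that basic sphere vs. tile-frustum test in HLSL. The groupshared min/max depth reduction and the side-plane construction are omitted, and all the names are illustrative rather than taken from any of the referenced implementations:

// Light sphere vs. tile frustum, in view space. The four side planes pass through
// the eye, so each plane is just a normal pointing into the frustum. minZ/maxZ are
// the tile's depth bounds reconstructed from the depth buffer.
bool SphereIntersectsTile( float3 lightPosView, float lightRadius,
                           float3 tilePlanes[4], float minZ, float maxZ )
{
    // Depth slab test against the tile's min/max depth.
    if( lightPosView.z + lightRadius < minZ || lightPosView.z - lightRadius > maxZ )
        return false;

    // Side plane tests: distance to a plane through the origin is a single dot product.
    [unroll]
    for( int i = 0; i < 4; i++ )
    {
        if( dot( tilePlanes[i], lightPosView ) < -lightRadius )
            return false;
    }
    return true;
}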

This very basic level of culling, sphere vs frustum, only works with the addition of an artificial construct: the radius of the light. Physically correct light falloff is inverse squared.

Light falloff

Small tangent I've been meaning to talk about for a while. To calculate the correct falloff from a sphere or disk light you should use these two equations [4]:

Falloff:
$$Sphere = \frac{r^2}{d^2}$$
$$Disk = \frac{r^2}{r^2+d^2}$$

If you are dealing with light values in lumens you can replace the r^2 factor with 1. For a sphere light this gives you 1/d^2 which is what you expected. The reason I bring this up is I found it very helpful in understanding why the radiance appears to approach infinity when the distance to the light approaches zero. Put a light bulb on the ground and this obviously isn’t true. The truth from the above equation is the falloff approaches 1 when the distance to the sphere approaches zero. This gets hidden when the units change from lux to lumens and the surface area gets factored out. The moral of the story is don’t allow surfaces to penetrate the shape of a light because the math will not be correct anymore.
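
In shader form, these are a direct transcription of the two equations above, taking r as the source radius and d as the distance to the light's center:

float SphereLightFalloff( float r, float d )
{
    // r^2 / d^2: approaches 1 as the surface approaches the sphere, not infinity.
    return ( r * r ) / ( d * d );
}

float DiskLightFalloff( float r, float d )
{
    // r^2 / ( r^2 + d^2 )
    return ( r * r ) / ( r * r + d * d );
}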

Culling inverse squared falloff

Back to tiled culling. Inverse squared falloff means there is no distance at which the light contributes zero illumination. This is very inconvenient for a game world filled with lights. There are two possibilities: the first is to subtract a constant term from the falloff and max with 0; the second is to window the falloff with something like (1-d^2/a^2)^2. The first loses energy over the entire influence of the light. The second loses energy only away from the source. I should note the tolerance should be proportional to the light's intensity. For simplicity I will use the following for this post:
$$Falloff = \max\left( 0, \frac{1}{d^2} - \text{tolerance} \right)$$
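
Both options are tiny in code. Here is a sketch of the two, assuming d and a are in the same units and the tolerance has already been scaled by the light's intensity:

float CulledFalloff( float d, float tolerance )
{
    // Inverse squared falloff minus a constant, reaching zero at d = 1 / sqrt( tolerance ).
    return max( 0.0, 1.0 / ( d * d ) - tolerance );
}

float WindowedFalloff( float d, float a )
{
    // Inverse squared falloff windowed by ( 1 - d^2/a^2 )^2, reaching zero at d = a.
    float window = saturate( 1.0 - ( d * d ) / ( a * a ) );
    return window * window / ( d * d );
}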

The distance cutoff can be thought of as an error tolerance per light. Unfortunately glossy specular doesn't work well in this framework at all. The intensity of a glossy, energy conserving specular highlight, even for a dielectric, will be WAY higher than the Lambert diffuse. This spoils the idea of the distance falloff working as an error tolerance for both diffuse and specular, because they are at completely different scales. In other words, for glossy specular the distance will have to be very large for even a moderate tolerance, compared to diffuse.

This points to there being two different tolerances, one for diffuse and the other for specular. If both merely affect the radius of influence we might as well set both radii to the maximum of the two, because diffuse doesn't take anything more to calculate than specular. Fortunately, the maximum intensity of the specular inversely scales with the size of the highlight. This of course is the entire point of energy conservation, but energy conservation also helps us in culling: the higher the gloss, the larger the radius of influence and the tighter the cone of influencing normals.

If it isn’t clear what I mean, think of a chrome ball. With a mirror finish, a light source, even as dim as a candle, is visible at really large distances. The important area on the ball is very small, just the size of the candle flame’s reflection. The less glossy the ball, the less distance the light source is visible but the more area on the ball the specular highlight covers.

Before we can cull using this information we need specular to go to zero past a tolerance, just like the distance falloff. The easiest way is to subtract the tolerance from the specular distribution and max it with zero. For simplicity I will use Phong for this post:
$$Phong = \max\left( 0, \frac{n+2}{2} (L \cdot R)^n - \text{tolerance} \right)$$
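
As a sketch (my transcription of the equation above, not anyone's production code), with L the direction to the light, R the reflection vector, and n the Phong exponent:

float CulledPhong( float3 L, float3 R, float n, float tolerance )
{
    // Normalized Phong with the tolerance subtracted so it cuts to exactly zero.
    return max( 0.0, ( n + 2 ) / 2 * pow( saturate( dot( L, R ) ), n ) - tolerance );
}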

Specular cone culling

This nicely maps to a cone of L vectors per pixel that will give a non-zero specular highlight.

Cone axis:
$$R = 2 N ( N \cdot V ) - V$$

Cone angle:
$$Angle = \arccos\left( \sqrt[n]{\frac{2\,\text{tolerance}}{n+2}} \right)$$
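
Transcribed directly, the per-pixel cone is (again just a sketch, with N the surface normal and V the direction to the camera):

void SpecularCone( float3 N, float3 V, float n, float tolerance,
                   out float3 coneAxis, out float coneAngle )
{
    coneAxis  = 2 * N * dot( N, V ) - V;                             // reflection vector R
    coneAngle = acos( pow( 2 * tolerance / ( n + 2 ), 1.0 / n ) );   // nth root of 2*tolerance/(n+2)
}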

Just like how a normal cone can be generated for the means of back face culling, these specular cones can be unioned for the tile and used to cull. We can now cull specular on a per tile basis which is what is exciting about tiled light culling.

I should mention the two culling factors need to actually be combined for specular. The sphere for falloff culling needs to expand based on gloss. The (n+2)/2 should be rolled into the distance falloff, which leaves the angle as just acos(tolerance^(1/n)). I'll leave these details as an exercise for the reader. Now, to be clear, I'm not advocating having separate diffuse and specular light lists. I'm suggesting culling the light if diffuse is below tolerance AND spec is below tolerance.

This leaves us with a scheme much like biased importance sampling. I haven't tried this so I can't comment on how practical it is, but it has the potential to produce much more lively reflective surfaces due to having more specular highlights, for a minimal increase in cost. It is also nice to know your image is off from ground truth by a known error tolerance (per light, with respect to shading).

The way I handle this light falloff business for current gen in P2 is by having all lighting beyond the artist-set bounds of the deferred light precalculated. For diffuse falloff I take what was truncated from the deferred light and add it to the lightmap (and SH probes). For specular I add it to the environment map. This means I can maintain the inverse squared light falloff and not lose any energy; I just split it into runtime and precalculated portions. Probably most important, light sources that are distant still show up in glossy reflections. This new culling idea may get that without the slop that comes from baking it into fixed representations.

I intended to also talk about how to add shadows but this is getting long. I'll save it for the next post.

References:
[1] http://visual-computing.intel-research.net/art/publications/deferred_rendering/
[2] http://www.slideshare.net/DICEStudio/spubased-deferred-shading-in-battlefield-3-for-playstation-3
[3] http://aras-p.info/blog/2012/03/27/tiled-forward-shading-links/
[4] http://www.iquilezles.org/www/articles/sphereao/sphereao.htm

Wednesday, August 10, 2011

New Prey 2 screenshot


It has been a long time since I updated this blog with substantial content but I wanted to point out that Bethesda just released this new screenshot of Prey 2. It's a great shot but it's also a fantastic demonstration of some new graphical features I added to our latest build of the game.

First, there's the depth of field in the background which is HDR circular bokeh DOF.

Secondly, in the puddles on the ground, you will see screen space reflections. They aren't planar reflections; they work on every surface and run on every platform. SSR really adds a ton of dimension and accuracy to our wet, metal filled, alien noir city. I can't talk yet about how it works unfortunately.

So, check it out and tell me what you think. Hopefully in not too long I can start talking about how some of the tech works but for now you just get a glimpse.

-Brian

Sunday, January 30, 2011

Virtualized volume textures

First off, it's been a very long time since I made a post. Sorry about that. I've found it difficult to come up with subjects that I both know enough about and am allowed to publicly talk about. For many, personal hobby projects can be the source of subjects to write about, but all the at-home pet project stuff I do, I do in the HH codebase and check in if it is a success. Personally I find this more rewarding than the alternative because it can go into a commercial product that hopefully will be seen by millions, and it can get artist love, which is very hard to get with hobby projects. The two biggest downsides are that I no longer own work I do in my free time and I can't easily talk about it. Now on to a technique that fits the bill, as I have no particular commercial use for it at the moment.

Irradiance volumes using volume textures is a technique that has been getting some use lately. Check out the following for some places I've noticed it.
Split/Second
Cryengine 3 (Cascaded Light Propagation Volumes)
FEAR 2
Rust

Probably the biggest downside to volumes vs more traditional lightmaps is the resolution. Volume textures take up quite a bit of memory so they need to be fairly low resolution. Unfortunately much of this data is covering empty space. It's convenient for empty space to have some data coverage so that the same solution can be used for dynamic objects but the same resolution is certainly not needed. How would you store a different resolution of volume data on world geometry than in empty space?

The most straightforward solution to me is an indirection texture. Interestingly, what this turns into is virtualizing the volume texture just like you would a 2D texture. The indirection volume texture acts like the page table for your physical texture. Each page table texel translates the XYZ coordinates into a page, or brick, in the physical volume texture and subsequently the UVW coordinates within it. If you need a refresher on virtual texturing check out Sean Barrett's and id's presentations on the topic. All the same limitations apply in 3D as they do in 2D. Pages will need borders to have proper filtering. The smaller the page size, the more waste due to borders and the larger the page table gets.
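
The runtime lookup is the usual virtual texturing scale-and-bias, just in three dimensions. A hedged sketch, assuming the page table entry is written with a bias in xyz and a scale in w so that one point sample plus one filtered sample resolves the brick (the resource names are mine):

Texture3D<float4> g_pageTable  : register( t0 );   // xyz = bias into brick cache, w = scale
Texture3D<float4> g_brickCache : register( t1 );   // physical bricks, stored with borders
SamplerState      g_pointClamp : register( s0 );
SamplerState      g_trilinear  : register( s1 );

float4 SampleVirtualVolume( float3 volumeUVW )
{
    // One point sample into the page table, then a filtered sample into the brick cache.
    float4 page = g_pageTable.SampleLevel( g_pointClamp, volumeUVW, 0 );
    float3 physicalUVW = volumeUVW * page.w + page.xyz;
    return g_brickCache.SampleLevel( g_trilinear, physicalUVW, 0 );
}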

Another way of thinking of this is as a sparse voxel octree. Instead of the page table being managed like a quadtree it would work like an octree. Typically this data structure is thought of only for ray casting but there's nothing inherent about it that requires that. SVO's have also only been stored as trees, requiring traversal to get to the leaves. So long as you have bricks and aren't working with the granularity of individual voxels the traversal can be changed into a single texture look up just like the quadtree traversal in a 2d virtual texture.

Thinking about it as a SVO is helpful because volume data usually has more sparseness than 2d textures. In this case we don't really care about bricks where there is no geometry. If you use a screen read back to determine which pages to load this will happen automatically. Better yet you don't need to even store this data on disk. Better than that you don't need to generate that data in the first place during the offline baking process. Don't worry about dynamic objects. There is still data covering the empty space, it is just at the resolution of one page covering the world. If you need it at a higher resolution than that you can force a minimum depth to the octree.

In the end it still likely can't compete with the resolution a lightmap can get you if high res is what you are looking for. If lower res is the target it probably will be quite a bit more memory efficient because of not having any waste due to 2d parameterization. As for what a good page size would be I'm not sure as I haven't implemented any of this. If I did I probably couldn't talk about it yet. If anyone does implement this I'd love to hear about it.

Sunday, January 3, 2010

RGBD

This entire post turned out to be hogwash. I'm wiping it out to prevent the spread of misinformation. If you are interested in why it's all nonsense see my comment below. Thanks to Sean Barrett for pointing it out. The result of all of this has been positive ignoring the part of me looking extremely foolish. RGBM is more useful than I originally claimed because a larger scale factor can be used or if a fairly small range is required the gamma correction is likely not needed.

Friday, November 6, 2009

UDK

Epic released UDK today. It is the next step in their plan to completely dominate the game engine licensing marketplace. They've been doing a pretty good job of that so far. To all the other companies trying to compete, I'd suggest following in their footsteps. The reason I believe they are so successful is because they do everything they can to get the engine out there and in people's hands. You can't be scared of people looking at or stealing your stuff. If devs can't look at it they certainly won't plunk down major money to buy it. Epic has been powering their freight train of engine licensing with visibility of their product. With UDK they now have a brand new group of students and indie developers who are going to be familiar with Unreal Engine, and the barrier to evaluating the engine for commercial purposes becomes practically nothing. You get to see all the tools and a major portion of the documentation without even having to talk to someone at Epic. You can see engine updates whenever they release a new build. If you want to know more they will send you more than enough stuff to make up your mind in the form of an evaluation version. Compare that to most of the other engine providers. Many will never let you see any code and require all sorts of paperwork just to get a private viewing of an engine demonstration. As expected, they are also not getting very many licensees. Stop being so paranoid and let people see your stuff.

On to the tech stuff. I haven't spent a ton of time looking at it yet but I've noticed some new things already. Of course there's the obvious stuff like Lightmass that they've talked all about. There's docs on all that, which is great. I'm going to talk about what they aren't talking about. First, they changed the way they encode their lightmaps. Previously it was 3 DXT1's that stored the 3 colors of incoming light in the 3 HL2 basis directions. They still use the same concept now but instead only have 2 DXT1's. The first stores the incoming light intensity in each of the 3 directions as RGB. The second is the average color of all 3 directions. The loss in quality may be small as the error is mainly bumps picking up different colors of light. The memory is 2/3 of what it was, so it seems like a smart optimization.
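
A hedged sketch of how that layout would decode, the way I read it. This is my reconstruction of the idea, not Epic's shader, and the texture names are made up:

Texture2D    g_lightmapIntensity : register( t0 );   // intensity along each HL2 basis direction
Texture2D    g_lightmapColor     : register( t1 );   // average incoming light color
SamplerState g_lightmapSampler   : register( s0 );

float3 DecodeDirectionalLightmap( float2 uv, float3 tangentNormal )
{
    // HL2 basis directions in tangent space.
    static const float3 basis[3] =
    {
        float3( -1.0 / sqrt( 6.0 ),  1.0 / sqrt( 2.0 ), 1.0 / sqrt( 3.0 ) ),
        float3( -1.0 / sqrt( 6.0 ), -1.0 / sqrt( 2.0 ), 1.0 / sqrt( 3.0 ) ),
        float3(  sqrt( 2.0 / 3.0 ),  0.0,               1.0 / sqrt( 3.0 ) )
    };

    float3 intensity = g_lightmapIntensity.Sample( g_lightmapSampler, uv ).rgb;
    float3 color     = g_lightmapColor.Sample( g_lightmapSampler, uv ).rgb;

    // Weight each direction's intensity by the bumped normal, then tint by the shared color.
    float3 weights = saturate( float3( dot( tangentNormal, basis[0] ),
                                       dot( tangentNormal, basis[1] ),
                                       dot( tangentNormal, basis[2] ) ) );
    return dot( intensity, weights ) * color;
}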

What impressed me though was their signed distance field baked shadows. Previously they stored baked shadow masks as G8 or 8 bit luminance format textures. This was used simply as a mask to the light. Now they are computing a signed distance field like this paper. It's still stored in a G8 format texture so there's no difference in storage. The big difference comes from the sharpness and quality from a relatively low res texture. The same smooth lines that work for Valve's vector graphics use case also work well for shadows. After I read the paper when it came out I thought the exact same thing: this is perfect for baked shadow textures! I implemented it then, but I was trying to have another value in the texture specify how far the occluder was so I could control the width of the soft edge. The signed distance field could be in the green channel of a DXT1 and the softness could be in red. If that wasn't good enough, softness in green and distance field in alpha of a DXT5. The prototype was scrapped after it didn't really give me what I needed and was being replaced with a different shadow system anyways. I am really impressed with Epic's results though. For a fairly large map they get good quality sun shadows from only 4 1024x1024 G8 textures. That's only 4 MB of shadow data. I bet it would still work well from DXT1's too, with the data in the green channel. That would bring it down to 2 MB. The only downside is it is unable to represent soft shadows, unless someone can get my encoding-softness-as-an-extra-value idea to work, which would be cool. Apparently I should have given that experiment more consideration.
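
Sampling such a mask is about as cheap as the old one. A minimal sketch in the spirit of the Valve paper, assuming the distance is remapped so 0.5 sits on the shadow boundary (names and the softness parameter are mine):

Texture2D    g_shadowMask : register( t0 );   // G8 signed distance field, 0.5 = shadow edge
SamplerState g_bilinear   : register( s0 );

float SampleDistanceFieldShadow( float2 uv, float softness )
{
    // Bilinear filtering reconstructs a smooth distance; a narrow smoothstep around
    // 0.5 gives the crisp (or slightly softened) edge.
    float dist = g_shadowMask.Sample( g_bilinear, uv ).g;
    return smoothstep( 0.5 - softness, 0.5 + softness, dist );
}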

Thursday, October 1, 2009

Uncharted 2

We are hiring!
First up, Human Head Studios, where I work, is hiring for basically every department. We are currently staffing up and are looking for talented people. I can't tell you what we are working on, but I can say it's awesome and we are doing some very interesting and innovative things on many fronts, including technology. The tech department, where I am, is developing our own cutting edge tech that I intend to be competitive with all the games I talk about on this blog. I really wish I could be more specific but alas, I cannot. Not yet at least. I'm trying to whet your appetite without getting in trouble, if you haven't noticed.

Of most interest to the common visitor of this blog, we are looking for tech programmers. Of most interest to me, we are looking for a graphics guy that I will work with on the aforementioned secret, awesome graphics tech. If you understand all of what I talk about here and live and breathe graphics, you may be the perfect fit. For submission details see here. Mention my name and you might filter up in the pile. Being a reader of my blog has to count for something, right?

Uncharted 2
On to the main event. The Uncharted 2 multiplayer demo came out yesterday and I suggest you check it out. The first game has been a benchmark for graphics on the ps3 and I'm sure when number two comes out it will set the bar again. Although all you can see at the moment are some of the mp maps it is very pretty and much can be discerned from dissecting it.

There are many options in the demo that make it easy to study things. You can start games with just yourself. There is a machinima mode that I haven't fully figured out yet, but it allows you to fool with more than in a normal game. You can also replay a playthrough using the cinema mode. That allows you to pause, single step forward, fly around and even change post processing and lighting a bit.

Right out of the gate I noticed their nice DOF blur. It's one of the best I've seen. It looks like there is both a small blur and a large blur that are lerped between to simulate a continuous blur radius. It looks better than gaussian but I could be imagining things. It is definitely done in HDR as bright things dominate in the blur.

Speaking of blur, they now have object motion blur. In the first game there was motion blur when the camera swung around quickly but it only happened in the distance. It was at low resolution and pretty blurry. I didn't really care for it, as any time I moved the camera quickly to look at something it blurred and my eyes lost focus on what I was trying to look at. This type of motion blur is still there but doesn't seem to bother me as much. It's likely less blurry than before but I can't say for sure. It is done in HDR though. In addition to the camera motion blur, objects have their own geometry that draws to blur them. This is the same thing Tomb Raider Underworld did for character motion blur. U2 uses it for objects as well as characters, like the bus that drives by in one of the videos. The blur trails behind and can look weird in certain situations when paused, but while playing the game these oddities aren't noticeable and it looks pretty nice.

Glare or bloom seemed to be map specific. In the snow map the sky was bright enough to bloom everywhere. In other maps, light sources that were bright enough to go almost white, but were obviously saturated when blurred, didn't bloom at all. The control to change the sun intensity could be jacked up to the point of being completely blown out but with no hint of bloom. On that note, they are correctly tone mapping and not clamping off bright colors, something so many games are getting wrong. Please, people. Use a tone curve that doesn't just clamp off bright colors, as in not linear. Bloom all you want, but without a nice response curve your brights will look terrible.
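
For anyone unsure what I mean by a non-clamping curve, even the simplest roll-off does the job. A Reinhard-style curve, purely as an illustration and certainly not what Naughty Dog actually uses:

float3 Tonemap( float3 hdrColor )
{
    // Brights roll off smoothly toward white instead of being clipped at 1.
    return hdrColor / ( 1.0 + hdrColor );
}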

In regard to lighting, the only dynamic shadow in their maps was the sun. I don't know whether that is the same in single player or whether it's MP only. There were other dynamic light sources but they didn't cast dynamic shadows. Strangely, they did have precomputed shadows that looked like the light being baked into a lightmap. I'm not sure what was happening here, as the machinima playground map showed evidence of the artifact from storing precomputed lighting at the verts, which is what they did in U1. It's possible that some objects had lightmaps and others were vertex lit. Another possibility is the lightmap-looking shadows were not in the baked lighting but were more like light masks. It's hard to say from what I saw. There were also other lights that seemed to be completely baked and did not influence characters.

I don't remember from U1, but in this demo the ambient character lighting doesn't seem to change much or be influenced by local features. There seemed to be a warm light direction always present in the nighttime city map, where the ambient should be very cool practically everywhere. I'm guessing the ambient lighting is artist set up and isn't based on sampling the environment.

They are still using a lot of vertex color blending of textures. I think this is done in the shader, where it lerps between 2 sets of textures based on a vertex value and a heightmap texture. The heightmap corresponds with one or both of the texture sets and the vertex value acts like an alpha test. This hides the typical smooth vertex colored blending and matches the material that is fading. Think of brick and mortar to get the idea. Brick is one material, concrete the other. The heightmap is based off of the brick, so as it transitions the mortar starts to dominate and cover the brick, masking the gradient of the vertex values until it's all concrete. In U1 it was very much like an alpha test because it was a hard transition. For U2 these transitions can be smooth. The vertex value is likely a separate vertex stream so that different variations can be used with a single model instance. A material set up in this fashion can be used in a variety of places, and with a simple bit of vertex coloring the geometry seems to be uniquely textured.
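
A hedged sketch of how I imagine that blend works. This is my reconstruction of the technique as described, not Naughty Dog's shader, and the sharpness parameter is my own knob for the hard (U1-like) versus soft (U2-like) transition:

float3 HeightBlend( float3 brick, float3 mortar, float height, float vertexBlend, float sharpness )
{
    // The painted vertex value sweeps the threshold across the brick heightmap, so the
    // mortar fills in the low (grout) areas first and the visible edge follows the
    // texture instead of the vertex gradient.
    float mask = saturate( ( vertexBlend - height ) * sharpness + 0.5 );
    return lerp( brick, mortar, mask );
}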

Here's just a list of some other miscellaneous things I noticed:

SSAO darkens the ambient term. Nice addition; it helps ground the characters and lessens the pressure on detailed baked lighting. It's much more subtly done than Gears, which tended to look very contrasty.

Shadowmap sampling pattern changed to be a bit smoother. Not a big change. There is major shadow acne on characters at times which is not cool.

Particle systems with higher fill expense fade out when you get close to them. There were some nice clouds of dust in a few places around the maps that are gone when you get up to them.

The last shadowmap cascade (3rd?) fades out to no shadow in the distance. This is mostly not noticeable, but it can result in the insides of buildings turning bright in the distance when you are outdoors.

They still have low res translucent drawing although I don't think it's frame rate dependent anymore. I saw low and normal res translucent things at the same time. The odd thing was the low res wasn't bilinearly filtered but nearest. I have no idea why they would do that.

The snow particles stretch with camera movement to fake motion blur. Cool trick.

I can't wait to play the full game. I loved the first and this is looking to be even better. As I said before, this will be the new bar so you should definitely check it out for yourself.

Tuesday, April 28, 2009

RGBM color encoding

LogLUV
There has been some talk about using LogLUV encoding to store HDR colors in 32 bits by packing it all into a RGBA_8888 target. I won't go into the details as a better explanation than I could give is here. This encoding has the benefit over standard floating point buffers of reducing ROP bandwidth and storage space at the expense of some shader instructions for both encoding and decoding. It has been used in shipped games. Heavenly Sword used it to achieve 4xaa with HDR on a PS3. Uncharted used it for the 2xaa output from their material pass. The code for encoding and decoding is as follows (copied from previous link):
// M matrix, for encoding
const static float3x3 M = float3x3(
    0.2209, 0.3390, 0.4184,
    0.1138, 0.6780, 0.7319,
    0.0102, 0.1130, 0.2969);

// Inverse M matrix, for decoding
const static float3x3 InverseM = float3x3(
    6.0014, -2.7008, -1.7996,
    -1.3320, 3.1029, -5.7721,
    0.3008, -1.0882, 5.6268);

float4 LogLuvEncode(in float3 vRGB) {
    float4 vResult;
    float3 Xp_Y_XYZp = mul(vRGB, M);
    Xp_Y_XYZp = max(Xp_Y_XYZp, float3(1e-6, 1e-6, 1e-6));
    vResult.xy = Xp_Y_XYZp.xy / Xp_Y_XYZp.z;
    float Le = 2 * log2(Xp_Y_XYZp.y) + 127;
    vResult.w = frac(Le);
    vResult.z = (Le - (floor(vResult.w*255.0f))/255.0f)/255.0f;
    return vResult;
}

float3 LogLuvDecode(in float4 vLogLuv) {
    float Le = vLogLuv.z * 255 + vLogLuv.w;
    float3 Xp_Y_XYZp;
    Xp_Y_XYZp.y = exp2((Le - 127) / 2);
    Xp_Y_XYZp.z = Xp_Y_XYZp.y / vLogLuv.y;
    Xp_Y_XYZp.x = vLogLuv.x * Xp_Y_XYZp.z;
    float3 vRGB = mul(Xp_Y_XYZp, InverseM);
    return max(vRGB, 0);
}
RGBM
There is a different encoding which I prefer. I don't know whether anyone else is using this but I imagine someone is. The idea is to use RGB and a multiplier in alpha. This is often used as an HDR format for textures, and I prefer it over RGBE for that purpose. Here's some related info, even though 2 DXT5's is a bit overkill for most applications. Here's some slides from Lost Planet on storing lightmaps as DXT5's. Both LogLUV and these texture encodings are about storing the luminance information separately with higher precision. This is a standard color compression trick which becomes even more powerful when dealing with HDR data. What at first doesn't seem to make sense is that if RGBM is stored in an RGBA_8888 target, there is no increase in precision from placing luminance in the alpha over having it stored with RGB. The thing is, luminance isn't only in alpha. What is essentially stored in alpha is a range value. The remainder of the luminance is stored with the chrominance in RGB. The code to do this encoding is really simple:
float4 RGBMEncode( float3 color ) {
    float4 rgbm;
    color *= 1.0 / 6.0;
    rgbm.a = saturate( max( max( color.r, color.g ), max( color.b, 1e-6 ) ) );
    rgbm.a = ceil( rgbm.a * 255.0 ) / 255.0;
    rgbm.rgb = color / rgbm.a;
    return rgbm;
}

float3 RGBMDecode( float4 rgbm ) {
    return 6.0 * rgbm.rgb * rgbm.a;
}
I should also note that it is best to convert the colors from linear to gamma space before encoding. If you plan to use them again in linear, a simple additional sqrt and square will work fine for encoding and decoding respectively. The constant 6 gives a range in linear space of 51.5. Sure, it's no 1.84e19 like LogLUV, but honestly did you really need that? 51.5 should be plenty so long as exposure has already been factored in. This constant can be changed to fit your tastes. Those 3 max's can be replaced with a max4 on the 360 if the compiler is smart enough; I haven't looked to see if it does this. Also, I haven't found the epsilon value that prevents dividing by zero necessary in practice. The hardware must output black in the event of denormals, which is the same as handling it correctly. I haven't tried it on a large range of hardware so beware if you remove it.
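
The sqrt/square variant mentioned above is just a wrapper around the functions; a small sketch:

float4 RGBMEncodeLinear( float3 linearColor ) {
    // Rough linear-to-gamma conversion before encoding, as described above.
    return RGBMEncode( sqrt( linearColor ) );
}

float3 RGBMDecodeLinear( float4 rgbm ) {
    // Square to get back to linear after decoding.
    float3 gammaColor = RGBMDecode( rgbm );
    return gammaColor * gammaColor;
}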

There are some major advantages of RGBM over LogLUV. First off is the cost of encoding and decoding. There is no need for matrix multiplies or logs and exp. Especially of note is how cheap the decoding is. It behaves very well in filtering so you can still use the 4 samples in 1, bilinear trick for downsizing. This isn't technically correct but the difference is negligible.

As far as quality, I can't see any banding even in dark stress test cases on a fancy monitor after I've turned all the lights off. It also, unsurprisingly, handles very bright and saturated colors with the same level of quality. I found no discernible differences in my testing versus LogLUV. I don't have any sort of data on what amount of error it has or how much of the color space it covers. What I can tell you is that it handles my HDR needs perfectly.

Storing your colors encoded means you cannot do any blending into the buffer. This rules out multi-pass additive lighting and transparency. You will have to use another buffer for transparent things such as particles. This is also a good time to try a downsized buffer since you need a separate one anyways. Now a transparency buffer can store additive, alpha blended and multiply type transparency but only grayscale multiplies since they are going into the alpha channel. Multiply decals can be very useful in adding surface variation while still having tiling textures underneath. These often use color to tint the underlying surface and need to be at full res.

Now for the cool part. Because what is stored in RGB is basically still a color, you can apply multiply blending straight into a buffer stored as RGBM. Multiplying will never increase the range required to store the colors so this is a non destructive operation. In practice I have seen no perceivable precision problems crop up due to this. It is also mathematically correct so there are no worries as to whether it will get weird results.

Wednesday, February 25, 2009

Killzone 2

I got a bogus DMCA notice on this post. Google took it down and now I'm putting it back up. I just finished Killzone 2 and it really is graphically impressive. If you are reading this blog then you are interested in graphics which means you owe it to yourself to play this game. The other levels in the game I think are actually more impressive than the one in the demo. The level in the demo was pretty geometrically simple. Lots of boxy bsp brush looking shapes. The later levels are a lot more complex. In particular the sand level was very pretty.

Level Construction
There didn't seem to be much in the way of high poly meshes baked down to normal maps. Most everything was made from texture tiles and heightmap generated normal maps. Most of the textures are fairly desaturated, to the point of being likely grayscale, with most of the color coming from the lighting and post processing. This is something we did quite a bit in Prey and is something we are trying to change. You may notice the post changing when you walk through some doorways. The most likely candidates are doors from inside to outside.

FX
Their biggest triumph I think is in the fx and atmospherics. There is a ridiculous number of particles. The explosions are some of the best I've seen in a game. There is a lot of dust from bullet impacts, footfalls, wind, and explosions. There's smoke coming from explosions, world fires, rocket trails. Each bullet impact also causes a spray of trailed sparks that collide with the world and bounce. Particles are not the only thing contributing. There are also a lot of tricks with sheets and moving textures. For the dust blowing in the wind effect there is a distinct shell above the ground with a scrolling texture, plus lots of particles. The common trick with sheets is fading them out when they get edge on and when you get close to them. Add soft z clipping and a flat sheet can look very volumetric. There are also a lot of light shafts done with sheets. One of these situations you can see in the demo. All of this results in a huge amount of overdraw. It has already been pointed out that they are using downsized drawing. This looks to be 1/4 the screen dimensions (1/16 the fill). This is bilinearly upsampled to apply it to the screen, as opposed to using the MSAA trick and drawing straight in. Having the filtering makes it look quite a bit better. It looks like it averages about 10% of the GPU frame time. That would mean they didn't need to sacrifice much to get these kinds of effects.

Shadows

All the shadows are dynamic shadow maps. Sunlight is cascaded shadow maps with each level at 512x512. Omni lights use cube shadow maps. They are drawing the back faces to the shadow map to reduce aliasing. Some of the shadow maps can be pretty low resolution. This isn't as bad as Dead Space because they have really nice filtering, likely because the RSX has bilinear shadow map PCF for free. I can't tell exactly what the sample pattern is but it looks to alternate. They have stated there is up to 12 samples per pixel. There is a really large number of lights casting dynamic shadows at a time. Even muzzle flashes cast shadows. Lightning flashes cast shadows. At a distance the shadows fade out, but the distance is pretty far. As expected, their triangle counts were evenly split between screen rendering and shadow map rendering at about 200k-400k. They should be able to get away with a lot more tris than that.

Lighting

I think this is the first game to really milk deferred lighting for what it's worth. There are a ton of lights. The good guys have something like 3 small lights on each one of them, and that doesn't include muzzle flashes. The bad guys are defined by the red glowing eyes. These have a small light attached to them so the glowing eyes actually light the guys and anything they are close to. In the factory level you can see 230 lights on screen at once. I'm curious if all of these are drawn or if a good fraction is faded out. If none are faded that is insane: 200 draw calls just in lights, and that doesn't count the stencil drawing that can happen before. Their draw counts seem to always be below 1000 so this is not likely the case.

Post processing

A fair amount of their screen post processing is done on SPU's. As far as I know this is a first. The DOF has a variable blur size. This is most easily visible when going back and forth to the menu. There is motion blur on everything but the blur distance is clamped to very small.

Misc
Environment maps are used on many surfaces. They are mostly crisp to show sharp reflections. I didn't see any situation where they were locational. They are instead tied to the material.

Another neat effect was the water from the little streams. This wasn't actually clipping with the ground or another piece of geometry at all. It is merely in the ground material and is masked to where it should be. The plane moves up and down by changing what range of a heightmap to mask to.

Their profiler says they are spending up to 30% of an SPU on scene portals. I assumed this meant area / portal visibility. In the demo this made sense. After playing it all it no longer makes sense. There are many areas in the game that are just not portalable. I'm not sure what that could mean anymore. They could use it as a component of visibility and the other component is not on the SPUs. In that case I am curious what they used for visibility.

The texture memory amount stayed constant. This must mean that they are not doing any texture streaming.

They have the player character cast shadows but you cannot see his model. I found this to be kind of strange, especially when you can see the shadows at the feet z-fighting with the ground but no feet that would have conveniently hidden the problem. It's expensive to get the camera-in-the-head thing to work really well, so I understand why they didn't wish to do it, but personally I would have gone with both or nothing concerning the player's shadow. BTW, why is the player like a foot and a half shorter than everyone else?

For more killzone info:
Deferred lighting
Profiling numbers

It isn't quite to the level of the original prerendered footage but honestly who expected it to be? It is a damn good effort from the folks at Guerrilla. I look forward to their presentation at GDC next week. This is the first year since I've been doing this professionally that I am not going to GDC. I'll have to try and get what I can from the powerpoints and audio recordings. You are all posting your slides right? Wink, wink.

Saturday, January 10, 2009

Virtual Geometry Images

Geometry images are one of those ideas so simple you ask yourself "Why didn't I think of this?" I'll admit it isn't the topic of much discussion concerning the "more geometry" problem for the next generation. They work great for compression but they don't inherently solve any of the other problems. Multi-chart geometry images have a complicated zipping procedure that is also invalid if a part is rendered at a different resolution.

A year ago when I was researching a solution to "more geometry" on DirectX 9 level hardware I came across this paper, which was in line with the direction I was thinking. The idea is an extension to virtual textures: another layer alongside the textures that is a geometry image. For every texture page that is brought in there is a geometry image page with it. By decomposing the scene into a seamless texture atlas you are also doing a Reyes-like split operation. The splitting is a preprocess and the dice is real time. The paper also explains an elegant seam solution.

My plan for how to get this running really fast was to use instancing. With virtual textures every page is the same size. This simplifies many things. The way detail is controlled is similar to a quadtree: the same size pages just cover less of the surface and there are more of them. If we mirror this with geometry images, every time we wish to use a patch of geometry it will be a fixed size grid of quads. This works perfectly with instancing if the actual position data is fetched from a texture, like geometry images imply. The geometry you are instancing then is a grid of quads with the vertex data being only texture coordinates from 0 to 1. The per instance data is passed in with a stream and the appropriate frequency divider. This passes data such as patch world space position, patch texture position and scale, edge tessellation amount, etc.
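
A hedged sketch of what the vertex shader for those instanced patches could look like. I'm using an SM5-style structured buffer for the per instance data for brevity, where the text above describes per instance streams on DX9 hardware; all names and the layout are illustrative:

cbuffer PatchConstants : register( b0 )
{
    float4x4 g_viewProj;
};

Texture2D<float4> g_geometryImageCache : register( t0 );   // physical pages of positions
SamplerState      g_pointClamp         : register( s0 );

struct PatchInstance
{
    float4 pageScaleBias;      // maps the patch's 0..1 UV into its physical geometry image page
    float4 worldPosAndScale;   // xyz = patch origin, w = patch scale
};
StructuredBuffer<PatchInstance> g_instances : register( t1 );

float4 PatchVS( float2 patchUV : TEXCOORD0, uint instanceId : SV_InstanceID ) : SV_Position
{
    PatchInstance inst = g_instances[ instanceId ];

    // Fetch the vertex position from the resident geometry image page.
    float2 pageUV   = patchUV * inst.pageScaleBias.xy + inst.pageScaleBias.zw;
    float3 localPos = g_geometryImageCache.SampleLevel( g_pointClamp, pageUV, 0 ).xyz;
    float3 worldPos = localPos * inst.worldPosAndScale.w + inst.worldPosAndScale.xyz;

    return mul( float4( worldPos, 1 ), g_viewProj );
}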

If patch tessellation is tied to the texture resolution this provides the benefit that no page table needs to be maintained for the textures. This does mean that there may be a high amount of tessellation in a flat area merely because texture resolution was required. Textures and geometry can be at a different resolution but still be tied such as the texture is 2x the size as the geometry image. This doesn't affect the system really.

If the performance is there to have the two at the same resolution a new trick becomes available. Vertex density will match pixel density so all pixel work can be pushed to the vertex shader. This gets around the quad problem with tiny triangles. If you aren't familiar with this, all pixel processing on modern GPU's gets grouped into 2x2 quads. Unused pixels in the quad get processed anyways and thrown out. This means if you have many pixel size triangles your pixel performance will approach 1/4 the speed. If the processing is done in the vertex shader instead this problem goes away. At this point the pipeline is looking similar to Reyes.

If this is not a possibility for performance reasons, and it's likely not, the geometry patches and the texture can be untied. This allows the geometry to tessellate in detailed areas and not in flat areas. The texture page table will need to come back though which is unfortunate.

Geometry images were first designed for compression, so disk space should be a pretty easy problem. One issue though is edge pixels. Between each page the edge pixels need to be exact, otherwise there will be cracks. This can be handled by losslessly compressing just the edge and using normal lossy image compression for the interiors. As the patches mip down they will be using shared data from disk, so this shouldn't be an issue. It should be stored uncompressed in memory though, or the crack problem will return.

Unfortunately vertex texture fetch performance, at least on current console hardware, is very slow. There is a high amount of latency. Triangles are not processed in parallel either. With DirectX 11 tessellators it sounds like they will be processed in parallel. I do not know whether vertex texture fetch will be up to the speed of a pixel texture fetch. I would sure hope so. I need to read specs for both the API and this new hardware before I can postulate on how exactly this scheme can be done with tessellators instead of instanced patches but I think it will work nicely. I also have to give the disclaimer that I have not implemented this. The performance and details of the implementation are not yet known because I haven't done it.

To compare this scheme with the others it has some advantages. Given that it is still triangle rasterization dynamic objects are not a problem. To make this work with animated meshes it will probably need bone indexes and weights stored in a texture along with the position. This can be contained to an animation only geometry pool. It doesn't have the advantage subd meshes have that you can animate just the control points. This advantage may not work that well anyways because you need a fine grained cage to get good animation control which increases patch number, draw count, and tessellation of the lowest detail LOD (the cage itself).

Its ability to LOD is better than subd meshes but not as good as voxels. The reason for this is the charts a model has to be split up into are usually quite a bit bigger than the patches of a subd mesh. This really depends on how intricate the mesh is though. It scales the same way subd meshes do, just with a different multiplier. Things like terrain will work very well. Things like foliage work terribly.

Tools side, anything can be converted into this format. Writing the tool unfortunately looks very complicated. The difficulty primarily lies with the texture parameterization required to build the seamless texture atlas. After UVs are calculated the rest should be pretty straightforward.

I do like this format better than subd meshes with displacement maps but it's still not ideal. Tiny triangles start to lose the benefits of rasterization. There will be overdraw and triangles missing the center of pixels. More important I think is that it doesn't handle all geometry well, so it doesn't give the advantage of telling your artists they can make any model and place it however they want and it will have no impact on the performance of the game. Once they start making trees or fences you might as well go back to how they used to work because this scheme will run even slower than the old way. The same can be said for subd meshes btw.

To sum it up I think it's a pretty cool system but it's not perfect and doesn't solve all the problems.

Thursday, January 8, 2009

More Geometry

There has been a lot of talk lately about the next generation of hardware and how we are going to render things. The primary topic seems to be "How are we going to render a lot more geometry than we can now?" There are two approaches that are getting a lot of attention. The first is subdivision surfaces with displacement maps and the second is voxel ray casting.

Here are some others that are getting a bit of attention.

Ray casting against triangles
Intel's ray tracer
Cuda ray tracer

Point splatting
Far Voxels
QSplat
Atom

Progressive meshes
Progressive Buffers
View dependent progressive mesh on the GPU

Progressive Buffers is one of my favorite papers. It's one that I keep coming back to time and time again.

Otoy
interview

Who knows exactly what is going on in Otoy. It almost seems like they are being deliberately confusing. It's for a game engine, it's for movie CG, it's lightstage (which I thought was a non-commercial product), it's a server side renderer, it's a web 3d viewer / streaming video.

From what I have gathered, it generates an unstructured point cloud on the GPU and creates a point hierarchy. It uses this for ray tracing, not just eye rays but shadows and reflections. The reflections are massively cached; it's not clear how. I can't figure out how this works with full animations like they have in their videos. Either that would require regenerating the points, which makes ray casting into it kind of pointless, or it has to deal with holes. Whatever they are doing, the results are very impressive.

Subdivision surfaces with displacement maps
This has a lot of powerful people behind it. Both nVidia and AMD are behind it, along with DirectX 11 API support through the hull shader, tessellator, and domain shader. I'm not a big fan of this. First off, it's only really useful for data amplification, not data reduction. For example, our studio and many others are now using the subd cage for the in game models for our characters. That means the lowest tessellation level the subd surface can get to, the subd cage, is the same poly count as our current near LOD character meshes. That makes subdivision surfaces not useful at all for LODing at moderate distances. This can be reduced some, but likely not by enough to solve the problem. It looks really complicated to implement and rife with problems. It requires artist input to create models that work well with it. The data imported from the modelling package can be specific to that package. It's hard to beat the generality of a triangle soup.

The plus side is there is no issue with multiple moving meshes or deforming. They are fully animatable. To allow good animation control the tessellation of the cage may need to be higher. In theory every piece of geometry in your current engine could be replaced with subd models with displacement maps and the rest of your pipeline would work exactly the same.

Check out some work in this direction from Michael Bunnell of Fantasy Lab:
GPU Gems 2 chapter
Fantasy Lab

Voxel ray casting
This has been made popular by John Carmack, who has described this as his planned approach for idTech6. Considering he's always at the head of the pack this should give it pretty strong backing. John refers to his implementation as a sparse voxel octree (SVO). The idea is to extend his current virtual texturing mip blocks to 3D with a "mip" hierarchy of voxels that will be stored as an octree. The way this is even remotely reasonable is that you only need to store the important data, with no data in empty space. This is very different from most scientific and medical applications that require the whole data block. This structure is great for compression. Geometry compression now turns into image compression, which is a well studied problem with effective solutions. LODing works from the whole screen being a single voxel down to subpixel detail. To render it, every screen pixel casts a ray into the octree until it hits a voxel the size of a pixel or smaller. This means LODing reduces both rendering cost and memory. If you don't need the voxel data you don't need it in memory.

I like this approach because it gives one elegant way of handling textures, geometry, streaming, compression and LODing all in one system. There are no demands on the input data. Anything can be converted into voxels. Due to a streaming structure very similar to virtual texturing it allows unique geometry and unique texturing. This means there are no restrictions on the artists. There is little way an artist can impact performance or memory with assets they create. That puts the art and visuals in the artists hands and the technical decisions in the engineers hands.

There are some problems. Ray tracing has always been slow. Ray casting is a search where rasterization is binning, and binning is almost always faster than searching. In a highly parallel environment the synchronization required in binning may tip the scales in searching's favor. As triangles shrink, multiple triangles landing in one bin, or bins being missed entirely, also erode the speed advantage. It has now been demonstrated to be fast enough to render on current hardware at 60fps, which should be enough proof to let this concern slide a bit. It does mean the scene will only be able to be rendered once; it's unlikely there will be power left to render a shadow map, or ray trace to the light for that matter. My guess is John's intent is to have the lighting fully baked into the texture, like how idTech5 works currently. This also cuts down on the required memory, as only one color is required per voxel.

Memory is another possible problem. Jon Olick's demo of one character using an SVO required ~1 GB of video memory, which was not enough to completely hide paging. His plan to decrease this size was entropy encoding, which means each child's data is based on its parent's data. As far as I'm aware this is only going to work if you use a kd-tree restart traversal, which is slower than the other alternatives. Otherwise he would need to evaluate the voxel data for the whole stack once he wishes to draw the pixel.

The most important problem is it doesn't work with dynamic meshes. The scheme I believe John Carmack is planning on using is a static world with baked lighting, with all dynamic objects and characters using traditional triangle meshes. I expect this to work out well. The performance of this type of situation has been shown to be there, so it's not too risky to pursue this direction. But there is something about it that bugs me. You release the constraints on the environment artists but leave the other artists with the same problems. If you handle texturing inherently with voxels, does that mean he needs to keep around his virtual texturing for everything else? Treating the world and the dynamic objects in it separately has been required in the past with lightmaps and vertex lighting. To bring back this Hanna-Barbera effect in an even more complicated way leaves a bad taste in my mouth. I'm really looking for a uniform solution for geometry and textures.

For more detailed information see Jon Olick's Siggraph presentation. He is a former employee of id. Also check out the interview that started this all off.

The brick based voxel implementations seem like a better solution to me than having the tree uniform. This means the leaves are a different type than the nodes. They consist of a fixed size brick of voxels. Being in a brick format has many advantages. It allows free filtering and hardware volume texture compression through DXT formats.

Check out these for brick approaches:
Gigavoxels
GPU ray casting of voxels