Graphic Rants

Normal map filtering using vMF (part 3)

2018-05-12T14:07:00.000-05:00

$$ \newcommand{\vv}{\mathbf{v}} \newcommand{\rv}{\mathbf{r}} \newcommand{\muv}{\boldsymbol\mu} \newcommand{\omegav}{\boldsymbol\omega} \newcommand{\mudotv}{\muv\cdot\vv} $$ What can we use this for? One example of a place where distributions are summed up is in normal map and roughness filtering. Normal and roughness maps are textures describing the distribution of microfacet normals. The normal of the normal map is the mean of the distribution and the roughness describes the width of the distribution. We can fit our chosen NDF with a vMF by finding a mapping from roughness to sharpness $\lambda$.

This mapping for Beckmann is given by [1] as: \begin{equation} \lambda \approx \frac{2}{\alpha^2} \label{eq:roughness_to_lambda} \end{equation} and following my previous post about specular models we can use the $\alpha$ from any of those distributions in this equation for a reasonable approximation.

Once we have vMFs we can sum or filter them in $\rv$ form, then turn the result back into a normal and roughness by inverting the mapping: \begin{equation} \alpha \approx \sqrt{\frac{2}{\lambda}} \label{eq:lambda_to_roughness} \end{equation} We must be careful with floating point precision and division by zero, though. Calculating the reciprocal of $\lambda$ instead of $\lambda$ itself avoids several places where a divide by nearly zero can happen: a perfectly smooth surface has $\alpha = 0$ and an infinite $\lambda$, but its reciprocal is simply 0.

// Convert normal and roughness to r form
float InvLambda = 0.5 * Alpha*Alpha;  // 1/lambda, since lambda ~= 2/alpha^2
float exp2L = exp( -2.0 / InvLambda );  // e^(-2 lambda)
float CothLambda = InvLambda > 0.1 ? (1 + exp2L) / (1 - exp2L) : 1;  // coth(lambda) ~= 1 once lambda >= 10
float3 r = ( CothLambda - InvLambda ) * Normal;  // length of r is coth(lambda) - 1/lambda

// Filter in r form

// Convert back to normal and roughness
float r2 = clamp( dot(r,r), 1e-8, 1 );
InvLambda = rsqrt( r2 ) * ( 1 - r2 ) / ( 3 - r2 );  // reciprocal of lambda ~= |r| (3 - |r|^2) / (1 - |r|^2)
Alpha = sqrt( 2 * InvLambda );
Normal = normalize(r);

How does this compare to the common approaches? The first to do something like this was Toksvig [2]. It follows similar logic, with the length of the averaged normal corresponding to gloss, and it uses properties of Gaussian distributions but not exactly SGs. LEAN mapping [3] is based on Gaussians as well, but on planar distributions rather than spherical ones. Because the approach I just explained works directly on the sphere, it should in theory work just as well with object space normals.

The original Toksvig approach let roughness influence the filtered normal. Even so, the common way to use "Toksvig" filtering (including UE4's implementation) is to find the normal variance and increase the roughness by it. Done that way, roughness has no influence on the filtered normals, and it should: a smooth normal should have more weight in the filter than a rough one. Filtering in $\rv$ form does this naturally, since a sharper lobe has a longer $\rv$ vector.
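The following is a minimal CPU port of the shader snippet above, written in C++. It uses a small `float3` stand-in for the HLSL type, and the helper names `ToRForm` and `FromRForm` are hypothetical. The test filters a smooth texel and a rough texel whose normals tilt equally in opposite directions. The filtered normal leans toward the smooth one, which a plain average of the normals would not do.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Minimal stand-in for the HLSL float3 type used by the snippet above.
struct float3 { float x, y, z; };
static float3 operator*(float3 v, float s) { return {v.x * s, v.y * s, v.z * s}; }
static float3 operator+(float3 a, float3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static float dot(float3 a, float3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static float3 normalize(float3 v) { return v * (1.0f / std::sqrt(dot(v, v))); }

// Normal and roughness (alpha) to r form, following the snippet above.
static float3 ToRForm(float3 normal, float alpha) {
    assert(alpha > 0.0f);
    float invLambda = 0.5f * alpha * alpha;                 // 1/lambda, lambda ~= 2/alpha^2
    float exp2L = std::exp(-2.0f / invLambda);              // e^(-2 lambda)
    float cothLambda = invLambda > 0.1f ? (1 + exp2L) / (1 - exp2L) : 1.0f;
    return normal * (cothLambda - invLambda);               // |r| = coth(lambda) - 1/lambda
}

// r form back to normal and roughness, following the snippet above.
static void FromRForm(float3 r, float3& normal, float& alpha) {
    float r2 = std::min(std::max(dot(r, r), 1e-8f), 1.0f);
    float invLambda = (1 - r2) / (std::sqrt(r2) * (3 - r2)); // 1/lambda from the approximate inverse
    alpha = std::sqrt(2 * invLambda);
    normal = normalize(r);
}
```

The "Filter in r form" step here is a 2-tap box filter, `(smooth + rough) * 0.5f`. A round trip with no filtering returns the input roughness to within a few percent, because the inverse is approximate.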

vMFs have been used for this task before, in [5] and later in [6]. There is a major difference from our approach: Frequency Domain Normal Map Filtering [5] relies on convolving instead of averaging. It finds the vMF for the distribution of normals over the filter region, then convolves the normal and roughness by that vMF. But what is a convolution?

Convolution

Graphics programmers know of convolutions like blurring. It sort of spreads data out, right? What does it mean mathematically, though? A convolution of one function by another creates a new function. Its value at each point is the integral of the function being convolved multiplied by the convolving function translated to that point.

Think of a blur kernel with weights per tap. The kernel's center is translated to the pixel in the image that we write the blur result to. Each tap of the kernel is a value from the blur function. We multiply those kernel values by the image being convolved, and then all of those samples are added together. A blur usually doesn't sample every pixel of the image, but only because the convolving function, i.e. the blur kernel, is zero past a certain distance from its center. Otherwise the integral needs to cover the entire domain. In the 1D case that means from negative to positive infinity. In the case of a sphere it means the entire surface of the sphere.

Symbolically, in 1D this looks like: \begin{equation} (f * g) (x) = \int_{-\infty}^\infty f(t) g(x-t)\,dt \end{equation} We now have the definition, but why would we want to convolve a function besides image blurring? Evaluating the convolved function at a point gives the same result as multiplying the two functions together, with the convolving function translated to that point, and then integrating the product. Think of this as precalculating the multiplication and integration of those functions for every translation. The integral of the product is done ahead of time, and now we can evaluate it for any translation.
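Below is a discrete 1D version of that integral, as a C++ sketch. The kernel `g` is assumed to be stored centered, so `g[k]` holds $g$ at offset `k - radius`. Samples of `f` outside the array are treated as zero. The function name `Convolve1D` is only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Discrete (f * g)(x) = sum over t of f(t) g(x - t).
// g is stored centered: g[k] holds the kernel value at offset k - radius.
// Samples of f outside [0, f.size()) are treated as zero.
std::vector<double> Convolve1D(const std::vector<double>& f, const std::vector<double>& g) {
    const long n = static_cast<long>(f.size());
    const long taps = static_cast<long>(g.size());
    const long radius = taps / 2;
    std::vector<double> out(f.size(), 0.0);
    for (long x = 0; x < n; ++x) {
        double sum = 0.0;
        for (long k = 0; k < taps; ++k) {
            long t = x - (k - radius);   // k - radius is the kernel offset x - t
            if (t >= 0 && t < n) {
                sum += f[static_cast<std::size_t>(t)] * g[static_cast<std::size_t>(k)];
            }
        }
        out[static_cast<std::size_t>(x)] = sum;
    }
    return out;
}
```

Convolving an impulse reproduces the kernel. This gives a quick check of the $g(x-t)$ orientation: an impulse at index 2 convolved with `{1, 2, 3}` gives `{0, 1, 2, 3, 0}`.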

This is exactly the use case for preconvolving environment maps by the reflected GGX distribution. GGX is the convolving function and the environment map is the function being convolved. The reflection vector direction used to sample the preconvolved environment map is the "translation". As we have already seen, SGs are very simple to multiply and integrate, so precomputing their convolution often doesn't save much. Convolving does have its uses, though, so let's see how to do it.

Convolving SGs

The convolution of two SGs is not closed in the SG basis, meaning it does not result in exactly a SG. Fortunately it can be well approximated by one. [7] gave a simple approximation that is fairly accurate so long as the lobe sharpnesses aren't very low: \begin{equation} \begin{aligned} \left(G_1 * G_2\right) \left( \vv \right) &= \int_{S^2} G_1(\omegav) G_2\left( \omegav; \vv, \lambda_2, a_2 \right) \,d\omegav \\ &\approx G \left( \vv; \muv_1, \frac{\lambda_1 \lambda_2}{\lambda_1 + \lambda_2}, 2\pi\frac{a_1 a_2}{\lambda_1 + \lambda_2} \right) \end{aligned} \label{eq:convolve_sg} \end{equation} The first line of the equation above may shed more light on how we can use this if it isn't clear already. This is identical to the inner product but with $\muv_2$ replaced with a free parameter.
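Here is a quick numerical check of that approximation, sketched in C++. For a convolution centered on $\muv_1$, the result depends only on the angle $\theta$ between $\vv$ and $\muv_1$. Taking that angle as the input is a simplification of mine. The "exact" value is the inner product of $G_1$ with $G_2$ re-centered on $\vv$. It uses $d = \|\lambda_1\muv_1 + \lambda_2\vv\|$ and the stable exponential form. For sharp lobes the two agree to within a few percent.

```cpp
#include <cassert>
#include <cmath>

const double kPi = 3.14159265358979323846;

// Approximation from [7]: (G1 * G2)(v) ~= G(v; mu1, l1 l2 / (l1 + l2), 2 pi a1 a2 / (l1 + l2)),
// evaluated at angle theta between v and mu1.
double ConvolveSGApprox(double theta, double lambda1, double a1, double lambda2, double a2) {
    double lambda = lambda1 * lambda2 / (lambda1 + lambda2);
    double amplitude = 2.0 * kPi * a1 * a2 / (lambda1 + lambda2);
    return amplitude * std::exp(lambda * (std::cos(theta) - 1.0));
}

// Exact value: the SG inner product of G1 with G2 centered on v, using
// d = |lambda1 mu1 + lambda2 v| and the stable exponential form.
double ConvolveSGExact(double theta, double lambda1, double a1, double lambda2, double a2) {
    double lambdaM = lambda1 + lambda2;
    double d = std::sqrt(lambda1 * lambda1 + lambda2 * lambda2 +
                         2.0 * lambda1 * lambda2 * std::cos(theta));
    assert(d > 0.0);
    return 2.0 * kPi * a1 * a2 * (std::exp(d - lambdaM) - std::exp(-d - lambdaM)) / d;
}
```

With $\lambda_1 = \lambda_2 = 20$ the relative difference is about 1% at $\theta = 0.4$. The agreement degrades as the sharpnesses drop, which matches the caveat above.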

For the case of normal map filtering we don't care about amplitude. We want a normalized SG. That means for this case the only part that matters is the convolved $\lambda'$: \begin{equation} \lambda' = \frac{\lambda_1 \lambda_2}{\lambda_1 + \lambda_2} \end{equation} We can ignore the rest of eq \eqref{eq:convolve_sg} for the moment. If we replace each $\lambda$ with its $\alpha$ using eq \eqref{eq:lambda_to_roughness}, we get a nice simple equation: \begin{equation} \alpha' = \sqrt{ \alpha_1^2 + \alpha_2^2 } \end{equation} Leave $\lambda$ in for one of them and we get: \begin{equation} \alpha' = \sqrt{ \alpha^2 + \frac{2}{\lambda} } \label{eq:alpha_prime} \end{equation} This looks just like what [6] used, except for the factor of 2. I believe this is a mistake in their course notes. In equation (37) of their notes they have it as 1/2, and in the code sample it is 1. I think the confusion comes from the Frequency Domain Normal Map Filtering paper working with Torrance-Sparrow and not Cook-Torrance, and $\sigma \neq \alpha$. Overall, their version adds less roughness from normal variance. In my tests, eq \eqref{eq:alpha_prime> as we just derived it looks closer to Toksvig results. With their factor the range is off and the result is less rough. MJP uses the same $\frac{2}{\alpha^2}$ for SG sharpness in his blog post, so we don't disagree there.

As a gut check, if $\alpha=0$ and all final roughness comes from normal variation, then $\alpha'=\sqrt{\frac{2}{\lambda}}$. This is what we established in eq \eqref{eq:lambda_to_roughness}. If there is no normal variation, then $\lambda$ is infinite and computing it directly divides by zero. If you calculate InvLambda like I did in the code snippet, though, the second term becomes zero and $\alpha'=\alpha$, which is what we want.
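Here is a sketch of that combination in C++. It assumes the filter region's normals are given as unit vectors and averaged with equal weights. The vMF fit uses the same reciprocal-of-$\lambda$ calculation as the shader snippet. As a result, a region with no normal variation leaves $\alpha$ unchanged instead of dividing by zero. The function name `ConvolveRoughness` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };

// alpha' = sqrt(alpha^2 + 2/lambda), where lambda is the sharpness of the vMF fit to
// the (unit) normals in the filter region. 1/lambda is computed directly, as in the
// shader snippet, so identical normals give 1/lambda = 0 and alpha' = alpha.
double ConvolveRoughness(double alpha, const std::vector<Vec3>& normals) {
    assert(!normals.empty());
    Vec3 r{0, 0, 0};
    for (const Vec3& n : normals) { r.x += n.x; r.y += n.y; r.z += n.z; }
    const double inv = 1.0 / static_cast<double>(normals.size());
    double r2 = (r.x * r.x + r.y * r.y + r.z * r.z) * inv * inv;   // |average normal|^2
    r2 = std::min(std::max(r2, 1e-8), 1.0);
    double invLambda = (1.0 - r2) / (std::sqrt(r2) * (3.0 - r2));
    return std::sqrt(alpha * alpha + 2.0 * invLambda);
}
```

With $\alpha = 0$ the result reduces to $\sqrt{2/\lambda}$, the roughness of the normal distribution alone. This matches the gut check above.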

Next up, converting from SH to SG (coming soon).

References

[1] Wang et al. 2007, "All-Frequency Rendering of Dynamic, Spatially-Varying Reflectance"
[2] Toksvig 2004, "Mipmapping Normal Maps"
[3] Olano et al. 2010, "LEAN Mapping"
[4] Hill 2011, "Specular Showdown in the Wild West"
[5] Han et al. 2007, "Frequency Domain Normal Map Filtering"
[6] Neubelt et al. 2013, "Crafting a Next-Gen Material Pipeline for The Order: 1886"
[7] Iwasaki 2012, "Interactive Bi-scale Editing of Highly Glossy Materials"

von Mises-Fisher (part 2)

2018-05-12T12:44:00.000-05:00

$$ \newcommand{\vv}{\mathbf{v}} \newcommand{\rv}{\mathbf{r}} \newcommand{\muv}{\boldsymbol\mu} \newcommand{\mudotv}{\muv\cdot\vv} $$ A normalized SG has the same equation as the probability density function for a von Mises-Fisher (vMF) distribution on the sphere in 3 dimensions. This affords us a few more tools and applications to work with. A vMF distribution can be defined for any dimension. I'll focus on 3D here because it is the most widely usable for computer graphics and simplifies the discussion. Because a vMF does not have a free amplitude parameter, it is written as: \begin{equation} \begin{aligned} V(\vv;\muv,\lambda) = \frac{\lambda}{ 2\pi \left( 1 - e^{-2 \lambda} \right) } e^{\lambda(\mudotv - 1)} \end{aligned} \label{eq:vmf} \end{equation} The more common form you will likely see in the literature is this: \begin{equation} \begin{aligned} V(\vv;\muv,\lambda) = \frac{\lambda}{ 4\pi \sinh(\lambda) } e^{\lambda(\mudotv)} \end{aligned} \label{eq:vmf_sinh} \end{equation} It is equivalent due to the identity \begin{equation} \begin{aligned} \sinh(x) = \frac{ 1 - e^{-2x} }{ 2e^{-x} } \end{aligned} \label{eq:sinh_identity} \end{equation} As explained by [2], the form in eq \eqref{eq:vmf} is more numerically stable, so it should be used in practice. Its exponent $\lambda(\mudotv - 1)$ is never positive, while $\sinh(\lambda)$ and $e^{\lambda(\mudotv)}$ in eq \eqref{eq:vmf_sinh} overflow for large $\lambda$.
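Here are the two forms side by side in C++, along with a brute-force check that the density integrates to 1 over the sphere. Using `std::expm1` for $1 - e^{-2\lambda}$ is my own choice, made to keep small $\lambda$ accurate; the formula does not require it. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

const double kPi = 3.14159265358979323846;

// vMF density as a function of c = dot(mu, v), in the stable form.
// -expm1(-2 lambda) computes 1 - e^(-2 lambda) without cancellation for small lambda.
double VmfStable(double c, double lambda) {
    return lambda / (2.0 * kPi * -std::expm1(-2.0 * lambda)) * std::exp(lambda * (c - 1.0));
}

// The common sinh form; sinh(lambda) and e^(lambda c) overflow for lambda above ~710.
double VmfSinh(double c, double lambda) {
    return lambda / (4.0 * kPi * std::sinh(lambda)) * std::exp(lambda * c);
}

// Midpoint-rule integral over the sphere; the density only depends on theta.
double IntegrateVmf(double lambda) {
    const int steps = 20000;
    const double dTheta = kPi / steps;
    double sum = 0.0;
    for (int i = 0; i < steps; ++i) {
        double theta = (i + 0.5) * dTheta;
        sum += VmfStable(std::cos(theta), lambda) * std::sin(theta);
    }
    return sum * dTheta * 2.0 * kPi;   // times the integral over phi
}
```

At $\lambda = 5$ the two forms agree to rounding error. At $\lambda = 1000$ the $\sinh$ form evaluates to $0 \cdot \infty$, which is NaN, while the stable form returns a finite peak value.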

Compare the equation for a vMF to the equation for a SG and it is easy to see that: \begin{equation} \begin{aligned} V(\vv;\muv,\lambda) = G\left( \vv; \muv, \lambda, \frac{\lambda}{ 2\pi \left( 1 - e^{-2 \lambda} \right) } \right) \end{aligned} \label{eq:vmf_to_sg} \end{equation} That means a vMF is equivalent to a normalized SG and by moving terms from one side to the other we can show that a SG is equivalent to a scaled vMF. \begin{equation} \begin{aligned} G\left( \vv; \muv, \lambda, a \right) = \frac{2\pi a}{\lambda} \left( 1 - e^{-2 \lambda} \right) V(\vv;\muv,\lambda) \end{aligned} \label{eq:sg_to_vmf} \end{equation}

Fitting a vMF distribution to data

Fitting a vMF distribution to directions or points on a sphere is very similar to fitting a normal distribution to points on a line. For a normal distribution, one calculates the mean and variance of the data set. One then chooses the normal distribution with the same mean and variance as the best fit to the data.

For the vMF distribution the mean direction and spherical variance are used. Calculating these properties for a set of directions is simple. \begin{equation} \begin{aligned} \rv = \frac{1}{n}\sum_{i=1}^{n} \textbf{x}_i \end{aligned} \label{eq:r_average} \end{equation} where $\textbf{x}_1, \textbf{x}_2, ..., \textbf{x}_n$ are a set of unit vectors.

Often values are associated with these directions, so instead of taking a simple average we can take a weighted average. \begin{equation} \begin{aligned} \rv = \frac{\sum_{i=1}^{n} \textbf{x}_i w_i}{\sum_{i=1}^{n} w_i} \end{aligned} \label{eq:r_weighted_average} \end{equation} From $\rv$ we get the two properties: the mean direction $\muv = \frac{\rv}{\|\rv\|}$ and the spherical variance $\sigma^2 = 1 - \|\rv\|$. To fit a vMF distribution to the data, we need to know what these properties are for the vMF distribution. The vMF distribution is convex, circularly symmetric about its axis, and has its maximum in the direction of $\muv$. It is therefore fairly obvious that the mean direction will be $\muv$, so I won't derive that here.
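Here is a minimal C++ sketch of the weighted fit. It assumes the input directions are unit vectors and the weights are positive. It returns the mean direction and spherical variance defined above. `FitDirections` and `DirectionStats` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

struct DirectionStats {
    Vec3 meanDirection;        // mu = r / |r|
    double sphericalVariance;  // sigma^2 = 1 - |r|
};

// Weighted average r of unit vectors x_i, then the two properties derived from it.
DirectionStats FitDirections(const std::vector<Vec3>& x, const std::vector<double>& w) {
    assert(x.size() == w.size() && !x.empty());
    Vec3 r{0, 0, 0};
    double wSum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        r.x += x[i].x * w[i];
        r.y += x[i].y * w[i];
        r.z += x[i].z * w[i];
        wSum += w[i];
    }
    r = {r.x / wSum, r.y / wSum, r.z / wSum};
    double len = std::sqrt(r.x * r.x + r.y * r.y + r.z * r.z);
    assert(len > 0.0);   // directions that cancel out have no mean direction
    return {{r.x / len, r.y / len, r.z / len}, 1.0 - len};
}
```

Identical directions give zero spherical variance. Two perpendicular directions with equal weight give a variance of $1 - \sqrt{1/2} \approx 0.29$.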

The spherical variance $\sigma^2$, on the other hand, is a bit more involved. Because we already know the direction of $\rv$ is $\muv$, we can simplify this calculation to the integral of the projection of the function onto $\muv$. \begin{equation} \begin{split} \|\rv\| &= \int_{S^2} V(\vv;\muv,\lambda) (\mudotv) d\vv \\ &= \frac{\lambda}{ 4\pi \sinh(\lambda) } \int_{S^2} e^{\lambda(\mudotv)} (\mudotv) d\vv \\ \end{split} \end{equation} Because the integral over the sphere is rotation-invariant, we can align $\muv$ with the z-axis, so that $\mudotv = \cos\theta$ in spherical coordinates. \begin{equation} \begin{split} &= \frac{\lambda}{ 4\pi \sinh(\lambda) } \int_{0}^{2 \pi} \int_{0}^{\pi} e^{\lambda\cos\theta} \cos\theta\sin\theta d\theta d\phi \\ &= \frac{\lambda}{ 4\pi \sinh(\lambda) } 2 \pi \int_{0}^{\pi} e^{\lambda\cos\theta} \cos\theta\sin\theta d\theta \\ \end{split} \end{equation} Substituting $t=-\cos\theta$ and $dt=\sin\theta d\theta$: \begin{equation} \begin{split} &= \frac{\lambda}{ 2 \sinh(\lambda) } \int_{-1}^{1} -t e^{-\lambda t} dt \\ &= \frac{\lambda}{ 2 \sinh(\lambda) } \left( \frac{ 2 \lambda \cosh(\lambda) - 2 \sinh(\lambda) }{ \lambda^2 } \right) \\ &= \frac{\cosh(\lambda)}{ \sinh(\lambda) } - \frac{\sinh(\lambda)}{ \lambda \sinh(\lambda) } \\ \end{split} \end{equation} This arrives at its final form: \begin{equation} \|\rv\| = \coth(\lambda)-\frac{1}{\lambda} \label{eq:r_length} \end{equation} Although simple in form, this function has no closed-form inverse, even though it increases monotonically with $\lambda$. [1] provides an approximation that is close enough for our purposes. \begin{equation} \begin{aligned} \lambda &\approx \|\rv\| \frac{ 3 - \|\rv\|^2}{1 - \|\rv\|^2} \end{aligned} \end{equation} We now have a way to calculate the mean and spherical variance for a data set, and we know the corresponding vMF mean and spherical variance, so we can fit a vMF to the data set.
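The following C++ sketch compares the approximation from [1] against an exact numerical inverse found by bisection. Bisection works here because $\coth(\lambda) - \frac{1}{\lambda}$ increases monotonically. Over $\lambda$ from 0.5 to 100, the approximation stays within about 5% of the true sharpness. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// |r| of a vMF with sharpness lambda: coth(lambda) - 1/lambda.
double RLengthFromLambda(double lambda) {
    assert(lambda > 0.0);
    return 1.0 / std::tanh(lambda) - 1.0 / lambda;
}

// Approximate inverse from [1]: lambda ~= |r| (3 - |r|^2) / (1 - |r|^2).
double LambdaFromRLength(double rLen) {
    assert(rLen > 0.0 && rLen < 1.0);
    double r2 = rLen * rLen;
    return rLen * (3.0 - r2) / (1.0 - r2);
}

// Reference inverse by bisection; valid because |r| increases monotonically with lambda.
double LambdaFromRLengthBisect(double rLen) {
    double lo = 1e-6, hi = 1e6;
    for (int i = 0; i < 200; ++i) {
        double mid = 0.5 * (lo + hi);
        if (RLengthFromLambda(mid) < rLen) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}
```

The largest error in that range is near $\lambda = 5$, just under 5%. Bisection is far too slow for a shader, but it is handy for judging the approximation offline.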

Using eq \eqref{eq:r_weighted_average} to calculate $\rv$, the vMF fit to that data is \begin{equation} V\left( \vv; \frac{\rv}{\|\rv\|},\|\rv\| \frac{ 3 - \|\rv\|^2}{1 - \|\rv\|^2} \right) \label{eq:r_to_vmf} \end{equation}
Going the other direction, from $V(\vv;\muv,\lambda)$ form to $\rv$ form, uses eq \eqref{eq:r_length}: \begin{equation} \rv = \left( \coth(\lambda)-\frac{1}{\lambda} \right) \muv \label{eq:vmf_to_r} \end{equation}

Addition of SGs

We now have a way to convert to and from $\rv$ form. $\rv$ is linearly filterable, as its definition as a weighted average in eq \eqref{eq:r_weighted_average} shows. This means that if our vMFs represent spherical distributions of something, then a weighted sum of those distributions can be approximately fit by another vMF. In other words, we can approximate the resulting distribution by converting to $\rv$ form, filtering, and then converting back to traditional $V(\vv;\muv,\lambda)$ form.

By using the weighted average in eq \eqref{eq:r_weighted_average}, we can apply this concept to non-normalized SGs too. This allows us not just to filter (i.e. sum with a total weight of 1) but to add as well. As shown in eq \eqref{eq:sg_to_vmf}, a non-normalized SG is a scaled vMF. We can use this scale as the weight when summing, and use the total weight as the final scale for the summed SG.

This is the $\rv$ form for $G(\vv;\muv,\lambda, a)$. It includes an additional weight value, which you can think of as the energy this SG adds to the sum. It is exactly the SG's integral over the sphere: \begin{equation} \begin{aligned} \rv_i &= \left( \coth(\lambda_i)-\frac{1}{\lambda_i} \right) \muv_i \\ w_i &= \frac{2\pi a_i}{\lambda_i} \left( 1 - e^{-2 \lambda_i} \right) \\ \end{aligned} \end{equation} This weight is then used in the weighted sum: \begin{equation} \begin{aligned} \rv &= \frac{\sum_{i=1}^{n} \rv_i w_i}{\sum_{i=1}^{n} w_i} \\ w &= \sum_{i=1}^{n} w_i \\ \end{aligned} \end{equation} Using eq \eqref{eq:r_to_vmf} and eq \eqref{eq:vmf_to_sg}, we can convert back to a scaled vMF and finally to an SG in $G(\vv;\muv,\lambda, a)$ form: \begin{equation} \begin{aligned} G\left( \vv; \muv, \lambda, a \right) &= w V(\vv;\muv,\lambda) \\ &= G\left( \vv; \muv, \lambda, w \frac{\lambda}{ 2\pi \left( 1 - e^{-2 \lambda} \right) } \right) \end{aligned} \end{equation} Addition and filtering are approximate, but they can still be useful. The accuracy of the result depends heavily on the angles between the $\muv$ vectors, or lobe axes. Adding sharp lobes pointed in different directions will result in a single wide lobe.
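Putting the pieces together, here is a C++ sketch of approximate SG addition through $\rv$ form. Each lobe's weight is its integral over the sphere. As a result, the total energy of the sum is preserved exactly, while direction and sharpness are approximate. The `SG` struct and the function names are my own.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

const double kPi = 3.14159265358979323846;

struct Vec3 { double x, y, z; };
struct SG { Vec3 mu; double lambda; double a; };   // G(v; mu, lambda, a), mu a unit vector

// Integral over the sphere: the weight w_i this lobe contributes to the sum.
double SGIntegral(const SG& g) {
    return 2.0 * kPi * g.a / g.lambda * (1.0 - std::exp(-2.0 * g.lambda));
}

// Approximate sum of SGs: weighted average in r form, then back to G(v; mu, lambda, a).
SG AddSGs(const std::vector<SG>& lobes) {
    Vec3 r{0, 0, 0};
    double w = 0.0;
    for (const SG& g : lobes) {
        double wi = SGIntegral(g);
        double rLen = 1.0 / std::tanh(g.lambda) - 1.0 / g.lambda;   // coth(lambda) - 1/lambda
        r.x += g.mu.x * rLen * wi;
        r.y += g.mu.y * rLen * wi;
        r.z += g.mu.z * rLen * wi;
        w += wi;
    }
    r = {r.x / w, r.y / w, r.z / w};
    double len = std::sqrt(r.x * r.x + r.y * r.y + r.z * r.z);
    double lambda = len * (3.0 - len * len) / (1.0 - len * len);      // approximate inverse
    double a = w * lambda / (2.0 * kPi * (1.0 - std::exp(-2.0 * lambda)));
    return {{r.x / len, r.y / len, r.z / len}, lambda, a};
}
```

Adding two sharp ($\lambda = 20$) lobes 90 degrees apart gives a lobe of sharpness around 3.5. It leans toward the stronger input, illustrating the "single wide lobe" caveat.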

Next, what we can use this for:
Normal map filtering using vMF

References

[1] Banerjee et al. 2005, "Clustering on the Unit Hypersphere using von Mises-Fisher Distributions"
[2] Jakob 2012, "Numerically stable sampling of the von Mises Fisher distribution on S2 (and other tricks)"

Spherical Gaussians (part 1)

2018-05-12T12:42:00.000-05:00

$$ \newcommand{\vv}{\mathbf{v}} \newcommand{\rv}{\mathbf{r}} \newcommand{\muv}{\boldsymbol\mu} \newcommand{\mudotv}{\muv\cdot\vv} $$ A Spherical Gaussian (SG) is a function of unit vector $\vv$ and is defined as \begin{equation} G(\vv;\muv,\lambda, a) = a e^{\lambda(\mudotv - 1)} \end{equation} where unit vector $\muv$, scalar $\lambda$, and scalar $a$ represent the lobe axis, lobe sharpness, and lobe amplitude of the SG, respectively.

The formula can be read as evaluating a SG in the direction of $\vv$ where the SG has the parameters of $\muv,\lambda, a$. An abbreviated notation $G(\vv)$ can be used instead when the parameters can be assumed. Often the more verbose notation is used to assign values to the parameters.

SGs have a number of nice properties including simple equations for a number of common operations.

Product of Two SGs

The product of two SGs can be represented exactly as another SG. This product is sometimes referred to as the vector product. This formula was first properly given in [1] (it was shown earlier, but in a non-normalized form).

Let $\lambda_m = \lambda_1 + \lambda_2$ and let $\muv_m = \frac{\lambda_1\muv_1 + \lambda_2\muv_2}{\lambda_1 + \lambda_2}$, then \begin{equation} \begin{split} G_1(\vv)G_2(\vv) = G\left(\vv; \frac{\muv_m}{\|\muv_m\|}, \lambda_m\|\muv_m\|, a_1 a_2 e^{\lambda_m\left(\|\muv_m\| - 1\right)}\right) \end{split} \label{eq:sg_product} \end{equation}
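Here is a small C++ sketch of SG evaluation and the product formula above. It is checked by comparing the product SG against the pointwise product of its inputs in several directions. The `SG` struct layout and the function names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct SG { Vec3 mu; double lambda; double a; };   // G(v; mu, lambda, a)

// G(v; mu, lambda, a) = a e^(lambda (mu . v - 1)), with v and mu unit vectors.
double Evaluate(const SG& g, Vec3 v) {
    return g.a * std::exp(g.lambda * (dot(g.mu, v) - 1.0));
}

// Product of two SGs, which is exactly another SG.
// Undefined when lambda1 mu1 = -lambda2 mu2 (mu_m has zero length).
SG Multiply(const SG& g1, const SG& g2) {
    double lambdaM = g1.lambda + g2.lambda;
    Vec3 muM = {(g1.lambda * g1.mu.x + g2.lambda * g2.mu.x) / lambdaM,
                (g1.lambda * g1.mu.y + g2.lambda * g2.mu.y) / lambdaM,
                (g1.lambda * g1.mu.z + g2.lambda * g2.mu.z) / lambdaM};
    double len = std::sqrt(dot(muM, muM));
    return {{muM.x / len, muM.y / len, muM.z / len},
            lambdaM * len,
            g1.a * g2.a * std::exp(lambdaM * (len - 1.0))};
}
```

Because the product is exact, the comparison agrees to rounding error in every direction, not just near the lobe axes.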

Raising to a power

Given that the product of two SGs is another SG it shouldn't be much of a surprise that a SG raised to a power can be expressed exactly as another SG: \begin{equation} \begin{aligned} G(\vv)^n &= G(\vv; \muv,n\lambda, a^n) \end{aligned} \label{eq:sg_power} \end{equation}

Integration Over The Sphere

The integral of a SG over the sphere has a closed form solution.

[2] showed that the integral was: \begin{equation} \int_{S^2}G(\vv) d\vv = 2 \pi \frac{a}{\lambda} \left( 1 - e^{-2\lambda} \right) \label{eq:sg_integral} \end{equation}

Inner product

The inner product is defined as the integral over the sphere of the product of two SGs. We can already find the product of two SGs and integrate an SG over the sphere. Putting those together, with $\lambda_m$ and $\muv_m$ as defined for the product, we have: \begin{equation} \int_{S^2}G_1(\vv) G_2(\vv) d\vv = \frac{4 \pi a_1 a_2}{e^{\lambda_m}} \frac{ \sinh\left(\lambda_m\|\muv_m\| \right) }{ \lambda_m\|\muv_m\| } \label{eq:sg_inner_product_sinh} \end{equation} This equation has numerical precision issues when evaluated with floating point arithmetic: for sharp lobes, both $\sinh$ and $e^{\lambda_m}$ overflow. An alternative form that is more stable is the following: \begin{equation} \int_{S^2}G_1(\vv) G_2(\vv) d\vv = 2 \pi a_1 a_2 \frac{ e^{ \lambda_m\|\muv_m\| - \lambda_m } - e^{ -\lambda_m\|\muv_m\| - \lambda_m } }{ \lambda_m\|\muv_m\| } \label{eq:sg_inner_product_exp} \end{equation}
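Here is a C++ sketch of the stable inner product. It is written in terms of $d = \|\lambda_1\muv_1 + \lambda_2\muv_2\|$, which equals $\lambda_m\|\muv_m\|$ with $\muv_m$ as in the product formula. It is checked against brute-force integration over the sphere. The $\sinh$ form is included too, to show it returning NaN for very sharp lobes. The names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

const double kPi = 3.14159265358979323846;

struct Vec3 { double x, y, z; };
struct SG { Vec3 mu; double lambda; double a; };   // G(v; mu, lambda, a)

// d = |lambda1 mu1 + lambda2 mu2|, which equals lambda_m |mu_m|.
static double SharpnessOfSum(const SG& g1, const SG& g2) {
    double x = g1.lambda * g1.mu.x + g2.lambda * g2.mu.x;
    double y = g1.lambda * g1.mu.y + g2.lambda * g2.mu.y;
    double z = g1.lambda * g1.mu.z + g2.lambda * g2.mu.z;
    return std::sqrt(x * x + y * y + z * z);
}

// Stable exponential form of the inner product.
double InnerProduct(const SG& g1, const SG& g2) {
    double d = SharpnessOfSum(g1, g2), lambdaM = g1.lambda + g2.lambda;
    return 2.0 * kPi * g1.a * g2.a * (std::exp(d - lambdaM) - std::exp(-d - lambdaM)) / d;
}

// sinh form, kept only to show that it overflows for sharp lobes.
double InnerProductSinh(const SG& g1, const SG& g2) {
    double d = SharpnessOfSum(g1, g2), lambdaM = g1.lambda + g2.lambda;
    return 4.0 * kPi * g1.a * g2.a / std::exp(lambdaM) * std::sinh(d) / d;
}

// Brute force: midpoint rule over (theta, phi) of G1(v) G2(v).
double InnerProductNumeric(const SG& g1, const SG& g2) {
    const int nTheta = 1000, nPhi = 2000;
    const double dTheta = kPi / nTheta, dPhi = 2.0 * kPi / nPhi;
    double sum = 0.0;
    for (int i = 0; i < nTheta; ++i) {
        double theta = (i + 0.5) * dTheta;
        for (int j = 0; j < nPhi; ++j) {
            double phi = (j + 0.5) * dPhi;
            Vec3 v = {std::sin(theta) * std::cos(phi), std::sin(theta) * std::sin(phi), std::cos(theta)};
            double e1 = g1.lambda * (g1.mu.x * v.x + g1.mu.y * v.y + g1.mu.z * v.z - 1.0);
            double e2 = g2.lambda * (g2.mu.x * v.x + g2.mu.y * v.y + g2.mu.z * v.z - 1.0);
            sum += g1.a * g2.a * std::exp(e1 + e2) * std::sin(theta);
        }
    }
    return sum * dTheta * dPhi;
}
```

For two aligned lobes of sharpness 400, $e^{\lambda_m}$ and $\sinh(d)$ both overflow to infinity, and the $\sinh$ form returns NaN. The exponential form stays finite.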

Normalization

Although there are other definitions for normalization I use the term to mean having an integral over the sphere equal to 1. Normalizing a SG is a simple matter of dividing it by its integral over the sphere. \begin{equation} \begin{aligned} \frac{ G(\vv) }{ \int_{S^2}G(\vv) d\vv } = G\left( \vv; \muv, \lambda, \frac{\lambda}{ 2\pi \left( 1 - e^{-2 \lambda} \right) } \right) \end{aligned} \label{eq:sg_normalized} \end{equation} Notice that the original $a$ parameter canceled out. Instead lobe amplitude is derived purely from the lobe sharpness $\lambda$.
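Here is a C++ check of the integral and the normalization. The brute-force integral matches the closed form. An SG given the normalized amplitude integrates to 1, whatever amplitude it started with. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

const double kPi = 3.14159265358979323846;

// Closed-form integral of G(v; mu, lambda, a) over the sphere.
double SGIntegral(double lambda, double a) {
    return 2.0 * kPi * a / lambda * (1.0 - std::exp(-2.0 * lambda));
}

// Amplitude of the normalized SG; the original amplitude drops out.
double NormalizedAmplitude(double lambda) {
    return lambda / (2.0 * kPi * (1.0 - std::exp(-2.0 * lambda)));
}

// Midpoint-rule integral of a e^(lambda (cos(theta) - 1)) over the sphere.
double SGIntegralNumeric(double lambda, double a) {
    const int steps = 20000;
    const double dTheta = kPi / steps;
    double sum = 0.0;
    for (int i = 0; i < steps; ++i) {
        double theta = (i + 0.5) * dTheta;
        sum += a * std::exp(lambda * (std::cos(theta) - 1.0)) * std::sin(theta);
    }
    return sum * dTheta * 2.0 * kPi;   // times the integral over phi
}
```

Only $\theta$ is integrated numerically, because an SG is symmetric about its axis.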

These are all the common operations that have closed form solutions. So far nothing new here but hopefully it is helpful to have all these equations in a centralized place for reference. I didn't include derivations for any of these formulas. If readers think that would be useful to see maybe those could be added at a later date.

Special thanks to David Neubelt. Although this has been heavily modified from what we previously had I'm sure his touch is still present.

Now on to some less well covered concepts.
von Mises-Fisher (part 2)

References

[1] Wang et al. 2007, "All-Frequency Rendering of Dynamic, Spatially-Varying Reflectance"
[2] Tsai et al. 2006, "All-Frequency Precomputed Radiance Transfer using Spherical Radial Basis Functions and Clustered Tensor Approximation"

Spherical Gaussian series

2018-05-12T12:41:00.000-05:00

Intro and backstory

About 4 years ago now I ran into Spherical Gaussian (SG) math in a few different publications in a row, enough that it triggered the pattern detection in my brain. All were using SGs to approximate specular lobes. I remember feeling much the same 10 years prior, when Spherical Harmonics (SH) were starting to become all the rage. Back then I noticed SH fairly early, primarily from Tom Forsyth's slides on the topic. I took the time to dig into the math and make sure I had this new useful thing in my toolbox, and doing so has proven well worth the time. I decided that I should do the same again and learn SGs and related math, in particular to build up a toolbox of operations I can do with them. Maybe it would prove as useful as SH has.

In the years since I'd say it has certainly been worth the effort. I don't think I can say it has proven as useful as SH has been to computer graphics but it was still worth learning. I intended to write up what I had found and share the toolbox of equations compiled in a centralized place at the very least. Unfortunately laziness and procrastination got in the way.

Also scope creep. About 3 years ago I mentioned to David Neubelt that I intended to write this up. He too had done a lot of work with SGs at Ready at Dawn, so we decided we'd collaborate on a joint paper and submit it to JCGT. The intention was to make something similar to Stupid Spherical Harmonics (SH) Tricks, but for SG. That scope and seriousness is much larger than a simple single-author blog post. We worked on derivations for all the formulas. We also wanted quality solutions to cube map fitting, multiple use cases proven in production, and a ton of other things to make it an exhaustive, professional, and ultimately great paper. This was much more than I ever intended as a simple blog post. I still think the paper we had in mind would be great to have. In the end, though, the expectations grew beyond what either of us had the bandwidth, or maybe the attention span, to complete. After a couple of months of work, the unfinished paper regretfully stagnated.

A year went by, and eventually MJP of Ready at Dawn did in fact write a blog series on Spherical Gaussians. It is excellent, and I suggest you read it before continuing if you haven't already. There are still some things I intended to cover that he did not. He also covered things I never did and never planned to cover. So I think these should complement each other well.

It is a royal shame I have not posted this in the 3+ years since I intended to. I had even promised folks publicly a write up was coming and then didn't deliver. I haven't posted anything on this blog since then in fact. I hope to do better and hopefully finally getting this out will unclog the pipes.

Without further ado,

Part 1 - Spherical Gaussians
Part 2 - von Mises-Fisher
Part 3 - Normal map filtering using vMF
Part 4 - Coming soon...

UE4 available to all

2014-03-29T19:50:00.000-05:00

The big news from Epic at GDC this year was that Unreal Engine 4 is now available to everyone for only $19/month + 5% royalties.

There's been a lot said already about why this is cool, opinions, comparisons to competitors and so forth. From a business model point of view I feel it has been covered better than I could possibly say. If you are interested check out Mark Rein's post or hear it from the man himself Tim Sweeney for very compelling reasons why this is a good idea for us to provide and for developers to subscribe.

I wanted to give my own personal perspective as an engineer. What Epic just did is absolutely revolutionary and I am privileged to be part of it. I am super excited! Let me explain why.

If you've followed this blog for a while, you may remember I gave Epic huge props for releasing UDK back in 2009. That was great for artists and designers, but not that great for programmers. This is the next step down that path, and it's a doozy.

Now there have been naysayers that compare the numbers, 19/month + 5%: is that better or worse, is it new or has it been done before, and so on. They are missing the point. It isn't the price, and it isn't even the features. The revolution is this: for 1/3 the cost of a new video game, you can get access to the complete source code for our cutting edge game engine. The entire engine source code, every bit of it, is available to you for only $19. It is exactly the same code we use in house and provide to private licensees, except for XB1 and PS4 code we aren't legally able to give you due to NDAs. This is the same tech that will be powering many of the top AAA games developed this generation. As Mark put it, "Right now 7 of the top 21 (!) all-time highest rated Xbox 360 games (by Metacritic score) were powered by Unreal Engine 3". You damn well better bet we are going to do the same or better this time around with UE4.

Now you may say John Carmack and id have open sourced their engines many times. That is true, and I commend them for it. John has had a lasting impact on how many coders, myself included, write software due to these efforts. Unfortunately, the strict licensing terms have resulted in basically no commercial products using the id engines from that open source license. John has even lamented this recently: "It is self-serving at this point, but looking back, I wish I could have released all the Id code under a more permissive license. GPL never did really do anything for game code, and I do wonder whether it was a fundamental cultural incompatibility. GPL was probably the best that could have flown politically for the code releases -- posterity without copy-paste into competition." My own personal twist on this situation is that after Zenimax bought id, their engines were no longer commercially licensable. That meant anything built on top of that code base had to be scrapped. GPL is simply not an option for commercial games.

There are other examples of open sourced game engines or cheaply priced engine source code. They all have one thing in common: they were never up to date. The id engines were only open sourced years after the games that used them shipped, when id had already moved on to its next technology. For example, the engine that powered Doom 3 wasn't open sourced until 6 years after the game shipped. By that point Rage had even shipped. The reason for this is obvious. The point where releasing the code is non-threatening to execs is the point where a competitor can't use the code to their advantage. This means every single engine code base available to indie devs or students was either not commercially developed or purposely not competitive in its capabilities.

With the UE4 subscription, you can get complete source today that is totally up to date and will continue to be. This isn't any old engine; it's the same cutting edge technology that will be powering many of the top AAA games of this generation. As soon as we add new features, you'll get them. In fact, you'll even see the work in progress before it's done. How crazy is that?

Look, it's easy to brush this off as the guy who works for Epic telling me I should give him money. That isn't why I'm writing this. What I am most excited about by far is what I can share and give back to the gamedev community. That is the reason I have this blog, and it's the reason I have presented at conferences and will again in the future.

If you are a teacher, check out the license terms, they are ridiculously nice for schools. If you are a student, tell your school about it or grab a personal subscription.

For all those wanting to get into the game industry, the response you will hear from everyone is the same: make something. No one lands a game job because of great grades or fancy degrees. They get it by making something cool. I wrote my own engine, and that's how I landed my first job. Although I learned a ton, if I were in that place today I would modify UE4. There is literally nothing more applicable to an engine programming position than proving you can do the work. Hell, make something cool enough and we'll hire you!

I can't wait to see what people make with this. It's a great time to be a programmer!

Tone mapping

2013-12-15T15:30:00.000-06:00

When working with HDR values, two troublesome situations often arise.

The first happens when one tries to encode an HDR color using an encoding that has a limited range, for instance RGBM. Values outside the range still need to be handled gracefully, i.e. not clipped.

The second happens when an HDR signal is undersampled. One very bright sample can completely dominate the result. In path tracing these are commonly called fireflies.

In both cases the obvious solution is to reduce the range. This sounds exactly like tonemapping so break out those tone mapping operators, right? Well yes and no. Common tone mapping operators work on color channels individually. This has the downside of desaturating the colors which can look really bad if later operations attenuate the values, for instance reflections, glare, or DOF.

Instead I use a function that modifies only the luminance of the color. The simplest is this:

$$ T(color) = \frac{color}{ 1 + \frac{luma}{range} } $$
Where $T$ is the tone mapping function, $color$ is the color to be tone mapped, $luma$ is the luminance of $color$, and $range$ is the range that I wish to tone map into. If the encoding must fit RGB individually in range then $luma$ is the max RGB component.

Inverting this operation is just as easy, where $luma$ is now the luminance of the tone mapped color: $$ T_{inverse}(color) = \frac{color}{ 1 - \frac{luma}{range} } $$
This operation, when used to reduce fireflies, can also be thought of as a weighting function for each sample: $$ weight = \frac{1}{ 1 + luma } $$
For a weighted average, sum all weighted samples and divide by the summed weights. The result will be the same as if the samples had been tone mapped using $T$ with a $range$ of 1, averaged, then inverse tone mapped using $T_{inverse}$. It is exactly the same when $luma$ is a linear combination of the color channels.
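Here is a C++ sketch of the simple operator, its inverse, and the firefly weighting. The Rec. 709 luminance weights are an assumption. For any luma that is linear in the color, the weighted average equals tone mapping with $range = 1$, averaging, and inverse tone mapping. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Color { double r, g, b; };

// Rec. 709 luminance weights; an assumption, any luma linear in the color behaves the same.
double Luma(Color c) { return 0.2126 * c.r + 0.7152 * c.g + 0.0722 * c.b; }
Color Scale(Color c, double s) { return {c.r * s, c.g * s, c.b * s}; }

// T(color) = color / (1 + luma / range)
Color ToneMap(Color c, double range) { return Scale(c, 1.0 / (1.0 + Luma(c) / range)); }

// T_inverse(color) = color / (1 - luma / range), with luma of the tone mapped color.
Color ToneMapInverse(Color c, double range) { return Scale(c, 1.0 / (1.0 - Luma(c) / range)); }

// Weighted average with weight = 1 / (1 + luma) per sample.
Color WeightedAverage(const std::vector<Color>& samples) {
    Color sum{0, 0, 0};
    double weightSum = 0.0;
    for (Color c : samples) {
        double w = 1.0 / (1.0 + Luma(c));
        sum = {sum.r + c.r * w, sum.g + c.g * w, sum.b + c.b * w};
        weightSum += w;
    }
    return Scale(sum, 1.0 / weightSum);
}
```

Consider a firefly of luma around 400 among two dim samples. With the weighted average, the result stays close to the dim samples, below 2 in each channel. A plain average would be over 100.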

If a more expensive function is acceptable then keeping more of the color range linear is best. To do this use the functions below where 0 to $a$ is linear and $a$ to $b$ is tone mapped. $$ T(color) = \left\{ \begin{array}{l l} color & \quad \text{if $luma \leq a$}\\ \frac{color}{luma} \left( \frac{ a^2 - b*luma }{ 2a - b - luma } \right) & \quad \text{if $luma \gt a$} \end{array} \right. $$ $$ T_{inverse}(color) = \left\{ \begin{array}{l l} color & \quad \text{if $luma \leq a$}\\ \frac{color}{luma} \left( \frac{ a^2 - ( 2a - b )luma }{ b - luma } \right) & \quad \text{if $luma \gt a$} \end{array} \right. $$
These are the same as the first two functions if $a=0$ and $b=range$.
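Here is the piecewise version in C++. It assumes $0 \le a < b$ and Rec. 709 luminance; the luminance is my choice, and max RGB would also work. The inverse is only valid for tone mapped inputs, whose luma stays below $b$. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Color { double r, g, b; };

double Luma(Color c) { return 0.2126 * c.r + 0.7152 * c.g + 0.0722 * c.b; }
Color Scale(Color c, double s) { return {c.r * s, c.g * s, c.b * s}; }

// Linear up to luma a, then compressed so that luma approaches b.
Color ToneMapRange(Color c, double a, double b) {
    assert(a >= 0.0 && a < b);
    double luma = Luma(c);
    if (luma <= a) return c;
    return Scale(c, (a * a - b * luma) / ((2.0 * a - b - luma) * luma));
}

// Inverse; luma here is that of the tone mapped color, which lies below b.
Color ToneMapRangeInverse(Color c, double a, double b) {
    assert(a >= 0.0 && a < b);
    double luma = Luma(c);
    if (luma <= a) return c;
    return Scale(c, (a * a - (2.0 * a - b) * luma) / ((b - luma) * luma));
}
```

Both branches meet at $luma = a$, so the curve is continuous. With $a = 0$ the scale factor becomes $\frac{b}{b + luma}$, which is the simple operator with $range = b$.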

I have used these methods for lightmap encoding, environment map encoding, fixed point bloom, screen space reflections, path tracing, and more.

Specular BRDF Reference

2013-08-03T21:16:00.000-05:00

$$ \newcommand{\nv}{\mathbf{n}} \newcommand{\lv}{\mathbf{l}} \newcommand{\vv}{\mathbf{v}} \newcommand{\hv}{\mathbf{h}} \newcommand{\mv}{\mathbf{m}} \newcommand{\rv}{\mathbf{r}} \newcommand{\ndotl}{\nv\cdot\lv} \newcommand{\ndotv}{\nv\cdot\vv} \newcommand{\ndoth}{\nv\cdot\hv} \newcommand{\ndotm}{\nv\cdot\mv} \newcommand{\vdoth}{\vv\cdot\hv} $$ While I worked on our new shading model for UE4, I tried many different options for our specular BRDF. Specifically, I tried many different terms for the Cook-Torrance microfacet specular BRDF: $$ f(\lv, \vv) = \frac{D(\hv) F(\vv, \hv) G(\lv, \vv, \hv)}{4(\ndotl)(\ndotv)} $$ Directly comparing different terms requires being able to swap them while still using the same input parameters. I thought it might be a useful reference to put these all in one place, using the same symbols and the same inputs. I will use the same form as Naty [1], so please look there for background and theory. I'd like to keep this as a living reference, so if you have useful additions or suggestions, let me know.

First let me define alpha that will be used for all following equations using UE4's roughness: $$ \alpha = roughness^2 $$

Normal Distribution Function (NDF)

The NDF, also known as the specular distribution, describes the distribution of microfacets for the surface. It is normalized [12] such that: $$ \int_\Omega D(\mv) (\ndotm) d\omega_m = 1 $$ It is interesting to notice that all models have $\frac{1}{\pi \alpha^2}$ for the normalization factor in the isotropic case.

Blinn-Phong [2]: $$ D_{Blinn}(\mv) = \frac{1}{ \pi \alpha^2 } (\ndotm)^{ \left( \frac{2}{ \alpha^2 } - 2 \right) } $$ This is not the common form but follows when $power = \frac{2}{ \alpha^2 } - 2$.

Beckmann [3]: $$ D_{Beckmann}(\mv) = \frac{1}{ \pi \alpha^2 (\ndotm)^4 } \exp{ \left( \frac{(\ndotm)^2 - 1}{\alpha^2 (\ndotm)^2} \right) } $$

GGX (Trowbridge-Reitz) [4]: $$ D_{GGX}(\mv) = \frac{\alpha^2}{\pi((\ndotm)^2 (\alpha^2 - 1) + 1)^2} $$

GGX Anisotropic [5]: $$ D_{GGXaniso}(\mv) = \frac{1}{\pi \alpha_x \alpha_y} \frac{1}{ \left( \frac{(\mathbf{x} \cdot \mv)^2}{\alpha_x^2} + \frac{(\mathbf{y} \cdot \mv)^2}{\alpha_y^2} + (\ndotm)^2 \right)^2 } $$
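The NDFs above transcribed to C++, as a sketch with `alpha` = roughness² as defined earlier and `NoM` = $\ndotm$ assumed non-negative. `NormalizationIntegral` is a hypothetical helper that checks the normalization condition for the isotropic models numerically, integrating $2\pi \int_0^{\pi/2} D \cos\theta \sin\theta \, d\theta$ with the midpoint rule. The anisotropic GGX is only checked against isotropic GGX with $\alpha_x = \alpha_y$.

```cpp
#include <cassert>
#include <cmath>

const double kPi = 3.14159265358979323846;

double D_Blinn(double NoM, double alpha) {
    double a2 = alpha * alpha;
    double power = 2.0 / a2 - 2.0;  // power = 2 / alpha^2 - 2
    return std::pow(NoM, power) / (kPi * a2);
}

double D_Beckmann(double NoM, double alpha) {
    double a2 = alpha * alpha;
    double NoM2 = NoM * NoM;
    return std::exp((NoM2 - 1.0) / (a2 * NoM2)) / (kPi * a2 * NoM2 * NoM2);
}

double D_GGX(double NoM, double alpha) {
    double a2 = alpha * alpha;
    double d = NoM * NoM * (a2 - 1.0) + 1.0;
    return a2 / (kPi * d * d);
}

// XoM, YoM: dot products of m with the tangent x and bitangent y.
double D_GGXAniso(double NoM, double XoM, double YoM, double ax, double ay) {
    double d = XoM * XoM / (ax * ax) + YoM * YoM / (ay * ay) + NoM * NoM;
    return 1.0 / (kPi * ax * ay * d * d);
}

// Hypothetical helper: integral of D(m) (n.m) over the hemisphere for an
// isotropic NDF, using the midpoint rule over the polar angle.
template <typename NDF>
double NormalizationIntegral(NDF D, double alpha, int steps = 100000) {
    assert(steps > 0);
    double h = 0.5 * kPi / steps;
    double sum = 0.0;
    for (int i = 0; i < steps; ++i) {
        double theta = (i + 0.5) * h;
        sum += D(std::cos(theta), alpha) * std::cos(theta) * std::sin(theta);
    }
    return 2.0 * kPi * sum * h;
}
```

Each isotropic model should integrate to 1 for any `alpha`, which is a quick way to catch transcription mistakes when swapping terms.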

Geometric Shadowing

The geometric shadowing term describes the shadowing from the microfacets. This means ideally it should depend on roughness and the microfacet distribution.

Implicit [1]: $$ G_{Implicit}(\lv,\vv,\hv) = (\ndotl)(\ndotv) $$

Neumann [6]: $$ G_{Neumann}(\lv,\vv,\hv) = \frac{ (\ndotl)(\ndotv) }{ \mathrm{max}( \ndotl, \ndotv ) } $$

Cook-Torrance [11]: $$ G_{Cook-Torrance}(\lv,\vv,\hv) = \mathrm{min}\left( 1, \frac{ 2(\ndoth)(\ndotv) }{\vdoth}, \frac{ 2(\ndoth)(\ndotl) }{\vdoth} \right) $$

Kelemen [7]: $$ G_{Kelemen}(\lv,\vv,\hv) = \frac{ (\ndotl)(\ndotv) }{ (\vdoth)^2 } $$
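The four non-Smith terms in C++, a minimal sketch that assumes the callers pass clamped, positive cosines (`NoL`, `NoV`, `NoH`, `VoH`) so none of the divisions hit zero. Note that none of them take roughness as an input, which is the limitation the paragraph above points out.

```cpp
#include <algorithm>
#include <cassert>

double G_Implicit(double NoL, double NoV) { return NoL * NoV; }

double G_Neumann(double NoL, double NoV) {
    return NoL * NoV / std::max(NoL, NoV);
}

double G_CookTorrance(double NoL, double NoV, double NoH, double VoH) {
    return std::min({1.0, 2.0 * NoH * NoV / VoH, 2.0 * NoH * NoL / VoH});
}

double G_Kelemen(double NoL, double NoV, double VoH) {
    return NoL * NoV / (VoH * VoH);
}
```

Written this way it is easy to see that Neumann reduces to $\min(\ndotl, \ndotv)$.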

Smith

The following geometric shadowing models use Smith's method[8] for their respective NDF. Smith breaks $G$ into two components: light and view, and uses the same equation for both: $$ G(\lv, \vv, \hv) = G_{1}(\lv) G_{1}(\vv) $$ I will define $G_1$ below for each model and skip duplicating the above equation.

Beckmann [4]: $$ c = \frac{\ndotv}{ \alpha \sqrt{1 - (\ndotv)^2} } $$ $$ G_{Beckmann}(\vv) = \left\{ \begin{array}{l l} \frac{ 3.535 c + 2.181 c^2 }{ 1 + 2.276 c + 2.577 c^2 } & \quad \text{if $c < 1.6$}\\ 1 & \quad \text{if $c \geq 1.6$} \end{array} \right. $$

Blinn-Phong:
The Smith integral has no closed form solution for Blinn-Phong. Walter [4] suggests using the same equation as Beckmann.

GGX [4]: $$ G_{GGX}(\vv) = \frac{ 2 (\ndotv) }{ (\ndotv) + \sqrt{ \alpha^2 + (1 - \alpha^2)(\ndotv)^2 } } $$ This is not the common form but is a simple refactor by multiplying by $\frac{\ndotv}{\ndotv}$.

Schlick-Beckmann:
Schlick [9] approximated the Smith equation for Beckmann. Naty [1] warns that Schlick approximated the wrong version of Smith, so be sure to compare to the Smith version before using. $$ k = \alpha \sqrt{ \frac{2}{\pi} } $$ $$ G_{Schlick}(\vv) = \frac{\ndotv}{(\ndotv)(1 - k) + k } $$

Schlick-GGX:
For UE4, I used the Schlick approximation and matched it to the GGX Smith formulation by remapping $k$ [10]: $$ k = \frac{\alpha}{2} $$
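The Smith $G_1$ variants in C++, sketched with `alpha` = roughness² and the full term formed as $G_1(\lv) G_1(\vv)$. `K_SchlickBeckmann`, `K_SchlickGGX` and `G_SmithGGX` are hypothetical names. The early return at normal incidence in `G1_Beckmann` is an addition to avoid dividing by zero, since $c \to \infty$ there and the equation gives 1 anyway.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

const double kPi = 3.14159265358979323846;

// Smith G1 for Beckmann using Walter et al.'s rational approximation.
double G1_Beckmann(double NoV, double alpha) {
    double sinTheta = std::sqrt(std::max(0.0, 1.0 - NoV * NoV));
    if (sinTheta == 0.0) return 1.0;  // c -> infinity at normal incidence
    double c = NoV / (alpha * sinTheta);
    if (c >= 1.6) return 1.0;
    return (3.535 * c + 2.181 * c * c) / (1.0 + 2.276 * c + 2.577 * c * c);
}

// Smith G1 for GGX in the refactored form above.
double G1_GGX(double NoV, double alpha) {
    double a2 = alpha * alpha;
    return 2.0 * NoV / (NoV + std::sqrt(a2 + (1.0 - a2) * NoV * NoV));
}

// Schlick's form; k selects which Smith function it approximates.
double G1_Schlick(double NoV, double k) { return NoV / (NoV * (1.0 - k) + k); }

double K_SchlickBeckmann(double alpha) { return alpha * std::sqrt(2.0 / kPi); }
double K_SchlickGGX(double alpha) { return alpha / 2.0; }

// Separable Smith form: G = G1(l) * G1(v).
double G_SmithGGX(double NoL, double NoV, double alpha) {
    return G1_GGX(NoL, alpha) * G1_GGX(NoV, alpha);
}
```

Evaluating `G1_Schlick(NoV, K_SchlickGGX(alpha))` next to `G1_GGX(NoV, alpha)` over a range of angles is the kind of side-by-side comparison recommended above before adopting an approximation.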

Fresnel

The Fresnel function describes the amount of light that reflects from a mirror surface given its index of refraction. Instead of using IOR, we use the parameter $F_0$, which is the reflectance at normal incidence.

None: $$ F_{None}(\mathbf{v}, \mathbf{h}) = F_0 $$

Schlick [9]: $$ F_{Schlick}(\mathbf{v}, \mathbf{h}) = F_0 + (1 - F_0) ( 1 - (\vdoth) )^5 $$

Cook-Torrance [11]: $$ \eta = \frac{ 1 + \sqrt{F_0} }{ 1 - \sqrt{F_0} } $$ $$ c = \vdoth $$ $$ g = \sqrt{ \eta^2 + c^2 - 1 } $$ $$ F_{Cook-Torrance}(\mathbf{v}, \mathbf{h}) = \frac{1}{2} \left( \frac{g - c}{g + c} \right)^2 \left( 1 + \left( \frac{ (g + c)c - 1 }{ (g - c)c+ 1 } \right)^2 \right) $$
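The three Fresnel options in C++, a sketch assuming $0 \le F_0 < 1$ so that $\eta$ stays finite. At $F_0 = 0$ and grazing incidence the Cook-Torrance form hits 0/0, so callers may want to clamp $F_0$ slightly above zero.

```cpp
#include <cassert>
#include <cmath>

double F_None(double F0) { return F0; }

double F_Schlick(double VoH, double F0) {
    return F0 + (1.0 - F0) * std::pow(1.0 - VoH, 5.0);
}

// Full dielectric Fresnel with eta derived from F0.
double F_CookTorrance(double VoH, double F0) {
    assert(F0 >= 0.0 && F0 < 1.0);
    double sqrtF0 = std::sqrt(F0);
    double eta = (1.0 + sqrtF0) / (1.0 - sqrtF0);
    double c = VoH;
    double g = std::sqrt(eta * eta + c * c - 1.0);
    double a = (g - c) / (g + c);
    double b = ((g + c) * c - 1.0) / ((g - c) * c + 1.0);
    return 0.5 * a * a * (1.0 + b * b);
}
```

Both the Schlick and Cook-Torrance forms return exactly $F_0$ at $\vdoth = 1$ and 1 at $\vdoth = 0$; they differ in between.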

Optimize

Be sure to optimize the BRDF shader code as a whole. I chose these forms of the equations to either match the literature or to demonstrate some property. They are not in the optimal form to compute in a pixel shader. For example, grouping Smith GGX with the BRDF denominator we have this: $$ \frac{ G_{GGX}(\lv) G_{GGX}(\vv) }{4(\ndotl)(\ndotv)} $$ In optimized HLSL it looks like this:

// a = alpha = roughness^2; NoV = dot(n, v), NoL = dot(n, l)
float a2 = a*a;
float G_V = NoV + sqrt( (NoV - NoV * a2) * NoV + a2 );
float G_L = NoL + sqrt( (NoL - NoL * a2) * NoL + a2 );
return rcp( G_V * G_L );

If you are using this on an older non-scalar GPU you could vectorize it as well.
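A C++ port of the HLSL above (with `rcp(x)` written as `1.0 / x`) checked against the direct form $\frac{G_{GGX}(\lv) G_{GGX}(\vv)}{4(\ndotl)(\ndotv)}$. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Smith G1 for GGX, as in the reference form.
double G1_GGX(double NoX, double a) {
    double a2 = a * a;
    return 2.0 * NoX / (NoX + std::sqrt(a2 + (1.0 - a2) * NoX * NoX));
}

// Direct form: G_GGX(l) * G_GGX(v) / (4 NoL NoV).
double SmithGGXOverDenomDirect(double NoL, double NoV, double a) {
    return G1_GGX(NoL, a) * G1_GGX(NoV, a) / (4.0 * NoL * NoV);
}

// Optimized form: the 2*NoX numerators cancel against the 4 NoL NoV denominator.
double SmithGGXOverDenomOptimized(double NoL, double NoV, double a) {
    double a2 = a * a;
    double G_V = NoV + std::sqrt((NoV - NoV * a2) * NoV + a2);
    double G_L = NoL + std::sqrt((NoL - NoL * a2) * NoL + a2);
    return 1.0 / (G_V * G_L);
}
```

Besides saving instructions, the optimized form never divides by $(\ndotl)(\ndotv)$, so for nonzero alpha it stays finite as either cosine goes to zero, where the direct form would produce 0/0.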

References

[1] Hoffman 2013, "Background: Physics and Math of Shading"
[2] Blinn 1977, "Models of light reflection for computer synthesized pictures"
[3] Beckmann 1963, "The scattering of electromagnetic waves from rough surfaces"
[4] Walter et al. 2007, "Microfacet models for refraction through rough surfaces"
[5] Burley 2012, "Physically-Based Shading at Disney"
[6] Neumann et al. 1999, "Compact metallic reflectance models"
[7] Kelemen 2001, "A microfacet based coupled specular-matte brdf model with importance sampling"
[8] Smith 1967, "Geometrical shadowing of a random rough surface"
[9] Schlick 1994, "An Inexpensive BRDF Model for Physically-Based Rendering"
[10] Karis 2013, "Real Shading in Unreal Engine 4"
[11] Cook and Torrance 1982, "A Reflectance Model for Computer Graphics"
[12] Reed 2013, "How Is the NDF Really Defined?"

Epic, SIGGRAPH, etc

2013-07-28T15:52:00.000-05:00

I'm resurrecting this blog from the dead. I'm sorry it's been neglected for a year but I've been busy. If you follow me on twitter (@BrianKaris) then this probably isn't news, but for those that don't here's an update:

A year ago I left Human Head and accepted a position on the rendering team at Epic Games. Since then we made the UE4 Infiltrator demo. I've worked on temporal AA, reflections, shading, materials, and other misc cool stuff for UE4 and games being developed here at Epic. I'm surrounded by a bunch of really smart, talented people, with whom it has been a pleasure to work.

Just this last week I presented in the SIGGRAPH 2013 course: Physically Based Shading in Theory and Practice. If you saw my talk and are interested in the subject but haven't looked at the course notes, I highly suggest you follow that link and check them out, as well as the other presenters' materials. Like previous years, the talks are only a taste of the content that the course notes cover in detail.

Now with that out of the way, hopefully I can start making some good posts again.

Sparse shadows through tracing

2012-05-14T00:04:00.000-05:00

The system I described last time allows specular highlights to reach large distances while only requiring them to be calculated on the tiles where they will show up. This is great, but it means we must now calculate shadows for these very large distances. Growing the shadow maps to include geometry at a much greater distance is hugely wasteful. Fortunately there is a solution.

Before I get to that, though, I want to talk about a concept I think is going to be very important for next-gen renderers: having more than one representation for scene geometry. Matt Swoboda talked about this in his GDC presentation this year [1] and I am in complete agreement with him. We will need geometry in similar formats as we've had in the past for efficient rasterization (vertex buffers, index buffers, displacement maps). This will be used whenever the rays are coherent, simply because HW rasterization is currently much faster than any other algorithm for coherent rays. Examples of use are primary rays and shadow rays in the form of shadow maps.

Incoherent rays will be very important for next gen renderers but we need a different representation to efficiently trace rays. Any that support tracing cones will likely be more useful than ones which can only trace rays. Possible representations are signed distance fields [2][1][9], SVOs [3], surfel trees [4], and billboard cloud trees [5][9]. I'll also include screen space representations although these don't store the whole scene. These include mip map chains of min/max depth maps [6], variance depth maps [7] and adaptive transparency screen buffers [8]. Examples of use for these trace friendly data structures are indirect diffuse (radiosity), indirect specular (reflections) and sparse shadowing of direct specular. The last one is what helps with our current issue.

The Samaritan demo [9] from Epic had a very similar issue that they solved in the same way I am suggesting. They had many point lights which generated specular highlights at any distance. To shadow them they did a cone trace in the direction of the reflection vector against a signed distance field that was stored in a volume texture. This was already being done for other reflections, so using that data to shadow the point lights doesn’t come at much cost. The signed distance field data structure could be swapped with any of the others I listed. What is important is that the shadowing is calculated with a cone trace.

What I propose as the solution to our problem is to use traditional shadow maps only within the diffuse radius. Do a cone trace down the reflection vector. The cone trace will return a visibility function that any specular outside the range of a shadow map can cheaply use to shadow.

Actually, having shadowing data independent from the lights means it can be used for culling as well. The max unoccluded ray distance can be accumulated per tile which puts a cap on the culling cone for light sources. I anticipate this form of occlusion culling will actually be a very significant optimization.

This shadowing piece of the puzzle means the changes I suggested in my last post, in theory, come at a fairly low cost, assuming you already do cone tracing for indirect specular. That may seem like a large assumption, but to demonstrate how practical cone tracing is, a very simple, approximate form of cone tracing can be done purely against the depth buffer. This is what I do with screen space reflections on current gen hardware. I don’t do cone tracing exactly but instead reduce the trace distance with low glossiness and fade out the samples at the end of the trace. This acts as if occlusion coverage fades with the radius of the cone at the point of impact, which is a visually acceptable approximation. In other words, the crudest form of cone tracing can already be done in current gen. It is fairly straightforward to extend this to true cone tracing on faster hardware using one of the screen space methods I listed. Replacing the screen space representation with a global one is much more complex but doable.

The result is hopefully that point light specularity “just works”. The problem then shifts to determining which lights in the world to attempt to draw. Considering we have more than 10000 lights in one map in Prey 2, this may not be easy :). Honestly I haven’t thought about how to solve this yet.

I, like everyone else who has talked about tiled light culling, am leaving out an important part which is how to efficiently meld shadow maps and tiled culling for the diffuse portion. I will be covering ideas on how to handle that next time.

Finally, to everyone who has read these posts: if you have an idea on how the cone-based culling can be adapted to a Blinn distribution, please let me know.

[1] http://directtovideo.wordpress.com/2012/03/15/get-my-slides-from-gdc2012/
[2] http://iquilezles.org/www/material/nvscene2008/rwwtt.pdf
[3] http://maverick.inria.fr/Publications/2011/CNSGE11b/GIVoxels-pg2011-authors.pdf
[4] http://www.mpi-inf.mpg.de/~ritschel/Papers/Microrendering.pdf
[5] http://graphics.cs.yale.edu/julie/pubs/bc03.pdf
[6] http://www.drobot.org/pub/M_Drobot_Programming_Quadtree%20Displacement%20Mapping.pdf
[7] http://www.punkuser.net/vsm/vsm_paper.pdf
[8] http://software.intel.com/en-us/articles/adaptive-transparency/
[9] http://www.nvidia.com/content/PDF/GDC2011/GDC2011EpicNVIDIAComposite.pdf

Tiled Light Culling

2012-04-29T20:29:00.000-05:00

First off I'm sorry that I haven't updated this blog in so long. Much of what I have wanted to talk about on this blog, but couldn't, was going to be covered in my GDC talk but that was cancelled due to forces outside my control. If you follow me on twitter (@BrianKaris) you probably heard all about it. My comments were picked up by the press and quoted in every story about Prey 2 since. That was not my intention but oh, well. So, I will go back to what I was doing which is to talk here about things I am not directly working on.

Tiled lighting

There has been a lot of talk and excitement recently concerning tiled deferred [1][2] and tiled forward [3] rendering.

I’d like to talk about an idea I’ve had on how to do tile culled lighting a little differently.

The core behind either tiled forward or tiled deferred is to cull lights per tile. In other words, for each tile, calculate which of the lights on screen affect it. The base level of culling is done by calculating a min and max depth for the tile and using this to construct a frustum. This frustum is intersected with a sphere from the light to determine which lights hit solid geometry in that tile. More complex culling can be done in addition to this, such as back-face culling using a normal cone.
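The sphere vs frustum step can be sketched in C++ as below. It assumes the tile's four side planes and its near/far planes (from the tile's min and max depth) have already been built, with unit normals pointing into the frustum; building them is not shown, and the type and function names are hypothetical.

```cpp
#include <array>
#include <cassert>

struct Vec3 { double x, y, z; };

double Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Signed distance to the plane is dot(normal, p) + d; normal is unit length
// and points into the frustum.
struct Plane { Vec3 normal; double d; };

// Conservative test: cull the light only if its sphere lies entirely behind
// one of the tile's planes.
bool SphereIntersectsFrustum(const std::array<Plane, 6>& planes, const Vec3& center, double radius) {
    for (const Plane& p : planes) {
        if (Dot(p.normal, center) + p.d < -radius) return false;
    }
    return true;
}
```

This per-plane test is conservative: a sphere near a frustum corner can pass every plane without touching the frustum, which costs some efficiency but never drops a light that does affect the tile.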

This very basic level of culling, sphere vs frustum, only works with the addition of an artificial construct which is the radius of the light. Physically correct light falloff is inverse squared.

Light falloff

Small tangent I've been meaning to talk about for a while. To calculate the correct falloff from a sphere or disk light you should use these two equations [4]:

Falloff:
$$Sphere = \frac{r^2}{d^2}$$
$$Disk = \frac{r^2}{r^2+d^2}$$

If you are dealing with light values in lumens you can replace the r^2 factor with 1. For a sphere light this gives you 1/d^2 which is what you expected. The reason I bring this up is I found it very helpful in understanding why the radiance appears to approach infinity when the distance to the light approaches zero. Put a light bulb on the ground and this obviously isn’t true. The truth from the above equation is the falloff approaches 1 when the distance to the sphere approaches zero. This gets hidden when the units change from lux to lumens and the surface area gets factored out. The moral of the story is don’t allow surfaces to penetrate the shape of a light because the math will not be correct anymore.
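The two falloff equations in C++. The geometric setup is an assumption, since the post does not spell it out: the receiver faces the light head-on, $d$ is measured from the sphere's center, and for the disk the receiver sits on the disk's axis.

```cpp
#include <cassert>
#include <cmath>

// r: light radius, d: distance from the sphere's center.
double SphereFalloff(double r, double d) { return (r * r) / (d * d); }

// r: disk radius, d: distance along the disk's axis.
double DiskFalloff(double r, double d) { return (r * r) / (r * r + d * d); }
```

At the sphere's surface ($d = r$) the sphere falloff is exactly 1, and far away the disk falloff converges to the same $\frac{r^2}{d^2}$. Evaluating the sphere formula with $d < r$ returns values above 1, which is the penetration case the paragraph above warns against.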

Culling inverse squared falloff

Back to tiled culling. Inverse squared falloff means there is no distance at which the light contributes zero illumination. This is very inconvenient for a game world filled with lights. There are two possibilities. The first is to subtract a constant term from the falloff and clamp the result at 0. The second is to window the falloff with something like $(1-d^2/a^2)^2$. The first loses energy over the entire influence of the light; the second loses energy only away from the source. I should note that the tolerance should be proportional to the light's intensity. For simplicity I will use the following for this post:
$$Falloff = max( 0, \frac{1}{d^2}-tolerance)$$
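Both cutoff options in C++, sketched for a light of unit intensity (scale the tolerance with intensity, as noted above). `CutoffRadius` is a hypothetical helper giving the culling sphere radius where $\frac{1}{d^2} = tolerance$. Clamping the window at zero beyond $a$ is an addition so the window does not rise again past the radius.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Inverse-square falloff with a constant tolerance subtracted, clamped at 0.
double FalloffSubtract(double d, double tolerance) {
    return std::max(0.0, 1.0 / (d * d) - tolerance);
}

// Distance at which FalloffSubtract reaches zero: 1 / d^2 = tolerance.
double CutoffRadius(double tolerance) { return 1.0 / std::sqrt(tolerance); }

// Inverse-square falloff windowed by (1 - d^2/a^2)^2, zero beyond a.
double FalloffWindowed(double d, double a) {
    double t = std::max(0.0, 1.0 - (d * d) / (a * a));
    return t * t / (d * d);
}
```

Near the source the windowed version stays within a fraction of a percent of pure $\frac{1}{d^2}$, while the subtracted version is offset by the full tolerance everywhere, which is the energy difference described above.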

The distance cutoff can be thought of as an error tolerance per light. Unfortunately, glossy specular doesn’t work well in this framework at all. The intensity of a glossy, energy conserving specular highlight, even for a dielectric, will be WAY higher than the Lambert diffuse. This spoils the idea of the distance falloff working as an error tolerance for both diffuse and specular, because they are at completely different scales. In other words, for glossy specular, the distance will have to be very large for even a moderate tolerance, compared to diffuse.

This points to there being two different tolerances, one for diffuse and the other for specular. If these both just affect the radius of influence, we might as well set the radius of both to the maximum, because diffuse doesn’t take anything more to calculate than specular. Fortunately, the maximum intensity of the specular scales inversely with the size of the highlight. This, of course, is the entire point of energy conservation, but here energy conservation also helps us with culling. The higher the gloss, the larger the radius of influence, but the tighter the cone of influencing normals.

If it isn’t clear what I mean, think of a chrome ball. With a mirror finish, a light source, even as dim as a candle, is visible at really large distances. The important area on the ball is very small, just the size of the candle flame’s reflection. The less glossy the ball, the less distance the light source is visible but the more area on the ball the specular highlight covers.

Before we can cull using this information, we need specular to go to zero past a tolerance, just like distance falloff. The easiest way is to subtract the tolerance from the specular distribution and max it with zero. For simplicity I will use Phong for this post:
$$Phong = max( 0, \frac{n+2}{2}dot(L,R)^n-tolerance)$$

Specular cone culling

This nicely maps to a cone of L vectors per pixel that will give a non-zero specular highlight.

Cone axis:
$$R = 2 N dot( N, V ) - V$$

Cone angle:
$$Angle = acos \left( \sqrt[n]{\frac{2 tolerance}{n+2}} \right)$$
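The cone angle in C++, alongside the clamped Phong lobe it is derived from. The early return of 0 when $\frac{2\,tolerance}{n+2} \ge 1$ is an addition: in that case the lobe never exceeds the tolerance and `acos` would otherwise be evaluated outside its domain.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Normalized Phong lobe with the tolerance subtracted and clamped at zero.
double PhongWithTolerance(double LoR, double n, double tolerance) {
    return std::max(0.0, (n + 2.0) / 2.0 * std::pow(std::max(0.0, LoR), n) - tolerance);
}

// Half-angle of the cone around R outside of which the clamped lobe is zero.
double SpecularConeAngle(double n, double tolerance) {
    double ratio = 2.0 * tolerance / (n + 2.0);
    if (ratio >= 1.0) return 0.0;  // lobe never exceeds the tolerance
    return std::acos(std::pow(ratio, 1.0 / n));
}
```

At exactly that angle the lobe equals the tolerance, and higher exponents give tighter cones, which is what makes glossy lights cheap to cull by direction.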

Just like how a normal cone can be generated for the means of back face culling, these specular cones can be unioned for the tile and used to cull. We can now cull specular on a per tile basis which is what is exciting about tiled light culling.
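The post does not say how the cones are unioned. One common construction, sketched here in C++, merges two cones into a bounding cone whose edges are the far edges of the inputs; folding every pixel's cone in this way gives a tile cone, and a light is kept only if its direction falls inside it. All names are hypothetical, and returning a full-sphere cone when the axes are nearly opposite is a conservative choice made for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

const double kPi = 3.14159265358979323846;

struct Vec3 { double x, y, z; };
Vec3 Add(const Vec3& a, const Vec3& b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 Scale(const Vec3& v, double s) { return {v.x * s, v.y * s, v.z * s}; }
double Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 Normalize(const Vec3& v) { return Scale(v, 1.0 / std::sqrt(Dot(v, v))); }

struct Cone { Vec3 axis; double angle; };  // unit axis, half-angle in radians

// Bounding cone of two cones with unit axes.
Cone UnionCones(const Cone& a, const Cone& b) {
    double phi = std::acos(std::max(-1.0, std::min(1.0, Dot(a.axis, b.axis))));
    if (phi + b.angle <= a.angle) return a;  // a already contains b
    if (phi + a.angle <= b.angle) return b;  // b already contains a
    double angle = 0.5 * (phi + a.angle + b.angle);
    double sinPhi = std::sin(phi);
    if (angle >= kPi || sinPhi < 1e-9) return {a.axis, kPi};  // all directions
    // Slerp a.axis toward b.axis by (angle - a.angle).
    double t = angle - a.angle;
    Vec3 axis = Add(Scale(a.axis, std::sin(phi - t) / sinPhi), Scale(b.axis, std::sin(t) / sinPhi));
    return {Normalize(axis), angle};
}

bool ConeContainsDirection(const Cone& c, const Vec3& dir) {
    return Dot(c.axis, Normalize(dir)) >= std::cos(c.angle) - 1e-12;
}
```

The merged half-angle is half of the total angular span $\phi + \theta_a + \theta_b$ in the plane of the two axes, so the result never cuts off part of either input cone.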

I should mention the two culling factors actually need to be combined for specular. The sphere for falloff culling needs to expand based on gloss. The (n+2)/2 should be rolled into the distance falloff, which leaves the angle as just acos(tolerance^(1/n)). I’ll leave these details as an exercise for the reader. Now, to be clear, I'm not advocating having separate diffuse and specular light lists. I'm suggesting culling the light if diffuse is below tolerance AND spec is below tolerance.

This leaves us with a scheme much like biased importance sampling. I haven’t tried this so I can’t comment on how practical it is, but it has the potential to produce much more lively reflective surfaces due to having more specular highlights for minimal increase in cost. It is also nice to know your image is off from ground truth by a known error tolerance (per light, with respect to shading).

The way I handle this light falloff business for current gen in P2 is by having all lighting beyond the artist set bounds of the deferred light get precalculated. For diffuse falloff I take what was truncated from the deferred light and add it to the lightmap (and SH probes). For specular I add it to the environment map. This means I can maintain the inverse squared light falloff and not lose any energy. I just split it into runtime and precalculated portions. Probably most important, light sources that are distant still show up in glossy reflections. This new culling idea may get that without the slop that comes from baking it into fixed representations.

I intended to also talk about how to add shadows but this is getting long. I'll save it for the next post.

References:
[1] http://visual-computing.intel-research.net/art/publications/deferred_rendering/
[2] http://www.slideshare.net/DICEStudio/spubased-deferred-shading-in-battlefield-3-for-playstation-3
[3] http://aras-p.info/blog/2012/03/27/tiled-forward-shading-links/
[4] http://www.iquilezles.org/www/articles/sphereao/sphereao.htm

New Prey 2 screenshot

2011-08-10T00:16:00.000-05:00

It has been a long time since I updated this blog with substantial content but I wanted to point out that Bethesda just released this new screenshot of Prey 2. It's a great shot but it's also a fantastic demonstration of some new graphical features I added to our latest build of the game.

First, there's the depth of field in the background which is HDR circular bokeh DOF.

Secondly, in the puddles on the ground, you will see screen space reflections. They aren't planar reflections; they work on every surface and run on every platform. SSR really adds a ton of dimension and accuracy to our wet, metal filled, alien noir city. I can't talk yet about how it works, unfortunately.

So, check it out and tell me what you think. Hopefully in not too long I can start talking about how some of the tech works but for now you just get a glimpse.

-Brian

Virtualized volume textures

2011-01-30T17:42:00.000-06:00

First off, it's been a very long time since I made a post. Sorry about that. I've found it difficult to come up with subjects to discuss that I both know enough about and am allowed to publicly talk about. For many, personal hobby projects can be the source of subjects to write about, but all the at-home pet project stuff I do, I do in the HH codebase and check in if it is a success. Personally I find this more rewarding than the alternative because it can go into a commercial product that hopefully will be seen by millions, and it can get artist love, which is very hard to get with hobby projects. The two biggest downsides are that I no longer own work I do in my free time and I can't easily talk about it. Now on to a technique that fits the bill, as I have no particular commercial use for it at the moment.

Irradiance volumes using volume textures is a technique that has been getting some use lately. Check out the following for some places I've noticed it.
Split/Second
Cryengine 3 (Cascaded Light Propagation Volumes)
FEAR 2
Rust

Probably the biggest downside to volumes vs more traditional lightmaps is the resolution. Volume textures take up quite a bit of memory so they need to be fairly low resolution. Unfortunately much of this data is covering empty space. It's convenient for empty space to have some data coverage so that the same solution can be used for dynamic objects but the same resolution is certainly not needed. How would you store a different resolution of volume data on world geometry than in empty space?

The most straightforward solution to me is an indirection texture. Interestingly, what this turns into is virtualizing the volume texture just like you would a 2D texture. That indirection volume texture acts like the page table to your physical texture. Each page table texel translates the XYZ coordinates into a page or brick in the physical volume texture and subsequently the UVW coordinates in the physical volume texture. If you need a refresher on virtual texturing, check out Sean Barrett's and id's presentations on the topic. All the same limitations apply in 3D as they do in 2D. Pages will need borders to have proper filtering. The smaller the page size, the more waste due to borders and the larger the page table gets.
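A CPU-side sketch of the page table translation described above, written in C++. Everything about the layout is hypothetical since the post has no implementation: a single-level page table with `pagesPerAxis`³ entries, each storing a brick position in the physical texture, cubic bricks of `brickSize` texels that include a `border` on each side, and virtual coordinates normalized to $[0, 1)^3$. In a shader the page table would be a point-sampled volume texture, so this whole translation is one fetch plus a little arithmetic; octree levels (mips of the page table) are not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Int3 { int x, y, z; };
struct Float3 { double x, y, z; };

// Hypothetical single-level virtual volume texture layout.
struct VirtualVolume {
    int pagesPerAxis;             // page table resolution per axis
    int brickSize;                // texels per brick edge, border included
    int border;                   // border texels on each side of a brick
    int physicalBricksPerAxis;    // physical texture size in bricks per axis
    std::vector<Int3> pageTable;  // brick position (in bricks) per page

    const Int3& Entry(int px, int py, int pz) const {
        return pageTable[(pz * pagesPerAxis + py) * pagesPerAxis + px];
    }

    // Virtual XYZ in [0,1)^3 -> physical UVW in [0,1)^3.
    Float3 Translate(const Float3& xyz) const {
        const double p[3] = {xyz.x * pagesPerAxis, xyz.y * pagesPerAxis, xyz.z * pagesPerAxis};
        int page[3];
        double local[3];  // position inside the page, in [0,1)
        for (int i = 0; i < 3; ++i) {
            page[i] = std::max(0, std::min(static_cast<int>(std::floor(p[i])), pagesPerAxis - 1));
            local[i] = p[i] - page[i];
        }
        const Int3& brick = Entry(page[0], page[1], page[2]);
        const int b[3] = {brick.x, brick.y, brick.z};
        const double interior = brickSize - 2 * border;  // texels of unique data
        const double physTexels = static_cast<double>(physicalBricksPerAxis) * brickSize;
        double uvw[3];
        for (int i = 0; i < 3; ++i)
            uvw[i] = (b[i] * brickSize + border + local[i] * interior) / physTexels;
        return {uvw[0], uvw[1], uvw[2]};
    }
};
```

The border offset is what lets trilinear filtering at a page edge read the duplicated neighbor texels instead of an unrelated brick.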

Another way of thinking of this is as a sparse voxel octree. Instead of the page table being managed like a quadtree, it would work like an octree. Typically this data structure is thought of only for ray casting, but there's nothing inherent about it that requires that. SVOs have also only been stored as trees, requiring traversal to get to the leaves. So long as you have bricks and aren't working with the granularity of individual voxels, the traversal can be changed into a single texture lookup, just like the quadtree traversal in a 2D virtual texture.

Thinking about it as an SVO is helpful because volume data usually has more sparseness than 2D textures. In this case we don't really care about bricks where there is no geometry. If you use a screen readback to determine which pages to load, this will happen automatically. Better yet, you don't even need to store this data on disk. Better than that, you don't need to generate that data in the first place during the offline baking process. Don't worry about dynamic objects. There is still data covering the empty space; it is just at the resolution of one page covering the world. If you need it at a higher resolution than that, you can force a minimum depth to the octree.

In the end it still likely can't compete with the resolution a lightmap can get you if high res is what you are looking for. If lower res is the target it probably will be quite a bit more memory efficient because of not having any waste due to 2d parameterization. As for what a good page size would be I'm not sure as I haven't implemented any of this. If I did I probably couldn't talk about it yet. If anyone does implement this I'd love to hear about it.

RGBD

2010-01-03T15:20:00.002-06:00

This entire post turned out to be hogwash. I'm wiping it out to prevent the spread of misinformation. If you are interested in why it's all nonsense see my comment below. Thanks to Sean Barrett for pointing it out. The result of all of this has been positive ignoring the part of me looking extremely foolish. RGBM is more useful than I originally claimed because a larger scale factor can be used or if a fairly small range is required the gamma correction is likely not needed.

UDK

2009-11-06T00:11:00.002-06:00

Epic released UDK today. It is the next step in their plan to completely dominate the game engine licensing marketplace. They've been doing a pretty good job of that so far. To all the other companies trying to compete, I'd suggest following in their footsteps. The reason I believe they are so successful is because they do everything they can to get the engine out there and in people's hands. You can't be scared of people looking at or stealing your stuff. If devs can't look at it, they certainly won't plunk down major money to buy it. Epic has been powering their freight train of engine licensing with visibility of their product. With UDK they now have a brand new group of students and indie developers who are going to be familiar with Unreal Engine, and they have made the barrier to evaluating the engine for commercial purposes practically nothing. You get to see all the tools and a major portion of the documentation without even having to talk to someone at Epic. You can see engine updates whenever they release a new build. If you want to know more, they will send you more than enough stuff to make up your mind in the form of an evaluation version. Compare that to most of the other engine providers. Many will never let you see any code and require all sorts of paperwork to see private viewings of just an engine demonstration. As expected, they are also not getting very many licensees. Stop being so paranoid and let people see your stuff.

On to the tech stuff. I haven't spent a ton of time looking at it yet but I've noticed some new things already. Of course there's the obvious stuff like Lightmass that they've talked all about. There are docs on all that, which is great. I'm going to talk about what they aren't talking about. First, they changed the way they encode their lightmaps. Previously it was 3 DXT1s that stored the 3 colors of incoming light in the 3 HL2 basis directions. They still use the same concept now but only have 2 DXT1s. The first stores the incoming light intensity in each of the 3 directions as RGB. The second is the average color of all 3 directions. The loss in quality may be small, as the error is mainly bumps picking up different colors of light. The memory is 2/3 of what it was, so it seems like a smart optimization.

What impressed me though was their signed distance field baked shadows. Previously they stored baked shadow masks as G8 (8-bit luminance) format textures. This was used simply as a mask on the light. Now they are computing a signed distance field like this paper. It's still stored in a G8 format texture, so there's no difference in storage. The big difference comes from the sharpness and quality from a relatively low res texture. The same smooth lines that are useful for vector graphics for Valve's use also work well for shadows. After I read this paper when it came out I thought the exact same thing: this is perfect for baked shadow textures! I implemented it then, but I was trying to have another value in the texture to specify how far the occluder was so I could control the width of the soft edge. The signed distance field could be in the green channel of a DXT1 and the softness could be in red. If that wasn't good enough, softness in green and distance field in alpha of a DXT5. The prototype was scrapped after it didn't really give me what I needed and was being replaced with a different shadow system anyway. I am really impressed with Epic's results though. For a fairly large map they get good quality sun shadows from only 4 1024x1024 G8 textures. That's only 4 MB of shadow data. I bet it would still work well from DXT1s too with the data in the green channel. That would bring it down to 2 MB. The only downside is that it is unable to represent soft shadows, unless someone can get my idea of encoding softness as an extra value to work, which would be cool. Apparently I should have given that experiment more consideration.

Uncharted 2

2009-10-01T00:27:00.007-05:00

We are hiring!
First up, Human Head Studios, where I work, is hiring for basically every department. We are currently staffing up and are looking for talented people. I can't tell you what we are working on but I can say it's awesome and we are doing some very interesting and innovative things on many fronts including technology. The tech department, where I am, is developing our own cutting edge tech that I intend to be competitive with all the games I talk about on this blog. I really wish I could be more specific but alas, I cannot. Not yet at least. I'm trying to whet your appetite without getting in trouble, if you haven't noticed.

Of most interest to the common visitor of this blog, we are looking for tech programmers. Of most interest to me we are looking for a graphics guy that I will work with on aforementioned secret, awesome, graphics tech. If you understand all of what I talk about here and live and breathe graphics you may be the perfect fit. For submission details see here. Mention my name and you might filter up in the pile. Being a reader of my blog has to count for something right?

Uncharted 2
On to the main event. The Uncharted 2 multiplayer demo came out yesterday and I suggest you check it out. The first game has been a benchmark for graphics on the PS3 and I'm sure when number two comes out it will set the bar again. Although all you can see at the moment are some of the MP maps, it is very pretty and much can be discerned from dissecting it.

There are many options in the demo that offer an ability to study things. You can start games with just yourself. There is a machinima mode that I haven't fully figured out yet but it allows you to fool with more than in a normal game. You can also replay a play through using the cinema mode. That allows you to pause, single step forward, fly around and even change post processing and lighting a bit.

Right out of the gate I noticed their nice DOF blur. It's one of the best I've seen. It looks like there is both a small blur and a large blur that are lerped between to simulate a continuous blur radius. It looks better than gaussian but I could be imagining things. It is definitely done in HDR as bright things dominate in the blur.

Speaking of blur, they now have object motion blur. In the first game there was motion blur when the camera swung around quickly but it only happened in the distance. It was at low resolution and pretty blurry. I didn't really care for it, as any time I moved the camera quickly to look at something it blurred and my eyes lost focus on what I was trying to look at. This type of motion blur is still there but doesn't seem to bother me as much. It's likely less blurry than before but I can't say for sure. It is done in HDR though. In addition to the camera motion blur, objects have their own geometry that draws to blur them. This is the same thing Tomb Raider Underworld did for character motion blur. U2 uses it for objects as well as characters, like the bus that drives by in one of the videos. The blur trails behind and can look weird in certain situations when paused, but while playing the game these oddities aren't noticeable and it looks pretty nice.

Glare or bloom seemed to be map specific. In the snow map the sky was bright enough to bloom everywhere. In other maps, light sources that were bright enough to go almost white but were obviously saturated when blurred didn't bloom at all. The control to change the sun intensity could be jacked up to the point of being completely blown out but with no hint of bloom. On that note, they are correctly tone mapping and not clamping off bright colors, something so many games are getting wrong. Please, people. Use a tone curve that doesn't just clamp off bright colors, as in not linear. Bloom all you want, but without a nice response curve your brights will look terrible.

In regards to lighting, the only dynamic shadow in their maps was from the sun. I don't know whether that is the same in single player or whether it's MP only. There were other dynamic light sources, but they didn't cast dynamic shadows. Strangely, they did have precomputed shadows that looked like the light had been baked into a lightmap. I'm not sure what was happening here, as the machinima playground map showed evidence of the artifacts from storing precomputed lighting at the verts, which is what they did in U1. It's possible that some objects had lightmaps and others were vertex lit. Another possibility is that the lightmap-looking shadows were not in the baked lighting but were more like light masks. It's hard to say from what I saw. There were also other lights that seemed to be completely baked and didn't influence characters.

I don't remember how U1 handled it, but in this demo the ambient character lighting doesn't seem to change much or be influenced by local features. There always seemed to be a warm light direction in the nighttime city map, where the ambient should be very cool practically everywhere. I'm guessing the ambient lighting is set up by artists and isn't based on sampling the environment.

They are still using a lot of vertex color blending of textures. I think this is done in the shader, where it lerps between 2 sets of textures based on a vertex value and a heightmap texture. The heightmap corresponds with one or both of the texture sets, and the vertex value acts like an alpha test threshold. This hides the typical smooth vertex-colored blending and matches the material that is fading. Think of brick and mortar to get the idea. Brick is one material, concrete the other. The heightmap is based off of the brick, so as it transitions the mortar starts to dominate and cover the brick, masking the gradient of the vertex values until it's all concrete. In U1 it was very much like an alpha test because it was a hard transition. In U2 these transitions can be smooth. The vertex value is likely in a separate vertex stream so that different variations can be used with a single model instance. A material set up in this fashion can be used in a variety of places, and with a simple bit of vertex coloring the geometry seems to be uniquely textured.
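Here is a rough sketch of how such a height-based blend weight could work, written in C++ so it can be tested. The remapping of the vertex value and the width parameter are assumptions of this sketch, not Naughty Dog's shader.

```cpp
#include <algorithm>
#include <cassert>

float Saturate(float x) { return std::min(std::max(x, 0.0f), 1.0f); }

// Hypothetical height blend for the brick/concrete example.
//   brickHeight: brick heightmap value, mortar is low (near 0)
//   vertexAlpha: painted vertex value, 0 = all brick, 1 = all concrete
//   width:       transition softness; near 0 behaves like U1's alpha test
// Returns the concrete weight. Low areas (mortar) flip to concrete first.
float HeightBlendWeight(float brickHeight, float vertexAlpha, float width) {
    assert(width > 0.0f);
    // Stretch the threshold so alpha 0 covers nothing and alpha 1 covers everything.
    float threshold = vertexAlpha * (1.0f + width);
    return Saturate((threshold - brickHeight) / width);
}

// Per-channel lerp between the two texture sets.
float BlendChannel(float brick, float concrete, float weight) {
    return brick + (concrete - brick) * weight;
}
```

Shrinking width toward zero gives the hard U1-style transition. Larger values give the smooth U2-style one.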

Here's just a list of some other miscellaneous things I noticed:

SSAO darkens the ambient term. Nice addition; it helps ground the characters and lessens the pressure on detailed baked lighting. It's much more subtly done than in Gears, which tended to look very contrasty.

Shadowmap sampling pattern changed to be a bit smoother. Not a big change. There is major shadow acne on characters at times which is not cool.

Particle systems with a high fill cost fade out when you get close to them. There were some nice clouds of dust in a few places around the maps that disappear when you walk up to them.

The last shadowmap cascade (3rd?) fades out to no shadow in the distance. This is mostly not noticeable, but it can result in building interiors, seen from outdoors, turning bright in the distance.

They still have low res translucent drawing, although I don't think it's frame rate dependent anymore. I saw low and normal res translucent things at the same time. The odd thing was that the low res buffer was upsampled with nearest filtering instead of bilinear. I have no idea why they would do that.

The snow particles stretch with camera movement to fake motion blur. Cool trick.

I can't wait to play the full game. I loved the first one and this is looking to be even better. As I said before, this will be the new bar, so you should definitely check it out for yourself.

RGBM color encoding

2009-04-28T01:38:00.012-05:00

LogLUV
There has been some talk about using LogLUV encoding to store HDR colors in 32 bits by packing it all into an RGBA_8888 target. I won't go into the details, as a better explanation than I could give is here. This encoding has the benefit over standard floating point buffers of reducing ROP bandwidth and storage space, at the expense of some shader instructions for both encoding and decoding. It has been used in shipped games. Heavenly Sword used it to achieve 4xAA with HDR on a PS3. Uncharted used it for the 2xAA output from their material pass. The code for encoding and decoding is as follows (copied from the previous link):

// M matrix, for encoding
const static float3x3 M = float3x3(
  0.2209, 0.3390, 0.4184,
  0.1138, 0.6780, 0.7319,
  0.0102, 0.1130, 0.2969);

// Inverse M matrix, for decoding
const static float3x3 InverseM = float3x3(
  6.0014, -2.7008, -1.7996,
  -1.3320,  3.1029, -5.7721,
  0.3008, -1.0882,  5.6268);

float4 LogLuvEncode(in float3 vRGB)  {
  float4 vResult;
  float3 Xp_Y_XYZp = mul(vRGB, M);
  Xp_Y_XYZp = max(Xp_Y_XYZp, float3(1e-6, 1e-6, 1e-6));
  vResult.xy = Xp_Y_XYZp.xy / Xp_Y_XYZp.z;
  float Le = 2 * log2(Xp_Y_XYZp.y) + 127;
  vResult.w = frac(Le);
  vResult.z = (Le - (floor(vResult.w*255.0f))/255.0f)/255.0f;
  return vResult;
}

float3 LogLuvDecode(in float4 vLogLuv) {
  float Le = vLogLuv.z * 255 + vLogLuv.w;
  float3 Xp_Y_XYZp;
  Xp_Y_XYZp.y = exp2((Le - 127) / 2);
  Xp_Y_XYZp.z = Xp_Y_XYZp.y / vLogLuv.y;
  Xp_Y_XYZp.x = vLogLuv.x * Xp_Y_XYZp.z;
  float3 vRGB = mul(Xp_Y_XYZp, InverseM);
  return max(vRGB, 0);
}

RGBM
There is a different encoding which I prefer. I don't know whether anyone else is using this, but I imagine someone is. The idea is to use RGB and a multiplier in alpha. This is often used as an HDR format for textures, and I prefer it over RGBE for HDR textures. Here's some related info, even though 2 DXT5's is a bit overkill for most applications. Here are some slides from Lost Planet on storing lightmaps as DXT5's. Both LogLUV and these texture encodings are about storing the luminance information separately with higher precision. This is a standard color compression technique, and it becomes even more powerful when dealing with HDR data. What at first doesn't make sense is that if RGBM is stored in an RGBA_8888 target, putting luminance in alpha gives no increase in precision over storing it with RGB. The thing is, luminance isn't only in alpha. What is essentially stored in alpha is a range value. The remainder of the luminance is stored with the chrominance in RGB. The code for this encoding is really simple:

float4 RGBMEncode( float3 color ) {
  float4 rgbm;
  color *= 1.0 / 6.0;
  rgbm.a = saturate( max( max( color.r, color.g ), max( color.b, 1e-6 ) ) );
  rgbm.a = ceil( rgbm.a * 255.0 ) / 255.0;
  rgbm.rgb = color / rgbm.a;
  return rgbm;
}

float3 RGBMDecode( float4 rgbm ) {
  return 6.0 * rgbm.rgb * rgbm.a;
}

I should also note that it is best to convert the colors from linear to gamma space before encoding. If you plan to use them again in linear, a simple additional sqrt and square will work fine for encoding and decoding respectively. The constant 6 gives a range in linear space of 51.5 (that is 6^2.2, assuming a 2.2 gamma curve; with the sqrt and square approximation the top of the range is 6^2 = 36). Sure, it's no 1.84e19 like LogLUV, but honestly, did you really need that? 51.5 should be plenty so long as exposure has already been factored in. This constant can be changed to fit your tastes. Those 3 max's can be replaced with a max4 on the 360 if the compiler is smart enough. I haven't looked to see if it does this. Also, I haven't found the epsilon value that prevents dividing by zero to be necessary in practice. The hardware must output black in the event of denormals, which is the same as handling it correctly. I haven't tried it on a large range of hardware, so beware if you remove it.
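Here is a CPU-side C++ port of the encode and decode above. It folds in the sqrt and square gamma approximation and simulates 8-bit storage so the round-trip error can be checked. RGBMEncodeLinear, RGBMDecodeLinear and Quantize8 are hypothetical names for this sketch. With sqrt and square, the top of the linear range is 6^2 = 36.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Float3 { float r, g, b; };
struct Float4 { float r, g, b, a; };

constexpr float kRGBMRange = 6.0f;  // the "6" constant from the shader

float Saturate(float x) { return std::min(std::max(x, 0.0f), 1.0f); }
// Simulate storing a value in an 8-bit UNORM channel.
float Quantize8(float x) { return std::round(Saturate(x) * 255.0f) / 255.0f; }

Float4 RGBMEncodeLinear(Float3 linear) {
    assert(linear.r >= 0.0f && linear.g >= 0.0f && linear.b >= 0.0f);
    // linear -> approximate gamma space (sqrt), then scale into [0,1]
    Float3 c = { std::sqrt(linear.r) / kRGBMRange,
                 std::sqrt(linear.g) / kRGBMRange,
                 std::sqrt(linear.b) / kRGBMRange };
    float m = Saturate(std::max(std::max(c.r, c.g), std::max(c.b, 1e-6f)));
    m = std::ceil(m * 255.0f) / 255.0f;  // round M up so rgb / M stays <= 1
    return { Quantize8(c.r / m), Quantize8(c.g / m), Quantize8(c.b / m), m };
}

Float3 RGBMDecodeLinear(Float4 rgbm) {
    float s = kRGBMRange * rgbm.a;
    Float3 g = { rgbm.r * s, rgbm.g * s, rgbm.b * s };
    return { g.r * g.r, g.g * g.g, g.b * g.b };  // square: gamma -> linear
}
```

Anything brighter than 36 clamps to the top of the range, which is why exposure should already be factored in before encoding.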

There are some major advantages of RGBM over LogLUV. First off is the cost of encoding and decoding. There is no need for matrix multiplies, logs or exps. Especially of note is how cheap the decoding is. It also behaves very well under filtering, so you can still use the bilinear trick of getting 4 samples in 1 fetch for downsizing. This isn't technically correct, but the difference is negligible.

As far as quality goes, I can't see any banding, even in dark stress test cases on a fancy monitor after I've turned all the lights off. Unsurprisingly, it also handles very bright and saturated colors with the same level of quality. I found no discernible differences in my testing versus LogLUV. I don't have any data on how much error it has or what color space it covers. What I can tell you is that it handles my HDR needs perfectly.

Storing your colors encoded means you cannot do any blending into the buffer. This rules out multi-pass additive lighting and transparency. You will have to use another buffer for transparent things such as particles. This is also a good time to try a downsized buffer, since you need a separate one anyway. A transparency buffer can store additive, alpha blended and multiply type transparency, but only grayscale multiplies, since they are going into the alpha channel. Multiply decals can be very useful for adding surface variation while still having tiling textures underneath. These often use color to tint the underlying surface and need to be at full res.

Now for the cool part. Because what is stored in RGB is basically still a color, you can apply multiply blending straight into a buffer stored as RGBM. Multiplying by values in [0,1] will never increase the range required to store the colors, so this is a nondestructive operation. In practice I have seen no perceivable precision problems crop up due to this. It is also mathematically correct, so there are no worries about getting weird results.
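A small C++ sketch of why the multiply works: the decode is linear in RGB, so scaling the stored RGB by the blend source scales the decoded color by the same amount. The sketch assumes alpha writes are masked off during the multiply pass so the range value M is left alone. It also uses the plain decode without the gamma conversion. MultiplyBlend is a hypothetical stand-in for what the blend unit does.

```cpp
#include <cassert>

struct Float3 { float r, g, b; };
struct Float4 { float r, g, b, a; };

// Same math as RGBMDecode above (no gamma conversion).
Float3 RGBMDecode(Float4 rgbm) {
    float s = 6.0f * rgbm.a;
    return { rgbm.r * s, rgbm.g * s, rgbm.b * s };
}

// Multiply blend (dest * src) with alpha writes masked off:
// RGB is scaled, the range value M in alpha is untouched.
Float4 MultiplyBlend(Float4 dest, Float3 src) {
    // Assumes multipliers in [0,1], so the stored RGB can only shrink.
    assert(src.r >= 0.0f && src.r <= 1.0f && src.g >= 0.0f && src.g <= 1.0f &&
           src.b >= 0.0f && src.b <= 1.0f);
    return { dest.r * src.r, dest.g * src.g, dest.b * src.b, dest.a };
}
```

Because M never changes and RGB only shrinks, the blended texel never needs more range than it already had.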

Killzone 2

2009-02-25T18:24:00.000-06:00

I got a bogus DMCA notice on this post. Google took it down and now I'm putting it back up. I just finished Killzone 2 and it really is graphically impressive. If you are reading this blog then you are interested in graphics, which means you owe it to yourself to play this game. I think the other levels in the game are actually more impressive than the one in the demo. The level in the demo was pretty geometrically simple: lots of boxy, BSP-brush-looking shapes. The later levels are a lot more complex. In particular the sand level was very pretty.

Level Construction
There didn't seem to be much that looked like high poly meshes rendered to normal maps. Most everything was made from texture tiles and normal maps generated from heightmaps. Most of the textures are fairly desaturated, to the point of likely being grayscale, with most of the color coming from the lighting and post processing. This is something we did quite a bit in Prey and is something we are trying to change. You may notice the post processing changing when you walk through some doorways. The most likely candidates are doors from inside to outside.

FX
Their biggest triumph, I think, is in the fx and atmospherics. There is a ridiculous number of particles. The explosions are some of the best I've seen in a game. There is a lot of dust from bullet impacts, footfalls, wind and explosions. There's smoke coming from explosions, world fires and rocket trails. Each bullet impact also causes a spray of trailed sparks that collide with the world and bounce. Particles are not the only thing contributing. There are also a lot of tricks with sheets and moving textures. For the dust-blowing-in-the-wind effect there is a distinct shell above the ground with a scrolling texture, plus lots of particles. The common trick with sheets is fading them out when they get edge on and when you get close to them. Add soft z clipping and a flat sheet can look very volumetric. There are also a lot of light shafts done with sheets; you can see one of these situations in the demo. All of this results in a huge amount of overdraw. It has already been pointed out that they are using downsized drawing. This looks to be 1/4 the screen dimensions (1/16 the fill). It is bilinearly upsampled when applied to the screen, as opposed to using the MSAA trick and drawing straight in. Having the filtering makes it look quite a bit better. It looks like it averages about 10% of the GPU frame time, which would mean they didn't need to sacrifice much to get these kinds of effects.
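The sheet fading tricks can be sketched as a product of three fade terms: an edge-on fade, a near-camera fade and a soft z fade. All the constants here are made-up tunables. This is a generic C++ version of the technique, not Guerrilla's shader.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

float Saturate(float x) { return std::min(std::max(x, 0.0f), 1.0f); }

// Hypothetical opacity for a flat "volumetric" sheet.
//   nDotV:      dot(sheet normal, view direction)
//   viewDist:   distance from the camera to the sheet pixel
//   sceneDepth: opaque depth behind the pixel (same units as sheetDepth)
//   sheetDepth: depth of the sheet pixel itself
float SheetOpacity(float baseAlpha, float nDotV, float viewDist,
                   float sceneDepth, float sheetDepth) {
    const float kEdgeFade = 0.3f;   // invisible edge on, full strength by |N.V| = 0.3
    const float kNearStart = 1.0f;  // invisible closer than this
    const float kNearRange = 4.0f;  // full strength at kNearStart + kNearRange
    const float kSoftRange = 0.5f;  // soft z: fade where the sheet meets geometry

    float edge = Saturate(std::fabs(nDotV) / kEdgeFade);
    float nearFade = Saturate((viewDist - kNearStart) / kNearRange);
    float soft = Saturate((sceneDepth - sheetDepth) / kSoftRange);
    return baseAlpha * edge * nearFade * soft;
}
```

On the fill side, 1/4 the dimensions means 1/16 the pixels. For example, a 1280x720 target drawn at 320x180 shades 57,600 pixels instead of 921,600.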

Shadows
All the shadows are dynamic shadow maps. Sunlight uses cascaded shadow maps with each level at 512x512. Omni lights use cube shadow maps. They are drawing the back faces to the shadow map to reduce aliasing. Some of the shadow maps can be pretty low resolution. This isn't as bad as in Dead Space because they have really nice filtering, likely because the RSX has bilinear shadow map PCF for free. I can't tell exactly what the sample pattern is, but it looks to alternate. They have stated there are up to 12 samples per pixel. There is a really large number of lights casting dynamic shadows at a time. Even muzzle flashes cast shadows. Lightning flashes cast shadows. At a distance the shadows fade out, but the distance is pretty far. As expected, their triangle counts were evenly split between screen rendering and shadow map rendering at about 200k-400k. They should be able to get away with a lot more tris than that.
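For reference, here is what a single bilinear PCF tap computes, emulated in C++. It compares the receiver depth against the 4 nearest shadow map texels, then bilinearly weights the 0/1 results. This is a generic sketch. The actual Killzone 2 sample pattern is unknown, and a 12-sample kernel would presumably combine several taps like this one.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct ShadowMap {
    int width, height;
    std::vector<float> depth;  // row-major stored light-space depth

    // 1 if the receiver is lit by this texel, 0 if occluded (edge-clamped).
    float Lit(int x, int y, float receiverDepth) const {
        x = std::min(std::max(x, 0), width - 1);
        y = std::min(std::max(y, 0), height - 1);
        return receiverDepth <= depth[y * width + x] ? 1.0f : 0.0f;
    }
};

// u, v in [0,1]; texel centers sit at (i + 0.5) / size.
// Compare first, then filter: that's what makes it PCF rather than
// filtering the depths themselves.
float BilinearPCF(const ShadowMap& sm, float u, float v, float receiverDepth) {
    float x = u * sm.width - 0.5f, y = v * sm.height - 0.5f;
    float x0 = std::floor(x), y0 = std::floor(y);
    float fx = x - x0, fy = y - y0;
    int ix = int(x0), iy = int(y0);
    float top = sm.Lit(ix, iy, receiverDepth) * (1.0f - fx) +
                sm.Lit(ix + 1, iy, receiverDepth) * fx;
    float bottom = sm.Lit(ix, iy + 1, receiverDepth) * (1.0f - fx) +
                   sm.Lit(ix + 1, iy + 1, receiverDepth) * fx;
    return top * (1.0f - fy) + bottom * fy;
}
```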

Lighting
I think this is the first game to really milk deferred lighting for what it's worth. There are a ton of lights. The good guys have about 3 small lights on each of them, not counting muzzle flashes. The bad guys are recognizable by their glowing red eyes. These have a small light attached, so the glowing eyes actually light the guys and anything they are close to. In the factory level you can see 230 lights on screen at once. I'm curious whether all of these are drawn or whether a good fraction are faded out. If none are faded, that is insane: over 200 draw calls just for lights, and that doesn't count the stencil drawing that can happen before. Their draw counts seem to always be below 1000, so this is not likely the case.

Post processing
A fair amount of their screen post processing is done on SPUs. As far as I know this is a first. The DOF has a variable blur size. This is most easily visible when going back and forth to the menu. There is motion blur on everything, but the blur distance is clamped to be very small.

Misc
Environment maps are used on many surfaces. They are mostly crisp to show sharp reflections. I didn't see any situation where they were location specific; they are instead tied to the material.

Another neat effect was the water in the little streams. This wasn't actually a water surface clipping with the ground or another piece of geometry at all. It is merely part of the ground material and is masked to where it should be. The water level moves up and down by changing which range of a heightmap is masked.
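Here is a guess at how the stream water could be masked, as a C++ sketch. WaterMask and AnimatedWaterLevel are hypothetical names, and the sine animation is just an example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

float Saturate(float x) { return std::min(std::max(x, 0.0f), 1.0f); }

// Water is part of the ground material and shows wherever the ground
// heightmap is below the current water level (soft edge of edgeSoftness).
float WaterMask(float groundHeight, float waterLevel, float edgeSoftness) {
    assert(edgeSoftness > 0.0f);
    return Saturate((waterLevel - groundHeight) / edgeSoftness);
}

// Animating the level changes which range of the heightmap gets masked,
// so the "plane" appears to rise and fall without any extra geometry.
float AnimatedWaterLevel(float baseLevel, float amplitude, float timeSeconds) {
    return baseLevel + amplitude * std::sin(timeSeconds);
}
```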

Their profiler says they are spending up to 30% of an SPU on scene portals. I assumed this meant area/portal visibility. In the demo this made sense. After playing the whole game it no longer makes sense. There are many areas in the game that just can't be split up with portals. I'm not sure what it could mean anymore. They could use it as one component of visibility, with the other component not on the SPUs. In that case I am curious what they used for visibility.

The texture memory amount stayed constant. This must mean that they are not doing any texture streaming.

They have the player character cast shadows, but you cannot see his model. I found this kind of strange, especially when you can see the shadows at the feet z-fighting with the ground but no feet that would have conveniently hidden the problem. It's expensive to get the camera-in-the-head thing to work really well, so I understand why they didn't wish to do it, but personally I would have gone with both or nothing concerning the player's shadow. BTW, why is the player like a foot and a half shorter than everyone else?

For more Killzone info:
Deferred lighting
Profiling numbers

It isn't quite to the level of the original prerendered footage but honestly who expected it to be? It is a damn good effort from the folks at Guerrilla. I look forward to their presentation at GDC next week. This is the first year since I've been doing this professionally that I am not going to GDC. I'll have to try and get what I can from the powerpoints and audio recordings. You are all posting your slides right? Wink, wink.

Virtual Geometry Images

2009-01-10T20:07:00.003-06:00

Geometry images are one of those ideas so simple you ask yourself "Why didn't I think of this?" I'll admit they aren't the topic of much discussion concerning the "more geometry" problem for the next generation. They work great for compression, but they don't inherently solve any of the other problems. Multi-chart geometry images have a complicated zipping procedure that is also invalid if a part is rendered at a different resolution.

A year ago, when I was researching a solution to "more geometry" on DirectX 9 level hardware, I came across this paper that was in line with the direction I was thinking. The idea is to extend virtual textures by adding another layer alongside the textures that is a geometry image. For every texture page that is brought in, there is a geometry image page with it. By decomposing the scene into a seamless texture atlas you are also doing a Reyes-like split operation. The splitting is a preprocess and the dicing is real time. The paper also explains an elegant seam solution.

My plan for getting this running really fast was to use instancing. With virtual textures every page is the same size. This simplifies many things. Detail is controlled similarly to a quad tree: the same size pages just cover less of the surface and there are more of them. If we mirror this with geometry images, every time we wish to use a patch of geometry it will be a fixed size grid of quads. This works perfectly with instancing if the actual position data is fetched from a texture, as geometry images imply. The geometry you are instancing is then a grid of quads with the vertex data being only texture coordinates from 0 to 1. The per instance data is passed in with a stream and the appropriate frequency divider. This passes data such as patch world space position, patch texture position and scale, edge tessellation amount, etc.
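Here is a rough C++ sketch of the instancing setup. It has one shared grid of quads whose only vertex data is a UV, some per-instance patch data, and a CPU stand-in for the vertex shader that fetches the position from the geometry image. The struct layouts and names are hypothetical, and point sampling is used for simplicity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Float2 { float x, y; };
struct Float3 { float x, y, z; };

// The one mesh that gets instanced: an N x N grid of quads whose only
// vertex attribute is a UV from 0 to 1.
struct GridMesh {
    std::vector<Float2> uvs;
    std::vector<uint32_t> indices;
};

GridMesh BuildPatchGrid(int quadsPerSide) {
    assert(quadsPerSide > 0);
    GridMesh mesh;
    const int vertsPerSide = quadsPerSide + 1;
    for (int y = 0; y < vertsPerSide; ++y)
        for (int x = 0; x < vertsPerSide; ++x)
            mesh.uvs.push_back({ float(x) / quadsPerSide, float(y) / quadsPerSide });
    for (int y = 0; y < quadsPerSide; ++y)
        for (int x = 0; x < quadsPerSide; ++x) {
            uint32_t i0 = uint32_t(y * vertsPerSide + x), i1 = i0 + 1;
            uint32_t i2 = i0 + uint32_t(vertsPerSide), i3 = i2 + 1;
            mesh.indices.insert(mesh.indices.end(), { i0, i2, i1, i1, i2, i3 });  // 2 tris
        }
    return mesh;
}

// Hypothetical per-instance data (second stream + frequency divider on D3D9).
struct PatchInstance {
    Float2 pageOffset;   // patch position inside the geometry image atlas, in UV
    float pageScale;     // patch size in atlas UV
    Float3 worldOffset;  // patch world-space position
};

// CPU stand-in for the vertex shader: position comes from the geometry
// image (point sampled) instead of the vertex buffer.
Float3 PatchVertex(Float2 uv, const PatchInstance& inst,
                   const std::vector<Float3>& geometryImage, int imageSize) {
    float u = inst.pageOffset.x + uv.x * inst.pageScale;
    float v = inst.pageOffset.y + uv.y * inst.pageScale;
    // Map [0,1] onto texels 0..imageSize-1 so patch edges land on shared texels.
    int tx = std::min(std::max(int(u * (imageSize - 1) + 0.5f), 0), imageSize - 1);
    int ty = std::min(std::max(int(v * (imageSize - 1) + 0.5f), 0), imageSize - 1);
    const Float3& p = geometryImage[ty * imageSize + tx];
    return { p.x + inst.worldOffset.x, p.y + inst.worldOffset.y, p.z + inst.worldOffset.z };
}
```

Every patch reuses the same GridMesh, so the whole scene becomes one instanced draw per grid size. All the variation comes from PatchInstance and the geometry image.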

If patch tessellation is tied to the texture resolution, this provides the benefit that no page table needs to be maintained for the textures. This does mean that there may be a high amount of tessellation in a flat area merely because texture resolution was required there. Textures and geometry can be at different resolutions but still be tied, for example with the texture at 2x the size of the geometry image. This doesn't really affect the system.

If the performance is there to have the two at the same resolution, a new trick becomes available. Vertex density will match pixel density, so all pixel work can be pushed to the vertex shader. This gets around the quad problem with tiny triangles. If you aren't familiar with this, all pixel processing on modern GPUs gets grouped into 2x2 quads. Unused pixels in the quad get processed anyway and thrown out. This means if you have many pixel-size triangles your pixel performance will approach 1/4 the speed. If the processing is done in the vertex shader instead, this problem goes away. At this point the pipeline is looking similar to Reyes.
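To put numbers on the quad problem, here is a small C++ sketch. It counts how many 2x2 quads a set of covered pixels touches and returns the fraction of shaded pixels that are actually used. It assumes the covered pixels are distinct and have non-negative coordinates.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

struct Pixel { int x, y; };

// Every 2x2 quad a triangle touches costs 4 pixel shader invocations,
// however many of its pixels are actually covered.
float QuadShadingEfficiency(const std::vector<Pixel>& covered) {
    std::set<std::pair<int, int>> quads;
    for (const Pixel& p : covered) {
        assert(p.x >= 0 && p.y >= 0);  // keeps integer division equal to floor
        quads.insert({ p.x / 2, p.y / 2 });
    }
    if (quads.empty()) return 1.0f;
    return float(covered.size()) / float(4 * quads.size());
}
```

A single covered pixel gives 0.25, the 1/4 speed mentioned above. Even a 2x2 block only gets 0.25 if it straddles quad boundaries.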

If this is not a possibility for performance reasons, and it's likely not, the geometry patches and the texture can be untied. This allows the geometry to tessellate in detailed areas and not in flat areas. The texture page table will need to come back though which is unfortunate.

Geometry images were first designed for compression, so disk space should be a pretty easy problem. One issue, though, is edge pixels. The edge pixels shared between pages need to be exact, otherwise there will be cracks. This can be handled by losslessly compressing just the edges and using normal lossy image compression for the interiors. As the patches mip down they will be using shared data from disk, so this shouldn't be an issue. The data should be stored uncompressed in memory though, or the crack problem will return.

Unfortunately vertex texture fetch performance, at least on current console hardware, is very slow. There is a high amount of latency. Triangles are not processed in parallel either. With DirectX 11 tessellators it sounds like they will be processed in parallel. I do not know whether vertex texture fetch will be up to the speed of a pixel texture fetch. I would sure hope so. I need to read specs for both the API and this new hardware before I can postulate on how exactly this scheme can be done with tessellators instead of instanced patches but I think it will work nicely. I also have to give the disclaimer that I have not implemented this. The performance and details of the implementation are not yet known because I haven't done it.

Compared with the other schemes, this one has some advantages. Given that it is still triangle rasterization, dynamic objects are not a problem. To make this work with animated meshes, it will probably need bone indexes and weights stored in a texture along with the position. This can be contained to an animation-only geometry pool. It doesn't have the advantage subd meshes have, that you can animate just the control points. That advantage may not work that well anyway, because you need a fine-grained cage to get good animation control, which increases patch count, draw count, and the tessellation of the lowest detail LOD (the cage itself).

Its ability to LOD is better than subd meshes but not as good as voxels. The reason for this is that the charts a model has to be split into are usually quite a bit bigger than the patches of a subd mesh. This really depends on how intricate the mesh is, though. It scales the same way subd meshes do, just with a different multiplier. Things like terrain will work very well. Things like foliage will work terribly.

Tools side, anything can be converted into this format. Writing the tool unfortunately looks very complicated. The difficulty lies primarily in the texture parametrization required to build the seamless texture atlas. After the UVs are calculated, the rest should be pretty straightforward.

I do like this format better than subd meshes with displacement maps, but it's still not ideal. Tiny triangles start to lose the benefits of rasterization: there will be overdraw and triangles missing pixel centers. More important, I think, is that it doesn't handle all geometry well. That means you can't tell your artists they can make any model, place it however they want, and have no impact on the performance of the game. Once they start making trees or fences you might as well go back to how they used to work, because this scheme will run even slower than the old way. The same can be said for subd meshes, btw.

To sum it up I think it's a pretty cool system but it's not perfect and doesn't solve all the problems.

More Geometry

2009-01-08T22:40:00.013-06:00

There has been a lot of talk lately about the next generation of hardware and how we are going to render things. The primary topic seems to be "How are we going to render a lot more geometry than we can now?" There are two approaches that are getting a lot of attention. The first is subdivision surfaces with displacement maps and the second is voxel ray casting.

Here are some others that are getting a bit of attention.

Ray casting against triangles
Intel's ray tracer
Cuda ray tracer

Point splatting
Far Voxels
QSplat
Atom

Progressive meshes
Progressive Buffers
View dependent progressive mesh on the GPU

Progressive Buffers is one of my favorite papers. It's one that I keep coming back to time and time again.

Otoy
interview

Who knows exactly what is going on in Otoy. It almost seems like they are being deliberately confusing. It's for a game engine, it's for movie CG, it's lightstage (which I thought was a non-commercial product), it's a server side renderer, it's a web 3d viewer / streaming video.

From what I have gathered, it generates an unstructured point cloud on the GPU and creates a point hierarchy. It uses this for ray tracing, not just eye rays but also shadows and reflections. The reflections are massively cached; it's not clear how. I can't figure out how this works with full animations like they have in their videos. Either that would require regenerating the points, which makes ray casting into it kind of pointless, or it has to deal with holes. Whatever they are doing, the results are very impressive.

Subdivision surfaces with displacement maps
This has a lot of powerful people behind it. Both nVidia and AMD are behind it, along with DirectX 11 API support through the hull shader, tessellator, and domain shader. I'm not a big fan of this. First off, it's only really useful for data amplification, not data reduction. For example, our studio and many others are now using the subd cage as the in-game model for our characters. That means the lowest tessellation level the subd surface can get to, the subd cage, has the same poly count as our current near LOD character meshes. That makes subdivision surfaces useless for LODing at moderate distances. This can be reduced some, but likely not by enough to solve the problem. It looks really complicated to implement and rife with problems. It requires artist input to create models that work well with it. The data imported from the modelling package can be specific to that package. It's hard to beat the generality of a triangle soup.

The plus side is there is no issue with multiple moving meshes or deforming. They are fully animatable. To allow good animation control the tessellation of the cage may need to be higher. In theory every piece of geometry in your current engine could be replaced with subd models with displacement maps and the rest of your pipeline would work exactly the same.

Check out some work in this direction from Michael Bunnell of Fantasy Lab:
GPU Gems 2 chapter
Fantasy Lab

Voxel ray casting
This has been made popular by John Carmack, who has described it as his planned approach for idTech6. Considering he's always at the head of the pack, this should give it pretty strong backing. John refers to his implementation as a sparse voxel octree (SVO). The idea is to extend his current virtual texturing mip blocks to 3D with a "mip" hierarchy of voxels stored as an octree. The way this is even remotely reasonable is that you only need to store the important data, with no data in empty space. This is very different from most scientific and medical applications, which require the whole data block. This structure is great for compression. Geometry compression now turns into image compression, which is a well studied and effective field. LODing works from the whole screen being a single voxel down to subpixel detail. To render it, every screen pixel casts a ray into the octree until it hits a voxel the size of a pixel or smaller. This means that both rendering cost and memory are reduced by LODing. If you don't need the voxel data, you don't need it in memory.
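The stopping rule can be written down directly. First find the world-space size of a pixel at the hit distance, then find the shallowest octree level whose voxels are no larger than that. Here is a C++ sketch with hypothetical names. It assumes a perspective camera and a cubic root node of size rootSize.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// World-space width covered by one pixel at a given view distance
// (vertical field of view in radians).
float PixelFootprint(float distance, float fovY, int screenHeight) {
    return 2.0f * distance * std::tan(fovY * 0.5f) / float(screenHeight);
}

// Shallowest level whose voxels are no larger than the pixel footprint.
// A voxel at level L has size rootSize / 2^L; traversal can stop there,
// and nothing below that level needs to be resident in memory.
int LevelForFootprint(float rootSize, float footprint, int maxDepth) {
    assert(rootSize > 0.0f && footprint > 0.0f && maxDepth >= 0);
    int level = int(std::ceil(std::log2(rootSize / footprint)));
    return std::min(std::max(level, 0), maxDepth);
}
```

The level drops by one each time the distance doubles. That is where both the rendering and the memory savings come from.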

I like this approach because it gives one elegant way of handling textures, geometry, streaming, compression and LODing all in one system. There are no demands on the input data. Anything can be converted into voxels. Due to a streaming structure very similar to virtual texturing it allows unique geometry and unique texturing. This means there are no restrictions on the artists. There is little way an artist can impact performance or memory with assets they create. That puts the art and visuals in the artists hands and the technical decisions in the engineers hands.

There are some problems. Ray tracing has always been slow. Ray casting is a search, whereas rasterization is binning. Binning is almost always faster than searching. In a highly parallel environment, the synchronization required in binning may tip the scales in searching's favor. As triangles shrink, multiple triangles landing in one bin, or triangles missing bins entirely, also erode the speed advantage. It has now been demonstrated to be fast enough to render on current hardware at 60fps. This should be enough proof to let this concern slide a bit. It does mean the scene will likely only be rendered once per frame. It's unlikely there will be power left to render a shadow map, or to ray trace to the light for that matter. My guess is that John intends to have the lighting fully baked into the texture, like idTech5 works currently. This also cuts down on the required memory, as only one color is required per voxel.

Memory is another possible problem. Jon Olick's demo of one character using an SVO required ~1GB of video memory, which was not enough to completely hide paging. His plan to decrease this size was entropy encoding, which means each child's data is based on its parent's data. As far as I'm aware, this is only going to work if you use a kd-tree restart traversal, which is slower than the other alternatives. Otherwise he would need to evaluate the voxel data for the whole stack whenever he wishes to draw a pixel.

The most important problem is that it doesn't work with dynamic meshes. The scheme I believe John Carmack is planning to use is a static world with baked lighting, with all dynamic objects and characters using traditional triangle meshes. I expect this to work out well. The performance for this type of setup has been shown to be there, so it's not too risky to pursue this direction. There is something about it that bugs me, though. You lift the constraints on the environment artists but leave the other artists with the same problems. If you handle texturing inherently with voxels, does that mean he needs to keep his virtual texturing around for everything else? Treating the world and the dynamic objects in it separately has been required in the past with lightmaps and vertex lighting. Bringing back this Hanna-Barbera effect in an even more complicated way leaves a bad taste in my mouth. I'm really looking for a uniform solution for geometry and textures.

For more detailed information see Jon Olick's Siggraph presentation. He is a former employee of id. Also check out the interview that started this all off.

Brick-based voxel implementations seem like a better solution to me than a uniform tree. In these, the leaves are a different type than the nodes: they consist of a fixed size brick of voxels. Being in a brick format has many advantages. It allows free filtering and hardware volume texture compression through DXT formats.

Check out these for brick approaches:
Gigavoxels
GPU ray casting of voxels

Other blogs

2008-11-20T00:56:00.003-06:00

I just stumbled on a blog the other day that I haven't seen linked in the graphics blog circle, so I thought I'd make a point of mentioning it. Chris Evans, a technical artist from Crytek, has a blog. He has just left Crytek to go to ILM, and I hope he keeps up the blog, as it's full of art and technical topics. His main site also has some good stuff, like cryTools, Crytek's suite of Max scripts.

I didn't get a chance to play Resistance 2 yet, but for a good rendering breakdown of the game check out Timothy Farrar's post. Timothy is a Senior Systems Programmer I work with at Human Head, so you can trust him ;). From the part of the game I did see, I think the object shadows work similarly to the first cascade of cascaded shadow maps: there is one shadow map that is fit to the view frustum within a short range and fades out past that range. I haven't seen enough of the game to know whether shadows can come from more than one direction, which would make this not work.

Smooth transitions

2008-11-16T20:54:00.003-06:00

The T-rex night attack scene from Jurassic Park was a major milestone for CG. It showed a realistic and convincing CG character in a movie. As the T-rex comes after a girl, Dr. Grant tells her, "Don't move. If we don't move he won't see us." Graphics programmers should remember this, because it holds true for humans too. Our eyes are very good at seeing sharp changes but not very good at seeing smooth changes. If there is a way to smooth a hard transition, you can get away with a lot more than if you didn't. I think this is the prime thing missing in most LOD implementations. The LOD change gets pushed far enough into the distance that you can't tell when it pops from one LOD level to the next. If the pop was replaced with a smooth transition, the LOD distance could be pushed significantly closer and still not be noticeable.
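Here is one way to replace a pop with a transition, as a C++ sketch. Instead of switching LOD at a single distance, the two LODs are cross-faded over a band around the switch distance. ComputeLodBlend and bandWidth are hypothetical. How t is applied (alpha blending, screen-door dithering or geomorphing) is left open.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Which LODs to draw and how far through the cross-fade we are.
struct LodBlend { int lodA; int lodB; float t; };  // t = 0: all lodA, 1: all lodB

// switchDistances[i] is where LOD i hands over to LOD i + 1 (ascending).
// Instead of popping there, the two are cross-faded over bandWidth.
LodBlend ComputeLodBlend(float distance, const std::vector<float>& switchDistances,
                         float bandWidth) {
    assert(bandWidth > 0.0f);
    for (std::size_t i = 0; i < switchDistances.size(); ++i) {
        float start = switchDistances[i] - 0.5f * bandWidth;
        int lod = int(i);
        if (distance < start) return { lod, lod, 0.0f };
        if (distance < start + bandWidth)
            return { lod, lod + 1, (distance - start) / bandWidth };
    }
    int last = int(switchDistances.size());
    return { last, last, 0.0f };
}
```

Both LODs are drawn only inside the band, so the extra cost is limited to objects that are mid-transition.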

Fracture
I only played the demo, but I really liked their cascaded shadow map implementation. It looks like there are just two levels, and beyond those there are no shadows. What is really nice about it is the smooth transition between the levels. As you walk toward an object, its shadow fades from no shadow to low res shadow to high res shadow. So many cascaded shadow maps in games look strange or jarring because there is a line on the ground where the shadows switch from one resolution to another, and that line moves with your view direction and movement.
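Here is a guess at how such a two-level blend could work. This is a hypothetical sketch, not Fracture's code: each cascade fades out over a band at its far end, and past the last cascade the shadow term fades to fully lit, so no hard line appears anywhere.

```cpp
#include <algorithm>
#include <cassert>

static float Saturate(float x) { return std::min(std::max(x, 0.0f), 1.0f); }
static float Lerp(float a, float b, float t) { return a + (b - a) * t; }

// Hypothetical two-cascade shadow lookup with smooth transitions.
// Shadow terms are 0 = fully shadowed, 1 = fully lit.
//   nearShadow: result sampled from the high res cascade (covers up to split0)
//   farShadow:  result sampled from the low res cascade (covers up to split1)
//   band:       width of the cross-fade region at the end of each cascade
float CascadedShadow(float viewDepth, float nearShadow, float farShadow,
                     float split0, float split1, float band) {
    assert(band > 0.0f && split0 < split1);
    // Fade the high res cascade out over [split0 - band, split0].
    float wNear = Saturate((split0 - viewDepth) / band);
    // Fade shadowing out entirely over [split1 - band, split1].
    float wAny = Saturate((split1 - viewDepth) / band);
    float cascaded = Lerp(farShadow, nearShadow, wNear);
    return Lerp(1.0f, cascaded, wAny);
}
```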

Gears of War 2
In my opinion any serious game graphics programmer or artist is obligated to play Gears of War 2. It now sets the bar for graphics on a console. For being blown away by visuals, it gives Crysis a run for its money too. As far as tech goes that seems absurd, but the combination of art and tech, with enough things I'd never seen before, takes the prize for me. It isn't like this through and through, so it really takes playing the whole game to get what I mean.

As far as tech, I was a bit surprised when Tim Sweeney showed at GDC the new things they had added to the Unreal engine for the next Gears of War. I was surprised because it wasn't very much. Between UT2k4 and GoW they built a whole new engine. Sure, it was an evolution, but a large one. The renderer was rewritten, they created a whole set of high end tools, and they changed the internal framework drastically. From GoW to GoW2 they added SSAO, hordes of distant guys, water, changes to character lighting, and destructible geometry (which wasn't really in the game). This doesn't sound like very much considering they have 18 engine programmers listed in the credits.

The change that impressed me the most is fading in textures when they stream in. Some UE3 games have gotten flak for streamed textures popping in. Instead of pushing memory less aggressively, they added fading in of new mip levels. For smooth transitions this is brilliant! I'm guessing most players will never know textures aren't streamed in on time, because they will never see the pop. I also suspect they pushed the texture streamer further, since they no longer had to worry about subtle pops from new mip levels. I couldn't tell whether this also happens when a texture downsizes, because I never noticed one downsize.

Fading in new mip levels once they stream in is something I've wanted to do, but I just don't know how they are doing it. If anyone knows, or is doing something similar themselves, I'd love to hear how it works. The problem I see is that the min and max mip levels settable on a 360 texture sampler are, unfortunately, dwords, so they can't hold the fractional values a fade needs. LOD bias is settable, but it is applied before the clamp to min and max. The only way I could see this working is if they calculate the gradients themselves in the shader and clamp the resulting LOD to a fade value as they lerp from the old clamp to the new full mip level. That seems like it would create a shader explosion if it needs to be turned on and off for every texture in every shader. The alternative is always manually calculating the texture gradient for all UVs used and then clamping it individually for each texture, which I believe would be quite a bit slower.
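To make the shader-side idea concrete, here is a C++ sketch of the two steps I'm describing: computing the mip level from UV gradients yourself, then clamping it to a fractional minimum mip that fades from the old resident level to the new one. All names are hypothetical, and the mip formula is the standard isotropic one, ignoring anisotropic filtering. In a real shader the result would feed an explicit-LOD fetch such as HLSL's `tex2Dlod`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Float2 { float x, y; };

// Isotropic mip selection from screen-space UV derivatives (ddx/ddy of the UV),
// measured in texels: log2 of the larger footprint axis.
float ComputeMipLevel(Float2 dUVdx, Float2 dUVdy, float texWidth, float texHeight) {
    assert(texWidth > 0.0f && texHeight > 0.0f);
    float dxLen = std::hypot(dUVdx.x * texWidth, dUVdx.y * texHeight);
    float dyLen = std::hypot(dUVdy.x * texWidth, dUVdy.y * texHeight);
    float rho = std::max(dxLen, dyLen);
    return std::log2(std::max(rho, 1e-8f));
}

// Hypothetical fade: oldMinMip is the finest mip resident before streaming,
// newMinMip is the finest mip resident now (mip 0 = full res), and fade ramps
// from 0 to 1 after the new mip arrives. The clamp value is fractional,
// which is exactly what the integer sampler min/max mip states can't express.
float FadedMipLevel(float mip, float oldMinMip, float newMinMip, float fade) {
    float minMip = oldMinMip + (newMinMip - oldMinMip) * fade;
    return std::max(mip, minMip);
}
```

With trilinear filtering, ramping `fade` from 0 to 1 blends the new detail in smoothly instead of switching a whole mip level at once.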

Next up is the screen space ambient occlusion (SSAO). This helped with their low res lightmaps and showed off their very high poly environments. Personally I think it was overdone in many cases, but overall it was an improvement. I was surprised by the implementation. They are using frame recirculation to reduce the per frame cost. You can tell because obscuring objects wipe the AO away for a moment before it grows back in. It seems to be at a pretty high res, possibly screen res. Previous results are found using either the velocity buffer or just the depth buffer and the camera transformation matrix. Using this position they can sample from the previously calculated results without smearing things as you look around.
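The reprojection step could look something like this. It is a sketch under my own assumptions, using the depth-buffer-plus-camera-matrix route: the world position is assumed to be already reconstructed from depth, the matrix layout and D3D-style UV convention are assumptions, and `AccumulateAO` is a simple exponential blend I'm using for illustration. A complete version would also compare depths to reject history belonging to a different surface, which is one way you'd get the AO wiping away and growing back in.

```cpp
#include <array>
#include <cassert>

struct Float4 { float x, y, z, w; };
// Row-major 4x4 matrix applied to column vectors: out = M * v.
using Mat4 = std::array<std::array<float, 4>, 4>;

Mat4 Identity() {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0f;
    return m;
}

Float4 Mul(const Mat4& m, Float4 v) {
    const float in[4] = {v.x, v.y, v.z, v.w};
    float out[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c) out[r] += m[r][c] * in[c];
    return {out[0], out[1], out[2], out[3]};
}

// Project a world-space position with LAST frame's view-projection matrix to
// find the UV where the AO history for this surface lives. Returns false if
// the point was off screen or behind the camera last frame, in which case the
// history must be discarded and the AO starts over.
bool ReprojectToPrevUV(const Mat4& prevViewProj, Float4 worldPos, float& u, float& v) {
    Float4 clip = Mul(prevViewProj, worldPos);
    if (clip.w <= 0.0f) return false;
    u = 0.5f + 0.5f * (clip.x / clip.w);
    v = 0.5f - 0.5f * (clip.y / clip.w); // NDC +y is up, texture v goes down
    return u >= 0.0f && u <= 1.0f && v >= 0.0f && v <= 1.0f;
}

// Blend a little new AO work into the reprojected history each frame.
float AccumulateAO(float historyAO, float currentAO, float blend) {
    assert(blend > 0.0f && blend <= 1.0f);
    return historyAO + (currentAO - historyAO) * blend;
}
```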

Much of the visual splendor comes from clever material effects. They have scrolling distortion maps to distort the UVs. There are pulsing parallax maps that look so good that for a moment I thought the whole area was deforming geometry. There was inner glow from an additive reverse Fresnel, like the cave antlions in Half-Life 2: Episode Two. There was a window pane with raindrops running down it that I had to study for about two minutes to figure out what was going on. My guess: one droplet map and two identical drop stream maps, independently masked by scrolling textures. The final normal map was used to distort the lookup into an environment map. Their artists really went to town with some of this stuff.

The lighting is still mostly directional lightmaps. Shadows are character-based modulate shadows, this time at a higher res than before. They seem to be only on characters this time, leaving the SSAO to handle the rest of the dynamic objects.

The Latest Games

2008-11-16T00:47:00.014-06:00

So, I've gotten some complaints that I haven't updated this blog in a while. This is partly due to the abundance of games that have come out lately and partly because my mind share has been on work-specific graphics algorithms. Since I cannot talk about what I do at work, I decided a better source of blog topics is other people's games. The majority of my graphics research comes from reading about and studying games in whatever form I can get. I'll go through some of the things I've found lately.

Dead Space
One of the standout graphical features of Dead Space for me is the dynamic shadows. All lights and shadows are dynamic. The shadows are projected or cube shadow maps depending on the light. The filtering looks bilinear, which makes the shadows look very pixelated and nasty at times. They really take advantage of the shadows being dynamic, though, by having objects and lights move around whenever they can. Swinging and vibrating light positions are abundant. For large lights the resolution really suffered, and the player's shadow could reduce to blocks.

Coronas and similar effects were used a lot and looked great. With bloom being all the rage for portraying bright objects, it's a nice slap in the face that the old standbys can sometimes get much better results than the fancy new system. Similar tricks were used for light beams through dusty corridors: sheets that fade out as you get near them, which masks their flat shape well. It seems the artists went to town with this effect, making different light beam textures for different situations. Foggy particle systems that fade out as you get near them were also used.

David Blizard, Dead Space's lighting designer, claimed their frame budget was 7.5ms for building the deferred lighting buffer, 4ms for building the shadow buffers, and 2ms for post processing, including bloom and an antialiasing pass, which means they are not using multisampling. For ambient light, baked ambient occlusion on the world modulates an ambient color coming from the lighting.

Mirror's Edge
Although it uses the Unreal engine, it looks distinctly unlike any other Unreal game, or for that matter any other game out there. This is due to its "graphic design" art style. Tech-wise it relies heavily on global illumination. For this they replaced Unreal's standard lightmap generation tool with Beast. From what I've seen in the first hour or so, the only dynamic shadows are from characters, and they are modulate shadows. All environment lighting is baked into lightmaps with Beast. The lightmaps look higher resolution than any I've seen used before. My guess is they sacrificed a larger portion of their texture memory and added streaming support for the lightmaps. It has auto exposure, which I haven't seen in an Unreal game before. All of the reflections but one planar one were cube maps, which caught me a little by surprise because DICE had talked about doing research into getting real reflections working. Overall I was impressed. You could tell there was a very tight art / tech vision for the game.

Fallout 3
The world in Fallout 3 is stunning. On the tech side I don't see much improvement over Oblivion. Maybe most of the changes are under the hood, to get more things running faster, but I don't see much different, just a few odds and ends. It does seem like the artists have really grown, and they have come up with good ways of portraying the things they needed to.

A few things stood out to me. First are the light beams. This is a common new technique: a shader fades out the edges of a cone-like mesh, along with some other tricks, to portray a light cone. The first place I saw it was Gears of War, but it was used heavily and well in Fallout 3.
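One common way to get that edge fade, sketched here as a hypothetical function rather than anything from Fallout 3: opacity falls off with |N.V| so the silhouette edges of the cone vanish, and it also fades out as the camera gets close. `edgePower` and `nearFadeDist` are made-up tuning parameters.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Opacity of a light-beam cone at one pixel.
//   nDotV:        dot of the cone's surface normal and the view vector
//   distToCamera: distance from the camera to this point on the cone
float BeamOpacity(float nDotV, float distToCamera, float edgePower, float nearFadeDist) {
    assert(edgePower > 0.0f && nearFadeDist > 0.0f);
    // Grazing angles (silhouette edges) go to zero so the flat mesh edges disappear.
    float edge = std::pow(std::fabs(nDotV), edgePower);
    // Fade out as the camera approaches so you never see the mesh up close.
    float nearFade = std::min(std::max(distToCamera / nearFadeDist, 0.0f), 1.0f);
    return edge * nearFade;
}
```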

Second was the grass, which was done with many grass clump models with variation. The textures had a darkened core to simulate shadowing, and with AA on they used alpha to coverage and looked great. So many games use just one grass sprite to fill in grass. No matter how dense you make it, it will not look right. Grass needs variation in size, color, texture, density and shape. Great job; it's some of the best grass I've seen.

The last thing that stood out to me was their crumble decals (I don't know what else to call them). To portray crumbling, damaged stone and concrete, alpha tested, normal mapped decals were placed on the edges of models. It looks like the same instanced model was placed around with different placements of these crumble decals. This added an apparent uniqueness to the models and broke up their straight edges. There were quite a few times I was fooled into thinking some perfectly straight edge was tessellated when it was just effective use of these decals. Also of note is that these decals fade out in the distance, often before anything else LODs. There was likely a special system to handle these decals.

Overall, the art was nice but the tech was underwhelming. They really should have shadows by this point. There are no world shadows of any sort. With that addition I think the game would look twice as good as it does.

How Pixar Fosters Collective Creativity

2008-08-30T14:25:00.003-05:00

I just read this article by Ed Catmull on the business principles that have driven Pixar to its great success. The first reason you should be interested is that Mr. Catmull is one of the pioneers of computer graphics. The second is that Pixar, the company he created and still runs, is one of the most consistently successful companies around that creates artistic products. The type of work Pixar does is very close to the work we do in games, and there is plenty to learn from his experiences.

Global Illumination

2008-08-13T00:43:00.005-05:00

Talking about precalculated lighting reminded me of an awesome paper I just read from Pixar, Point Based Color Bleeding. They got a 10x speedup on GI over raytracing. I did a bunch of research on this topic, and it's funny that I was just one small insight away from what they are doing. Missing that insight forced me down a completely different path. Sometimes it's small things that separate success from failure.

Deferred rendering 2

2008-08-12T21:40:00.006-05:00

I'll start off by saying: check out the new SIGGRAPH papers that have been posted. I was really surprised by the one on Starcraft II. In the past Blizzard has purposely stayed behind the curve to keep their requirements low and their audience large. It seems this time they have kept the low end while expanding into the high end. I was also surprised by the nature of the visuals in the game. It's part adventure game? Count me in. It's looking great. It also has an interesting deferred rendering architecture, which leads me to my next topic.

Deferred rendering part II. Perhaps I should have just waited and made one monster post, but now you'll just have to live with it.

Light Pre-Pass

This was recently proposed by Wolfgang Engel. The main idea is to split material rendering into two parts. The first part writes out depth and normal to a small G-buffer, which can possibly all fit in one render target. With this information you can get everything important from the lights, which is N dot L and either R dot V or N dot H, whichever you want. The lights are accumulated into a light buffer laid out as follows:

LightColor.r * N.L * Att
LightColor.g * N.L * Att
LightColor.b * N.L * Att
R.V^n * N.L * Att

With this information, the standard forward rendering pass only needs to be done once. This is the second part of the material rendering.
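For reference, this is how I read the per-pixel accumulation into those four channels, written as a C++ sketch. The structs are hypothetical, R is taken as the view vector reflected about N (so `Dot(R, L)` equals R.V), and the exponent `n` is a single shared value because the buffer has nowhere to store a per-surface one.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Float3 { float x, y, z; };
struct Float4 { float r, g, b, a; };

float Dot(Float3 a, Float3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// One light as seen from a single pixel; vectors are unit length.
struct PixelLight {
    Float3 color; // light color
    Float3 L;     // direction from the surface to the light
    float att;    // attenuation
};

// Sums every light's contribution into the four light-buffer channels.
Float4 AccumulateLightBuffer(const std::vector<PixelLight>& lights,
                             Float3 N, Float3 R, float n) {
    assert(n > 0.0f);
    Float4 buf{0.0f, 0.0f, 0.0f, 0.0f};
    for (const PixelLight& l : lights) {
        float nlAtt = std::max(Dot(N, l.L), 0.0f) * l.att; // N.L * Att
        float rv = std::max(Dot(R, l.L), 0.0f);           // R.V
        buf.r += l.color.x * nlAtt;
        buf.g += l.color.y * nlAtt;
        buf.b += l.color.z * nlAtt;
        buf.a += std::pow(rv, n) * nlAtt; // no light color in specular
    }
    return buf;
}
```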

He explains that R.V^n can be derived later by dividing out the N.L * Att, but I don't see any reason to do this. It also means dividing by the color, which is just wrong. There's also the mysterious exponent n, which must be a global or something, meaning the exponent can't change per surface.

There are really a number of issues here. Specular doesn't have any color at all, not even from the lights. If you instead store R.V in the fourth channel and try to apply the power and multiply by LightColor * N.L * Att in the forward pass, the multiplications have been shuffled with the additions across lights and it doesn't work out. There is no specular color or exponent, and it depends on everything using the Phong lighting equation. It has solved the deep framebuffer problem, but it is a lot more restrictive than traditional deferred rendering. All in all it's nice for a demo but not for production.
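The ordering problem is easy to see with two lights and scalar values (light colors left at 1). This hypothetical sketch compares the correct per-light sum against applying the power and multiply to already summed channels:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Correct result: each light's (N.L * Att) multiplies its own (R.V)^n.
float SpecularPerLight(const std::vector<float>& nlAtt,
                       const std::vector<float>& rv, float n) {
    assert(nlAtt.size() == rv.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < rv.size(); ++i) sum += nlAtt[i] * std::pow(rv[i], n);
    return sum;
}

// What the forward pass computes if the buffer only holds the sum of
// N.L * Att and the sum of R.V, with the power and multiply applied afterwards.
float SpecularFromSums(const std::vector<float>& nlAtt,
                       const std::vector<float>& rv, float n) {
    assert(nlAtt.size() == rv.size());
    float sumNl = 0.0f, sumRv = 0.0f;
    for (std::size_t i = 0; i < rv.size(); ++i) {
        sumNl += nlAtt[i];
        sumRv += rv[i];
    }
    return sumNl * std::pow(sumRv, n);
}
```

With N.L * Att = 1 for both lights and R.V of 1 and 0, the per-light result is 1 while the summed version gives 2, so the error shows up as soon as two lights overlap. With a single light the two agree.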

Naughty Dog's Pre-Lighting

I have to admit when I sat through this talk I didn't really understand why they were doing what they were doing. It seemed overly complicated to me. After reading the slides later, the brilliance started to show through. The slides are pretty confusing, so I will at least explain what I think they mean. Insomniac has since adopted this method as well, but I can't seem to find that presentation. The idea is very similar to the Light Pre-Pass method. It is likely what you would get if you took Light Pre-Pass to its logical conclusion.

Surface rendering is split into two parts. The first pass renders out depth, normal and specular exponent. Then the lights are drawn additively into two HDR buffers, one for diffuse and one for specular. Because the material's specular exponent has been saved out, this can all be done correctly. In the second surface pass these two buffers are used as the accumulated lighting, and material attributes such as diffuse color and specular color are applied. They add some extra trickery that complicates the slides: light drawing is combined into quads so that no pixel on screen is drawn more than once during light drawing.
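A minimal sketch of that two-buffer accumulation as I understand it, assuming Phong specular tinted by the light color. The types and function names are hypothetical, and this ignores the quad-combining trick:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Float3 { float x, y, z; };

Float3 operator+(Float3 a, Float3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Float3 operator*(Float3 a, Float3 b) { return {a.x * b.x, a.y * b.y, a.z * b.z}; }
Float3 operator*(Float3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

// Contents of the two HDR light buffers at one pixel.
struct LightAccum {
    Float3 diffuse{0.0f, 0.0f, 0.0f};
    Float3 specular{0.0f, 0.0f, 0.0f};
};

// Light pass: each light adds into both buffers. Because the first pass saved
// this pixel's specular exponent, pow() uses the surface's own exponent.
void AddLight(LightAccum& acc, Float3 lightColor, float nDotL, float rDotV,
              float att, float specExponent) {
    assert(specExponent > 0.0f);
    float nlAtt = std::max(nDotL, 0.0f) * att;
    float spec = std::pow(std::max(rDotV, 0.0f), specExponent) * nlAtt;
    acc.diffuse = acc.diffuse + lightColor * nlAtt;
    acc.specular = acc.specular + lightColor * spec;
}

// Second surface pass: apply the material's colors to the accumulated lighting.
Float3 ShadeSurface(const LightAccum& acc, Float3 diffuseColor, Float3 specColor) {
    return diffuseColor * acc.diffuse + specColor * acc.specular;
}
```

Compared to Light Pre-Pass, `specExponent` comes from the G-buffer per pixel and specular has its own RGB buffer, so both a per-surface exponent and a specular color work. The lighting model itself is still fixed in the light pass.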

This is completely usable in a production environment, as proven by Uncharted having shipped and looking gorgeous. Lights can be handled one at a time (even though they don't do that), so multiple shadows pose no problems. The framebuffer is smaller than with traditional deferred rendering. HDR obviously works fine.

It doesn't solve all the problems though. Most are small, and without testing it myself I can't say whether they are significant. The one nagging problem of being stuck with Phong lighting still remains. This time it's just a different part of Phong that has been exposed and made rigid in the system.

Light Pass Combined Forward Rendering

I am going to propose another alternative that I haven't really seen talked about. The idea is similar to light indexed deferred rendering, which renders forward style but applies all the lights that hit a pixel in one pass. This can be handled far more simply by passing the light parameters in when drawing the surface and applying more than one light at a time. This is nothing new; Crysis can apply up to 4 lights at a time. What I haven't seen discussed is what to do when a light only hits part of a surface. Light indexed rendering handles this on a per pixel basis, so it is a non issue there. If the lights are instead "indexed" per surface, every pixel may have to evaluate many more lights than actually affect it.

We can solve this problem in a way other than screen space. For instance, splitting the world geometry at the bounds of static lights gets you pixel perfect light coverage for any mesh you wish to split. The surfaces with the worst problems are the largest, since they are hit by the most lights. These are almost always large walls, floors and ceilings. This type of geometry is rarely instanced, and splitting it is typically not very expensive. Objects that don't fall into this category are typically instanced, relatively contained meshes that do not have very smooth transitions with other geometry. For these I suggest keeping only a fixed number of real affecting lights and combining any less significant lights into a spherical harmonic. For more details see Tom Forsyth's post on it. In my experience the light count hasn't posed an issue.
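Here is one way the "keep a few real lights, fold the rest into SH" step could look. It is a hypothetical sketch, not Tom Forsyth's code: lights are ranked by intensity over squared distance at the object's center (a heuristic I picked for illustration), and the leftovers are added as directional lights into a single-channel order-1 SH using the standard real SH constants 0.282095 and 0.488603.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Float3 { float x, y, z; };
struct PointLight { Float3 pos; float intensity; };

// Order-1 spherical harmonic (bands L0 and L1), one color channel.
using SH4 = std::array<float, 4>;

// Adds light of strength `value` arriving from unit direction d, projected onto
// the real SH basis functions Y00, Y1-1, Y10, Y11 (in that order).
void AddDirectionalToSH(SH4& sh, Float3 d, float value) {
    sh[0] += 0.282095f * value;
    sh[1] += 0.488603f * d.y * value;
    sh[2] += 0.488603f * d.z * value;
    sh[3] += 0.488603f * d.x * value;
}

struct LightSplit {
    std::vector<int> realLights; // indices of lights shaded as real lights
    SH4 sh{};                    // all remaining lights folded together
};

struct RankedLight { float significance; int index; Float3 dir; };

LightSplit SplitLights(const std::vector<PointLight>& lights, Float3 center,
                       std::size_t maxReal) {
    std::vector<RankedLight> ranked;
    for (int i = 0; i < static_cast<int>(lights.size()); ++i) {
        assert(lights[i].intensity >= 0.0f);
        Float3 d{lights[i].pos.x - center.x, lights[i].pos.y - center.y,
                 lights[i].pos.z - center.z};
        float dist2 = std::max(d.x * d.x + d.y * d.y + d.z * d.z, 1e-4f);
        float len = std::sqrt(dist2);
        // Hypothetical significance: inverse-square falloff at the object center.
        ranked.push_back({lights[i].intensity / dist2, i, {d.x / len, d.y / len, d.z / len}});
    }
    // Most significant lights first.
    std::sort(ranked.begin(), ranked.end(),
              [](const RankedLight& a, const RankedLight& b) {
                  return a.significance > b.significance;
              });
    LightSplit out;
    for (std::size_t k = 0; k < ranked.size(); ++k) {
        if (k < maxReal)
            out.realLights.push_back(ranked[k].index);
        else
            AddDirectionalToSH(out.sh, ranked[k].dir, ranked[k].significance);
    }
    return out;
}
```

A real version would keep one SH per color channel and evaluate it in the shader against the object's normals alongside the real lights.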

The one remaining issue is shadows. Because all lights for a surface are applied at once, shadows can't be done a light at a time. This is the same issue as with light indexed rendering, and the solution is the same as well. All shadows have to be calculated and stored, likely in a screen space buffer. The obvious choice is 4 shadowing lights using the 4 components of an RGBA8 render target. This is the same solution Crytek is using. That doesn't mean only 4 shadowing lights are allowed on screen at a time. There is nothing stopping you from rendering a surface again with the next set of shadowing lights after you've completed everything using those 4.
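A sketch of the packing and the single-pass use of that buffer, assuming an 8-bit unorm encoding where 0 is fully shadowed and 1 is fully lit. The names are hypothetical:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// One RGBA8 texel of the screen-space shadow buffer. Channel i holds the
// shadow term (0 = shadowed, 1 = lit) of shadowing light slot i at this pixel.
using ShadowTexel = std::array<std::uint8_t, 4>;

std::uint8_t ToUnorm8(float v) {
    assert(!std::isnan(v));
    v = std::min(std::max(v, 0.0f), 1.0f);
    return static_cast<std::uint8_t>(std::lround(v * 255.0f));
}

float FromUnorm8(std::uint8_t b) { return static_cast<float>(b) / 255.0f; }

// Shadow pass output: pack up to four lights' shadow terms into one texel.
ShadowTexel PackShadows(const std::array<float, 4>& shadow) {
    ShadowTexel t{};
    for (int i = 0; i < 4; ++i) t[i] = ToUnorm8(shadow[i]);
    return t;
}

// Forward pass: all four shadowing lights are applied in the same draw, each
// light's unshadowed contribution scaled by its channel of the shadow buffer.
float ShadeWithShadows(const std::array<float, 4>& unshadowedLight, const ShadowTexel& texel) {
    float sum = 0.0f;
    for (int i = 0; i < 4; ++i) sum += unshadowedLight[i] * FromUnorm8(texel[i]);
    return sum;
}
```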

Given the limit of 4 shadowing lights, this becomes a forward rendering architecture that is only one pass. It gets rid of all the redundant work from draws, tris, and material setup. It also gives you all the power of a forward renderer, such as changing the lighting equation to whatever you want. Besides the shadow buffer, it doesn't rely on screen space buffers for the lighting in any way. This means no additional memory and no 360 EDRAM headaches.

There are plenty of problems with this. Splitting meshes only works with static lights. In all of the games I've referenced so far this poses no problems. Most environmental lighting does not move (at least its bounds don't), nor, to a large extent, does the scenery. Splitting a mesh adds more triangles, vertices, and draw calls than before, but in the cases where you would split, this is typically not a major issue.

You do lose one of the cool things about deferred rendering: independence from the number of lights. In the Starcraft II paper that came out today they had a scene with over 50 lights in it, including every bulb on a string of Xmas lights. This is not a major issue for a standard deferred renderer, but it is for pass combined forward rendering. It is really cool to be able to do that, but in my opinion it is not very important. The impact on the scene from those Xmas lights actually casting light is minimal, and there are likely other ways of doing it besides tiny dynamic lights.

Summary

That is my roundup of dynamic lighting architectures. I left out any kind of precalculated lighting, such as lightmaps, environment maps, or Carmack's lighting baked into a unique virtual texture, as that's pretty much a different topic.