## Saturday, May 12, 2018

### Normal map filtering using vMF (part 3)

$$\newcommand{\vv}{\mathbf{v}} \newcommand{\rv}{\mathbf{r}} \newcommand{\muv}{\boldsymbol\mu} \newcommand{\omegav}{\boldsymbol\omega} \newcommand{\mudotv}{\muv\cdot\vv}$$ What can we use this for? One example of a place where distributions are summed up is in normal map and roughness filtering. Normal and roughness maps are textures describing the distribution of microfacet normals. The normal of the normal map is the mean of the distribution and the roughness describes the width of the distribution. We can fit our chosen NDF with a vMF by finding a mapping from roughness to sharpness $\lambda$.

This mapping for Beckmann is given by  as: \begin{equation} \lambda \approx \frac{2}{\alpha^2} \label{eq:roughness_to_lambda} \end{equation} and following my previous post about specular models we can use the $\alpha$ from any of those distributions in this equation for a reasonable approximation.

Once you have vMFs we can sum or filter them in $\rv$ form. Then we can turn it back to normal and roughness by inverting the function: \begin{equation} \alpha \approx \sqrt{\frac{2}{\lambda}} \label{eq:lambda_to_roughness} \end{equation} We must be careful with floating point precision and divide by zero though. Instead of calculating $\lambda$ we can instead calculate its reciprocal which avoids multiple places where a divide by nearly zero can happen.

// Convert normal and roughness to r form
float InvLambda = 0.5 * Alpha*Alpha;
float exp2L = exp( -2.0 / InvLambda );
float CothLambda = InvLambda > 0.1 ? (1 + exp2L) / (1 - exp2L) : 1;
float3 r = ( CothLambda - InvLambda ) * Normal;

// Filter in r form

// Convert back to normal and roughness
float r2 = clamp( dot(r,r), 1e-8, 1 );
InvLambda = rsqrt( r2 ) * ( 1 - r2 ) / ( 3 - r2 );
Alpha = sqrt( 2 * InvLambda );
Normal = normalize(r);

How does this compare to the common approaches? The first to do something like this was Toksvig  which follows similar logic with vector length corresponding with gloss and uses properties of Gaussian distributions but not SGs exactly. LEAN mapping  is based on Gaussians as well but planar distributions, not spherical. The approach I just explained should in theory work just as well with object space normals.

Even though it was part of the original approach the common way to use "Toksvig" filtering (including UE4's implementation) is to find the normal variance and increase the roughness by it. There is no influence from the roughness on the normals when doing that and there should be. The correct way will affect how the normals are filtered. A smooth normal should have more weight in the filter than a rough normal.

vMF has been used for this task before in  and later . There is a major difference from our approach in that Frequency domain normal map filtering relies on convolving instead of averaging. It finds the vMF for the distribution of normals over the filter region. It then convolves the normal and roughness by that vMF. But what is a convolution?

### Convolution

Graphics programmers know of convolutions like blurring. It sort of spreads data out right? What does it mean mathematically though? A convolution of one function by another creates a new function that is equivalent to the integral of the function being convolved multiplied by a the convolving function translated to that point.

Think of a blur kernel with weights per tap. That kernel center is translated to the pixel in the image that we write the blur result to. Each tap of the kernel is a value from the blur function. We multiply those kernel values by the image that is being convolved. All of those samples are then added together. Now usually a blur doesn't have infinite support or every pixel of the image would need to be sampled but the only reason that doesn't need to happen is because the convolving function, ie the blur kernel, is zero past a certain distance from the center of the function. Otherwise the integral needs to cover the entire domain. In the 1D case that means from negative to positive infinity. In the case of a sphere that means over the entire surface of the sphere.

This symbolically looks like this for 1D: \begin{equation} (f * g) (x) = \int_{-\infty}^\infty f(t) g(x-t)\,dt \end{equation} We now have the definition but why would we want to convolve a function besides image blurring? A convolution of one function by another creates a new function that when evaluated is equal to if the both functions were multiplied together and integrated at that translated point. Think of this like precalculating the multiplication and integration of those functions for any translated point. The integral of the product is done ahead of time and now we can evaluate it for any translation.

This is exactly the use case for preconvolving environment maps by the reflected GGX distribution. GGX is the convolving function, the environment map is the function being convolved, the reflection vector direction used to sample the preconvolved environment map is the "translation". SGs are very simple to multiply and integrate as we have already seen so precomputing it often doesn't save much. Convolving does have its uses though so let's see how to do it.

### Convolving SGs

The convolution of two SGs is not closed in the SG basis, meaning it does not result in exactly a SG. Fortunately it can be well approximated by one.  gave a simple approximation that is fairly accurate so long as the lobe sharpnesses aren't very low: \begin{equation} \begin{aligned} \left(G_1 * G_2\right) \left( \vv \right) &= \int_{S^2} G_1(\omegav) G_2\left( \omegav; \vv, \lambda_2, a_2 \right) \,d\omegav \\ &\approx G \left( \vv; \muv_1, \frac{\lambda_1 \lambda_2}{\lambda_1 + \lambda_2}, 2\pi\frac{a_1 a_2}{\lambda_1 + \lambda_2} \right) \end{aligned} \label{eq:convolve_sg} \end{equation} The first line of the equation above may shed more light on how we can use this if it isn't clear already. This is identical to the inner product but with $\muv_2$ replaced with a free parameter.

For the case of normal map filtering we don't care about amplitude. We want a normalized SG. That means for this case the only part that matters is the convolved $\lambda'$: \begin{equation} \lambda' = \frac{\lambda_1 \lambda_2}{\lambda_1 + \lambda_2} \end{equation} We can ignore the rest of eq \eqref{eq:convolve_sg} for the moment. If we replace $\lambda$ everywhere with $\alpha$ using eq \eqref{eq:lambda_to_roughness} we get a nice simple equation: \begin{equation} \alpha' = \sqrt{ \alpha_1^2 + \alpha_2^2 } \end{equation} Leave $\lambda$ in for one of them and we get: \begin{equation} \alpha' = \sqrt{ \alpha^2 + \frac{2}{\lambda} } \label{eq:alpha_prime} \end{equation} which looks just like what  used except for the 2 factor. I believe this is a mistake in their course notes. In equation (37) of their notes they have it as 1/2 and in the code sample it is 1. I think the confusion comes from the Frequency Domain Normal Map Filtering paper working with Torrance Sparrow and not Cook Torrance, and $\sigma \neq \alpha$. Overall it means less roughness from normal variance. In my tests using eq \eqref{eq:alpha_prime} that we just derived looks closer to Toksvig results. Otherwise the range is off and less rough. MJP uses the same 2/a^2 for SG sharpness in his blog post so we don't disagree there.

As a gut check if $\alpha=0$ and all final roughness comes from normal variation then $\alpha'=\sqrt{\frac{2}{\lambda}}$ which is what we established in eq \eqref{eq:lambda_to_roughness}. If there is no normal variation then this equation explodes but if you calculate InvLambda like I did in the code snippet the second term becomes zero and $\alpha'=\alpha$ which is what we want.

Next up, converting from SH to SG (coming soon).

### References

 Wang et al. 2007, "All-Frequency Rendering of Dynamic, Spatially-Varying Reflectance"
 Toksvig 2004, "Mipmapping Normal Maps"
 Olano et al. 2010, "LEAN Mapping"
 Hill 2011, "Specular Showdown in the Wild West"
 Han et al. 2007, "Frequency Domain Normal Map Filtering"
 Neubelt et al. 2013, "Crafting a Next-Gen Material Pipeline for The Order: 1886"
 Iwasaki 2012, "Interactive Bi-scale Editing of Highly Glossy Materials"

### von Mises-Fisher (part 2)

$$\newcommand{\vv}{\mathbf{v}} \newcommand{\rv}{\mathbf{r}} \newcommand{\muv}{\boldsymbol\mu} \newcommand{\mudotv}{\muv\cdot\vv}$$ A normalized SG has the same equation as the probability distribution function for a von Mises-Fisher (vMF) distribution on the 3 dimensional sphere. This affords us a few more tools and applications to work with. A vMF distribution can be defined for any dimension. I'll focus on 3D here because it is the most widely usable for computer graphics and simplifies discussion. Because a vMF does not have a free amplitude parameter it is written as: \begin{equation} \begin{aligned} V(\vv;\muv,\lambda) = \frac{\lambda}{ 2\pi \left( 1 - e^{-2 \lambda} \right) } e^{\lambda(\mudotv - 1)} \end{aligned} \label{eq:vmf} \end{equation} The more common form you will likely see in literature is this: \begin{equation} \begin{aligned} V(\vv;\muv,\lambda) = \frac{\lambda}{ 4\pi \sinh(\lambda) } e^{\lambda(\mudotv)} \end{aligned} \label{eq:vmf_sinh} \end{equation} which is equivalent due to the identity \begin{equation} \begin{aligned} \sinh(x) = \frac{ 1 - e^{-2x} }{ 2e^{-x} } \end{aligned} \label{eq:sinh_identity} \end{equation} The form in eq \eqref{eq:vmf} is more numerically stable so should be used in practice as explained by .

Compare the equation for a vMF to the equation for a SG and it is easy to see that: \begin{equation} \begin{aligned} V(\vv;\muv,\lambda) = G\left( \vv; \muv, \lambda, \frac{\lambda}{ 2\pi \left( 1 - e^{-2 \lambda} \right) } \right) \end{aligned} \label{eq:vmf_to_sg} \end{equation} That means a vMF is equivalent to a normalized SG and by moving terms from one side to the other we can show that a SG is equivalent to a scaled vMF. \begin{equation} \begin{aligned} G\left( \vv; \muv, \lambda, a \right) = \frac{2\pi a}{\lambda} \left( 1 - e^{-2 \lambda} \right) V(\vv;\muv,\lambda) \end{aligned} \label{eq:sg_to_vmf} \end{equation}

### Fitting a vMF distribution to data

Fitting a vMF distribution to directions or points on a sphere is a very similar process as fitting a normal distribution to points on a line. In the case of a normal distribution, one calculates the mean and variance of the data set and then chooses a normal distribution with the same mean and variance as the best fit to the data.

For the vMF distribution the mean direction and spherical variance are used. Calculating these properties for a set of directions is simple. \begin{equation} \begin{aligned} \rv = \frac{1}{n}\sum_{i=1}^{n} \textbf{x}_i \end{aligned} \label{eq:r_average} \end{equation} where $\textbf{x}_1, \textbf{x}_2, ..., \textbf{x}_n$ are a set of unit vectors.

Often values are associated with these directions. So instead taking a simple average we can take a weighted average. \begin{equation} \begin{aligned} \rv = \frac{\sum_{i=1}^{n} \textbf{x}_i w_i}{\sum_{i=1}^{n} w_i} \end{aligned} \label{eq:r_weighted_average} \end{equation} We have the two properties, the mean direction $\muv = \frac{\rv}{\|\rv\|}$ and the spherical variance $\sigma^2 = 1 - \|\rv\|$. To fit a vMF distribution to the data we need to know what these properties are for the vMF distribution. Since the vMF distribution is convex, circularly symmetric about its axis, and is max in the direction of $\muv$, it is fairly obvious that the mean direction will be $\muv$ so I won't derive that here.

The spherical variance $\sigma^2$ on the other hand is a bit more involved. Because we already know the direction of $\rv$ is $\muv$ we can simplify this calculation to the integral of the projection of the function onto $\muv$. \begin{equation} \begin{split} \|\rv\| &= \int_{S^2} V(\vv;\muv,\lambda) (\mudotv) d\vv \\ &= \frac{\lambda}{ 4\pi \sinh(\lambda) } \int_{S^2} e^{\lambda(\mudotv)} (\mudotv) d\vv \\ \end{split} \end{equation} Because the integral over the sphere is rotation-invariant we will replace $\muv$ with the x-axis. \begin{equation} \begin{split} &= \frac{\lambda}{ 4\pi \sinh(\lambda) } \int_{0}^{2 \pi} \int_{0}^{\pi} e^{\lambda\cos\theta} \cos\theta\sin\theta d\theta d\phi \\ &= \frac{\lambda}{ 4\pi \sinh(\lambda) } 2 \pi \int_{0}^{\pi} e^{\lambda\cos\theta} \cos\theta\sin\theta d\theta \\ \end{split} \end{equation} Substituting $t=-\cos\theta$ and $dt=\sin\theta d\theta$ \begin{equation} \begin{split} &= \frac{\lambda}{ 2 \sinh(\lambda) } \int_{-1}^{1} -t e^{-\lambda t} dt \\ &= \frac{\lambda}{ 2 \sinh(\lambda) } \left( \frac{ 2 \lambda \cosh(\lambda) - 2 \sinh(\lambda) }{ \lambda^2 } \right) \\ &= \frac{\cosh(\lambda)}{ \sinh(\lambda) } - \frac{\sinh(\lambda)}{ \lambda \sinh(\lambda) } \\ \end{split} \end{equation} Arriving in its final form \begin{equation} \|\rv\| = \coth(\lambda)-\frac{1}{\lambda} \label{eq:r_length} \end{equation} Although simple in form, this function unfortunately isn't invertible.  provides an approximation which is close enough for our purposes. \begin{equation} \begin{aligned} \lambda &= \|\rv\| \frac{ 3 - \|\rv\|^2}{1 - \|\rv\|^2} \end{aligned} \end{equation} Now that we have a way to calculate the mean and spherical variance for a data set and we know the corresponding vMF mean and spherical variance, we can fit a vMF to the data set.

Using eq \eqref{eq:r_weighted_average} to calculate $\rv$, the vMF fit to that data is \begin{equation} V\left( \vv; \frac{\rv}{\|\rv\|},\|\rv\| \frac{ 3 - \|\rv\|^2}{1 - \|\rv\|^2} \right) \label{eq:r_to_vmf} \end{equation}
Going the other direction from $V(\vv;\muv,\lambda)$ form to $\rv$ form using eq \eqref{eq:r_length} is this: \begin{equation} \rv = \left( \coth(\lambda)-\frac{1}{\lambda} \right) \muv \label{eq:vmf_to_r} \end{equation}

### Addition of SGs

We now have a way to convert to and from $\rv$ form. $\rv$ is linearly filterable as shown in how it was originally defined in eq \eqref{eq:r_weighted_average}. This means if our vMF functions are representing a spherical distribution of something then a weighted sum of those distributions can be approximately fit by another vMF. In other words we can approximate the resulting distribution by converting to $\rv$ form, filtering, and then converting back to traditional $V(\vv;\muv,\lambda)$ form.

By using the weighted average eq \eqref{eq:r_weighted_average} we can apply this concept to non normalized SGs too. This allows us to not just filter (ie sum with a total weight of 1) but add as well. A non-normalized SG as shown in eq \eqref{eq:sg_to_vmf} is a scaled vMF. We can use this scale as the weight when summing and use the total weight as the final scale for the summed SG.

This is the $\rv$ form for $G(\vv;\muv,\lambda, a)$. It includes an additional weight value you can think of like the energy this SG is adding to the sum: \begin{equation} \begin{aligned} \rv_i &= \left( \coth(\lambda_i)-\frac{1}{\lambda_i} \right) \muv_i \\ w_i &= \frac{2\pi a_i}{\lambda_i} \left( 1 - e^{-2 \lambda_i} \right) \\ \end{aligned} \end{equation} This weight is of course used in the weighted sum \begin{equation} \begin{aligned} \rv &= \frac{\sum_{i=1}^{n} \rv_i w_i}{\sum_{i=1}^{n} w_i} \\ w &= \sum_{i=1}^{n} w_i \\ \end{aligned} \end{equation} Using eq \eqref{eq:r_to_vmf} and eq \eqref{eq:vmf_to_sg} we can convert back to a scaled vMF and finally to a SG in $G(\vv;\muv,\lambda, a)$ form: \begin{equation} \begin{aligned} G\left( \vv; \muv, \lambda, a \right) &= w V(\vv;\muv,\lambda) \\ &= G\left( \vv; \muv, \lambda, w \frac{\lambda}{ 2\pi \left( 1 - e^{-2 \lambda} \right) } \right) \end{aligned} \end{equation} While addition and filtering are approximate they can be useful. The accuracy of the result is very dependent on the angle between the $\mu$ vectors or lobe axii. Adding sharp lobes pointed in different directions will result in a single wide lobe.

Next, what we can use this for:
Normal map filtering using vMF

### References

 Banerjee et al. 2005, "Clustering on the Unit Hypersphere using von Mises-Fisher Distributions"
 Jakob 2012, Numerically stable sampling of the von Mises Fisher distribution on S2 (and other tricks)"

### Spherical Gaussians (part 1)

$$\newcommand{\vv}{\mathbf{v}} \newcommand{\rv}{\mathbf{r}} \newcommand{\muv}{\boldsymbol\mu} \newcommand{\mudotv}{\muv\cdot\vv}$$ A Spherical Gaussian (SG) is a function of unit vector $\vv$ and is defined as \begin{equation} G(\vv;\muv,\lambda, a) = a e^{\lambda(\mudotv - 1)} \end{equation} where unit vector $\muv$, scalar $\lambda$, and scalar $a$ represent the lobe axis, lobe sharpness, and lobe amplitude of the SG, respectively.

The formula can be read as evaluating a SG in the direction of $\vv$ where the SG has the parameters of $\muv,\lambda, a$. An abbreviated notation $G(\vv)$ can be used instead when the parameters can be assumed. Often the more verbose notation is used to assign values to the parameters.

SGs have a number of nice properties including simple equations for a number of common operations.

### Product of Two SGs

The product of two SG's can be represented exactly as another SG. This product is sometimes referred to as the vector product. This formula was first properly given in  (it was shown earlier but in an non-normalized form).

Let $\lambda_m = \lambda_1 + \lambda_2$ and let $\muv_m = \frac{\lambda_1\muv_1 + \lambda_2\muv_2}{\lambda_1 + \lambda_2}$, then \begin{equation} \begin{split} G_1(\vv)G_2(\vv) = G\left(\vv; \frac{\muv_m}{\|\muv_m\|}, \lambda_m\|\muv_m\|, a_1 a_2 e^{\lambda_m\left(\|\muv_m\| - 1\right)}\right) \end{split} \label{eq:sg_product} \end{equation}

### Raising to a power

Given that the product of two SGs is another SG it shouldn't be much of a surprise that a SG raised to a power can be expressed exactly as another SG: \begin{equation} \begin{aligned} G(\vv)^n &= G(\vv; \muv,n\lambda, a^n) \end{aligned} \label{eq:sg_power} \end{equation}

### Integration Over The Sphere

The integral of a SG over the sphere has a closed form solution.

 showed that the integral was: \begin{equation} \int_{S^2}G(\vv) d\vv = 2 \pi \frac{a}{\lambda} \left( 1 - e^{-2\lambda} \right) \label{eq:sg_integral} \end{equation}

### Inner product

The inner product is defined as the integral over the sphere of the product of two SGs. We can already find the product of two SGs and integrate over a sphere. Putting those together we have: \begin{equation} \int_{S^2}G_1(\vv) G_2(\vv) d\vv = \frac{4 \pi a_0 a_1}{e^{\lambda_m}} \frac{ \sinh\left(\|\muv_m\| \right) }{ \|\muv_m\| } \label{eq:sg_inner_product_sinh} \end{equation} This equation has numerical precision issues when evaluated with floating point arithmetic. An alternative form which is more stable is the following: \begin{equation} \int_{S^2}G_1(\textbf{v}) G_2(\textbf{v}) d\textbf{v} = 2 \pi a_0 a_1 \frac{ e^{ \|\boldsymbol\mu_m\| - \lambda_m } - e^{ -\|\boldsymbol\mu_m\| - \lambda_m } }{ \|\boldsymbol\mu_m\| } \label{eq:sg_inner_product_exp} \end{equation}

### Normalization

Although there are other definitions for normalization I use the term to mean having an integral over the sphere equal to 1. Normalizing a SG is a simple matter of dividing it by its integral over the sphere. \begin{equation} \begin{aligned} \frac{ G(\vv) }{ \int_{S^2}G(\vv) d\vv } = G\left( \vv; \muv, \lambda, \frac{\lambda}{ 2\pi \left( 1 - e^{-2 \lambda} \right) } \right) \end{aligned} \label{eq:sg_normalized} \end{equation} Notice that the original $a$ parameter canceled out. Instead lobe amplitude is derived purely from the lobe sharpness $\lambda$.

These are all the common operations that have closed form solutions. So far nothing new here but hopefully it is helpful to have all these equations in a centralized place for reference. I didn't include derivations for any of these formulas. If readers think that would be useful to see maybe those could be added at a later date.

Special thanks to David Neubelt. Although this has been heavily modified from what we previously had I'm sure his touch is still present.

Now on to some less well covered concepts.
von Mises-Fisher (part 2)

### References

 Wang et al. 2007, "All-Frequency Rendering of Dynamic, Spatially-Varying Reflectance"
 Tsai et al. 2006, "All-Frequency Precomputed Radiance Transfer using Spherical Radial Basis Functions and Clustered Tensor Approximation"

### Intro and backstory

About 4 years ago now I ran into Spherical Gaussian (SG) math in a few different publications in a row, enough that it triggered the pattern detection in my brain. All were using SGs to approximate specular lobes. I remember feeling very similar 10 years prior when Spherical Harmonics were starting to become all the rage. Back with SH I noticed it fairly early, primarily from Tom Forsyth's slides on the topic and took the time to dig into the math and make sure I had this new useful thing in my toolbox. Doing so has proven to be well worth the time. I decided that I should do the same again and learn SGs and related math, in particular to build up a toolbox of operations I can do with them. Maybe it would prove as useful as SH has.

In the years since I'd say it has certainly been worth the effort. I don't think I can say it has proven as useful as SH has been to computer graphics but it was still worth learning. I intended to write up what I had found and share the toolbox of equations compiled in a centralized place at the very least. Unfortunately laziness and procrastination got in the way.

Also scope creep. About 3 years ago I mentioned to David Neubelt that I intended to write this up and he too had done a lot of work with SGs at Ready at Dawn so we decided we'd collaborate and write a joint paper and submit it to JCGT. The intention was to make something similar to Stupid Spherical Harmonics (SH) Tricks but for SG. That scope and seriousness is much larger than a simple single author blog post. We worked on derivations to all the formulas, wanted to have quality solutions to cube map fitting, multiple use cases proven in production, and a ton of other things to make it an exhaustive, professional, and ultimately great paper. This was much more than I ever intended as a simple blog post. I still think that paper we had in mind would be great to exist but the actual end result is it bloated the expectation of what either of us had the bandwidth or maybe the attention span to complete and after a couple of months of work the unfinished paper regretfully stagnated.

A year went by and eventually MJP of Ready at Dawn did in fact do a blog series write up on Spherical Gaussians. It is excellent and I suggest you read it before continuing if you haven't already. There are still some things I intended to cover that he did not as well as things he did that I never have done nor planned to cover so I think these should compliment each other well.

It is a royal shame I have not posted this in the 3+ years since I intended to. I had even promised folks publicly a write up was coming and then didn't deliver. I haven't posted anything on this blog since then in fact. I hope to do better and hopefully finally getting this out will unclog the pipes.

## Saturday, March 29, 2014

### UE4 available to all

The big news from Epic at GDC this year was that Unreal Engine 4 is now available to everyone for only $19/month + 5% royalties. There's been a lot said already about why this is cool, opinions, comparisons to competitors and so forth. From a business model point of view I feel it has been covered better than I could possibly say. If you are interested check out Mark Rein's post or hear it from the man himself Tim Sweeney for very compelling reasons why this is a good idea for us to provide and for developers to subscribe. I wanted to give my own personal perspective as an engineer. What Epic just did is absolutely revolutionary and I am privileged to be part of it. I am super excited! Let me explain why. If you've followed this blog for a while you may remember I gave Epic huge props for releasing UDK back in 2009. That was great for artists and designers, not that great for programmers. This is the next step down that path and its a doozy. Now there have been naysayers that compare the numbers 19/month + 5%, is that better or worse, is it new or been done before, and so on. They are missing the point. It isn't the price, it isn't even the features. The revolution is this: for a 1/3 the cost of a new video game you can get access to the complete source code for our cutting edge game engine. The entire engine source code, every bit of it, the exact same we use in house and the same provided to private licensees, is available to you for only$19 (except for XB1 and PS4 code we aren't legally able to give you due to NDAs). This is the same tech that will be powering many of the top AAA games developed this generation. As Mark put it "Right now 7 of the top 21 (!) all-time highest rated Xbox 360 games (by Metacritic score) were powered by Unreal Engine 3" and you damn well better bet we are going to do the same or better this time around with UE4.

Now you may say John Carmack and id have open sourced their engines many times. That is true and I commend them for it. John has had a lasting impact on how many coders, myself included, write software due to these efforts. Unfortunately, the strict licensing terms have resulted in basically no commercial products using the id engines from that open source license. John has even lamented this recently "It is self-serving at this point, but looking back, I wish I could have released all the Id code under a more permissive license. GPL never did really do anything for game code, and I do wonder whether it was a fundamental cultural incompatibility. GPL was probably the best that could have flown politically for the code releases -- posterity without copy-paste into competition." And my own personal twist with this situation is after Zenimax bought id, their engines were no longer commercially licensable meaning anything built on top of that code base had to be scraped. GPL is simply not an option for commercial games.

There are other examples of open sourced game engines or cheaply priced engine source code but they all have something in common which is they were never up to date. The id engines were only open sourced years after the game that used them shipped. Id had already moved onto their next technology. For example the engine that powered Doom 3 wasn't opened sourced until 6 years after the game shipped. By that point Rage had even shipped. The reason for this is obvious. The point where releasing the code is non-threatening to execs is the point where a competitor can't use the code to their advantage. This means every single engine code base available to indie devs or students was either not commercially developed or purposely not competitive in its capabilities.

With the UE4 subscription, you can today get complete source that is totally up to date and will continue to be. This isn't any old engine, its the same cutting edge technology that will be powering many of the top AAA games of this generation. And as soon as we add new features you'll get them. In fact you'll even see the work in progress before they're done. How crazy is that?

Look, its easy to brush this off as the guy who works for Epic telling me I should give him money. That isn't why I'm writing this. What I am most excited about by far is in what I can share and give back the the gamedev community. That is the reason I have this blog, its the reason I have presented at conferences and will again in the future.

If you are a teacher, check out the license terms, they are ridiculously nice for schools. If you are a student, tell your school about it or grab a personal subscription.

For all those wanting to get into the game industry, the same response you will hear from everyone is make something. No one lands a game job because of their great grades or fancy degrees. They get it by making something cool. I wrote my own engine. That's how I landed my first job. Although I learned a ton if I were in that place today I would modify UE4. There is literally nothing else more applicable to a engine programming position than proving you can do the work. Hell, make something cool enough we'll hire you!

I can't wait to see what people make with this. It's a great time to be a programmer!

## Sunday, December 15, 2013

### Tone mapping

When working with HDR values, two troublesome situations often arise.

The first happens when one tries to encode an HDR color using an encoding that has a limited range, for instance RGBM. Values outside the range still need to be handled gracefully, ie not clipped.

The second happens when an HDR signal is under sampled. One very bright sample can completely dominate the result. In path tracing these are commonly called fireflies.

In both cases the obvious solution is to reduce the range. This sounds exactly like tonemapping so break out those tone mapping operators, right? Well yes and no. Common tone mapping operators work on color channels individually. This has the downside of desaturating the colors which can look really bad if later operations attenuate the values, for instance reflections, glare, or DOF.

Instead I use a function that modifies only the luminance of the color. The simplest of which is this:
$$T(color) = \frac{color}{ 1 + \frac{luma}{range} }$$
Where $T$ is the tone mapping function, $color$ is the color to be tone mapped, $luma$ is the luminance of $color$, and $range$ is the range that I wish to tone map into. If the encoding must fit RGB individually in range then $luma$ is the max RGB component.

Inverting this operation is just as easy. $$T_{inverse}(color) = \frac{color}{ 1 - \frac{luma}{range} }$$
This operation, when used to reduce fireflies, can also be thought of as a weighting function for each sample: $$weight = \frac{1}{ 1 + luma }$$
For a weighted average, sum all samples and divide by the summed weights. The result will be the same as if the samples were tone mapped using $T$ with $range$ of 1, averaged, then inverse tone mapped using $T_{inverse}$.

If a more expensive function is acceptable then keeping more of the color range linear is best. To do this use the functions below where 0 to $a$ is linear and $a$ to $b$ is tone mapped. $$T(color) = \left\{ \begin{array}{l l} color & \quad \text{if luma \leq a}\\ \frac{color}{luma} \left( \frac{ a^2 - b*luma }{ 2a - b - luma } \right) & \quad \text{if luma \gt a} \end{array} \right.$$ $$T_{inverse}(color) = \left\{ \begin{array}{l l} color & \quad \text{if luma \leq a}\\ \frac{color}{luma} \left( \frac{ a^2 - ( 2a - b )luma }{ b - luma } \right) & \quad \text{if luma \gt a} \end{array} \right.$$
These are same as the first two functions if $a=0$ and $b=range$.

I have used these methods for lightmap encoding, environment map encoding, fixed point bloom, screen space reflections, path tracing, and more.

## Saturday, August 3, 2013

### Specular BRDF Reference

$$\newcommand{\nv}{\mathbf{n}} \newcommand{\lv}{\mathbf{l}} \newcommand{\vv}{\mathbf{v}} \newcommand{\hv}{\mathbf{h}} \newcommand{\mv}{\mathbf{m}} \newcommand{\rv}{\mathbf{r}} \newcommand{\ndotl}{\nv\cdot\lv} \newcommand{\ndotv}{\nv\cdot\vv} \newcommand{\ndoth}{\nv\cdot\hv} \newcommand{\ndotm}{\nv\cdot\mv} \newcommand{\vdoth}{\vv\cdot\hv}$$ While I worked on our new shading model for UE4 I tried many different options for our specular BRDF. Specifically, I tried many different terms for to Cook-Torrance microfacet specular BRDF: $$f(\lv, \vv) = \frac{D(\hv) F(\vv, \hv) G(\lv, \vv, \hv)}{4(\ndotl)(\ndotv)}$$ Directly comparing different terms requires being able to swap them while still using the same input parameters. I thought it might be a useful reference to put these all in one place using the same symbols and same inputs. I will use the same form as Naty , so please look there for background and theory. I'd like to keep this as a living reference so if you have useful additions or suggestions let me know.

First let me define alpha that will be used for all following equations using UE4's roughness: $$\alpha = roughness^2$$

## Normal Distribution Function (NDF)

The NDF, also known as the specular distribution, describes the distribution of microfacets for the surface. It is normalized  such that: $$\int_\Omega D(\mv) (\ndotm) d\omega_i = 1$$ It is interesting to notice all models have $\frac{1}{\pi \alpha^2}$ for the normalization factor in the isotropic case.

Blinn-Phong : $$D_{Blinn}(\mv) = \frac{1}{ \pi \alpha^2 } (\ndotm)^{ \left( \frac{2}{ \alpha^2 } - 2 \right) }$$ This is not the common form but follows when $power = \frac{2}{ \alpha^2 } - 2$.

Beckmann : $$D_{Beckmann}(\mv) = \frac{1}{ \pi \alpha^2 (\ndotm)^4 } \exp{ \left( \frac{(\ndotm)^2 - 1}{\alpha^2 (\ndotm)^2} \right) }$$

GGX (Trowbridge-Reitz) : $$D_{GGX}(\mv) = \frac{\alpha^2}{\pi((\ndotm)^2 (\alpha^2 - 1) + 1)^2}$$

GGX Anisotropic : $$D_{GGXaniso}(\mv) = \frac{1}{\pi \alpha_x \alpha_y} \frac{1}{ \left( \frac{(\mathbf{x} \cdot \mv)^2}{\alpha_x^2} + \frac{(\mathbf{y} \cdot \mv)^2}{\alpha_y^2} + (\ndotm)^2 \right)^2 }$$

The geometric shadowing term describes the shadowing from the microfacets. This means ideally it should depend on roughness and the microfacet distribution.

Implicit : $$G_{Implicit}(\lv,\vv,\hv) = (\ndotl)(\ndotv)$$

Neumann : $$G_{Neumann}(\lv,\vv,\hv) = \frac{ (\ndotl)(\ndotv) }{ \mathrm{max}( \ndotl, \ndotv ) }$$

Cook-Torrance : $$G_{Cook-Torrance}(\lv,\vv,\hv) = \mathrm{min}\left( 1, \frac{ 2(\ndoth)(\ndotv) }{\vdoth}, \frac{ 2(\ndoth)(\ndotl) }{\vdoth} \right)$$

Kelemen : $$G_{Kelemen}(\lv,\vv,\hv) = \frac{ (\ndotl)(\ndotv) }{ (\vdoth)^2 }$$

### Smith

The following geometric shadowing models use Smith's method for their respective NDF. Smith breaks $G$ into two components: light and view, and uses the same equation for both: $$G(\lv, \vv, \hv) = G_{1}(\lv) G_{1}(\vv)$$ I will define $G_1$ below for each model and skip duplicating the above equation.

Beckmann : $$c = \frac{\ndotv}{ \alpha \sqrt{1 - (\ndotv)^2} }$$ $$G_{Beckmann}(\vv) = \left\{ \begin{array}{l l} \frac{ 3.535 c + 2.181 c^2 }{ 1 + 2.276 c + 2.577 c^2 } & \quad \text{if c < 1.6}\\ 1 & \quad \text{if c \geq 1.6} \end{array} \right.$$

Blinn-Phong:
The Smith integral has no closed form solution for Blinn-Phong. Walter  suggests using the same equation as Beckmann.

GGX : $$G_{GGX}(\vv) = \frac{ 2 (\ndotv) }{ (\ndotv) + \sqrt{ \alpha^2 + (1 - \alpha^2)(\ndotv)^2 } }$$ This is not the common form but is a simple refactor by multiplying by $\frac{\ndotv}{\ndotv}$.

Schlick-Beckmann:
Schlick  approximated the Smith equation for Beckmann. Naty  warns that Schlick approximated the wrong version of Smith, so be sure to compare to the Smith version before using. $$k = \alpha \sqrt{ \frac{2}{\pi} }$$ $$G_{Schlick}(\vv) = \frac{\ndotv}{(\ndotv)(1 - k) + k }$$

Schlick-GGX:
For UE4, I used the Schlick approximation and matched it to the GGX Smith formulation by remapping $k$ : $$k = \frac{\alpha}{2}$$

## Fresnel

The Fresnel function describes the amount of light that reflects from a mirror surface given its index of refraction. Instead of using IOR we instead use the parameter or $F_0$ which is the reflectance at normal incidence.

None: $$F_{None}(\mathbf{v}, \mathbf{h}) = F_0$$

Schlick : $$F_{Schlick}(\mathbf{v}, \mathbf{h}) = F_0 + (1 - F_0) ( 1 - (\vdoth) )^5$$

Cook-Torrance : $$\eta = \frac{ 1 + \sqrt{F_0} }{ 1 - \sqrt{F_0} }$$ $$c = \vdoth$$ $$g = \sqrt{ \eta^2 + c^2 - 1 }$$ $$F_{Cook-Torrance}(\mathbf{v}, \mathbf{h}) = \frac{1}{2} \left( \frac{g - c}{g + c} \right)^2 \left( 1 + \left( \frac{ (g + c)c - 1 }{ (g - c)c+ 1 } \right)^2 \right)$$

## Optimize

Be sure to optimize the BRDF shader code as a whole. I choose these forms of the equations to either match the literature or to demonstrate some property. They are not in the optimal form to compute in a pixel shader. For example, grouping Smith GGX with the BRDF denominator we have this: $$\frac{ G_{GGX}(\lv) G_{GGX}(\vv) }{4(\ndotl)(\ndotv)}$$ In optimized HLSL it looks like this:

float a2 = a*a;
float G_V = NoV + sqrt( (NoV - NoV * a2) * NoV + a2 );
float G_L = NoL + sqrt( (NoL - NoL * a2) * NoL + a2 );
return rcp( G_V * G_L );

If you are using this on an older non-scalar GPU you could vectorize it as well.

### References

 Hoffman 2013, "Background: Physics and Math of Shading"
 Blinn 1977, "Models of light reflection for computer synthesized pictures"
 Beckmann 1963, "The scattering of electromagnetic waves from rough surfaces"
 Walter et al. 2007, "Microfacet models for refraction through rough surfaces"
 Burley 2012, "Physically-Based Shading at Disney"
 Neumann et al. 1999, "Compact metallic reflectance models"
 Kelemen 2001, "A microfacet based coupled specular-matte brdf model with importance sampling"
 Smith 1967, "Geometrical shadowing of a random rough surface"
 Schlick 1994, "An Inexpensive BRDF Model for Physically-Based Rendering"
 Karis 2013, "Real Shading in Unreal Engine 4"
 Cook and Torrance 1982, "A Reflectance Model for Computer Graphics"
 Reed 2013, "How Is the NDF Really Defined?"

## Sunday, July 28, 2013

### Epic, SIGGRAPH, etc

I'm resurrecting this blog from the dead. I'm sorry it's been neglected for a year but I've been busy. If you follow me on twitter (@BrianKaris) then this probably isn't news, but for those that don't here's an update:

A year ago I left Human Head and accepted a position on the rendering team at Epic Games. Since then we made the UE4 Infiltrator demo. I've worked on temporal AA, reflections, shading, materials, and other misc cool stuff for UE4 and games being developed here at Epic. I'm surrounded by a bunch of really smart, talented people, with whom it has been a pleasure to work.

Just this last week I presented in the SIGGRAPH 2013 course: Physically Based Shading in Theory and Practice. If you saw my talk and are interested in the subject but haven't looked at the course notes I highly suggest you follow that link and check them out as well as the other presenter's materials. Like previous years, the talks are only a taste of the content that the course notes cover in detail.

Now with that out of the way, hopefully I can start making some good posts again.

## Monday, May 14, 2012

### Sparse shadows through tracing

The system I described last time allowed specular highlights to reach large distances but only requires calculating them on the tiles where they will show up. This is great but it means now we must calculate shadows for these very large distances. Growing the shadow maps to include geometry at a much greater distance is hugely wasteful. Fortunately there is a solution.

Before I get to that though I want to talk about a concept I think is going to be very important for next gen renderers and that is having more than one representation for scene geometry. Matt Swoboda talked about this in his GDC presentation this year  and I am in complete agreement with him. We will need geometry in similar formats as we've had in the past for efficient rasterization (vertex buffers, index buffers, displacement maps). This will be used whenever the rays are coherent simply because HW rasterization is much faster than any other algorithm currently for coherent rays. Examples of use are primary rays and shadow rays in the form of shadow maps.

Incoherent rays will be very important for next gen renderers but we need a different representation to efficiently trace rays. Any that support tracing cones will likely be more useful than ones which can only trace rays. Possible representations are signed distance fields , SVOs , surfel trees , and billboard cloud trees . I'll also include screen space representations although these don't store the whole scene. These include mip map chains of min/max depth maps , variance depth maps  and adaptive transparency screen buffers . Examples of use for these trace friendly data structures are indirect diffuse (radiosity), indirect specular (reflections) and sparse shadowing of direct specular. The last one is what helps with our current issue.

The Samaritan demo from Epic had a very similar issue that they solved in the same way I am suggesting. They had many point lights which generated specular highlights at any distance. To shadow them they did a cone trace in the direction of the reflection vector against a signed distance field that was stored in a volume texture. This was already being done for other reflections so using that data to shadow the point lights doesn’t come at much cost. The signed distance field data structure could be swapped with any of the others I listed. What is important is that the shadowing is calculated with a cone trace.

What I propose as the solution to our problem is to use traditional shadow maps only within the diffuse radius. Do a cone trace down the reflection vector. The cone trace will return a visibility function that any specular outside the range of a shadow map can cheaply use to shadow.

Actually, having shadowing data independent from the lights means it can be used for culling as well. The max unoccluded ray distance can be accumulated per tile which puts a cap on the culling cone for light sources. I anticipate this form of occlusion culling will actually be a very significant optimization.

This shadowing piece of the puzzle means the changes I suggested in my last post, in theory, come at a fairly low cost assuming you already do cone tracing for indirect specular. That may seem like a large assumption but to demonstrate how practical cone tracing is, a very simple, approximate form of cone tracing can be done purely against the depth buffer. This is what I do with screen space reflections on current gen hardware. I don’t do cone tracing exactly but instead reduce the trace distance with low glossiness and fade out the samples at the end of the trace. This acts like occlusion coverage fades by the radius of the cone at the point of impact which is a visually acceptable approximation. In other words the crudest form of cone tracing can already be done in current gen. It is fairly straightforward to extend this to true cone tracing on faster hardware using one of the screen space methods I listed. Replacing screen space with global is much more complex but doable.

The result is hopefully point light specularity “just works”. The problem is then shifted to determining which lights in the world to attempt to draw. Considering we have >10000 in one map in Prey 2 this may not be easy :). Honestly I haven’t thought about how to solve this yet.

I, like everyone else who has talked about tiled light culling, am leaving out an important part which is how to efficiently meld shadow maps and tiled culling for the diffuse portion. I will be covering ideas on how to handle that next time.

Finally, I want to reach out to all that have read these posts that if you have an idea on how the cone based culling can be adapted to a blinn distribution please let me know.

 http://directtovideo.wordpress.com/2012/03/15/get-my-slides-from-gdc2012/
 http://iquilezles.org/www/material/nvscene2008/rwwtt.pdf
 http://maverick.inria.fr/Publications/2011/CNSGE11b/GIVoxels-pg2011-authors.pdf
 http://www.mpi-inf.mpg.de/~ritschel/Papers/Microrendering.pdf
 http://graphics.cs.yale.edu/julie/pubs/bc03.pdf
 http://www.punkuser.net/vsm/vsm_paper.pdf
 http://www.nvidia.com/content/PDF/GDC2011/GDC2011EpicNVIDIAComposite.pdf

## Sunday, April 29, 2012

### Tiled Light Culling

First off I'm sorry that I haven't updated this blog in so long. Much of what I have wanted to talk about on this blog, but couldn't, was going to be covered in my GDC talk but that was cancelled due to forces outside my control. If you follow me on twitter (@BrianKaris) you probably heard all about it. My comments were picked up by the press and quoted in every story about Prey 2 since. That was not my intention but oh, well. So, I will go back to what I was doing which is to talk here about things I am not directly working on.

### Tiled lighting

There has been a lot of talk and excitement recently concerning tiled deferred  and tiled forward  rendering.

I’d like to talk about an idea I’ve had on how to do tile culled lighting a little differently.

The core behind either tiled forward or tiled deferred is to cull lights per tile. In other words for each tile, calculate which of the lights on screen affect it. The base level of culling is done by calculating a min and max depth for the tile and using this to construct a frustum. This frustum is intersected with a sphere from the light to determine which lights hit solid geometry in that tile. More complex culling can be done in addition to this such as back faced culling using a normal cone.

This very basic level of culling, sphere vs frustum, only works with the addition of an artificial construct which is the radius of the light. Physically correct light falloff is inverse squared.

### Light falloff

Small tangent I've been meaning to talk about for a while. To calculate the correct falloff from a sphere or disk light you should use these two equations :

Falloff:
$$Sphere = \frac{r^2}{d^2}$$
$$Disk = \frac{r^2}{r^2+d^2}$$

If you are dealing with light values in lumens you can replace the r^2 factor with 1. For a sphere light this gives you 1/d^2 which is what you expected. The reason I bring this up is I found it very helpful in understanding why the radiance appears to approach infinity when the distance to the light approaches zero. Put a light bulb on the ground and this obviously isn’t true. The truth from the above equation is the falloff approaches 1 when the distance to the sphere approaches zero. This gets hidden when the units change from lux to lumens and the surface area gets factored out. The moral of the story is don’t allow surfaces to penetrate the shape of a light because the math will not be correct anymore.

### Culling inverse squared falloff

Back to tiled culling. Inverse squared falloff means there is no distance in which the light contributes zero illumination. This is very inconvenient for a game world filled with lights. Two possibilities, first is to subtract a constant term from the falloff but max with 0. The second is windowing the falloff with something like (1-d^2/a^2)^2. The first loses energy over the entire influence of the light. The second loses energy only away from the source. I should note the tolerance should be proportional to the lights intensity. For simplicity I will use the following for this post:
$$Falloff = max( 0, \frac{1}{d^2}-tolerance)$$

The distance cutoff can be thought of as an error tolerance per light. Unfortunately glossy specular doesn’t work well in this framework at all. The intensity of a glossy, energy conserving specular highlight, even for a dielectric, will be WAY higher than the lambert diffuse. This spoils that idea of the distance falloff working as an error tolerance for both diffuse and specular because they are at completely different scales. In other words, for glossy specular, the distance will have to be very large for even a moderate tolerance, compared to diffuse.

This points to there being two different tolerances, one for diffuse the other for specular. If these both just affect the radius of influence we might as well just set the radius of both as the maximum because diffuse doesn’t take anything more to calculate than specular. Fortunately, maximum intensity of the specular inversely scales with the size of the highlight. This of course is the entire point of energy conservation but energy conservation helps us in culling. The higher the gloss, the larger the radius of influence the tighter the cone of influencing normals.

If it isn’t clear what I mean, think of a chrome ball. With a mirror finish, a light source, even as dim as a candle, is visible at really large distances. The important area on the ball is very small, just the size of the candle flame’s reflection. The less glossy the ball, the less distance the light source is visible but the more area on the ball the specular highlight covers.

Before we can cull using this information we need specular to go to zero past a tolerance just like distance falloff. The easiest is to subtract the tolerance from the specular distribution and max it with zero. For simplicity I will use phong for this post:
$$Phong = max( 0, \frac{n+2}{2}dot(L,R)^n-tolerance)$$

### Specular cone culling

This nicely maps to a cone of L vectors per pixel that will give a non-zero specular highlight.

Cone axis:
$$R = 2 N dot( N, V ) - V$$

Cone angle:
$$Angle = acos \left( \sqrt[n]{\frac{2 tolerance}{n+2}} \right)$$

Just like how a normal cone can be generated for the means of back face culling, these specular cones can be unioned for the tile and used to cull. We can now cull specular on a per tile basis which is what is exciting about tiled light culling.

I should mention the two culling factors need to actually be combined for specular. The sphere for falloff culling needs to expand based on gloss. The (n+2)/2 should be rolled into the distance falloff which leaves angle as just acos(tolerance^(1/n)). I’ve leave these details as an exercise for the reader. Now, to be clear I'm not advocating having diffuse and specular light lists. I'm suggesting culling the light if diffuse is below tolerance AND spec is below tolerance.

This leaves us with a scheme much like biased importance sampling. I haven’t tried this so I can’t comment on how practical it is but it has the potential to produce much more lively reflective surfaces due to having more specular highlights for minimal increase in cost. It also is nice to know your image is off by a known error tolerance from ground truth (per light in respect to shading).

The way I handle this light falloff business for current gen in P2 is by having all lighting beyond the artist set bounds of the deferred light get precalculated. For diffuse falloff I take what was truncated from the deferred light and add it to the lightmap (and SH probes). For specular I add it to the environment map. This means I can maintain the inverse squared light falloff and not lose any energy. I just split it into runtime and precalculated portions. Probably most important, light sources that are distant still show up in glossy reflections. This new culling idea may get that without the slop that comes from baking it into fixed representations.

I intended to also talk about how to add shadows but this is getting long. I'll save it for the next post.

References:
 http://visual-computing.intel-research.net/art/publications/deferred_rendering/