gauss - The pseudo-Gaussian

Yourself · 2008-02-05 19:38:40

So this is kind of about gmlscripts, but not really.

I've always wondered what the probability distribution for the pseudo-Gaussian random number script on the site really is. I know it's not Gaussian (that's trivial to see).

Thanks to Mathematica I can now tell you. The distribution obviously depends on the number of random variables, n, summed up. In every case, the distribution is a polynomial spline. When n = 3, it is composed of 3 polynomials of degree 2 each. In general the spline is made up of n degree n-1 polynomials (so n = 4 is a cubic spline).

I've also found a pattern in the distribution along the edge of the interval, but haven't really found any pattern in the center, so that's not worth going in to yet.

Also thanks to Mathematica I can compute, analytically, what the distribution of any combination of uniform random variables will be.

EyeGuy · 2008-02-06 12:18:12

Well, in adition, I believe the distribution uses the Central Limit Theorem, which states that if you take the sum of N identically independentally distributed random variables, subtract the mean from each and divide the whole thing by "sqrt(N) * sigma" (where sigma is the square root of the variance), then as N -> infinity, then the distribution of this will approach a standard Normal (i.e. Gaussian) distribution.

[m]lim{n right infty}{{sum{i=1}{n}{X_{i}} - n mu} / {sqrt{n} sigma}} = W[/m]
(W is a standard normal distribution and each X[i] are each random variables of the same distribution)

Hmm... should that be line be:

Expand    g /= sqrt(6);

Wait... what's the variance of the triangle distribution "random(1) - random(1)"? Obviously, the mean is 0, but it seems like the variance probably isn't 1.

Last edited by EyeGuy (2008-02-06 12:35:03)

xot · 2008-02-06 18:44:13

This conversation is entirely over my head. laugh I've spent an hour trying to find the source for that algorithm, but I can't seem to locate it. I think it was from something to do with CGI where the method was chosen for its speed and aesthetics, rather than its accuracy.

Yourself · 2008-02-06 19:24:15

the sum of N identically independentally distributed random variables, subtract the mean from each

By definition of the mean that sum would be 0.

EyeGuy · 2008-02-07 11:05:06

Yea, the mean will obviously be 0, but right now the distribution of this pseudo-random function looks like it will have a variance that is significantly different than 1, which would explain the results I got whenever I was testing the two functions a while ago without thinking about how the pseudo-Gauss was working. It's not a big deal, but if it's going to end up with a normal distribution, it might as well has a variance of 1 (or better yet, have it as an argument like in yourself's exact_gauss script).

According to Wiki, the variance of |X1 - X2| where X1 and X2 are both uniform distributions with the range [0,1) is 1/18. I should be able to figure out what the variance without the absolute value would be, but it's been a while. But I think it's 3/18, because X1 - X2 would be a Triangular Distribution with a = -1, b = 0, c = 1, right?

g /= sqrt(6 * 3 / 18);

Wow.... What do you know, that equals 1. So you don't have to divide by anything at all?

Edit: of course, the 6th iteration of adding won't have EXACTLY a variance of 1, but it should probably be close. You could even make the number of iterations an argument, but I believe 6 iterations was chosen partially because of the fact that sqrt(6 * Var(X)) = 1.

Last edited by EyeGuy (2008-02-07 11:17:22)

xot · 2008-02-07 15:02:54

As I said, I'm out of my element here, but when you guys started talking about variance, that triggered something in my memory ... ds_list_variance. So, clueless, I stuffed a list full of samples to see what the results were. They weren't what I expected (shocking!) but in light of what EyeGuy just said seem to make more sense.

Expand// Result: 0.1667543440 (approx. 1/6)
dsid = ds_list_create();
repeat (1000000) ds_list_add(dsid,random(1)-random(1));
show_message(string_format(ds_list_variance(dsid),10,10));

Expand// Result: 0.0277266369 (approx. 1/36)
dsid = ds_list_create();
repeat (1000000) ds_list_add(dsid,gauss());
show_message(string_format(ds_list_variance(dsid),10,10));

Does that mean anything at all or am I barking up the wrong tree? unsure

*posts picture of bunny with a pancake on its head*

EyeGuy · 2008-02-07 16:51:53

Those are exactly what I expected. The variance of X1 - X2 (or "random(1) - random(1)") is 3/18 = 1/6. Which would mean that the sum of six of them would be approx. 1 = sqrt(6 * 3/18). And because you divide a distribution with variance approx. 1 by 6, you will get a distribution with variance aprox. (1/6)^2. More importantly:

Expand// Result: 1.0026576242 (approx. 1)
dsid = ds_list_create();
repeat (1000000) ds_list_add(dsid,gauss()*6);
show_message(string_format(ds_list_variance(dsid),10,10));

What this means is that I think you can get rid of the line "g /= 6;" and you will have a slightly faster function that has approximately the same variance as a the standard normal.

Last edited by EyeGuy (2008-02-07 16:55:01)

xot · 2008-02-07 17:14:21

Hmmmm. If you don't divide the sum by the number of samples, then you can't guarantee the result is in the range of {-1..1}, which seems to me to be an important (ie. useful) part of the function definition.

EyeGuy · 2008-02-07 17:46:25

The thing is, all the values will be very focused in the middle. The chances of them getting less than -0.66 or more than 0.66 are very insignificant. Here's an image of 1,000 blotches created at gauss()*100 pixels away from the center. The red rectangle shows the range -100 to 100 from the center and you can see there are none anywhere near the edges.

I don't know, it really depends on what you want. If you want a bell-like distribution with higher variance, not likely to get values near the edge, but still possible, and is still between -1 and 1, you could probably get something, but it might not be as quick. You could use this method divided by like 3 or 4 and force it to re-do itself in the somewhat-unlikely event that it exceeds (-1,1).

Last edited by EyeGuy (2008-02-07 17:48:38)

xot · 2008-02-07 18:57:47

You could use this method divided by like 3 or 4 and force it to re-do itself in the somewhat-unlikely event that it exceeds (-1,1).

If you are going to do that, you might as well just clamp it. I think you'd get the same results.

Tonal variations are a lot harder to read than spatial ones, so I made these histograms.

gauss2() is gauss() without g /= 6 (range shown: -3 to 3)

Yourself · 2008-02-08 01:20:39

That's why it's only an approximation. It doesn't share the same properties (or the same kind of flexibility) as an actual Gaussian distribution.

GMLscripts.com

#1 2008-02-05 19:38:40

gauss - The pseudo-Gaussian

#2 2008-02-06 12:18:12

Re: gauss - The pseudo-Gaussian

#3 2008-02-06 18:44:13

Re: gauss - The pseudo-Gaussian

#4 2008-02-06 19:24:15

Re: gauss - The pseudo-Gaussian

#5 2008-02-07 11:05:06

Re: gauss - The pseudo-Gaussian

#6 2008-02-07 15:02:54

Re: gauss - The pseudo-Gaussian

#7 2008-02-07 16:51:53

Re: gauss - The pseudo-Gaussian

#8 2008-02-07 17:14:21

Re: gauss - The pseudo-Gaussian

#9 2008-02-07 17:46:25

Re: gauss - The pseudo-Gaussian

#10 2008-02-07 18:57:47

Re: gauss - The pseudo-Gaussian

#11 2008-02-08 01:20:39

Re: gauss - The pseudo-Gaussian

Board footer