Probability of Random Variables

At some point, you may realize that every number you generate has an equal probability of being the next number, as is the case with the regular Rnd usage that we've seen so far.

In the case of X = Int(11 * Rnd + 5), the 11 possible outcomes (Integers 5 through 15) all have an equal chance of being in X. So, the odds of X being 5 are 1 in 11, or 10 to 1, and are the same for the other 10 numbers.

Suppose instead that you really wanted to have a list of probabilities for each number, such as 5% for 5, 6, 7, 13, 14, and 15, 10% for 8, 9, 11, and 12, and 30% for 10.
If you want to list a probability for each number, the probabilities must of course add up to 100%. Each number then takes its own partition of that 100%, as shown below.

P = Rnd will give a percentage-like number: it is never lower than 0 and always less than 1. The first 5%, or 0.05, will connect to the 5, the next 0.05 to 6, and the next 0.05 to 7. The order is not much of a concern, since the probability of ANY 0.05 subdivision will be 5%.

In code, this would look like:
Select Case Rnd
Case Is < 0.05 '5% partition
    X = 5
Case Is < 0.1 '5% partition
    X = 6
Case Is < 0.15 '5% partition
    X = 7
Case Is < 0.25 '10% partition
    X = 8
Case Is < 0.35 '10% partition
    X = 9
Case Is < 0.65 '30% partition
    X = 10
Case Is < 0.75 '10% partition
    X = 11
Case Is < 0.85 '10% partition
    X = 12
Case Is < 0.9 '5% partition
    X = 13
Case Is < 0.95 '5% partition
    X = 14
Case Else 'The remaining 5%
    X = 15
End Select

And this is one way of setting up probability tables.
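
The same partition idea can also be driven by a table instead of a chain of cases. What follows is only a rough sketch in Python rather than the BASIC used in this tutorial: random.random() stands in for Rnd, and the names TABLE and pick are made up for this illustration.

```python
import random

# (value, probability) pairs taken from the list of probabilities above.
# They are assumed to add up to 1 (that is, 100%).
TABLE = [(5, 0.05), (6, 0.05), (7, 0.05), (8, 0.10), (9, 0.10), (10, 0.30),
         (11, 0.10), (12, 0.10), (13, 0.05), (14, 0.05), (15, 0.05)]

def pick(r, table=TABLE):
    """Map r (0 <= r < 1) to a value by walking the cumulative partitions."""
    upper = 0.0
    for value, prob in table:
        upper += prob          # upper edge of this value's partition
        if r < upper:
            return value
    return table[-1][0]        # guard against floating-point round-off at the top

x = pick(random.random())      # random.random() plays the role of Rnd
```

Changing a probability then only means editing one entry in the table, as long as the entries still add up to 1.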

Gaussian Random Variable Distribution

Gaussian Random Values are the way to go when you have quite a range of possible values and you really only want the values concentrated around a specific value.

Consider the example above, but suppose you specifically want numbers that are approximately 10, meaning that 10 will come up much more often than 5 or 15. This is what Gaussian random numbers give you.

The simplest example of a Gaussian random variable is the total on two dice: the value of 7 (1 in 6) appears more frequently than 2 or 12 (both 1 in 36), and the intermediate values occur less frequently as you move away from 7 (6 comes up 5 in 36).
Generating this value is easy:
X = Int(6 * Rnd + 1) + Int(6 * Rnd + 1)
It is merely the addition of two regular random variables.
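
To double-check the two-dice probabilities quoted above, here is a small Python sketch (not BASIC) that counts all 36 combinations. The function name two_dice_distribution is just an illustrative choice.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def two_dice_distribution(sides=6):
    """Exact probability of each total for two fair dice with `sides` faces."""
    faces = range(1, sides + 1)
    counts = Counter(a + b for a, b in product(faces, repeat=2))
    return {total: Fraction(n, sides * sides)
            for total, n in sorted(counts.items())}

# Print every total next to its exact probability.
for total, p in two_dice_distribution().items():
    print(total, p)
```

The printout peaks at 7 and falls off evenly on both sides, which is the concentration around the center that this section is after.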

So, the next step would be how to modify this so that you get any range of variables using this example of a Gaussian random variable. First of all, the average of the possible numbers in the above example is 7. This is called the expected value, typically denoted with an E (I bend mine).
The value of E is the average of Int(6 * Rnd + 1) plus the average of Int(6 * Rnd + 1).
Of course, the average for Int(6 * Rnd + 1) is always the same:

(1 + 2 + 3 + 4 + 5 + 6) / 6 = 21/6 = 3.5.
A shortcut would be to just add 1 to the number multiplying Rnd and divide by 2.
(1 + 6) / 2 = 3.5

Therefore E = 3.5 + 3.5 = 7

Suppose you wanted an expected value of 25, using the same two-term setup as above. This would mean that you'd need to add two random values that both have averages of 12.5.
Using the shortcut above, (1 + N) / 2 = A, where N is the number that multiplies Rnd and A is the average.
I can reverse this equation for N with A = 12.5. This would be

N = 2 * A - 1

which gives N = 24, which is, not coincidentally, 1 less than the value that you want for E.
X = Int(24 * Rnd + 1) + Int(24 * Rnd + 1)

So, if you want to expect the value of E from this setup, you would just do:
X = Int( (E - 1) * Rnd + 1) + Int( (E - 1) * Rnd + 1)
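
If you'd rather check the claim than trust the algebra, the sketch below lists every equally likely pair of outcomes and computes the exact average. It is written in Python rather than BASIC, assumes E is a whole number of at least 2, and the name two_dice_stats is made up for this example.

```python
from fractions import Fraction

def two_dice_stats(e):
    """Exact (min, max, mean) of Int((E-1)*Rnd+1) + Int((E-1)*Rnd+1)."""
    n = e - 1                    # the number multiplying Rnd
    faces = range(1, n + 1)      # Int(n * Rnd + 1) yields 1..n, equally likely
    totals = [a + b for a in faces for b in faces]
    return min(totals), max(totals), Fraction(sum(totals), len(totals))

low, high, mean = two_dice_stats(25)
print(low, high, mean)
```

For E = 25 this reports totals running from 2 to 48 with a mean of exactly 25.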

Range and Offsetting

The lowest number generated by all of the examples above is 2, because each "die" rolls a minimum of 1. The highest number, however, would be two times the highest number from

Int( (E - 1) * Rnd + 1)

If you recall, X = Int( (Highest + 1 - Lowest) * Rnd + Lowest ) will generate a number from Lowest to Highest. In this case, E - 1 = Highest + 1 - Lowest, and Lowest = 1.
Therefore Highest = E - 1.

So, the highest value generated would be 2 * E - 2, and the range would be
2 To 2(E - 1), where E is the expected value lying snug in the center of this range.

[(2E - 2) + 2] / 2 = E

Offsetting the range is easy: just subtract the shift P at the end (a negative P shifts the range upward).

X = Int( (E - 1) * Rnd + 1) + Int( (E - 1) * Rnd + 1) - P

And now the range is 2 - P To 2(E - 1) - P.
However, the expected value for this is no longer E, but is now E` = E - P.
For E to truly reflect the expected value for this system, P must be added to every occurrence of E.
E` + P = E

X = Int( (E + P - 1) * Rnd + 1) + Int( (E + P - 1) * Rnd + 1) - P

And that resolves offsets...
The range is now 2 - P To 2(E + P - 1) - P, and E remains the expected value:

{[2(E + P - 1) - P] + (2 - P)} / 2
= {2E + 2P - 2 - P + 2 - P} / 2
= 2E / 2 = E

From this, we can specify any range we need.
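
Here is a rough Python sketch of the offset formula, in case you want to try it with real numbers. Python is used instead of BASIC; random.random() stands in for Rnd, math.floor for Int, and the name shifted_roll and its rnd parameter are assumptions made for this illustration.

```python
import math
import random

def shifted_roll(e, p, rnd=random.random):
    """One draw of Int((E+P-1)*Rnd+1) + Int((E+P-1)*Rnd+1) - P."""
    n = e + p - 1                # multiplier of Rnd, assumed to be a whole number >= 1
    return math.floor(n * rnd() + 1) + math.floor(n * rnd() + 1) - p

# With E = 10 and P = 3 the formulas above predict a range of -1 To 21.
draws = [shifted_roll(10, 3) for _ in range(10000)]
print(min(draws), max(draws), sum(draws) / len(draws))
```

The rnd parameter is only there so that a fixed value can be fed in when checking the two ends of the range by hand.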

Also, consider the following case:

X = Int(6 * Rnd + M) + Int(6 * Rnd + M)

The range of this is 2M To 2M + 10, and the expected value E is 2 * (M + 2.5), or 2M + 5 (not E = 7, unless M = 1), since M + 2.5 is the average of Int(6 * Rnd + M).

In the more general case of:

X = Int((E - 1) * Rnd + M) + Int((E - 1) * Rnd + M)

The range is 2M To 2M + 2(E - 2), but the actual expected value is now E` = 2M + E - 2 rather than E.
Again, in order for E to reflect the expected value, each occurrence of E must be replaced with E - 2M + 2, which gives

X = Int((E - 2M + 1) * Rnd + M) + Int((E - 2M + 1) * Rnd + M)

which makes the range 2M To 2(E - M), and the expected value remains E.

Which was quite a bit more work, but we get prettier solutions.
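
As a sanity check on that last formula, this Python sketch (again, not BASIC) lists every equally likely pair of outcomes and reports the exact range and average. It assumes E and M are whole numbers with E - 2M + 1 of at least 1; the name base_m_stats is hypothetical.

```python
from fractions import Fraction

def base_m_stats(e, m):
    """Exact (min, max, mean) of Int((E-2M+1)*Rnd+M) + Int((E-2M+1)*Rnd+M)."""
    n = e - 2 * m + 1            # multiplier of Rnd
    faces = range(m, m + n)      # Int(n * Rnd + M) yields M..M+n-1, equally likely
    totals = [a + b for a in faces for b in faces]
    return min(totals), max(totals), Fraction(sum(totals), len(totals))

low, high, mean = base_m_stats(20, 5)
print(low, high, mean)
```

With E = 20 and M = 5 the totals run from 10 (which is 2M) to 30 (which is 2(E - M)), and the mean is exactly 20.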

Manipulating the Range

Now that we have our pretty ranges, we can easily manipulate them for values that we want. I explained them for a reason!
For instance if we want the random value to go from L to H, but remain mostly at the center,
we just compare L To H with 2M to 2(E - M).
Therefore, L = 2M (M = L/2)
and H = 2E - 2M = 2E - L,
which means the expected value E = (H + L) / 2, obviously, the average of L and H.
So: M = L/2, and E = (H + L) / 2,

making our formula
X = Int(((H + L) / 2 - L + 1) * Rnd + L/2) + Int(((H + L) / 2 - L + 1) * Rnd + L/2)
simplified to
X = Int((H/2 - L/2 + 1) * Rnd + L/2) + Int((H/2 - L/2 + 1) * Rnd + L/2)

... wow ...
think of it as a nice healthy proof so that you know for sure.

A range from 10 to 30 would simply be
X = Int((15 - 5 + 1) * Rnd + 5) + Int((15 - 5 + 1) * Rnd + 5)
or
X = Int(11 * Rnd + 5) + Int(11 * Rnd + 5)
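
Wrapped up as a reusable function, the L To H formula might look like the Python sketch below. This is a sketch under assumptions, not the tutorial's own code: random.random() stands in for Rnd, math.floor for Int, the name centered is made up, and it only accepts even L and H so that L/2 and H/2 stay whole numbers.

```python
import math
import random

def centered(lo, hi, rnd=random.random):
    """Two-term draw from lo..hi that clusters around (lo + hi) / 2.

    Assumes lo and hi are both even, so lo/2 and hi/2 are whole numbers.
    """
    if lo % 2 or hi % 2:
        raise ValueError("this sketch only handles even lo and hi")
    n = hi // 2 - lo // 2 + 1    # H/2 - L/2 + 1
    m = lo // 2                  # L/2
    return math.floor(n * rnd() + m) + math.floor(n * rnd() + m)

print([centered(10, 30) for _ in range(10)])
```

Odd limits are rejected here because fractions inside Int() change the range that actually comes out.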

Deviation and Stickiness

Now, of course, we can have another Gaussian variable simply by adding another die.
X = Int(6 * Rnd + 1) + Int(6 * Rnd + 1) + Int(6 * Rnd + 1)

In this case, the expected value is E = 3 * A = 10.5.
Although you'll never actually get 10.5, the 10.5 means that 10 and 11 will appear equally often.
(In fact, the .5 means that 50% of the time, you'll get 11 or more.)
The minimum is 3 and the maximum is 18.

This may seem no different from the above: the expected value is still the average of the minimum and the maximum.

However, there is a difference between
X = Int(6 * Rnd + 1) + Int(6 * Rnd + 1) + Int(6 * Rnd + 1)
and
X = Int(7.5 * Rnd + 1.5) + Int(7.5 * Rnd + 1.5)

First of all, there are decimals in the call to Int() so the minimum is actually 2 and the maximum is 16.

Second of all, even if their minimums were the same, the distribution of the three dice is stickier than the distribution of two dice.

In the two dice scenario, the probability of catching the minimum is 1/36. With three dice, the new minimum (3) has a probability of 1/216, which is much less likely.

The more dice we add, the stickier the inner values become, which means that it is much less likely to get numbers that are farther away from the expected value.

Relative to the range, deviation decreases when you add more dice; stickiness increases when you add more dice.
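
To put numbers on this, the Python sketch below (not BASIC) builds the exact distribution for any number of six-sided dice by adding one die at a time. The name dice_distribution is an illustrative choice.

```python
from fractions import Fraction

def dice_distribution(d, sides=6):
    """Exact distribution of the total of d fair dice, built one die at a time."""
    dist = {0: Fraction(1)}      # before any die is rolled, the total is 0
    for _ in range(d):
        nxt = {}
        for total, p in dist.items():
            for face in range(1, sides + 1):
                nxt[total + face] = nxt.get(total + face, 0) + p / sides
        dist = nxt
    return dist

for d in (1, 2, 3, 4):
    print(d, "dice: chance of the minimum =", dice_distribution(d)[d])
```

Each extra die divides the chance of hitting the minimum by 6, which is where the 1/36 and 1/216 figures come from.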

The formula, for D as the number of dice, is (unsurprisingly and unproven)
X = Int((H/D - L/D + 1) * Rnd + L/D) + ...

or
X = Int((H/D - L/D + 1) * Rnd) + Int((H/D - L/D + 1) * Rnd) + ... + L

So, if you want a range of 20 to 40 with 4 dice,
D = 4, H = 40, and L = 20... and the equation looks like
X = Int((10 - 5 + 1) * Rnd) + Int((10 - 5 + 1) * Rnd) + Int((10 - 5 + 1) * Rnd) + Int((10 - 5 + 1) * Rnd) + 20
or
X = Int(6 * Rnd) + Int(6 * Rnd) + Int(6 * Rnd) + Int(6 * Rnd) + 20

As you can see, it's like throwing two pairs of dice and then adding 16 to the result.
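
The D-dice formula fits naturally into a loop. Below is a minimal Python sketch of the second form (the one that adds L once at the end), with random.random() standing in for Rnd and math.floor for Int. The name dice_range is made up, and the sketch assumes H - L divides evenly by D so that the multiplier is a whole number.

```python
import math
import random

def dice_range(lo, hi, d, rnd=random.random):
    """Sum of d terms Int((H/D - L/D + 1) * Rnd), then shifted up by lo."""
    if (hi - lo) % d:
        raise ValueError("this sketch needs hi - lo to be divisible by d")
    n = (hi - lo) // d + 1       # H/D - L/D + 1
    return sum(math.floor(n * rnd()) for _ in range(d)) + lo

print([dice_range(20, 40, 4) for _ in range(10)])
```

Raising d while keeping lo and hi fixed should pull the results closer to the middle of the range.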


Why isn't Int(2 * Rnd) + Int(2 * Rnd) equivalent to Int(3 * (Rnd + Rnd) / 2)?
The translation seems fine: the range of both is 0 to 2, the expected value is 1 (for example, when one Rnd is 0.2 and the other is 0.8), and both use two calls to Rnd (therefore, D = 2). So, what's the difference? (Someone was clever enough to realize that the two were not equal in the old tutorial as well.)

The difference lies within their actual probability distribution.
First of all, if all Rnd calls were to return 0.5, the first (barely) returns 2, while the second returns 1 (and if you really look at it, the 1.5 is not very close to 2 at all). The two Rnd values would have to add up to at least 1.333 (an average of 0.667) before the second returns 2.


According to the picture, you can see that their probabilities are not equal.
In the first, the probabilities of getting 0, 1, and 2 are 0.25, 0.5, and 0.25, respectively.
The second has the respective probabilities of about 0.22, 0.56, and 0.22 (2/9, 5/9, and 2/9).
The trend gets worse for higher values of E and D, so it's best to just squash it now.
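
You can measure both distributions yourself without trusting luck. The Python sketch below (not BASIC) replaces the two Rnd calls with an even grid of values between 0 and 1 and counts how often each formula lands on 0, 1, or 2. The name compare and the grid size are arbitrary choices, and the second set of numbers is only approximate because the grid is finite.

```python
import math

def compare(grid=300):
    """Estimate both distributions over an even grid of (Rnd, Rnd) pairs."""
    split = [0, 0, 0]            # counts for Int(2 * Rnd) + Int(2 * Rnd)
    merged = [0, 0, 0]           # counts for Int(3 * (Rnd + Rnd) / 2)
    for i in range(grid):
        r1 = (i + 0.5) / grid    # midpoint of the i-th slice of 0..1
        for j in range(grid):
            r2 = (j + 0.5) / grid
            split[math.floor(2 * r1) + math.floor(2 * r2)] += 1
            merged[math.floor(3 * (r1 + r2) / 2)] += 1
    cells = grid * grid
    return [c / cells for c in split], [c / cells for c in merged]

first, second = compare()
print(first)
print(second)
```

The first list has a weight of 0.5 on the middle value, while the second puts noticeably more than that on it and correspondingly less on the two ends.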

Have fun with your new Gaussian variables.

Numbers from 0 To 100 with D = 5 (that means five dice)
X = Int(21 * Rnd) + ...
49 62 49 65 49 42
85 54 48 25 59 32 49 29 53 55 61 63 39 35 34 59 45 48 11 53 37 60 37 38 52 26 41 39 63 44 52 33 66 73 43 57 59 33 51 61 44 35 32 54 55 36 73 42 26 70 34 62 52 37 58 75 64 28 38 55 39 63 65 20 59 50 46 58 75 48 36 37 71 60 34 33 43 55 33 53 55 51 41 36 84 58 25 37 54 63 54 64 38 54

as opposed to D = 1
X = Int(101 * Rnd)
71 53 58 29 30 78
1 76 82 71 4 41 87 79 37 97 88 5 95 36 53 77 5 59 47 30 62 65 26 28 83 83 59 99 92 22 70 98 24 53 10 100 68 1 58 10 10 80 28 4 29 38 30 95 98 40 28 16 16 65 41 41 71 32 63 20 18 58 8 46 91 26 79 38 29 92 63 63 43 9 56 70 92 84 2 54 92 43 68 50 51 46 35 40 27 5 24 98 6 39

and the formula with D = 10
X = Int(11 * Rnd) + ...
55 58 45 70 37 46 39 53 63 35 46 48 31 47 38 39 39 53 41 70 48 45 56 39 43 45 58 47
50 44 66 45 49 51 41 55 51 60 35 67 33 50 43 54 38 71 31 60 61 47 47 43 42 48 57 56 44 54 55 51 53 52 46 43 46 46 62 54 35 43 59 43 37 34 43 57 58 67 40 51 42 59 52 39 40 47 25 56 45 45 51 60 37 59 47 56 36 71 65 63

Notice how the maximum (in blue) is lower when D is higher. Also, the minimum (in red) is higher when D is higher, and E, the expected value (in green), occurs sooner when D is higher.