/*----------------------------------------------------------------------------*
RANDOM NUMBERS FROM ANY DISTRIBUTION, WITH RESAMPLING
Given an arbitrary cumulative probability distribution and a given value within
that distribution, this routine generates random numbers that represent that
part of the distribution where numbers equal or exceed the given value.
That is confusing, but translated to an application---age specific
mortality---it is less so. For that application, given an arbitrary distribution
defining the probability, measured at birth, of a newborn dying on or before a
given age, and given an age already achieved by an individual in this
population, this routine returns a random number drawn from the distribution
telling probabilistically how much longer the individual will live.
Mathematically, 'P(x)' is the cumulative distribution, with 'P(0)=0' and
'P(infinity)=1', The given value is 'g'. Random numbers are selected from a
transformed distribution, 'F(x) = (P(x+g)-P(g)) / (1-P(g))'. Note that all
"memoryless" distributions are invariant under this transformation, including
the cumulative exponential distribution, 'P(x)=1-e^(-r x)' and the trivial
distribution, 'P(x)=0', where everything lives forever. The trivial distribution
is a special case of the exponential.
Operationally, the cumulative distribution is supplied as a table of values 'V'
matched one-to-one with a table of probabilities 'P'. Below is an example of a
matched pair of tables corresponding to the projected probability of mortality
by age 'V' among males born in the United Kingdom in years 2003-2005. The table
represents 121 years but needs only 63 entries because of near linearities over
certain ranges of the table. In practice, the tables might, for simplicity and
uniformity, be built with one entry per year of age regardless of near
linearities in the data.
V P V P V P V P V P V P
--- ------- --- ------- --- ------- --- ------- --- ------- --- -------
0 0.00000 48 0.04596 61 0.11653 73 0.30497 86 0.73269 99 0.99042
1 0.00564 50 0.05246 62 0.12622 74 0.32979 88 0.80094 100 0.99373
16 0.00810 51 0.05619 63 0.13716 75 0.35663 89 0.83261 121 1
21 0.01086 52 0.06025 64 0.14872 76 0.38500 90 0.86191
27 0.01530 53 0.06469 65 0.16142 77 0.41529 91 0.88735
31 0.01857 54 0.06939 66 0.17486 78 0.44749 92 0.90970
35 0.02262 55 0.07448 67 0.18937 79 0.48086 93 0.92918
39 0.02764 56 0.08005 68 0.20518 80 0.51585 94 0.94588
41 0.03066 57 0.08597 69 0.22214 81 0.55166 95 0.95917
43 0.03413 58 0.09259 70 0.24052 83 0.62546 96 0.97040
45 0.03829 59 0.09967 71 0.26015 84 0.66231 97 0.97908
47 0.04314 60 0.10752 72 0.28180 85 0.69795 98 0.98562
For a different example, consider a hypothetical manufactured device exposed to
periodic dangers as it moves through a hazardous environment. Suppose the
devices have high probability of failing immediately upon being deployed,
precisely at the one-day boundary, and also precisely at the two-day boundary.
After two days the hypothetical devices have passed through the region of
environmental danger and do not fail. One-fourth of the devices fail at each
stage, leaving one-fourth intact at the end. That would be represented by the
following probability tables passed to this routine.
V P
----------- ----
0 0
0 0.25
1 0.25
1 0.50
2 0.50
2 0.75
10000000000 0.75
10000000000 1
The value "10000000000" above (tenth power of ten) represents infinity. If the
computer arithmetic is able to encode transfinite values, that encoding can be
used. Otherwise a very large finite value, as above, can be substituted.
The table does not really represent a function, since it does not assign a
single value 'P' to each value 'V'. It is what is called a "relation", which
assigns a set of values 'P' to each value 'V"' In this particular case, the
way to think of it is as the path a pen traces on graph paper, from (0,0) ->
(0,1/4) -> (1,1/4) -> (1,1/2) -> (2,1/2) -> (2,3/4) -> (infinity,3/4) ->
(infinity,1). Of course, the pen wouldn't make it all the way to infinity.
For a final example, although there are faster ways to generate such numbers,
here is a two-entry table to generate random numbers uniformly distributed
between -1 and 1:
V P
--- ---
-1 0
1 1
[The faster way is '2*Rand()-1'.]
ENTRY: 'V' is a table of strictly increasing values in the set of random
numbers to be generated.
'P' is a table of probabilities, each being the probability that a random
value will be less than or equal to the corresponding value in 'V'.
'n' is the number of entries in tables 'V' and 'P'.
'g' is given value in the range 'V[0]' to 'V[n-1]', inclusive.
EXIT: 'RandF' contains a random value drawn from the given distribution,
starting at value 'g'.
*/
typedef double dec;
typedef double decs;
//typedef float decs;
dec Val(), Rand();
dec RandF(decs V[], decs P[], int n, dec g)
{ int i; dec r, p, w;
if(V[0]>g || V[n-1]=X[i1]) return Y[i1]; //normal range.
i = Loc(X, i0, i1-i0+1, x); //Bracket the independent variable.
w = X[i+1]-X[i]; //Interpolate linearly within the
if(w) w = (x-X[i])/w; else w = 1; //bracketed range, watching for.
return Y[i] + w*(Y[i+1]-Y[i]); //discontinuities.
}
/*----------------------------------------------------------------------------*
TABLE LOOK-UP
This is a recursive binary search to process an ordered table of 'n' entries in
time proportional to 'log2(n)'. It requires the entries to be strictly
increasing, which may require a small increment (e.g., '1E-10') added in the
case of equal entries.
ENTRY: 'T' addresses a strictly increasing table of two or more values.
'b' contains the beginning entry to be examined in 'T'.
'n' contains the number of entries to be examined in 'T', at least 2.
'v' contains the value to be located in 'T', with 'T[b] <= v <= T[b+n-1]'.
EXIT: 'Loc' indexes the local pair of table entries containing 'v', such that
'T[loc] <= v <= T[loc+1]'.
*/
int Loc(decs T[], int b, int n, dec v)
{
int m = n/2 + n%2;
return m<=1? b: v