/*----------------------------------------------------------------------------* RANDOM NUMBERS FROM ANY DISTRIBUTION, WITH RESAMPLING Given an arbitrary cumulative probability distribution and a given value within that distribution, this routine generates random numbers that represent that part of the distribution where numbers equal or exceed the given value. That is confusing, but translated to an application---age specific mortality---it is less so. For that application, given an arbitrary distribution defining the probability, measured at birth, of a newborn dying on or before a given age, and given an age already achieved by an individual in this population, this routine returns a random number drawn from the distribution telling probabilistically how much longer the individual will live. Mathematically, 'P(x)' is the cumulative distribution, with 'P(0)=0' and 'P(infinity)=1', The given value is 'g'. Random numbers are selected from a transformed distribution, 'F(x) = (P(x+g)-P(g)) / (1-P(g))'. Note that all "memoryless" distributions are invariant under this transformation, including the cumulative exponential distribution, 'P(x)=1-e^(-r x)' and the trivial distribution, 'P(x)=0', where everything lives forever. The trivial distribution is a special case of the exponential. Operationally, the cumulative distribution is supplied as a table of values 'V' matched one-to-one with a table of probabilities 'P'. Below is an example of a matched pair of tables corresponding to the projected probability of mortality by age 'V' among males born in the United Kingdom in years 2003-2005. The table represents 121 years but needs only 63 entries because of near linearities over certain ranges of the table. In practice, the tables might, for simplicity and uniformity, be built with one entry per year of age regardless of near linearities in the data. V P V P V P V P V P V P --- ------- --- ------- --- ------- --- ------- --- ------- --- ------- 0 0.00000 48 0.04596 61 0.11653 73 0.30497 86 0.73269 99 0.99042 1 0.00564 50 0.05246 62 0.12622 74 0.32979 88 0.80094 100 0.99373 16 0.00810 51 0.05619 63 0.13716 75 0.35663 89 0.83261 121 1 21 0.01086 52 0.06025 64 0.14872 76 0.38500 90 0.86191 27 0.01530 53 0.06469 65 0.16142 77 0.41529 91 0.88735 31 0.01857 54 0.06939 66 0.17486 78 0.44749 92 0.90970 35 0.02262 55 0.07448 67 0.18937 79 0.48086 93 0.92918 39 0.02764 56 0.08005 68 0.20518 80 0.51585 94 0.94588 41 0.03066 57 0.08597 69 0.22214 81 0.55166 95 0.95917 43 0.03413 58 0.09259 70 0.24052 83 0.62546 96 0.97040 45 0.03829 59 0.09967 71 0.26015 84 0.66231 97 0.97908 47 0.04314 60 0.10752 72 0.28180 85 0.69795 98 0.98562 For a different example, consider a hypothetical manufactured device exposed to periodic dangers as it moves through a hazardous environment. Suppose the devices have high probability of failing immediately upon being deployed, precisely at the one-day boundary, and also precisely at the two-day boundary. After two days the hypothetical devices have passed through the region of environmental danger and do not fail. One-fourth of the devices fail at each stage, leaving one-fourth intact at the end. That would be represented by the following probability tables passed to this routine. V P ----------- ---- 0 0 0 0.25 1 0.25 1 0.50 2 0.50 2 0.75 10000000000 0.75 10000000000 1 The value "10000000000" above (tenth power of ten) represents infinity. If the computer arithmetic is able to encode transfinite values, that encoding can be used. Otherwise a very large finite value, as above, can be substituted. The table does not really represent a function, since it does not assign a single value 'P' to each value 'V'. It is what is called a "relation", which assigns a set of values 'P' to each value 'V"' In this particular case, the way to think of it is as the path a pen traces on graph paper, from (0,0) -> (0,1/4) -> (1,1/4) -> (1,1/2) -> (2,1/2) -> (2,3/4) -> (infinity,3/4) -> (infinity,1). Of course, the pen wouldn't make it all the way to infinity. For a final example, although there are faster ways to generate such numbers, here is a two-entry table to generate random numbers uniformly distributed between -1 and 1: V P --- --- -1 0 1 1 [The faster way is '2*Rand()-1'.] ENTRY: 'V' is a table of strictly increasing values in the set of random numbers to be generated. 'P' is a table of probabilities, each being the probability that a random value will be less than or equal to the corresponding value in 'V'. 'n' is the number of entries in tables 'V' and 'P'. 'g' is given value in the range 'V[0]' to 'V[n-1]', inclusive. EXIT: 'RandF' contains a random value drawn from the given distribution, starting at value 'g'. */ typedef double dec; typedef double decs; //typedef float decs; dec Val(), Rand(); dec RandF(decs V[], decs P[], int n, dec g) { int i; dec r, p, w; if(V[0]>g || V[n-1]=X[i1]) return Y[i1]; //normal range. i = Loc(X, i0, i1-i0+1, x); //Bracket the independent variable. w = X[i+1]-X[i]; //Interpolate linearly within the if(w) w = (x-X[i])/w; else w = 1; //bracketed range, watching for. return Y[i] + w*(Y[i+1]-Y[i]); //discontinuities. } /*----------------------------------------------------------------------------* TABLE LOOK-UP This is a recursive binary search to process an ordered table of 'n' entries in time proportional to 'log2(n)'. It requires the entries to be strictly increasing, which may require a small increment (e.g., '1E-10') added in the case of equal entries. ENTRY: 'T' addresses a strictly increasing table of two or more values. 'b' contains the beginning entry to be examined in 'T'. 'n' contains the number of entries to be examined in 'T', at least 2. 'v' contains the value to be located in 'T', with 'T[b] <= v <= T[b+n-1]'. EXIT: 'Loc' indexes the local pair of table entries containing 'v', such that 'T[loc] <= v <= T[loc+1]'. */ int Loc(decs T[], int b, int n, dec v) { int m = n/2 + n%2; return m<=1? b: v