{smcl} {* *! version 1.4.0 25feb2012}{...} {cmd:help ghk2()} {hline} {title:Title} {p 4 23 2} {hi:ghk2()} -- Geweke-Hajivassiliou-Keane (GHK) multivariate normal simulator using pre-generated points {title:Syntax} {p 19 25 2} {it:P} {cmd:=} {cmd:ghk2setup(}{it:real scalar n}{cmd:, }{it:real scalar m}{cmd:, }{it:real scalar d}{cmd:, }{it:string scalar type}{cmd:, }{break}{cmd:| }{it:real scalar pi}{cmd:)} {p 8 25 2} {it:real colvector }{cmd:ghk2(}{it:P}{cmd:, }{it:real matrix X}{cmd:, }{it:real matrix V}{cmd:, }{it:real scalar anti}{cmd:, }{it:real scalar start}{cmd:)} {p 8 25 2} {it:real colvector }{cmd:ghk2(}{it:P}{cmd:, }{it:real matrix Xl}{cmd:, }{it:real matrix Xu}{cmd:, }{it:real matrix V}{cmd:, }{it:real scalar anti}{cmd:, }{it:real scalar start}{cmd:)} {p 8 25 2} {it:real colvector }{cmd:ghk2(}{it:P}{cmd:, }{it:real matrix X}{cmd:, }{it:real matrix V}{cmd:, }{it:real scalar anti}{cmd:, }{it:real scalar start}{cmd:,} {it:real matrix dfdx}{cmd:, }{it:real matrix dfdv}{cmd:)} {p 8 25 2} {it:real colvector }{cmd:ghk2(}{it:P}{cmd:, }{it:real matrix Xl}{cmd:, }{it:real matrix Xu}{cmd:, }{it:real matrix V}{cmd:, }{it:real scalar anti}{cmd:, }{it:real scalar start}{cmd:,} {it:real matrix dfdxl}{cmd:, }{it:real matrix dfdxu}{cmd:, }{it:real matrix dfdv}{cmd:)} {p 4 4 2} where {it:P}, if it is declared, should be declared {cmd:transmorphic} {it:P} {pstd} and where {it:dfdx}, {it:dfdxl}, {it:dfdxu}, and {it:dfdv} are outputs {it:real matrix dfdx} {it:real matrix dfdxl} {it:real matrix dfdxu} {it:real matrix dfdv} {title:Input parameters} {pin}{it:n}{bind: }Number of observations for which to prepare draws{p_end} {pin}{it:m}{bind: }Draws per observation and simulated dimension{p_end} {pin}{it:d}{bind: }Dimension of cumulative integrals for which to be prepared to simulate{p_end} {pin}{it:type}{bind: }Sequence type{p_end} {pin}{it:pi}{bind: }Starting index of prime bases for Halton sequences (1->2, 2->3, 3->5, 4->7...) (default=1){p_end} {pin}{it:P}{bind: }Draws prepared by {cmd:ghk2setup()}{p_end} {pin}{it:X}{bind: }Upper bounds of integration{p_end} {pin}{it:Xl, Xu}{bind: }Lower and upper bounds of integration{p_end} {pin}{it:V}{bind: }Covariance matrix{p_end} {pin}{it:anti}{bind: }Optional dummy for inclusion of antithetics (default=0){p_end} {pin}{it:start}{bind: }Starting point to use in block of draws prepared by {cmd:ghk2setup()}{p_end} {title:Output parameters} {pin}{it:dfdx}{bind: }Scores with respect to {it:X}{p_end} {pin}{it:dfdxl, dfdxu}{bind: }Scores with respect to {it:Xl, Xu}{p_end} {pin}{it:dfdv}{bind: }Scores with respect to {it:V}, stored as vectorized lower-triangular matrices{p_end} {title:Description} {pstd} {cmd:ghk2()} implements the Geweke-Hajivassiliou-Keane algorithm for simulating the cumulative multivariate normal distribution, optionally computing scores, and optionally accepting lower as well as upper bounds. (See Cappellari and Jenkins 2003, 2005; Gates 2006.) It is modeled on {browse "http://www.stata.com/help.cgi?mf_ghkfast":ghkfast()}, which is included in Stata 10, and which see for more explanation. Like {cmd:ghkfast()}, its first argument is a pre-generated set of points on the unit interval, in this case produced by {cmd:ghk2setup()}, which has the same syntax and semantics as {browse "http://www.stata.com/help.cgi?mf_ghkfast":ghkfastsetup()}. The two commands' point sets are not interchangeable. {cmd:ghk2()} differs from {browse "http://www.stata.com/help.cgi?mf_ghkfast":ghkfast()} in the following ways: {p 6 8 2} * It accepts lower as well as upper bounds for integration (second and fourth syntaxes above). This allows efficient estimation of probabilities over bounded rectilinear regions such as {(x1, x2) | l1<=x1<=u1, l2<=x2<=u2}. Without this feature, the routine would need to be called 2^d times, where d is the dimension of distribution. For example, the probability just mentioned would have to be computed as Phi(u1, u2) - Phi(l1, u2) - Phi(u1, l2) + Phi(l1, l2), where Phi is a bivariate cumulative normal distribution with some given covariance structure. Individual entries in the lower and upper bounds, {it:Xl} and {it:Xu}, may be missing ("."), and are interpreted as -infinity and +infinity, respectively. The fourth syntax differs from the second in requesting score matrices for upper and lower bounds, as well as for the covariance matrix {it:V}. {p 6 8 2} * {cmd:ghk2()} does not "pivot" the bounds of integration (in {it:X}, {it:Xl}, or {it:Xu}). On the recommendation of Genz (1992), {help mf_ghk:ghk()} and {browse "http://www.stata.com/help.cgi?mf_ghkfast":ghkfast()} reorder each vector of bounds to put the larger entries toward the end, which reduces the variability of the simulated probability. However, pivoting has the disadvantage of creating discontinuities in results. Small changes in the bounds can produce relatively large ones in the results when they trigger reorderings of the pivoted vector. Especially when the number of draws is low, these discontinuities can stymie a search by {help ml:ml}. Thus {cmd:ghk2()} behaves very smoothly even at low draw counts, at the expense of more variability. (As of Stata 10.1, {help mf_ghk:ghk()} and {browse "http://www.stata.com/help.cgi?mf_ghkfast":ghkfast()} also allow pivoting to be turned off.) {p 6 8 2} * {cmd:ghk2()} works in Stata 9. Stata 9 does ship with {help mf_ghk:ghk()}, but this does not use pre-generated points, and so is slower. {p 6 8 2} * {cmd:ghk2()} is generally faster than {browse "http://www.stata.com/help.cgi?mf_ghkfast":ghkfast()}, at least in single-processor versions of Stata. It is optimized for contexts with a large number of observations relative to draws. In extreme cases, such as 10,000 observations and 10 draws, it can perform an order of magnitude faster. But at the opposite extreme, with, say, 100 observations and 1,000 draws, it can run half as fast.{p_end} {p 6 8 2} * The {it:start} argument allows the user to shift the starting observation within the pre-computed block of draws. E.g., if the pre-computed block of draws is for 200 observation rows, when calling {cmd:ghk2()} with a data matrix that has only 100 rows the {it:start} argument would allow rows 101-200 of the draws to be used rather than the usual 1-100. {p 6 8 2} * The {it:anti} argument, specifying whether to include antithetical draws, is required. Any non-zero value is interpreted as requesting them. {p 6 8 2} * It does not take a {it:rank} argument. ({help mf_ghk:ghk()} and {browse "http://www.stata.com/help.cgi?mf_ghkfast":ghkfast()} lost it in Stata 10.1 as well.) {title:Remarks} {p 4 4 2} The {it:type} argument may be {cmd:"halton"}, {cmd:"hammersley"}, {cmd:"random"}, or {cmd:"ghalton"}. "Random" and generalized Halton sequences are influenced by the state of the random number generator just before {cmd:ghk2setup()} is called. See {help mf_uniform:[M-5] uniform()}. {p 4 4 2} The {cmd:ghk2()} routine performs error checking and then calls one of four additional routines, whose syntaxes correspond to the four listed above: {cmd:_ghk2()}, {cmd:_ghk2_2()}, {cmd:_ghk2_d}, and {cmd:_ghk2_2d}. You can call these routines directly for a slight speed gain. {title:Examples (colored text is clickable)} {phang}{cmd:* ghk() and ghkfast() syntax changed in Stata 10.1, but these examples are not updated yet.}{p_end} {phang}. {stata version 9.0}{p_end} {phang}{cmd:* Exact matches, using Halton sequence}{p_end} {phang}{matacmd p = ghkfastsetup(10000, 5, 3, "halton")}{p_end} {phang}{matacmd p2 = ghk2setup(10000, 5, 3, "halton")}{p_end} {phang}{matacmd V = 1, .5, .4 \ .5, 1, .3 \ .4, .3, 1}{p_end} {phang}{matacmd rank = dfdx = dfdv = .}{p_end} {phang}{matacmd anti = 0}{p_end} {phang}{matacmd start = .}{p_end} {phang}{cmd:* Exact matches, using Halton sequence}{p_end} {phang}{matacmd ghk((1,2,3), V, (1,5), rank)}{p_end} {phang}{matacmd ghkfast(p, (1,2,3), V, rank)}{p_end} {phang}{matacmd ghk2(p2, (1,2,3), V, anti, start)}{p_end} {phang}{cmd:* Inexact matches because ghk() and ghkfast() pivot the data vector, ordering from low to high}{p_end} {phang}{matacmd ghk((3,2,1), V, (1,5), rank)}{p_end} {phang}{matacmd ghkfast(p, (3,2,1), V, rank)}{p_end} {phang}{matacmd ghk2(p2, (3,2,1), V, anti, start)}{p_end} {phang}{cmd:* Timing comparisons for many observations, few draws, with and without score computation}{p_end} {phang}{matacmd X = J(10000,3,1)}{p_end} {phang}{matacmd timer_clear()}{p_end} {phang}{matacmd timer_on(1); mean(ghkfast(p, X, V, rank, ., anti)); timer_off(1)}{p_end} {phang}{matacmd timer_on(2); mean(ghk2(p2, X, V, anti, start)); timer_off(2)}{p_end} {phang}{matacmd timer()}{p_end} {phang}{matacmd timer_clear()}{p_end} {phang}{matacmd timer_on(1); mean(ghkfast(p, X, V, rank, ., anti, dfdx, dfdv)); timer_off(1)}{p_end} {phang}{matacmd timer_on(2); mean(ghk2(p2, X, V, anti, start, dfdx, dfdv)); timer_off(2)}{p_end} {phang}{matacmd timer()}{p_end} {phang}{cmd:* Timing comparisons for fewer observations, many draws, including antithetical draws}{p_end} {phang}{matacmd anti = 1}{p_end} {phang}{matacmd p = ghkfastsetup(1000, 250, 3, "halton")}{p_end} {phang}{matacmd p2 = ghk2setup(1000, 250, 3, "halton")}{p_end} {phang}{matacmd X = J(1000,3,1)}{p_end} {phang}{matacmd timer_clear()}{p_end} {phang}{matacmd timer_on(1); mean(ghkfast(p, X, V, rank, ., anti)); timer_off(1)}{p_end} {phang}{matacmd timer_on(2); mean(ghk2(p2, X, V, anti, start)); timer_off(2)}{p_end} {phang}{matacmd timer()}{p_end} {phang}{matacmd timer_clear()}{p_end} {phang}{matacmd timer_on(1); mean(ghkfast(p, X, V, rank, ., anti, dfdx, dfdv)); timer_off(1)}{p_end} {phang}{matacmd timer_on(2); mean(ghk2(p2, X, V, anti, start, dfdx, dfdv)); timer_off(2)}{p_end} {phang}{matacmd timer()}{p_end} {phang}{cmd:* Demonstration of using lower and upper bounds. The two versions agree asymptotically in the number of draws.}{p_end} {phang}{cmd:* The first is 8 times faster than the last.}{p_end} {phang}{matacmd l1=l2=l3=0; u1=1; u2=2; u3=3}{p_end} {phang}{matacmd ghk2(p2, (l1,l2,l3), (u1,u2,u3), V, anti, start)}{p_end} {phang}{matacmd ghk2(p2,(u1,u2,u3),V,1,.)-ghk2(p2,(l1,u2,u3),V,1,.)-ghk2(p2,(u1,l2,u3),V,1,.)-ghk2(p2,(u1,u2,l3),V,1,.)+ghk2(p2,(l1,l2,u3),V,1,.)+ghk2(p2,(u1,l2,l3),V,1,.)+ghk2(p2,(l1,u2,l3),V,1,.)-ghk2(p2,(l1,l2,l3),V,1,.)}{p_end} {title:Conformability} {cmd:ghk2setup(}{it:n}{cmd:, }{it:m}{cmd:, }{it:d}{cmd:,} {it:type}{cmd:, | }{it:pi}{cmd:)}: {it:n}: 1 {it:x} 1 {it:m}: 1 {it:x} 1 {it:d}: 1 {it:x} 1 {it:type}: 1 {it:x} 1 {it:pi}: 1 {it:x} 1 {it:result}: {it:transmorphic} {cmd:ghk2(}{it:P}{cmd:,} {it:X}{cmd:, }{it:V}{cmd:, }{it:anti}{cmd:, }{it:start}{cmd:)}: {it:input:} {it:P}: {it:transmorphic} {it:X}: {it:n x d} {it:V}: {it:d x d} (symmetric, positive definite) {it:anti}: 1 {it:x} 1 {it:start}: 1 {it:x} 1 {it:output:} {it:result}: n {it:x} 1 {cmd:ghk2(}{it:P}{cmd:,} {it:Xl}{cmd:, }{it:Xu}{cmd:, }{it:V}{cmd:, }{it:anti}{cmd:, }{it:start}{cmd:)}: {it:input:} {it:P}: {it:transmorphic} {it:Xl}: {it:n x d} {it:Xu}: {it:n x d} {it:V}: {it:d x d} (symmetric, positive definite) {it:anti}: 1 {it:x} 1 {it:start}: 1 {it:x} 1 {it:output:} {it:result}: n {it:x} 1 {cmd:ghk2(}{it:P}{cmd:,} {it:X}{cmd:, }{it:V}{cmd:, }{it:anti}{cmd:, }{it:start}{cmd:, }{it:dfdx}{cmd:, }{it:dfdv}{cmd:)}: {it:input:} {it:P}: {it:transmorphic} {it:X}: {it:n x d} {it:V}: {it:d x d} (symmetric, positive definite) {it:anti}: 1 {it:x} 1 {it:start}: 1 {it:x} 1 {it:output:} {it:result}: {it:n x} 1 {it:dfdx}: {it:n x d} {it:dfdv}: {it:n x d}({it:d}+1)/2 {cmd:ghk2(}{it:P}{cmd:,} {it:Xl}{cmd:, }{it:Xu}{cmd:, }{it:V}{cmd:, }{it:anti}{cmd:, }{it:start}{cmd:, }{it:dfdxl}{cmd:, }{it:dfdxu}{cmd:, }{it:dfdv}{cmd:)}: {it:input:} {it:P}: {it:transmorphic} {it:Xl}: {it:n x d} {it:Xu}: {it:n x d} {it:V}: {it:d x d} (symmetric, positive definite) {it:anti}: 1 {it:x} 1 {it:start}: 1 {it:x} 1 {it:output:} {it:result}: {it:n x} 1 {it:dfdxl}: {it:n x d} {it:dfdxu}: {it:n x d} {it:dfdv}: {it:n x d}({it:d}+1)/2 {title:Source code} {p 4 4 2} {view ghk2.mata, adopath asis:ghk2.mata} {title:References} {p 4 8 2}Cappellari, L., and S. Jenkins. 2003. Multivariate probit regression using simulated maximum likelihood. {it:Stata Journal} 3(3): 278-94.{p_end} {p 4 8 2}Cappellari, L., and S. Jenkins. 2006. Calculation of multivariate normal probabilities by simulation, with applications to maximum simulated likelihood estimation. {it:Stata Journal} 6(2): 156-89.{p_end} {p 4 8 2}Gates, R. 2006. A Mata Geweke-Hajivassiliou-Keane multivariate normal simulator. {it:Stata Journal} 6(2): 190-213.{p_end} {p 4 8 2}Genz, A. 1992. Numerical computation of multivariate normal probabilities. {it:Journal of Computational and Graphical Statistics} 1: 141–149.{p_end} {title:Author} {p 4}David Roodman{p_end} {p 4}Senior Fellow{p_end} {p 4}Center for Global Development{p_end} {p 4}Washington, DC{p_end} {p 4}droodman@cgdev.org{p_end} {title:Also see} {p 4 13 2} Online: {bf:{help mf_ghk:[M-5] ghk()}}, {bf:{help mf_ghkfast:[M-5] ghkfast()}}, {bf:{help mf_halton:[M-5] halton()}} {p_end}