Research Article

Bayesian semi-parametric estimation of compound inhomogeneous Poisson processes for ultra-high frequency financial transaction data

Received 30 Jun 2023, Accepted 24 Feb 2024, Published online: 12 Mar 2024

Abstract

Marked point processes provide a flexible framework for studying ultra-high frequency financial data that record the time and price of each transaction. This paper estimates compound, inhomogeneous Poisson processes (CIPP) where trades arrive according to an inhomogeneous Poisson process, and returns are drawn from a distribution after arrival. The intensity functions have Gaussian process priors to model time-of-day (diurnal) effects, which show bursts of activity at the open and close and a midday lull, and the effect of the lagged return from the previous trade, which is idiosyncratic and depends on the stock. The nonparametric density estimator for returns imposes a shape constraint to obtain a unimodal distribution that is asymmetric and leptokurtic. A mixture model separates trades into time sensitive and time insensitive trades with different intensity functions. The latter group tends to have larger orders and higher returns. The proposed model is compared to B-splines and geometric Brownian motion.

1. Introduction

Diffusion processes for financial assets have had an enormous impact on financial research and practice since Bachelier (Citation1900) and Black and Scholes (Citation1973). The price series used to estimate these models are usually sampled at regular intervals, such as daily closing prices. Empirically, the movement in the sampled data often exceeds what would be expected from pure diffusion processes, which has motivated numerous modifications and extensions (Durbin and Koopman Citation2000; Barndorff-Nielsen and Shephard Citation2001, Citation2006; Andersen et al. Citation2002; Andersen et al. Citation2003; Griffin and Steel Citation2006). Jump diffusion processes, which superimpose a jump process on the diffusion process (Press Citation1967, Citation1968; Merton Citation1976), model the outliers in the price series relative to pure diffusion.

Despite the success of diffusion processes and their extensions, Engle (Citation2000), Engle and Russell (Citation1998), Russell and Engle (Citation2010), and Fodra and Pham (Citation2013), among others, maintain that ultra-high frequency data that records each transaction, instead of sampling at fixed time intervals, allows opportunities to discover stylised facts about the price process. Cont (Citation2001), Keim and Madhavan (Citation1996), and Wood et al. (Citation1985) provide empirical generalisations based on ultra-high frequency data, such as the nature of the return distribution, auto-correlation, and stochastic volatility (non-constant variance), and the impact of time sampling on these statistics. Easley and O'Hara (Citation1992) and Easley et al. (Citation1997) test micro-economic models about the price discovery process, to single out only a few representative instances in a rich and diverse literature (Campbell et al. Citation1997; Hautsch Citation2012). Additionally, a relatively new development in finance is high frequency trading (HFT), where firms with powerful computer systems, often located adjacent to clearinghouses' servers, transact a large number of orders in fractions of a second. Clark (Citation2010) summarises risks posed by HFT. O'Hara (Citation2015) summarises strategies for HFT, such as leveraging speed of trading, exploiting deterministic patterns from simple algorithms of other traders, and manipulating prices by triggering bidding wars. The study of HFT requires ultra-high frequency data because daily aggregates mask the activity at the microsecond level.

Marked point processes (Engle and Russell Citation1998; Engle Citation2000) provide a flexible and general framework for ultra-high frequency data and assume that the duration times between trades are stochastic and returns at trades are drawn from an appropriate model. Marked point processes provide a high resolution picture of the price discovery process that is not readily available from diffusion processes. Theoretically, as the sampling intervals for diffusion processes decrease, the continuity of the sample paths should persist. Empirically, realised prices are constant between transactions and can only change at transactions, if then. One bridge from continuous diffusion processes to ultra-high frequency data is to assume that there exists a latent price process that evolves between trades (Elerian et al. Citation2001) and that the empirical price process censors the latent one.

A special class of marked point processes is the compound, inhomogeneous Poisson process (CIPP), where transactions arrive according to an inhomogeneous Poisson process, and returns at arrivals are a random sample from a common distribution. Poisson processes are reasonable models for transactions of liquid or highly traded assets: they assume a large population of investors and a small probability that any particular investor makes a trade during a small time interval. CIPP can be consistent with the efficient market hypothesis, which implies that all information is encapsulated in stock prices (Fama Citation1970; Firth Citation1975; Basu Citation1977; Roll Citation1984). The efficient market hypothesis predicts that prices change in response to new, unexpected information that alters investors' evaluation of the firm's economic value. Though it may not hold in detail for all situations, it seems to be broadly true (Ying et al. Citation2019). A consequence of efficient markets is that past returns are not predictive of future returns, and CIPP assumes that returns are a random sample from a common distribution. The efficient market hypothesis is less prescriptive about the timing of trades. In a perfect world, economically rational investors would transact trades the moment they receive new information that changes their valuation of a firm's value. In reality, information is noisy, various frictions delay updating and trading, and behavioural biases (Hirshleifer Citation2015) derail the best intentions to be 'rational'.

We propose semi-parametric Bayes models for the intensity function and return distribution of the CIPP. The intensity of the Poisson process has flexible functions of the time of day and the lagged return from the last trade. These functions have Gaussian process priors. The parametric model for the intensity function includes day-of-week effects and the effect of the lagged duration between the last two observed trades, which is similar to the autoregressive conditional duration (ACD) model (Engle and Russell Citation1998; Ghysels et al. Citation2004; Russell and Engle Citation2006).

The model uses the spectral representation or Fourier series for Gaussian processes for the prior distribution of the return’s density (Lenk Citation1993, Citation2003) and arrival rates (Lenk Citation1999). The spectral representation projects the Gaussian process onto orthonormal basis functions to obtain a linear model for the Gaussian process. Smoothing priors (Whittaker Citation1923; Wahba Citation1978) for the spectral coefficients attenuate the effects of noisy, high-frequency basis functions by shrinking them towards zero. The smoothing prior determines both the smoothness of the functions and the tradeoff between the prior specification and the data. Our smoothing prior has a hierarchical Bayes specification, which allows the estimation of the smoothing parameter from the data. The nonparametric density estimator imposes a unimodal constraint by assuming that the density has an inverted U-shape (Lenk and Choi Citation2017), which is obtained by constraints on the derivatives of the nonparametric function (Ramsay Citation1998).

The Poisson intensity function has a bathtub shape for arrival rates during the trading day with a flurry of activity at the market open and close and a lull during mid-day. The relationship with the return from the previous trade seems idiosyncratic and depends on the security: overall, large price movements tend to trigger faster trading. Additionally, there are day-of-week effects and effects for the duration between the previous two trades. If the time between the last two trades is long, then the time until the next trade tends to be long too. The return distribution is slightly asymmetric and leptokurtic. We also compare the estimated model to geometric Brownian motion. Over longer periods of several days, the two are similar; however, they do deviate within a trading day.

We also propose a mixture model to illustrate segmenting the data into two components based on different functional forms of the Poisson intensity function. We call the two groups 'time sensitive trades' (TST) and 'time insensitive trades' (TIT) because the intensity function for TST includes diurnal effects and excludes information about the most recent trade, while TIT responds to the previous trade and excludes diurnal effects. Conceptually, TIT uses software and computer systems to trade without direct human intervention; it monitors the flow of trades and responds to it without regard to the time of day. Using their latency advantages, TIT are more responsive to the last trades than TST, which may be initiated by human decision making. Since TST is sensitive to diurnal factors and less responsive to the previous trade, which may have occurred less than one second earlier, we conceptually think of TST as being executed by humans instead of a trading algorithm. A logistic regression model for TIT indicates that they tend to have larger order sizes and returns than TST trades.

The paper has the following structure. Section 2 reviews compound, inhomogeneous Poisson processes for ultra-high frequency trading data. It also presents a simulation that highlights how time sampling a CIPP can lead to erroneous conclusions about the data-generating process. In particular, the time-sampled CIPP exhibits stochastic volatility (Engle Citation1982; Bollerslev Citation1986, Citation1987; Bollerslev et al. Citation1992) and leptokurtic returns, even though the CIPP simulation samples independent returns from a normal distribution with a constant variance. Section 3 develops the Bayesian nonparametric model for the Poisson intensity function and the return distribution. Section 4 presents the mixture model. Section 5 details the MCMC algorithm. Section 6 presents a simple model for off-hour trading. Off-hour trading could be ignored because it accounts for a very small proportion of all trades; however, it may be incumbent on researchers to include such details when investigating micro-structure with high-frequency data. Section 7 estimates the model with Trade and Quote data (TAQ) from Wharton Research Data Services and illustrates simulating from the model for option pricing. Section 8 concludes the paper.

2. Compound inhomogeneous Poisson process

The evolution of the stock price Yt at time t can be written as a marked process that depends on the arrival of transactions and the rate of return at each transaction:
\[
Y_t = Y_s \prod_{i=1}^{N(s,t)} (1 + r_i) \quad \text{for } t > s, \tag{1}
\]
where Ys is the stock price at time s; N(s,t) is the number of transactions in the time interval s to t, and ri is the return for the ith transaction in s to t. Yt is constant between trades and only jumps at trades. Equation (1) is a well-known identity from financial accounting and not a modelling assumption. Different assumptions about the number of transactions and the return process result in different models.

We assume that the number of trades N(s,t) between s and t is an inhomogeneous Poisson process (IPP) with intensity function λ(t) at time t. The Poisson assumption is appropriate for liquid markets with a large number of investors who act independently and have a small probability of making a transaction at any time conditional on the intensity function. Over time, the probability of investing changes with new information or investor whimsy, so unconditionally it can appear that trades are correlated because trades cluster. The distribution and expected value of N(s,t) are (Çinlar Citation1975):
\[
P[N(s,t)=k] = \frac{\exp[-\Lambda(s,t)]\,\Lambda(s,t)^k}{k!} \quad \text{for } k = 0, 1, \dots; \qquad
E[N(s,t)] = \Lambda(s,t) = \int_s^t \lambda(x)\,dx \quad \text{for } s < t. \tag{2}
\]
Additionally, the numbers of trades over disjoint time intervals are mutually independent. The distribution of the time between trades, or the duration, can be derived from Equation (2). If a trade occurs at time s, let T be the time of the next trade. If T is larger than t, then there have not been any arrivals between s and t, and N(s,t) is zero. The probability that the time of the next trade exceeds t, given that the last trade occurred at s, is:
\[
P(T > t \mid \text{last trade at } s) = P[N(s,t)=0] = \exp[-\Lambda(s,t)] \quad \text{for } t > s.
\]
Thus, T has a generalised exponential distribution with cumulative distribution function F_T and density f_T:
\[
F_T(t \mid s) = 1 - \exp[-\Lambda(s,t)], \qquad f_T(t \mid s) = \lambda(t)\exp[-\Lambda(s,t)] \quad \text{for } t > s. \tag{3}
\]
The times between trades are mutually independent, generalised exponential random variables, and the duration between trades is T − s. The log-likelihood function for the intensity function given n trades at times t_0 < t_1 < … < t_n, where t_0 is the origin, is:
\[
L(\lambda) = \left\{\sum_{i=1}^n \log[\lambda(t_i)]\right\} - \Lambda(t_0, t_n). \tag{4}
\]
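To make Equation (4) concrete, the following minimal R sketch (ours, not the authors' code) evaluates the IPP log-likelihood for a user-supplied intensity, approximating Λ by a Riemann sum on a one-second grid, as the paper does for the discretised likelihood; the example intensity lambda_ex is purely illustrative.

```r
# Log-likelihood of an inhomogeneous Poisson process, Equation (4);
# Lambda is approximated by a Riemann sum on a one-second grid.
ipp_loglik <- function(times, lambda, t0 = 0, grid_step = 1) {
  grid   <- seq(t0, max(times), by = grid_step)
  Lambda <- sum(lambda(grid)) * grid_step      # numerical integral of lambda
  sum(log(lambda(times))) - Lambda
}

# Hypothetical diurnal intensity over a 6.5 h (23,400 s) trading day
lambda_ex <- function(t) 0.5 + 0.4 * cos(2 * pi * t / 23400)
ipp_loglik(times = c(10, 35, 60, 61, 200), lambda = lambda_ex)
```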

With our data the time stamps for trades are recorded in seconds, and we assume that they are rounded down.Footnote1 The likelihood in Equation (4) is easily modified for discrete time recording because numerical integration is used to compute Λ. The discrete version of Equation (3) is:
\[
P(T \le s+1 \mid \text{last trade at } s) = 1 - \exp[-\Lambda(s, s+1)],
\]
\[
P(s+k < T \le s+k+1 \mid \text{last trade at } s) = \exp[-\Lambda(s, s+k)] - \exp[-\Lambda(s, s+k+1)] \quad \text{for } k = 1, 2, \dots \tag{5}
\]
CIPP requires a distribution for the returns. The efficient market hypothesis implies that the distribution of stock returns does not depend on past information, so we assume that the returns ri for each transaction are random draws from the density fR.

Time sampling from the CIPP can lead to return processes that have non-constant variance, or stochastic volatility, even though the returns for each trade are independent. The total return from time s to t for s < t from Equation (1) is:
\[
R(s,t) = \frac{Y_t - Y_s}{Y_s} = \prod_{i=1}^{N(s,t)} (1 + r_i) - 1.
\]
If the returns ri are mutually independent with mean µ and standard deviation σ, then the mean and variance of the total return are:
\[
E[R(s,t)] = \exp[\Lambda(s,t)\mu] - 1, \qquad
V[R(s,t)] = \exp[\Lambda(s,t)(\sigma^2 + \mu^2 + 2\mu)] - \exp[2\Lambda(s,t)\mu]. \tag{6}
\]

When time sampling prices in equal-length intervals, the variance of the sampled process in Equation (6) is not constant if the integral of the intensity function over each time period, Λ(s,t), is not constant.

To illustrate non-constant variance for the time-sampled process, we simulate data from a CIPP where the returns are independent random draws from a normal distribution with mean 0 and standard deviation 0.04, and the intensity function has a 20-day period.Footnote2 Sampling the price process at the daily close results in leptokurtic returns and stochastic volatility even though returns are normally distributed and have a constant variance. Figure 1 graphs some of the main features of the CIPP model. Panel A graphs a realisation of the CIPP for 100,000 simulated trades over 500 trading days. Panel B graphs the return process from sampling the daily close and exhibits stochastic volatility. A GARCH(0,3) model provides a good fit and has highly significant coefficients. Panel C is the histogram of daily returns with excess kurtosis of 2.78 even though the true excess kurtosis is zero. The autocorrelation function for the squared daily returns, which is a measure of daily volatility, in Panel D provides further evidence for stochastic volatility. The simulation illustrates that time sampling from a CIPP introduces statistical features that are not present in the actual data-generating process. The key reason is that the number of daily trades is random, and the Poisson rate parameter is not constant from day to day.
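The simulation can be reproduced in outline with the R sketch below. It is a sketch under assumed parameter values, not the code behind Figure 1: arrivals are generated by thinning a dominating homogeneous process (Lewis-Shedler thinning), the sinusoidal intensity and its level are illustrative, and daily closes are then sampled from the compounded price path.

```r
set.seed(1)
day     <- 23400                               # seconds in a 6.5 h trading day
n_days  <- 500
lam     <- function(t) 0.01 * (1 + 0.9 * sin(2 * pi * t / (20 * day)))  # 20-day period
lam_max <- 0.02                                # dominating rate for thinning

t_end <- n_days * day
cand  <- cumsum(rexp(ceiling(1.5 * lam_max * t_end), rate = lam_max))
cand  <- cand[cand <= t_end]
arr   <- cand[runif(length(cand)) < lam(cand) / lam_max]   # thinned arrival times

r     <- rnorm(length(arr), mean = 0, sd = 0.04)           # i.i.d. normal returns
price <- 100 * cumprod(1 + r)
close <- price[pmax(1, findInterval(seq_len(n_days) * day, arr))]
daily_ret <- diff(close) / head(close, -1)
# hist(daily_ret); acf(daily_ret^2)  # leptokurtosis and volatility clustering
```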

Figure 1. Simulated CIPP and daily close. (A) Simulated price process; (B) Returns based on daily close; (C) Histogram of returns from daily close; (D) ACF of squared returns from daily close.

3. Bayesian semiparametric model for CIPP

We use the following conventions: t is the number of consecutive seconds since the beginning of the observational period, and td is the time of day in seconds since midnight of the current day. For the first day of trading, t and td are equal. If t is in trading day n, then t = nk + td, where k = 86,400 s per day. Our model considers three time periods: pre-open (4.00am to 9.30am), regular trading (9.30am to 4.00pm), and post-close (4.00pm to 8.00pm). Section 6 presents the model for pre-open and post-close trading. The processes restart at the opening of each of the three periods.

During regular trading hours, the CIPP's intensity function λ is a function of the time of day td:
\[
\lambda(t_d \mid x, r) = \exp[x'\beta + A(r) + B(t_d)], \tag{7}
\]
where x is a vector of covariates with regression coefficients β; A(r) is a flexible function of the lagged return from the previous transaction; and B(td) is a flexible function of the time of day td. In the application, the covariates are indicator variables for the day of the week and the lagged duration between the last two transactions. The prior distribution for β is normal with mean mβ0 and variance Vβ0²: β ∼ N(mβ0, Vβ0²).

Since A and B are unknown functions, we assume that they have Gaussian process priors. We use a Fourier series representation, or spectral analysis (Lenk Citation1999), to linearise the Gaussian processes. If Z is a second-order Gaussian process on the set ℬ, the mean process is E[Z(s)] = µ(s), and the covariance function is Cov[Z(s), Z(t)] = ν(s,t). Without loss of generality, we set the mean process to zero since the parametric model could include the mean. In our case, ℬ is either regular trading hours from 9.30am to 4.00pm for the NYSE or an interval for the returns. The Karhunen-Loève representation projects Z into a linear space:
\[
Z(s) = \sum_{k=1}^{\infty} \theta_k \varphi_k(s),
\]
where φk are orthonormal basis functions. The spectral coefficients θk are defined as:
\[
\theta_k = \int_{\mathcal{B}} Z(s)\,\bar{\varphi}_k(s)\,ds,
\]
where \(\bar{\varphi}_k\) is the complex conjugate of φk. The distribution of the spectral coefficients is inherited from the prior distribution of Z. Because the basis functions are orthonormal and Z is a Gaussian process, the spectral coefficients are mutually independent and have normal distributions. Their means and variances are given by:
\[
E(\theta_k) = \int_{\mathcal{B}} \mu(s)\,\bar{\varphi}_k(s)\,ds \equiv \mu_k, \qquad
\mathrm{Var}(\theta_k) = \int_{\mathcal{B}}\int_{\mathcal{B}} \nu(s,t)\,\bar{\varphi}_k(s)\,\varphi_k(t)\,ds\,dt \equiv \nu_k^2.
\]
Conversely, given the means and variances of the spectral coefficients, the mean and covariance of the Gaussian process are:
\[
\mu(s) = \sum_{k=1}^{\infty} \mu_k \varphi_k(s), \qquad
\nu(s,t) = \sum_{k=1}^{\infty} \nu_k^2\,\varphi_k(s)\,\bar{\varphi}_k(t),
\]
provided that the variances νk² have a finite sum.

In the empirical section, the basis is the cosine basis:
\[
\varphi_k(s) = [2 g_0(s)]^{1/2} \cos[k\pi G_0(s)] \quad \text{for } s \in \mathcal{B} \text{ and } k \ge 1,
\]
where g0 and G0 are the density and cumulative distribution functions of a base measure on ℬ. The cosine functions are an orthonormal basis on ℬ and are complete for piecewise continuous functions (Kreider et al. Citation1966). The cumulative distribution transforms G0 are the Cauchy distribution for A(r) and the uniform distribution for B(td). The Fourier series do not include sine functions because the functions would then be periodic on ℬ, and they do not include the constant function because it is part of the parametric model. Other choices of basis can be used.
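As an illustration, the R sketch below evaluates this basis for the Cauchy base measure used for A(r); the location and scale of the base measure and the grid are our assumptions, not values from the paper.

```r
# Cosine basis phi_k(s) = sqrt(2 * g0(s)) * cos(k * pi * G0(s)), k = 1, ..., K
cosine_basis <- function(s, K, loc = 0, scale = 0.1) {
  g0 <- dcauchy(s, location = loc, scale = scale)   # base density
  G0 <- pcauchy(s, location = loc, scale = scale)   # cdf transform to [0, 1]
  sapply(seq_len(K), function(k) sqrt(2 * g0) * cos(k * pi * G0))
}

r_grid <- seq(-0.02, 0.02, length.out = 200)   # grid of lagged returns
Phi    <- cosine_basis(r_grid, K = 30)         # 200 x 30 design matrix
# A(r) on the grid is Phi %*% theta for spectral coefficients theta
```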

An important issue with nonparametric estimation is overfitting the data, which leads to functions with spurious oscillations. The spectral representation provides a natural method to specify smoothness because the high-frequency spectral coefficients will be small for smooth functions. If the spectral coefficients decay at rate o(k^{−m}) for some m > 1, then the function is m times differentiable almost everywhere (Katznelson Citation2004). An exponential hierarchical Bayes (HB) smoothing prior shrinks the spectral coefficients to 0, and the amount of shrinkage increases with frequency:
\[
\theta_k \sim N[0, \tau^2 \exp(-k\gamma)]. \tag{8}
\]
The variance parameter τ controls the tradeoff between the data and the smoothing prior. If τ is small (large), then the model puts more weight on the prior (data), and the function will be smoother (rougher). The decay parameter γ allows the amount of shrinkage towards zero to depend on the frequency k: high-frequency spectral coefficients have more shrinkage to zero than low-frequency coefficients, thus dampening the high-frequency oscillations in the function that are due to noise and not signal.

Silverman (Citation1985) observed that most nonparametric methods can be explicitly or implicitly viewed as kernel regression methods. The covariance function for the Gaussian process priors plays the role of the kernel. Panel A of Figure 2 plots the spectral covariance function on the unit square for the exponential HB smoother when τ = 2 and γ = 1. The covariance has a saddle shape and is maximal at the endpoints, which encodes the information that estimates of the regression function at the endpoints tend to be more uncertain than those in the middle. The prior covariance is negative at the corners (0, 1) and (1, 0). One issue with flexible function estimation is end effects. This covariance attenuates end effects by increasing the prior variance at the endpoints of ℬ, which gives the observations near the endpoints more weight when estimating the function there. The prior variance function is smallest in the middle, so observations in the middle have less weight in estimation. For comparison, Panel B of Figure 2 plots the implied covariance functionFootnote3 for B-splines (Eilers and Marx Citation1996) where the prior standard deviation τ of the coefficients is 2, and there are equally spaced internal knots from 0 to 1 for a total of 13 knots from the function bs() of the R package splines. The covariance of the B-spline is concentrated along a diagonal band and is zero otherwise; only observations that are close to the abscissa are used in estimating the ordinate.
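The spectral covariance in Panel A can be computed directly from the prior variances of the coefficients. The sketch below (our illustration) uses the uniform base measure on [0, 1], so φk(s) = √2 cos(kπs); the truncation at K = 50 is an assumption.

```r
# Prior covariance implied by the exponential HB smoother:
# nu(s, t) = sum_k tau^2 * exp(-k * gamma) * phi_k(s) * phi_k(t)
spectral_cov <- function(s, t, tau = 2, gamma = 1, K = 50) {
  k <- seq_len(K)
  2 * sum(tau^2 * exp(-k * gamma) * cos(k * pi * s) * cos(k * pi * t))
}
u <- seq(0, 1, length.out = 51)
V <- outer(u, u, Vectorize(spectral_cov))
# persp(u, u, V)  # saddle shape: largest variance at the endpoints and
#                 # negative covariance at the corners (0, 1) and (1, 0)
```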

Figure 2. Covariance functions. (A) Spectral covariance function τ = 2 and γ = 1; (B) B-spline covariance function τ = 2 and 13 knots.

Instead of assuming fixed values for the smoothing parameters τ and γ or using cross-validation, we take a fully Bayesian approach with a hierarchical Bayes prior:
\[
\tau^2 \sim IG\!\left(\frac{r_0}{2}, \frac{s_0}{2}\right) \text{, the inverse Gamma distribution;} \qquad
\gamma \sim G(1, w_{\gamma,0}) \text{, the Gamma distribution.} \tag{9}
\]
In Equation (9), the inverse Gamma distribution has shape parameter r0/2, scale parameter s0/2, and mean s0/(r0 − 2); the Gamma distribution has shape parameter 1 (the Exponential distribution), scale parameter wγ,0, and mean 1/wγ,0. The infinite series has to be truncated for implementation. The modelling strategy is to use a large number of cosine functions and let the smoothing prior shrink the insignificant ones towards zero instead of trying to pick the optimal number of cosine terms. Since smooth functions do not have high-frequency oscillations, the estimates are insensitive to the number of cosines as long as the cutoff is sufficiently large.

Lenk (Citation1993, Citation2003) proposed a logistic-Gaussian prior for the nonparametric estimation of the density fR:
\[
f_R(r) = \frac{\exp[Z(r)]}{\int_a^b \exp[Z(u)]\,du} \quad \text{for } a \le r \le b,
\]
where Z is a Gaussian process. However, this model is too flexible and will result in multiple modes due to the lumpiness in returns of ultra-high frequency data: empirically, returns cluster because prices are discrete to the nearest $0.01. Unimodal shape constraints are imposed by requiring the derivative of fR to be positive before zero and negative after zero, which is achieved by introducing an auxiliary function h that changes sign at zero (Lenk and Choi Citation2017):
\[
f_R(r) = \frac{g_0(r)\exp\left\{\int_a^r h(s)[Z(s)]^2\,ds\right\}}{\int_a^b g_0(w)\exp\left\{\int_a^w h(s)[Z(s)]^2\,ds\right\}dw},
\]
\[
Z(s) = \sum_{k=0}^{K} \sqrt{2}\,\cos[k\pi G_0(s)]\,\theta_{f,k} \quad \text{for } a \le s, r \le b,
\]
\[
h(s) = \frac{1 - \exp(100 s)}{1 + \exp(100 s)}, \tag{10}
\]
where a and b are sufficiently large to include the domain of observed returns, and the base measure G0 is a Cauchy distribution with mode at zero. The auxiliary logistic function h is between 1 and −1 and changes sign at 0. Its 'slope' is fixed at 100 so that it is essentially 1 before −0.1 and −1 after 0.1. The Gaussian process Z has a finite spectral representation that includes the constant function; however, the sign of the constant term θf,0 is not identified. We restrict it to be positive by assuming that it has a normal distribution truncated to the positive numbers. If the spectral coefficients θf,k are 0, then the density is the base density g0. Z is squared in the exponent to make it a positive function. The first derivative of fR,
\[
f_R'(r) = \left\{h(r)[Z(r)]^2 + \frac{g_0'(r)}{g_0(r)}\right\} f_R(r),
\]
is positive before zero and negative after zero because h and the derivative of g0 are positive before 0 and negative after zero, and the other quantities are positive. Thus, the return density fR is unimodal with mode at zero. The spectral coefficients θf,k for k > 0 have the exponential HB smoothing priors given by Equations (8) and (9).
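A small R sketch of this construction may help: it evaluates fR on a grid for given spectral coefficients, using cumulative sums for the inner integrals and numerical normalisation. The Cauchy scale, the grid, and the coefficient values are illustrative assumptions; note that (1 − e^{100s})/(1 + e^{100s}) = −tanh(50s), which is used for numerical stability.

```r
# Evaluate the unimodal density of Equation (10) on a grid; theta[1] is
# the positive constant theta_{f,0}.
unimodal_density <- function(grid, theta, loc = 0, scale = 0.01) {
  g0  <- dcauchy(grid, loc, scale)
  G0  <- pcauchy(grid, loc, scale)
  K   <- length(theta) - 1
  Phi <- sapply(0:K, function(k) sqrt(2) * cos(k * pi * G0))
  Z   <- drop(Phi %*% theta)
  h   <- -tanh(50 * grid)                 # (1 - e^{100s}) / (1 + e^{100s})
  d   <- diff(grid)[1]
  cumint <- cumsum(h * Z^2) * d           # integral of h * Z^2 from a to r
  f   <- g0 * exp(cumint - max(cumint))   # stabilise the exponent
  f / (sum(f) * d)                        # numerical normalisation
}

r_grid <- seq(-0.05, 0.05, length.out = 1001)
f_hat  <- unimodal_density(r_grid, theta = c(1, 0.5, -0.2, rep(0, 7)))
# plot(r_grid, f_hat, type = "l")         # unimodal with mode at zero
```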

4. Mixture model

This section proposes a mixture model for the intensity functions where trades can be classified into two or more groups. These groups can be meaningful for economic theory and the simplest dichotomy is ‘rational’ versus ‘noise’ trades (Ramiah et al. Citation2015). Stocks are a contingent claim on future earnings, so the economic value of a stock is the expected present value of free cash flow. Rational trades occur when the market price, which is determined by supply and demand conditions, deviates from their economic value. Rational traders may disagree about economic value because they can have different opinions of future cash flow or use different discount rates, thus leading to different valuations; however, in the long run market prices should adhere to economic value for liquid assets in efficient markets with rational investors, and trading data reveals the price discovery process. In contrast, noise trades are triggered by various processes, including behavioural, psychological, and situational factors. These trades introduce noise into the price discovery process. Market anomalies, i.e. material deviations between economic value and stock price, may occur if enough investors are subject to the same biases, resulting in potential market failures.

The distinctions between rational and noise trades are easier to make in theory than empirically. The ideal situation would be an algorithm that classifies trades as rational or noise in real time. An economic approach would require a valuation model to determine if a trade was consistent with economic theory; however, the algorithm’s valuation may be one of many possible realisations of the future economy. At the other extreme, a purely data-driven classification model would attempt to find clusters in the data. When applied to trading times, it tends to find outliers from the inhomogeneous Poisson process model.

This section proposes a mixture model between the two extremes of using an economic evaluation model and cluster analysis by including different variables in the Poisson intensity function for the different segments of trades. We do not claim to solve the rational versus noise question and refrain from using those labels. The motivation for the model is that economic valuation models typically do not include diurnal factors. The first group, called ‘time insensitive trades’ or TIT, do not systematically respond to diurnal factors. We vaguely conceptualise this group as trades that are executed by algorithms, which have the ability to respond to recent trades and do not become sleepy after lunch. The second group, called ‘time sensitive trades’ or TST, are subject to diurnal effects but lack the speed to respond to the most recent trades. We conceptualise TST as being executed by human traders. The intensity function for TST includes time of day and day of week and excludes lagged returns and lagged durations.

The mixture model modifies the intensity function in Equation (7) for the time sensitive trades (TST) and time insensitive trades (TIT). The Poisson intensity functions for TST (Z = 0) and TIT (Z = 1) are:
\[
\lambda_0(t_d) = \exp[\beta_1 Mo + \beta_2 Tu + \beta_3 Wd + \beta_4 Th + \beta_5 Fr + B(t_d)] \quad \text{for TST},
\]
\[
\lambda_1(t_d) = \exp[\beta_0 + \beta_6 D + A(r)] \quad \text{for TIT}, \tag{11}
\]
where Mo, Tu, Wd, Th, and Fr are day-of-week indicator variables; D is the duration between the previous two trades before time t; A(r) is the flexible function of the previous return; and B(td) is the flexible function for time of day. Additionally, we use a logistic regression model for time insensitive trades:
\[
\mathrm{logit}(P[\mathrm{TIT}]) = \alpha_0 + \alpha_1\,\mathrm{Return} + \alpha_2 \log(\mathrm{Order\ Size}), \tag{12}
\]
where Return is the return between the current trade and the next trade, and Order Size is the number of shares transacted. Equation (12) is used for classification purposes and is not intended as a data-generating mechanism since the return for the current trade is not known at the time of transaction.
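A brief R sketch of Equations (11) and (12) under assumed inputs: A() and B() stand in for the fitted Gaussian-process components, and the parameter values in the example call are illustrative.

```r
# Mixture intensities of Equation (11)
lambda_tst <- function(td, dow, beta_dow, B) exp(beta_dow[dow] + B(td))
lambda_tit <- function(r_lag, D, beta0, beta6, A) exp(beta0 + beta6 * D + A(r_lag))

# Logistic prior membership probability of Equation (12)
p_tit <- function(ret, order_size, alpha) {
  eta <- alpha[1] + alpha[2] * ret + alpha[3] * log(order_size)
  1 / (1 + exp(-eta))
}
p_tit(ret = 0.001, order_size = 500, alpha = c(-1, 2, 0.3))
```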

5. MCMC algorithm

The MCMC algorithm uses random-walk Metropolis-Hastings (MH), where the candidate values are drawn from normal distributions centred at the current draws of the MCMC algorithm. The variances of the generating distributions are adaptively determined from the past draws of the MCMC algorithm (Haario et al. Citation2001; Roberts and Rosenthal Citation2009). For a generic parameter δ of dimension d, the adaptive algorithm generates a candidate draw from a normal distribution at iteration t + 1:
\[
\delta_{t+1}^c = \delta_t + c\,\tilde{V}_t^{1/2} z_t, \tag{13}
\]
where δt is the current draw from iteration t; \(\tilde{V}_t\) is an estimate of the variance of δ; c is a scale factor, and zt is a vector of independent draws from a standard normal distribution. The suggested scale factor c is 2.38/d^{1/2}. The candidate draw is either accepted according to the standard MH probability, or the current value δt is retained at iteration t + 1.

One method to estimate the variance is with exponentially weighted moving averages (EWMA):
\[
\tilde{\delta}_t = w_t \delta_t + (1 - w_t)\tilde{\delta}_{t-1}, \qquad
\tilde{V}_t = w_t (\delta_t - \tilde{\delta}_t)(\delta_t - \tilde{\delta}_t)' + (1 - w_t)\tilde{V}_{t-1}, \tag{14}
\]
where wt is a positive constant less than 1. The EWMA parameter wt can depend on the iteration: for instance, wt = w when t ≤ t0; wt = w/(t − t0) when t > t0; w = 0.2; and t0 is an initial burn-in period. This specification reduces the adaptation after the chain reaches its stationary distribution. The EWMAs in Equation (14) are updated only when a candidate draw is accepted; otherwise, a long series of rejected candidates can result in variance estimates of zero. Roberts and Rosenthal (Citation2009) propose a mixture distribution where 95% of the mass is the generating distribution in Equation (13) and 5% of the mass substitutes the identity matrix for the estimated variance to keep the estimated variances from becoming zero. We take a different approach: we replace the eigenvalues of the estimated variance that are below a small, positive number with that small positive number, and generate the candidate only from Equation (13).
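The following R sketch shows the EWMA updates of Equation (14) and the eigenvalue floor described above; the floor eps is an assumed value.

```r
# EWMA updates of Equation (14), applied only when a candidate is accepted
ewma_update <- function(delta, m, V, w) {
  m_new <- w * delta + (1 - w) * m
  dev   <- delta - m_new
  list(mean = m_new, var = w * tcrossprod(dev) + (1 - w) * V)
}

# Replace small eigenvalues so the proposal covariance stays positive
# definite, instead of the 5% identity mixture of Roberts and Rosenthal
floor_eigen <- function(V, eps = 1e-8) {
  e <- eigen(V, symmetric = TRUE)
  e$vectors %*% diag(pmax(e$values, eps), nrow(V)) %*% t(e$vectors)
}
```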

We make several modifications to the standard algorithm. First, we use the current values of the HB standard deviations τexp(−kγ/2) when generating the spectral coefficients θk for the Gaussian process priors. Second, we found that the suggested scale factor in Equation (13) did not work with our data and model. Instead, we generate a scale parameter ξt from an inverse Gamma distribution where the shape parameter is fixed at a small integer (α = 3), and the scale parameter is estimated from an EWMA of the past ξ:
\[
\tilde{\xi}_t = w_t(\xi_t + \xi_{\min}) + (1 - w_t)\tilde{\xi}_{t-1}, \qquad
\xi_{t+1} \sim IG(\alpha, (\alpha - 1)\tilde{\xi}_t), \tag{15}
\]
where ξmin is a small non-negative constant that keeps the EWMA positive, and the weight wt is positive and less than one. The mean of the inverse Gamma distribution is \(\tilde{\xi}_t\), and the standard deviation is \(\tilde{\xi}_t/\sqrt{\alpha - 2}\). The inverse Gamma distribution with a small shape parameter is right skewed, so the step size will occasionally be relatively large, which can help with mixing. The EWMA \(\tilde{\xi}_t\) is updated only when the candidate draw is accepted. We also monitor the acceptance rates with an EWMA where the datum is one if the candidate draw is accepted and zero if the current draw is retained on each MCMC iteration. If the acceptance rate is too small (< 0.1), the step size of the generating distribution is too large, and \(\tilde{\xi}_t\) in Equation (15) is reduced by a multiplicative factor that is randomly drawn between 0.1 and 0.5. If the acceptance rate is too big (> 0.5), the step size is too small, and \(\tilde{\xi}_t\) is increased by a multiplicative factor that is randomly generated between 2 and 10.
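In R, the scale draw of Equation (15) and the acceptance-rate adjustments can be sketched as follows, using the fact that if X ∼ Gamma(shape a, rate b) then 1/X has an inverse Gamma distribution with mean b/(a − 1).

```r
# Draw xi from an inverse Gamma with shape a and mean xi_tilde (Equation 15)
draw_xi <- function(xi_tilde, a = 3) {
  1 / rgamma(1, shape = a, rate = (a - 1) * xi_tilde)
}

# Adjust the EWMA of the scale when the monitored acceptance rate drifts
# outside (0.1, 0.5), with the random multiplicative factors from the text
tune_xi <- function(xi_tilde, acc_rate) {
  if (acc_rate < 0.1)      xi_tilde * runif(1, 0.1, 0.5)  # steps too large
  else if (acc_rate > 0.5) xi_tilde * runif(1, 2, 10)     # steps too small
  else                     xi_tilde
}
```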

The duration model has three sets of parameters: the regression parameters β, the spectral coefficients θA,k for A(r), and the spectral coefficients θB,k for B(td). Their generating distributions at iteration t + 1 are:
\[
\beta_{t+1}^c = \beta_t + \xi_{\beta,t}^{1/2}\,\tilde{V}_{t,\beta}^{1/2} z_{t,\beta},
\]
\[
\theta_{A,k,t+1}^c = \theta_{A,k,t} + \xi_{A,t}^{1/2}\,\tau_{A,t}\exp\!\left(-\frac{k\gamma_{A,t}}{2}\right) z_{t,A,k} \quad \text{for } k = 1, \dots, K_A,
\]
\[
\theta_{B,k,t+1}^c = \theta_{B,k,t} + \xi_{B,t}^{1/2}\,\tau_{B,t}\exp\!\left(-\frac{k\gamma_{B,t}}{2}\right) z_{t,B,k} \quad \text{for } k = 1, \dots, K_B. \tag{16}
\]
There are three separate scale parameters ξt for the three blocks of parameters in Equation (16). Each scale factor follows the updating and generating distributions in Equation (15). Another modification sets the variance of β to a diagonal matrix with a separate ξ for each parameter. We tried both methods and believe the latter is more robust than the former. When the chain is near the posterior mode, the estimated variance of β is accurate, and the MH sampler works at least as well as with the diagonal variance. However, when the chain is in a low-probability region, the estimated variance can cause the chain to become stuck.

The log-likelihood uses the discretised form of the nonhomogeneous distribution in Equation (5) since the trade times are rounded down to the nearest second. The integrals of the rate parameter for regular trading hours are computed with numerical methods, where B(td) is evaluated on a grid with gaps of one second. We experimented with MH by generating β, θA,k, and θB,k simultaneously and in three blocks. The advantage of three blocks is that the scale parameters ξ can be tuned separately. For a given number of iterations, the single-block method is faster since the log-likelihood is computed once per iteration, while it is computed three times per iteration for the blocked MH. However, we did not consistently find that one method dominates the other.

The nonparametric model for the return distribution in Equation (10) also has a Gaussian process with spectral coefficients θR,k. The candidate distributions for MCMC iteration t + 1 are similar to those in Equation (16) except that the Gaussian process includes the constant (k = 0), which is assumed to be positive to identify the model:
\[
\theta_{R,0,t+1}^c = \theta_{R,0,t} + \xi_{R,0,t}^{1/2} z_{t,R,0} \quad \text{where } \theta_{R,0,t+1}^c > 0,
\]
\[
\theta_{R,k,t+1}^c = \theta_{R,k,t} + \xi_{R,t}^{1/2}\,\tau_{R,t}\exp\!\left(-\frac{k\gamma_{R,t}}{2}\right) z_{t,R,k} \quad \text{for } k = 1, \dots, K_R,
\]
while ξR,0,t and ξR,t are generated as in Equation (15). The constant is drawn from a truncated normal distribution by using the inverse cdf method. The Gaussian process and density function are computed on a fine grid that extends slightly beyond the smallest and largest observed returns.

The HB smoothing parameters (τ², γ) for each Gaussian process are updated from their full conditional distributions given the spectral coefficients θk. In the following, we drop the subscripts for the different Gaussian processes. Their prior distributions are in Equation (9). The full conditional distribution of τ² is conjugate:
\[
\tau^2 \sim IG\!\left(\frac{r_K}{2}, \frac{s_K}{2}\right), \qquad r_K = r_0 + K, \qquad
s_K = s_0 + \sum_{k=1}^K \theta_k^2 \exp(k\gamma).
\]
The full conditional distribution of γ is the product of K normal densities and an Exponential density (each normal density contributes a factor exp(kγ/2), and \(\sum_{k=1}^K k/2 = K(K+1)/4\)):
\[
f(\gamma) \propto \exp\!\left[\left(\frac{K(K+1)}{4} - w_{\gamma,0}\right)\gamma - \frac{1}{2\tau^2}\sum_{k=1}^K \theta_k^2 \exp(k\gamma)\right]. \tag{17}
\]
We use slice sampling (Damien et al. Citation1999), which decomposes complex distributions with auxiliary, uniformly distributed random variables. The full conditional distribution in Equation (17) with the auxiliary random variables can be written as
\[
f(\gamma, V_1, \dots, V_K) \propto \prod_{k=1}^K I\!\left[0 \le V_k \le \exp\{-a_k \exp(k\gamma)\}\right] \exp(w_{\gamma,K}\,\gamma),
\qquad a_k = \frac{\theta_k^2}{2\tau^2}, \qquad w_{\gamma,K} = \frac{K(K+1)}{4} - w_{\gamma,0}.
\]
At iteration t + 1, the full conditional distributions of the Vk are independent uniform distributions and are generated as vk,t+1 = uk exp[−ak exp(kγt)], where uk ∼ U(0,1). Given vk,t+1, the full conditional density of γ is proportional to exp(wγ,K γ) subject to the bounds:
\[
u_k \exp[-a_k \exp(k\gamma_t)] \le \exp[-a_k \exp(k\gamma)] \quad \text{for } k = 1, \dots, K.
\]

Inverting the inequalities as a function of γ gives:
\[
\gamma \le b_k = \gamma_t + \frac{1}{k}\left[\log\left\{a_k - \log(u_k)\exp(-k\gamma_t)\right\} - \log(a_k)\right] \quad \text{for } k = 1, \dots, K. \tag{18}
\]
Combining the bounds restricts γ to be less than b = min(bk). The full conditional distribution of γ is proportional to exp(wγ,K γ) given γ ≤ b. The draw γt+1 is generated with the inverse cdf method for bounded variables:
\[
\gamma_{t+1} = b + w_{\gamma,K}^{-1}\log\left[u + (1 - u)\exp(-b\,w_{\gamma,K})\right], \qquad u \sim U(0,1). \tag{19}
\]
The forms of Equations (18) and (19) avoid numerical overflow because the index k and wγ,K can be large positive numbers.
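Putting the pieces together, one slice-sampling update for γ can be sketched in R as below; this is our illustration of Equations (17)-(19), with coefficients equal to zero skipped because they impose no constraint.

```r
# One slice-sampling update for gamma given spectral coefficients theta,
# variance tau2, and the prior parameter w0 = w_{gamma,0}
update_gamma <- function(gamma_t, theta, tau2, w0) {
  K  <- length(theta)
  wK <- K * (K + 1) / 4 - w0                 # coefficient of gamma in Eq. (17)
  k  <- which(theta != 0)
  a  <- theta[k]^2 / (2 * tau2)
  u  <- runif(length(k))
  bk <- gamma_t + (log(a - log(u) * exp(-k * gamma_t)) - log(a)) / k   # Eq. (18)
  b  <- min(bk)
  v  <- runif(1)
  b + log(v + (1 - v) * exp(-b * wK)) / wK   # inverse cdf draw, Eq. (19)
}
```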

The MCMC algorithm for the mixture model that classifies trades into time sensitive trades (TST) and time insensitive trades (TIT) adapts standard MCMC procedures, with the modification that the two groups have structurally different intensity functions in Equation (11) and the 'prior' classification probabilities depend on covariates. The MCMC algorithm introduces indicator variables zi, which are one if trade i is classified as TIT and zero if trade i is classified as TST. The 'prior' probability that trade i belongs to TIT (zi = 1) is:
\[
P(z_i = 1 \mid u_i) = \frac{\exp(u_i'\alpha)}{1 + \exp(u_i'\alpha)},
\]
where ui is a vector of covariates for trade i. Given the parameter draws at iteration t, the MCMC algorithm generates draws at iteration t + 1 according to the following steps.

  1. Given the group indicators zi of the trades, generate β0 (the parameters for days of the week) and θB,k (the spectral coefficients for the Gaussian process for time of day) using the observations in the TST group, and generate β1 (the parameters for the constant and lagged durations) and θA,k (the spectral coefficients for the Gaussian process for lagged returns) using the observations in the TIT group. The algorithm modifies the generating distributions in Equation (16) for the homogeneous model.

  2. Given the group indicators, generate α by using standard MH procedures for logistic regression.

  3. Reclassify the trades according to the posterior probabilities of group membership. The duration between trade i − 1 at time ti−1 and trade i at time ti is yi = ti − ti−1. The posterior classification probability for trade i is:
\[
P_{t+1}(z_i = 1 \mid y_i, u_i) = \frac{P(z_i = 1 \mid u_i)\, f_1(y_i \mid t_{i-1})}{P(z_i = 0 \mid u_i)\, f_0(y_i \mid t_{i-1}) + P(z_i = 1 \mid u_i)\, f_1(y_i \mid t_{i-1})},
\]
where f0 and f1 are the duration probabilities (Equation 5) under the TST and TIT models, respectively.

Because the models for TIT and TST differ in their specifications, the MCMC does not have label switching (Titterington et al. Citation1987).
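A short R sketch of the reclassification step, with f0 and f1 standing for the discretised duration probabilities of Equation (5) under the two fitted intensity functions:

```r
# Posterior probability that trade i is TIT (step 3)
classify_trade <- function(y, u, alpha, f0, f1, t_prev) {
  p1  <- 1 / (1 + exp(-sum(u * alpha)))   # prior P(z_i = 1 | u_i)
  num <- p1 * f1(y, t_prev)
  num / ((1 - p1) * f0(y, t_prev) + num)
}
# z_i is then drawn as rbinom(1, 1, classify_trade(...))
```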

In the empirical section, the MCMC had a total of 100,000 iterations, and the last 10,000 iterations were used for estimation. Execution times averaged two to three minutes per 10,000 iterations per 10,000 trades for 30 cosine functions, using R with a Xeon E5-1660 CPU running at 3.20 GHz with 64 GB of RAM. Convergence diagnostics (Geweke and Heidelberger-Welch) from the CODA package in R indicate that the MCMC converged.Footnote4

6. Off-hours trading

For the NYSE, pre-open hours are 4am to 9.30am, and post-close hours are 4pm to 8pm. Off-hour trading incurs additional transaction costs, which limits trading. There may also be societal factors that limit off-hour trading: professional traders who are located in the United States are not particularly keen to extend their work day from 4am to 8pm. Off-hours trading tends to be too sporadic to use nonparametric methods for inhomogeneous Poisson processes. If trading were 24 h a day, then it would be possible to extend B to the entire day by including both sine and cosine terms since B would be a periodic function. In our empirical example, only 97 of 117,276 trades over 80 days were off-hours for UGI, and only 0.15% of PG trades were off-hours. Additionally, most of the off-hour trades cluster around the open or close, so the data are very thin at other times. Off-hour trades are not central to this paper's focus, but we include models for pre-open and post-close trading for completeness. We assume that the distribution of returns for off-hour trades is the same as the distribution during regular hours. Because off-hour trading is light, the Poisson process intensity function for regular hours does not apply to off-hours.

The models for off-hour trading have multiple stages. Not all days have pre-open trading. A given day has pre-open trading with probability p, and days are mutually independent. If a day has pre-open trading, there usually is a lag between the 4am start of pre-open trading and the first trade, which usually occurs between 7am and 9am. This lag has a normal distribution.Footnote5 After the first trade, subsequent trades are from a homogeneous Poisson process with rate parameter λ1 until the market opens. Post-close trading tends to cluster around 4.30pm, soon after the market close, with very few, if any, trades after 6pm, even though post-close trading does not end until 8pm. The number of trades in post-close trading has a Poisson distribution with parameter, or expected number, λ2a. Given the total number of trades, we assume that the trades arrive according to a homogeneous Poisson process with rate parameter λ2b. We use conjugate priors: p has a Beta distribution, the Poisson rate parameters have Gamma distributions, the mean of the lag parameter given the variance has a normal distribution, and the variance has an inverse Gamma distribution.
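With conjugate priors, the posterior updates are standard. The R sketch below illustrates two of them under assumed Beta(1, 1) and Gamma(1, 1) priors; the prior values are ours, not the paper's.

```r
# Posterior for p, the probability that a day has pre-open trading
post_p <- function(days_with, days_total) {
  c(shape1 = 1 + days_with, shape2 = 1 + days_total - days_with)   # Beta
}

# Posterior for a homogeneous Poisson rate (e.g. lambda_1 or lambda_2b)
# given a trade count and the total exposure time in seconds
post_rate <- function(n_trades, exposure) {
  c(shape = 1 + n_trades, rate = 1 + exposure)                     # Gamma
}
# Posterior draw: rgamma(1, shape = 1 + n_trades, rate = 1 + exposure)
```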

7. Data and results

TAQ (Trade and Quote) data from Wharton Research Data Services for four stocks, Citigroup (CITI), Procter and Gamble (PG), UGI Utilities (UGI), and Walmart (WMT), are used to estimate the models. The price process was observed from 12.00am on 3 January 2011 to 8.00pm on 23 April 2011. These four stocks were selected to represent different industries and sizes. The first 60 trading days are used for estimation. The top of Table 1 gives the total number of trades in the 60 days and the numbers for regular, pre-open, and post-close trading for the four stocks. Table 1 also presents the estimated posterior means for the parametric model for the intensity function for the four securities. The top third of the table is for the homogeneous model, while the bottom two-thirds are for the mixture model. There are significant differences among the daily effects, and the coefficients for lagged durations are negative: if the duration of the last trade was long, then the current duration will also tend to be long. In the mixture model, time sensitive trades (TST) only have the day-of-week indicators, while time insensitive trades (TIT) have a constant and lagged duration. The probability of TIT increases with the contemporaneous return and order size, except that the coefficients of order size for WMT and return for UGI are negative. TIT or algorithm-generated trades tend to have larger order sizes and reap larger returns to the detriment of TST or human-generated trades. The last row of Table 1 reports the proportion of TIT trades. CITI has the highest proportion of TIT, and WMT has the highest proportion of TST.

Table 1. Estimated parametric components for duration models. All of the coefficients are significant at 0.05 according to the Bayesian p-value,Footnotea except Tuesday and Wednesday for CITI in the mixture model. TST are time sensitive trades. TIT are time insensitive trades.

Figure 3 illustrates some of the major features of ultra-high frequency data for PG. Panel A plots the stock price process for 13,854 trades on 1 April 2011. The stock price can only move at trades, and it does not change on 85.5% of the trades. There are 13,829 trades during regular trading hours and 25 off-hours trades. The average duration between trades during regular hours is 1.69 s; the standard deviation is 5.11 s, and the maximum duration is 104 s. Seventy per cent of the durations are less than one second. Panel B graphs the average number of arrivals during 15 min trading intervals during regular trading hours from 3 January to 31 March. Trading is more intense at the opening and closing bells than during the midday lull from noon to 2pm. Panel C plots the histogram of returns: 85% of the trades did not have a price change, which results in the dominant spike at zero. The much smaller side lobes are an artefact of price being measured to the nearest $0.01. Returns occur between the lobes but are not visible due to the dominant bar at 0%. Engle (Citation1982) noted that sampling from a marked process can induce features that are not apparent in the non-sampled data. Panel D shows the histogram from sampling at 30 min intervals, which looks more like a t-distribution than Panel C. Panels E and F plot the autocorrelation function of the squared returns, which is a measure of stochastic volatility. Because there are one million trades, there are statistically significant autocorrelations for the raw price process in Panel E, but they are small. The sampled process in Panel F exhibits more autocorrelation due to the diurnal trading. Lag 12 for the sampled process in Panel F is 6 h, while lag 12 for the original process in Panel E is less than one minute.

Figure 3. Price process for PG. (A) Prices for 2011-04-01; (B) Average number of arrivals per 15 min; (C) Return histogram, all trades; (D) Return histogram, 30 min sampling; (E) ACF of squared returns, all trades; (F) ACF of squared returns, 30 min sampling.

Figure 4 compares the exponential HB smoother, which attenuates high-frequency spectral coefficients, with two other models: (1) an exchangeable prior with equal prior variances for the spectral coefficients (γ = 0) and (2) cubic B-splines (De Boor Citation1978; Eubank Citation1988; Eilers and Marx Citation1996) with an exchangeable prior on the coefficients in place of the cosine series. The three models are estimated on 91,122 trades of UGI over 60 trading days. Panels A and C plot A(r) for the lagged percent returns, and Panels B and D plot B(td) for the time of day. Panels A and B plot the cosine series models, and Panels C and D plot the B-splines with two choices of knots. The spectral expansions use 30 cosine functions, and the spline models use (1) knots at the 1% percentile points for both functions and (2) customised knots for A(r) and knots at 12 min intervals for B(td). The B-splines were computed with the bs() function in the R package splines.

Figure 4. Cosine series and B-spline for duration model of UGI. (A) 30 term cosine series of lagged percent return. (B) 30 term cosine series of time of day. (C) B-splines with knots at 1 percentile points (35 internal knots) and customised (12 internal knots) for lagged per cent return. (D) B-splines with knots at 1% points (99 internal knots) and 12 min intervals (32 internal knots) for time of day.

Panels A and C of Figure 4 show that the (log) arrival rate is at its minimum when the previous return was zero, and the arrival rates increase for larger absolute returns, but this effect attenuates beyond ±1%. The exponential HB smoothing prior (solid blue line) is smoother than the exchangeable prior (dashed red line) in Panel A. Estimating A(r) with B-splines is challenging because the return data are long-tailed and clumpy: 34% of the returns are within ±0.01%, and only seven trades had absolute returns larger than 1%, while the minimum return was −3.8% and the maximum return was 4.5%. The 5% percentiles from 5% to 95% have seven unique values: −0.0318, −0.0312, −0.0305, 0, 0.0305, 0.0312, and 0.0318, and the 1% percentiles from 1% to 99% result in 35 unique values due to the clusters. The unique 1% percentiles are the internal knots for the solid blue line in Panel C, while the red dashed line used 12 customised internal knots at ±0.01, ±0.03, ±0.04, ±0.06, ±0.07, and ±1, which were chosen by trial and error to better match Panel A. The cubic splines are sensitive to the number and placement of knots, especially outside of ±1%, which highlights a qualitative difference between the cosine basis and B-splines. B-splines are a local regression method where only observations that are close to the abscissa are used to estimate the ordinate because the basis functions are zero outside of subintervals. In contrast, spectral analysis is a global regression method: the cosine basis functions are non-zero except at isolated points. The coefficients for the B-splines that are outside ±1% in Panel C are effectively estimated with very few observations, which makes the B-spline sensitive to the knots. In contrast, all of the data are used to estimate the spectral coefficients in Panel A. Because the observed durations after the seven trades outside of ±1% are zero, the plateaus in Panel A better represent the data than the cubic spline in Panel C. An alternative approach is to delete these outliers and only estimate the function between ±1%. Even though large price movements occur very infrequently relative to the number of trades (7/91,122), they are not rare relative to the number of trading days (7/60). We decided to keep the full data to model these 'black swan' events (Taleb Citation2010) and to exploit fully the information in high frequency trading.

Panels B and D of Figure 4 show the bathtub shape for intraday arrivals B(td), where there is a lull in trading around noon and a surge later in the afternoon. The two priors for the cosine series result in similar shapes, with the exchangeable prior having more high-frequency oscillations than the exponential HB smoother. The B-splines are more robust to the choice of knots in Panel D than in Panel C because the arrival data are dense during regular trading hours and do not have clusters. Unsurprisingly, the B-spline is smoother with fewer knots (dashed red line) than with more knots (solid blue line). B-splines are sensitive to the choice and locations of knots, and cross-validation methods can provide guidance on their choice. The exponential HB smoothing prior automatically determines the smoothness in spectral analysis.

Figure 5 displays the estimated functions A(r) for the lagged return in the trading intensities of Equations (7) and (11). The panels are for the four stocks, and each panel plots the functions for the homogeneous and mixture models: only the intensity function for TIT includes A(r). The centre lines are the posterior means, and the bounds, which are very narrow, are 95% highest posterior density (HPD) intervals. The plots indicate that the functions vary across assets. Further studies are needed to determine the extent to which these differences are systematic or random. For instance, they may depend on the industry and the stage of the investment/economic cycle. The estimated functions for the homogeneous and mixture models are similar, with the mixture model tending to have taller peaks and lower valleys, except the valley for WMT.

Figure 5. Nonparametric estimates of lagged return in duration intensity function. Posterior means and 2.5 and 97.5 posterior percentiles.

Figure 6 plots the functions B(td) for time of day in the intensity function for arrivals in Equations (7) and (11) during trading hours. In the mixture model, only TST includes B(td). These are consistent with bursts of activity after the opening bell and before the close, with a minimum around the lunch hour.

Figure 6. Nonparametric estimates of time of day in duration intensity function. Posterior means and 2.5 and 97.5 posterior percentiles.

Figure 7 plots the densities for the return distributions. The 'prior' density g0 in Equation (10) is Cauchy with mode zero, so the tails of the estimated return densities are similar to those of the Cauchy density. The x-axes are truncated at ±0.15% in the plots. The densities are nearly symmetric and leptokurtic.

Figure 7. Return distributions. Posterior means and 2.5% and 97.5% percentiles.

Table 2 displays the estimated model for off-hours trading. UGI has the least amount of off-hour trading, and CITI has the most. The probability that a day has pre-open trading ranges from 0.05 for UGI to 0.99 for CITI. The lag between the 4am open and the first trade ranges from 3 to 5 h. In post-close trading, the expected number of trades (the Poisson parameter) ranges from 0.2 for UGI to 369 for CITI. The Poisson rate of trading ranges from 0.0002 for UGI to 0.1035 for CITI in pre-open trading, and from 0.0011 for WMT to 0.0257 for CITI in post-close trading.

Table 2. Estimated off-hour trading models.

Next, we illustrate simulating the price process by using the first 60 days of data for UGI to estimate the model and the next 20 days as a hold-out sample. The simulations use a different draw of the parameters from the posterior distribution, as generated by the MCMC algorithm, for each simulated price process. We simulated 5000 CIPP paths over the 20 days after the training period. Figure 8 summarises the simulation. Panel A plots the distribution of the simulated price history at the daily close for 20 days and the observed series, and Panel B summarises the price distribution in 15 min increments for the first day in the holdout sample, day 61. The coloured bands are percentiles in 5% increments, while the minimum and maximum areas represent 95% prediction intervals. The long dashed line is the mean for geometric Brownian motion, and the dotted lines are the 95% prediction intervals for geometric Brownian motion. The simulated price processes from CIPP and Brownian motion are nearly the same over 20 days in Panel A, but they diverge in Panel B because CIPP better reflects the micro-structure of high-frequency data. For UGI, it appears that daily sampling of CIPP at the daily close can be closely approximated by geometric Brownian motion, while sampling at 15 min intervals shows greater deviation.

Figure 8. Simulated CIPP in validation sample and option prices for UGI. The colour bands are intervals that contain 5% of the simulated values from 5% to 95%. The outermost limits are 2.5% and 97.5%. (A) Simulated price process in validation sample; (B) Simulated price process in day 61; (C) Option prices for simulated CIPP; (D) Ratio of CIPP to Black-Scholes option prices.

Panels C and D in Figure 8 use the simulated CIPP to price call options and compare them to Black-Scholes prices for a European call option (Hull Citation2009). We assume that the discount rate is 5% and that the strike price KD for a duration of D days is:
\[
K_D = S_{60,C} \exp\!\left(\mu D + \frac{\sigma^2 D}{2} + 0.1\,\sigma\sqrt{D}\right),
\]
where µ and σ are estimated from the returns in 15 min gaps, and S60,C is the closing price on the last day of the calibration period: strike prices increase with duration. Panel C gives the distribution of option prices based on the simulated CIPP (solid line) along with the Black-Scholes price (dashed line). The mean option price from CIPP is slightly less than Black-Scholes. Panel D plots the ratio of the CIPP option price distribution to the Black-Scholes price, where the mean of the ratio is the black line. Overall, option prices using the CIPP and geometric Brownian motion are similar for durations of 1 to 20 days when computed at the daily close, and they have greater differences within a day.
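An R sketch of the Monte Carlo pricing step, assuming a matrix of simulated daily closes and the strike rule above; the discounting convention (252 trading days per year) and the example inputs are our assumptions.

```r
# Price a European call of duration D days from simulated CIPP paths;
# 'paths' is an n_sim x D matrix of simulated daily closing prices.
price_call <- function(paths, D, S0, mu, sigma, rate = 0.05, dpy = 252) {
  K      <- S0 * exp(mu * D + sigma^2 * D / 2 + 0.1 * sigma * sqrt(D))
  payoff <- pmax(paths[, D] - K, 0)       # call payoff at expiry
  exp(-rate * D / dpy) * mean(payoff)     # discounted expected payoff
}
# e.g. price_call(paths, D = 10, S0 = 31.5, mu = 0, sigma = 0.01)
```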

8. Conclusion

We propose a semi-parametric Bayesian analysis for compound, inhomogeneous Poisson processes and apply it to ultra-high frequency trading data. Trades arrive according to an inhomogeneous Poisson process. The intensity function is a flexible function of the time of day during regular trading hours and of the lagged return from the previous trade, both modelled with Gaussian process priors. Time-of-day effects have a bathtub shape with frequent arrivals at the open and close of regular trading and a lull around midday. Arrival rates tend to increase after large changes in prices and to decrease after small changes; however, the behaviour depends on the stock, and a more detailed study is required. The parametric model for the intensity function includes day-of-week effects and the lagged duration times, which have carryover effects: long durations tend to follow long durations, and short durations follow short ones. We model the return distribution with a logistic-Gaussian process constrained to be unimodal. The estimated density is asymmetric and leptokurtic with a mode at 0.
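As a concrete illustration of the density construction only (not the paper's estimator, which places a Gaussian process prior on the log-density and imposes the unimodality constraint), a logistic-Gaussian density exponentiates a smooth function and normalises it. A minimal sketch, with a placeholder log-density peaked at zero:

```python
import numpy as np

# Grid of returns and a placeholder smooth log-density g with a single
# mode at 0; the asymmetric tent yields an asymmetric, leptokurtic shape.
y = np.linspace(-0.01, 0.01, 401)
g = np.where(y >= 0.0, -700.0 * y, 900.0 * y)

# Logistic-Gaussian construction: density proportional to exp(g),
# normalised numerically so that it integrates to one on the grid.
dens = np.exp(g)
dens /= dens.sum() * (y[1] - y[0])
```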

Spectral representations of the Gaussian processes linearise the analysis, simplify the computations, and facilitate shape constraints for the return density. The spectral coefficients have exponential HB smoothing priors with hyper-parameters that control the tradeoff between the prior means and the data and that control the smoothness of the functions by attenuating high-frequency oscillations due to noise. These smoothing parameters are estimated in a fully Bayesian analysis that avoids cross-validation. We use a cosine basis, but researchers could also use other basis functions such as B-splines or Hermite polynomials; a sketch of the construction follows this paragraph. We also propose a mixture model where the specification of the intensity function for the two components helps to classify observations into time sensitive and time insensitive trades. We conceptualise the first group as trades generated by human investors (time sensitive trades) and the second group as software-generated trades (time insensitive trades). The probability of a time insensitive trade is a logistic function of the contemporaneous return and the order size of the trade. In the empirical example, time insensitive trades tend to have larger order sizes and returns than time sensitive trades, though further research is required to verify the result. We also compare simulations of the estimated process to geometric Brownian motion. Over longer time periods (20 days), the two are very close, while they deviate over shorter periods within the day. The model could be extended to have different diurnal functions for the days of the week or a hierarchical Bayes model with separate functions for each day.
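A minimal sketch of the spectral construction, with an assumed number of basis terms and decay rate: a function on [0, 1] is expanded in an orthonormal cosine basis, and the prior standard deviations of the coefficients decay exponentially in frequency, so draws from the prior are smooth because high-frequency coefficients are shrunk toward zero.

```python
import numpy as np

rng = np.random.default_rng(3)

J, tau, gamma = 30, 1.0, 0.3   # basis terms, scale, decay rate (illustrative)
t = np.linspace(0.0, 1.0, 200)

# Orthonormal cosine basis on [0, 1]: phi_j(t) = sqrt(2) * cos(pi * j * t).
j = np.arange(1, J + 1)
Phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(t, j))

# Exponential smoothing prior: sd_j = tau * exp(-gamma * j), attenuating
# high-frequency oscillations that would otherwise track noise.
sd = tau * np.exp(-gamma * j)
theta = rng.normal(0.0, sd)    # one draw of the spectral coefficients
f = Phi @ theta                # one smooth random function on the grid
```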

The semi-parametric CIPP methodology can be used for other research purposes. For example, the intensity function could include indicator variables for event analysis: what is the impact on the arrival of trades before and after unexpected earnings announcements, changes in interest rates, or major news events? Increased activity before the event may indicate insider trading. Trades from high-frequency trading firms could be flagged and their impact on the price discovery process ascertained. The proposed mixture model could be modified to include more components or to investigate other structural classification issues. The purpose of this paper was to develop the methodology for semi-parametric CIPP; a more complete empirical investigation of the price discovery process would require a larger number of stocks and time periods.

Disclosure statement

No potential conflict of interest was reported by the authors.

Notes

1 The rounding assumption does not seem to make a material difference in the results.

2 λ(t) = exp(–6 + 2cos[2πt/s]), where t is the cumulative number of seconds and s = 20 × 10 × 3600 (20 days with 10 h of regular trading per day and 3600 s in an hour).
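For reference, arrivals from this intensity can be simulated by thinning (Lewis–Shedler): generate candidate arrivals from a homogeneous Poisson process at the maximum rate and accept each with probability λ(t) divided by that rate. A minimal sketch, with an illustrative one-day horizon:

```python
import numpy as np

rng = np.random.default_rng(4)

s = 20 * 10 * 3600                 # seconds in 20 ten-hour trading days
lam = lambda t: np.exp(-6.0 + 2.0 * np.cos(2.0 * np.pi * t / s))
lam_max = np.exp(-6.0 + 2.0)       # upper bound on lam(t) since cos <= 1

def thin(horizon):
    """Accept candidate homogeneous arrivals with prob lam(t)/lam_max."""
    t, arrivals = 0.0, []
    while True:
        t += rng.exponential(1.0 / lam_max)   # next candidate arrival
        if t > horizon:
            return np.array(arrivals)
        if rng.random() < lam(t) / lam_max:
            arrivals.append(t)

print(len(thin(10 * 3600)))        # arrivals in one 10-hour trading day
```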

3 We assume that the prior distributions for the coefficients of the B-spline are independent with common standard deviation τ. Technically, B-splines are not an orthonormal basis, so the Karhunen-Loève representation theorem does not hold; however, the resulting B-spline function has the covariance kernel in Panel B of Figure .

4 Some of the spectral coefficients did not pass CODA’s diagnostics, which may be due to applying single parameter tests simultaneously to a large number of parameters. Spectral coefficients near zero were flagged more often; however, they had little impact on the estimated functions.

5 Theoretically, the normal distribution is truncated between the open of pre-open trading and the open of regular trading. In practice, the first trades tend to cluster around 8 am, which is sufficiently far from the endpoints that the truncation is not an issue. If it were an issue, one could use a truncated normal distribution.

References

  • Andersen, T.G., Benzoni, L., and Lund, J. (2002), ‘An Empirical Investigation of Continuous-Time Equity Return Models’, The Journal of Finance, 57, 3, 1239–1284.
  • Andersen, T.G., Bollerslev, T., Diebold, F.X., and Labys, P. (2003), ‘Modeling and Forecasting Realized Volatility’, Econometrica, 71, 2, 579–625.
  • Bachelier, L. (1900), ‘Theory of Speculation’, Annales Scientifiques de l’É.N.S., 17, 21–86.
  • Barndorff-Nielsen, O.E., and Shephard, N. (2001), ‘Non-Gaussian Ornstein-Uhlenbeck-Based Models and Some of Their Uses in Financial Economics’, Journal of the Royal Statistical Society, Series B, 63, 2, 167–241.
  • Barndorff-Nielsen, O.E., and Shephard, N. (2006), ‘Impact of Jumps on Returns and Realized Variances: Econometric Analysis of Time-Deformed Lévy Processes’, Journal of Econometrics, 131, 217–252.
  • Basu, S. (1977), ‘Investment Performance of Common-Stocks in Relation to Their Price-Earning Ratios – Test of Efficient Market Hypothesis’, Journal of Finance, 32, 3, 664–692.
  • Black, F., and Scholes, M. (1973), ‘The Pricing of Options and Corporate Liabilities’, Journal of Political Economy, 81, 3, 637–654.
  • Bollerslev, T. (1986), ‘Generalized Autoregressive Conditional Heteroskedasticity’, Journal of Econometrics, 31, 307–327.
  • Bollerslev, T. (1987), ‘A Conditional Heteroskedastic Time Series Model for Speculative Prices and Rates of Returns’, The Review of Economics and Statistics, 69, 3, 543–547.
  • Bollerslev, T., Chou, R.Y., and Kroner, K.F. (1992), ‘ARCH Modeling in Finance: A Review of the Theory and Empirical Evidence’, Journal of Econometrics, 52, 1–2, 5–59.
  • Campbell, J.Y., Lo, A., and Mackinlay, A.C. (1997), The Econometrics of Financial Markets, Princeton, NJ: Princeton University Press.
  • Çinlar, E. (1975), Introduction to Stochastic Processes, Upper Saddle River, NJ: Prentice Hall.
  • Clark, C.L. (2010), ‘Controlling Risk in a Lightning-Speed Trading Environment’, Chicago Fed Letter, The Federal Reserve Bank of Chicago, March, 272, www.chicagofed.org.
  • Cont, R. (2001), ‘Empirical Properties of Asset Returns: Stylized Facts and Statistical Issues’, Quantitative Finance, 1, 2, 223–236.
  • Damien, P., Wakefield, J., and Walker, S. (1999), ‘Gibbs Sampling for Bayesian Non-Conjugate and Hierarchical Models by Using Auxiliary Variables’, Journal of the Royal Statistical Society, Series B, 61, 2, 331–344.
  • De Boor, C. (1978), A Practical Guide to Splines, Berlin: Springer.
  • Durbin, J., and Koopman, S.J. (2000), ‘Time Series Analysis of Non-Gaussian Observations Based on State Space Models from Both Classical and Bayesian Perspectives’, Journal of the Royal Statistical Society, Series B, 62, 1, 3–56.
  • Easley, D., Kiefer, N.M., and O’Hara, M. (1997), ‘One Day in the Life of a Very Common Stock’, The Review of Financial Studies, 10, 3, 805–835.
  • Easley, D., and O'Hara, M. (1992), ‘Time and the Process of Security Price Adjustment’, Journal of Finance, 47, 2, 577–604.
  • Eilers, P.H.C., and Marx, B.D. (1996), ‘Flexible Smoothing with B-Splines and Penalties’ with Discussion, Statistical Science, 11, 2, 89–121.
  • Elerian, O., Chib, S., and Shephard, N. (2001), ‘Likelihood Inference for Discretely Observed Nonlinear Diffusions’, Econometrica, 69, 4, 959–993.
  • Engle, F.R. (1982), ‘Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of UK Inflation’, Econometrica, 50, 4, 987–1008.
  • Engle, R.F. (2000), ‘The Econometrics of Ultra-High-Frequency Data’, Econometrica, 68, 1, 1–22.
  • Engle, F.R., and Russell, J.R. (1998), ‘Autoregressive Conditional Duration: A New Model for Irregularly Spaced Transaction Data’, Econometrica, 66, 4, 1127–1162.
  • Eubank, R.L. (1988), Spline Smoothing and Nonparametric Regression, New York, NY: Marcel Dekker.
  • Fama, E.F. (1970), ‘Efficient Capital Markets: A Review of Theory and Empirical Work’, Journal of Finance, 25, 2, 383–417.
  • Firth, M. (1975), ‘Efficient Market Theory of Share Price Behavior’, Managerial Finance, 1, 3, 184–188.
  • Fodra, P., and Pham, H. (2013), ‘High Frequency Trading in a Markov Renewal Model’, hal-00867113, www.hal.science.
  • Ghysels, E., Gouriéroux, C., and Jasiak, J. (2004), ‘Stochastic Volatility Duration Models’, Journal of Econometrics, 119, 2, 412–433.
  • Griffin, J.E., and Steel, M.F.J. (2006), ‘Inference with Non-Gaussian Ornstein–Uhlenbeck Processes for Stochastic Volatility’, Journal of Econometrics, 134, 2, 605–644.
  • Haario, H., Saksman, E. and Tamminen, J. (2001), ‘An Adaptive Metropolis Algorithm’, Bernoulli, 7, 2, 223–242.
  • Hautsch, N. (2012), Econometrics of Financial High-Frequency Data, Berlin: Springer.
  • Hirshleifer, D. (2015), ‘Behavioral Finance’, Annual Review of Financial Economics, 7, 1, 133–159.
  • Hull, J.C. (2009), Options, Futures, & Other Derivatives (4th ed.), Upper Saddle River, NJ: Prentice Hall.
  • Katznelson, Y. (2004), An Introduction to Harmonic Analysis (3rd ed.), Cambridge, UK: Cambridge University Press.
  • Kiem, D., and Madhavan, A. (1996), ‘The Upstairs Market for Large-Block Transactions: Analysis and Measurement of Price Effects’, Review of Financial Studies, 9, 1, 1–36.
  • Kreider, D.L., Kuller, R.G., Ostberg, D.R., and Perkins, F.W. (1966), An Introduction to Linear Analysis, Boston, MA: Addison-Wesley.
  • Lenk, P.J. (1993), ‘Bayesian Nonparametric Density Estimation’, Journal of Nonparametric Statistics, 3, 1, 53–69.
  • Lenk, P.J. (1999), ‘Bayesian Inference for Semi-Parametric Regression Using a Fourier Representation’, Journal of the Royal Statistical Society, Series B, 61, 4, 863–879.
  • Lenk, P.J. (2003), ‘Bayesian Semi-Parametric Density Estimation and Model Verification Using Logistic-Gaussian Processes’, Journal of Computational and Graphical Statistics, 12, 3, 548–565.
  • Lenk, P.J., and Choi, T. (2017), ‘Bayesian Analysis of Shape-Restricted Functions Using Gaussian Process Priors’, Statistica Sinica, 27, 1, 43–69.
  • Merton, R.C. (1976), ‘Option Pricing When Underlying Stock Returns are Discontinuous’, Journal of Financial Economics, 3, 1–2, 125–144.
  • O’Hara, M. (2015), ‘High Frequency Market Microstructure’, Journal of Financial Economics, 116, 2, 257–270.
  • Press, S.J. (1967), ‘A Compound Events Model for Security Prices’, Journal of Business, 40, 3, 317–335.
  • Press, S.J. (1968), ‘A Modified Compound Poisson Process with Normal Compounding’, Journal of the American Statistical Association, 63, 322, 607–613.
  • Ramiah, V., Xu, X., and Moosa, I.A. (2015), ‘Neoclassical Finance, Behavioral Finance and Noise Traders: A Review and Assessment of the Literature’, International Review of Financial Analysis, 41, 89–100.
  • Ramsay, J.O. (1998), ‘Estimating Smooth Monotone Functions’, Journal of the Royal Statistical Society, Series B, 60, 2, 365–375.
  • Roberts, G.O., and Rosenthal, J.S. (2009), ‘Examples of Adaptive MCMC’, Journal of Computational and Graphical Statistics, 18, 2, 349–367.
  • Roll, R. (1984), ‘A Simple Implicit Measure of the Effective Bid-Ask Spread in an Efficient Market’, Journal of Finance, 39, 4, 1127–1139.
  • Russell, J.R., and Engle, R.F. (2006), ‘A Discrete-State Continuous-Time Model of Financial Transaction Prices and Times’, Journal of Business and Economic Statistics, 23, 2, 166–180.
  • Russell, J.R., and Engle, R.F. (2010), ‘Analysis of High-Frequency Data’, in Handbook of Financial Econometrics: Tools and Techniques, eds. Y. Ait-Sahalia and L.P. Hansen, Amsterdam: North-Holland, pp. 383–426.
  • Silverman, B.W. (1985), ‘Some Aspects of the Spline Smoothing Approach to Nonparametric Regression Curve Fitting’ with Discussion, Journal of the Royal Statistical Society, Series B, 47, 1, 1–21.
  • Taleb, N.N. (2010), The Black Swan: The Impact of the Highly Improbable, New York, NY: Random House.
  • Titterington, D.M., Smith, A.F.M., and Makov, U.E. (1987), Statistical Analysis of Finite Mixture Distributions, Hoboken, NJ: John Wiley & Sons.
  • Wahba, G. (1978), ‘Improper Priors, Spline Smoothing and the Problem of Guarding Against Model Errors in Regression’, Journal of the Royal Statistical Society, Series B, 40, 3, 364–372.
  • Whittaker, E.T. (1923), ‘On a New Method of Graduation’, Proceedings of the Edinburgh Mathematical Society, 41, 63–75.
  • Wood, R.A., McInish, T.H., and Ord, J.K. (1985), ‘An Investigation of Transactions Data for NYSE Stocks’, The Journal of Finance, 40, 3, 723–739.
  • Ying, Q.W., Yousaf, T., ul Ain, Q., Akhtar, Y., and Rasheed, M.S. (2019), ‘Stock Investment and Excess Returns: A Critical Review in the Light of the Efficient Market Hypothesis’, Journal of Risk and Financial Management, 12, 2, 97, www.mdpi.com.