Stochastic Volatility with Correlated Jumps (SVCJ) Model
Model Formulation
Overview: The SVCJ model extends the standard stochastic volatility framework (e.g. Heston-type models) by incorporating jumps in both the asset return and its volatility, with those jumps occurring simultaneously (co-jumps) and potentially correlated in size. This model was first formalized in the affine jump-diffusion framework by Duffie, Pan, and Singleton (2000) and later applied/estimated in a Bayesian context by Eraker, Johannes, and Polson (2003). The inclusion of correlated jumps allows the volatility to spike upward during large negative returns (a stylized fact in crises) and improves fit to heavy-tailed return distributions. Below we detail the SVCJ dynamics, components, and assumptions.
Stochastic Differential Equations (SDEs)
In continuous time, let $Y_t = \log S_t$ denote the logarithmic price (so $dY_t$ is the asset's instantaneous log-return) and let $V_t$ denote the instantaneous variance (stochastic volatility). The SVCJ model is defined by the following system of SDEs with jumps:
Return (Log-Price) Process:
$$dY_t = \mu\,dt + \sqrt{V_t}\,dW_t^Y + \xi^Y\,dN_t$$
where $\mu$ is the drift of the log price, $dW_t^Y$ is a Brownian increment driving continuous returns volatility, and $N_t$ is a Poisson jump process. The term $\xi^Y\,dN_t$ represents jumps in returns: when a jump occurs ($dN_t = 1$), the log-price instantly moves by an amount $\xi^Y$. If no jump occurs ($dN_t = 0$), this term is zero.
Variance (Volatility) Process:
$$dV_t = \kappa(\theta - V_t)\,dt + \sigma\sqrt{V_t}\,dW_t^V + \xi^V\,dN_t$$
where $\kappa$ is the mean-reversion rate of volatility toward long-run level $\theta$, $\sigma$ is the volatility-of-volatility (scale of random fluctuations in $V_t$), and $W_t^V$ is another Brownian motion. The term $\xi^V\,dN_t$ represents jumps in volatility: at the same jump times as in the return process, the variance is instantly adjusted by $\xi^V$.
These SDEs imply that between jumps, $Y_t$ follows a diffusion with stochastic variance $V_t$, and $V_t$ itself follows a mean-reverting square-root process (similar to the Cox–Ingersoll–Ross/Heston model). When a jump occurs, both $Y_t$ and $V_t$ experience an instantaneous jump. In effect, jumps arrive according to a single Poisson process with intensity $\lambda$ (so $\mathbb{E}[dN_t] = \lambda\,dt$), and each jump event has a paired impact on returns and volatility.
Correlation Structure: The Brownian motions driving the continuous parts of returns and volatility are correlated: $\mathbb{E}[dW_t^Y\,dW_t^V] = \rho\,dt$ for a constant $\rho \in [-1,1]$. This captures the well-known leverage effect: typically $\rho < 0$ for equities, so a negative return shock ($dW^Y < 0$) tends to coincide with a positive volatility shock, pushing $V_t$ up when returns fall. In addition, the jumps occur simultaneously (perfect correlation in jump timing), and the sizes of the return and volatility jumps may be statistically correlated, as described next.
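As a concrete illustration, the two SDEs above (with correlated Brownian motions and a common jump process) can be simulated with a simple Euler–Maruyama scheme. This is a minimal sketch: the function name, all parameter values, and the positivity floor on $V_t$ are illustrative choices, not part of the model specification.

```python
import numpy as np

def simulate_svcj(T=1000, dt=1/252, mu=0.05, kappa=3.0, theta=0.04, sigma=0.3,
                  rho=-0.5, lam=0.5, mu_y=-0.03, sigma_j=0.02, mu_v=0.02,
                  rho_j=-0.5, v0=0.04, seed=0):
    """Euler-Maruyama simulation of the SVCJ SDEs (illustrative parameters)."""
    rng = np.random.default_rng(seed)
    y = np.empty(T + 1)
    v = np.empty(T + 1)
    y[0], v[0] = 0.0, v0
    for t in range(T):
        # correlated Brownian increments: corr(eps_y, eps_v) = rho
        eps_y = rng.standard_normal()
        eps_v = rho * eps_y + np.sqrt(1 - rho**2) * rng.standard_normal()
        jump = rng.random() < lam * dt                 # P(jump in dt) ~ lam*dt
        xi_v = rng.exponential(mu_v) if jump else 0.0  # vol jump: Exp(mu_v) > 0
        xi_y = rng.normal(mu_y + rho_j * xi_v, sigma_j) if jump else 0.0
        y[t + 1] = y[t] + mu * dt + np.sqrt(v[t] * dt) * eps_y + xi_y
        v[t + 1] = max(v[t] + kappa * (theta - v[t]) * dt
                       + sigma * np.sqrt(v[t] * dt) * eps_v + xi_v, 1e-10)
    return y, v
```

The `max(..., 1e-10)` floor is one common ad hoc fix for the Euler scheme occasionally driving $V_t$ negative; full-truncation or reflection schemes are alternatives.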
Model Components and Parameters
Each term in the SVCJ SDEs corresponds to a specific component of the model:
Drift ($\mu$) – The drift $\mu$ is the average rate of increase of the log-price (i.e. the average return). In practice $\mu$ is usually small for high-frequency data (often close to 0 for returns at hourly or daily frequency after de-meaning).
Diffusive Volatility ($V_t$) – The process $V_t$ governs the continuous sample-path variability of returns. It mean-reverts to level $\theta$ at speed $\kappa$. A large $\kappa$ means $V_t$ quickly moves back to $\theta$, while a small $\kappa$ means volatility is highly persistent. Parameter $\theta$ is the long-run (or average) variance level. The term $\sigma\sqrt{V_t}$ is the diffusion coefficient for volatility, so $\sigma$ controls the magnitude of random fluctuations in $V_t$ (often called the volatility-of-volatility). The $dV_t$ equation (without jumps) is a Cox–Ingersoll–Ross square-root process (mean-reverting like an Ornstein–Uhlenbeck process, but with a $\sqrt{V_t}$ diffusion coefficient that keeps $V_t \ge 0$), which is a common stochastic volatility specification.
Jump Intensity ($\lambda$) – This governs how frequently jumps occur. $\lambda$ is the Poisson intensity (expected number of jumps per unit time). For example, $\lambda = 1$ (per year) would mean on average one jump per year. For hourly data, the per-interval jump probability $\lambda\,\Delta t$ would be very small if jumps are rare. Each jump event is instantaneous but causes discrete shifts in $Y_t$ and $V_t$. The probability of a jump in a small interval $\Delta t$ is $\lambda\,\Delta t$ (assuming at most one jump per short interval).
Jump Sizes ($\xi^Y, \xi^V$) – These random variables characterize the magnitude of jumps. Importantly, the jump in returns and the jump in volatility are linked. A common assumption (following Eraker et al. 2003) is that jump sizes are drawn from a joint distribution. For instance, one convenient specification is:
$\xi^V \sim \mathrm{Exp}(\mu_V)$, an exponential distribution with mean $\mu_V>0$ for volatility jumps (ensuring volatility jumps are positive increments).
$\xi^Y \mid \xi^V \sim \mathcal{N}(\mu_Y + \rho_J\,\xi^V,\; \sigma_J^2)$, a normal distribution for the return jump, whose mean depends linearly on $\xi^V$ via a jump correlation parameter $\rho_J$. Here $\mu_Y$ can be negative (so that on average jumps cause a drop in returns), and $\rho_J$ captures the co-movement between jump sizes. For example, a negative $\rho_J$ would mean that if $\xi^V$ is larger (a big upward volatility jump), the conditional mean of $\xi^Y$ is more negative – i.e. a larger downward move in returns. This aligns with the intuition that market crashes (large negative $\xi^Y$) coincide with volatility spikes (large positive $\xi^V$).
Alternate specifications: In some formulations, both jump sizes are taken to be jointly normally distributed or jointly double-exponential, etc. The key is that there is some correlation in jump size (often implemented as above) and typically $\mathbb{E}[\xi^Y] < 0$ (downward price jumps on average) while $\mathbb{E}[\xi^V] > 0$ (upward volatility jumps) for equity markets. All jump events are i.i.d. draws from this joint distribution.
Correlation Parameters: $\rho$ (continuous shocks correlation) and $\rho_J$ (jump size correlation) are distinct. $\rho$ captures the instantaneous correlation between the Brownian motions (usually $\rho<0$ for the leverage effect). $\rho_J$ captures the correlation between jump magnitudes; typically one expects $\rho_J < 0$ for stock markets (so that a positive vol jump accompanies a negative return jump). If $\rho_J=0$, jumps in vol and returns are still simultaneous in time but their sizes are independent. If $\rho_J \neq 0$, the jump sizes are statistically linked (e.g. a larger vol jump tends to co-occur with a more extreme return jump).
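A quick numerical check of this jump-size structure: drawing $(\xi^V, \xi^Y)$ pairs from the exponential/conditional-normal specification above with a negative $\rho_J$ (all numbers illustrative) produces a clearly negative sample correlation between the two jump sizes.

```python
import numpy as np

rng = np.random.default_rng(42)
mu_v, mu_y, rho_j, sigma_j = 0.02, -0.03, -0.8, 0.01   # illustrative values
xi_v = rng.exponential(mu_v, size=100_000)             # vol jumps: Exp(mu_v) > 0
xi_y = rng.normal(mu_y + rho_j * xi_v, sigma_j)        # return jumps, linked via rho_j
corr = np.corrcoef(xi_v, xi_y)[0, 1]                   # clearly negative here
```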
Joint Distributional Assumptions
The SVCJ model makes several important assumptions about the joint distribution of the processes:
Independence of Driving Processes: The Brownian motions $(W^Y, W^V)$ and the Poisson process $N_t$ are assumed independent of each other (except for the correlations specified, i.e. $W^Y$ and $W^V$ have correlation $\rho$, and $N_t$ is independent of the Brownian motions). This means jumps occur randomly in time, unrelated to the immediate Brownian shocks (though the size of a jump in returns vs volatility is correlated via $\rho_J$ as above).
Simultaneous Jumps: Jumps in $Y_t$ and $V_t$ arrive at the same time because they are driven by the single common Poisson process $N_t$. This is the defining feature of the “correlated jumps” model – every jump affects both return and volatility. (In contrast, an alternative model could have separate Poisson processes for returns and volatility jumps, but SVCJ ties them together.) Thus, $N_t$ counts joint jumps.
Jump Size Distribution: At each jump time $\tau$ (when $dN_\tau=1$), a pair $(\xi^Y, \xi^V)$ is drawn from a fixed bivariate distribution (such as the conditional normal-exponential setup described). Across different jump events, these pairs are i.i.d. Realizations of jump sizes are independent of past history given the model parameters. Often, the distribution is chosen to be affine-friendly (e.g. normal or exponential) so that the overall model remains an affine jump-diffusion, which facilitates analytical tractability under transforms.
Markov and Affine Structure: $(Y_t, V_t)$ as a whole is a Markov process. Given the current state, the conditional distribution of future increments depends only on the current $(Y_t, V_t)$. The SVCJ model falls into the class of affine jump-diffusions, meaning the characteristic functions and moment generating functions have exponential-affine forms. This property, while technical, is useful for deriving closed-form solutions for option prices and for certain filtering tasks. (Duffie, Pan, Singleton (2000) first highlighted this class, which includes SVCJ.)
Initial Distribution: One typically specifies $V_0$ (initial volatility) either as a given value or draws it from the stationary distribution of the volatility process (which for a CIR process is Gamma-distributed). In a Bayesian context, one might even treat $V_0$ as an unknown to estimate (with its own prior). The initial log-price $Y_0$ can be set from the first observed price.
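For the stationary initialization mentioned above: the CIR stationary distribution is Gamma with shape $2\kappa\theta/\sigma^2$ and scale $\sigma^2/(2\kappa)$, whose mean is $\theta$. A minimal sketch with illustrative parameter values (chosen to satisfy the Feller condition $2\kappa\theta \ge \sigma^2$):

```python
import numpy as np

kappa, theta, sigma = 3.0, 0.04, 0.3      # illustrative; 2*kappa*theta >= sigma^2
shape = 2 * kappa * theta / sigma**2      # Gamma shape of the CIR stationary law
scale = sigma**2 / (2 * kappa)            # Gamma scale; note shape*scale = theta
rng = np.random.default_rng(1)
v0 = rng.gamma(shape, scale, size=100_000)  # sample mean should be close to theta
```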
In summary, the joint dynamics imply that conditional on no jump in an interval, returns are approximately normally distributed with variance proportional to the current volatility, while conditional on a jump, the return has an additional jump component and volatility shifts upward. Unconditionally, each period’s return is a mixture of a diffusion and a jump scenario. The presence of volatility jumps means volatility can change discontinuously, not just gradually. Eraker et al. (2003) found strong evidence that including volatility jumps significantly improves model fit, even after accounting for price jumps. This supports the idea that in market turmoil, both returns and vol move abruptly together.
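The mixture structure of per-period returns can be checked directly: simulating one-period returns as a diffusive Gaussian plus an occasional jump (illustrative numbers, with volatility held fixed for simplicity) yields positive excess kurtosis, i.e. heavier tails than a pure Gaussian.

```python
import numpy as np

rng = np.random.default_rng(7)
n, dt = 200_000, 1 / 252            # illustrative: daily intervals
mu, v, lam = 0.05, 0.04, 2.0        # volatility held fixed for simplicity
mu_y, sigma_j = -0.03, 0.02
diffusive = rng.normal(mu * dt, np.sqrt(v * dt), n)
has_jump = rng.random(n) < lam * dt                   # jump with prob. lam*dt
jumps = np.where(has_jump, rng.normal(mu_y, sigma_j, n), 0.0)
r = diffusive + jumps               # each return: diffusion, plus occasional jump
excess_kurt = ((r - r.mean()) ** 4).mean() / r.var() ** 2 - 3
```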
Bayesian Estimation via MCMC
Estimating SVCJ model parameters from data is challenging because the likelihood is complex (the model has latent volatility and rare jumps that are not directly observed). A Bayesian approach with Markov Chain Monte Carlo (MCMC) sampling is a powerful methodology to infer the posterior distribution of the parameters and the latent state (volatility path and jumps) given observed price data. Eraker, Johannes, and Polson (2003) pioneered this approach for SVCJ, developing a likelihood-based MCMC estimation strategy and providing estimates of not only the parameters but also the latent volatility, jump times, and jump sizes from actual return data. We outline the key aspects of this estimation method:
Likelihood Construction for SVCJ
The likelihood is based on the joint density of the observed returns given the parameters. However, because volatility $V_t$ and jump indicators are latent, it is easier to express the complete-data likelihood (including latent variables) and then integrate out the unobservables. Formally, if $\Theta$ denotes the set of model parameters ($\mu, \kappa, \theta, \sigma, \rho, \lambda,$ and the jump size distribution parameters), and we have observations of prices (or log-returns) at discrete times $t=1,2,\dots,T$ (e.g. hourly), the likelihood can be written as:
$$\mathcal{L}(\Theta) = p(Y_{1:T}\mid\Theta) = \int \sum_{I_{1:T}} p\big(Y_{1:T},\, V_{0:T},\, I_{1:T},\, \xi_{1:T} \,\big|\, \Theta\big)\, dV_{0:T}\, d\xi_{1:T},$$
where $V_{0:T}$ is the latent volatility path, $I_{1:T}$ are the jump indicators, and $\xi_{1:T}$ are the jump sizes.
Direct evaluation of this high-dimensional integral is infeasible. Instead, one exploits the Markov structure and calculates the likelihood incrementally. Over a single small interval $\Delta t$, the conditional distribution of the next observation $(Y_{t+\Delta}, V_{t+\Delta})$ given the current state can be described as a mixture of two cases:
No Jump in $(t, t+\Delta]$: Occurs with probability $e^{-\lambda \Delta} \approx 1 - \lambda \Delta$. In this case, the transitions are purely diffusive:
$Y_{t+\Delta}-Y_t \mid (V_t=v, \text{no jump}) \sim \mathcal{N}(\mu\,\Delta,\; v\,\Delta)$ approximately (for small $\Delta$, treating $V_t$ as roughly constant over the interval). More precisely, under the SDE, $Y$ given $V$ has a conditional Gaussian increment with variance equal to the integral of $V_s$ over the interval. If $\Delta$ is small or $V$ is slowly moving, $v\Delta$ is a good approximation.
$V_{t+\Delta}\mid(V_t=v,\text{no jump})$ is approximately Gaussian: $v + \kappa(\theta-v)\Delta + \sigma\sqrt{v\,\Delta}\,\varepsilon^v$ (with $\varepsilon^v\sim N(0,1)$), again by an Euler–Maruyama discretization. (In fact, the exact transition of the CIR process is noncentral chi-square, but MCMC often uses a simpler approximation or a hierarchical approach to handle this.)
Moreover, $(\varepsilon^y, \varepsilon^v)$ for the return and vol increments are correlated with correlation $\rho$. So in the no-jump scenario, $(\Delta Y, \Delta V)$ is bivariate normal given $V_t$.
Jump Occurs in $(t, t+\Delta]$: Occurs with probability $\lambda \Delta$ (to first order). If a jump happens, then:
$Y_{t+\Delta}-Y_t = \mu\,\Delta + \sqrt{v\,\Delta}\,\varepsilon^y + \xi^Y$ (combining the diffusive part plus jump $\xi^Y$).
$V_{t+\Delta} = v + \kappa(\theta-v)\Delta + \sigma\sqrt{v\,\Delta}\,\varepsilon^v + \xi^V$.
Here $(\varepsilon^y,\varepsilon^v)$ are again standard normals with correlation $\rho$, and $(\xi^Y,\xi^V)$ is a draw from the jump size distribution. In practice, for likelihood purposes, if a jump occurs we know a priori that there is an extra term $\xi^Y$ in the return – which makes the return distribution highly non-Gaussian (it’s a convolution of Gaussian and jump distribution). The joint density for this case is the product of the Gaussian density of the diffusive part times the density of the jump sizes, multiplied by $\lambda \Delta$.
Thus, each interval’s contribution to the likelihood is a mixture of a no-jump density and a jump density. The exact likelihood for a sequence of observations can be written by summing/integrating over all possible jump configurations (which is combinatorially large). Instead of attempting to sum over $2^T$ jump/no-jump possibilities, the Bayesian MCMC approach introduces the jump indicators as latent binary variables in the inference. This way, we avoid explicitly marginalizing out jumps and volatility; we sample them.
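A sketch of the per-interval mixture density described above, simplified for illustration by taking $\rho_J = 0$ (so the jump-case return is a Gaussian convolved with an independent Gaussian jump, hence itself Gaussian) and by using only the return increment, with the volatility treated as known:

```python
import numpy as np

def npdf(x, m, s):
    """Gaussian density with mean m and standard deviation s."""
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def interval_return_density(dy, v, dt, mu, lam, mu_y, sigma_j):
    """Mixture density of one return increment dy = Y_{t+dt} - Y_t,
    marginalizing over the jump indicator. Illustrative simplifications:
    rho_J = 0 and the volatility increment does not enter the likelihood."""
    p_jump = lam * dt                                   # first-order jump prob.
    f_nojump = npdf(dy, mu * dt, np.sqrt(v * dt))       # pure diffusion case
    # Gaussian diffusion + independent Gaussian jump => Gaussian convolution
    f_jump = npdf(dy, mu * dt + mu_y, np.sqrt(v * dt + sigma_j ** 2))
    return (1 - p_jump) * f_nojump + p_jump * f_jump
```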
In a Bayesian setup, we treat the latent volatility path $\{V_t\}_{t=0}^T$, the collection of jump counts $\{N_t\}$ (or equivalently an indicator $I_t$ of whether a jump occurred at time $t$), and the jump sizes $\{\xi^Y_t,\xi^V_t\}$ as augmented data. The complete-data likelihood factors nicely as the product over $t$ of conditional densities (since given the latent variables, each observation's density is either a Gaussian or a Gaussian plus jump). This makes it straightforward to draw samples from the conditional posterior of one part given the others.
Note: The model’s posterior is highly multi-dimensional: it involves continuous latent variables for volatility at each time and discrete latent jump variables. MCMC is well-suited to sample from such a posterior, where classical maximum likelihood would be difficult. Johannes and Polson (2003) note that carefully designed MCMC algorithms are required to deal with the latent jump process and volatility simultaneously.
Prior Specification
In Bayesian estimation we place prior distributions on all unknown parameters (and sometimes on latent states like initial $V_0$). The choice of priors can be informative or vague, depending on what is known. Typical priors in continuous-time finance models might be:
For positive parameters like $\kappa, \theta, \sigma, \lambda, \mu_V$ (vol jump mean), a common choice is Gamma or log-normal priors (to constrain them to $\mathbb{R}^+$).
For parameters that can be real like $\mu$ (drift) or $\mu_Y$ (mean jump size in returns), one might use a normal prior.
For correlation parameters $\rho$ and $\rho_J$ (bounded in [-1,1]), one might use a Beta distribution transformed to [-1,1] (e.g. an arcsine prior), or simply a uniform prior on [-1,1].
For $\sigma_J$ (the standard deviation of the return jump size distribution) or other scale parameters, half-normal or inverse-gamma priors are often used.
If one has previous empirical evidence or theoretical reasons, priors can be set to reflect that (for example, a prior that $\rho$ is strongly negative, or $\mu_Y$ is negative). In Eraker et al. (2003), priors were chosen moderately informative to aid convergence. It’s important to ensure priors are not overly restrictive, especially if a rolling window approach will reuse them (see below).
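A sketch of what such a prior specification might look like in code; the functional forms follow the bullets above, but all hyperparameter values are illustrative assumptions, not the priors used by Eraker et al.:

```python
from math import lgamma, log, inf, pi

def gamma_logpdf(x, a, b):
    """Log-density of Gamma(shape a, rate b); -inf outside the support."""
    return a * log(b) + (a - 1) * log(x) - b * x - lgamma(a) if x > 0 else -inf

def normal_logpdf(x, m, s):
    return -0.5 * ((x - m) / s) ** 2 - log(s) - 0.5 * log(2 * pi)

def log_prior(p):
    """Joint log-prior over a parameter dict; hyperparameters are illustrative."""
    lp = gamma_logpdf(p["kappa"], 2.0, 1.0)        # kappa > 0
    lp += gamma_logpdf(p["theta"], 2.0, 20.0)      # theta > 0
    lp += gamma_logpdf(p["sigma"], 2.0, 5.0)       # sigma > 0
    lp += gamma_logpdf(p["lam"], 2.0, 2.0)         # lambda > 0
    lp += normal_logpdf(p["mu"], 0.0, 1.0)         # drift, real-valued
    lp += normal_logpdf(p["mu_y"], 0.0, 0.1)       # mean return jump, real-valued
    lp += -log(2.0) if -1 < p["rho"] < 1 else -inf  # Uniform(-1, 1) on rho
    return lp
```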
In a rolling estimation scheme, the use of priors becomes dynamic: after analyzing one window of data, we obtain a posterior for $\Theta$ which can serve as the prior for the next window. This is a form of Bayesian updating. Essentially, the knowledge gained from previous data is carried forward. We discuss this in detail under Rolling Estimation, but note here that the prior at each stage can be set equal to the previous posterior (or a slight variance-adjusted version of it). This makes the estimation informative: the algorithm doesn’t “start from scratch” for each window, but rather builds on past results.
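One simple way to implement this posterior-to-prior hand-off is moment matching: summarize the previous window's MCMC draws by their mean and (inflated) standard deviation and use these as Normal prior hyperparameters for the next window. The inflation factor is an illustrative device for the variance adjustment mentioned above, keeping the carried-forward prior from being overly restrictive.

```python
import numpy as np

def posterior_to_prior(samples, inflate=1.5):
    """Moment-match MCMC draws from one window into Normal prior
    hyperparameters (mean, sd) for the next window; `inflate` widens the
    sd so the carried-forward prior is not overly restrictive."""
    samples = np.asarray(samples)
    return samples.mean(axis=0), inflate * samples.std(axis=0, ddof=1)

# stand-in for window-k posterior draws of (kappa, theta)
draws = np.random.default_rng(0).normal([3.0, 0.04], [0.5, 0.01], size=(5000, 2))
prior_mean, prior_sd = posterior_to_prior(draws)
```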
Posterior Sampling via MCMC
Given the priors and the complete-data likelihood, we form the posterior $p(\Theta, V_{0:T}, I_{1:T}, \xi_{1:T} \mid Y_{1:T})$. Up to proportionality:
$$p(\Theta, V_{0:T}, I_{1:T}, \xi_{1:T} \mid Y_{1:T}) \;\propto\; p(Y_{1:T} \mid V_{0:T}, I_{1:T}, \xi_{1:T}, \Theta)\, p(V_{0:T}, I_{1:T}, \xi_{1:T} \mid \Theta)\, p(\Theta).$$
MCMC algorithms draw samples from this posterior by iteratively sampling from conditional distributions. A common approach is a Gibbs sampler (with some Metropolis-Hastings steps) cycling through blocks such as: (1) latent jumps, (2) latent volatilities, (3) parameters. One possible MCMC scheme is:
Sample Jump Indicators and Sizes: For each time point $t=1,\dots,T$, one can sample whether a jump occurred ($I_t = 1$ or $0$) and, if $I_t=1$, the jump sizes $\xi^Y_t, \xi^V_t$. This is done conditional on the current guess of the volatility $V_t$ and parameters. Intuitively, if an observed return $y_{t+1}-y_t$ is very large in magnitude compared to what the diffusive volatility would allow, the posterior probability of a jump at $t+1$ will be high.

Formally, one can compute the posterior odds of a jump vs no-jump at time $t$ by comparing the likelihood of the return under each scenario times the prior odds ($\lambda \Delta$ vs $1-\lambda\Delta$). This yields $P(I_t=1 \mid \text{rest})$ in closed form (in a mixture-of-Normals context, it is analogous to the filtering step in mixture models).

Given $I_t=1$, the conditional posterior for $\xi^Y_t$ and $\xi^V_t$ comes from their prior (the model's assumed jump size distribution) updated by the fact that the observed return and volatility change must equal the realized jump plus diffusion. Often, one simply draws $\xi^Y_t$ and $\xi^V_t$ from their conditional distribution given that a jump happened and given the observed $(\Delta Y,\Delta V)$ deviation from the expected diffusive move. In practice, since $\xi^Y_t$ and $\xi^V_t$ have a known joint prior (e.g. Normal conditional on the other), one can do a Gibbs step: first sample $\xi^V_t$ from its (exponential) prior weighted by how likely the observed $\Delta V$ is, then sample $\xi^Y_t$ from $N(\mu_Y+\rho_J \xi^V_t, \sigma_J^2)$ updated by the return observation. Alternatively, some implementations marginalize out $\xi$ and just decide jump yes/no, then sample $\xi$ given the jump; details vary. The end result of this step is a sampled set of jump times and sizes consistent with the data and current parameters.
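The jump-indicator draw described above can be sketched as a Bernoulli step. For a closed-form jump-case density this sketch again takes $\rho_J = 0$ and conditions only on the return increment; function and argument names are illustrative.

```python
import numpy as np

def sample_jump_indicator(dy, v, dt, mu, lam, mu_y, sigma_j, rng):
    """Draw I_t ~ Bernoulli(p), where p is the posterior jump probability
    given the return increment dy. Illustrative simplifications: rho_J = 0
    and the volatility increment is ignored."""
    def npdf(x, m, s):
        return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
    # prior odds (lam*dt vs 1 - lam*dt) times the likelihood of each scenario
    w_nojump = (1 - lam * dt) * npdf(dy, mu * dt, np.sqrt(v * dt))
    w_jump = lam * dt * npdf(dy, mu * dt + mu_y, np.sqrt(v * dt + sigma_j ** 2))
    p_jump = w_jump / (w_nojump + w_jump)
    return rng.random() < p_jump, p_jump
```

As expected, a return far out in the left tail of the diffusive distribution gets a posterior jump probability near one, while a typical return gets one near zero.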
Sample Latent Volatility Path: Next, treat the sequence $V_1,\dots,V_T$ as unknowns to sample (sometimes also including $V_0$). Given the observed data and the newly sampled jumps, the return equation no longer has outliers – the jump part of each return has been accounted for, so the remaining part of return is (approximately) $N(\mu\Delta, V_t \Delta)$. Meanwhile, the volatility process itself satisfies a discretized state equation with possibly a jump increment at the time if $I_t=1$. The conditional posterior $p(V_{0:T}\mid \text{rest})$ is multivariate because adjacent $V_t$’s are coupled by the Markov process dynamics. This is usually the hardest step computationally, since $V_t$ are continuous and high-dimensional. Various strategies exist:
One can sample each $V_t$ one-by-one from its conditional density given $V_{t-1}$, $V_{t+1}$ and other parameters. Because the diffusion has Markov structure, $V_t$ given $V_{t-1}, V_{t+1}, \Theta$ (and any jump at $t$) has a tractable density (e.g. approximately Gaussian under an Euler discretization).