
Validating Trading Backtests with Surrogate Time-Series

Backtesting trading strategies is a dangerous business, because there is a high risk that you keep tweaking your trading strategy model to make the backtest results look better, only to find later that the tweaks have actually worsened live performance. The reason is that you've been overfitting your trading model to your backtest data through selection bias.

In this post we use two techniques that help quantify and monitor the statistical significance of backtest results and of strategy tweaking:

  1. First, we analyze the performance of backtest results by comparing them against random trading strategies that have similar trading characteristics (time period, number of trades, long/short ratio). This quantifies specifically how “special” the timing of the trading strategy is while keeping all else equal (the trends, volatility, return distribution, and patterns in the traded asset).
  2. Second, we analyze the impact and cost of tweaking strategies by comparing it against doing the same thing with random strategies. This shows whether an improvement is significant, or simply what one would expect when picking the best strategy from a set of multiple variants. A minimal sketch of both ideas follows this list.
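The sketch below is a minimal illustration of both ideas, not the exact procedure from the post: it builds random strategies by shuffling the strategy's daily position series, which keeps the long/short ratio and total exposure on the same asset returns while randomizing the timing (matching the exact trade count as well would need a more careful surrogate construction). The function names and the permutation-based surrogate are illustrative assumptions.

import numpy as np

def timing_pvalue(returns, positions, n_surrogates=10000, seed=0):
    # returns:   array of per-period asset returns
    # positions: daily position series (+1 long, -1 short, 0 flat)
    # Shuffling the positions preserves the long/short ratio and the
    # total exposure; only the timing is randomized.
    rng = np.random.default_rng(seed)
    actual = np.sum(positions * returns)
    surrogates = np.array([np.sum(rng.permutation(positions) * returns)
                           for _ in range(n_surrogates)])
    # Fraction of random strategies that do at least as well; small
    # values mean the strategy's timing itself carries information.
    return np.mean(surrogates >= actual)

def selection_baseline(returns, positions, n_variants, n_trials=1000, seed=1):
    # Expected total return of the best of n_variants random strategies:
    # the improvement that pure selection among variants produces.
    rng = np.random.default_rng(seed)
    best = [max(np.sum(rng.permutation(positions) * returns)
                for _ in range(n_variants))
            for _ in range(n_trials)]
    return np.mean(best)

If the gain from tweaking is no larger than what selection_baseline reports for the number of variants actually tried, the “improvement” is most likely selection bias rather than genuine skill.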

Gaussian Mixture Approximation for the Laplace Distribution

The Laplace distribution is an interesting alternative building block to the Gaussian distribution because it has much fatter tails. A drawback is that the nice analytical properties the Gaussian gives you don't easily translate to the Laplace distribution. In those cases, it can be handy to approximate the Laplace distribution with a mixture of Gaussians. This works because the Laplace distribution is a scale mixture of Gaussians whose variance is exponentially distributed; discretizing that mixing distribution at the n bin-center quantiles gives the following approximation:

    \[L(x) = \frac{1}{2}e^{-|x|} \approx \frac{1}{n} \sum_{i=1}^n N\left(x \,\middle|\, \mu=0, \sigma^2=-2\ln \frac{2i-1}{2n}\right)\]

import numpy as np
from scipy.stats import norm

def laplacian_gmm(n=4):
    # all n components have the same weight 1/n
    weights = np.repeat(1.0/n, n)

    # centers of the n equal-width bins in the interval [0,1]:
    # (2i-1)/(2n) for i = 1..n
    uniform = np.arange(0.5/n, 1.0, 1.0/n)

    # Uniform- to Exponential-distribution (inverse-CDF) transform:
    # the variances -2*ln(u) discretize the exponential mixing
    # distribution of the Laplace's Gaussian scale mixture
    sigmas = np.sqrt(-2.0*np.log(uniform))
    return weights, sigmas

def laplacian_gmm_pdf(x, n=4):
    # evaluate the n-component Gaussian mixture approximation
    # of the standard Laplace pdf at the points x
    weights, sigmas = laplacian_gmm(n)
    p = np.zeros_like(x)
    for i in range(n):
        p += weights[i] * norm(loc=0, scale=sigmas[i]).pdf(x)
    return p
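
As a quick sanity check (a sketch assuming the imports above), one can compare the mixture against the exact Laplace density on a grid. The mixture is smooth, so it cannot reproduce the kink of the Laplace density at zero, nor its exact exponential tails, but the error shrinks as n grows:

x = np.linspace(-6.0, 6.0, 1201)
exact = 0.5 * np.exp(-np.abs(x))
for n in (2, 4, 8):
    # maximum absolute error of the n-component approximation
    print(n, np.max(np.abs(laplacian_gmm_pdf(x, n) - exact)))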