Backtesting Performance and Out-of-Core Memory Execution

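There have been two recent https://redit.com/r/algotrading threads which are the inspiration for this article.

  • A thread with a bogus claim that backtrader cannot cope with 1.6M candles: reddit/r/algotrading – A performant backtesting system?
  • And another one asking for something which can backtest a universe of 8000 stocks: reddit/r/algotrading – Backtesting libs that support 1000+ stocks?

    With the author asking about a framework that can backtest "out-of-core/memory", "because obviously it cannot load all this data into memory"

We will, of course, address these concepts with backtrader.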

The 2M Candles

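In order to do this, the first thing is to generate that amount of candles. Given that the first poster talks about 77 stocks and 1.6M candles, this amounts to 20,779 candles per stock, so we'll do the following to get nice round numbers:

  • Generate candles for 100 stocks
  • Generate 20,000 candles per stock

I.e.: 100 files totaling 2M candles.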

The script

import numpy as np
import pandas as pd

COLUMNS = ["open", "high", "low", "close", "volume", "openinterest"]
CANDLES = 20000
STOCKS = 100

dateindex = pd.date_range(start="2010-01-01", periods=CANDLES, freq="15min")

for i in range(STOCKS):

    data = np.random.randint(10, 20, size=(CANDLES, len(COLUMNS)))
    df = pd.DataFrame(data * 1.01, dateindex, columns=COLUMNS)
    df = df.rename_axis("datetime")
    df.to_csv("candles{:02d}.csv".format(i))

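This generates 100 files named candles00.csv … up to candles99.csv. The actual values are not important. Having the usual datetime and OHLCV components (and OpenInterest) is what matters.

The test system

  • Hardware / OS: a Windows 10 15.6″ laptop with an Intel i7 and 32 Gbytes of memory will be used.
  • Python: CPython 3.6.1 (64-bit) and pypy3 6.0.0
  • Other: applications running constantly and taking around 20% of the CPU, for example Chrome (102 processes), Edge, Word, PowerPoint and Excel.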

Execution with the default configuration

Our test script (see below for the full source code) will open these 100 files and process them using the default backtrader configuration.

$ ./two-million-candles.py
Cerebro Start Time:          2019-10-25 13:59:53.662508
Strat Init Time:             2019-10-25 14:01:32.288510
Time Loading Data Feeds:     98.63
Number of data feeds:        100
Strat Start Time:            2019-10-25 14:01:32.320509
Pre-Next Start Time:         2019-10-25 14:01:33.299509
Time Calculating Indicators: 0.98
Next Start Time:             2019-10-25 14:01:33.299509
Strat warm-up period Time:   0.00
Time to Strat Next Logic:    99.64
End Time:                    2019-10-25 14:02:46.926509
Time in Strategy Next Logic: 73.63
Total Time in Strategy:      73.63
Total Time:                  173.26
Length of data feeds:        20000

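Memory usage: a peak of 348 Mbytes was observed.

Most of the time is actually spent preloading the data (98.63 seconds). The rest is spent in the strategy, which includes going through the broker on each iteration (73.63 seconds). The total time is 173.26 seconds.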

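Depending on how you want to calculate it, the performance is:

  • 11,543 candles/second considering the entire run time
  • 27,162 candles/second considering only the time spent in the strategy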
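The two throughput figures follow directly from the log. As a minimal sketch (the helper name `candles_per_second` is made up for illustration, and the result is truncated to a whole number to match the figures quoted above):

```python
# Figures copied from the default-configuration log above.
TOTAL_CANDLES = 100 * 20_000   # 100 data feeds x 20,000 bars each
TOTAL_TIME = 173.26            # "Total Time" in seconds
STRATEGY_TIME = 73.63          # "Total Time in Strategy" in seconds


def candles_per_second(candles, seconds):
    """Return the throughput, truncated to a whole number of candles."""
    return int(candles / seconds)


whole_run = candles_per_second(TOTAL_CANDLES, TOTAL_TIME)
strategy_only = candles_per_second(TOTAL_CANDLES, STRATEGY_TIME)

print("Whole run:     {:,} candles/second".format(whole_run))
print("Strategy only: {:,} candles/second".format(strategy_only))
```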

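Bottom line: the claim in the first of the two reddit threads above, that backtrader cannot handle 1.6M candles, is FALSE.

Doing it with pypy

Since the thread claims that using pypy didn't help, let's see what happens when using it.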

$ ./two-million-candles.py
Cerebro Start Time:          2019-10-25 14:14:20.167032
Strat Init Time:             2019-10-25 14:15:18.768034
Time Loading Data Feeds:     58.60
Number of data feeds:        100
Strat Start Time:            2019-10-25 14:15:18.893032
Pre-Next Start Time:         2019-10-25 14:15:19.218031
Time Calculating Indicators: 0.32
Next Start Time:             2019-10-25 14:15:19.219032
Strat warm-up period Time:   0.00
Time to Strat Next Logic:    59.05
End Time:                    2019-10-25 14:15:29.306033
Time in Strategy Next Logic: 10.09
Total Time in Strategy:      10.09
Total Time:                  69.14
Length of data feeds:        20000

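Holy cow! The total time has gone down to 69.14 seconds from 173.26 seconds. The performance has more than doubled.

Memory usage: a peak of 269 Mbytes.

This is also an important improvement over the standard CPython interpreter.

Handling the 2M candles out of core memory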

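All of this can be improved if one considers that backtrader has several configuration options for the execution of a backtesting session, including optimizing the buffers and working only with the minimum needed set of data (ideally with buffers of size 1, which would only happen in ideal scenarios).

The option to be used is exactbars=True. From the documentation for exactbars (a parameter given to Cerebro either at instantiation or when invoking run):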

  `True` or `1`: all “lines” objects reduce memory usage to the
  automatically calculated minimum period.

  If a Simple Moving Average has a period of 30, the underlying data
  will have always a running buffer of 30 bars to allow the
  calculation of the Simple Moving Average

  * This setting will deactivate `preload` and `runonce`

  * Using this setting also deactivates **plotting**
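To make the "running buffer of 30 bars" idea concrete, here is a small standard-library sketch. It is not backtrader's implementation. The class name `RollingSMA` is made up for illustration, and it only shows how a fixed-size buffer lets a moving average run over any number of bars with constant memory.

```python
from collections import deque


class RollingSMA:
    """Simple moving average that stores only the last `period` values."""

    def __init__(self, period):
        self.period = period
        self.window = deque(maxlen=period)  # old values fall off automatically

    def update(self, value):
        """Add a bar; return the average once `period` bars are available."""
        self.window.append(value)
        if len(self.window) < self.period:
            return None  # still warming up
        return sum(self.window) / self.period


sma = RollingSMA(period=30)
last = None
for close in range(1, 20001):  # 20,000 synthetic closing prices
    last = sma.update(float(close))

print("Values held in memory:", len(sma.window))
print("Last average:", last)
```

However many bars are fed in, the buffer never holds more than `period` values, which is the property that keeps memory flat in the runs below.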

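For the sake of maximum optimization, and because plotting will be disabled, the following will also be used: stdstats=False, which disables the standard Observers for cash, value and trades (useful for plotting, which is no longer in scope).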

$ ./two-million-candles.py --cerebro exactbars=True,stdstats=False
Cerebro Start Time:          2019-10-25 14:03:51.268508
Strat Init Time:             2019-10-25 14:03:51.284508
Time Loading Data Feeds:     0.02
Number of data feeds:        100
Strat Start Time:            2019-10-25 14:03:51.285507
Pre-Next Start Time:         2019-10-25 14:03:51.300508
Time Calculating Indicators: 0.02
Next Start Time:             2019-10-25 14:03:51.301507
Strat warm-up period Time:   0.00
Time to Strat Next Logic:    0.03
End Time:                    2019-10-25 14:06:22.988508
Time in Strategy Next Logic: 151.69
Total Time in Strategy:      151.69
Total Time:                  151.72
Length of data feeds:        20000

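Memory usage: 75 Mbytes (stable from the beginning to the end of the backtesting session)

Let's compare it to the previous non-optimized run:

  • Instead of spending over 90 seconds preloading the data, backtesting starts immediately
  • The total time is 151.72 seconds vs 173.26 seconds: an improvement of 12.4%.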
  • An improvement of 78.4% in memory usage (75 Mbytes vs 348 Mbytes).

Note

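We could actually have thrown 100M candles at the script and the amount of memory consumed would have remained fixed at 75 Mbytes.

Doing it again with pypy

Now that we know how to optimize, let's do it the pypy way.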

$ ./two-million-candles.py --cerebro exactbars=True,stdstats=False
Cerebro Start Time:          2019-10-25 14:10:11.715509
Strat Init Time:             2019-10-25 14:10:11.836510
Time Loading Data Feeds:     0.12
Number of data feeds:        100
Strat Start Time:            2019-10-25 14:10:11.841509
Pre-Next Start Time:         2019-10-25 14:10:11.912513
Time Calculating Indicators: 0.07
Next Start Time:             2019-10-25 14:10:11.913509
Strat warm-up period Time:   0.00
Time to Strat Next Logic:    0.20
End Time:                    2019-10-25 14:11:31.827509
Time in Strategy Next Logic: 79.91
Total Time in Strategy:      79.91
Total Time:                  80.11
Length of data feeds:        20000

Memory usage: constant at 49 Mbytes

Comparing it to the previous equivalent run:

  • 80.11 seconds vs 151.72 seconds: a 47.2% improvement in run time
  • 49 Mbytes vs 75 Mbytes: a 34.7% improvement.
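The percentages in these comparisons are plain relative reductions. A minimal sketch, assuming "improvement" means (before - after) / before and rounding to one decimal place (the function name `improvement` is made up for illustration):

```python
def improvement(before, after):
    """Percentage reduction when going from `before` to `after`."""
    return (before - after) / before * 100.0


# exactbars run on CPython vs the same run on pypy (figures from the logs)
runtime_gain = round(improvement(151.72, 80.11), 1)  # seconds
memory_gain = round(improvement(75, 49), 1)          # Mbytes

print("Run time improvement: {}%".format(runtime_gain))
print("Memory improvement:   {}%".format(memory_gain))
```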

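A complete run with trading

The script can create indicators (moving averages) and execute a short/long strategy on the 100 data feeds using the crossover of the moving averages. Let's do it with pypy and the optimizations.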

$ ./two-million-candles.py --cerebro exactbars=True,stdstats=False --strat indicators=True,trade=True
Cerebro Start Time:          2019-10-25 23:03:25.391254
Strat Init Time:             2019-10-25 23:03:25.498254
Time Loading Data Feeds:     0.11
Number of data feeds:        100
Total indicators:            300
Moving Average to be used:   SMA
Indicators period 1:         10
Indicators period 2:         50
Strat Start Time:            2019-10-25 23:03:26.327256
Pre-Next Start Time:         2019-10-25 23:03:26.463253
Time Calculating Indicators: 0.14
Next Start Time:             2019-10-25 23:03:27.850254
Strat warm-up period Time:   1.39
Time to Strat Next Logic:    2.46
End Time:                    2019-10-25 23:07:44.387255
Time in Strategy Next Logic: 256.54
Total Time in Strategy:      257.92
Total Time:                  259.00
Length of data feeds:        20000

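Memory usage: a peak of 1,050 Mbytes was observed.

The execution time has obviously increased (indicators + trading), but why does the memory usage increase?

Before reaching any conclusions, let's run it creating the indicators but without trading: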

$ ./two-million-candles.py --cerebro exactbars=True,stdstats=False --strat indicators=True
Cerebro Start Time:          2019-10-25 23:09:14.387256
Strat Init Time:             2019-10-25 23:09:14.487255
Time Loading Data Feeds:     0.10
Number of data feeds:        100
Total indicators:            300
Moving Average to be used:   SMA
Indicators period 1:         10
Indicators period 2:         50
Strat Start Time:            2019-10-25 23:09:15.184255
Pre-Next Start Time:         2019-10-25 23:09:15.367257
Time Calculating Indicators: 0.18
Next Start Time:             2019-10-25 23:09:16.719257
Strat warm-up period Time:   1.35
Time to Strat Next Logic:    2.33
End Time:                    2019-10-25 23:11:40.304254
Time in Strategy Next Logic: 143.58
Total Time in Strategy:      144.94
Total Time:                  145.92
Length of data feeds:        20000

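Memory usage: 58 Mbytes

With that in hand: the memory usage really increases when trading. The reason is that Order and Trade objects are created, passed around and kept by the broker.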

Note

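Take into account that the data set contains random values, which generate a huge number of crossovers and hence an enormous number of orders and trades. Similar behavior should not be expected with a regular data set.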
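The effect of random data on the number of signals can be illustrated without backtrader. The following is a rough standard-library sketch, not the test script's logic: `count_crossovers` is a made-up helper that counts sign changes of (fast SMA - slow SMA), using the same 10/50 periods, on a random series similar to the generated files and on a steadily trending series.

```python
import random
from collections import deque


def count_crossovers(prices, fast=10, slow=50):
    """Count sign changes of (fast SMA - slow SMA) over a price series."""
    fast_win, slow_win = deque(maxlen=fast), deque(maxlen=slow)
    crossings, prev_sign = 0, 0
    for price in prices:
        fast_win.append(price)
        slow_win.append(price)
        if len(slow_win) < slow:
            continue  # wait until the slow average has enough bars
        diff = sum(fast_win) / fast - sum(slow_win) / slow
        sign = (diff > 0) - (diff < 0)
        if sign != 0 and prev_sign != 0 and sign != prev_sign:
            crossings += 1
        if sign != 0:
            prev_sign = sign
    return crossings


rng = random.Random(42)  # fixed seed, for repeatability only
random_prices = [rng.randint(10, 19) * 1.01 for _ in range(20000)]
trending_prices = [100.0 + 0.01 * i for i in range(20000)]

random_count = count_crossovers(random_prices)
trending_count = count_crossovers(trending_prices)

print("Random series crossovers:  ", random_count)
print("Trending series crossovers:", trending_count)
```

On the trending series the fast average always stays above the slow one, so no crossover occurs, while the random series keeps flipping. Every flip would become an order in the test script, for each of the 100 feeds.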

Conclusions

The bogus claim

Already proven above to be bogus.

General

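  1. backtrader can easily handle 2M candles using the default configuration (with in-memory data preloading)
  2. backtrader can operate in a non-preloading, optimized mode that reduces buffers to the minimum for out-of-core-memory backtesting
  3. When backtesting in the optimized non-preloading mode, the increase in memory consumption comes from the administrative overhead which the broker generates.

Using Python and/or backtrader for these cases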

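With pypy, trading enabled, and the random data set (with a higher than usual number of trades), the entire 2M bars were processed in a total of:

  • 259.00 seconds, i.e.: 4 minutes and 19 seconds

Taking into account that this was done on a laptop running multiple other things at the same time, it can be concluded that 2M bars can be done.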

What about the 8000 stocks scenario?

Execution time would have to be scaled by 80, hence:

  • 20,800 seconds (almost 347 minutes, or 5 hours and 47 minutes) would be needed to run this scenario.
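As a back-of-the-envelope check, the estimate can be sketched as below. It rests on two assumptions that the measurements above do not verify: that run time grows linearly with the number of data feeds, and that the 259.00 second run is rounded up to 260 seconds, which is what yields the 20,800 figure.

```python
RUN_SECONDS = 260        # the 259.00 s pypy run with trading, rounded up (assumption)
SCALE = 8000 // 100      # 8,000 stocks vs the 100 data feeds used here

total_seconds = RUN_SECONDS * SCALE
hours, rem = divmod(total_seconds, 3600)
minutes, seconds = divmod(rem, 60)

print("{:,} seconds = {} h {} min {} s".format(total_seconds, hours, minutes, seconds))
```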

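Even assuming a standard data set, which would generate far fewer operations than this random one, one would still be talking about backtesting times measured in hours (3 or 4).

As such, a workflow with only backtrader as the research and backtesting tool would seem far-fetched.

Using an alternative workflow is, IMHO, possible:

  • Research with pandas / ML
  • Backtest the most promising ideas with backtrader (possibly with a reduced data set after the research phase)

Test Script

Here is the source code: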

#!/usr/bin/env python
# -*- coding: utf-8; py-indent-offset:4 -*-
###############################################################################
import argparse
import datetime

import backtrader as bt


class St(bt.Strategy):
    params = dict(
        indicators=False,
        indperiod1=10,
        indperiod2=50,
        indicator=bt.ind.SMA,
        trade=False,
    )

    def __init__(self):
        self.dtinit = datetime.datetime.now()
        print("Strat Init Time:             {}".format(self.dtinit))
        loaddata = (self.dtinit - self.env.dtcerebro).total_seconds()
        print("Time Loading Data Feeds:     {:.2f}".format(loaddata))

        print("Number of data feeds:        {}".format(len(self.datas)))
        if self.p.indicators:
            total_ind = self.p.indicators * 3 * len(self.datas)
            print("Total indicators:            {}".format(total_ind))
            indname = self.p.indicator.__name__
            print("Moving Average to be used:   {}".format(indname))
            print("Indicators period 1:         {}".format(self.p.indperiod1))
            print("Indicators period 2:         {}".format(self.p.indperiod2))

            self.macross = {}
            for d in self.datas:
                ma1 = self.p.indicator(d, period=self.p.indperiod1)
                ma2 = self.p.indicator(d, period=self.p.indperiod2)
                self.macross[d] = bt.ind.CrossOver(ma1, ma2)

    def start(self):
        self.dtstart = datetime.datetime.now()
        print("Strat Start Time:            {}".format(self.dtstart))

    def prenext(self):
        if len(self.data0) == 1:  # only 1st time
            self.dtprenext = datetime.datetime.now()
            print("Pre-Next Start Time:         {}".format(self.dtprenext))
            indcalc = (self.dtprenext - self.dtstart).total_seconds()
            print("Time Calculating Indicators: {:.2f}".format(indcalc))

    def nextstart(self):
        if len(self.data0) == 1:  # there was no prenext
            self.dtprenext = datetime.datetime.now()
            print("Pre-Next Start Time:         {}".format(self.dtprenext))
            indcalc = (self.dtprenext - self.dtstart).total_seconds()
            print("Time Calculating Indicators: {:.2f}".format(indcalc))

        self.dtnextstart = datetime.datetime.now()
        print("Next Start Time:             {}".format(self.dtnextstart))
        warmup = (self.dtnextstart - self.dtprenext).total_seconds()
        print("Strat warm-up period Time:   {:.2f}".format(warmup))
        nextstart = (self.dtnextstart - self.env.dtcerebro).total_seconds()
        print("Time to Strat Next Logic:    {:.2f}".format(nextstart))
        self.next()

    def next(self):
        if not self.p.trade:
            return

        for d, macross in self.macross.items():
            if macross > 0:
                self.order_target_size(data=d, target=1)
            elif macross < 0:
                self.order_target_size(data=d, target=-1)

    def stop(self):
        dtstop = datetime.datetime.now()
        print("End Time:                    {}".format(dtstop))
        nexttime = (dtstop - self.dtnextstart).total_seconds()
        print("Time in Strategy Next Logic: {:.2f}".format(nexttime))
        strattime = (dtstop - self.dtprenext).total_seconds()
        print("Total Time in Strategy:      {:.2f}".format(strattime))
        totaltime = (dtstop - self.env.dtcerebro).total_seconds()
        print("Total Time:                  {:.2f}".format(totaltime))
        print("Length of data feeds:        {}".format(len(self.data)))


def run(args=None):
    args = parse_args(args)

    cerebro = bt.Cerebro()

    datakwargs = dict(timeframe=bt.TimeFrame.Minutes, compression=15)
    for i in range(args.numfiles):
        dataname = "candles{:02d}.csv".format(i)
        data = bt.feeds.GenericCSVData(dataname=dataname, **datakwargs)
        cerebro.adddata(data)

    cerebro.addstrategy(St, **eval("dict(" + args.strat + ")"))
    cerebro.dtcerebro = dt0 = datetime.datetime.now()
    print("Cerebro Start Time:          {}".format(dt0))
    cerebro.run(**eval("dict(" + args.cerebro + ")"))


def parse_args(pargs=None):
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
        description=(
            "Backtrader Basic Script"
        )
    )

    parser.add_argument("--numfiles", required=False, default=100, type=int,
                        help="Number of files to read")

    parser.add_argument("--cerebro", required=False, default="",
                        metavar="kwargs", help="kwargs in key=value format")

    parser.add_argument("--strat", "--strategy", required=False, default="",
                        metavar="kwargs", help="kwargs in key=value format")


    return parser.parse_args(pargs)


if __name__ == "__main__":
    run()
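The script turns the `--cerebro` and `--strat` strings into keyword arguments with `eval`, which is convenient but executes arbitrary code. As a hedged alternative sketch (not part of the original script; `parse_kwargs` is a made-up helper), `ast.literal_eval` can parse simple `key=value` pairs whose values are Python literals:

```python
import ast


def parse_kwargs(text):
    """Turn 'a=1,b=True' into {'a': 1, 'b': True} without calling eval()."""
    kwargs = {}
    if not text:
        return kwargs
    for item in text.split(","):
        key, _, value = item.partition("=")
        # literal_eval only accepts literals: numbers, strings, True/False/None...
        kwargs[key.strip()] = ast.literal_eval(value.strip())
    return kwargs


cerebro_kwargs = parse_kwargs("exactbars=True,stdstats=False")
strat_kwargs = parse_kwargs("indicators=True,trade=True,indperiod1=20")

print(cerebro_kwargs)
print(strat_kwargs)
```

The trade-off is that non-literal values, such as `indicator=bt.ind.EMA`, or values that themselves contain commas, are not supported by this simple version, which is presumably why the original relies on `eval`.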
