On Backtesting Performance and Out of Core Memory Execution
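Two recent https://redit.com/r/algotrading threads are the inspiration for this article.
- One thread with the bogus claim that backtrader cannot cope with 1.6M candles: reddit/r/algotrading - A performant backtesting system?
- And another asking for something that can backtest a universe of 8000 stocks: reddit/r/algotrading - Backtesting libs that support 1000+ stocks?
In the second one the author asks for a framework that can backtest "out-of-core/memory", "because obviously it cannot load all this data into memory".
We will of course address these concepts with backtrader.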
The 2M Candles
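To do this, the first thing is to generate that amount of candles. Given that the first poster talks about 77 stocks and 1.6M candles, which amounts to 20,779 candles per stock, we will do the following to get nice round numbers:
- Generate candles for 100 stocks
- Generate 20,000 candles per stock
I.e.: 100 files totaling 2M candles.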
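The sizing arithmetic above can be checked in a couple of lines. This is only a sketch of the calculation; the variable names are made up for the illustration and are not part of the generation script.

```python
# Figures quoted from the reddit thread, as cited in the text
thread_candles = 1_600_000
thread_stocks = 77
per_stock = thread_candles / thread_stocks  # roughly 20,779 candles per stock

# Rounded test set used in this article
STOCKS = 100
CANDLES = 20_000
total = STOCKS * CANDLES

print("thread: {:,.0f} candles per stock".format(per_stock))
print("test set: {:,} candles in {} files".format(total, STOCKS))
```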
The script
    import numpy as np
    import pandas as pd

    COLUMNS = ["open", "high", "low", "close", "volume", "openinterest"]
    CANDLES = 20000
    STOCKS = 100

    # 15-minute bars starting on 2010-01-01
    dateindex = pd.date_range(start="2010-01-01", periods=CANDLES, freq="15min")

    for i in range(STOCKS):
        # Random integers in [10, 20) for every column; the values are irrelevant
        data = np.random.randint(10, 20, size=(CANDLES, len(COLUMNS)))
        df = pd.DataFrame(data * 1.01, dateindex, columns=COLUMNS)
        df = df.rename_axis("datetime")
        df.to_csv("candles{:02d}.csv".format(i))
This generates 100 files named candles00.csv … up to candles99.csv. The actual values are not important. Having the usual datetime and OHLCV components (and OpenInterest) is what matters.
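The test system
- Hardware / OS: a Windows 10 15.6″ laptop with an Intel i7 and 32 Gbytes of memory will be used.
- Python: CPython 3.6.1 64 bits and pypy3 6.0.0
- Other: applications running constantly and taking around 20% of the CPU, for example Chrome (102 processes), Edge, Word, PowerPoint and Excel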
Execution with the Default Configuration
Our test script (see below for the full source code) will open those 100 files and process them with the default backtrader configuration.
    $ ./two-million-candles.py
    Cerebro Start Time: 2019-10-25 13:59:53.662508
    Strat Init Time: 2019-10-25 14:01:32.288510
    Time Loading Data Feeds: 98.63
    Number of data feeds: 100
    Strat Start Time: 2019-10-25 14:01:32.320509
    Pre-Next Start Time: 2019-10-25 14:01:33.299509
    Time Calculating Indicators: 0.98
    Next Start Time: 2019-10-25 14:01:33.299509
    Strat warm-up period Time: 0.00
    Time to Strat Next Logic: 99.64
    End Time: 2019-10-25 14:02:46.926509
    Time in Strategy Next Logic: 73.63
    Total Time in Strategy: 73.63
    Total Time: 173.26
    Length of data feeds: 20000
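Memory Usage: a peak of 348 Mbytes was observed.
Most of the time is actually spent preloading the data (98.63 seconds). The rest is spent in the strategy, which includes going through the broker in each iteration (73.63 seconds). The total time is 173.26 seconds.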
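Depending on how you want to calculate it, the performance is:
- 11,543 candles/second considering the entire run time
- 27,162 candles/second considering only the time spent in the strategy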
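Both throughput figures follow directly from the timings printed by the run. A minimal sketch of the calculation (the helper name `candles_per_second` is made up for this illustration):

```python
TOTAL_CANDLES = 100 * 20000  # 100 data feeds with 20,000 candles each

total_time = 173.26     # "Total Time" from the run output, in seconds
strategy_time = 73.63   # "Time in Strategy Next Logic", in seconds


def candles_per_second(candles, seconds):
    """Plain throughput: candles processed per second of wall-clock time."""
    return candles / seconds


overall = candles_per_second(TOTAL_CANDLES, total_time)
in_strategy = candles_per_second(TOTAL_CANDLES, strategy_time)

print("whole run:     {:,.0f} candles/second".format(overall))
print("strategy only: {:,.0f} candles/second".format(in_strategy))
```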
Bottom line: the claim in the first of the two reddit threads above, that backtrader cannot handle 1.6M candles, is FALSE.
Doing it with pypy
Since the thread claims that using pypy didn't help, let's see what happens when using it.
    $ ./two-million-candles.py
    Cerebro Start Time: 2019-10-25 14:14:20.167032
    Strat Init Time: 2019-10-25 14:15:18.768034
    Time Loading Data Feeds: 58.60
    Number of data feeds: 100
    Strat Start Time: 2019-10-25 14:15:18.893032
    Pre-Next Start Time: 2019-10-25 14:15:19.218031
    Time Calculating Indicators: 0.32
    Next Start Time: 2019-10-25 14:15:19.219032
    Strat warm-up period Time: 0.00
    Time to Strat Next Logic: 59.05
    End Time: 2019-10-25 14:15:29.306033
    Time in Strategy Next Logic: 10.09
    Total Time in Strategy: 10.09
    Total Time: 69.14
    Length of data feeds: 20000
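Holy cow! The total time has gone down to 69.14 seconds from 173.26 seconds. The performance has more than doubled.
Memory Usage: a peak of 269 Mbytes was observed.
This is also an improvement over the standard CPython interpreter.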
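Handling the 2M candles out of core memory
All of this can be improved if one considers that backtrader has several configuration options for the execution of a backtesting session, including optimizing the buffers and working only with the minimum data set needed (ideally with buffers of size 1, which would only happen in ideal scenarios).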
The option to use is exactbars=True. From the documentation for exactbars (a parameter given to Cerebro either during instantiation or when invoking run):
`True` or `1`: all "lines" objects reduce memory usage to the automatically calculated minimum period.
If a Simple Moving Average has a period of 30, the underlying data will have always a running buffer of 30 bars to allow the calculation of the Simple Moving Average
* This setting will deactivate `preload` and `runonce`
* Using this setting also deactivates **plotting**
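The running buffer described in the quoted documentation can be pictured with a small stand-alone sketch. This is not backtrader's implementation. `RollingSMA` is a hypothetical class written only for this illustration, and it shows how a 30-period average can be computed while never holding more than 30 values in memory.

```python
from collections import deque


class RollingSMA:
    """Simple moving average that stores at most `period` values."""

    def __init__(self, period):
        self.window = deque(maxlen=period)  # oldest value drops out automatically
        self.total = 0.0

    def update(self, value):
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]  # value about to be evicted
        self.window.append(value)
        self.total += value
        if len(self.window) < self.window.maxlen:
            return None  # still warming up, no average yet
        return self.total / len(self.window)


sma = RollingSMA(period=30)
last = None
for close in range(1, 20001):  # stand-in for 20,000 closing prices
    last = sma.update(float(close))

print("values kept in memory:", len(sma.window))
print("last average:", last)
```

However many bars are fed in, the buffer length stays at the period, which is why memory consumption in this mode does not grow with the length of the data.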
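For the sake of maximum optimization, and because plotting will be disabled anyway, the following will also be used: stdstats=False, which disables the standard Observers for cash, value and trades (useful for plotting, which is no longer in scope).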
    $ ./two-million-candles.py --cerebro exactbars=True,stdstats=False
    Cerebro Start Time: 2019-10-25 14:03:51.268508
    Strat Init Time: 2019-10-25 14:03:51.284508
    Time Loading Data Feeds: 0.02
    Number of data feeds: 100
    Strat Start Time: 2019-10-25 14:03:51.285507
    Pre-Next Start Time: 2019-10-25 14:03:51.300508
    Time Calculating Indicators: 0.02
    Next Start Time: 2019-10-25 14:03:51.301507
    Strat warm-up period Time: 0.00
    Time to Strat Next Logic: 0.03
    End Time: 2019-10-25 14:06:22.988508
    Time in Strategy Next Logic: 151.69
    Total Time in Strategy: 151.69
    Total Time: 151.72
    Length of data feeds: 20000
Memory Usage: 75 Mbytes (stable from the beginning to the end of the backtesting session)
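The `--cerebro exactbars=True,stdstats=False` argument in the command above reaches Cerebro as keyword arguments. The full script at the end of the article does this with `eval("dict(" + args.cerebro + ")")`. The sketch below isolates that step. `parse_kwargs` is a hypothetical helper name, and, as in the script, `eval` is only acceptable because the input comes from the person running the test.

```python
def parse_kwargs(text):
    """Turn 'key=value,key=value' into a dict, mirroring the test script."""
    # eval executes arbitrary code: fine for a local benchmark, not for untrusted input
    return eval("dict(" + text + ")")


cerebro_kwargs = parse_kwargs("exactbars=True,stdstats=False")
strat_kwargs = parse_kwargs("indicators=True,trade=True")
empty_kwargs = parse_kwargs("")  # default when the option is not given

print(cerebro_kwargs)
print(strat_kwargs)
print(empty_kwargs)
```

An empty string yields an empty dict, so omitting `--cerebro` or `--strat` leaves the default configuration in place.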
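Let's compare with the previous non-optimized run:
- Instead of spending over 90 seconds preloading data, backtesting starts immediately
- The total time is 151.72 seconds vs 173.26 seconds, an improvement of 12.4%
- Memory usage is 75 Mbytes vs a peak of 348 Mbytes, an improvement of about 78%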
Note
We could actually have thrown 100M candles at the script and the amount of memory consumed would have remained fixed at 75 Mbytes.
Doing it again with pypy
Now that we know how to optimize, let's do it the pypy way.
    $ ./two-million-candles.py --cerebro exactbars=True,stdstats=False
    Cerebro Start Time: 2019-10-25 14:10:11.715509
    Strat Init Time: 2019-10-25 14:10:11.836510
    Time Loading Data Feeds: 0.12
    Number of data feeds: 100
    Strat Start Time: 2019-10-25 14:10:11.841509
    Pre-Next Start Time: 2019-10-25 14:10:11.912513
    Time Calculating Indicators: 0.07
    Next Start Time: 2019-10-25 14:10:11.913509
    Strat warm-up period Time: 0.00
    Time to Strat Next Logic: 0.20
    End Time: 2019-10-25 14:11:31.827509
    Time in Strategy Next Logic: 79.91
    Total Time in Strategy: 79.91
    Total Time: 80.11
    Length of data feeds: 20000
Memory Usage: constant at 49 Mbytes
Comparing it to the previous equivalent run:
- 80.11 seconds vs 151.72 seconds, or a 47.2% improvement in run time
- 49 Mbytes vs 75 Mbytes, or a 34.6% improvement in memory usage
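The percentages above are plain relative reductions. A short sketch of the calculation, using the figures from the two optimized runs (the function name `improvement` is made up for this illustration):

```python
def improvement(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


time_gain = improvement(151.72, 80.11)  # seconds, CPython vs pypy
memory_gain = improvement(75, 49)       # Mbytes, CPython vs pypy

print("run time: {:.1f}% better".format(time_gain))
print("memory:   {:.2f}% better".format(memory_gain))
```

The memory figure comes out just under 34.7%, which the text truncates to 34.6%.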
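A complete run with trading
The script can create indicators (moving averages) and execute a short/long strategy on the 100 data feeds using the crossover of the moving averages. Let's do it with pypy and the optimizations.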
    $ ./two-million-candles.py --cerebro exactbars=True,stdstats=False --strat indicators=True,trade=True
    Cerebro Start Time: 2019-10-25 23:03:25.391254
    Strat Init Time: 2019-10-25 23:03:25.498254
    Time Loading Data Feeds: 0.11
    Number of data feeds: 100
    Total indicators: 300
    Moving Average to be used: SMA
    Indicators period 1: 10
    Indicators period 2: 50
    Strat Start Time: 2019-10-25 23:03:26.327256
    Pre-Next Start Time: 2019-10-25 23:03:26.463253
    Time Calculating Indicators: 0.14
    Next Start Time: 2019-10-25 23:03:27.850254
    Strat warm-up period Time: 1.39
    Time to Strat Next Logic: 2.46
    End Time: 2019-10-25 23:07:44.387255
    Time in Strategy Next Logic: 256.54
    Total Time in Strategy: 257.92
    Total Time: 259.00
    Length of data feeds: 20000
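Memory Usage: a peak of 1,050 Mbytes was observed.
The execution time has obviously increased (indicators + trading), but why has the memory usage increased?
Before reaching any conclusions, let's run it creating the indicators but without trading.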
    $ ./two-million-candles.py --cerebro exactbars=True,stdstats=False --strat indicators=True
    Cerebro Start Time: 2019-10-25 23:09:14.387256
    Strat Init Time: 2019-10-25 23:09:14.487255
    Time Loading Data Feeds: 0.10
    Number of data feeds: 100
    Total indicators: 300
    Moving Average to be used: SMA
    Indicators period 1: 10
    Indicators period 2: 50
    Strat Start Time: 2019-10-25 23:09:15.184255
    Pre-Next Start Time: 2019-10-25 23:09:15.367257
    Time Calculating Indicators: 0.18
    Next Start Time: 2019-10-25 23:09:16.719257
    Strat warm-up period Time: 1.35
    Time to Strat Next Logic: 2.33
    End Time: 2019-10-25 23:11:40.304254
    Time in Strategy Next Logic: 143.58
    Total Time in Strategy: 144.94
    Total Time: 145.92
    Length of data feeds: 20000
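Memory Usage: 58 Mbytes
With that in hand: memory usage only really increases when trading. The reason is that Order and Trade objects are created, passed around and kept by the broker.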
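Note
Take into account that the data set contains random values, which generate a huge number of crossovers and hence an enormous number of orders and trades. Similar behavior should not be expected with a regular data set.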
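To get a feeling for how many crossovers random data produces, the sketch below counts how often a 10-period average crosses a 50-period average. It runs on 20,000 random prices drawn from the same range as the generated files, and on a steadily rising series for contrast. It uses plain Python rather than backtrader, so the numbers only illustrate the effect. `count_crossovers` is a hypothetical helper and the seed is arbitrary.

```python
import random


def count_crossovers(prices, fast=10, slow=50):
    """Count how often the fast SMA crosses the slow SMA (either direction)."""
    crossings = 0
    last_sign = 0
    fast_sum = slow_sum = 0
    for i, price in enumerate(prices):
        fast_sum += price
        slow_sum += price
        if i >= fast:
            fast_sum -= prices[i - fast]
        if i >= slow:
            slow_sum -= prices[i - slow]
        if i < slow - 1:
            continue  # slow average not available yet
        # Compares fast_sum / fast with slow_sum / slow without dividing
        diff = fast_sum * slow - slow_sum * fast
        sign = (diff > 0) - (diff < 0)
        if sign and last_sign and sign != last_sign:
            crossings += 1
        if sign:
            last_sign = sign
    return crossings


rng = random.Random(42)  # fixed seed so the run is repeatable
random_prices = [rng.randint(10, 19) for _ in range(20000)]
trending_prices = list(range(20000))  # steadily rising series

random_crossings = count_crossovers(random_prices)
trending_crossings = count_crossovers(trending_prices)
print("random data:   {} crossovers".format(random_crossings))
print("trending data: {} crossovers".format(trending_crossings))
```

In the test script every crossover on each of the 100 feeds leads to an `order_target_size` call, which is where the extra Order and Trade objects come from.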
Conclusions
The bogus claim
Already proven above to be bogus.
General
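- backtrader can easily handle 2M candles using the default configuration (with in-memory data preloading)
- backtrader can operate in a non-preloading optimized mode, reducing buffers to the minimum for out-of-core-memory backtesting
- When backtesting in the optimized non-preloading mode, the increase in memory consumption comes from the administrative overhead that the broker generates.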
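Using Python and/or backtrader for these cases
With pypy, trading enabled and the random data set (a higher than usual number of trades), the entire 2M bars were processed in a total of 259.00 seconds, i.e. 4 minutes and 19 seconds.
Taking into account that this was done on a laptop running multiple other things simultaneously, it can be concluded that 2M bars can be done.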
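What about the 8000 stocks scenario?
Execution time would have to be scaled by 80, hence roughly 20,800 seconds (almost 347 minutes, or 5 hours and 47 minutes) would be needed to run this random-set scenario.
Even assuming a standard data set, which would generate far fewer operations, one would still be talking about backtesting times measured in hours (3 or 4).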
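The scaling estimate is a simple proportion. The sketch below assumes, as the text does, that run time grows linearly with the number of stocks. It uses the measured 259.00 seconds directly, so it lands slightly below the rounded figure quoted above.

```python
measured_seconds = 259.00  # pypy run with indicators and trading
measured_stocks = 100
target_stocks = 8000

scale = target_stocks / measured_stocks     # 80x more data feeds
scaled_seconds = measured_seconds * scale   # assumes linear scaling

hours, rest = divmod(int(scaled_seconds), 3600)
minutes, seconds = divmod(rest, 60)

print("scale factor: {:.0f}".format(scale))
print("estimate: {:,.0f} seconds = {}h {}m {}s".format(
    scaled_seconds, hours, minutes, seconds))
```

Either way the result is the same order of magnitude: several hours for a single backtest.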
As such, a workflow with backtrader as the only research and backtesting tool would seem far-fetched.
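Using an alternative workflow is, IMHO, possible:
- Research with pandas or ML
- Backtest the most promising ideas with backtrader (possibly with a reduced data set after the research phase)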
Test Script
Here is the source code:
    #!/usr/bin/env python
    # -*- coding: utf-8; py-indent-offset:4 -*-
    ###############################################################################
    import argparse
    import datetime

    import backtrader as bt


    class St(bt.Strategy):
        params = dict(
            indicators=False,
            indperiod1=10,
            indperiod2=50,
            indicator=bt.ind.SMA,
            trade=False,
        )

        def __init__(self):
            self.dtinit = datetime.datetime.now()
            print("Strat Init Time: {}".format(self.dtinit))
            loaddata = (self.dtinit - self.env.dtcerebro).total_seconds()
            print("Time Loading Data Feeds: {:.2f}".format(loaddata))

            print("Number of data feeds: {}".format(len(self.datas)))
            if self.p.indicators:
                # 2 moving averages + 1 crossover per data feed
                total_ind = self.p.indicators * 3 * len(self.datas)
                print("Total indicators: {}".format(total_ind))
                indname = self.p.indicator.__name__
                print("Moving Average to be used: {}".format(indname))
                print("Indicators period 1: {}".format(self.p.indperiod1))
                print("Indicators period 2: {}".format(self.p.indperiod2))

                self.macross = {}
                for d in self.datas:
                    ma1 = self.p.indicator(d, period=self.p.indperiod1)
                    ma2 = self.p.indicator(d, period=self.p.indperiod2)
                    self.macross[d] = bt.ind.CrossOver(ma1, ma2)

        def start(self):
            self.dtstart = datetime.datetime.now()
            print("Strat Start Time: {}".format(self.dtstart))

        def prenext(self):
            if len(self.data0) == 1:  # only 1st time
                self.dtprenext = datetime.datetime.now()
                print("Pre-Next Start Time: {}".format(self.dtprenext))
                indcalc = (self.dtprenext - self.dtstart).total_seconds()
                print("Time Calculating Indicators: {:.2f}".format(indcalc))

        def nextstart(self):
            if len(self.data0) == 1:  # there was no prenext
                self.dtprenext = datetime.datetime.now()
                print("Pre-Next Start Time: {}".format(self.dtprenext))
                indcalc = (self.dtprenext - self.dtstart).total_seconds()
                print("Time Calculating Indicators: {:.2f}".format(indcalc))

            self.dtnextstart = datetime.datetime.now()
            print("Next Start Time: {}".format(self.dtnextstart))
            warmup = (self.dtnextstart - self.dtprenext).total_seconds()
            print("Strat warm-up period Time: {:.2f}".format(warmup))
            nextstart = (self.dtnextstart - self.env.dtcerebro).total_seconds()
            print("Time to Strat Next Logic: {:.2f}".format(nextstart))
            self.next()

        def next(self):
            if not self.p.trade:
                return

            # Go long on an upward crossover, short on a downward one
            for d, macross in self.macross.items():
                if macross > 0:
                    self.order_target_size(data=d, target=1)
                elif macross < 0:
                    self.order_target_size(data=d, target=-1)

        def stop(self):
            dtstop = datetime.datetime.now()
            print("End Time: {}".format(dtstop))
            nexttime = (dtstop - self.dtnextstart).total_seconds()
            print("Time in Strategy Next Logic: {:.2f}".format(nexttime))
            strattime = (dtstop - self.dtprenext).total_seconds()
            print("Total Time in Strategy: {:.2f}".format(strattime))
            totaltime = (dtstop - self.env.dtcerebro).total_seconds()
            print("Total Time: {:.2f}".format(totaltime))
            print("Length of data feeds: {}".format(len(self.data)))


    def run(args=None):
        args = parse_args(args)

        cerebro = bt.Cerebro()

        datakwargs = dict(timeframe=bt.TimeFrame.Minutes, compression=15)
        for i in range(args.numfiles):
            dataname = "candles{:02d}.csv".format(i)
            data = bt.feeds.GenericCSVData(dataname=dataname, **datakwargs)
            cerebro.adddata(data)

        # --strat and --cerebro are "key=value,..." strings turned into kwargs
        cerebro.addstrategy(St, **eval("dict(" + args.strat + ")"))
        cerebro.dtcerebro = dt0 = datetime.datetime.now()
        print("Cerebro Start Time: {}".format(dt0))
        cerebro.run(**eval("dict(" + args.cerebro + ")"))


    def parse_args(pargs=None):
        parser = argparse.ArgumentParser(
            formatter_class=argparse.ArgumentDefaultsHelpFormatter,
            description=(
                "Backtrader Basic Script"
            )
        )

        parser.add_argument("--numfiles", required=False, default=100, type=int,
                            help="Number of files to read")

        parser.add_argument("--cerebro", required=False, default="",
                            metavar="kwargs", help="kwargs in key=value format")

        parser.add_argument("--strat", "--strategy", required=False, default="",
                            metavar="kwargs", help="kwargs in key=value format")

        return parser.parse_args(pargs)


    if __name__ == "__main__":
        run()