You are here:  Home » 量化交易与机器学习 » backtrader » 数据过滤器 – backtrader中文教程

前段时间,Ticket #23 让我想到了在该票的上下文中进行的讨论的潜在改进。

在票中我添加了一个DataFilter类,但这太复杂了。实际上让人想起内置的复杂性 DataResamplerDataReplayer,用于实现同名功能的类。

因此,由于有几个版本,backtrader支持向 数据馈送添加(如果您愿意,可以filter调用它)。processor使用该功能在内部重新实现了重采样和重放,并且一切似乎都不那么复杂(尽管仍然如此)

工作中的过滤器

给定现有的数据馈送/源,您可以使用addfilter数据馈送的方法:

data = MyDataFeed(name=myname)

data.addfilter(filter, *args, **kwargs)

显然,filter必须符合给定的接口,即:

  • 接受此签名的可调用对象:
callable(data, *args, **kwargs)

或者

  • 可以实例化和调用的类
    • 在实例化过程中,init方法必须支持签名:
def __init__(self, data, *args, **kwargs)
    • call和 last 方法这个:
def __call__(self, data)

def last(self, data)

将为数据源生成的每个数据可调用的调用/实例。

Ticket #23 的更好解决方案

哪张票想要:

  • 日内的相对量指标
  • 盘中数据可能缺失
  • 非交易时间的数据

实施几个过滤器可以缓解回测环境的情况。

过滤掉前/后市场数据

下面的过滤器(已经在 中可用backtrader)来拯救:

class SessionFilter(with_metaclass(metabase.MetaParams, object)):
    '''
    This class can be applied to a data source as a filter and will filter out
    intraday bars which fall outside of the regular session times (ie: pre/post
    market data)

    This is a "non-simple" filter and must manage the stack of the data (passed
    during init and __call__)

    It needs no "last" method because it has nothing to deliver
    '''
    def __init__(self, data):
        pass

    def __call__(self, data):
        '''
        Return Values:

          - False: data stream was not touched
          - True: data stream was manipulated (bar outside of session times and
          - removed)
        '''
        if data.sessionstart <= data.datetime.tm(0) <= data.sessionend:
            # Both ends of the comparison are in the session
            return False  # say the stream is untouched

        # bar outside of the regular session times
        data.backwards()  # remove bar from data stack
        return True  # signal the data was manipulated

过滤器使用数据中嵌入的会话开始/结束时间来过滤条形图

  • 如果新数据的日期时间在会话时间内,False则返回表示数据未触及
  • 如果数据时间超出范围,则发送数据源会 backwards有效地擦除最后生成的数据。并True返回以指示数据流已被操纵。

Tips:调用data.backwards()可能/可能是低级别的,过滤器应该有一个处理数据流内部的 API

脚本末尾的示例代码可以在有或没有过滤器的情况下运行。第一次运行是 100% 未过滤且未指定会话时间:

$ ./data-filler.py --writer --wrcsv

查看第一天的开始和结束:

===============================================================================
Id,2006-01-02-volume-min-001,len,datetime,open,high,low,close,volume,openinterest,Strategy,len
1,2006-01-02-volume-min-001,1,2006-01-02 09:01:00,3602.0,3603.0,3597.0,3599.0,5699.0,0.0,Strategy,1
2,2006-01-02-volume-min-001,2,2006-01-02 09:02:00,3600.0,3601.0,3598.0,3599.0,894.0,0.0,Strategy,2
...
...
581,2006-01-02-volume-min-001,581,2006-01-02 19:59:00,3619.0,3619.0,3619.0,3619.0,1.0,0.0,Strategy,581
582,2006-01-02-volume-min-001,582,2006-01-02 20:00:00,3618.0,3618.0,3617.0,3618.0,242.0,0.0,Strategy,582
583,2006-01-02-volume-min-001,583,2006-01-02 20:01:00,3618.0,3618.0,3617.0,3617.0,15.0,0.0,Strategy,583
584,2006-01-02-volume-min-001,584,2006-01-02 20:04:00,3617.0,3617.0,3617.0,3617.0,107.0,0.0,Strategy,584
585,2006-01-02-volume-min-001,585,2006-01-03 09:01:00,3623.0,3625.0,3622.0,3624.0,4026.0,0.0,Strategy,585
...

会议时间为 2006 年 1 月 2 日09:01:00至 20:04:00 。

现在运行 aSessionFilter并告诉脚本使用 09:30 和 17:30 作为会话的开始/结束时间:

$ ./data-filler.py --writer --wrcsv --tstart 09:30 --tend 17:30 --filter

===============================================================================
Id,2006-01-02-volume-min-001,len,datetime,open,high,low,close,volume,openinterest,Strategy,len
1,2006-01-02-volume-min-001,1,2006-01-02 09:30:00,3604.0,3605.0,3603.0,3604.0,546.0,0.0,Strategy,1
2,2006-01-02-volume-min-001,2,2006-01-02 09:31:00,3604.0,3606.0,3604.0,3606.0,438.0,0.0,Strategy,2
...
...
445,2006-01-02-volume-min-001,445,2006-01-02 17:29:00,3621.0,3621.0,3620.0,3620.0,866.0,0.0,Strategy,445
446,2006-01-02-volume-min-001,446,2006-01-02 17:30:00,3620.0,3621.0,3619.0,3621.0,1670.0,0.0,Strategy,446
447,2006-01-02-volume-min-001,447,2006-01-03 09:30:00,3637.0,3638.0,3635.0,3636.0,1458.0,0.0,Strategy,447
...

数据输出现在从 09:30 开始,到 17:30 结束。上市前/上市后数据已被过滤掉。

填写缺失数据

对输出的更深入检查显示以下内容:

...
61,2006-01-02-volume-min-001,61,2006-01-02 10:30:00,3613.0,3614.0,3613.0,3614.0,112.0,0.0,Strategy,61
62,2006-01-02-volume-min-001,62,2006-01-02 10:31:00,3614.0,3614.0,3614.0,3614.0,183.0,0.0,Strategy,62
63,2006-01-02-volume-min-001,63,2006-01-02 10:34:00,3614.0,3614.0,3614.0,3614.0,841.0,0.0,Strategy,63
64,2006-01-02-volume-min-001,64,2006-01-02 10:35:00,3614.0,3614.0,3614.0,3614.0,17.0,0.0,Strategy,64
...

缺少 10:32 和 10:33 分钟的数据。作为今年的第一个交易日,可能根本没有谈判。或者数据馈送可能未能捕获该数据。

出于 Ticket #23 的目的,并能够将给定分钟的交易量与前一天的同一分钟进行比较,我们将填写缺失的数据。

已经backtrader有一个SessionFiller正如预期的那样填充缺失的数据。代码很长并且比过滤器更复杂(完整实现见最后),但让我们看看类/参数定义:

class SessionFiller(with_metaclass(metabase.MetaParams, object)):
    '''
    Bar Filler for a Data Source inside the declared session start/end times.

    The fill bars are constructed using the declared Data Source ``timeframe``
    and ``compression`` (used to calculate the intervening missing times)

    Params:

      - fill_price (def: None):

        If None is passed, the closing price of the previous bar will be
        used. To end up with a bar which for example takes time but it is not
        displayed in a plot ... use float('Nan')

      - fill_vol (def: float('NaN')):

        Value to use to fill the missing volume

      - fill_oi (def: float('NaN')):

        Value to use to fill the missing Open Interest

      - skip_first_fill (def: True):

        Upon seeing the 1st valid bar do not fill from the sessionstart up to
        that bar
    '''
    params = (('fill_price', None),
              ('fill_vol', float('NaN')),
              ('fill_oi', float('NaN')),
              ('skip_first_fill', True))

示例脚本现在可以过滤和填充数据:

./data-filler.py --writer --wrcsv --tstart 09:30 --tend 17:30 --filter --filler

...
62,2006-01-02-volume-min-001,62,2006-01-02 10:31:00,3614.0,3614.0,3614.0,3614.0,183.0,0.0,Strategy,62
63,2006-01-02-volume-min-001,63,2006-01-02 10:32:00,3614.0,3614.0,3614.0,3614.0,0.0,,Strategy,63
64,2006-01-02-volume-min-001,64,2006-01-02 10:33:00,3614.0,3614.0,3614.0,3614.0,0.0,,Strategy,64
65,2006-01-02-volume-min-001,65,2006-01-02 10:34:00,3614.0,3614.0,3614.0,3614.0,841.0,0.0,Strategy,65
...

分钟 10:32 和 10:33 在那里。该脚本使用最后一个已知的“收盘价”来填充价格值并将交易量和未平仓量字段设置为 0。该脚本接受一个--fvol参数以将交易量设置为任何值(包括“NaN”)

完成票 #23

有了SessionFilterSessionFiller以下已经完成:

  • 未交付前/后市场数据
  • 没有数据(在给定的时间范围内)丢失

RelativeVolume现在不再需要在 Ticket 23 中讨论的“同步”来实现 指标,因为所有日子都有完全相同的柱数(在示例中,从 09:30 到 17:30 的所有分钟都包括在内)

请记住,默认设置是将缺失的交易量设置为0一个简单的 RelativeVolume指标,可以开发:

class RelativeVolume(bt.Indicator):
    csv = True  # show up in csv output (default for indicators is False)

    lines = ('relvol',)
    params = (
        ('period', 20),
        ('volisnan', True),
    )

    def __init__(self):
        if self.p.volisnan:
            # if missing volume will be NaN, do a simple division
            # the end result for missing volumes will also be NaN
            relvol = self.data.volume(-self.p.period) / self.data.volume
        else:
            # Else do a controlled Div with a built-in function
            relvol = bt.DivByZero(
                self.data.volume(-self.p.period),
                self.data.volume,
                zero=0.0)

        self.lines.relvol = relvol

这很聪明,可以通过使用内置辅助来避免被零除 backtrader

在脚本的下一次调用中将所有部分放在一起:

./data-filler.py --writer --wrcsv --tstart 09:30 --tend 17:30 --filter --filler --relvol

===============================================================================
Id,2006-01-02-volume-min-001,len,datetime,open,high,low,close,volume,openinterest,Strategy,len,RelativeVolume,len,relvol
1,2006-01-02-volume-min-001,1,2006-01-02 09:30:00,3604.0,3605.0,3603.0,3604.0,546.0,0.0,Strategy,1,RelativeVolume,1,
2,2006-01-02-volume-min-001,2,2006-01-02 09:31:00,3604.0,3606.0,3604.0,3606.0,438.0,0.0,Strategy,2,RelativeVolume,2,
...

RelativeVolume正如预期的那样,该指标在第 1根柱线期间没有产生任何输出 。周期在脚本中计算为:(17:30 – 09:30 * 60) + 1。让我们直接看一下第二天 10:32 和 10:33 的相对交易量如何,假设 1 st天,体积值被填充0

...
543,2006-01-02-volume-min-001,543,2006-01-03 10:31:00,3648.0,3648.0,3647.0,3648.0,56.0,0.0,Strategy,543,RelativeVolume,543,3.26785714286
544,2006-01-02-volume-min-001,544,2006-01-03 10:32:00,3647.0,3648.0,3647.0,3647.0,313.0,0.0,Strategy,544,RelativeVolume,544,0.0
545,2006-01-02-volume-min-001,545,2006-01-03 10:33:00,3647.0,3647.0,3647.0,3647.0,135.0,0.0,Strategy,545,RelativeVolume,545,0.0
546,2006-01-02-volume-min-001,546,2006-01-03 10:34:00,3648.0,3648.0,3647.0,3648.0,171.0,0.0,Strategy,546,RelativeVolume,546,4.91812865497
...

两者都按预期设置0

结论

数据源中的filter机制开启了完全操纵数据流的可能性。谨慎使用。

脚本代码和用法

可作为以下来源的样本backtrader

usage: data-filler.py [-h] [--data DATA] [--filter] [--filler] [--fvol FVOL]
                      [--tstart TSTART] [--tend TEND] [--relvol]
                      [--fromdate FROMDATE] [--todate TODATE] [--writer]
                      [--wrcsv] [--plot] [--numfigs NUMFIGS]

DataFilter/DataFiller Sample

optional arguments:
  -h, --help            show this help message and exit
  --data DATA, -d DATA  data to add to the system
  --filter, -ft         Filter using session start/end times
  --filler, -fl         Fill missing bars inside start/end times
  --fvol FVOL           Use as fill volume for missing bar (def: 0.0)
  --tstart TSTART, -ts TSTART
                        Start time for the Session Filter (HH:MM)
  --tend TEND, -te TEND
                        End time for the Session Filter (HH:MM)
  --relvol, -rv         Add relative volume indicator
  --fromdate FROMDATE, -f FROMDATE
                        Starting date in YYYY-MM-DD format
  --todate TODATE, -t TODATE
                        Starting date in YYYY-MM-DD format
  --writer, -w          Add a writer to cerebro
  --wrcsv, -wc          Enable CSV Output in the writer
  --plot, -p            Plot the read data
  --numfigs NUMFIGS, -n NUMFIGS
                        Plot using numfigs figures

源码:

from __future__ import (absolute_import, division, print_function,
                        unicode_literals)

import argparse
import datetime
import math

# The above could be sent to an independent module
import backtrader as bt
import backtrader.feeds as btfeeds
import backtrader.utils.flushfile
import backtrader.filters as btfilters

from relativevolume import RelativeVolume


def runstrategy():
    args = parse_args()

    # Create a cerebro
    cerebro = bt.Cerebro()

    # Get the dates from the args
    fromdate = datetime.datetime.strptime(args.fromdate, '%Y-%m-%d')
    todate = datetime.datetime.strptime(args.todate, '%Y-%m-%d')

    # Get the session times to pass them to the indicator
    # datetime.time has no strptime ...
    dtstart = datetime.datetime.strptime(args.tstart, '%H:%M')
    dtend = datetime.datetime.strptime(args.tend, '%H:%M')

    # Create the 1st data
    data = btfeeds.BacktraderCSVData(
        dataname=args.data,
        fromdate=fromdate,
        todate=todate,
        timeframe=bt.TimeFrame.Minutes,
        compression=1,
        sessionstart=dtstart,  # internally just the "time" part will be used
        sessionend=dtend,  # internally just the "time" part will be used
    )

    if args.filter:
        data.addfilter(btfilters.SessionFilter)

    if args.filler:
        data.addfilter(btfilters.SessionFiller, fill_vol=args.fvol)

    # Add the data to cerebro
    cerebro.adddata(data)

    if args.relvol:
        # Calculate backward period - tend tstart are in same day
        # + 1 to include last moment of the interval dstart <-> dtend
        td = ((dtend - dtstart).seconds // 60) + 1
        cerebro.addindicator(RelativeVolume,
                             period=td,
                             volisnan=math.isnan(args.fvol))

    # Add an empty strategy
    cerebro.addstrategy(bt.Strategy)

    # Add a writer with CSV
    if args.writer:
        cerebro.addwriter(bt.WriterFile, csv=args.wrcsv)

    # And run it - no trading - disable stdstats
    cerebro.run(stdstats=False)

    # Plot if requested
    if args.plot:
        cerebro.plot(numfigs=args.numfigs, volume=True)


def parse_args():
    parser = argparse.ArgumentParser(
        description='DataFilter/DataFiller Sample')

    parser.add_argument('--data', '-d',
                        default='../../datas/2006-01-02-volume-min-001.txt',
                        help='data to add to the system')

    parser.add_argument('--filter', '-ft', action='store_true',
                        help='Filter using session start/end times')

    parser.add_argument('--filler', '-fl', action='store_true',
                        help='Fill missing bars inside start/end times')

    parser.add_argument('--fvol', required=False, default=0.0,
                        type=float,
                        help='Use as fill volume for missing bar (def: 0.0)')

    parser.add_argument('--tstart', '-ts',
                        # default='09:14:59',
                        # help='Start time for the Session Filter (%H:%M:%S)')
                        default='09:15',
                        help='Start time for the Session Filter (HH:MM)')

    parser.add_argument('--tend', '-te',
                        # default='17:15:59',
                        # help='End time for the Session Filter (%H:%M:%S)')
                        default='17:15',
                        help='End time for the Session Filter (HH:MM)')

    parser.add_argument('--relvol', '-rv', action='store_true',
                        help='Add relative volume indicator')

    parser.add_argument('--fromdate', '-f',
                        default='2006-01-01',
                        help='Starting date in YYYY-MM-DD format')

    parser.add_argument('--todate', '-t',
                        default='2006-12-31',
                        help='Starting date in YYYY-MM-DD format')

    parser.add_argument('--writer', '-w', action='store_true',
                        help='Add a writer to cerebro')

    parser.add_argument('--wrcsv', '-wc', action='store_true',
                        help='Enable CSV Output in the writer')

    parser.add_argument('--plot', '-p', action='store_true',
                        help='Plot the read data')

    parser.add_argument('--numfigs', '-n', default=1,
                        help='Plot using numfigs figures')

    return parser.parse_args()


if __name__ == '__main__':
    runstrategy()

 

评论被关闭。