
Data Filters

Some time ago, Ticket #23 got me thinking about a potential improvement in the context in which that ticket was raised.

  • For the context of the discussion held in Ticket #23: https://github.com/mementum/backtrader/issues/23

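Within the ticket I added a DataFilter class, but this was overly complex. It was actually reminiscent of the complexity built into DataResampler and DataReplayer, the classes used to implement the functionalities of the same names.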

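As such, and for a couple of versions now, backtrader has supported adding a filter (call it a processor if you wish) to a data feed. Resampling and replaying were reimplemented internally using this functionality, and everything seems less complex (although it still is).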

Filters at Work

Given an existing data feed/source, a filter is added with the addfilter method of the data feed:

data = MyDataFeed(name=myname)

data.addfilter(filter, *args, **kwargs)

Obviously, filter must conform to a given interface, namely one of the following:

  • A callable which accepts this signature:
    callable(data, *args, **kwargs)
    

or

  • A class which can be instantiated and then called
    • During instantiation the __init__ method must support the signature:
    def __init__(self, data, *args, **kwargs)
    
    • A __call__ method (and, optionally, a last method) with these signatures:
    def __call__(self, data)
    
    def last(self, data)
    

The callable/instance will be called for each and every bar the data source produces.
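To make the two accepted shapes concrete, here is a self-contained sketch that does not use backtrader at all. `ToyFeed`, `drop_low_volume` and `CountingFilter` are hypothetical names, and the driver loop is only an approximation of how a feed invokes its filters: each filter is called for every produced bar, in this toy a `True` return stops the chain and discards the bar, and `last` is called once the stream ends. The volumes 546 and 438 are those of the 09:30/09:31 bars shown later; the 5.0 bar is made up.

```python
import datetime


class ToyFeed:
    """Hypothetical stand-in for a data feed (not backtrader's class)."""

    def __init__(self, bars):
        self.bars = bars     # list of (datetime, volume) tuples
        self.bar = None      # bar currently being produced
        self._filters = []   # (callable, args, kwargs) triples

    def addfilter(self, p, *args, **kwargs):
        if isinstance(p, type):
            # Class form: instantiate with the data plus the extra arguments;
            # the instance is later called with the data only
            self._filters.append((p(self, *args, **kwargs), (), {}))
        else:
            # Plain callable form: the extra arguments go along on every call
            self._filters.append((p, args, kwargs))

    def run(self):
        delivered = []
        for bar in self.bars:
            self.bar = bar
            # any() stops at the first filter returning True (bar removed)
            if not any(f(self, *a, **kw) for f, a, kw in self._filters):
                delivered.append(self.bar)
        # End of stream: give filters that define last() a final call
        for f, _, _ in self._filters:
            if hasattr(f, "last"):
                f.last(self)
        return delivered


def drop_low_volume(data, minvol):
    """Callable filter: True (bar removed) when volume is below minvol."""
    return data.bar[1] < minvol


class CountingFilter:
    """Class filter: lets every bar through and reports a count in last()."""

    def __init__(self, data):
        self.seen = 0
        self.total = None

    def __call__(self, data):
        self.seen += 1
        return False  # stream untouched

    def last(self, data):
        self.total = self.seen


start = datetime.datetime(2006, 1, 2, 9, 30)
bars = [(start + datetime.timedelta(minutes=i), vol)
        for i, vol in enumerate([546.0, 5.0, 438.0])]
feed = ToyFeed(bars)
feed.addfilter(drop_low_volume, 10.0)
feed.addfilter(CountingFilter)
out = feed.run()
counter = feed._filters[1][0]
print(len(out), counter.total)  # 2 2
```

The counting filter only sees two bars because the low-volume bar is discarded by the first filter before the chain reaches it.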

A Better Solution for Ticket #23

That ticket involved:

  • A RelativeVolume indicator on an intraday basis
  • Intraday data which may be missing
  • Pre/post market data which may arrive

Implementing a couple of filters alleviates the situation in the backtesting environment.

Filtering Out Pre/Post Market Data

The following filter (already available in backtrader) comes to the rescue:

from backtrader import metabase
from backtrader.utils.py3 import with_metaclass


class SessionFilter(with_metaclass(metabase.MetaParams, object)):
    """
    This class can be applied to a data source as a filter and will filter out
    intraday bars which fall outside of the regular session times (ie: pre/post
    market data)

    This is a "non-simple" filter and must manage the stack of the data (passed
    during init and __call__)

    It needs no "last" method because it has nothing to deliver
    """
    def __init__(self, data):
        pass

    def __call__(self, data):
        """
        Return Values:

          - False: data stream was not touched
          - True: data stream was manipulated (bar outside of session times
            and removed)
        """
        if data.sessionstart <= data.datetime.tm(0) <= data.sessionend:
            # The bar time lies within the session, both limits included
            return False  # say the stream is untouched

        # bar outside of the regular session times
        data.backwards()  # remove bar from data stack
        return True  # signal the data was manipulated

The filter uses the session start/end times embedded in the data to filter the bars:

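  • If the datetime of the new bar is within the session times, False is returned to indicate the data is untouched
  • If the datetime is outside of that range, the data source is sent backwards, effectively erasing the last produced bar, and True is returned to indicate the data stream has been manipulated.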

Note

Calling data.backwards() is (or may be) too low level; filters should ideally have an API to deal with the internals of the data stream.

The sample code (listed at the end) can be run with and without the filter. The 1st run is 100% unfiltered and no session times are specified:

$ ./data-filler.py --writer --wrcsv

Looking at the start and end of the 1st day:

===============================================================================
Id,2006-01-02-volume-min-001,len,datetime,open,high,low,close,volume,openinterest,Strategy,len
1,2006-01-02-volume-min-001,1,2006-01-02 09:01:00,3602.0,3603.0,3597.0,3599.0,5699.0,0.0,Strategy,1
2,2006-01-02-volume-min-001,2,2006-01-02 09:02:00,3600.0,3601.0,3598.0,3599.0,894.0,0.0,Strategy,2
...
...
581,2006-01-02-volume-min-001,581,2006-01-02 19:59:00,3619.0,3619.0,3619.0,3619.0,1.0,0.0,Strategy,581
582,2006-01-02-volume-min-001,582,2006-01-02 20:00:00,3618.0,3618.0,3617.0,3618.0,242.0,0.0,Strategy,582
583,2006-01-02-volume-min-001,583,2006-01-02 20:01:00,3618.0,3618.0,3617.0,3617.0,15.0,0.0,Strategy,583
584,2006-01-02-volume-min-001,584,2006-01-02 20:04:00,3617.0,3617.0,3617.0,3617.0,107.0,0.0,Strategy,584
585,2006-01-02-volume-min-001,585,2006-01-03 09:01:00,3623.0,3625.0,3622.0,3624.0,4026.0,0.0,Strategy,585
...

The session runs from 09:01:00 to 20:04:00 on the 2nd of January 2006.

Now a run with the SessionFilter, telling the script to use 09:30 and 17:30 as the session start/end times:

$ ./data-filler.py --writer --wrcsv --tstart 09:30 --tend 17:30 --filter

===============================================================================
Id,2006-01-02-volume-min-001,len,datetime,open,high,low,close,volume,openinterest,Strategy,len
1,2006-01-02-volume-min-001,1,2006-01-02 09:30:00,3604.0,3605.0,3603.0,3604.0,546.0,0.0,Strategy,1
2,2006-01-02-volume-min-001,2,2006-01-02 09:31:00,3604.0,3606.0,3604.0,3606.0,438.0,0.0,Strategy,2
...
...
445,2006-01-02-volume-min-001,445,2006-01-02 17:29:00,3621.0,3621.0,3620.0,3620.0,866.0,0.0,Strategy,445
446,2006-01-02-volume-min-001,446,2006-01-02 17:30:00,3620.0,3621.0,3619.0,3621.0,1670.0,0.0,Strategy,446
447,2006-01-02-volume-min-001,447,2006-01-03 09:30:00,3637.0,3638.0,3635.0,3636.0,1458.0,0.0,Strategy,447
...

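The data output now starts at 09:30 and runs until 17:30. Pre/post market data has been filtered out.

Filling Missing Data

A deeper inspection of the output shows the following: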

...
61,2006-01-02-volume-min-001,61,2006-01-02 10:30:00,3613.0,3614.0,3613.0,3614.0,112.0,0.0,Strategy,61
62,2006-01-02-volume-min-001,62,2006-01-02 10:31:00,3614.0,3614.0,3614.0,3614.0,183.0,0.0,Strategy,62
63,2006-01-02-volume-min-001,63,2006-01-02 10:34:00,3614.0,3614.0,3614.0,3614.0,841.0,0.0,Strategy,63
64,2006-01-02-volume-min-001,64,2006-01-02 10:35:00,3614.0,3614.0,3614.0,3614.0,17.0,0.0,Strategy,64
...

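Data for the minutes 10:32 and 10:33 is missing. Being the 1st trading day of the year, there may simply have been no trading at all in those minutes. Or the data feed may have failed to capture the data.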

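For the purposes of Ticket #23, and to be able to compare the volume of a given minute with the same minute of the previous day, the missing data will be filled in.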
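Finding the gaps amounts to stepping from the previous bar's time by one bar unit (here 1 minute, from the Minutes timeframe with compression 1) and collecting every timestamp strictly before the current bar. This sketch mirrors the stepping loop of `_fillbars` in the SessionFiller source at the end; `missing_times` and `ts` are hypothetical helpers working on plain datetimes.

```python
import datetime


def missing_times(prev, cur, unit):
    """Return the timestamps strictly between prev and cur, one unit apart.

    Same stepping as the _fillbars loop: start one unit after prev and stop
    before reaching cur, so neither existing bar is duplicated.
    """
    out = []
    t = prev + unit
    while t < cur:
        out.append(t)
        t += unit
    return out


def ts(hhmm, day=datetime.date(2006, 1, 2)):
    """Build a datetime on the given day from an "HH:MM" string."""
    return datetime.datetime.combine(
        day, datetime.datetime.strptime(hhmm, "%H:%M").time())


minute = datetime.timedelta(minutes=1)  # Minutes timeframe, compression 1
gaps = missing_times(ts("10:31"), ts("10:34"), minute)
print([t.strftime("%H:%M") for t in gaps])  # ['10:32', '10:33']
```

Consecutive bars such as 10:30 and 10:31 yield no gap at all.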

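backtrader already includes a SessionFiller, which is meant to do exactly that: fill in the missing data. The code is longer and takes on more complexity than the filter (see the end for the full implementation), but let's look at the class/params definition: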

class SessionFiller(with_metaclass(metabase.MetaParams, object)):
    """
    Bar Filler for a Data Source inside the declared session start/end times.

    The fill bars are constructed using the declared Data Source ``timeframe``
    and ``compression`` (used to calculate the intervening missing times)

    Params:

      - fill_price (def: None):

        If None is passed, the closing price of the previous bar will be
        used. To end up with a bar which for example takes time but it is not
        displayed in a plot ... use float("Nan")

      - fill_vol (def: float("NaN")):

        Value to use to fill the missing volume

      - fill_oi (def: float("NaN")):

        Value to use to fill the missing Open Interest

      - skip_first_fill (def: True):

        Upon seeing the 1st valid bar do not fill from the sessionstart up to
        that bar
    """
    params = (("fill_price", None),
              ("fill_vol", float("NaN")),
              ("fill_oi", float("NaN")),
              ("skip_first_fill", True))

The sample script can now filter and fill the data:

$ ./data-filler.py --writer --wrcsv --tstart 09:30 --tend 17:30 --filter --filler

...
62,2006-01-02-volume-min-001,62,2006-01-02 10:31:00,3614.0,3614.0,3614.0,3614.0,183.0,0.0,Strategy,62
63,2006-01-02-volume-min-001,63,2006-01-02 10:32:00,3614.0,3614.0,3614.0,3614.0,0.0,,Strategy,63
64,2006-01-02-volume-min-001,64,2006-01-02 10:33:00,3614.0,3614.0,3614.0,3614.0,0.0,,Strategy,64
65,2006-01-02-volume-min-001,65,2006-01-02 10:34:00,3614.0,3614.0,3614.0,3614.0,841.0,0.0,Strategy,65
...

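Minutes 10:32 and 10:33 are now there. The last known closing price is used to fill the price values, the volume is set to 0 and openinterest is left at NaN (the empty field in the output). The script accepts an --fvol parameter to set the fill volume to any value (including "NaN").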
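Each fill bar is a flat bar at a single price. The sketch below builds such bars as plain dictionaries (an assumption for readability; the real filler writes a list indexed by the feed's line positions) following the rules of the SessionFiller parameters: the previous close unless fill_price is given, fill_vol for the volume and fill_oi for the open interest. With the script's default --fvol of 0.0 and fill_oi left at NaN it reproduces the 10:32/10:33 rows above; `make_fill_bar` is a hypothetical name.

```python
import math


def make_fill_bar(dtime, prev_close, fill_price=None,
                  fill_vol=float("nan"), fill_oi=float("nan")):
    """Build one flat fill bar following the SessionFiller parameter rules."""
    # None means "reuse the previous close"; an explicit value (even 0.0) wins
    price = prev_close if fill_price is None else fill_price
    return {
        "datetime": dtime,
        "open": price, "high": price, "low": price, "close": price,
        "volume": fill_vol,
        "openinterest": fill_oi,
    }


# The sample script passes fill_vol=args.fvol (0.0 by default) and leaves
# fill_oi at its NaN default
bars = [make_fill_bar(t, prev_close=3614.0, fill_vol=0.0)
        for t in ("10:32", "10:33")]
for bar in bars:
    print(bar["datetime"], bar["close"], bar["volume"],
          math.isnan(bar["openinterest"]))
```

Checking `fill_price is None` rather than relying on truthiness keeps an explicit fill price of 0.0 from being silently replaced by the previous close.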

Completing Ticket #23

With SessionFilter and SessionFiller the following has been achieved:

  • Pre/post market data is not delivered
  • No data (within the given timeframe) is missing

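The "synchronization" discussed in Ticket #23 to implement the RelativeVolume indicator is no longer needed, because all days have exactly the same number of bars (in this example, every minute from 09:30 to 17:30, both included).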

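Keeping in mind that the sample script sets the missing volume to 0 by default (--fvol), an easy RelativeVolume indicator can be developed: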

import backtrader as bt


class RelativeVolume(bt.Indicator):
    csv = True  # show up in csv output (default for indicators is False)

    lines = ("relvol",)
    params = (
        ("period", 20),
        ("volisnan", True),
    )

    def __init__(self):
        if self.p.volisnan:
            # if missing volume will be NaN, do a simple division
            # the end result for missing volumes will also be NaN
            relvol = self.data.volume(-self.p.period) / self.data.volume
        else:
            # Else do a controlled Div with a built-in function
            relvol = bt.DivByZero(
                self.data.volume(-self.p.period),
                self.data.volume,
                zero=0.0)

        self.lines.relvol = relvol

It is smart enough to avoid a division by zero by using a built-in aid from backtrader (DivByZero).
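For reference, here is the same arithmetic in plain Python for the zero-filled case, i.e. the DivByZero branch, which is the one the script selects when --fvol is 0.0: the volume `period` bars ago divided by the current volume, 0.0 when the current volume is 0, and no value until enough history exists. `relvol_at` is a hypothetical helper; the volumes are the 10:31 to 10:34 values of both days taken from the outputs in this post, with a shortened period of 4 so that each day fits in four bars.

```python
def relvol_at(volumes, period, i):
    """Relative volume at index i, mirroring the DivByZero branch.

    Returns None while fewer than period + 1 values are available.
    """
    if i < period:
        return None
    past, cur = volumes[i - period], volumes[i]
    return past / cur if cur != 0 else 0.0


# 10:31, 10:32, 10:33, 10:34 on the 1st day (10:32/10:33 zero-filled),
# then the same minutes on the 2nd day
volumes = [183.0, 0.0, 0.0, 841.0, 56.0, 313.0, 135.0, 171.0]
for i in range(4, 8):
    print(relvol_at(volumes, 4, i))
```

The printed values match the relvol column of the 2nd day: about 3.2679, 0.0, 0.0 and about 4.9181.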

Putting all the pieces together in the next invocation of the script:

$ ./data-filler.py --writer --wrcsv --tstart 09:30 --tend 17:30 --filter --filler --relvol

===============================================================================
Id,2006-01-02-volume-min-001,len,datetime,open,high,low,close,volume,openinterest,Strategy,len,RelativeVolume,len,relvol
1,2006-01-02-volume-min-001,1,2006-01-02 09:30:00,3604.0,3605.0,3603.0,3604.0,546.0,0.0,Strategy,1,RelativeVolume,1,
2,2006-01-02-volume-min-001,2,2006-01-02 09:31:00,3604.0,3606.0,3604.0,3606.0,438.0,0.0,Strategy,2,RelativeVolume,2,
...

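As expected, the RelativeVolume indicator produces no output during the 1st period bars. The script calculates the period as ((17:30 - 09:30) in minutes) + 1 = 481. Let's look directly at the relative volume at 10:32 and 10:33 on the 2nd day, given that on the 1st day the volume values for those minutes were filled with 0: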

...
543,2006-01-02-volume-min-001,543,2006-01-03 10:31:00,3648.0,3648.0,3647.0,3648.0,56.0,0.0,Strategy,543,RelativeVolume,543,3.26785714286
544,2006-01-02-volume-min-001,544,2006-01-03 10:32:00,3647.0,3648.0,3647.0,3647.0,313.0,0.0,Strategy,544,RelativeVolume,544,0.0
545,2006-01-02-volume-min-001,545,2006-01-03 10:33:00,3647.0,3647.0,3647.0,3647.0,135.0,0.0,Strategy,545,RelativeVolume,545,0.0
546,2006-01-02-volume-min-001,546,2006-01-03 10:34:00,3648.0,3648.0,3647.0,3648.0,171.0,0.0,Strategy,546,RelativeVolume,546,4.91812865497
...

Both are 0.0, as expected: the volume of the previous day at those minutes (the numerator) was filled with 0.
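The period and the row alignment can be checked with the same expression the script uses. The row numbers are cross-checked against the filled CSV outputs in this post: 10:31 is row 62 on the 1st day and row 543 on the 2nd, exactly one period apart.

```python
import datetime

# Same parsing and expression as runstrategy() in the sample script
dtstart = datetime.datetime.strptime("09:30", "%H:%M")
dtend = datetime.datetime.strptime("17:30", "%H:%M")
period = ((dtend - dtstart).seconds // 60) + 1  # 480 minutes + 1 = 481 bars

# With filtering and filling every day has exactly `period` bars, so a given
# minute is always `period` rows after the same minute of the previous day
offset = 61  # 10:31 is 61 minutes after 09:30
row_day1 = 1 + offset           # 1-based CSV row of 10:31 on the 1st day
row_day2 = 1 + period + offset  # 1-based CSV row of 10:31 on the 2nd day
print(period, row_day1, row_day2)  # 481 62 543
```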

Conclusion

The filter mechanism in data feeds opens the possibility to completely manipulate the data stream. Use with caution.

Script Code and Usage

Available as a sample in the backtrader sources:

usage: data-filler.py [-h] [--data DATA] [--filter] [--filler] [--fvol FVOL]
                      [--tstart TSTART] [--tend TEND] [--relvol]
                      [--fromdate FROMDATE] [--todate TODATE] [--writer]
                      [--wrcsv] [--plot] [--numfigs NUMFIGS]

DataFilter/DataFiller Sample

optional arguments:
  -h, --help            show this help message and exit
  --data DATA, -d DATA  data to add to the system
  --filter, -ft         Filter using session start/end times
  --filler, -fl         Fill missing bars inside start/end times
  --fvol FVOL           Use as fill volume for missing bar (def: 0.0)
  --tstart TSTART, -ts TSTART
                        Start time for the Session Filter (HH:MM)
  --tend TEND, -te TEND
                        End time for the Session Filter (HH:MM)
  --relvol, -rv         Add relative volume indicator
  --fromdate FROMDATE, -f FROMDATE
                        Starting date in YYYY-MM-DD format
  --todate TODATE, -t TODATE
                        Ending date in YYYY-MM-DD format
  --writer, -w          Add a writer to cerebro
  --wrcsv, -wc          Enable CSV Output in the writer
  --plot, -p            Plot the read data
  --numfigs NUMFIGS, -n NUMFIGS
                        Plot using numfigs figures

The code:

from __future__ import (absolute_import, division, print_function,
                        unicode_literals)

import argparse
import datetime
import math

# The above could be sent to an independent module
import backtrader as bt
import backtrader.feeds as btfeeds
import backtrader.utils.flushfile
import backtrader.filters as btfilters

from relativevolume import RelativeVolume


def runstrategy():
    args = parse_args()

    # Create a cerebro
    cerebro = bt.Cerebro()

    # Get the dates from the args
    fromdate = datetime.datetime.strptime(args.fromdate, "%Y-%m-%d")
    todate = datetime.datetime.strptime(args.todate, "%Y-%m-%d")

    # Get the session times to pass them to the indicator
    # datetime.time has no strptime ...
    dtstart = datetime.datetime.strptime(args.tstart, "%H:%M")
    dtend = datetime.datetime.strptime(args.tend, "%H:%M")

    # Create the 1st data
    data = btfeeds.BacktraderCSVData(
        dataname=args.data,
        fromdate=fromdate,
        todate=todate,
        timeframe=bt.TimeFrame.Minutes,
        compression=1,
        sessionstart=dtstart,  # internally just the "time" part will be used
        sessionend=dtend,  # internally just the "time" part will be used
    )

    if args.filter:
        data.addfilter(btfilters.SessionFilter)

    if args.filler:
        data.addfilter(btfilters.SessionFiller, fill_vol=args.fvol)

    # Add the data to cerebro
    cerebro.adddata(data)

    if args.relvol:
        # Calculate backward period - tend and tstart are in the same day
        # + 1 to include the last moment of the interval dtstart <-> dtend
        td = ((dtend - dtstart).seconds // 60) + 1
        cerebro.addindicator(RelativeVolume,
                             period=td,
                             volisnan=math.isnan(args.fvol))

    # Add an empty strategy
    cerebro.addstrategy(bt.Strategy)

    # Add a writer with CSV
    if args.writer:
        cerebro.addwriter(bt.WriterFile, csv=args.wrcsv)

    # And run it - no trading - disable stdstats
    cerebro.run(stdstats=False)

    # Plot if requested
    if args.plot:
        cerebro.plot(numfigs=args.numfigs, volume=True)


def parse_args():
    parser = argparse.ArgumentParser(
        description="DataFilter/DataFiller Sample")

    parser.add_argument("--data", "-d",
                        default="../../datas/2006-01-02-volume-min-001.txt",
                        help="data to add to the system")

    parser.add_argument("--filter", "-ft", action="store_true",
                        help="Filter using session start/end times")

    parser.add_argument("--filler", "-fl", action="store_true",
                        help="Fill missing bars inside start/end times")

    parser.add_argument("--fvol", required=False, default=0.0,
                        type=float,
                        help="Use as fill volume for missing bar (def: 0.0)")

    parser.add_argument("--tstart", "-ts",
                        # default="09:14:59",
                        # help="Start time for the Session Filter (%H:%M:%S)")
                        default="09:15",
                        help="Start time for the Session Filter (HH:MM)")

    parser.add_argument("--tend", "-te",
                        # default="17:15:59",
                        # help="End time for the Session Filter (%H:%M:%S)")
                        default="17:15",
                        help="End time for the Session Filter (HH:MM)")

    parser.add_argument("--relvol", "-rv", action="store_true",
                        help="Add relative volume indicator")

    parser.add_argument("--fromdate", "-f",
                        default="2006-01-01",
                        help="Starting date in YYYY-MM-DD format")

    parser.add_argument("--todate", "-t",
                        default="2006-12-31",
                        help="Ending date in YYYY-MM-DD format")

    parser.add_argument("--writer", "-w", action="store_true",
                        help="Add a writer to cerebro")

    parser.add_argument("--wrcsv", "-wc", action="store_true",
                        help="Enable CSV Output in the writer")

    parser.add_argument("--plot", "-p", action="store_true",
                        help="Plot the read data")

    parser.add_argument("--numfigs", "-n", default=1, type=int,
                        help="Plot using numfigs figures")

    return parser.parse_args()


if __name__ == "__main__":
    runstrategy()

SessionFiller

The backtrader source:

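import datetime

from backtrader import TimeFrame, metabase
from backtrader.utils import date2num
from backtrader.utils.py3 import with_metaclass

# Sentinel for "no session end computed yet" (assumed definition of MAXDATE)
MAXDATE = datetime.datetime.max


class SessionFiller(with_metaclass(metabase.MetaParams, object)):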
    """
    Bar Filler for a Data Source inside the declared session start/end times.

    The fill bars are constructed using the declared Data Source ``timeframe``
    and ``compression`` (used to calculate the intervening missing times)

    Params:

      - fill_price (def: None):

        If None is passed, the closing price of the previous bar will be
        used. To end up with a bar which for example takes time but it is not
        displayed in a plot ... use float("Nan")

      - fill_vol (def: float("NaN")):

        Value to use to fill the missing volume

      - fill_oi (def: float("NaN")):

        Value to use to fill the missing Open Interest

      - skip_first_fill (def: True):

        Upon seeing the 1st valid bar do not fill from the sessionstart up to
        that bar
    """
    params = (("fill_price", None),
              ("fill_vol", float("NaN")),
              ("fill_oi", float("NaN")),
              ("skip_first_fill", True))

    # Minimum delta unit in between bars
    _tdeltas = {
        TimeFrame.Minutes: datetime.timedelta(seconds=60),
        TimeFrame.Seconds: datetime.timedelta(seconds=1),
        TimeFrame.MicroSeconds: datetime.timedelta(microseconds=1),
    }

    def __init__(self, data):
        # Calculate and save timedelta for timeframe
        self._tdunit = self._tdeltas[data._timeframe] * data._compression

        self.seenbar = False  # control if at least one bar has been seen
        self.sessend = MAXDATE  # maxdate is the control for bar in session

    def __call__(self, data):
        """
        Params:
          - data: the data source to filter/process

        Returns:
          - False (always) because this filter does not remove bars from the
        stream

        The logic (starting with a session end control flag of MAXDATE)

          - If new bar is over session end (never true for 1st bar)

            Fill up to session end. Reset sessionend to MAXDATE & fall through

          - If session end is flagged as MAXDATE

            Recalculate session limits and check whether the bar is within them

            if so, fill up and record the last seen time

          - Else ... the incoming bar is in the session, fill up to it
        """
        # Get time of current (from data source) bar
        dtime_cur = data.datetime.datetime()

        if dtime_cur > self.sessend:
            # bar over session end - fill up and invalidate
            self._fillbars(data, self.dtime_prev, self.sessend + self._tdunit)
            self.sessend = MAXDATE

        # Fall through from previous check ... the bar which is over the
        # session could already be in a new session and within the limits
        if self.sessend == MAXDATE:
            # No bar seen yet or one went over previous session limit
            sessstart = data.datetime.tm2datetime(data.sessionstart)
            self.sessend = sessend = data.datetime.tm2datetime(data.sessionend)

            if sessstart <= dtime_cur <= sessend:
                # 1st bar from session in the session - fill from session start
                if self.seenbar or not self.p.skip_first_fill:
                    self._fillbars(data, sessstart - self._tdunit, dtime_cur)

            self.seenbar = True
            self.dtime_prev = dtime_cur

        else:
            # Seen a previous bar and this is in the session - fill up to it
            self._fillbars(data, self.dtime_prev, dtime_cur)
            self.dtime_prev = dtime_cur

        return False

    def _fillbars(self, data, time_start, time_end, forcedirty=False):
        """
        Fills one by one bars as needed from time_start to time_end

        Saves the current bar to the stack after the fill bars if any bar was
        added (or if forcedirty is set)
        """
        # Control flag - bars added to the stack
        dirty = False

        time_start += self._tdunit
        while time_start < time_end:
            dirty = self._fillbar(data, time_start)
            time_start += self._tdunit

        if dirty or forcedirty:
            data._save2stack(erase=True)

    def _fillbar(self, data, dtime):
        # Prepare an array of the needed size
        bar = [float("Nan")] * data.size()

        # Fill datetime
        bar[data.DateTime] = date2num(dtime)

        # Fill the prices: previous close unless fill_price was given
        if self.p.fill_price is None:
            price = data.close[-1]
        else:
            price = self.p.fill_price
        for pricetype in [data.Open, data.High, data.Low, data.Close]:
            bar[pricetype] = price

        # Fill volume and open interest
        bar[data.Volume] = self.p.fill_vol
        bar[data.OpenInterest] = self.p.fill_oi

        # Fill extra lines the data feed may have defined beyond DateTime
        for i in range(data.DateTime + 1, data.size()):
            bar[i] = data.lines[i][0]

        # Add to the stack of bars to save
        data._add2stack(bar)

        return True
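To follow the state machine of `__call__` without a running feed, the sketch below replays the same decisions on plain timestamps with a shortened session of 09:30 to 09:34. It is a hypothetical simplification: `ToySessionFiller`, `feed` and `at` are invented names, the session limits are computed from the date of the incoming bar (assumed to match what `tm2datetime` does), and instead of pushing bars onto the data stack and re-queuing the current bar it simply returns the timestamps that would be delivered, fill bars first.

```python
import datetime

# Sentinel meaning "session end not computed yet", like MAXDATE in the source
NO_SESSEND = datetime.datetime.max


class ToySessionFiller:
    """Hypothetical replay of SessionFiller.__call__ on plain timestamps."""

    def __init__(self, sessionstart, sessionend, unit, skip_first_fill=True):
        self.sessionstart = sessionstart  # datetime.time
        self.sessionend = sessionend      # datetime.time
        self.unit = unit                  # timedelta between bars
        self.skip_first_fill = skip_first_fill
        self.seenbar = False
        self.sessend = NO_SESSEND
        self.dtime_prev = None

    def _fill(self, time_start, time_end, out):
        # Same stepping as _fillbars: exclusive on both ends
        t = time_start + self.unit
        while t < time_end:
            out.append(t)
            t += self.unit

    def feed(self, dtime_cur):
        """Return the timestamps delivered for one incoming bar, fills first."""
        out = []
        if dtime_cur > self.sessend:
            # Bar beyond the session end: fill up to the end (inclusive)
            self._fill(self.dtime_prev, self.sessend + self.unit, out)
            self.sessend = NO_SESSEND

        if self.sessend == NO_SESSEND:
            # Recalculate the session limits (assumed: from the bar's date)
            day = dtime_cur.date()
            sessstart = datetime.datetime.combine(day, self.sessionstart)
            self.sessend = datetime.datetime.combine(day, self.sessionend)
            if sessstart <= dtime_cur <= self.sessend:
                if self.seenbar or not self.skip_first_fill:
                    self._fill(sessstart - self.unit, dtime_cur, out)
            self.seenbar = True
        else:
            # Inside the current session: fill the gap up to this bar
            self._fill(self.dtime_prev, dtime_cur, out)

        self.dtime_prev = dtime_cur
        out.append(dtime_cur)
        return out


def at(day, hh, mm):
    return datetime.datetime(2006, 1, day, hh, mm)


filler = ToySessionFiller(datetime.time(9, 30), datetime.time(9, 34),
                          datetime.timedelta(minutes=1))
incoming = [at(2, 9, 30), at(2, 9, 31), at(2, 9, 33), at(3, 9, 31), at(3, 9, 34)]
timeline = [t for bar in incoming for t in filler.feed(bar)]
for t in timeline:
    print(t)
```

The 1st bar of the 3rd triggers two fills at once: 09:34 closes the previous session and 09:30 opens the new one, so each day ends up with all five minutes.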
