Sometimes we want to ignore certain assets when computing pipeline expressions. There are two common cases where ignoring assets is useful. The first is an expensive computation whose results we only need for some assets, for example a Factor computing the coefficients of a regression (RollingLinearRegressionOfReturns). The second is a computation that compares assets against each other, for example using the Factor method top to compute the top 200 assets by earnings yield, ignoring assets that don't meet some liquidity constraint.
To support these two use-cases, all Factors and many Factor methods can accept a mask argument, which must be a Filter indicating which assets to consider when computing.
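To see why masking a comparison differs from filtering its output afterward, here is a minimal pure-Python sketch. It does not use Zipline: the tickers, earnings yields, liquidity flags, and the helper `top_n` are all hypothetical and exist only to illustrate the idea.

```python
# Hypothetical data: earnings yield per asset and whether each asset
# passes a liquidity constraint. None of this comes from a real dataset.
earnings_yield = {"AAA": 0.12, "BBB": 0.10, "CCC": 0.08, "DDD": 0.06, "EEE": 0.04}
is_liquid = {"AAA": False, "BBB": True, "CCC": False, "DDD": True, "EEE": True}


def top_n(values, n, mask=None):
    """Return the set of n assets with the highest values.

    If mask is given, only assets whose mask value is True are ranked.
    (Illustrative helper, not part of Zipline.)
    """
    candidates = [a for a in values if mask is None or mask[a]]
    ranked = sorted(candidates, key=lambda a: values[a], reverse=True)
    return set(ranked[:n])


# Masked: rank only the liquid assets, then take the top 2.
masked_top = top_n(earnings_yield, 2, mask=is_liquid)

# Unmasked, filtered afterward: take the top 2 overall, then drop illiquid ones.
filtered_after = {a for a in top_n(earnings_yield, 2) if is_liquid[a]}

print(sorted(masked_top))      # ['BBB', 'DDD']
print(sorted(filtered_after))  # ['BBB']
```

With the mask, the ranking always yields two liquid assets. Filtering after the fact leaves only one, because an illiquid asset took up a slot in the ranking.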
from zipline.pipeline import Pipeline, EquityPricing
from zipline.research import run_pipeline
from zipline.pipeline.factors import SimpleMovingAverage, AverageDollarVolume
Let's say we want our pipeline to output securities with a high or low percent difference but we also only want to consider securities with a dollar volume above \$10,000,000. To do this, let's rearrange our make_pipeline function so that we first create the high_dollar_volume filter. We can then use this filter as a mask for moving average factors by passing high_dollar_volume as the mask argument to SimpleMovingAverage.
# Dollar volume factor
dollar_volume = AverageDollarVolume(window_length=30)
# High dollar volume filter
high_dollar_volume = (dollar_volume > 10000000)
# Average close price factors
mean_close_10 = SimpleMovingAverage(inputs=EquityPricing.close, window_length=10, mask=high_dollar_volume)
mean_close_30 = SimpleMovingAverage(inputs=EquityPricing.close, window_length=30, mask=high_dollar_volume)
# Relative difference factor
percent_difference = (mean_close_10 - mean_close_30) / mean_close_30
Applying the mask to SimpleMovingAverage restricts the average close price factors to a computation over the ~2000 securities passing the high_dollar_volume filter, as opposed to ~8000 without a mask. When we combine mean_close_10 and mean_close_30 to form percent_difference, the computation is performed on the same ~2000 securities.
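The effect of a mask on a factor computation can be sketched in plain Python. This is only an illustration under stated assumptions, not Zipline's implementation: the prices, dollar volumes, and the helper `masked_mean` are hypothetical, and marking skipped assets as NaN is a choice made for this sketch.

```python
import math

# Hypothetical close prices (oldest first) and average dollar volume.
closes = {
    "AAA": [10.0, 11.0, 12.0, 13.0],
    "BBB": [50.0, 50.0, 50.0, 50.0],
    "CCC": [5.0, 4.0, 3.0, 2.0],
}
dollar_volume = {"AAA": 25_000_000, "BBB": 2_000_000, "CCC": 40_000_000}

# Filter: which assets pass the dollar volume threshold.
high_dollar_volume = {a: v > 10_000_000 for a, v in dollar_volume.items()}


def masked_mean(prices, window_length, mask):
    """Mean of the last window_length prices, computed only where mask is True.

    (Illustrative helper, not part of Zipline.)
    """
    out = {}
    for asset, series in prices.items():
        if mask[asset]:
            window = series[-window_length:]
            out[asset] = sum(window) / len(window)
        else:
            out[asset] = math.nan  # masked out: the mean is never computed
    return out


mean_close_2 = masked_mean(closes, 2, high_dollar_volume)
mean_close_4 = masked_mean(closes, 4, high_dollar_volume)

# Combining two masked results stays within the masked subset, because
# arithmetic involving NaN yields NaN.
percent_difference = {
    a: (mean_close_2[a] - mean_close_4[a]) / mean_close_4[a] for a in closes
}

for asset, value in percent_difference.items():
    print(asset, value)
```

Only AAA and CCC get a numeric percent difference. BBB fails the dollar volume filter, so neither moving average is computed for it and the combined result is NaN as well.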
Masks can also be applied to methods that return filters, like top, bottom, and percentile_between.
Masks are most useful when we want to apply a filter in the earlier steps of a combined computation. For example, suppose we want to get the 50 securities with the highest open price that are also in the top 10% of dollar volume. Suppose that we then want the 90th-100th percentile of these securities by close price. We can do this with the following:
# Dollar volume factor
dollar_volume = AverageDollarVolume(window_length=30)
# High dollar volume filter
high_dollar_volume = dollar_volume.percentile_between(90,100)
# Top open price filter (high dollar volume securities)
top_open_price = EquityPricing.open.latest.top(50, mask=high_dollar_volume)
# Top percentile close price filter (high dollar volume, top 50 open price)
high_close_price = EquityPricing.close.latest.percentile_between(90, 100, mask=top_open_price)
Let's put this into make_pipeline and output an empty pipeline screened with our high_close_price filter.
def make_pipeline():
    # Dollar volume factor
    dollar_volume = AverageDollarVolume(window_length=30)
    # High dollar volume filter
    high_dollar_volume = dollar_volume.percentile_between(90, 100)
    # Top open price filter (high dollar volume securities)
    top_open_price = EquityPricing.open.latest.top(50, mask=high_dollar_volume)
    # Top percentile close price filter (high dollar volume, top 50 open price)
    high_close_price = EquityPricing.close.latest.percentile_between(90, 100, mask=top_open_price)
    return Pipeline(
        screen=high_close_price
    )
Running this pipeline outputs 5 securities on Jan 5th, 2010.
result = run_pipeline(make_pipeline(), start_date='2010-01-05', end_date='2010-01-05')
print(f'Number of securities that passed the filter: {len(result)}')
result
Number of securities that passed the filter: 5
| date | asset |
|---|---|
| 2010-01-05 | Equity(FIBBG000QXWHD1 [BIDU]) |
| | Equity(FIBBG000DWCFL4 [BRK.A]) |
| | Equity(FIBBG000DWG505 [BRK.B]) |
| | Equity(FIBBG000BHLYP4 [CME]) |
| | Equity(FIBBG009S39JX6 [GOOGL]) |
Note that applying masks in layers as we did above can be thought of as an "asset funnel".
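The funnel idea can be sketched without Zipline by narrowing a universe stage by stage. Everything here is an assumption made for illustration: the 1000 synthetic assets, their made-up values, and the helpers `top_n` and `top_percent`, where `top_percent` is a simplified stand-in for a top-decile percentile filter and is not claimed to match the exact semantics of percentile_between.

```python
# Hypothetical universe of 1000 assets with made-up, unique values.
assets = [f"A{i:04d}" for i in range(1000)]
dollar_volume = {a: i for i, a in enumerate(assets)}
open_price = {a: (i * 37) % 1000 for i, a in enumerate(assets)}
close_price = {a: (i * 53) % 1000 for i, a in enumerate(assets)}


def top_n(values, n, universe):
    """Return the n assets in universe with the highest values."""
    return sorted(universe, key=lambda a: values[a], reverse=True)[:n]


def top_percent(values, percent, universe):
    """Return the highest `percent` percent of universe, ranked by values."""
    n = len(universe) * percent // 100
    return top_n(values, n, universe)


# Stage 1: top 10% of the full universe by dollar volume.
stage1 = top_percent(dollar_volume, 10, assets)
# Stage 2: top 50 by open price, ranked only among stage 1 survivors.
stage2 = top_n(open_price, 50, stage1)
# Stage 3: top 10% by close price, ranked only among stage 2 survivors.
stage3 = top_percent(close_price, 10, stage2)

print(len(assets), len(stage1), len(stage2), len(stage3))  # 1000 100 50 5
```

Each stage ranks only the survivors of the previous one, so the universe shrinks from 1000 to 100 to 50 to 5 in this sketch.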
Next Lesson: Classifiers