A Filter is a function from an asset and a moment in time to a boolean:
F(asset, timestamp) -> boolean
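To make the definition concrete, here is a minimal plain-Python sketch of a filter as a function of an asset and a timestamp. It does not use Zipline: the `closes` table, its prices, and the `close_above_20` function are hypothetical names chosen only for illustration.

```python
from datetime import date

# Hypothetical closing prices keyed by (asset, date); the values are made up.
closes = {
    ("AAA", date(2010, 1, 5)): 25.0,
    ("BBB", date(2010, 1, 5)): 12.5,
}

def close_above_20(asset: str, timestamp: date) -> bool:
    """A filter in the F(asset, timestamp) -> boolean sense."""
    return closes[(asset, timestamp)] > 20

# Evaluate the filter for every (asset, date) pair we have data for.
results = {asset: close_above_20(asset, day) for (asset, day) in closes}
print(results)  # {'AAA': True, 'BBB': False}
```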
In Pipeline, Filters are used for narrowing down the set of securities included in a computation or in the final output of a pipeline. There are two common ways to create a Filter: comparison operators and Factor/Classifier methods.
from zipline.pipeline import Pipeline, EquityPricing
from zipline.research import run_pipeline
from zipline.pipeline.factors import SimpleMovingAverage
Comparison operators on Factors and Classifiers produce Filters. Since we haven't looked at Classifiers yet, let's stick to examples using Factors. The following example produces a filter that returns True whenever the latest close price is above $20.
last_close_price = EquityPricing.close.latest
close_price_filter = last_close_price > 20
And this example produces a filter that returns True whenever the 10-day mean is below the 30-day mean.
mean_close_10 = SimpleMovingAverage(inputs=EquityPricing.close, window_length=10)
mean_close_30 = SimpleMovingAverage(inputs=EquityPricing.close, window_length=30)
mean_crossover_filter = mean_close_10 < mean_close_30
Remember, each security will get its own True or False value each day.
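The per-security, per-day behavior can be sketched without Zipline. The tickers and prices below are invented, and `moving_average` is a hypothetical helper standing in for `SimpleMovingAverage`, with shorter windows (2 and 4 days rather than 10 and 30) to keep the data small.

```python
def moving_average(prices, window_length):
    """Mean of the last `window_length` prices."""
    window = prices[-window_length:]
    return sum(window) / len(window)

# Hypothetical daily closes, oldest first.
history = {
    "AAA": [10.0, 11.0, 12.0, 13.0],  # rising: short mean is above long mean
    "BBB": [13.0, 12.0, 11.0, 10.0],  # falling: short mean is below long mean
}

# One boolean per security for the most recent day.
short_below_long = {
    ticker: moving_average(prices, 2) < moving_average(prices, 4)
    for ticker, prices in history.items()
}
print(short_below_long)  # {'AAA': False, 'BBB': True}
```

Repeating this calculation for each day in a date range would give one True or False value per security per day, which is the shape of the filter output described above.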
Various methods of the Factor and Classifier classes return Filters. Again, since we haven't yet looked at Classifiers, let's stick to Factor methods for now (we'll look at Classifier methods later). The Factor.top(n) method produces a Filter that returns True for the top n securities of a given Factor. The following example produces a filter that returns True for exactly 200 securities every day, indicating that those securities were in the top 200 by last close price across all known securities.
last_close_price = EquityPricing.close.latest
top_close_price_filter = last_close_price.top(200)
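As a rough illustration of what a top-n filter computes, the following plain-Python sketch marks the n largest values as True. The `top` function here is a hypothetical stand-in, not Zipline's implementation; in particular, skipping missing (NaN) values and breaking ties by sort order are assumptions of this sketch.

```python
import math

def top(values, n):
    """Return {asset: bool}, True for the n assets with the largest values.

    Assets with a missing (NaN) value are never selected in this sketch.
    """
    ranked = sorted(
        (asset for asset, value in values.items() if not math.isnan(value)),
        key=lambda asset: values[asset],
        reverse=True,
    )
    winners = set(ranked[:n])
    return {asset: asset in winners for asset in values}

# Hypothetical last close prices for a single day.
last_close = {"AAA": 310.0, "BBB": 12.5, "CCC": 95.0, "DDD": float("nan")}
top_2 = top(last_close, 2)
print(top_2)  # {'AAA': True, 'BBB': False, 'CCC': True, 'DDD': False}
```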
For a full list of Factor methods that return Filters, see the Factor API Reference.
For a full list of Classifier methods that return Filters, see the Classifier API Reference.
As a starting example, let's create a filter that returns True if a security's 30-day average dollar volume is above $10,000,000. To do this, we'll first need to create an AverageDollarVolume factor to compute the 30-day average dollar volume. Let's include the built-in AverageDollarVolume factor in our imports:
from zipline.pipeline.factors import AverageDollarVolume
And then, let's instantiate our average dollar volume factor.
dollar_volume = AverageDollarVolume(window_length=30)
By default, AverageDollarVolume uses EquityPricing.close and EquityPricing.volume as its inputs, so we don't specify them.
Now that we have a dollar volume factor, we can create a filter with a boolean expression. The following line creates a filter returning True for securities with a dollar_volume greater than 10,000,000:
high_dollar_volume = (dollar_volume > 10000000)
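To show the arithmetic behind this filter, here is a plain-Python sketch for a single security. It assumes that "average dollar volume" means the mean of daily close × volume over the window. The `average_dollar_volume` helper and the 3-day data are hypothetical, and the built-in factor may treat missing values differently.

```python
def average_dollar_volume(closes, volumes):
    """Mean of close * volume over the supplied window."""
    dollar_volumes = [c * v for c, v in zip(closes, volumes)]
    return sum(dollar_volumes) / len(dollar_volumes)

# Hypothetical 3-day window for one security (a real window would be 30 days).
closes = [20.0, 21.0, 22.0]
volumes = [500_000, 600_000, 400_000]

adv = average_dollar_volume(closes, volumes)

# The filter is just the comparison applied to the factor's value.
passes = adv > 10_000_000
print(adv, passes)
```

The daily dollar volumes are 10,000,000, 12,600,000, and 8,800,000, so the average is roughly 10.47 million and the security passes the filter even though one individual day fell below the threshold.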
To see what this filter looks like, let's add it as a column to the pipeline we defined in the previous lesson.
def make_pipeline():
    mean_close_10 = SimpleMovingAverage(inputs=EquityPricing.close, window_length=10)
    mean_close_30 = SimpleMovingAverage(inputs=EquityPricing.close, window_length=30)
    percent_difference = (mean_close_10 - mean_close_30) / mean_close_30
    dollar_volume = AverageDollarVolume(window_length=30)
    high_dollar_volume = (dollar_volume > 10000000)
    return Pipeline(
        columns={
            'percent_difference': percent_difference,
            'high_dollar_volume': high_dollar_volume
        }
    )
If we make and run our pipeline, we now have a column high_dollar_volume with a boolean value corresponding to the result of the expression for each security.
result = run_pipeline(make_pipeline(), start_date='2010-01-05', end_date='2010-01-05')
result
| date | asset | percent_difference | high_dollar_volume |
|---|---|---|---|
| 2010-01-05 | Equity(FIBBG000C2V3D6 [A]) | 0.021425 | True |
| | Equity(QI000000004076 [AABA]) | 0.050484 | True |
| | Equity(FIBBG000BZWHH8 [AACC]) | 0.059385 | False |
| | Equity(FIBBG000V2S3P6 [AACG]) | -0.079614 | False |
| | Equity(FIBBG000M7KQ09 [AAI]) | 0.068811 | True |
| | ... | ... | ... |
| | Equity(FIBBG011MC2100 [AATC]) | -0.047524 | False |
| | Equity(FIBBG000GDBDH4 [BDG]) | NaN | False |
| | Equity(FIBBG000008NR0 [ISM]) | NaN | False |
| | Equity(FIBBG000GZ24W8 [PEM]) | NaN | False |
| | Equity(FIBBG000BB5S87 [HCH]) | 0.045581 | False |
7841 rows × 2 columns
By default, a pipeline produces computed values each day for every asset in the data bundle. Very often, however, we only care about a subset of securities that meet specific criteria (for example, we might only care about securities that have enough daily trading volume to fill our orders quickly). We can tell our Pipeline to ignore securities for which a filter produces False by passing that filter to our Pipeline via the screen keyword.
To screen our pipeline output for securities with a 30-day average dollar volume greater than $10,000,000, we can simply pass our high_dollar_volume filter as the screen argument. This is what our make_pipeline function now looks like:
def make_pipeline():
    mean_close_10 = SimpleMovingAverage(inputs=EquityPricing.close, window_length=10)
    mean_close_30 = SimpleMovingAverage(inputs=EquityPricing.close, window_length=30)
    percent_difference = (mean_close_10 - mean_close_30) / mean_close_30
    dollar_volume = AverageDollarVolume(window_length=30)
    high_dollar_volume = dollar_volume > 10000000
    return Pipeline(
        columns={
            'percent_difference': percent_difference
        },
        screen=high_dollar_volume
    )
When we run this, the pipeline output only includes securities that pass the high_dollar_volume filter on a given day. For example, running this pipeline on Jan 5th, 2010 results in an output for ~1,600 securities:
result = run_pipeline(make_pipeline(), start_date='2010-01-05', end_date='2010-01-05')
print(f'Number of securities that passed the filter: {len(result)}')
result
Number of securities that passed the filter: 1619
| date | asset | percent_difference |
|---|---|---|
| 2010-01-05 | Equity(FIBBG000C2V3D6 [A]) | 0.021425 |
| | Equity(QI000000004076 [AABA]) | 0.050484 |
| | Equity(FIBBG000M7KQ09 [AAI]) | 0.068811 |
| | Equity(QI000000053169 [AAN]) | 0.045988 |
| | Equity(FIBBG000F7RCJ1 [AAP]) | 0.015388 |
| | ... | ... |
| | Equity(FIBBG000BX6PW7 [YELL]) | -0.094294 |
| | Equity(FIBBG000RF0Z26 [YGE]) | 0.056671 |
| | Equity(FIBBG000BH3GZ2 [YUM]) | 0.003000 |
| | Equity(FIBBG000BKPL53 [ZBH]) | 0.010965 |
| | Equity(FIBBG000BX9WL1 [ZION]) | -0.011646 |
1619 rows × 1 columns
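Conceptually, screening drops every row for which the filter is False while leaving the remaining values unchanged. The following plain-Python sketch shows the idea on three made-up securities; `apply_screen` is a hypothetical helper and not part of Zipline.

```python
# Hypothetical per-security values for a single day.
percent_difference = {"AAA": 0.021, "BBB": 0.059, "CCC": -0.080}
high_dollar_volume = {"AAA": True, "BBB": False, "CCC": False}

def apply_screen(columns, screen):
    """Keep only the assets for which the screen is True."""
    return {
        name: {asset: value for asset, value in column.items() if screen[asset]}
        for name, column in columns.items()
    }

output = apply_screen({"percent_difference": percent_difference}, high_dollar_volume)
print(output)  # {'percent_difference': {'AAA': 0.021}}
```

Note that the screen itself does not appear as a column in the output; it only determines which rows survive.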
The ~ operator is used to invert a filter, swapping all True values with False values and vice versa. For example, we can write the following to filter for low dollar volume securities:
low_dollar_volume = ~high_dollar_volume
This will return True for all securities with an average dollar volume below or equal to $10,000,000 over the last 30 days.
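The effect of inversion can be sketched in plain Python with invented values. One detail worth noticing is the boundary case (exactly 10,000,000) and the missing-value case: under IEEE 754 rules a comparison with NaN is False, so its inverse is True. Whether Pipeline treats missing values the same way is an assumption worth checking against your own data.

```python
# Hypothetical 30-day average dollar volumes; NaN marks missing data.
dollar_volume = {"AAA": 25_000_000.0, "BBB": 10_000_000.0, "CCC": float("nan")}

# The original filter: strictly greater than the threshold.
high = {asset: value > 10_000_000 for asset, value in dollar_volume.items()}

# Inversion flips every boolean, so "not greater than" includes "equal to".
low = {asset: not flag for asset, flag in high.items()}

print(high)  # {'AAA': True, 'BBB': False, 'CCC': False}
print(low)   # {'AAA': False, 'BBB': True, 'CCC': True}
```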
Next Lesson: Combining Filters