A classifier is a function from an asset and a moment in time to a categorical output, such as a string or integer label:
F(asset, timestamp) -> category
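To make the signature concrete, here is a minimal stand-alone sketch in plain Python rather than Zipline. The lookup table, the asset names, and the function name exchange_classifier are all hypothetical; the only point is that the output for a given asset and timestamp is a label rather than a number.

```python
from datetime import date

# Hypothetical listing history: asset -> list of (effective_date, exchange label),
# sorted by date. These names and dates are invented for illustration only.
LISTINGS = {
    "ASSET_A": [(date(2000, 1, 3), "XNYS")],
    "ASSET_B": [(date(2000, 1, 3), "XNAS"), (date(2008, 6, 2), "XNYS")],
}


def exchange_classifier(asset, timestamp):
    """Return the most recent exchange label known for `asset` as of `timestamp`.

    Returns None when no label is known yet (the analogue of a missing value).
    """
    label = None
    for effective_date, exchange in LISTINGS.get(asset, []):
        if effective_date <= timestamp:
            label = exchange
    return label


print(exchange_classifier("ASSET_B", date(2005, 1, 3)))
print(exchange_classifier("ASSET_B", date(2010, 1, 5)))
```

The same asset can map to different labels at different timestamps, which is why the timestamp is part of the function's input.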
An example of a classifier producing a string output is the exchange of a security. To create this classifier, we import master from zipline.pipeline and use the latest attribute of master.SecuritiesMaster.Exchange:
from zipline.pipeline import Pipeline, master
from zipline.research import run_pipeline
from zipline.pipeline.factors import AverageDollarVolume
# Since the underlying data of master.SecuritiesMaster.Exchange
# is of type string, .latest returns a Classifier
exchange = master.SecuritiesMaster.Exchange.latest
Previously, we saw that the latest attribute produced an instance of a Factor. In this case, since the underlying data is of type string, latest produces a Classifier.
Similarly, a computation producing the sector of a security is a Classifier. To get the sector, we can again use the SecuritiesMaster dataset.
sector = master.SecuritiesMaster.usstock_Sector.latest
Classifiers can also be used to produce filters with methods like isnull, eq, and startswith. The full list of Classifier methods producing Filters can be found in the API Reference.
As an example, if we wanted a filter to select securities trading on the New York Stock Exchange, we could use the eq method of our exchange classifier.
nyse_filter = exchange.eq('XNYS')
This filter will return True for securities having 'XNYS' as their Exchange.
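As a rough illustration of what these filter-producing methods compute, the sketch below applies the same three tests to a plain dictionary of labels. It does not use Zipline; the asset names and labels are invented, and None stands in for a missing label.

```python
# Hypothetical exchange labels for three made-up assets.
# None stands in for a missing label.
exchange_labels = {"ASSET_A": "XNYS", "ASSET_B": "XNAS", "ASSET_C": None}

# eq: True where the label equals the given value.
is_nyse = {asset: label == "XNYS" for asset, label in exchange_labels.items()}

# isnull: True where the label is missing.
is_missing = {asset: label is None for asset, label in exchange_labels.items()}

# startswith: True where a non-missing label begins with the given prefix.
starts_with_xn = {
    asset: label is not None and label.startswith("XN")
    for asset, label in exchange_labels.items()
}

print(is_nyse)
print(is_missing)
print(starts_with_xn)
```

Each result holds one boolean per asset, which is the shape a Filter takes for a single date.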
Classifiers can also be produced from various Factor methods. The most general of these is the quantiles method, which accepts a bin count as an argument. The quantiles method assigns a label from 0 to (bins - 1) to every non-NaN data point in the factor output and returns a Classifier with these labels. NaNs are labeled with -1. Aliases are available for quartiles (quantiles(4)), quintiles (quantiles(5)), and deciles (quantiles(10)). As an example, this is what a filter for the top decile of a factor might look like:
dollar_volume_decile = AverageDollarVolume(window_length=10).deciles()
top_decile = (dollar_volume_decile.eq(9))
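The labeling rule can be sketched in plain Python for a single date's worth of factor values. This is a simplified rank-based approximation, not Zipline's implementation: the helper name quantile_labels is hypothetical, the input values are invented, and Zipline's own binning may treat ties and uneven group sizes differently.

```python
import math


def quantile_labels(values, bins):
    """Label each non-NaN value 0..bins-1 by ascending rank; NaN gets -1.

    Simplified sketch: values are split into equal-count groups by rank.
    """
    # Pair each valid value with its original position, then sort by value.
    valid = sorted((v, i) for i, v in enumerate(values) if not math.isnan(v))
    labels = [-1] * len(values)
    n = len(valid)
    for rank, (_, position) in enumerate(valid):
        labels[position] = rank * bins // n
    return labels


# Ten invented dollar-volume figures plus one missing value.
nan = float("nan")
dollar_volumes = [5.0, 50.0, nan, 20.0, 90.0, 10.0, 70.0, 30.0, 100.0, 40.0, 60.0]

deciles = quantile_labels(dollar_volumes, 10)
top_decile_mask = [label == 9 for label in deciles]

print(deciles)
print(top_decile_mask)
```

With ten valid values and ten bins, each value lands in its own bin, so the eq(9) test picks out only the largest value, and the NaN entry is labeled -1 and excluded.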
Let's put each of our classifiers into a pipeline and run it to see what they look like.
def make_pipeline():
    exchange = master.SecuritiesMaster.Exchange.latest
    nyse_filter = exchange.eq('XNYS')
    sector = master.SecuritiesMaster.usstock_Sector.latest
    dollar_volume_decile = AverageDollarVolume(window_length=10).deciles()
    top_decile = (dollar_volume_decile.eq(9))
    return Pipeline(
        columns={
            'exchange': exchange,
            'sector': sector,
            'dollar_volume_decile': dollar_volume_decile
        },
        screen=(nyse_filter & top_decile)
    )
result = run_pipeline(make_pipeline(), start_date='2010-01-05', end_date='2010-01-05')
print(f'Number of securities that passed the filter: {len(result)}')
result.head(5)
Number of securities that passed the filter: 471
date | asset | exchange | sector | dollar_volume_decile
---|---|---|---|---
2010-01-05 | Equity(FIBBG000C2V3D6 [A]) | XNYS | Technology | 9
2010-01-05 | Equity(FIBBG000F7RCJ1 [AAP]) | XNYS | Consumer Discretionary | 9
2010-01-05 | Equity(FIBBG000MDCQC2 [COR]) | XNYS | Health Care | 9
2010-01-05 | Equity(FIBBG000B9ZXB4 [ABT]) | XNYS | Health Care | 9
2010-01-05 | Equity(QI000000052857 [ABV]) | XNYS | Consumer Staples | 9
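Every row in the output shows XNYS and decile 9 because the screen keeps a row only when both filters are True for that asset. A minimal stand-alone sketch of that row-wise AND follows; the rows and the helper name passes_screen are hypothetical and the values are invented.

```python
# Hypothetical rows: (asset, exchange label, dollar-volume decile).
rows = [
    ("ASSET_A", "XNYS", 9),
    ("ASSET_B", "XNAS", 9),   # top decile, but not NYSE-listed
    ("ASSET_C", "XNYS", 4),   # NYSE-listed, but not top decile
    ("ASSET_D", "XNYS", -1),  # -1 marks a NaN factor value
]


def passes_screen(exchange, decile):
    """Row-wise AND of the two filters: NYSE-listed and in the top decile."""
    return exchange == "XNYS" and decile == 9


screened = [row for row in rows if passes_screen(row[1], row[2])]
print(screened)
```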
Classifiers are also useful for describing grouping keys for complex transformations on Factor outputs. Grouping operations such as demean are outside the scope of this tutorial.
Next Lesson: Datasets