Classifiers

A classifier is a function from an asset and a moment in time to a categorical output such as a string or integer label:

F(asset, timestamp) -> category
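To make the definition concrete, here is a minimal plain-Python sketch (not Zipline code) of a classifier as a function of an asset and a timestamp. The listing table, the asset names AAA and BBB, and the helper exchange_classifier are all hypothetical. A real Pipeline classifier is computed from a dataset for every asset on each date.

```python
from datetime import date
from typing import Optional

# Hypothetical point-in-time listings: asset -> list of (effective_date, exchange),
# assumed to be sorted by effective_date.
LISTINGS = {
    "AAA": [(date(2005, 1, 1), "XNYS")],
    "BBB": [(date(2005, 1, 1), "XNAS"), (date(2009, 6, 1), "XNYS")],
}

def exchange_classifier(asset: str, timestamp: date) -> Optional[str]:
    """F(asset, timestamp) -> category: latest known exchange on or before timestamp."""
    latest = None
    for effective, exchange in LISTINGS.get(asset, []):
        if effective <= timestamp:
            latest = exchange
    return latest  # None when nothing is known yet
```

The same asset can map to different categories at different times: exchange_classifier("BBB", date(2008, 1, 1)) gives "XNAS", while exchange_classifier("BBB", date(2010, 1, 5)) gives "XNYS".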

One example of a classifier with a string output is the exchange on which a security trades. To create this classifier, we import master from zipline.pipeline and use the latest attribute of master.SecuritiesMaster.Exchange:

In [1]:

from zipline.pipeline import Pipeline, master
from zipline.research import run_pipeline
from zipline.pipeline.factors import AverageDollarVolume

# Since the underlying data of master.SecuritiesMaster.Exchange
# is of type string, .latest returns a Classifier
exchange = master.SecuritiesMaster.Exchange.latest

Previously, we saw that the latest attribute produced an instance of a Factor. In this case, since the underlying data is of type string, latest produces a Classifier.
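The term type that latest returns depends on the column's dtype. Zipline's documentation for BoundColumn.latest describes a Filter for boolean columns, a Classifier for int64 columns, and a Factor otherwise; string columns, as shown here, also yield a Classifier. The sketch below is a simplified model of that dispatch for illustration only. The function name latest_term_type and the dtype strings are assumptions, not Zipline's internals.

```python
def latest_term_type(dtype: str) -> str:
    """Simplified model of which Pipeline term type .latest returns for a dtype."""
    if dtype == "bool":
        return "Filter"
    if dtype in ("int64", "str"):
        # Integer and string data are treated as categorical labels.
        return "Classifier"
    if dtype in ("float64", "datetime64[ns]"):
        return "Factor"
    raise ValueError(f"unsupported dtype: {dtype}")
```

Under this model, a float price column gives a Factor, while the string Exchange column gives a Classifier.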

Similarly, a computation producing the sector of a security is a Classifier. To get the sector, we can again use the SecuritiesMaster dataset.

In [2]:

sector = master.SecuritiesMaster.usstock_Sector.latest

Building Filters from Classifiers

Classifiers can also be used to produce filters with methods like isnull, eq, and startswith. The full list of Classifier methods producing Filters can be found in the API Reference.

As an example, if we want a filter that selects securities trading on the New York Stock Exchange, we can use the eq method of our exchange classifier.

In [3]:

nyse_filter = exchange.eq('XNYS')

This filter returns True for securities whose exchange is 'XNYS', the ISO 10383 market identifier code (MIC) for the New York Stock Exchange.
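Conceptually, each of these methods evaluates a predicate on every asset's label for a given date and yields True or False. The plain-Python sketch below mimics eq, startswith, and isnull on one hypothetical cross-section. The dictionary, the asset names, and the helper to_filter are illustrative only and are not how Zipline implements these methods.

```python
from typing import Callable, Dict, Optional

# Hypothetical exchange labels for one date (None marks a missing value).
cross_section: Dict[str, Optional[str]] = {
    "AAA": "XNYS",
    "BBB": "XNAS",
    "CCC": None,
    "DDD": "XNYS",
}

def to_filter(labels: Dict[str, Optional[str]],
              predicate: Callable[[Optional[str]], bool]) -> Dict[str, bool]:
    """Apply a predicate to each asset's label, producing a boolean mask."""
    return {asset: predicate(label) for asset, label in labels.items()}

# Analogues of exchange.eq('XNYS'), exchange.startswith('XN'), exchange.isnull()
eq_xnys = to_filter(cross_section, lambda lab: lab == "XNYS")
starts_with_xn = to_filter(cross_section, lambda lab: lab is not None and lab.startswith("XN"))
is_null = to_filter(cross_section, lambda lab: lab is None)
```

Here eq_xnys is True only for AAA and DDD, and is_null is True only for CCC.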

Quantiles

Classifiers can also be produced from various Factor methods. The most general of these is the quantiles method, which accepts a bin count as an argument. On each date, quantiles assigns an integer label from 0 to (bins - 1) to every non-NaN data point in the factor output and returns a Classifier with these labels; NaNs are labeled -1. Aliases are available for quartiles (quantiles(4)), quintiles (quantiles(5)), and deciles (quantiles(10)). As an example, this is what a filter for the top decile of a factor might look like:

In [4]:

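# Label each asset 0-9 by its 10-day average dollar volume (9 = highest)
dollar_volume_decile = AverageDollarVolume(window_length=10).deciles()
# Filter: True only for assets in the highest decile
top_decile = dollar_volume_decile.eq(9)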
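The labeling rule can be illustrated for a single date with a small rank-based sketch in plain Python. This is a simplified approximation, not Zipline's implementation: it breaks ties by position and maps ranks evenly onto bins, so its bin edges may differ from Zipline's in edge cases. The helper names quantile_labels and deciles and the dollar-volume values are hypothetical.

```python
import math
from typing import List

def quantile_labels(values: List[float], bins: int) -> List[int]:
    """Rank-based labels: 0..bins-1 for non-NaN values, -1 for NaN."""
    # Pair each non-NaN value with its position, then sort by value (ties by position).
    ranked = sorted((v, i) for i, v in enumerate(values) if not math.isnan(v))
    labels = [-1] * len(values)  # NaNs keep the -1 label
    n = len(ranked)
    for rank, (_, i) in enumerate(ranked):
        # Spread ranks 0..n-1 evenly over bins 0..bins-1 (lowest values get 0).
        labels[i] = rank * bins // n
    return labels

def deciles(values: List[float]) -> List[int]:
    return quantile_labels(values, 10)

# Hypothetical dollar volumes for 20 assets on one date, plus one asset with no data.
dollar_volume = [float(x) for x in range(1, 21)] + [math.nan]
decile = deciles(dollar_volume)
top_decile = [label == 9 for label in decile]
```

With 20 valid values, each decile holds two assets, so top_decile is True for the two largest values and the asset with no data is labeled -1.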

Let's put each of our classifiers into a pipeline and run it to see what they look like.

In [5]:

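def make_pipeline():
    # Classifier: the exchange each security trades on, plus a filter for NYSE
    exchange = master.SecuritiesMaster.Exchange.latest
    nyse_filter = exchange.eq('XNYS')

    # Classifier: the sector of each security
    sector = master.SecuritiesMaster.usstock_Sector.latest

    # Classifier from a Factor: deciles (0-9) of 10-day average dollar volume,
    # plus a filter for the top decile
    dollar_volume_decile = AverageDollarVolume(window_length=10).deciles()
    top_decile = dollar_volume_decile.eq(9)

    return Pipeline(
        columns={
            'exchange': exchange,
            'sector': sector,
            'dollar_volume_decile': dollar_volume_decile
        },
        # Keep only NYSE securities in the top dollar-volume decile
        screen=(nyse_filter & top_decile)
    )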
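The screen keeps only the output rows for which the combined filter is True; & requires both conditions to hold for the same asset on the same date. The plain-Python sketch below mirrors that logic over one hypothetical date. The column names match the pipeline above, but the rows and the apply_screen helper are illustrative assumptions.

```python
from typing import Dict, List

# Hypothetical rows for a single date: asset -> pipeline columns.
rows: Dict[str, Dict[str, object]] = {
    "AAA": {"exchange": "XNYS", "sector": "Technology", "dollar_volume_decile": 9},
    "BBB": {"exchange": "XNAS", "sector": "Technology", "dollar_volume_decile": 9},
    "CCC": {"exchange": "XNYS", "sector": "Health Care", "dollar_volume_decile": 4},
}

def apply_screen(rows: Dict[str, Dict[str, object]]) -> List[str]:
    """Return assets for which nyse_filter & top_decile would both be True."""
    kept = []
    for asset, cols in rows.items():
        nyse_filter = cols["exchange"] == "XNYS"
        top_decile = cols["dollar_volume_decile"] == 9
        if nyse_filter and top_decile:  # mirrors nyse_filter & top_decile
            kept.append(asset)
    return kept
```

Only AAA survives: BBB fails the exchange test and CCC fails the decile test. Note that in Zipline the screen only removes rows from the output; the deciles themselves are still computed across all assets, not just NYSE securities.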

In [6]:

result = run_pipeline(make_pipeline(), start_date='2010-01-05', end_date='2010-01-05')
print(f'Number of securities that passed the filter: {len(result)}')
result.head(5)

Number of securities that passed the filter: 471

Out[6]:

                                        exchange  sector                  dollar_volume_decile
date       asset
2010-01-05 Equity(FIBBG000C2V3D6 [A])   XNYS      Technology              9
           Equity(FIBBG000F7RCJ1 [AAP]) XNYS      Consumer Discretionary  9
           Equity(FIBBG000MDCQC2 [COR]) XNYS      Health Care             9
           Equity(FIBBG000B9ZXB4 [ABT]) XNYS      Health Care             9
           Equity(QI000000052857 [ABV]) XNYS      Consumer Staples        9