Many trading algorithms have the following structure:
There are several technical challenges with doing this robustly. These include:
Pipeline exists to solve these challenges by providing a uniform API for expressing computations on a diverse collection of datasets.
An ideal algorithm design workflow involves a research phase and an implementation phase. In the research phase, we can interact with data and quickly iterate on different ideas in a notebook. Algorithms are then implemented in Zipline, where they can be backtested.
One feature of the Pipeline API is that a pipeline is constructed identically in a research notebook and in a Zipline algorithm. The only difference between using a pipeline in the two environments is how it is run. This makes it easy to iterate on a pipeline design in research and then move it to a Zipline algorithm with a simple copy and paste.
There are three types of computations that can be expressed in a pipeline: factors, filters, and classifiers. Abstractly, all three represent functions that produce a value from an asset and a moment in time. They are distinguished by the types of values they produce.
A factor is a function from an asset and a moment in time to a numerical value. A simple example of a factor is the most recent price of a security. Given a security and a specific point in time, the most recent price is a number. Another example is the 10-day average trading volume of a security. Factors are most commonly used to assign values to securities which can then be used in a number of ways. A factor can be used in each of the following procedures:
A filter is a function from an asset and a moment in time to a boolean. An example of a filter is a function indicating whether a security's price is below $10. Given a security and a point in time, this evaluates to either True or False. Filters are most commonly used for describing sets of assets to include or exclude for some particular purpose.
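To make the factor and filter definitions concrete, here is a minimal plain-Python sketch. It does not use the Pipeline API; it only illustrates the idea that both are functions of an asset and a moment in time. The toy `PRICES` table, the asset names, and the helpers `latest_price`, `average_price`, and `price_below_10` are hypothetical and chosen purely for illustration.

```python
from statistics import mean

# Hypothetical toy data: closing prices per asset, oldest first.
# Index i in each list stands for "day i".
PRICES = {
    "AAA": [9.0, 9.5, 10.5, 11.0],
    "BBB": [4.0, 4.2, 4.1, 3.9],
}


def latest_price(asset, day):
    """Factor: (asset, moment in time) -> number."""
    return PRICES[asset][day]


def average_price(asset, day, window):
    """Factor: mean price over the `window` days ending at `day`."""
    start = max(0, day - window + 1)
    return mean(PRICES[asset][start:day + 1])


def price_below_10(asset, day):
    """Filter: (asset, moment in time) -> bool, built from a factor."""
    return latest_price(asset, day) < 10


# Evaluate each computation for every asset on day 3.
day = 3
factor_values = {asset: latest_price(asset, day) for asset in PRICES}
filter_values = {asset: price_below_10(asset, day) for asset in PRICES}
```

In this sketch the filter is derived from a factor by a comparison, which mirrors the "price is below $10" example in the text.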
A classifier is a function from an asset and a moment in time to a categorical output. More specifically, a classifier produces a string or an int that doesn't represent a numerical value (e.g. an integer label such as a sector code). Classifiers are most commonly used for grouping assets for complex transformations on Factor outputs. An example of a classifier is the exchange on which an asset is currently being traded.
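The grouping role of a classifier can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the Pipeline API: the `EXCHANGE` and `LATEST_PRICE` tables, the asset names, and the `demean_by_group` helper are hypothetical, and the moment in time is held fixed so that each function takes only an asset.

```python
from collections import defaultdict

# Hypothetical toy data for a single moment in time.
EXCHANGE = {"AAA": "NYSE", "BBB": "NYSE", "CCC": "NASDAQ", "DDD": "NASDAQ"}
LATEST_PRICE = {"AAA": 10.0, "BBB": 20.0, "CCC": 50.0, "DDD": 70.0}


def exchange(asset):
    """Classifier: returns a categorical label, not a numerical value."""
    return EXCHANGE[asset]


def demean_by_group(factor_values, classifier):
    """Subtract each group's mean from the factor values of its members."""
    # Bucket assets by the label the classifier assigns to them.
    groups = defaultdict(list)
    for asset in factor_values:
        groups[classifier(asset)].append(asset)

    # Transform the factor output within each bucket.
    result = {}
    for members in groups.values():
        group_mean = sum(factor_values[a] for a in members) / len(members)
        for asset in members:
            result[asset] = factor_values[asset] - group_mean
    return result


demeaned = demean_by_group(LATEST_PRICE, exchange)
```

Here the classifier output is never used arithmetically; it only decides which assets are compared with one another.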
Pipeline computations can be performed using a variety of data such as pricing (OHLC) and volume data, fundamental data, and securities master data. We will explore these datasets in later lessons.
A typical pipeline involves multiple computations and datasets. In this tutorial, we will build up to a pipeline that selects liquid securities with large changes between their 10-day and 30-day average prices.
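As a rough preview of that goal, the selection logic might look like the plain-Python sketch below. It is not written with the Pipeline API, and several choices are assumptions made only for illustration: the generated `HISTORY` data, measuring liquidity by 30-day average dollar volume, and the threshold values `1_000_000` and `0.05`.

```python
from statistics import mean


def make_history(start_price, daily_step, volume, days=30):
    """Build a hypothetical list of (close, volume) rows, oldest first."""
    return [(start_price + daily_step * i, volume) for i in range(days)]


HISTORY = {
    "AAA": make_history(100.0, 1.0, 5_000_000),  # liquid, trending up
    "BBB": make_history(50.0, 0.0, 8_000_000),   # liquid, flat
    "CCC": make_history(20.0, 1.0, 10_000),      # illiquid, trending up
}


def mean_close(asset, window):
    """Factor: average close over the most recent `window` days."""
    return mean(close for close, _ in HISTORY[asset][-window:])


def average_dollar_volume(asset, window):
    """Factor: average of close * volume (an assumed liquidity measure)."""
    return mean(close * volume for close, volume in HISTORY[asset][-window:])


def percent_difference(asset):
    """Factor: relative gap between the 10-day and 30-day average prices."""
    mean_10 = mean_close(asset, 10)
    mean_30 = mean_close(asset, 30)
    return (mean_10 - mean_30) / mean_30


def is_liquid(asset, min_dollar_volume=1_000_000):
    """Filter: True when average dollar volume exceeds the threshold."""
    return average_dollar_volume(asset, 30) > min_dollar_volume


def has_large_change(asset, min_abs_change=0.05):
    """Filter: True when the averages differ by more than the threshold."""
    return abs(percent_difference(asset)) > min_abs_change


# Combine the two filters to pick the final set of assets.
selected = [a for a in HISTORY if is_liquid(a) and has_large_change(a)]
```

With this toy data only `AAA` passes both filters: `BBB` is liquid but flat, and `CCC` moves a lot but trades too little.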
Next Lesson: Creating a Pipeline