Overview
Causal's architecture makes it uniquely useful for building real-time machine learning models. Using Causal:
- Removes a great deal of the schlep and tedium involved in gathering training data from your product
- Keeps that training data up to date as your product changes
- Enables you to deploy and run experiments on machine learning models without deploying new code
Data collection and maintenance
You begin by defining the shape of the data you need for a given product feature. Using Causal's feature definition, you specify each product feature's:
- Inputs: the data that will be fed to the model
- Outputs: the predictions that a model has made, and
- Events: the downstream payoff that results from those predictions.
This data is recorded and assembled into training data: one row for every impression of your product feature, with everything you need to train a model. Inputs become the independent variables of your model, and outputs and events are used to create the dependent variables and (optionally) weights.
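To make the row-per-impression idea concrete, here is a minimal sketch in Python. It is not Causal's actual schema or API: the `Impression` record, the `to_training_row` helper, the `purchase` event name, and the weighting rule are all hypothetical, chosen only to illustrate how inputs, outputs, and events could map onto a training row.

```python
from dataclasses import dataclass, field
from typing import Any


# Hypothetical record shape for illustration only; not Causal's real schema.
@dataclass
class Impression:
    inputs: dict[str, Any]    # data that was fed to the model
    outputs: dict[str, Any]   # predictions the model made
    events: dict[str, float] = field(default_factory=dict)  # event name -> value


def to_training_row(imp: Impression, payoff_event: str = "purchase") -> dict[str, Any]:
    """Turn one impression into one training row (assumed mapping)."""
    row = dict(imp.inputs)  # inputs become the independent variables
    # Keep the recorded prediction alongside the row for later analysis.
    row["predicted_score"] = imp.outputs.get("score")
    # Dependent variable: did the payoff event happen for this impression?
    row["label"] = 1 if payoff_event in imp.events else 0
    # Optional weight: the event's value, defaulting to 1.0 when it is absent.
    row["weight"] = imp.events.get(payoff_event, 1.0)
    return row


impressions = [
    Impression({"cart_total": 80.0, "device": "ios"}, {"score": 0.8}, {"purchase": 25.0}),
    Impression({"cart_total": 12.5, "device": "web"}, {"score": 0.2}),
]
rows = [to_training_row(imp) for imp in impressions]
```

Each element of `rows` is a flat dictionary that could be loaded into a data frame or written to a table. In practice the label and weight rules depend on the product feature being modeled.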
This data is stored in your feature tables in your own data warehouse. As you change the data model for your product feature, Causal updates your data pipelines and schemas automatically, so you don't need to write or maintain complicated, fragile ETL jobs.
Model deployment and experimentation
Causal dramatically simplifies the process of deploying models. The feature definitions above also serve as the contract between the presentation and decision-making layers of your product. In addition to recording your inputs, Causal makes them all available to your models at runtime. That data can be fed to a model hosting platform or REST API. Causal then passes back the model results as the outputs the product feature should use. New models can be rolled out and A/B tested through the Causal tools UI, allowing data science teams to iterate quickly without needing additional work from engineering or DevOps.
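The runtime flow described above (inputs in, model called, outputs back, with an A/B split between models) can be sketched roughly as follows. This is an illustrative assumption, not how Causal is implemented. The function names, the `discount_pct` output, and the hash-based assignment are hypothetical, and plain Python functions stand in for models that would normally sit behind a hosting platform or REST API.

```python
import hashlib
from typing import Any, Callable

# A "model" here is any callable from inputs to outputs (assumed contract).
Model = Callable[[dict[str, Any]], dict[str, Any]]


def baseline_model(inputs: dict[str, Any]) -> dict[str, Any]:
    # Stand-in for the currently deployed model.
    return {"discount_pct": 0}


def candidate_model(inputs: dict[str, Any]) -> dict[str, Any]:
    # Stand-in for a new model under test.
    return {"discount_pct": 10 if inputs["cart_total"] > 50 else 0}


VARIANTS: dict[str, Model] = {"control": baseline_model, "treatment": candidate_model}


def assign_variant(user_id: str, treatment_share: float = 0.5) -> str:
    """Deterministically bucket a user so they always see the same variant."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # value in [0, 1)
    return "treatment" if bucket < treatment_share else "control"


def request_feature(
    user_id: str, inputs: dict[str, Any], treatment_share: float = 0.5
) -> dict[str, Any]:
    """Pick a variant, run its model on the inputs, and return the outputs."""
    variant = assign_variant(user_id, treatment_share)
    outputs = VARIANTS[variant](inputs)
    return {"variant": variant, "outputs": outputs}


result = request_feature("user-1", {"cart_total": 80.0})
```

Because the presentation layer only sees `outputs`, swapping `candidate_model` for another model, or changing `treatment_share`, requires no change to the calling code. That is the property the feature-definition contract is meant to provide.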
The next sections will discuss some design decisions we made to support machine learning in Causal, and how you can use Causal to deploy models inside your features.
Throughout this section, when we use the word "feature" we mean "a user-facing piece of your product or application," not "a feature of a machine learning model."