Inputs
Each time a feature is shown to a user, Causal records whatever inputs you've specified as necessary in that feature's definition. This is data that is available before a feature is rendered, like the user's IP address or browsing history.
This context lets you make different decisions in different situations.¹ All of this contextual data is available before the model runs, and it is fed to your model during training.
For example, if you are building a cross sell feature, you can record the current product page the user is looking at, the last few items a user has looked at, and the user's demographics. If you are building a search page algorithm, you can record the query string, relevance scores, and the user's past purchases.
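To make the recording step concrete, here is a minimal Python sketch. It assumes a feature definition is simply a name plus a list of declared input names. `FeatureDefinition`, `Impression`, and `record_impression` are hypothetical illustrations, not Causal's API, and the impression table is just an in-memory list. Only the declared inputs are kept, and they are copied at display time so later changes cannot rewrite history.

```python
import copy
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class FeatureDefinition:
    """Hypothetical stand-in for a feature definition: a name plus its declared inputs."""

    name: str
    inputs: tuple[str, ...]


@dataclass
class Impression:
    """One recorded display of a feature, with a snapshot of its inputs."""

    feature: str
    inputs: dict[str, Any]
    recorded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def record_impression(
    table: list[Impression], feature: FeatureDefinition, inputs: dict[str, Any]
) -> Impression:
    """Append an impression holding exactly the inputs the definition declares."""
    missing = [name for name in feature.inputs if name not in inputs]
    if missing:
        raise ValueError(f"{feature.name} is missing inputs: {missing}")
    # Deep-copy so later changes by the caller cannot alter what was recorded.
    snapshot = {name: copy.deepcopy(inputs[name]) for name in feature.inputs}
    impression = Impression(feature=feature.name, inputs=snapshot)
    table.append(impression)
    return impression


# Usage with the cross-sell example; all values are made up.
cross_sell = FeatureDefinition(
    name="CrossSell",
    inputs=("current_product", "recent_views", "demographics"),
)
impressions: list[Impression] = []
recent = ["sku-88", "sku-42"]
record_impression(
    impressions,
    cross_sell,
    {
        "current_product": "sku-123",
        "recent_views": recent,
        "demographics": {"age_band": "25-34"},
        "session_id": "abc",  # not declared in the definition, so not recorded
    },
)
recent.append("sku-7")  # does not affect the recorded snapshot
```

Declaring inputs up front is what lets the recorder reject an impression that is missing data a model will later need for training.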
This is typically much more data than you would record with other systems, and that is fine: Causal is designed to collect large volumes of data efficiently, so a feature can record all the contextual information a data scientist needs to train a model.
Every time a feature is displayed, its inputs are recorded in the impression table and sent to your model at runtime. Because the training data and the runtime values are produced by the same process, your training data always matches what the model sees at runtime.
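Why the two sides match can be sketched in a few lines of Python. The assumption here is a single runtime code path (the hypothetical `serve` function) that records the inputs and then scores that same snapshot; the training set is just the recorded rows read back, so nothing is recomputed and nothing can drift. `ImpressionTable`, `serve`, `training_rows`, and `toy_model` are illustrative names only, not part of Causal.

```python
import copy
from typing import Any, Callable

Inputs = dict[str, Any]


class ImpressionTable:
    """Hypothetical in-memory impression table: one row per displayed feature."""

    def __init__(self) -> None:
        self.rows: list[Inputs] = []


def serve(table: ImpressionTable, model: Callable[[Inputs], float], inputs: Inputs) -> float:
    """Runtime path: snapshot the inputs, record them, and score that same snapshot."""
    snapshot = copy.deepcopy(inputs)
    table.rows.append(snapshot)
    return model(snapshot)


def training_rows(table: ImpressionTable) -> list[Inputs]:
    """Training path: read back the rows written by serve(), with no recomputation."""
    return [copy.deepcopy(row) for row in table.rows]


def toy_model(inputs: Inputs) -> float:
    # Placeholder scoring rule standing in for a trained model.
    return 0.1 * len(inputs["recent_views"])


table = ImpressionTable()
requests = [{"recent_views": ["a", "b"]}, {"recent_views": ["c"]}]
scores = [serve(table, toy_model, r) for r in requests]
```

The design choice that matters is that training never re-derives inputs from live systems; it only reads what was recorded when the feature was shown.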
Contextual Feature Store
Causal is a contextual feature store. We guarantee that training and runtime data line up, and we give you the same time-shifting (point-in-time) guarantees as heavyweight feature stores. But we do it by recording the data rather than by introducing another data-handling system into your environment.
For example, say you have a Redis database that stores data about your customers. To use that data in a new model, you simply use a plugin that calls Redis and retrieves it. The data stays in Redis, and your current application is not disrupted. Causal just records the retrieved data for training and serves it to your model at runtime.
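The Redis pattern, and why recording gives you time-shifting without a separate store, can be sketched as follows. This is a minimal Python illustration that assumes a plain dictionary stands in for the Redis database and a hypothetical `segment_plugin` function plays the role of the plugin; in a real application the lookup would be a Redis client call such as a `GET`. The key `customer:42:segment` and the segment values are made up.

```python
import copy
from typing import Any

# Stand-in for the existing Redis database (hypothetical key layout).
customer_store: dict[str, str] = {"customer:42:segment": "new"}

# Recorded impressions; training data is read from here.
impressions: list[dict[str, Any]] = []


def segment_plugin(user_id: str) -> str:
    """Hypothetical plugin: read the value from the store the app already uses."""
    return customer_store[f"customer:{user_id}:segment"]


def show_feature(user_id: str) -> dict[str, Any]:
    inputs = {"user_id": user_id, "segment": segment_plugin(user_id)}
    impressions.append(copy.deepcopy(inputs))  # record what was retrieved
    return inputs  # and hand the same values to the model at runtime


show_feature("42")
customer_store["customer:42:segment"] = "loyal"  # the customer's data changes later

# The training row keeps the value seen at display time, not today's value.
recorded = impressions[0]["segment"]
recomputed = segment_plugin("42")
```

Because the impression stores the value that was actually retrieved, a training row reflects what the model saw when the feature was displayed, even after the customer's record in Redis has changed.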
Other feature stores require you to move the data into their store, which means either maintaining two systems or moving your current applications onto their store. This is especially problematic when some of your feature calculations cannot be moved, as with the Elastic Learn to Rank plugin.
¹ For machine learning and other reasons, like selecting which audience to display a feature to.