Impressions, Memoization, and Caching
An impression occurs when your feature is displayed to a user.
A typical web app can render the same feature many times. This is particularly common in React, because the component tree can re-render numerous times for every state change. Not every render should count as an impression, so it is important to de-dupe renders accordingly.
In Causal, each impression has an impression id associated with it. Causal considers requests for impressions that share the same id, feature, and arguments to be the same impression. You can pass explicit impression ids to Causal in useImpression. Doing so tells Causal which impressions should be considered distinct from each other.
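The de-dupe rule above (same id, feature, and arguments means same impression) can be illustrated with a minimal sketch. This is not Causal's implementation: the ImpressionDeduper class, the impressionKey helper, and the ProductInfo feature name are hypothetical, chosen only to show how a re-render with an unchanged id collapses into the existing impression.

```typescript
// Hypothetical sketch (not the Causal client): decide whether a request is a
// new impression or a repeat, such as a React re-render, of one already seen.
type FeatureArgs = Record<string, string | number | boolean>;

// Build a stable key from (impression id, feature, arguments).
// Argument names are sorted so {a, b} and {b, a} produce the same key.
function impressionKey(impressionId: string, feature: string, args: FeatureArgs): string {
  const sortedArgs = Object.keys(args)
    .sort()
    .map((name) => [name, args[name]]);
  return JSON.stringify([impressionId, feature, sortedArgs]);
}

class ImpressionDeduper {
  private seen = new Set<string>();

  // Returns true the first time an (id, feature, args) triple is seen,
  // and false for every later request with the same triple.
  record(impressionId: string, feature: string, args: FeatureArgs): boolean {
    const key = impressionKey(impressionId, feature, args);
    if (this.seen.has(key)) {
      return false;
    }
    this.seen.add(key);
    return true;
  }
}

const deduper = new ImpressionDeduper();
console.log(deduper.record("imp-1", "ProductInfo", { productId: 42 })); // true: new impression
console.log(deduper.record("imp-1", "ProductInfo", { productId: 42 })); // false: same impression re-rendered
console.log(deduper.record("imp-2", "ProductInfo", { productId: 42 })); // true: new id, distinct impression
```

Changing only the id is enough to make Causal treat a request as a distinct impression, which is why choosing ids deliberately gives you control over impression lifespans.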
There are several perfectly reasonable strategies to de-dupe renders into impressions:
- Session level. Treat all requests for a feature (with the same arguments) within a user's session as the same impression.
- Web request level. Treat all requests for a feature (with the same arguments) for a web request as the same impression.
- Component level. Treat all requests for a feature (with the same arguments) during a component's lifecycle (mount/unmount) as the same impression.
- De-dupe at a fixed interval. Consider all requests for a feature (with the same arguments) within a fixed timespan to be the same impression.
By default, the Causal TypeScript client uses the third strategy: it de-dupes at the component level. The other clients do not de-dupe.
We recommend explicitly passing in your own impression ids. You know more about your application and its impression lifecycle than Causal does, and doing so has further advantages, detailed below.
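If you generate your own impression ids, each de-dupe strategy reduces to a rule for when a new id is minted. The helpers below are a hypothetical sketch of those four rules; the function names and id formats are illustrative assumptions, not part of the Causal API.

```typescript
// Hypothetical helpers: each returns an impression id you could pass to
// Causal under one of the four de-dupe strategies. Requests that share an id
// (and the same feature and arguments) count as a single impression.

// Session level: every request in the session reuses one id.
function sessionImpressionId(sessionId: string): string {
  return `session-${sessionId}`;
}

// Web request level: one id per incoming web request.
function requestImpressionId(requestId: string): string {
  return `request-${requestId}`;
}

// Component level: call once when a component mounts and keep the result
// (for example in React state) so re-renders reuse it; the next mount gets a new id.
let mountCount = 0;
function componentImpressionId(): string {
  mountCount += 1;
  return `mount-${mountCount}`;
}

// Fixed interval: all requests from a user in the same time bucket share an id.
function intervalImpressionId(userId: string, nowMs: number, intervalMs: number): string {
  const bucket = Math.floor(nowMs / intervalMs);
  return `interval-${userId}-${bucket}`;
}

const fiveMinutes = 5 * 60 * 1000;
console.log(intervalImpressionId("user-7", 1000, fiveMinutes)); // interval-user-7-0
console.log(intervalImpressionId("user-7", 299999, fiveMinutes)); // interval-user-7-0 (same bucket)
console.log(intervalImpressionId("user-7", 300000, fiveMinutes)); // interval-user-7-1 (new bucket)
```

The module-level counter is only for illustration; it resets on every page load, so a real app would more likely use a random id (for example crypto.randomUUID() in the browser) or an id your backend already assigns.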
Impression IDs and the data warehouse
In addition to controlling impression lifespans, impression ids pass through to the data warehouse. If you pass explicit impression ids to the Causal APIs, your ids will end up in your data warehouse (as opposed to ids that Causal generated internally).
This has a couple of interesting benefits when you already have a notion of impression identifiers in your application.
First, you do not need any special mapping logic to tie a Causal impression table to other data in your data warehouse. You can simply join the Causal impression table with other tables in your warehouse.
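In the warehouse this is an ordinary SQL join. The in-memory sketch below shows the same logic under stated assumptions: the row shapes (CausalImpressionRow, PageViewRow) and their fields are hypothetical, and each Causal row is assumed to hold an array of impression ids, because one row can cover several memoized impressions.

```typescript
// Hypothetical row shapes. Causal's real impression table has its own schema;
// these fields only illustrate joining on an id you generated yourself.
interface CausalImpressionRow {
  impressionIds: string[]; // a row may cover several impressions
  feature: string;
  output: string;
}

interface PageViewRow {
  impressionId: string; // the same id your app passed to Causal
  pageUrl: string;
}

interface JoinedRow {
  impressionId: string;
  feature: string;
  output: string;
  pageUrl: string;
}

// Inner join on impression id: index your table by id, then expand each Causal row.
// Assumes impressionId is unique in your own table.
function joinOnImpressionId(impressions: CausalImpressionRow[], pageViews: PageViewRow[]): JoinedRow[] {
  const pageViewsById = new Map<string, PageViewRow>();
  pageViews.forEach((pv) => pageViewsById.set(pv.impressionId, pv));

  const joined: JoinedRow[] = [];
  impressions.forEach((row) => {
    row.impressionIds.forEach((id) => {
      const pv = pageViewsById.get(id);
      if (pv !== undefined) {
        joined.push({ impressionId: id, feature: row.feature, output: row.output, pageUrl: pv.pageUrl });
      }
    });
  });
  return joined;
}
```

No lookup table is needed between the two sides because both already carry the id your application chose.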
Second, it makes it simple to log feature data in distributed systems. For example, say you calculate content in a microservice. That content goes through a hydration stage and is then sent to your front end. You can call Causal inside your microservice in order to use feature outputs to generate your content.
However, events on that feature occur in the browser. Because you supplied your own impression IDs, you can register the event through a different Causal API and tie it to the impression. Since Causal's impression IDs are the same as yours, your front end will already have the correct ID in hand.
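The flow just described can be sketched end to end. This is a hypothetical illustration: logImpressionOnServer and the browserEvents array stand in for the real Causal server-side and event APIs, whose actual names and signatures are not shown here, and FeedRanking is an invented feature name.

```typescript
// Hypothetical end-to-end sketch. logImpressionOnServer stands in for the
// Causal call made inside the microservice; it is NOT the Causal API.
interface HydratedContent {
  impressionId: string; // your own id, shared with Causal and the browser
  items: string[];
}

const serverLog: string[] = [];
function logImpressionOnServer(impressionId: string, feature: string): void {
  serverLog.push(`${feature}:${impressionId}`);
}

// Microservice: choose the impression id, use it for the Causal request,
// and ship it to the front end along with the hydrated content.
function buildContent(requestId: string): HydratedContent {
  const impressionId = `feed-${requestId}`;
  logImpressionOnServer(impressionId, "FeedRanking");
  return { impressionId, items: ["item-1", "item-2"] };
}

// Browser: the id arrived with the content, so the click can reference it
// without any lookup. browserEvents stands in for the event API call.
const browserEvents: Array<{ impressionId: string; event: string }> = [];
function onItemClick(content: HydratedContent): void {
  browserEvents.push({ impressionId: content.impressionId, event: "click" });
}

const payload = buildContent("req-123");
onItemClick(payload);
console.log(serverLog[0]); // FeedRanking:feed-req-123
console.log(browserEvents[0].impressionId); // feed-req-123
```

The key design point is that the id travels with the content, so neither side has to ask the other for it.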
Causal uses memoization to de-duplicate feature responses. That is, impressions that have the same argument values are guaranteed to get the same output values from Causal for that user. That is why each row of a Causal impression table holds a vector of impression IDs instead of a single one: a row may represent several different impressions of the feature. If you want to alter this behavior, we recommend adding an explicit argument to bust the cache.
Memoization has three important benefits:
- It compresses the impression table, so identical impressions don't take up additional space on disk.
- It makes client-side caching very efficient: many requests that would otherwise have to wait for a server round trip return directly from local memory in the browser.
- It groups like impressions together, which makes assembling learning-to-rank training sets very easy.[1]
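A minimal sketch of per-user memoization helps show where the vector of impression ids comes from and how an extra argument busts the cache. It is an illustration under assumptions, not Causal's implementation: MemoizedFeatures, the assign callback, and the cacheBust argument name are all hypothetical.

```typescript
// Hypothetical sketch of per-user memoization; not Causal's implementation.
type MemoArgs = Record<string, string | number>;

interface MemoRow {
  userId: string;
  feature: string;
  output: string;
  impressionIds: string[]; // every impression that received this output
}

type AssignFn = (userId: string, feature: string, args: MemoArgs) => string;

class MemoizedFeatures {
  private rows = new Map<string, MemoRow>();
  private assign: AssignFn;

  constructor(assign: AssignFn) {
    this.assign = assign; // computes an output the first time a key is seen
  }

  request(userId: string, feature: string, args: MemoArgs, impressionId: string): string {
    // Same user + feature + arguments => same key => same memoized output.
    const sortedArgs = Object.keys(args).sort().map((name) => [name, args[name]]);
    const key = JSON.stringify([userId, feature, sortedArgs]);
    let row = this.rows.get(key);
    if (row === undefined) {
      row = { userId, feature, output: this.assign(userId, feature, args), impressionIds: [] };
      this.rows.set(key, row);
    }
    row.impressionIds.push(impressionId);
    return row.output;
  }

  // One entry per distinct (user, feature, arguments), like impression-table rows.
  table(): MemoRow[] {
    return Array.from(this.rows.values());
  }
}

let assignCalls = 0;
const memo = new MemoizedFeatures(() => {
  assignCalls += 1;
  return `variant-${assignCalls}`;
});
console.log(memo.request("u1", "Banner", { page: "home" }, "imp-1")); // variant-1
console.log(memo.request("u1", "Banner", { page: "home" }, "imp-2")); // variant-1 (memoized)
console.log(memo.request("u1", "Banner", { page: "home", cacheBust: 1 }, "imp-3")); // variant-2 (new key)
console.log(memo.table()[0].impressionIds.join(", ")); // imp-1, imp-2
```

The two memoized impressions share one row, while the extra argument produces a separate row with its own output.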
To enhance performance, Causal caches feature values in the web browser's local storage. If a request for a feature matches a feature in the cache, the cached value is used immediately, without a network round trip. The impression server has very low latency, but no latency is even better.
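A read-through cache like the one described can be sketched as follows. The KeyValueStorage interface mirrors the getItem/setItem subset of the browser's Storage API, so window.localStorage could be passed in; the fetchFeature function, the key prefix, and the assumption that outputs are JSON-serializable are all hypothetical and not taken from the Causal client.

```typescript
// Sketch of a read-through cache in front of the feature server. The
// KeyValueStorage interface matches the getItem/setItem subset of the
// browser's Storage API, so window.localStorage could be passed in.
interface KeyValueStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// In-memory stand-in so the sketch also runs outside a browser.
class MemoryStorage implements KeyValueStorage {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    const value = this.data.get(key);
    return value === undefined ? null : value;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}

type CacheArgs = Record<string, string | number>;
// Hypothetical network call to the impression server.
type FetchFeature = (feature: string, args: CacheArgs) => Promise<unknown>;

// Assumes feature outputs are JSON-serializable.
async function getFeature(
  storage: KeyValueStorage,
  fetchFeature: FetchFeature,
  feature: string,
  args: CacheArgs,
): Promise<unknown> {
  const sortedArgs = Object.keys(args).sort().map((name) => [name, args[name]]);
  const key = `causal-sketch:${feature}:${JSON.stringify(sortedArgs)}`;
  const cached = storage.getItem(key);
  if (cached !== null) {
    return JSON.parse(cached); // cache hit: no network round trip
  }
  const value = await fetchFeature(feature, args); // cache miss: ask the server
  storage.setItem(key, JSON.stringify(value));
  return value;
}
```

Because memoization guarantees the same output for the same arguments, a cached value stays valid until the feature itself changes.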
Cached values can optionally be purged when features are updated in the web UI. This includes changing feature outputs, altering A/B tests, or updating feature flags.
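One way to implement such purging is to stamp cached values with a per-feature version. The sketch below assumes, hypothetically, that the client can learn a version number for each feature that changes whenever the feature is edited; Causal's real purge mechanism may work differently, and VersionedFeatureCache is an invented name.

```typescript
// Hypothetical sketch: assume the client can learn a version number per
// feature that changes whenever the feature is edited in the web UI
// (outputs, A/B tests, feature flags). Causal's real purge mechanism may differ.
class VersionedFeatureCache {
  private values = new Map<string, unknown>();
  private versions = new Map<string, number>();

  get(feature: string, argsKey: string): unknown {
    return this.values.get(`${feature}|${argsKey}`);
  }

  set(feature: string, argsKey: string, value: unknown): void {
    this.values.set(`${feature}|${argsKey}`, value);
  }

  // Record the latest version; if it differs from the one we cached under,
  // drop every cached value for that feature. Assumes names contain no "|".
  syncVersion(feature: string, version: number): void {
    const known = this.versions.get(feature);
    if (known !== undefined && known !== version) {
      const prefix = `${feature}|`;
      Array.from(this.values.keys())
        .filter((key) => key.startsWith(prefix))
        .forEach((key) => this.values.delete(key));
    }
    this.versions.set(feature, version);
  }
}

const featureCache = new VersionedFeatureCache();
featureCache.syncVersion("Banner", 1);
featureCache.set("Banner", "page=home", { color: "blue" });
featureCache.syncVersion("Banner", 2); // feature edited in the web UI
console.log(featureCache.get("Banner", "page=home")); // undefined
```

Purging per feature keeps unrelated cached values intact, so one edit does not force every feature back to the network.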
[1] Learning-to-rank training requires the set of items you showed and all the clicks on that set. Because impressions that serve the same content are memoized into the same data warehouse row, all of their events land in that row too. Data scientists can therefore build a learning-to-rank training set directly from a Causal impression table without any extra ETL.