Causal's Feature Data Model
Causal collects data about and optimizes your product's customer-facing features. A feature is anything on your front end that you'd like to improve using data-driven techniques. For example:
- A call to action that you'd like to improve through an A/B test
- A set of personalized cross-sells that you'd like to improve with a new recommender algorithm
- A signup funnel whose conversion you'd like to improve using advanced analytics
- A search result page that you'd like to make more relevant using a learn-to-rank algorithm
Developers define features in our Feature Development Language (FDL), a powerful modelling language that is based on GraphQL. FDL defines the contract between the presentation layer and data layer. For each feature, developers use the FDL to define:
- Context: which data to collect for each impression of the feature from the front-end (e.g. user ID, IP address, time of day, user inputs, etc.) and back-end (e.g. browsing history, search history, etc.)
- Attributes: the names and types of data the front-end expects to receive from Causal (e.g. a string, an integer, an enumeration, a particular machine learning model, etc.)
- Events: which user actions to record that occur after the feature is shown (e.g. button clicks, adding products to cart, saving items, etc.)
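To make the three parts concrete, here is a minimal sketch of how one feature (a call-to-action button) could be modelled. It is written in Python rather than FDL, and it is not FDL syntax: every class and field name below (CtaContext, CtaAttributes, CtaImpression, ButtonColor) is a hypothetical name chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class ButtonColor(Enum):
    """Hypothetical enumeration-typed attribute value."""
    BLUE = "blue"
    GREEN = "green"


@dataclass(frozen=True)
class CtaContext:
    """Context: data collected for each impression of the feature."""
    user_id: str
    ip_address: str
    hour_of_day: int


@dataclass(frozen=True)
class CtaAttributes:
    """Attributes: typed values the front end expects to receive."""
    label: str
    color: ButtonColor


@dataclass
class CtaImpression:
    """One impression: its context, the attributes served, and later events."""
    context: CtaContext
    attributes: CtaAttributes
    events: list[str] = field(default_factory=list)

    def record(self, event_name: str) -> None:
        # Events are user actions that occur after the feature is shown.
        self.events.append(event_name)


impression = CtaImpression(
    context=CtaContext(user_id="u-123", ip_address="203.0.113.7", hour_of_day=14),
    attributes=CtaAttributes(label="Sign up", color=ButtonColor.GREEN),
)
impression.record("click")
```

The point of the sketch is only the shape: context and attributes are fixed when the feature is shown, while events accumulate on the same record afterwards.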
FDL's data model differs from those of other data-logging systems. Once you have written your FDL file, Causal provides the following benefits out of the box.
Efficient, Accessible Data
FDL's data model is much richer than a simple event store. Events in other systems are simple chunks of data recorded at a single instant in time. A Causal feature, by contrast, collects all the information needed to reason about that feature on your website, regardless of when each piece of it occurs.
Typically, data engineers would have to write an ETL job to reshape your data warehouse into this format before data scientists and analysts could use your event data. Because FDL lets you describe up front how you want the data to look, Causal sidesteps that whole process. You get ready-to-use tables without any extra work on your part.
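For a sense of the ETL work being avoided, the following is a rough sketch of the kind of reshaping a plain event store usually needs: folding a flat event log into one row per impression. The event layout, the field names, and the fold_into_impressions helper are all assumptions made for illustration; they do not describe Causal's internals.

```python
from collections import defaultdict

# Hypothetical flat event log, as a plain event store might hold it.
raw_events = [
    {"impression_id": "i1", "type": "impression", "user_id": "u1", "ts": 100},
    {"impression_id": "i2", "type": "impression", "user_id": "u2", "ts": 105},
    {"impression_id": "i1", "type": "click", "ts": 130},
    {"impression_id": "i1", "type": "add_to_cart", "ts": 190},
]


def fold_into_impressions(events):
    """Group follow-up events under the impression they belong to."""
    rows = {}
    followups = defaultdict(list)
    for event in events:
        if event["type"] == "impression":
            rows[event["impression_id"]] = {
                "impression_id": event["impression_id"],
                "user_id": event["user_id"],
                "shown_at": event["ts"],
            }
        else:
            followups[event["impression_id"]].append(event["type"])
    # One row per impression, with its events attached whenever they happened.
    return [dict(row, events=followups[iid]) for iid, row in rows.items()]


feature_rows = fold_into_impressions(raw_events)
```

Each resulting row holds everything about one impression, which is the per-feature shape that analysts would otherwise have to build by hand.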
The data is stored in ORC format and is directly accessible from most data warehouses. Causal generates the DDL to create these tables in your warehouse, so they are accessible just like native tables.
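As an illustration of what generating DDL from a schema can look like, here is a small sketch that renders a Hive-style external table over ORC files. The type mapping, table name, column names, and storage location are hypothetical, and the DDL that Causal actually emits may look quite different.

```python
# Hypothetical mapping from schema field types to Hive-style column types.
TYPE_MAP = {"string": "STRING", "int": "BIGINT", "float": "DOUBLE", "bool": "BOOLEAN"}


def create_table_ddl(table, fields, location):
    """Render a Hive-style external table definition over ORC files."""
    columns = ",\n".join(f"  {name} {TYPE_MAP[kind]}" for name, kind in fields)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n{columns}\n)\n"
        f"STORED AS ORC\nLOCATION '{location}'"
    )


ddl = create_table_ddl(
    "cta_impressions",
    [("user_id", "string"), ("hour_of_day", "int"), ("clicked", "bool")],
    "s3://example-bucket/cta_impressions/",  # placeholder location
)
print(ddl)
```

Because the column list comes from the same definition that describes the feature, the table and the collected data cannot drift apart in this sketch.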
Clean, Accurate Data
Causal prevents missing data, misspelled fields, and incorrect values, and with them the need to write ugly ETLs to work around these mistakes later. Because Causal generates strongly typed APIs from FDL, these errors are caught at compile time, before a front-end engineer runs the code for the first time.
Causal guarantees that there will never be mismatches between the front-end data collection and the data warehouse because the front-end API and back-end data warehouse are automatically generated from the same FDL definitions.
As features are added, deprecated, and removed, Causal automatically updates the data warehouse tables based on the new front-end tracking. When you add tracking data, it automatically shows up in your warehouse table without any extra data engineering effort. When tracking data is deprecated and removed, views into that data are automatically updated to hide the obsolete columns.[1] When you try to make a breaking change to your data, the FDL compiler will warn you and prevent you from making serious mistakes.
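The idea of flagging breaking changes can be sketched as a comparison between two versions of a schema. The rules used here (removing a field or changing its type is breaking, adding a field is safe) are assumptions for illustration, as is the breaking_changes helper; the checks the FDL compiler actually performs are not specified here.

```python
def breaking_changes(old_schema, new_schema):
    """Return human-readable problems that would break existing data."""
    problems = []
    for name, old_type in old_schema.items():
        if name not in new_schema:
            problems.append(f"field '{name}' was removed")
        elif new_schema[name] != old_type:
            problems.append(
                f"field '{name}' changed type from {old_type} to {new_schema[name]}"
            )
    # Fields that appear only in new_schema are additions, treated as safe.
    return problems


old = {"user_id": "string", "hour_of_day": "int"}
new = {"user_id": "int", "referrer": "string"}
problems = breaking_changes(old, new)
```

In this example the added referrer field passes silently, while the retyped user_id and the missing hour_of_day are both reported.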
Deeply Configurable Features
Every feature attribute can be configured without writing or deploying new code, meaning that product managers and data scientists can tweak, test, and iterate on every Causal feature quickly and independently.
- Once an attribute has been defined and integrated into the product, any member of the team can update it. Causal's data model is flexible enough to cover use cases as simple as copy and as complex as search algorithms. For example, you could enable your team to roll out new algorithms, switch the order of onboarding flows, update copy and imagery, and test new layouts.
- You also have the option of running A/B tests by splitting traffic across multiple variants of your feature attributes. Causal handles the traffic assignment, metric collection, and stats calculation so you know which variant performs best. Once you find a winner, rolling it out takes a single click.
- Data scientists can use Causal as a machine learning feature store. The context becomes the independent variables, the attributes become the dependent variable, and the events are the payoff. Causal gives your data scientists the same time-shifting guarantees as other, more cumbersome feature stores. Once these models are trained, a data scientist can deploy the model to the front end without any extra work from front-end engineers.
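Traffic assignment for an A/B test is commonly done by hashing a stable identifier into a bucket, so that the same user always sees the same variant. The sketch below shows that general technique; it is not a description of how Causal assigns traffic, and the function name, experiment name, and variant names are hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment, variants):
    """Deterministically map a user to a variant according to traffic weights.

    variants is a list of (name, weight) pairs whose weights sum to 1.0.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Scale the first 8 bytes of the hash to a number between 0 and 1.
    point = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for name, weight in variants:
        cumulative += weight
        if point < cumulative:
            return name
    return variants[-1][0]  # Guard against floating-point rounding.


variants = [("control", 0.5), ("new_copy", 0.5)]
first = assign_variant("u-123", "cta_copy_test", variants)
```

Including the experiment name in the hash input means a user's bucket in one test does not determine their bucket in another.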
Support for the Whole Lifecycle
Your data is going to change, and FDL has built-in support for managing your data warehouse as it does. You can read about it in Feature Lifecycle.
[1] Previous views are still available if needed.