Sessions

Causal groups visitor activity into sessions. A session begins when a user first arrives at your application and ends after a period of inactivity.[1]
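
To make the inactivity rule concrete, here is a minimal standalone sketch in Python. It is not Causal's implementation: the function name and the exact boundary rule (a gap strictly longer than the timeout starts a new session) are assumptions, and the 30-minute value is the default mentioned in the footnote.

```python
from datetime import datetime, timedelta

# Assumption: 30 minutes mirrors the documented default timeout.
INACTIVITY_TIMEOUT = timedelta(minutes=30)


def split_into_sessions(timestamps, timeout=INACTIVITY_TIMEOUT):
    """Group one visitor's activity timestamps into sessions."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)  # still inside the active session
        else:
            sessions.append([ts])    # first arrival, or the gap exceeded the timeout
    return sessions


activity = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 9, 10),
    datetime(2024, 1, 1, 10, 5),  # 55 minutes after the previous hit
]
sessions = split_into_sessions(activity)
```

Here the first two hits fall into one session and the third starts a new one.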

You can add custom data to your application's sessions using the same syntax as adding a new feature. For example:

session {
    args {
        userId: String
    }
    plugin java "ComputeCustomerData" {
        homeZip: String
        membershipLevel: String
    }
    event Add_to_cart {
        item: ID!
    }
}

This FDL says that you can pass in a userId if the user is logged in (the argument is nullable). The ComputeCustomerData plugin also reaches out to the customer data platform and retrieves the home zip code and the current membership level.

Session arguments and outputs are calculated when the visitor first arrives on your site, and cached for the entirety of the session. All these values are made available through the generated APIs and in the data warehouse.

Session events can be signalled at any time, and are not tied to any particular impression, just the session as a whole.

Session Data

Causal automatically creates a session table to store your session data. Here is the table generated by the above FDL:

create table session(
    device_id string,
    session_id string,
    start_time timestamp,
    last_modified_time timestamp,
    ip_address string,
    user_agent string,
    client_type string,
    entry_url string,
    variants array<string>,
    user_id string,
    home_zip string,
    membership_level string,
    add_to_cart array<struct<event_time:timestamp,impression_id:string,item:string>>
) comment 'table generated to represent Causal session'
PARTITIONED BY (ds string, hh string)
LOCATION 's3://xxx/tables/session/';

The table starts with some system-provided values, followed by the session arguments, outputs, and events.

All data for a visitor session is guaranteed to be written to the same partition in your data warehouse. The partition corresponds to the session end time. To join a session row with a feature table row, you can add the partition values to the join key. This drastically improves query performance and ensures that you don't lose data when a user session spans partition boundaries. Other append-only data stores require you to include multiple partitions in your session queries, and even then you may lose data over time boundaries.[2]
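
The difference between the two layouts can be illustrated with a small standalone Python sketch. Everything here is hypothetical: the event data, the daily `ds` format, and the simplification that a session's end time is the time of its last event.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical events: (session_id, event_time). Session "s1" spans midnight.
events = [
    ("s1", datetime(2024, 1, 1, 23, 50)),
    ("s1", datetime(2024, 1, 2, 0, 5)),
    ("s2", datetime(2024, 1, 2, 8, 0)),
]


def partition_by_event_time(events):
    """Append-only layout: each event lands in the partition of its own day."""
    parts = defaultdict(lambda: defaultdict(list))
    for session_id, ts in events:
        parts[ts.strftime("%Y-%m-%d")][session_id].append(ts)
    return parts


def partition_by_session_end(events):
    """Session-end layout: the whole session lands in one partition."""
    by_session = defaultdict(list)
    for session_id, ts in events:
        by_session[session_id].append(ts)
    parts = defaultdict(dict)
    for session_id, stamps in by_session.items():
        ds = max(stamps).strftime("%Y-%m-%d")  # assumed: end time = last event
        parts[ds][session_id] = sorted(stamps)
    return parts


by_event = partition_by_event_time(events)
by_end = partition_by_session_end(events)
```

With the event-time layout, a daily query sees only half of session s1 in each partition. With the session-end layout, both of its events sit in the 2024-01-02 partition.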

Mutable Data

Feature impressions and events occur at a specific point in time. Sessions, however, occur over a time interval, and the session data may change over that interval.

To model such changes, Causal provides a @mutable directive. You use this directive to mark session data that may change. For example, to handle a user logging in to your application partway through a session, add a @mutable directive to userId:

session {
    args {
        userId: String @mutable
    }
    plugin java "ComputeCustomerData" {
        homeZip: String
        membershipLevel: String
    }
    event Add_to_cart {
        item: ID!
    }
}

Causal will add a mutator method to the generated API so that clients can register a change. The Java API, for example, will add a SessionRequest.setUserId(String x) method to update the value.

Mutation in the Warehouse

In order to represent the mutations over time, the type of the user_id column in the data warehouse must change from string to array<struct<start_time:timestamp,end_time:timestamp,value:string>>. There is guaranteed to be no clock skew between these timestamps and any other timestamp in the session.
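
As a rough illustration of how such a column can be read, the Python sketch below looks up the value in effect at a given time. The sample data and the function name are hypothetical, and two details are assumptions rather than documented behavior: intervals are treated as half-open [start_time, end_time), and the currently active value is given an end_time of None.

```python
from datetime import datetime

# Hypothetical contents of a mutable user_id column for one session:
# the visitor is anonymous at first, then logs in at 09:20.
user_id_history = [
    {"start_time": datetime(2024, 1, 1, 9, 0),
     "end_time": datetime(2024, 1, 1, 9, 20),
     "value": None},
    {"start_time": datetime(2024, 1, 1, 9, 20),
     "end_time": None,  # assumed marker for "still in effect"
     "value": "user-42"},
]


def value_at(history, when):
    """Return the value in effect at `when`, or None if no interval covers it."""
    for entry in history:
        started = entry["start_time"] <= when
        not_ended = entry["end_time"] is None or when < entry["end_time"]
        if started and not_ended:
            return entry["value"]
    return None
```

Because the timestamps share a clock with the rest of the session, a lookup like value_at(user_id_history, event_time) can attribute an event such as an add-to-cart to the user ID that was active when it happened.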

Adding mutability is a breaking change

If you add a @mutable directive to a field that already has partition data with non-mutable values, the compiler flags it as a breaking change. If you are making a value mutable and want to keep the immutable data already in your warehouse, rename the field.

Persistent Keys

Causal identifies a user of your application using a persistent key. We use this value to:

  • Make sure a visitor is selected into the same experiment variants on repeat visits
  • Make sure that we disrupt as few visitors as possible as we are rolling out a feature change
  • Identify your development organization's machines for the QA and debugging tools
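
The first two uses rely on assignments being a stable function of the persistent key. The Python sketch below shows one common way to get that property, hash-based bucketing. It is not a description of Causal's actual algorithm; the function names, the SHA-256 choice, and the salt format are all assumptions made for illustration.

```python
import hashlib


def _bucket(persistent_key, salt, buckets):
    """Hash the key together with a salt into one of `buckets` slots."""
    digest = hashlib.sha256(f"{salt}:{persistent_key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def assign_variant(persistent_key, experiment, variants):
    """The same key always maps to the same variant of an experiment."""
    return variants[_bucket(persistent_key, experiment, len(variants))]


def in_rollout(persistent_key, feature, percent):
    """Raising `percent` only adds visitors; nobody already included drops out."""
    return _bucket(persistent_key, feature, 100) < percent
```

A visitor who returns with the same key gets the same variant, and widening a rollout from 10% to 50% keeps everyone from the original 10% in the feature.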

If you already have a way to identify a site visitor, you can declare it in your session definition. Say your application has a persistent first-party cookie called visitorId that identifies the user, and you'd like to use it as your persistent key. Add the following to the session definition:

session {
    args {
        visitorId: String! @persistent_key
        userId: String @mutable
    }
    ...
}

Causal will now use the visitorId argument to look up sessions on the server and store the value in the data warehouse in the visitor_id column.

If you don't declare a session argument to be a persistent key, we will generate one for you automatically and call it deviceId. If you already have a session argument called deviceId that is not marked with @persistent_key, the compiler will flag an error and ask you to mark one of your session fields as the persistent key.

Session Keys

Causal has its own notion of a session in order to group and coordinate data on disk. However, you may have your own internal session that you'd like to represent in Causal's tables. You can do that by adding your session identifier to the session arguments and marking it with the @session_key directive.

For example, let's say your internal systems use a value called arrivalId to represent a visitor's extended interactions with your website. We can add it to Causal as follows:

session {
    args {
        visitorId: String! @persistent_key
        arrivalId: String! @session_key
        userId: String @mutable
    }
    ...
}

Causal will line up its internal sessions with your organization's concept of an arrival:

  • If you call the Causal API with a new arrivalId for the given visitorId, Causal will create a new session
  • Causal will write the arrivalId value to the session table so that you can use aggregation queries over all Causal sessions that have the same arrivalId.

Once your session key is associated with the persistent key, you may interact with the Causal API using either identifier. So if you are already using your session key to coordinate activity between a set of microservices, there's no need to pipe the persistent key through in order to interact with Causal's API. In the Java API, you can always get a handle on the visitor's session from the arrivalId using the SessionRequest.fromArrivalId(String x) method.
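
The behavior described above can be modeled with a toy in-memory registry. This Python sketch is not Causal's implementation or API: the class and method names are hypothetical, and it ignores inactivity timeouts, so each arrival maps to exactly one session here.

```python
class SessionRegistry:
    """Toy model of session lookup by session key (hypothetical)."""

    def __init__(self):
        self._by_arrival = {}  # arrival_id -> session record
        self._counter = 0

    def get_or_create(self, visitor_id, arrival_id):
        """A new arrival_id for a visitor creates a new session."""
        session = self._by_arrival.get(arrival_id)
        if session is None:
            self._counter += 1
            session = {
                "session_id": f"s{self._counter}",
                "visitor_id": visitor_id,
                "arrival_id": arrival_id,
            }
            self._by_arrival[arrival_id] = session
        return session

    def from_arrival_id(self, arrival_id):
        """Look up a session without knowing the persistent key."""
        return self._by_arrival.get(arrival_id)


registry = SessionRegistry()
first = registry.get_or_create("visitor-1", "arrival-a")
again = registry.get_or_create("visitor-1", "arrival-a")  # same session
second = registry.get_or_create("visitor-1", "arrival-b")  # new session
```

A downstream service that only knows "arrival-b" can still reach the session through from_arrival_id, which is the point of associating the session key with the persistent key.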

Per Metrics

Causal's metric system is used to calculate the performance of features, ML models, and experiments over time. You can use the Tools UI to define metrics on the impression level (e.g., clicks per impression) and the session level (e.g., add-to-carts per session).

Causal also allows you to define metrics over any grouping you'd like, but you must declare the session fields that you'd like the metric system to group by.

You can do this using the @per directive:

session {
    args {
        visitorId: String! @persistent_key @per
        arrivalId: String! @session_key @per
        userId: String @mutable
    }
    ...
}

The above code says that you'd like to be able to define metrics on a per-visitor or per-arrival basis. For example, "visits per visitor" or "pageviews per arrival."
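
To show what a "per" grouping computes, here is a small Python sketch over hypothetical session rows. The helper name, the sample data, and the snake_case column names (visitor_id, arrival_id) are assumptions, and in practice the metric system does this aggregation for you.

```python
from collections import defaultdict

# Hypothetical session rows, one per Causal session.
sessions = [
    {"visitor_id": "v1", "arrival_id": "a1", "add_to_cart": ["sku-1", "sku-2"]},
    {"visitor_id": "v1", "arrival_id": "a2", "add_to_cart": []},
    {"visitor_id": "v2", "arrival_id": "a3", "add_to_cart": ["sku-3"]},
]


def per(rows, group_field, measure):
    """Sum `measure(row)` within each group, then average across groups."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_field]] += measure(row)
    return sum(totals.values()) / len(totals)


# "visits per visitor": count sessions, grouped by visitor.
visits_per_visitor = per(sessions, "visitor_id", lambda row: 1)

# "add-to-carts per arrival": count cart events, grouped by arrival.
carts_per_arrival = per(sessions, "arrival_id", lambda row: len(row["add_to_cart"]))
```

With this data, visitor v1 has two sessions and v2 has one, so visits per visitor is 1.5. The three arrivals hold three cart events in total, so add-to-carts per arrival is 1.0.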


  1. The default is 30 minutes.
  2. The dreaded "midnight" problem. You break up your data into large chunks (typically daily) and group by session to calculate session statistics. However, because you are aggregating over a day, sessions spanning midnight are incomplete.