
Elastic Search Integration

Causal can support Elastic Search out of the box as a way to fill in output values.

Note: This feature is in beta. Please email support@causallabs.io if you'd like to use it.

Advantages to using the Causal plugin to access your Elastic Search cluster:

  1. You can change search queries using the tools API, so you do not need to make any code changes to run an experiment or roll out a new algorithm.
  2. You get a type-safe API customized to the documents you store in your cluster, which reduces programming errors.
  3. You get automatic logging (including learning-to-rank data) to make it easier for data scientists to improve your search.
  4. Causal's compiler makes sure that changes to your elastic queries will still be compatible with your data warehouse, preventing breaking changes to your ETLs.

However, the plugin does not support all Elastic functionality. We concentrate on the document retrieval and relevance use case, and focus less on analytics use cases. If the Elastic plugin does not support your use case, you can always use external outputs or a plugin to support Elastic's full functionality along with Causal's data logging.

If you find the plugin cannot support your use case, please email support@causallabs.io to tell us what we are missing.

Elastic Search Templates

We use Elastic Search's search templates to let you run arbitrary Elastic queries inside your features.

First, define a template for all the queries you'd like to use inside your features. We will use the following template from the elastic documentation in this example:

PUT _scripts/my-search-template
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "match": {
          "message": "{{query_string}}"
        }
      },
      "from": "{{from}}",
      "size": "{{size}}"
    },
    "params": {
      "query_string": "My query string"
    }
  }
}
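
A stored template like this is run through Elasticsearch's search template API by sending the template id together with a params object. As a minimal sketch, the Python below only builds that request body; the helper name `build_template_request` and its defaults are hypothetical, and this is not a description of how the Causal plugin issues the request internally.

```python
import json


def build_template_request(template_id, query_string, from_=0, size=10):
    """Build the JSON body for a `GET <index>/_search/template` request.

    Hypothetical helper for illustration; the parameter names mirror the
    mustache variables used in the template above.
    """
    if from_ < 0 or size < 0:
        raise ValueError("from_ and size must be non-negative")
    return {
        "id": template_id,
        "params": {
            "query_string": query_string,
            "from": from_,
            "size": size,
        },
    }


body = build_template_request("my-search-template", "hello world")
print(json.dumps(body, indent=2))
```

The three params line up with the queryString, from, and size arguments that a feature would pass in.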

Here is a typical response to a search that uses this template:

{
  "took": 36,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "my-index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "message": "hello world"
        }
      }
    ]
  }
}

Using the Plugin

The following example shows how to connect that template to a feature:

type Document {
    _id : String!
    message : String! @no_log
    _score : Float! @hidden
    _ltrlog : Json @hidden
}

feature SearchPage {
    args {
        queryString : String! @test("test query")
        from : Int! @test(0)
        size : Int! @test(10)
    }
    outputs {
        documents : [Document!]!
    }
    event Click {
        _id : ID!
        bounceTime : Int @elapsed( path: ["SearchPage", "SearchPage.Click"] )
    }
}

You can hook up an elastic search query to the front end using the external link editor. Use the editor to feature flag in a new ES query, or run experiments between two different queries. You can specify the endpoint for your elastic search cluster and the search template you'd like to use. The external link dialog will also let you test your feature to make sure it is compatible with the stored query template.

Causal copies document fields by name from the Elastic result (first from "fields", then from "_source") into the list of documents specified in your feature. The copied fields can also include feature values for your machine learning model if you use the learning-to-rank logging extension. If the query returns an error for any reason, the client gets default values.
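
The field-copying step can be pictured roughly as follows. This is only an illustrative Python sketch, not Causal's implementation: the function names are hypothetical, "first fields, then _source" is read here as "prefer a value under fields and fall back to _source", and unwrapping single-element lists is an assumption (Elasticsearch returns every value under "fields" as a list).

```python
def hit_to_document(hit, field_names):
    """Copy the requested fields out of one Elasticsearch hit.

    Assumed precedence: a value under "fields" wins over "_source".
    """
    fields = hit.get("fields", {})
    source = hit.get("_source", {})
    doc = {"_id": hit["_id"], "_score": hit.get("_score")}
    for name in field_names:
        if name in fields:
            value = fields[name]
            # Assumption for this sketch: unwrap single-element lists,
            # since "fields" values always arrive as lists.
            if isinstance(value, list) and len(value) == 1:
                value = value[0]
            doc[name] = value
        elif name in source:
            doc[name] = source[name]
    return doc


def response_to_documents(response, field_names):
    """Turn a search response into a list of document dicts."""
    return [hit_to_document(h, field_names) for h in response["hits"]["hits"]]


response = {
    "hits": {
        "hits": [
            {
                "_index": "my-index",
                "_id": "1",
                "_score": 0.5753642,
                "_source": {"message": "hello world"},
            }
        ]
    }
}
print(response_to_documents(response, ["message"]))
```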

There may be some data that you do not want to store in your feature impression tables. For example, the message field may contain a lot of data that is needed by the front end, but not needed during analytics. Use the @no_log directive to filter these out. The data will be sent to the front end, but dropped from the data warehouse.

There may also be sensitive data that you need for training or for internal systems but do not want exposed to the front end. For example, _score may be important for analytics, and the LTR feature values are invaluable to data scientists training new learning-to-rank models. However, you probably do not want either showing up on your front end, where your competitors could see them. In these cases, use the @hidden directive. Hidden fields are logged to the data warehouse for analysis, but are never sent from your data center to the front end application.
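
To make the two directives concrete, here is a small Python sketch of the split they describe, using the Document type from the example above. The function name and the hard-coded field sets are hypothetical; in practice Causal derives this behavior from the directives in your feature definition.

```python
# Field sets taken from the Document type above (hard-coded for illustration).
NO_LOG_FIELDS = {"message"}            # sent to the front end, not logged
HIDDEN_FIELDS = {"_score", "_ltrlog"}  # logged, never sent to the front end


def split_document(doc, no_log=NO_LOG_FIELDS, hidden=HIDDEN_FIELDS):
    """Return (front_end_view, warehouse_view) of one document."""
    front_end = {k: v for k, v in doc.items() if k not in hidden}
    warehouse = {k: v for k, v in doc.items() if k not in no_log}
    return front_end, warehouse


doc = {
    "_id": "1",
    "message": "hello world",
    "_score": 0.5753642,
    "_ltrlog": {"feature_1": 0.2},  # made-up LTR feature values
}
front_end, warehouse = split_document(doc)
print(front_end)
print(warehouse)
```

Fields with neither directive, such as _id, appear in both views.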

Recording Results

Implicit feedback, in the form of user clicks on your search page, can be used to compare different search configurations through experimentation. It is also one of the most valuable forms of feedback that data scientists use to train new models to improve your search.

In addition to click rate and other simple search metrics, Causal supports order-based statistics like NDCG. You can use these to compare experiment variants based on where clicks appear in the list (not just whether a click happened), and to verify that your ranking models perform as expected in the real world. To do so, Causal needs to know which document was clicked. So when you specify your click event, make sure it includes a reference back to the document, like the _id field in the Click event above.
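
As a rough illustration of why the clicked document's position matters, the Python sketch below computes NDCG for a single result list, treating clicked documents as relevance 1 and everything else as 0. The binary-relevance choice and the function names are assumptions made for this example; they are not a statement of how Causal computes the metric.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with the standard log2(rank + 1) discount."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg_from_clicks(shown_ids, clicked_ids):
    """NDCG of one result list, using clicks as binary relevance labels."""
    relevances = [1.0 if doc_id in clicked_ids else 0.0 for doc_id in shown_ids]
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


shown = ["a", "b", "c"]
print(ndcg_from_clicks(shown, {"a"}))  # click on the top result
print(ndcg_from_clicks(shown, {"c"}))  # click on the third result scores lower
```

The same click is worth less the further down the list it lands, which is what lets the metric distinguish two variants with identical click rates.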

As shown in this article from Microsoft Research, it's often useful to filter out bounces from your search metric when calculating your statistics. For example, if your user clicks on a document, glances at it, and then winds up doing another search within a few seconds, that is probably not a successful outcome. The @elapsed directive is ideal for figuring this out. If you specify such an elapsed time in your click event, you can use it in your search metrics and machine learning training without any extra development.
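
A bounce filter over logged clicks might look like the Python sketch below. Several things here are assumptions rather than documented behavior: that bounceTime measures the time between the click and the user's next search, that a missing value means the user never searched again, and that the threshold is expressed in the same (unspecified) unit as bounceTime. The function name and sample records are hypothetical.

```python
def successful_clicks(clicks, min_bounce_time):
    """Keep clicks that were not followed quickly by another search.

    A click is kept when its bounceTime is missing (assumed to mean the
    user did not search again) or is at least min_bounce_time.
    """
    return [
        c for c in clicks
        if c.get("bounceTime") is None or c["bounceTime"] >= min_bounce_time
    ]


# Made-up click records for illustration only.
clicks = [
    {"_id": "1", "bounceTime": 2},     # quick return to search: a bounce
    {"_id": "2", "bounceTime": 45},    # stayed on the document
    {"_id": "3", "bounceTime": None},  # never searched again
]
print(successful_clicks(clicks, min_bounce_time=10))
```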