
Elastic Search Integration

Causal supports Elastic Search out of the box as a way to fill in output values.

There are several advantages to using the plugin to access your Elastic Search cluster:

  1. You can change search queries using the tools API, so you do not need to make any code changes to run an experiment or roll out a new algorithm.
  2. You get a type-safe API customized to the documents you store in your cluster, which reduces programming errors.
  3. You get automatic logging (including learning-to-rank data) to make it easier for data scientists to improve your search.
  4. Causal's compiler makes sure that changes to your Elastic queries remain compatible with your data warehouse, preventing breaking changes to your ETLs.

However, the plugin does not support all Elastic functionality. We concentrate on the document retrieval and relevance use case, and focus less on analytics use cases. If the Elastic plugin does not support your use case, you can always use external outputs or a plugin to support Elastic's full functionality along with Causal's data logging.

If you find the plugin cannot support your use case, please email support@causallabs.io to tell us what we are missing.

Elastic Search Templates

We use Elastic Search's search templates so that you can run arbitrary Elastic queries inside your features.

First, define a template for all the queries you'd like to use inside your features. We will use the following template from the elastic documentation in this example:

PUT _scripts/my-search-template
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "match": {
          "message": "{{query_string}}"
        }
      },
      "from": "{{from}}",
      "size": "{{size}}"
    },
    "params": {
      "query_string": "My query string"
    }
  }
}

Here is a typical response from that same example:

{
  "took": 36,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "my-index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "message": "hello world"
        }
      }
    ]
  }
}
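
A stored template like the one above is run by sending its id together with a set of parameters to Elasticsearch's search template endpoint. The following is a minimal Python sketch of how such a request body could be assembled from the three template parameters. The function name `build_search_template_request` is hypothetical and is not part of Causal or of any Elasticsearch client; only the `id` and `params` keys come from the Elasticsearch search template API.

```python
import json


def build_search_template_request(template_id, query_string, from_, size):
    """Build a request body for Elasticsearch's search template endpoint.

    Hypothetical helper for illustration only. The parameter names mirror
    the {{query_string}}, {{from}} and {{size}} placeholders in the template.
    """
    return {
        "id": template_id,
        "params": {
            "query_string": query_string,
            "from": from_,
            "size": size,
        },
    }


body = build_search_template_request("my-search-template", "hello world", 0, 10)
print(json.dumps(body, indent=2))
```

A body of this shape would be sent to `GET my-index/_search/template`, where `my-index` is the index that appears in the sample response.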

Using the Plugin

The following example shows how to connect that template to a feature:

feature search_page {
    args {
        query_string : String! @test("test query")
        from: Int! @test(0)
        size: Int! @test(10)
    }
    outputs {
        documents : [{
            _id : String!
            message : String! @no_log
            _score : Float! @hidden
        }]! @elastic(cluster: "$ELASTIC_ENDPOINT",
                     templates: ["my-search-template"])
    }
}

The @elastic directive above is optional. When present, Causal's compiler fetches all the named templates from the given cluster when compiling the FDL file and ensures that the results are compatible with the output value type. In this case, the "my-search-template" template must have a "message" field in the returned documents. If not, you'll get a compile-time error and know that you must change your feature definition or ES query.

Whether or not you use the @elastic directive, you can hook up an elastic search query to the front end using the external link editor. You can use the editor to feature flag in a new ES query, or run experiments between two different queries.

Causal copies document fields by name from the Elastic result (looking first in "fields", then in "_source") into the list of documents specified in your feature. If the query returns an error, the client receives default values.
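
The lookup order described above can be illustrated with a short Python sketch. This is not Causal's implementation: the function names are hypothetical, the fallback to hit-level metadata for `_id` and `_score` is an assumption, and an empty list stands in for whatever default values your feature defines.

```python
def copy_hit(hit, field_names):
    """Copy the requested fields from one Elasticsearch hit (hypothetical helper)."""
    fields = hit.get("fields", {})
    source = hit.get("_source", {})
    doc = {}
    for name in field_names:
        if name in fields:
            # Elasticsearch returns "fields" values as lists; copied unchanged here.
            doc[name] = fields[name]
        elif name in source:
            doc[name] = source[name]
        elif name in hit:
            # Assumption: metadata such as _id and _score is read from the hit itself.
            doc[name] = hit[name]
    return doc


def documents_from_response(response, field_names, default=None):
    """Return the documents for a feature, or a default if the query failed."""
    if "error" in response:
        # Assumption: an empty list stands in for the feature's default value.
        return [] if default is None else default
    return [copy_hit(hit, field_names) for hit in response["hits"]["hits"]]


sample_response = {
    "hits": {
        "hits": [
            {
                "_index": "my-index",
                "_id": "1",
                "_score": 0.5753642,
                "_source": {"message": "hello world"},
            }
        ]
    }
}
documents = documents_from_response(sample_response, ["_id", "message", "_score"])
failed = documents_from_response({"error": {"type": "parsing_exception"}}, ["_id"])
```

With the sample response from the template example, `documents` holds a single entry with `_id`, `message` and `_score`, and `failed` is the empty default.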

There may be some data that you do not want to store in your feature impression tables. For example, the message field may contain a lot of data that is needed by the front end but not needed for analytics. For such fields, use the @no_log directive: the data is sent to the front end but dropped from the data warehouse.

There may also be sensitive data that you need for training or for your internal systems, but do not want exposed to the front end. For example, _score may be important for analytics, so you want it in your data warehouse, but it is potentially sensitive information that you do not want your competitors to see on your front end. In these cases, use the @hidden directive: the data is stored in the data warehouse but not sent to the front end.
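
The effect of the two directives can be sketched in Python as a split of each document into a front-end payload and a warehouse record. This is only an illustration of the behavior described above, not Causal's implementation; `split_document` and its parameters are hypothetical names.

```python
def split_document(doc, no_log=frozenset(), hidden=frozenset()):
    """Split one document into (front_end, warehouse) views.

    Hypothetical helper: fields named in `hidden` are withheld from the
    front end, and fields named in `no_log` are withheld from the warehouse.
    """
    front_end = {k: v for k, v in doc.items() if k not in hidden}
    warehouse = {k: v for k, v in doc.items() if k not in no_log}
    return front_end, warehouse


doc = {"_id": "1", "message": "hello world", "_score": 0.5753642}
front_end, warehouse = split_document(doc, no_log={"message"}, hidden={"_score"})
print(front_end)  # {'_id': '1', 'message': 'hello world'}
print(warehouse)  # {'_id': '1', '_score': 0.5753642}
```

The field sets match the feature above, where message is marked @no_log and _score is marked @hidden.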

Integration With Learning To Rank

TBD