Executive Summary

TerraAlto set up automated index maintenance for an AWS Elasticsearch cluster using AWS Lambda and Python for Royal FrieslandCampina (RFC), one of the world’s largest dairy companies.

About the Customer

Royal FrieslandCampina (RFC) is one of the world’s largest dairy companies with approximately $12.8 billion in annual revenue. The Dutch company produces and sells dairy-based beverages, cheeses, desserts, and infant and sports nutrition products across Europe, Asia, Africa and the Americas.

Customer Challenge

While upgrading an AWS Elasticsearch cluster to version 7, the cluster reached its shard limit, so it was no longer possible to create indices. As a temporary solution, the permitted number of shards was increased and a manual cleanup was performed. RFC wished to implement automated index maintenance with regular deletion of indices, so that there would be no future issues with shard limits.

Why AWS

RFC as a company began their cloud journey on AWS in 2015 and it has become their primary platform.

Why RFC Chose TerraAlto

RFC have been engaged with TerraAlto as their primary AWS partner since 2015 and work closely with TerraAlto on many AWS-related projects, including data platform initiatives.

Partner (TerraAlto) Solution

Curator is a solution provided by Elastic (the company behind Elasticsearch) that can be used for Elasticsearch index maintenance. Its purpose is to help you curate, or manage, indices, and it provides a Python API for this: https://curator.readthedocs.io/en/latest/ . We needed to filter indices based on an index name pattern and apply a retention period in days, deleting indices older than that number of days.

To do this in Python we needed to import the ‘elasticsearch’ and ‘curator’ modules. To work with Curator we first create an Elasticsearch client that signs its requests. We then pass this client to Curator to get a list of indices [ilo = curator.IndexList(es)]. This returns a curator.indexlist.IndexList object, which contains the list of indices. It holds all indices by default, but the class has a number of built-in filter methods that can be used to narrow the list: https://curator.readthedocs.io/en/latest/objectclasses.html#curator.indexlist.IndexList
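The client setup described above might look roughly like the following sketch. It assumes the `boto3`, `requests-aws4auth`, `elasticsearch` (7.x) and `elasticsearch-curator` packages are bundled with the function. The helper names, port 443 and the use of `AWS4Auth` for request signing are assumptions for illustration; the original code may have signed its requests differently.

```python
def es_client_kwargs(host, awsauth):
    """Keyword arguments for an HTTPS connection to the domain endpoint."""
    return {
        "hosts": [{"host": host, "port": 443}],
        "http_auth": awsauth,  # signer attached to every request
        "use_ssl": True,
        "verify_certs": True,
    }


def build_index_list(host, region):
    """Create a signed Elasticsearch client and wrap it in a Curator IndexList."""
    # Third-party imports are local so this module loads without them installed.
    import boto3
    import curator
    from elasticsearch import Elasticsearch, RequestsHttpConnection
    from requests_aws4auth import AWS4Auth

    # Sign with the credentials of the role the function runs under.
    creds = boto3.Session().get_credentials()
    awsauth = AWS4Auth(creds.access_key, creds.secret_key, region, "es",
                       session_token=creds.token)
    es = Elasticsearch(connection_class=RequestsHttpConnection,
                       **es_client_kwargs(host, awsauth))
    return curator.IndexList(es)  # holds every index until filters are applied
```

Calling `build_index_list(host, region)` with the domain endpoint (without the `https://` prefix) and its region would then yield the `ilo` object used in the rest of this section.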

We needed to apply different filters based on different retention periods for different indices, so we filtered on index name and then on age. We used a regex check on the index name to be sure that we were only deleting indices that matched that particular pattern, and then applied the date filter to those indices.

Example:
# One entry per index family: name pattern, retention period and date format
indices = []
indices.append({"index_name": "redshift", "regex_pattern": r"^redshift-\d{4}-\d{2}-\d{2}$", "retention_days": 365, "date_format": "%Y-%m-%d"})
indices.append({"index_name": "powercenter", "regex_pattern": r"^powercenter-\d{4}-\d{2}-\d{2}$", "retention_days": 101, "date_format": "%Y-%m-%d"})
indices.append({"index_name": "metricbeat", "regex_pattern": r"^metricbeat-\d{4}\.\d{2}\.\d{2}$", "retention_days": 30, "date_format": "%Y.%m.%d"})

# Indices found: for each entry, filter the list by name pattern, then by age
ilo.filter_by_regex(kind="regex", value=regex_pattern)
ilo.filter_by_age(source="name", direction="older", timestring=date_format, unit="days", unit_count=retention_days)
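
To make the selection rule concrete, here is a small standard-library sketch of what the two filters amount to for a single index name. It is not Curator's implementation: `is_expired` and its date-extraction step (everything after the first hyphen) are assumptions made purely for illustration.

```python
import re
from datetime import datetime, timedelta


def is_expired(name, regex_pattern, date_format, retention_days, now):
    """True if the name matches the pattern and its date is past retention."""
    # Name check first, so unrelated indices are never considered.
    if not re.search(regex_pattern, name):
        return False
    # Assumption: the date is everything after the first hyphen in the name.
    stamp = datetime.strptime(name.split("-", 1)[1], date_format)
    return stamp < now - timedelta(days=retention_days)


now = datetime(2021, 6, 1)
pattern = r"^metricbeat-\d{4}\.\d{2}\.\d{2}$"
old = is_expired("metricbeat-2021.04.01", pattern, "%Y.%m.%d", 30, now)
recent = is_expired("metricbeat-2021.05.20", pattern, "%Y.%m.%d", 30, now)
other = is_expired("redshift-2021.04.01", pattern, "%Y.%m.%d", 30, now)
```

Here only the first name is selected: the second is still within its 30-day retention period, and the third fails the name check before its date is ever parsed.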


Once we had identified the indices to delete, we needed to instantiate the DeleteIndices class [delete_indices = curator.DeleteIndices(ilo)]. You can do a dry run to see what would be deleted [do_dry_run()] or proceed to delete [do_action()].
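
One way to keep the destructive call behind an explicit switch is sketched below. The `run_delete` helper and its `dry_run` flag are hypothetical; only `do_dry_run()` and `do_action()` come from the Curator action object described above.

```python
def run_delete(delete_indices, dry_run=True):
    """Run a DeleteIndices action, defaulting to the non-destructive dry run."""
    if dry_run:
        delete_indices.do_dry_run()  # only reports what would be deleted
        return "dry_run"
    delete_indices.do_action()       # actually deletes the listed indices
    return "deleted"
```

With a filtered list in hand, `run_delete(curator.DeleteIndices(ilo))` would only report the candidates, while passing `dry_run=False` would remove them.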

One thing that we ran into, and that is not well documented, was how to reuse the index list object across iterations for the different index types. We tried to deep copy the index list object, but this caused a Python recursion error. As an alternative, the object has a working_list() method that returns a copy of the index list held within the object.
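
The reuse pattern could be sketched as follows. The text does not say how the saved copy was put back, so restoring it by reassigning `ilo.indices` is an assumption here, as are the `expired_per_config` name and the guard against filtering an already empty list.

```python
def expired_per_config(ilo, configs):
    """Apply each entry's filters to one IndexList, restoring it between passes."""
    all_indices = ilo.working_list()  # plain copy of the index names
    matches = {}
    for cfg in configs:
        # Assumption: the full list is restored by reassigning the attribute.
        ilo.indices = list(all_indices)
        ilo.filter_by_regex(kind="regex", value=cfg["regex_pattern"])
        if ilo.indices:  # skip the age filter when no name matched
            ilo.filter_by_age(source="name", direction="older",
                              timestring=cfg["date_format"], unit="days",
                              unit_count=cfg["retention_days"])
        matches[cfg["index_name"]] = ilo.working_list()
    ilo.indices = list(all_indices)  # leave the object as it was found
    return matches
```

In a real run the delete action would be created inside the loop, while the list is still filtered for that entry; the sketch only collects the names so that the restore step is easy to see.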

All the above is incorporated into an AWS Lambda function running on a schedule.

Results and Benefits

TerraAlto has been able to provide:

  • Automation of essential maintenance around the usage of AWS Elasticsearch, removing a problem and preventing it from recurring.
  • Ongoing reduction of the hands-on activity needed to operate a data platform on AWS.