Elasticsearch – Modules

Elasticsearch is composed of a number of modules, which are responsible for its functionality. These modules have two types of settings as follows −

  • Static Settings − These settings need to be configured in the config file (elasticsearch.yml) before starting Elasticsearch. You need to update all the concerned nodes in the cluster for the changes to take effect.
  • Dynamic Settings − These settings can be updated on a live Elasticsearch cluster.
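
As a rough illustration of the dynamic case, the sketch below builds the JSON body that could be sent to the cluster update settings endpoint (PUT /_cluster/settings). The helper name build_settings_body is hypothetical and not part of any client library, and the example only constructs the payload; it does not contact a cluster.

```python
import json


def build_settings_body(settings, persistent=True):
    """Wrap dynamic settings in the shape used by PUT /_cluster/settings.

    build_settings_body is a hypothetical helper written for this example.
    "persistent" settings survive a full cluster restart, "transient" ones do not.
    """
    scope = "persistent" if persistent else "transient"
    return {scope: dict(settings)}


# Example: restrict shard allocation to primary shards on a running cluster.
body = build_settings_body({"cluster.routing.allocation.enable": "primaries"})
print(json.dumps(body, indent=2))
```

A static setting, by contrast, has no such request; it is written to elasticsearch.yml on each node and picked up at startup.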

We will discuss the different modules of Elasticsearch in the following sections of this chapter.

Cluster-Level Routing and Shard Allocation

Cluster-level settings decide the allocation of shards to different nodes and the reallocation of shards to rebalance clusters. The following settings control shard allocation.

Cluster-Level Shard Allocation

Setting | Possible value | Description
cluster.routing.allocation.enable | all | This default value allows shard allocation for all kinds of shards.
cluster.routing.allocation.enable | primaries | This allows shard allocation only for primary shards.
cluster.routing.allocation.enable | new_primaries | This allows shard allocation only for primary shards for new indices.
cluster.routing.allocation.enable | none | This does not allow any shard allocations.
cluster.routing.allocation.node_concurrent_recoveries | Numeric value (by default 2) | This restricts the number of concurrent shard recoveries.
cluster.routing.allocation.node_initial_primaries_recoveries | Numeric value (by default 4) | This restricts the number of parallel initial primary recoveries.
cluster.routing.allocation.same_shard.host | Boolean value (by default false) | This restricts the allocation of more than one replica of the same shard in the same physical node.
indices.recovery.concurrent_streams | Numeric value (by default 3) | This controls the number of open network streams per node at the time of shard recovery from peer shards.
indices.recovery.concurrent_small_file_streams | Numeric value (by default 2) | This controls the number of open streams per node for small files having a size of less than 5Mb at the time of shard recovery.
cluster.routing.rebalance.enable | all | This default value allows balancing for all kinds of shards.
cluster.routing.rebalance.enable | primaries | This allows shard balancing only for primary shards.
cluster.routing.rebalance.enable | replicas | This allows shard balancing only for replica shards.
cluster.routing.rebalance.enable | none | This does not allow any kind of shard balancing.
cluster.routing.allocation.allow_rebalance | always | This default value always allows rebalancing.
cluster.routing.allocation.allow_rebalance | indices_primaries_active | This allows rebalancing when all primary shards in the cluster are allocated.
cluster.routing.allocation.allow_rebalance | indices_all_active | This allows rebalancing when all the primary and replica shards are allocated.
cluster.routing.allocation.cluster_concurrent_rebalance | Numeric value (by default 2) | This restricts the number of concurrent shard balancing in cluster.
cluster.routing.allocation.balance.shard | Float value (by default 0.45f) | This defines the weight factor for shards allocated on every node.
cluster.routing.allocation.balance.index | Float value (by default 0.55f) | This defines the ratio of the number of shards per index allocated on a specific node.
cluster.routing.allocation.balance.threshold | Non negative float value (by default 1.0f) | This is the minimum optimization value of operations that should be performed.

Disk-based Shard Allocation

Setting | Possible value | Description
cluster.routing.allocation.disk.threshold_enabled | Boolean value (by default true) | This enables or disables the disk allocation decider.
cluster.routing.allocation.disk.watermark.low | String value (by default 85%) | This denotes the maximum disk usage; beyond this point, no other shard can be allocated to that disk.
cluster.routing.allocation.disk.watermark.high | String value (by default 90%) | This denotes the maximum disk usage tolerated on a node; when it is exceeded, Elasticsearch tries to relocate shards to another node.
cluster.info.update.interval | String value (by default 30s) | This is the interval between disk usage checks.
cluster.routing.allocation.disk.include_relocations | Boolean value (by default true) | This decides whether to consider the shards currently being relocated while calculating disk usage.
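
The two watermarks can be read as a simple three-way decision. The following is a minimal sketch under that reading, not the actual allocation decider: the function names parse_percent and disk_decision are hypothetical, the watermarks are assumed to be percentages (byte values are not handled), and the exact comparison at the boundary is an assumption.

```python
def parse_percent(value):
    """Convert a string such as '85%' to the float 85.0."""
    return float(value.rstrip("%"))


def disk_decision(used_percent, low="85%", high="90%"):
    """Classify a node by disk usage against the low and high watermarks.

    Simplified illustration only; the real decider also accounts for
    shard sizes and relocations in progress.
    """
    if used_percent >= parse_percent(high):
        return "move shards elsewhere"
    if used_percent >= parse_percent(low):
        return "no new shards"
    return "allocate"


for usage in (50.0, 87.0, 95.0):
    print(usage, "->", disk_decision(usage))
```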

Discovery

This module helps a cluster to discover and maintain the state of all the nodes in it. The state of the cluster changes when a node is added to or removed from it. The cluster name setting is used to logically separate different clusters. The following discovery modules are available, some of which use the APIs provided by cloud vendors −

  • Azure discovery
  • EC2 discovery
  • Google Compute Engine discovery
  • Zen discovery

Gateway

This module maintains the cluster state and the shard data across full cluster restarts. The following are the static settings of this module −

Setting | Possible value | Description
gateway.expected_nodes | Numeric value (by default 0) | The number of nodes that are expected to be in the cluster for the recovery of local shards.
gateway.expected_master_nodes | Numeric value (by default 0) | The number of master nodes that are expected to be in the cluster before starting recovery.
gateway.expected_data_nodes | Numeric value (by default 0) | The number of data nodes that are expected to be in the cluster before starting recovery.
gateway.recover_after_time | String value (by default 5m) | This specifies the time for which the recovery process will wait to start regardless of the number of nodes joined in the cluster.
gateway.recover_after_nodes | Numeric value | Recovery can start once this many master or data nodes have joined the cluster.
gateway.recover_after_master_nodes | Numeric value | Recovery can start once this many master nodes have joined the cluster.
gateway.recover_after_data_nodes | Numeric value | Recovery can start once this many data nodes have joined the cluster.
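
How gateway.expected_nodes, gateway.recover_after_nodes and gateway.recover_after_time interact can be sketched as a small decision function. This is a simplified model and an assumption about the behaviour, not Elasticsearch's own code: the function name may_start_recovery is hypothetical, only the combined node count is modelled (not the master and data variants), and time is given in seconds.

```python
def may_start_recovery(joined, expected_nodes=0, recover_after_nodes=0,
                       elapsed_s=0.0, recover_after_time_s=300.0):
    """Return True if local shard recovery may begin (simplified model)."""
    # Hard minimum: never recover before this many nodes have joined.
    if joined < recover_after_nodes:
        return False
    # Some expected nodes are still missing: wait for the timeout first.
    if expected_nodes > 0 and joined < expected_nodes:
        return elapsed_s >= recover_after_time_s
    # Every expected node is present (or none were configured).
    return True


print(may_start_recovery(joined=3, expected_nodes=3))
print(may_start_recovery(joined=2, expected_nodes=3, elapsed_s=10))
print(may_start_recovery(joined=2, expected_nodes=3, elapsed_s=301))
```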

HTTP

This module manages the communication between an HTTP client and the Elasticsearch APIs. This module can be disabled by changing the value of http.enabled to false.

The following are the settings (configured in elasticsearch.yml) to control this module −

S.No | Setting | Description
1 | http.port | This is the port used to access Elasticsearch; it defaults to the range 9200-9300.
2 | http.publish_port | This is the port that HTTP clients should use; it is useful when the node is behind a firewall.
3 | http.bind_host | This is the host address that the HTTP service binds to.
4 | http.publish_host | This is the host address published for HTTP clients.
5 | http.max_content_length | This is the maximum size of content in an HTTP request. Its default value is 100mb.
6 | http.max_initial_line_length | This is the maximum size of the URL and its default value is 4kb.
7 | http.max_header_size | This is the maximum HTTP header size and its default value is 8kb.
8 | http.compression | This enables or disables support for compression and its default value is false.
9 | http.pipelining | This enables or disables HTTP pipelining.
10 | http.pipelining.max_events | This restricts the number of events to be queued before the HTTP connection is closed.
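
Since these are static settings, they end up as plain "key: value" lines in elasticsearch.yml. The sketch below renders a few of them from a dictionary; the helper name to_yaml_lines is hypothetical, the values are simply the defaults listed above, and only flat scalar settings are handled.

```python
# Example values taken from the defaults described in the table above.
http_settings = {
    "http.port": 9200,
    "http.max_content_length": "100mb",
    "http.compression": False,
}


def to_yaml_lines(settings):
    """Render flat settings as 'key: value' lines for elasticsearch.yml."""
    lines = []
    for key, value in settings.items():
        # YAML booleans are lowercase, unlike Python's True/False.
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key}: {value}")
    return "\n".join(lines)


print(to_yaml_lines(http_settings))
```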

Indices

This module maintains the settings, which are set globally for every index. The following settings are mainly related to memory usage −

Circuit Breaker

This is used to prevent operations from causing an OutOfMemoryError. Its settings mainly limit how much of the JVM heap operations may use. For example, the indices.breaker.total.limit setting defaults to 70% of the JVM heap.
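
The idea behind a percentage-based limit can be shown with a little arithmetic. This is only a sketch of the concept, assuming the limit is given as a percentage string; the function names breaker_limit_bytes and would_trip are hypothetical, and the real breaker estimates memory use in more detail.

```python
def breaker_limit_bytes(heap_bytes, limit="70%"):
    """Translate a percentage limit into an absolute number of bytes."""
    return int(heap_bytes * float(limit.rstrip("%")) / 100)


def would_trip(current_bytes, requested_bytes, heap_bytes, limit="70%"):
    """True if reserving requested_bytes would exceed the breaker limit."""
    return current_bytes + requested_bytes > breaker_limit_bytes(heap_bytes, limit)


heap = 1000  # a tiny pretend heap, in bytes, to keep the numbers readable
print(breaker_limit_bytes(heap))   # 700
print(would_trip(600, 200, heap))  # True: 800 bytes exceeds the 700 limit
print(would_trip(600, 100, heap))  # False: exactly at the limit
```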

Fielddata Cache

This is used mainly when aggregating on a field. It is recommended to have enough memory to allocate it. The amount of memory used for the field data cache can be controlled using the indices.fielddata.cache.size setting.

Node Query Cache

This memory is used for caching the query results. This cache uses the Least Recently Used (LRU) eviction policy. The indices.queries.cache.size setting controls the memory size of this cache.
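
To make the LRU eviction policy concrete, here is a minimal generic sketch. It is not how Elasticsearch implements its query cache: the class name LRUCache is hypothetical, and capacity is counted in entries here, whereas the real cache is bounded by memory size.

```python
from collections import OrderedDict


class LRUCache:
    """A tiny least-recently-used cache bounded by number of entries."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        # Reading an entry makes it the most recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        # Evict from the least recently used end when over capacity.
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)


cache = LRUCache(capacity=2)
cache.put("query-1", "result-1")
cache.put("query-2", "result-2")
cache.get("query-1")              # query-1 is now the most recently used
cache.put("query-3", "result-3")  # evicts query-2
print(cache.get("query-2"))       # None
```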

Indexing Buffer

This buffer stores the newly created documents in the index and flushes them when the buffer is full. The indices.memory.index_buffer_size setting controls the amount of heap allocated for this buffer.

Shard Request Cache

This cache is used to store the local search data for every shard. The cache can be enabled during the creation of the index, and it can be enabled or disabled for an individual request by sending a URL parameter.

Enable cache at index creation - "index.requests.cache.enable": true
Enable or disable cache per request - ?request_cache=true or ?request_cache=false
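
The two mechanisms can be sketched as follows. The example only builds the index-creation body and a search path; it sends nothing to a cluster. The helper names index_creation_body and search_path and the index name my_index are hypothetical.

```python
import json
from urllib.parse import urlencode


def index_creation_body(cache_enabled):
    """Body for creating an index with the shard request cache on or off."""
    return {"settings": {"index.requests.cache.enable": cache_enabled}}


def search_path(index, request_cache):
    """Search path that overrides the cache behaviour for one request."""
    query = urlencode({"request_cache": "true" if request_cache else "false"})
    return f"/{index}/_search?{query}"


# my_index is a placeholder index name used only for illustration.
print(json.dumps(index_creation_body(True)))
print(search_path("my_index", False))
```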

Indices Recovery

This controls the resources used during the recovery process. The following are the settings −

Setting | Default value
indices.recovery.concurrent_streams | 3
indices.recovery.concurrent_small_file_streams | 2
indices.recovery.file_chunk_size | 512kb
indices.recovery.translog_ops | 1000
indices.recovery.translog_size | 512kb
indices.recovery.compress | true
indices.recovery.max_bytes_per_sec | 40mb

TTL Interval

The Time to Live (TTL) of a document defines the time after which the document gets deleted. The following are the dynamic settings for controlling this process −

Setting | Default value
indices.ttl.interval | 60s
indices.ttl.bulk_size | 1000

Node

Each node has an option to be a data node or not. You can change this property by changing the node.data setting. Setting the value to false means that the node is not a data node.
