Apache Flume – NetCat Source

This chapter takes an example to explain how you can generate events and subsequently log them into the console. For this, we are using the NetCat source and the logger sink.

Prerequisites

To run the example provided in this chapter, you need to install Flume.

Configuring Flume

We have to configure the source, the channel, and the sink using the configuration file in the conf folder. The example given in this chapter uses a NetCat Source, Memory channel, and a logger sink.

NetCat Source

While configuring the NetCat source, we have to specify a port while configuring the source. Now the source (NetCat source) listens to the given port and receives each line we entered in that port as an individual event and transfers it to the sink through the specified channel.

While configuring this source, you have to provide values to the following properties −

  • channels
  • Source type − netcat
  • bind − Host name or IP address to bind.
  • port − Port number to which we want the source to listen.

Channel

We are using the memory channel. To configure the memory channel, you must provide a value to the type of the channel. Given below are the list of properties that you need to supply while configuring the memory channel −

  • type − It holds the type of the channel. In our example, the type is MemChannel.
  • Capacity − It is the maximum number of events stored in the channel. Its default value is 100. (optional)
  • TransactionCapacity − It is the maximum number of events the channel accepts or sends. Its default value is 100. (optional).

Logger Sink

This sink logs all the events passed to it. Generally, it is used for testing or debugging purpose. To configure this sink, you must provide the following details.

  • Channel
  • type − logger

Example Configuration File

Given below is an example of the configuration file. Copy this content and save as netcat.conf in the conf folder of Flume.

# Naming the components on the current agent
NetcatAgent.sources = Netcat   
NetcatAgent.channels = MemChannel 
NetcatAgent.sinks = LoggerSink  

# Describing/Configuring the source 
NetcatAgent.sources.Netcat.type = netcat 
NetcatAgent.sources.Netcat.bind = localhost
NetcatAgent.sources.Netcat.port = 56565  

# Describing/Configuring the sink 
NetcatAgent.sinks.LoggerSink.type = logger  

# Describing/Configuring the channel 
NetcatAgent.channels.MemChannel.type = memory 
NetcatAgent.channels.MemChannel.capacity = 1000 
NetcatAgent.channels.MemChannel.transactionCapacity = 100 
 
# Bind the source and sink to the channel 
NetcatAgent.sources.Netcat.channels = MemChannel
NetcatAgent.sinks.LoggerSink.channel = MemChannel

Execution

Browse through the Flume home directory and execute the application as shown below.

$ cd $FLUME_HOME
$ ./bin/flume-ng agent --conf $FLUME_CONF --conf-file $FLUME_CONF/netcat.conf 
   --name NetcatAgent -Dflume.root.logger=INFO,console

If everything goes fine, the source starts listening to the given port. In this case, it is 56565. Given below is the snapshot of the command prompt window of a NetCat source which has started and listening to the port 56565.

Passing Data to the Source

To pass data to NetCat source, you have to open the port given in the configuration file. Open a separate terminal and connect to the source (56565) using the curl command. When the connection is successful, you will get a message “connected” as shown below.

$ curl telnet://localhost:56565 
connected 

Now you can enter your data line by line (after each line, you have to press Enter). The NetCat source receives each line as an individual event and you will get a received message “OK”.

Whenever you are done with passing data, you can exit the console by pressing (Ctrl+C). Given below is the snapshot of the console where we have connected to the source using the curl command.

Each line that is entered in the above console will be received as an individual event by the source. Since we have used the Logger sink, these events will be logged on to the console (source console) through the specified channel (memory channel in this case).

The following snapshot shows the NetCat console where the events are logged.

Leave a Reply