Splunk Troubleshooting – Forwarder

Welcome to the first installment in our new Splunk troubleshooting series. As Splunk professionals, we know that some issues are not covered in the Splunk tutorials and guides. Splunk Answers can be a valuable resource, but finding the guidance you need there can still eat up precious time. To save you that time and give you the knowledge you need to tackle some of the most common Splunk issues, we have created this blog series for you, the Splunk professional. We are Splunk experts and Splunk professional services partners with the experience and knowledge to assist with your Splunk deployment in any environment.

Splunk: Troubleshooting Forwarder Communications

(For the purposes of this article we will be working with *nix-based nodes.)

While Splunk can be a very powerful tool for harnessing your log data, setting up communication between your forwarders and indexers can sometimes prove challenging. Below we explore some of the more common errors you may encounter and how to resolve them.

Timeouts in Cooked Connections within splunkd.log

Splunk Error Code: (WARN TcpOutputProc - Cooked connection to ip= 255.255.255.255 timed out)

This is the most common error encountered when troubleshooting Splunk forwarder communications, and it can be one of the most frustrating. To determine whether this error is causing your issues, take a look at the most recent events in the forwarder's splunkd.log file:

tail -n 100 -f splunkd.log | grep TcpOutputProc
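To see at a glance which destinations are timing out, the filtered log output can also be summarized per address. The following is a minimal sketch, assuming the warning follows the layout of the sample message quoted in this article; count_cooked_timeouts is a hypothetical helper name, not part of Splunk.

```shell
#!/bin/sh
# count_cooked_timeouts (hypothetical helper)
# Reads splunkd.log content on stdin and prints one "ADDRESS COUNT" line
# per destination that logged a cooked-connection timeout.
# Assumption: the address follows "ip=" (with or without a space), as in
# the sample warning shown in this article.
count_cooked_timeouts() {
    grep 'TcpOutputProc' \
        | grep 'timed out' \
        | sed -n 's/.*Cooked connection to ip= *\([^ ]*\).*/\1/p' \
        | sort | uniq -c | awk '{ print $2, $1 }'
}

# Example call (not executed here):
#   count_cooked_timeouts < splunkd.log
```

A single address with a high count points at one unreachable indexer, while timeouts spread across every address suggest a problem on the forwarder's side of the network.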
If you are experiencing the issue, you will see warning messages in the log similar to the following:

WARN TcpOutputProc - Cooked connection to ip= 255.255.255.255 timed out

A cooked connection denotes communication between two Splunk nodes, as opposed to a raw connection, in which a non-Splunk node passes its data to Splunk.

First we need to ensure that the receiving port is being listened on. The receiving Splunk node (the indexer) is the one that listens on this port; the forwarder only opens outbound connections to it. To check, we issue a netstat on the node that should be listening:

netstat -an | grep 9998

If the node is listening properly you should see a result similar to:

tcp 0 0 0.0.0.0:9998 0.0.0.0:* LISTEN

This tells us that the node is in listen mode on port 9998. Now that we've determined that the correct port is being listened on, we need to test the communications path between the forwarder and the indexer. To do this we attempt to open a telnet session to the indexer from the forwarder:

telnet indexername.domainname.com 9998
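Where telnet is not installed, or where the check needs to run from a script, the same test can be approximated with netcat. This is only a sketch: it assumes an nc build that supports the -z and -w options, and probe_port is a hypothetical helper name.

```shell
#!/bin/sh
# probe_port HOST PORT (hypothetical helper)
# Attempts a TCP connection and reports whether it succeeded.
# Assumption: the installed nc supports -z (connect without sending data)
# and -w (timeout in seconds).
probe_port() {
    host=$1
    port=$2
    if nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
        echo "$host:$port reachable"
    else
        echo "$host:$port unreachable"
        return 1
    fi
}

# Example call (not executed here):
#   probe_port indexername.domainname.com 9998
```

Unlike an interactive telnet session, this returns an exit status, so it can be used in a loop over several indexers or ports.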
If the port is available, the connection should succeed almost immediately. Should the connection fail, we'll want to try another port to determine whether we have a port availability issue or something more. So next we'll attempt to telnet to port 8089 (the default Splunk management port, which should be open on an indexer):

telnet indexername.domainname.com 8089

If the connection is successful you should see something close to the following:

Connected to indexername.domainname.com

Once we've made this successful connection, we know that the indexer is reachable and that the problem is specific to port 9998. The question that remains is whether the port is blocked at the network firewall layer or by a local firewall such as iptables on the indexer itself. To determine this, we SSH into the indexer and attempt to telnet back to ourselves on port 9998:

telnet localhost 9998

If the problem lies with a local firewall such as iptables, you will receive an error similar to the following:

telnet: connect to address ::1: Connection refused

So what does this tell us? Since netstat has already shown that the port is being listened on, the local refusal tells us that port 9998 has not been opened in the local firewall, and that this is the source of our issue. To resolve it, we open port 9998 via iptables (or your local firewall):
iptables -D INPUT -p tcp --dport 9998 -j DROP
service iptables save

The first command deletes the existing DROP rule for port 9998 from the INPUT chain, and the second saves the updated rule set.

If the telnet connection to localhost succeeds instead, you have determined that port 9998 needs to be opened at the network firewall layer to allow communication between the forwarder and the indexer.
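The decision flow walked through above can be summarized as a small function. This is a sketch of the article's reasoning only, not a Splunk tool: diagnose is a hypothetical name, and each argument is simply "ok" or "fail" for the result of one of the three telnet tests.

```shell
#!/bin/sh
# diagnose REMOTE_DATA REMOTE_MGMT LOCAL_DATA (hypothetical helper)
# Each argument is "ok" or "fail":
#   REMOTE_DATA  telnet from the forwarder to the indexer on port 9998
#   REMOTE_MGMT  telnet from the forwarder to the indexer on port 8089
#   LOCAL_DATA   telnet to localhost on port 9998, run on the indexer
diagnose() {
    remote_data=$1
    remote_mgmt=$2
    local_data=$3
    if [ "$remote_data" = ok ]; then
        echo "receiving port reachable from the forwarder"
    elif [ "$remote_mgmt" != ok ]; then
        echo "no connectivity on either port: look beyond a single blocked port"
    elif [ "$local_data" = ok ]; then
        echo "network firewall is blocking port 9998"
    else
        echo "local firewall on the indexer is blocking port 9998"
    fi
}
```

For example, `diagnose fail ok fail` corresponds to the case resolved with the iptables commands above, while `diagnose fail ok ok` points to the network firewall.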