Troubleshooting network issues

This guide walks through troubleshooting network connectivity problems, such as a refused or hanging connection, that are caused by port issues.

Step 1.

Verify the ping time from both ends (source and destination).

ping 10.10.10.11

A ping time of 100 ms or below is typical for most broadband connections, so there should be no noticeable lag.

A ping time of 150 ms or more is likely to cause noticeable lag.
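
The thresholds above can be applied to real ping output with a small helper. This is a minimal sketch, not a definitive tool: classify_ping is a hypothetical function name, it assumes the summary line that Linux and macOS ping print (the one containing "min/avg/max"), and the "borderline" label for averages between 100 and 150 ms is an assumption, since the text does not classify that range.

```shell
# classify_ping (hypothetical helper): reads ping output on stdin and
# classifies the average round-trip time.
# Thresholds: <= 100 ms ok, >= 150 ms laggy, in between "borderline" (assumed).
classify_ping() {
  awk -F'=' '/min\/avg\/max/ {
    split($2, t, "/")            # t[1]=min, t[2]=avg, t[3]=max
    avg = t[2] + 0
    if (avg <= 100)       verdict = "ok"
    else if (avg < 150)   verdict = "borderline"
    else                  verdict = "laggy"
    printf "avg=%s ms: %s\n", t[2], verdict
  }'
}
```

It could be used as: ping -c 5 10.10.10.11 | classify_ping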

Step 2.

Verify whether the port is listening or not, using netstat or nc.

netstat -ntl | grep 7199

tcp 0 0 0.0.0.0:7199 0.0.0.0:* LISTEN

Alternatively, verify it as follows:

nc -lk 7199

nc: Address already in use

If nc reports "Address already in use", a process is already listening on the port. If the port is free and no process is listening on it, the command above (nc -lk) starts listening on that port itself.
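
A plain grep for 7199 also matches other ports that merely contain those digits (17199, for instance) and matches in any column. The sketch below checks the local-address column and the LISTEN state explicitly. is_listening is a hypothetical helper name, and the column layout is assumed to be that of the netstat -ntl output shown above.

```shell
# is_listening (hypothetical helper): $1 = port, stdin = `netstat -ntl` output.
# Exit status 0 if some socket is in LISTEN state on exactly that port.
is_listening() {
  awk -v port="$1" '
    # Field 4 is the local address, the last field is the socket state.
    $NF == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit found ? 0 : 1 }
  '
}
```

It could be used as: netstat -ntl | is_listening 7199 && echo "port 7199 is listening"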

Step 3.

Check whether the port is open on the remote server. Log in to the source server and try to connect to the remote server using any one of the following tools (telnet, nc, or nmap).

telnet 10.10.10.12 7199

Trying 10.10.10.12...

Connected to 10.10.10.12.

Escape character is '^]'.

^] (press Ctrl+] to get the telnet> prompt)

telnet> quit

nc -zvw3 10.10.10.12 7199

Connection to 10.10.10.12 7199 port [tcp/*] succeeded!

nmap -sT 10.10.10.12 -p 7199 -Pn

The output should report the port state as open, not filtered or closed.
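
The three states point to different causes, so it can help to extract the state from the nmap output and spell out what it means. The following is a sketch under assumptions: nmap_port_state and explain_state are hypothetical helper names, and the parsing relies on nmap's usual "PORT STATE SERVICE" table.

```shell
# nmap_port_state (hypothetical helper): $1 = port, stdin = nmap output.
# Prints the STATE column of the "<port>/tcp" row.
nmap_port_state() {
  awk -v p="$1/tcp" '$1 == p { print $2 }'
}

# explain_state (hypothetical helper): maps an nmap state to a likely cause.
explain_state() {
  case "$1" in
    open)     echo "open: a process is accepting connections" ;;
    closed)   echo "closed: host reachable, but nothing is listening" ;;
    filtered) echo "filtered: probes unanswered, likely a firewall" ;;
    *)        echo "unrecognized state: $1" ;;
  esac
}
```

It could be used as: explain_state "$(nmap -sT 10.10.10.12 -p 7199 -Pn | nmap_port_state 7199)"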

Step 4.

The handshake acknowledgment between the servers' ports can be verified using ngrep or tcpdump.

For example

sudo tcpdump tcp port 7199

Or, to save the capture to a file:

sudo tcpdump -w trace7199.pcap tcp port 7199

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes

21:07:35.356796 IP ip-10-10-10-11.srv101.dsinternal.org.47608 > ip-10.10.10.12.srv101.dsinternal.org.7199: Flags [S], seq 1054145878, win 29200, options [mss 1460,sackOK,TS val 989707819 ecr 0,nop,wscale 9], length 0

21:07:35.356836 IP ip-10.10.10.12.srv101.dsinternal.org.7199 > ip-10.10.10.11.srv101.dsinternal.org.47608: Flags [S.], seq 3203231434, ack 1054145879, win 28960, options [mss 1460,sackOK,TS val 989707126 ecr 989707819,nop,wscale 9], length 0

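Note: Check the length field. A length of 0 means the packet carries no payload. That is normal for handshake packets such as the SYN ([S]) and SYN-ACK ([S.]) shown above, but if every packet has a length of 0, no data is being pushed.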
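
On a busy capture it is easier to count packet types than to read every line. The sketch below counts SYN packets, SYN-ACK packets, and packets with a non-zero length in tcpdump's text output. summarize_handshake is a hypothetical helper name, and the parsing assumes the line format shown above. SYNs with no matching SYN-ACK suggest the connection request is not being answered.

```shell
# summarize_handshake (hypothetical helper): stdin = tcpdump text output.
# Prints how many SYN, SYN-ACK and payload-carrying packets were seen.
summarize_handshake() {
  awk '
    /Flags \[S\],/   { syn++ }       # connection request
    /Flags \[S\.\],/ { synack++ }    # request acknowledged
    {
      # Find the "length N" pair and count packets with N > 0.
      for (i = 1; i < NF; i++)
        if ($i == "length" && $(i + 1) + 0 > 0) { data++; break }
    }
    END { printf "syn=%d synack=%d data=%d\n", syn, synack, data }
  '
}
```

It could be used as: sudo tcpdump -l -c 20 tcp port 7199 | summarize_handshake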

Or

sudo ngrep port 7199

T 10.10.10.12:46996 -> 10.10.10.11:7199

T 10.10.10.11:7199 -> 10.10.10.12:46996

Step 5.

Confirm with the network team that the outbound (ephemeral, short-lived) port range is open. On Linux, for example, the range can be checked with the following command:

cat /proc/sys/net/ipv4/ip_local_port_range

32768 60999
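
To tell whether a given port falls inside that range, the two numbers can be compared directly. This is a minimal sketch: in_local_port_range is a hypothetical helper name, and the optional second argument (an alternative file with the same two-number format) exists only to make the function easy to try out. A fixed service port such as 7199 is outside the range shown above, while the source ports of outgoing connections are picked from within it.

```shell
# in_local_port_range (hypothetical helper):
#   $1 = port to check
#   $2 = file holding "low high" (defaults to the Linux proc path)
# Exit status: 0 = inside the range, 1 = outside, 2 = range unreadable.
in_local_port_range() {
  range_file="${2:-/proc/sys/net/ipv4/ip_local_port_range}"
  low= high=
  # The file holds two whitespace-separated numbers: low and high.
  read -r low high < "$range_file" || [ -n "$high" ] || return 2
  [ "$1" -ge "$low" ] && [ "$1" -le "$high" ]
}
```

It could be used as: in_local_port_range 32768 && echo "32768 is an ephemeral port"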

Step 6.

Verify connectivity on an outbound port by passing text between the servers as follows.

Server 1

Listen on a local port from the outbound range.

cassandra@ip-10-10-10-11:~$ sudo nc -l 32768

Server 2

Connect to the port that server 1 is listening on.

cassandra@10-10-10-12:~$ nc 10.10.10.11 32768

Now type some text on either server; it should appear in the other server's terminal.

Datastax outbound port verification

Note: Repeat the test in the opposite direction as well to verify the outbound port's remote connectivity.

Step 7.

Check the firewall rules using iptables.

10.10.10.11:~$ sudo iptables -L

Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
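
The output above shows no restrictions: every chain has policy ACCEPT and no rules. On a host with many rules, the blocking entries can be filtered out of the listing. The sketch below is one way to do that. find_blocking_rules is a hypothetical helper name, and it assumes the default iptables -L layout, where each rule line starts with its target.

```shell
# find_blocking_rules (hypothetical helper): stdin = `iptables -L -n` output.
# Prints chains whose default policy is DROP and rules that DROP or REJECT.
find_blocking_rules() {
  awk '
    /^Chain/ {
      chain = $2
      if ($0 ~ /policy DROP/) print chain ": default policy is DROP"
      next
    }
    # Rule lines start with the target name.
    $1 == "DROP" || $1 == "REJECT" { print chain ": " $0 }
  '
}
```

It could be used as: sudo iptables -L -n | find_blocking_rules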

Step 8.

Finally, ask the networking team to check any inter-subnet firewall or firewall appliance policies that regulate traffic between subnets.

If the problem is specific to JMX port 7199, or if nodetool is not working across the servers, go through the following steps.

Step 1.

Make sure the following parameters are set in cassandra-env.sh or in the jvm.options file:

-Dcassandra.jmx.remote.port=7199

-Dcom.sun.management.jmxremote.authenticate=false

-Dcom.sun.management.jmxremote.ssl=false

-Djava.rmi.server.hostname=10.10.10.11
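
Checking for the four parameters can be scripted. The sketch below only looks for the presence of each option on a non-comment line; it does not validate the values. check_jmx_opts is a hypothetical helper name, and the path to the configuration file is left to the caller because it depends on the installation.

```shell
# check_jmx_opts (hypothetical helper): $1 = path to cassandra-env.sh or
# jvm.options. Reports each JMX option as found or missing.
# Exit status: 0 = all present, 1 = at least one missing, 2 = file unreadable.
check_jmx_opts() {
  [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
  missing=0
  for opt in \
    cassandra.jmx.remote.port \
    com.sun.management.jmxremote.authenticate \
    com.sun.management.jmxremote.ssl \
    java.rmi.server.hostname
  do
    # Skip commented-out lines, then match the option as a fixed string.
    if grep -v '^[[:space:]]*#' "$1" | grep -qF -e "-D${opt}="; then
      echo "found:   -D${opt}"
    else
      echo "missing: -D${opt}"
      missing=1
    fi
  done
  return "$missing"
}
```

It could be used as: check_jmx_opts /path/to/cassandra-env.sh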

Step 2.

Download the jmxterm application (for example from https://docs.cyclopsgroup.org/jmxterm), then test port 7199 as follows:

10-10-10-11:~$ java -jar jmxterm-1.0.2-uber.jar

$>open localhost:7199

#Connection to localhost:7199 is opened

$>exit

#bye

10-10-10-11:~$ java -jar jmxterm-1.0.2-uber.jar

$>open 10.10.10.12:7199

#Connection to 10.10.10.12:7199 is opened

Note: If the connection hangs or is refused, the likely causes are that port 7199 or the outbound ports are not open, or that the parameters listed in Step 1 are missing or set incorrectly.