Troubleshooting network connectivity issues such as refused or hanging connections caused by port problems.
Step 1.
Verify the ping time from both ends (source and destination).
ping 10.10.10.11
A ping time of 100 ms or less is typical for most broadband connections and should not cause noticeable lag.
A ping time of 150 ms or more is likely to cause noticeable lag.
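The thresholds above can be applied to ping's own summary line. The following is a minimal sketch, assuming the "min/avg/max" summary format printed by common Linux and BSD ping implementations. The function names avg_rtt and classify_rtt are hypothetical, and the "borderline" label for values between 100 ms and 150 ms is an assumption, since the text above does not cover that range.

```shell
# Extract the average round-trip time (in ms) from ping's summary on stdin.
# Assumes a summary line such as: "rtt min/avg/max/mdev = a/b/c/d ms".
avg_rtt() {
    awk -F'/' '/min\/avg\/max/ { print $5 }'
}

# Classify a round-trip time in ms using the thresholds from Step 1.
# "borderline" (between 100 and 150 ms) is an assumed label.
classify_rtt() {
    awk -v rtt="$1" 'BEGIN {
        if (rtt == "")            print "unknown"
        else if (rtt + 0 <= 100)  print "ok"
        else if (rtt + 0 < 150)   print "borderline"
        else                      print "lag"
    }'
}
```

A possible invocation is classify_rtt "$(ping -c 5 10.10.10.11 | avg_rtt)". An "unknown" result means no summary line was found, for example because ping could not reach the host.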
Step 2.
Verify that the port is listening, using netstat or nc.
netstat -ntl | grep 7199
tcp 0 0 0.0.0.0:7199 0.0.0.0:* LISTEN
Alternatively, verify it as follows:
nc -lk 7199
nc: Address already in use
The "Address already in use" error means a process is already listening on the port. If the port is free and no process is listening, the above command (nc -lk) starts listening on that port itself.
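A plain grep for 7199 also matches other ports such as 17199, or the number appearing in the foreign address column. The sketch below checks the local address column exactly. It assumes the column layout of netstat -ntl (the local address is the fourth field, which is also true of ss -ntl), and the function name is_listening is hypothetical.

```shell
# Succeeds if the `netstat -ntl` output on stdin contains a LISTEN socket
# whose local address ends in ":<port>".
is_listening() {
    awk -v port="$1" '
        /LISTEN/ {
            # Column 4 is the local address, e.g. 0.0.0.0:7199 or :::7199
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

It could be used as: netstat -ntl | is_listening 7199 && echo "7199 is listening".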
Step 3.
Check whether the port is open on the remote server. Log in to the source server and try to connect to the remote server using any one of the following methods (telnet, nc, or nmap).
telnet 10.10.10.12 7199
Trying 10.10.10.12...
Connected to 10.10.10.12.
Escape character is '^]'.
^] (press Ctrl+] to get the telnet prompt)
telnet> quit
nc -zvw3 10.10.10.12 7199
Connection to 10.10.10.12 7199 port [tcp/*] succeeded!
nmap -sT 10.10.10.12 -p 7199 -Pn
The output should show the port state as open, not filtered or closed.
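The three nmap states map fairly directly onto the symptoms named in the title. The sketch below pulls the state for one port out of nmap's normal output and prints a likely interpretation. The function names nmap_port_state and explain_port_state are hypothetical, and the explanations are a general reading of nmap's states rather than a definitive diagnosis.

```shell
# Print the STATE column for <port>/tcp from nmap's normal output on stdin.
nmap_port_state() {
    awk -v key="$1/tcp" '$1 == key { print $2 }'
}

# Map an nmap port state to a likely cause.
explain_port_state() {
    case "$1" in
        open)     echo "a service is accepting connections on the port" ;;
        closed)   echo "host is reachable but nothing is listening (connection refused)" ;;
        filtered) echo "probes get no reply, a firewall is probably dropping them (connection hangs)" ;;
        *)        echo "unrecognized state: $1" ;;
    esac
}
```

For example: explain_port_state "$(nmap -sT 10.10.10.12 -p 7199 -Pn | nmap_port_state 7199)".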
Step 4.
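The TCP acknowledgment exchange between the two servers' ports can be verified using ngrep or tcpdump.
For example:
sudo tcpdump tcp port 7199
Or, to save the capture to a file instead of printing it:
sudo tcpdump -w trace7199.pcap tcp port 7199
Sample output of the first command: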
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
21:07:35.356796 IP ip-10-10-10-11.srv101.dsinternal.org.47608 > ip-10.10.10.12.srv101.dsinternal.org.7199: Flags [S], seq 1054145878, win 29200, options [mss 1460,sackOK,TS val 989707819 ecr 0,nop,wscale 9], length 0
21:07:35.356836 IP ip-10.10.10.12.srv101.dsinternal.org.7199 > ip-10.10.10.11.srv101.dsinternal.org.47608: Flags [S.], seq 3203231434, ack 1054145879, win 28960, options [mss 1460,sackOK,TS val 989707126 ecr 989707819,nop,wscale 9], length 0
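Note: Check the length field. A length of 0 means the packet carries no payload, so no data is being pushed. The [S] and [S.] packets above are the SYN and SYN-ACK of the TCP handshake, which normally have length 0.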
Or
sudo ngrep port 7199
T 10.10.10.12:46996 -> 10.10.10.11:7199
T 10.10.10.11:7199 -> 10.10.10.12:46996
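On a busy port the tcpdump output from this step can be hard to read by eye. The sketch below counts SYN packets, SYN-ACK packets, and packets with a non-zero length. It assumes tcpdump's default (non-verbose) text output, as in the sample above, and the function name summarize_capture is hypothetical.

```shell
# Summarize tcpdump's default text output read from stdin.
summarize_capture() {
    awk '
        /Flags \[S\],/   { syn++ }       # connection attempts
        /Flags \[S\.\],/ { synack++ }    # attempts answered by the listener
        {
            # Count the packet as data if the value after "length" is > 0
            for (i = 1; i < NF; i++)
                if ($i == "length") {
                    if ($(i + 1) + 0 > 0) data++
                    break
                }
        }
        END { printf "syn=%d synack=%d data=%d\n", syn, synack, data }
    '
}
```

For example: sudo tcpdump -c 50 tcp port 7199 | summarize_capture. SYN packets with no matching SYN-ACK suggest that the attempts are not being answered, while a completed handshake with data=0 suggests that the connection opens but nothing is sent over it.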
Step 5.
Confirm with the network team that the outbound (ephemeral, or short-lived) port range is open. On Linux, for example, the range can be checked with the command below.
cat /proc/sys/net/ipv4/ip_local_port_range
32768 60999
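To check whether a source port seen in a capture, such as 47608 in the tcpdump trace above, falls inside this range, something like the following can be used. It is a minimal sketch that reads the "low high" pair in the format of /proc/sys/net/ipv4/ip_local_port_range, and the function name in_local_port_range is hypothetical.

```shell
# Succeeds if <port> lies within the "low high" range given on stdin.
in_local_port_range() {
    awk -v port="$1" '
        NR == 1 { ok = (port + 0 >= $1 + 0 && port + 0 <= $2 + 0) }
        END     { exit(ok ? 0 : 1) }
    '
}
```

For example: in_local_port_range 47608 < /proc/sys/net/ipv4/ip_local_port_range && echo "ephemeral port".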
Step 6.
Verify connectivity on the outbound port by passing text between the servers as follows.
Server 1
Listen on a local outbound port.
cassandra@ip-10-10-10-11:~$ sudo nc -l 32768
Server 2
Connect to the outbound port on server 1.
cassandra@10-10-10-12:~$ nc 10.10.10.11 32768
Now type some text on either server; it should appear in the other server's terminal.
Datastax outbound port verification
Note: Repeat the test in the opposite direction as well to verify remote connectivity on the outbound port.
Step 7.
Check the firewall rules using iptables.
10.10.10.11:~$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
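When the rule set is long, it can help to filter it for rules that would block a given port. The sketch below reads the rule-specification format printed by iptables -S and prints any DROP default policy plus any DROP or REJECT rule that names the port with --dport. The function name blocking_rules is hypothetical, and the check is deliberately narrow: it does not catch port ranges, multiport rules, or rules that match on addresses only.

```shell
# Print rules from `iptables -S` output (stdin) that may block <port>.
blocking_rules() {
    awk -v port="$1" '
        # A DROP default policy blocks anything not explicitly accepted
        /^-P / && $3 == "DROP" { print; next }
        /-j (DROP|REJECT)/ {
            for (i = 1; i < NF; i++)
                if ($i == "--dport" && $(i + 1) == port) { print; next }
        }
    '
}
```

For example: sudo iptables -S | blocking_rules 7199. Empty output only means that no such rule was found by this simple match, not that the firewall is guaranteed to allow the traffic.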
Step 8.
Finally, ask the networking team to check any inter-subnet firewall or firewall appliance policies that regulate traffic between subnets.
For issues specific to JMX port 7199, or if nodetool does not work across servers, go through the following steps.
Step 1.
Make sure the following parameters are set in cassandra-env.sh or in the jvm.options file.
-Dcassandra.jmx.remote.port=7199
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Djava.rmi.server.hostname=10.10.10.11
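A quick way to see which of these parameters are absent is to search the configuration file for each one, ignoring commented-out lines. This is a minimal sketch: the function name missing_jmx_options is hypothetical, the file path is supplied by the caller, and only the option names are matched because the values may be set through shell variables. Finding an option in the file does not prove it is applied, since cassandra-env.sh may set it inside a conditional branch.

```shell
# Print each expected JMX option that has no uncommented occurrence in <file>.
missing_jmx_options() {
    for opt in \
        "-Dcassandra.jmx.remote.port=" \
        "-Dcom.sun.management.jmxremote.authenticate=" \
        "-Dcom.sun.management.jmxremote.ssl=" \
        "-Djava.rmi.server.hostname="
    do
        # Drop comment lines first, then look for the literal option name
        grep -v '^[[:space:]]*#' "$1" | grep -qF -- "$opt" || echo "$opt"
    done
}
```

For example: missing_jmx_options cassandra-env.sh, run from the directory that holds the file. No output means all four option names were found.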
Step 2.
Download the jmxterm application (for example from https://docs.cyclopsgroup.org/jmxterm), then test port 7199 as follows.
10-10-10-11:~$ java -jar jmxterm-1.0.2-uber.jar
$>open localhost:7199
#Connection to localhost:7199 is opened
$>exit
#bye
10-10-10-11:~$ java -jar jmxterm-1.0.2-uber.jar
$>open 10.10.10.12:7199
#Connection to 10.10.10.12:7199 is opened
Note: If the connection hangs or is refused, the likely causes are that port 7199 or the outbound ports are not open, or that the parameters in Step 1 are not set correctly.