g***@charter.net
2021-04-21 16:35:04 UTC
Freebsd-net,
We are running FreeBSD 10.4 with multipath routing enabled (RADIX_MPATH)
and a single static route (10.18.91.0/255.255.255.0 via gateway
10.17.118.3). Infrequently, we run into the problem described below.
The system runs fine, with the off-net clients (10.18.91.0/255.255.255.0)
reachable.
Then at some point we can no longer reach the off-net clients: both ping
and SSH fail.
Interestingly, the off-net clients can successfully ping and SSH into our
failing node.
When the problem occurs, we've determined that our failing node is sending
0.0.0.0 as its source IP address, which is why the outgoing pings and SSH
fail.
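
One way to watch for this state from userland, without sending traffic, is
to ask the kernel which source address it would pick for a destination. The
sketch below connects a UDP socket and reads the local address back with
getsockname(). The helper name probe_source_addr() and the choice of port 9
are illustrative assumptions, not anything from our system. Whether this
would actually report 0.0.0.0 in the failing state is also an assumption,
since the socket layer may select the source through a different path than
ip_output().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: write the source address the kernel selects for
 * `dst` into `buf` as dotted-quad text.
 * Returns 0 on success, -1 on any error (bad address, no route, ...).
 */
static int
probe_source_addr(const char *dst, char *buf, socklen_t buflen)
{
	struct sockaddr_in peer, local;
	socklen_t len = sizeof(local);
	int fd, rc = -1;

	assert(dst != NULL && buf != NULL);

	memset(&peer, 0, sizeof(peer));
	peer.sin_family = AF_INET;
	peer.sin_port = htons(9);	/* arbitrary port; UDP connect() sends nothing */
	if (inet_pton(AF_INET, dst, &peer.sin_addr) != 1)
		return (-1);

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return (-1);

	/* connect() makes the kernel do the route lookup and pick a source. */
	if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) == 0 &&
	    getsockname(fd, (struct sockaddr *)&local, &len) == 0 &&
	    inet_ntop(AF_INET, &local.sin_addr, buf, buflen) != NULL)
		rc = 0;

	close(fd);
	return (rc);
}
```

Calling probe_source_addr("10.18.91.10", buf, sizeof(buf)) periodically and
logging the result might help pin down when the state changes.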
We have also found that if we remove the single static route and add it
back, the problem is corrected.
Is this a known issue that's been fixed in subsequent releases?
I've been looking at the function ip_output() and see where it calls
rtalloc_mpath_fib() to look up the route to the destination (e.g.,
10.18.91.10) and then later fills in the source IP "if available". There's
a comment stating "/* Interface may have no addresses. */"; in that case
the code doesn't fill in the source IP and continues on without error,
which matches what we've observed in the failure case.
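
To make the behavior I'm describing concrete, here is a minimal userland
model of that logic. It is only a sketch of my reading of the code, not the
kernel source: the model_* types, model_fill_source(), and the local address
10.17.118.1 used in the comments are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT the FreeBSD kernel structures. */
struct model_ifaddr { uint32_t addr; };		/* IPv4 address, host order */
struct model_route  { const struct model_ifaddr *ifa; };	/* NULL: none found */
struct model_packet { uint32_t src; uint32_t dst; };

#define MODEL_INADDR_ANY 0u			/* 0.0.0.0 */

/*
 * If the caller left the source unset, use the address attached to the
 * route, when there is one. Always returns 0: a missing address is not
 * treated as an error, so the packet can leave with source 0.0.0.0.
 */
static int
model_fill_source(struct model_packet *pkt, const struct model_route *rt)
{
	assert(pkt != NULL && rt != NULL);

	if (pkt->src == MODEL_INADDR_ANY) {
		/* Interface may have no addresses. */
		if (rt->ifa != NULL)
			pkt->src = rt->ifa->addr;
	}
	return (0);
}
```

With a route whose ifa points at, say, 10.17.118.1, the packet gets that
source; with ifa == NULL the call still succeeds and src stays 0.0.0.0.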
Thus, our problem seems to be in the actual routing code/structures, which
I'm digging deeper into every day.
Do you have any tips or specific areas of the routing code I should be
looking into?
Thanks
Greg