Discussion:
[Bug 166724] [re] if_re watchdog timeout
b***@freebsd.org
7 years ago
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166724

Rodney W. Grimes <***@FreeBSD.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@FreeBSD.org,
| |***@FreeBSD.org

--- Comment #21 from Rodney W. Grimes <***@FreeBSD.org> ---
Put this back on a visible bug list
--
You are receiving this mail because:
You are on the CC list for the bug.
b***@freebsd.org
7 years ago
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166724

Dirk Meyer <***@FreeBSD.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@FreeBSD.org

--- Comment #22 from Dirk Meyer <***@FreeBSD.org> ---

As stated in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208205

The generic driver fails under load.

Replacing the card with another Realtek card did not help.

Replacing the Realtek card with an Intel card solved all problems.
b***@freebsd.org
7 years ago
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166724

Mateusz Piotrowski <***@FreeBSD.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Keywords| |needs-patch
CC| |***@FreeBSD.org
b***@freebsd.org
7 years ago
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166724

Arto Pekkanen <***@gmail.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@gmail.com

--- Comment #23 from Arto Pekkanen <***@gmail.com> ---
I managed to solve this issue by disabling MSI and MSI-X. Put the following
lines into /boot/loader.conf:

hw.re.msi_disable="1"
hw.re.msix_disable="1"

You see, MSI/MSI-X interrupt processing supposedly eliminates the need to
perform an extra read from a device register after receiving an interrupt that
signals that a DMA write has finished. However, there is some kind of problem,
either in the driver or in the chip itself, in the way these interrupts are
handled.

With MSI and MSI-X disabled, the driver switches to the older interrupt
filter handler, and thus probably performs an extra read from some device
register to wait for the DMA transfer to memory to complete (according to
Wikipedia, with legacy interrupts this is the only way to ensure the DMA
transfer wasn't buffered by the chipset etc.).

So I would suggest that everybody watching this thread check whether disabling
MSI and MSI-X helps on their system. It might not apply to all Realtek NICs,
but on my machine this workaround is valid.
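
A quick way to confirm that both tunables are really present before rebooting
is to read them back from the file. The following is only a minimal sketch and
not part of the original comment: the helper name loader_tunable is
hypothetical, it assumes simple NAME="value" lines without embedded spaces, and
it assumes that the last assignment in the file is the one that takes effect.

```sh
#!/bin/sh
# Print the value assigned to a tunable in a loader.conf-style file.
# Usage: loader_tunable NAME [FILE]   (FILE defaults to /boot/loader.conf)
# Returns 1 and prints nothing if the tunable is not set.
# Assumption: values contain no whitespace, and the last assignment wins.
loader_tunable() {
    name=$1
    file=${2:-/boot/loader.conf}
    # Escape the dots in the name so they match literally in the regex.
    pat=$(printf '%s' "$name" | sed 's/\./\\./g')
    # Lines starting with '#' never match, so commented-out entries are ignored.
    value=$(sed -n "s/^[[:space:]]*${pat}=\"\{0,1\}\([^\"#[:space:]]*\)\"\{0,1\}.*/\1/p" "$file" | tail -n 1)
    [ -n "$value" ] || return 1
    printf '%s\n' "$value"
}
```

For example, `loader_tunable hw.re.msi_disable` should print 1 once the two
lines above are in place. After the reboot, kenv(1) can be used to see what the
loader actually handed to the kernel.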
b***@freebsd.org
7 years ago
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166724

--- Comment #24 from Alex Dupre <***@FreeBSD.org> ---
Disabling MSI/MSI-X has been proposed as a solution in the past. I've just
tried it again to be sure: it helps, but the issue doesn't disappear
completely. With it I can successfully run the Google (M-Lab) speed test, but I
still get a watchdog timeout and network reset as soon as I start the Ookla
speed test. Fully reproducible.
b***@freebsd.org
7 years ago
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166724

--- Comment #25 from zjk <***@wp.pl> ---
hw.re.msi_disable hw.re.msix_disable
I tested this solution for a few days (it has already been described elsewhere
on the internet).
There is no visible effect on my computers: the network still goes down very
quickly. But maybe it depends on the network card chipset?

However, I highly recommend this analysis:
https://forums.freebsd.org/threads/10-2-release-re0-watchdog-timeout.55306/#post-337045
It makes some extremely important remarks.
One important tip: this may be the result of overloading the processor, so in
general it would be a problem for low-performance processors. Or the other way
round: a problem of the "computationally demanding" chipset of the network
card, and finally of the "programmatically extended" driver.

This is probably because the "built-in" FreeBSD driver is so much "slimmed
down" compared with the full version from Realtek (from the Realtek website).
It may be intended to run on less powerful processors.

But I cannot fully accept everything in this analysis. "Watchdog timeout"
messages also occur after the transmission has stopped. Processor load drops to
a few percent, but watchdog timeout messages still appear every few seconds.

In general, a reset is needed to restore normal operation of the interface.

As a workaround, instead of, for example, limiting the link speed to 100 Mb,
you can use dummynet for flow / bandwidth management.

It is still not a solution to the problem of the driver itself.
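
The dummynet workaround mentioned above could look roughly like the sketch
below. It is an illustration, not something tested in this thread: the function
name dummynet_cap, the pipe number 1 and the rule number 100 are arbitrary
choices, and a single pipe is shared by both directions. The function only
prints the commands so that they can be reviewed first. Loading ipfw while its
default rule denies everything can cut off network access, so this is best
tried from the console.

```sh
#!/bin/sh
# Print (do not run) ipfw/dummynet commands that cap traffic on one interface.
# Usage: dummynet_cap IFACE MBIT
# Assumptions: pipe 1 and rule 100 are free; one pipe shapes both directions.
dummynet_cap() {
    iface=$1
    mbit=$2
    printf '%s\n' \
        "kldload ipfw" \
        "kldload dummynet" \
        "ipfw pipe 1 config bw ${mbit}Mbit/s" \
        "ipfw add 100 pipe 1 ip from any to any via ${iface}"
}
```

`dummynet_cap re0 100` prints the four commands for a 100 Mbit/s cap on re0;
once they look right they can be run as root, for instance by piping the output
to sh.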
b***@freebsd.org
7 years ago
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166724

--- Comment #26 from Alex Dupre <***@FreeBSD.org> ---
After upgrading to 11.2-RELEASE the problem seems to have disappeared on my
machine.

Looking at dmesg, the only difference is the absence of the following line at
boot:

re0: turning off MSI enable bit.
b***@freebsd.org
7 years ago
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166724

--- Comment #27 from zjk <***@wp.pl> ---
After upgrading several machines to 11.2 and running all-night tests: nothing
better, still a watchdog fault.
zjk
b***@freebsd.org
7 years ago
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166724

--- Comment #28 from Alex Dupre <***@FreeBSD.org> ---
I still see a few watchdog errors in the logs, but I'm unable to trigger them
deliberately, even with very high traffic. Whereas before it was enough to run
a single speed test to drop the connection, now I can saturate the link without
a watchdog timeout. The connection is quite stable now. The issue is likely not
solved, but it's much harder to trigger in my scenario.
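
To judge whether the timeouts are really becoming rarer, it can help to count
them per interface instead of eyeballing the log. A minimal sketch, assuming
the kernel message has the form "re0: watchdog timeout" as in the title of
bug 227979; the function name watchdog_tally is made up for this example:

```sh
#!/bin/sh
# Count "watchdog timeout" messages per re(4) interface.
# Usage: watchdog_tally [FILE...]   (reads standard input if no file is given)
# Output: one "interface count" pair per line, sorted by interface name.
watchdog_tally() {
    awk '
        match($0, /re[0-9]+: watchdog timeout/) {
            iface = substr($0, RSTART, RLENGTH)  # e.g. "re0: watchdog timeout"
            sub(/:.*/, "", iface)                # keep only "re0"
            count[iface]++
        }
        END { for (i in count) print i, count[i] }
    ' "$@" | sort
}
```

Typical use would be `dmesg | watchdog_tally` or
`watchdog_tally /var/log/messages`, run before and after a change such as an
upgrade or a driver swap.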
b***@freebsd.org
7 years ago
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166724

--- Comment #29 from zjk <***@wp.pl> ---
The following configuration is very promising:
- kernel 11.2-RELEASE, recompiled together with
- re driver v. 1.93 (from the Realtek site).

Effect:
- NO (absolutely none) watchdog timeouts,
- FULL speed in both directions (I will still test different situations),
- works well with lagg(!).

I am now compiling Realtek version 1.94 with 11.2-RELEASE and will let you know
the results.

zjk
b***@freebsd.org
7 years ago
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166724

--- Comment #30 from Alex Dupre <***@FreeBSD.org> ---
Of course you won't get the watchdog timeout error with the driver taken from
the Realtek website: it's been commented out in the source code, so its absence
is not a real clue.

That said, with 11.0 and 11.1 I've always used the 1.93 version without issues.
b***@freebsd.org
7 years ago
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166724

Palle Girgensohn <***@FreeBSD.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Blocks| |227979


Referenced Bugs:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227979
[Bug 227979] re0: watchdog timeout, perpetual
b***@freebsd.org
7 years ago
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166724

Igor Zabelin <***@yandex.ru> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@yandex.ru

--- Comment #31 from Igor Zabelin <***@yandex.ru> ---
I see problems with the 1.94 or 1.95 Realtek driver and 11.2-RELEASE.
Data transfer stops without messages after about a week of load.
With 11.1 there is no problem.
b***@freebsd.org
6 years ago
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166724

Mark Johnston <***@FreeBSD.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@FreeBSD.org

--- Comment #32 from Mark Johnston <***@FreeBSD.org> ---
I hit this a couple of times on an NFS server running 12.0-ALPHA3 while running
highly parallel buildworlds with an NFS-mounted obj dir.
b***@freebsd.org
6 years ago
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166724

--- Comment #33 from zjk <***@wp.pl> ---
Created attachment 196815
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=196815&action=edit
System load average and usage - monitorix
b***@freebsd.org
6 years ago
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166724

--- Comment #34 from zjk <***@wp.pl> ---
A. After longer tests I must retract the previous optimistic news. We are
talking about 11.2-RELEASE + the 1.93 Realtek driver:

1. Hangs and stalls still occur. They are only shorter, though still
troublesome.

See attachment above.

Generally, the interface works quickly at the beginning; after some time it
slows down and shows signs of loss.

2. There are still messages about the interface being suspended. Because I use
lagg, they look like this:
+ [20445] re1: Interface stopped DISTRIBUTING, possible flapping
+ [48114] re0: Interface stopped DISTRIBUTING, possible flapping

B. Regarding Alex's statements: this is a real problem.
Of course, the "watchdog timeout" message itself is not harmful.
What matters is that, in that function, the message is followed by a reset and
re-initialisation of the interface. Unfortunately this results in the loss or
partial corruption of transmitted files / frames (which I have experienced many
times).

So the improvement from using version 1.93-1.94 would be not only that the
message disappears (commented out of the function, as Alex correctly writes),
but also that files are not damaged during transmission (yet to be checked!).

11.2-RELEASE certainly generates hundreds of "watchdog timeout" messages for
me, but today I do not know whether it prevents damage to or loss of
transmitted data (to be checked).
In the source I see:
/* Cancel pending I/O and free all RX/TX buffers. */
re_stop(sc);
/* Put controller into known state. */
re_reset(sc);
That means: transmitted data is dropped and lost.

C. However, I do not agree with Alex that it is good. Perhaps it is good enough
for a laptop, but not for a server. It is still terrible.

D. Test 11.2 + 1.94: I have not started it yet.
b***@freebsd.org
6 years ago
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166724

--- Comment #35 from Alex Dupre <***@FreeBSD.org> ---
I don't think I've ever said this issue is good :-)

What I said is that in my environment, when I switched to 11.2-RELEASE, it was
happening less frequently. With the FreeBSD driver it is easy to detect,
because it prints the timeout message and resets the interface after 5 ticks,
effectively interrupting any connections for a few seconds. The Realtek driver
doesn't reset the interface and doesn't print the message, so a short timeout
might go unnoticed.

To add new info to the thread, I recently tried increasing the watchdog timeout
of the FreeBSD driver, changing it from 5 to 50 ticks. Well, the result was
that the connection interruption lasted longer, so the interface seems really
stuck and a reset seems to be the only solution.

In the last months I've also tried Realtek drivers 1.94.01 and 1.95 (the one
I'm currently running) and I'm not seeing any difference from 1.93. In my
scenario it seems to work well enough (== I'm not able to detect any connection
drop during normal usage, which doesn't mean they are not happening at all).
b***@freebsd.org
6 years ago
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166724

--- Comment #36 from zjk <***@wp.pl> ---
Ok, ok Alex - I understand.

For doubters, that is why I attached a chart from monitorix two posts earlier.
For a 24/7 server, you can see how the link hangs (and this happens on a server
that is not heavily loaded...); only a reset restores a good response for a
longer time.

For problem solvers I must add: on most computers I use lagg. Evidently this
"overlay" on the driver increases the frequency of hangs (compared to computers
using re without lagg). But this is a separate problem for a separate thread.
b***@freebsd.org
4 years ago
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166724

--- Comment #93 from László Károlyi <***@karolyi.hu> ---
(In reply to Bob Smith from comment #92)
With FreeBSD 13.0 out, and it having changed its src and ports repository to
git (https://docs.freebsd.org/en/books/handbook/mirrors/#git), the process
changes to:

1. install git
2. git clone -o freebsd -b releng/$(uname -r | cut -d'-' -f1,1) \
   https://git.FreeBSD.org/src.git /usr/src
3. git clone -o freebsd https://git.freebsd.org/ports.git /usr/ports
4. cd /usr/ports/net/realtek-re-kmod/
5. make install
6. echo 'if_re_load="YES"' >> /boot/loader.conf
7. echo 'if_re_name="/boot/modules/if_re.ko"' >> /boot/loader.conf

Just thought I'd update this since SVN is no longer the default way to check
out sources and ports.
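
The branch name in step 2 is derived from uname -r by keeping everything before
the first '-'. The sketch below shows the same derivation as a small function
(releng_branch is a hypothetical name) so that the result can be checked before
cloning. It presumably only makes sense on a -RELEASE system; on -STABLE or
-CURRENT the derived name may not be the branch you want.

```sh
#!/bin/sh
# Derive the src branch name used in step 2 from a release string.
# Usage: releng_branch [RELEASE]   (defaults to the output of `uname -r`)
releng_branch() {
    release=${1:-$(uname -r)}
    # Strip everything from the first '-' onwards: 13.0-RELEASE-p4 becomes 13.0
    printf 'releng/%s\n' "${release%%-*}"
}
```

Running `releng_branch 13.0-RELEASE-p4` gives the branch that step 2 would pass
to `git clone -b` on such a system.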
--
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
b***@freebsd.org
4 years ago
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=166724

--- Comment #94 from Chris Hutchinson <***@bsdforge.com> ---
I'm just going to throw this out there for a couple of
reasons...
1) several people indicated the vendor's driver solved
it for them
2) I just bought a Realtek card capable of 9k jumbo
frames. But the re(4) kernel module built into the
kernel wouldn't do 9k jumbo frames.
3) This will work even if you already have the re(4)
module built in, or from /boot/kernel/

Please try /usr/ports/net/realtek-re-kmod/.
After you've either built and installed it, or installed
it with pkg(8), add the following to loader.conf(5):

if_re_load="YES"
if_re_name="/boot/modules/if_re.ko"

I have zero trouble using this driver, and am also
able to use the 9k jumbo frames this card is capable
of managing.
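
When these two lines are added with echo ... >> /boot/loader.conf, running the
step a second time leaves duplicate entries. A minimal sketch of an idempotent
variant follows; loader_set is a hypothetical helper name, and it only compares
whole lines exactly, so a differently quoted or spaced copy of the same setting
would not be recognised.

```sh
#!/bin/sh
# Append a line to a loader.conf-style file unless the identical line exists.
# Usage: loader_set LINE [FILE]   (FILE defaults to /boot/loader.conf)
loader_set() {
    line=$1
    file=${2:-/boot/loader.conf}
    # -F: fixed string, -x: match the whole line, -q: no output.
    if [ -f "$file" ] && grep -Fxq -e "$line" "$file"; then
        return 0
    fi
    printf '%s\n' "$line" >> "$file"
}
```

With this, `loader_set 'if_re_load="YES"'` followed by
`loader_set 'if_re_name="/boot/modules/if_re.ko"'` can be repeated safely.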

HTH

--Chris