Discussion:
[Bug 221317] Netmap issue after ixgbe driver update in r320897
b***@freebsd.org
2018-04-12 18:56:41 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221317

Stephen Hurd <***@FreeBSD.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Assignee|***@freebsd.org |***@FreeBSD.org
CC| |***@FreeBSD.org
--
You are receiving this mail because:
You are on the CC list for the bug.
b***@freebsd.org
2018-04-12 19:06:54 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221317

--- Comment #19 from commit-***@freebsd.org ---
A commit references this bug:

Author: shurd
Date: Thu Apr 12 19:06:15 UTC 2018
New revision: 332447
URL: https://svnweb.freebsd.org/changeset/base/332447

Log:
Work around netmap issue with ixgbe

After multiple start/stop of netmap, ixgbe will get into a bad state
requiring a reboot to recover. Adding a delay before stopping the interface
appears to work around the issue.

The -CURRENT driver has diverged too far from -STABLE for an MFC.

PR: 221317
Submitted by: Sylvain Galliano <***@efficientip.com>
Reported by: Cassiano Peixoto <***@gmail.com>
Sponsored by: Limelight Networks

Changes:
stable/11/sys/dev/ixgbe/if_ix.c
--
b***@freebsd.org
2018-04-12 19:10:01 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221317

--- Comment #20 from Stephen Hurd <***@FreeBSD.org> ---
I've committed your work-around just in case nobody has time to investigate
this before 11.2. Thanks for sticking with this.
--
b***@freebsd.org
2018-04-13 17:46:56 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221317

--- Comment #21 from commit-***@freebsd.org ---
A commit references this bug:

Author: shurd
Date: Fri Apr 13 17:45:54 UTC 2018
New revision: 332481
URL: https://svnweb.freebsd.org/changeset/base/332481

Log:
Move 1-second spin into ixgbe_netmap_reg()

This should still work around the netmap issue, but should not impact other
calls to ixgbe_stop().

PR: 221317
Sponsored by: Limelight Networks

Changes:
stable/11/sys/dev/ixgbe/if_ix.c
stable/11/sys/dev/ixgbe/ixgbe_netmap.c
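
To illustrate the scoping change described in this commit log, here is a minimal user-space sketch. It is not the driver code: every name prefixed with fake_ is a hypothetical stand-in, and only msec_delay(), ixgbe_stop() and ixgbe_netmap_reg() are named in the thread. The 1000 ms value is taken from the "1-second spin" wording above. The idea shown is that the settle delay sits in the netmap register path, so a plain stop does not pay for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's adapter state. */
struct fake_adapter {
	unsigned delay_ms_total; /* total milliseconds spent in settle delays */
	unsigned stop_calls;     /* number of times the stop routine ran */
};

/* Stand-in for msec_delay(): records the delay instead of spinning. */
void fake_msec_delay(struct fake_adapter *a, unsigned ms)
{
	assert(a != NULL);
	a->delay_ms_total += ms;
}

/* Stand-in for ixgbe_stop(): no delay here, so ordinary callers are unaffected. */
void fake_stop(struct fake_adapter *a)
{
	assert(a != NULL);
	a->stop_calls++;
}

/* Stand-in for ixgbe_netmap_reg(): wait first, then stop the interface. */
void fake_netmap_reg(struct fake_adapter *a, bool onoff)
{
	(void)onoff; /* the netmap mode switch itself is omitted in this sketch */
	fake_msec_delay(a, 1000);
	fake_stop(a);
}
```

In this model a direct call to fake_stop() accumulates no delay at all, so any path that stops the interface without going through the netmap register routine is not covered by the workaround.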
--
b***@freebsd.org
2018-04-13 17:50:12 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221317

--- Comment #22 from Stephen Hurd <***@FreeBSD.org> ---
Can you test with r332481 and ensure it still works around the issue?
--
b***@freebsd.org
2018-04-13 18:16:42 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221317

Stephen Hurd <***@FreeBSD.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Attachment #191979|0 |1
is obsolete| |

--- Comment #23 from Stephen Hurd <***@FreeBSD.org> ---
Created attachment 192502
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=192502&action=edit
Attempt to remove 1-second spin

Assuming the previous commit still works around the issue, please try the
attached patch.
--
b***@freebsd.org
2018-04-13 18:37:36 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221317

--- Comment #24 from Sylvain Galliano <***@efficientip.com> ---
(In reply to Stephen Hurd from comment #22)

Hello Stephen,

Your patch works when using netmap, but the issue with ifconfig down/up in a
loop is back (see the small script in comment #14).
--
b***@freebsd.org
2018-04-13 18:49:01 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221317

--- Comment #25 from Stephen Hurd <***@FreeBSD.org> ---
(In reply to Sylvain Galliano from comment #24)

Hrm, could you try putting an ixgbe_qflush(ifp) in ixgbe_stop() before the
interrupt is disabled? My current theory is that the TX queue is being left in
a bad state (which is why the delay helps).

I don't currently have an 11-STABLE system with an ixgbe in it to test on.
--
b***@freebsd.org
2018-04-13 19:17:46 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221317

--- Comment #26 from Sylvain Galliano <***@efficientip.com> ---
(In reply to Stephen Hurd from comment #25)

Unfortunately it's not working.

Here is the patch I applied:

--- sys/dev/ixgbe/if_ix.c (revision 332482)
+++ sys/dev/ixgbe/if_ix.c (working copy)
@@ -3568,6 +3568,7 @@
 	mtx_assert(&adapter->core_mtx, MA_OWNED);

 	INIT_DEBUGOUT("ixgbe_stop: begin\n");
+	ixgbe_qflush(ifp);
 	ixgbe_disable_intr(adapter);
 	callout_stop(&adapter->timer);
--
b***@freebsd.org
2018-04-13 20:00:10 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221317

--- Comment #27 from Sylvain Galliano <***@efficientip.com> ---
(In reply to Stephen Hurd from comment #25)

In my first test, I used commit r332481 (with msec_delay moved into the netmap
code) -> it worked with netmap only (not for ifconfig down/up).

I've just tested your attached patch (ixgbe_qflush(ifp) in ixgbe_netmap.c), and
I can reproduce the issue after several netmap start/stop cycles.
--
b***@freebsd.org
2018-04-13 20:55:26 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221317

Stephen Hurd <***@FreeBSD.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Attachment #192502|0 |1
is obsolete| |

--- Comment #28 from Stephen Hurd <***@FreeBSD.org> ---
Created attachment 192505
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=192505&action=edit
Additional debugging in ixgbe_stop()

This patch won't solve the problem, but it will log errors encountered in
ixgbe_stop() if any.

If there are no errors logged in dmesg, I'm curious if that delay needs to be
at the beginning of the call to stop, or if it can be moved to just before the
init_locked() call.

If there's an error, possibly just retrying after a short delay will help, but
if not, I'll see if I can get an 11-STABLE system up and running this weekend.
--
b***@freebsd.org
2018-04-16 12:06:35 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221317

--- Comment #29 from Sylvain Galliano <***@efficientip.com> ---
(In reply to Stephen Hurd from comment #28)

Patch with error logging applied:
I do not see any errors logged before the issue appears.
--
b***@freebsd.org
2018-05-24 17:29:26 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221317

Stephen Hurd <***@FreeBSD.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Assignee|***@FreeBSD.org |***@freebsd.org
--
b***@freebsd.org
2018-05-24 17:33:43 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221317

Stephen Hurd <***@FreeBSD.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Summary|Netmap issue after ixgbe |ifconfig down/up issue
|driver update in r320897 |after ixgbe driver update
| |in r320897
--
b***@freebsd.org
2018-11-18 21:24:15 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221317

Lev A. Serebryakov <***@FreeBSD.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@FreeBSD.org

--- Comment #30 from Lev A. Serebryakov <***@FreeBSD.org> ---
I have the same problem with CURRENT r340586.

A script which calls ifconfig down / ifconfig up in a loop renders the NIC
unusable ("media: No carrier").

Also, the driver complains about an unsupported SFP+ type before the failure.

Reboot helps.
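
For readers without access to the script from comment #14, a down/up loop of this kind can be approximated in C. This is a sketch under assumptions, not the script used in the thread: the function names, the pause length and the cycle count are all hypothetical, and it needs root privileges and a real interface name (for example ix0) to have any effect. It does not inspect media status, so "No carrier" would still have to be confirmed with ifconfig afterwards.

```c
#define _DEFAULT_SOURCE /* expose struct ifreq under strict -std= modes on glibc */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Return flags with IFF_UP set or cleared; pure helper, no I/O. */
short flags_with_link_state(short flags, bool up)
{
	return up ? (short)(flags | IFF_UP) : (short)(flags & ~IFF_UP);
}

/* Mark ifname up or down via SIOCGIFFLAGS/SIOCSIFFLAGS. Returns 0 or -1. */
int set_link_state(int sock, const char *ifname, bool up)
{
	struct ifreq ifr;

	assert(ifname != NULL);
	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name) - 1);
	if (ioctl(sock, SIOCGIFFLAGS, &ifr) == -1)
		return -1;
	ifr.ifr_flags = flags_with_link_state(ifr.ifr_flags, up);
	return ioctl(sock, SIOCSIFFLAGS, &ifr);
}

/*
 * Toggle the interface down and up `cycles` times, pausing between
 * transitions. Returns the number of completed cycles, or -1 if the
 * control socket could not be opened.
 */
int bounce_interface(const char *ifname, int cycles, unsigned pause_sec)
{
	int done = 0;
	int sock = socket(AF_INET, SOCK_DGRAM, 0);

	if (sock == -1)
		return -1;
	for (; done < cycles; done++) {
		if (set_link_state(sock, ifname, false) == -1)
			break;
		sleep(pause_sec);
		if (set_link_state(sock, ifname, true) == -1)
			break;
		sleep(pause_sec);
	}
	close(sock);
	return done;
}
```

A call such as bounce_interface("ix0", 100, 1) would then cycle the interface a hundred times; both the count and the one-second pause are arbitrary choices for illustration.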
--
b***@freebsd.org
2018-11-28 14:38:22 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221317

--- Comment #31 from Lev A. Serebryakov <***@FreeBSD.org> ---
Any news on this? I have exactly the same problem on 12 and CURRENT, with the
new iflib-based driver too.

It is very annoying, as I cannot run long benchmarks unattended; I have to keep
checking whether the NICs have hung.
--
b***@freebsd.org
2018-11-28 14:52:37 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221317

Charles Goncalves <***@halfling.com.br> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@halfling.com.br

--- Comment #32 from Charles Goncalves <***@halfling.com.br> ---
(In reply to Lev A. Serebryakov from comment #31)
I applied Sylvain's patch with the delay changed to 100 ms, and it works fine
for production use while I wait for someone to fix this.
--
b***@freebsd.org
2018-11-28 15:05:54 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221317

--- Comment #33 from Lev A. Serebryakov <***@FreeBSD.org> ---
(In reply to Charles Goncalves from comment #32)
It is not clear where I should apply the patch on 12/13, as the driver is very
different. Should it go into iflib, for ALL adapters?
--
b***@freebsd.org
2018-11-28 15:29:41 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221317

--- Comment #34 from Lev A. Serebryakov <***@FreeBSD.org> ---
(In reply to Charles Goncalves from comment #32)
Nope, adding a delay to the common iflib_netmap_register code doesn't help, but
this code is somewhat different from the 11 driver's.
--
b***@freebsd.org
2018-12-07 13:12:26 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221317

Piotr Pietruszewski <***@intel.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@intel.com

--- Comment #35 from Piotr Pietruszewski <***@intel.com> ---
(In reply to Lev A. Serebryakov from comment #34)
(In reply to Charles Goncalves from comment #32)
(In reply to Sylvain Galliano from comment #29)
(In reply to Cassiano Peixoto from comment #18)

The bug seems to be fixed by applying patch D18468 which is currently under
review ( https://reviews.freebsd.org/D18468 ). Please let me know if the patch
solves your problem.
--
b***@freebsd.org
2018-12-07 16:25:56 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221317

--- Comment #36 from Sylvain Galliano <***@efficientip.com> ---
(In reply to Piotr Pietruszewski from comment #35)
The patch looks good: I've stressed the NIC for one hour without any issue.
The NIC status always stays 'active' after the last 'ifconfig up'.
--
You are receiving this mail because:
You are on the CC list for the bug.
b***@freebsd.org
2018-12-07 18:35:22 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221317

--- Comment #37 from Charles Goncalves <***@halfling.com.br> ---
(In reply to Piotr Pietruszewski from comment #35)
I can't apply this patch on 11.2-STABLE.
--
You are receiving this mail because:
You are on the CC list for the bug.
b***@freebsd.org
2018-12-09 19:49:42 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221317

Peter Vanek <***@efficientip.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@efficientip.com

--- Comment #38 from Peter Vanek <***@efficientip.com> ---
(In reply to Charles Goncalves from comment #37)

Hi Charles,

My colleague Sylvain merged the patch against FreeBSD-CURRENT;
he also had too many conflicts against the stable version.

Peter
--