Discussion:
[Bug 256375] iflib/if_em: unplugging network cable causes huge KTorrent slowdown
b***@freebsd.org
2021-06-02 11:57:27 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256375

Bug ID: 256375
Summary: iflib/if_em: unplugging network cable causes huge
KTorrent slowdown
Product: Base System
Version: CURRENT
Hardware: Any
OS: Any
Status: New
Keywords: iflib, performance
Severity: Affects Only Me
Priority: ---
Component: kern
Assignee: ***@FreeBSD.org
Reporter: ***@FreeBSD.org

Created attachment 225494
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=225494&action=edit
procstat -kk output

I'm experiencing strange and very prominent performance loss with KTorrent when
unplugging Ethernet cable from em0, that is, switch to WiFi: it now takes
several tens seconds for it to update the window contents, react to mouse
clicks, and transfer speeds drop down to single bytes per second. At this
time, the ktorrent process, as displayed by top(1), is in "iflib" state which
occasionally briefly changes to "e1000_".

But once I plug the network cable back, everything goes back to normal, and the
process returns to "select" state like it should. This happens on Lenovo L470
laptop with the following NIC:

***@pci0:0:31:6: class=0x020000 rev=0x21 hdr=0x00 vendor=0x8086
device=0x15d8 subvendor=0x17aa subdevice=0x505f
vendor = 'Intel Corporation'
device = 'Ethernet Connection (4) I219-V'
class = network
subclass = ethernet

Both today's kernel and my normal Jan/Feb'ish kernel behave identically. The
driver from `net/intel-em-kmod' port does not exhibit this problem. I'm
attaching procstat -kk $ktorrent_pid output, in case it could be useful. This
line looks suspicious as if something is holding the lock:

mi_switch+0xc1 _sx_xlock_hard+0x3d1 iflib_media_status+0xf2 ifmedia_ioctl+0x16a
ifhwioctl+0x2bd ifioctl+0x48d kern_ioctl+0x23b sys_ioctl+0xf6
amd64_syscall+0x100 fast_syscall_common+0xf8

Please advise on any other debug information I could provide which may help to
track the bug down and fix it.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2021-06-02 15:11:19 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256375

Kevin Bowling <***@freebsd.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Keywords| |IntelNetworking
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2021-06-02 20:15:52 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256375

--- Comment #1 from Eric Joyner <***@freebsd.org> ---
It would be nice to figure out what exactly that "e1000_" refers to. Without
looking, I have a hunch that it refers to "e1000_delay".

Is ktorrent trying to query the link status of connections? I don't know what
syscall it's making that would lead to iflib_media_status() being called.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2021-06-03 04:10:50 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256375

--- Comment #2 from Alexey Dokuchaev <***@FreeBSD.org> ---
Created attachment 225515
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=225515&action=edit
dtrace output, cable plugged
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2021-06-03 04:25:37 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256375

--- Comment #3 from Alexey Dokuchaev <***@FreeBSD.org> ---
Created attachment 225516
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=225516&action=edit
dtrace output, cable unplugged

Alexander suggested I run the following DTrace script, with and without the
cable, to see if the problem has something to do with the lock or rather
em_if_update_admin_status():

dtrace -n 'fbt::iflib_media_status:entry { self->ts = timestamp; }
fbt::iflib_media_status:return /self->ts/ {@[execname, "ns"] =
quantize(timestamp - self->ts); self->ts = 0; }'

So, two things: 1) KTorrent is not required to reproduce the bug (good),
ifconfig(8) is also slowed down, these is actually a visible delay of its
console output when I run without the cable; and 2) apparently
em_if_update_admin_status() is bogusly and repeatedly checking hw link status
despite, as Alexander had mentioned, that there actually is a link status field
which is being checked in other functions.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2021-06-03 05:27:49 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256375

--- Comment #4 from Alexey Dokuchaev <***@FreeBSD.org> ---
(In reply to Eric Joyner from comment #1)
Post by b***@freebsd.org
It would be nice to figure out what exactly that "e1000_" refers to.
I've patched those strings to make them pinpoint exact source code location:

"e1000_" refers to sys/dev/e1000/e1000_osdep.h:97:pause("e1000_delay",
ms_scale(x));

"iflib" refers to sys/net/iflib.c:531:#define CTX_LOCK_INIT(_sc)
sx_init(&(_sc)->ifc_ctx_sx, "iflib ctx lock")
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2021-06-03 06:47:23 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256375

--- Comment #5 from Alexey Dokuchaev <***@FreeBSD.org> ---
(In reply to Alexey Dokuchaev from comment #4)
Sorry, I replied too soon. Both safe_pause_ms() and safe_pause_us() are
called.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2021-06-03 11:56:29 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256375

--- Comment #6 from Alexey Dokuchaev <***@FreeBSD.org> ---
Created attachment 225522
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=225522&action=edit
drace without iflib_admin_intr_deferred() call

Alexander suggested commenting out iflib_admin_intr_deferred() call in
@@ -1564,7 +1564,7 @@ em_if_media_status(if_ctx_t ctx, struct ifmediareq *ifmr)
INIT_DEBUGOUT("em_if_media_status: begin");
- iflib_admin_intr_deferred(ctx);
+ //iflib_admin_intr_deferred(ctx);
ifmr->ifm_status = IFM_AVALID;
ifmr->ifm_active = IFM_ETHER;
This had caused minor, yet noticeable improvement in responsiveness: KTorrent
now stays mostly in "select" state with occasional "e1000_" and rare "iflib"
during which it's still stuck for several (vs several tens before) seconds, but
can nonetheless perform transfers again at reasonable speeds. However, it's
still not as good as once I "kldunload if_em".
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2021-06-03 15:14:03 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256375

--- Comment #7 from Eric Joyner <***@freebsd.org> ---
(In reply to Alexey Dokuchaev from comment #6)

That's something I would've suggested, too. The iflib_media_status() function
already calls the driver's admin status function before calling the driver's
media status function, so I don't think it's strictly necessary to call the
admin task again (and thereby call the driver's admin status function
more-or-less immediately afterwards!).

I need to actually check, but the admin status function is probably being
called every half second, and maybe something in there that checks link status
is taking too long, especially when there is no media?
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2021-06-03 17:21:41 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256375

Alexey Dokuchaev <***@FreeBSD.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Attachment #225522|drace without |dtrace without
description|iflib_admin_intr_deferred() |iflib_admin_intr_deferred()
|call |call
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2021-06-04 10:49:44 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256375

Aleksandr Fedorov <***@FreeBSD.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Attachment #225542|text/x-c++src |text/plain
mime type| |

--- Comment #8 from Aleksandr Fedorov <***@FreeBSD.org> ---
Created attachment 225542
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=225542&action=edit
ifmedia status test program
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2021-06-04 12:01:20 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256375

--- Comment #9 from Aleksandr Fedorov <***@FreeBSD.org> ---
I wrote a quick and dirty program to test getting link status in a loop. See
attachments.

To start:
c++ ifmediatst.cpp -o ifmediatst
./ifmediatst em0

Also I found Notebook (DELL Latitude 5280) with I219-LM network card.

Some tests output. RELASE-13.0 with iflib based driver:
./ifmediatst em0
...

iteration: 16631 state: active, duration: 0.146374 ms
iteration: 16632 state: active, duration: 0.148981 ms
iteration: 16633 state: active, duration: 0.149550 ms
iteration: 16634 state: active, duration: 0.147517 ms
iteration: 16635 state: no carrier, duration: 111.931014 ms
iteration: 16636 state: no carrier, duration: 224.454816 ms
iteration: 16637 state: no carrier, duration: 111.525601 ms
iteration: 16638 state: no carrier, duration: 111.994844 ms
iteration: 16639 state: no carrier, duration: 111.995998 ms
iteration: 16640 state: no carrier, duration: 111.996868 ms
iteration: 16641 state: no carrier, duration: 111.992766 ms
iteration: 16642 state: no carrier, duration: 111.994449 ms
iteration: 16643 state: no carrier, duration: 111.996805 ms
iteration: 16644 state: no carrier, duration: 111.979432 ms
iteration: 16645 state: no carrier, duration: 111.992068 ms
...

RELEASE-13.0 with removed iflib_admin_intr_deferred(ctx) from
em_if_media_status():
./ifmediatst em0
...
iteration: 7412 state: active, duration: 0.048852 ms
iteration: 7413 state: active, duration: 0.047400 ms
iteration: 7414 state: active, duration: 0.046978 ms
iteration: 7415 state: active, duration: 0.048232 ms
iteration: 7416 state: active, duration: 0.047024 ms
iteration: 7417 state: active, duration: 0.046711 ms
iteration: 7418 state: active, duration: 0.048584 ms
iteration: 7419 state: active, duration: 0.048262 ms
iteration: 7420 state: no carrier, duration: 55.209966 ms
iteration: 7421 state: no carrier, duration: 55.980619 ms
iteration: 7422 state: no carrier, duration: 55.990572 ms
iteration: 7423 state: no carrier, duration: 55.990879 ms
iteration: 7424 state: no carrier, duration: 55.991856 ms
iteration: 7425 state: no carrier, duration: 55.995035 ms
iteration: 7426 state: no carrier, duration: 111.989782 ms
iteration: 7427 state: no carrier, duration: 55.992155 ms
iteration: 7428 state: no carrier, duration: 55.982886 ms
iteration: 7429 state: no carrier, duration: 55.988871 ms
iteration: 7430 state: no carrier, duration: 55.990932 ms
iteration: 7431 state: no carrier, duration: 55.992620 ms
iteration: 7432 state: no carrier, duration: 55.990883 ms
iteration: 7433 state: no carrier, duration: 55.992068 ms
iteration: 7434 state: no carrier, duration: 55.992049 ms
iteration: 7435 state: no carrier, duration: 55.994219 ms
iteration: 7436 state: no carrier, duration: 111.993054 ms
iteration: 7437 state: no carrier, duration: 55.988053 ms
iteration: 7438 state: no carrier, duration: 55.992884 ms
iteration: 7439 state: no carrier, duration: 55.990556 ms
iteration: 7440 state: no carrier, duration: 55.991261 ms
iteration: 7441 state: no carrier, duration: 55.993390 ms
iteration: 7442 state: no carrier, duration: 55.991350 ms
iteration: 7443 state: no carrier, duration: 55.990535 ms
iteration: 7444 state: no carrier, duration: 52.996427 ms
iteration: 7445 state: no carrier, duration: 55.971019 ms
iteration: 7446 state: no carrier, duration: 111.987514 ms
iteration: 7447 state: no carrier, duration: 55.990414 ms
...

As you can see, active state: 0.1 ms -> 0.04 ms, no carrier: 111 ms -> 55 ms.
Periodically, in the absence of a carrier, the value reaches 111 ms. I think
this is the influence of the timer that calls em_if_update_admin_status().

RELEASE-13.0 with driver from net/intel-em-kmod:
./ifmediatst em0
...
iteration: 276057 state: active, duration: 0.000437 ms
iteration: 276058 state: active, duration: 0.000438 ms
iteration: 276059 state: active, duration: 0.000440 ms
iteration: 276060 state: active, duration: 0.000435 ms
iteration: 276061 state: no carrier, duration: 50.117849 ms
iteration: 276062 state: no carrier, duration: 50.113035 ms
iteration: 276063 state: no carrier, duration: 50.109604 ms
iteration: 276064 state: no carrier, duration: 50.110543 ms
iteration: 276065 state: no carrier, duration: 50.109672 ms
iteration: 276066 state: no carrier, duration: 50.109586 ms
iteration: 276067 state: no carrier, duration: 50.109930 ms
iteration: 276068 state: no carrier, duration: 50.113659 ms
iteration: 276069 state: no carrier, duration: 100.288162 ms
iteration: 276070 state: no carrier, duration: 50.109680 ms
iteration: 276071 state: no carrier, duration: 50.110047 ms
iteration: 276072 state: no carrier, duration: 50.110437 ms
...
iteration: 276122 state: no carrier, duration: 50.111246 ms
iteration: 276123 state: no carrier, duration: 50.110519 ms
iteration: 276124 state: no carrier, duration: 50.110134 ms
iteration: 276125 state: no carrier, duration: 100.281411 ms
iteration: 276126 state: no carrier, duration: 50.111783 ms
iteration: 276127 state: no carrier, duration: 50.110226 ms
iteration: 276128 state: no carrier, duration: 50.112179 ms
...
iteration: 276153 state: no carrier, duration: 50.112264 ms
iteration: 276154 state: no carrier, duration: 50.111985 ms
iteration: 276155 state: no carrier, duration: 50.113115 ms
iteration: 276156 state: no carrier, duration: 50.112568 ms
iteration: 276157 state: no carrier, duration: 100.300982 ms
iteration: 276158 state: no carrier, duration: 50.109478 ms
iteration: 276159 state: no carrier, duration: 50.109434 ms
iteration: 276160 state: no carrier, duration: 50.111644 ms
iteration: 276161 state: no carrier, duration: 50.109683 ms
...

A port tree driver is significantly faster in active state. Without a carrier,
it's about the same: 55 vs 50 ms.
Also, the port tree driver has a longer timer period.

Alexey, may you perform similar tests with ifmediatst utility?
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2021-06-04 12:02:34 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256375

Aleksandr Fedorov <***@FreeBSD.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Attachment #225543|text/x-log |text/plain
mime type| |

--- Comment #10 from Aleksandr Fedorov <***@FreeBSD.org> ---
Created attachment 225543
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=225543&action=edit
RELEASE-13 stock iflib driver log
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2021-06-04 14:22:50 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256375

--- Comment #11 from Alexey Dokuchaev <***@FreeBSD.org> ---
(In reply to Aleksandr Fedorov from comment #9)
Post by b***@freebsd.org
Alexey, may you perform similar tests with ifmediatst utility?
My results mostly agree with yours, however, they are slightly higher and there
are more outliers:

Active, ports driver
15688292 samples, 0.000440~52.404189, mean 0.000695, stdev 0.035022

Unplugged, ports driver
699 samples, 50.136990~351.355143, mean 67.735449, stdev 39.253407

Active, removed iflib_admin_intr_deferred()
117707 samples, 0.046730~160.303206, mean 0.098550, stdev 1.004555

Unplugged, removed iflib_admin_intr_deferred()
169 samples, 52.865603~497.009039, mean 75.350125, stdev 52.074932

Active, unpatched
40666 samples, 0.046684~185.143686, mean 0.170622, stdev 1.997476

Unplugged, unpatched
61 samples, 54.975691~439.951595, mean 135.355143, stdev 66.179309
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2021-06-04 15:01:07 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256375

--- Comment #12 from Aleksandr Fedorov <***@FreeBSD.org> ---
Yes, the result is similar.

May you run the following DTrace script with plugged and unplugged cable?

dtrace -qn 'fbt::iflib_media_status:entry /execname == "ktorrent"/ {@a=count()}
tick-1s {printa(@a); trunc(@a,0);}'

I'm wondering how many calls a ktorrent makes per second.
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2021-06-04 19:53:36 UTC
Permalink
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256375

--- Comment #13 from Alexey Dokuchaev <***@FreeBSD.org> ---
(In reply to Aleksandr Fedorov from comment #12)
Post by b***@freebsd.org
I'm wondering how many calls a ktorrent makes per second.
Unplugged: 661 samples, 1 ~ 11, mean 7.562784, stdev 2.144989

Plugged: 263 samples, 1 ~ 116, mean 11.958175, stdev 28.016662

These are with stock (unpatched) base driver.
--
You are receiving this mail because:
You are the assignee for the bug.
Loading...