[Darshan-users] ioctl(LL_IOC_LOV_GETSTRIPE) crashes

Snyder, Shane ssnyder at mcs.anl.gov
Tue Oct 27 13:04:26 CDT 2020


Replying here for you and other interested Darshan folks: Lustre has already fixed this issue in recent releases. See this ticket for more details:

https://jira.whamcloud.com/browse/LU-12580

It looks like Lustre versions 2.12.5 and 2.14.0 both include this bug fix.
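
For anyone who wants to check a system before enabling the Lustre module, here is a minimal sketch in C of a version test based on the two releases named above. The function name `lustre_has_lu12580_fix` is hypothetical, and the rule it encodes is an assumption: 2.12.x with a patch level of 5 or higher, and anything at or above 2.14.0, is treated as fixed, while every other release (including 2.13.x, which is not mentioned here) is treated as not known to be fixed.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not part of Darshan or Lustre.
 *
 * Returns  1 if `version` (e.g. "2.12.5") is assumed to carry the LU-12580 fix,
 *          0 if it is not known to carry the fix,
 *         -1 if the string cannot be parsed as major.minor.patch.
 *
 * Assumption: only 2.12.x (x >= 5) and releases >= 2.14.0 count as fixed.
 */
int lustre_has_lu12580_fix(const char *version)
{
    int major = 0, minor = 0, patch = 0;

    if (version == NULL ||
        sscanf(version, "%d.%d.%d", &major, &minor, &patch) != 3)
        return -1;
    if (major < 0 || minor < 0 || patch < 0)
        return -1;

    if (major > 2)
        return 1;               /* assumed: later major releases keep the fix */
    if (major < 2)
        return 0;
    if (minor >= 14)
        return 1;               /* 2.14.0 and later */
    if (minor == 12)
        return patch >= 5;      /* 2.12.5 and later on the 2.12 branch */
    return 0;                   /* everything else: not known to be fixed */
}
```

A caller would pass in the Lustre version string however it is obtained on the system in question and keep the Lustre module disabled unless the result is 1.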

Our plan is still to drop the use of ioctls in our Lustre module to avoid any chance of this issue going forward. We hope to have this done for our upcoming release, but if that doesn't work out, we will make sure to disable building the Lustre module by default until we can re-implement it safely.

--Shane
[LU-12580] usercopy exposure attempt detected in LL_IOC_LOV_GETSTRIPE ioctl - Whamcloud Community JIRA <https://jira.whamcloud.com/browse/LU-12580>
Any `lmm_stripe_count` greater than the actual file's stripe count will trigger the bug. Kernel side the issue appears to be in `lov_getstripe`: with a positive `lum_size` (line 409), `lmm_size` is set as `lum_size` (line 442) even if `lmm_magic != LOV_MAGIC_COMP_V1` (line 414), while instead the structure is just as big as `lmmk_size`:

________________________________
From: Darshan-users <darshan-users-bounces at lists.mcs.anl.gov> on behalf of Snyder, Shane <ssnyder at mcs.anl.gov>
Sent: Tuesday, October 27, 2020 9:36 AM
To: Ed Karrels <edk at illinois.edu>; darshan-users at lists.mcs.anl.gov <darshan-users at lists.mcs.anl.gov>
Subject: Re: [Darshan-users] ioctl(LL_IOC_LOV_GETSTRIPE) crashes

Hi Ed,

Thanks for letting us know you're hitting this problem, too. We've had trouble reproducing it in the past because we haven't had access to the systems where users reported it, but I happen to have a Frontera account, so I could look more closely there.

Our plan is still to re-implement the Lustre module using newer Lustre API calls (rather than these ioctls, which are giving us problems on some systems) and to confirm that overheads are low enough before making the change. I'll have a look and see if it's something we might be able to include in our next release, which we are planning soon. I'll be sure to keep the list posted on our progress.

Thanks!
--Shane
________________________________
From: Darshan-users <darshan-users-bounces at lists.mcs.anl.gov> on behalf of Ed Karrels <edk at illinois.edu>
Sent: Monday, October 26, 2020 10:07 PM
To: darshan-users at lists.mcs.anl.gov <darshan-users at lists.mcs.anl.gov>
Subject: [Darshan-users] ioctl(LL_IOC_LOV_GETSTRIPE) crashes

Hello Darshan team,
I just wanted to add a data point to the ioctl(LL_IOC_LOV_GETSTRIPE) issue, which I see was discussed on this mailing list last October ("Help needed, Darshan keeps crashing in Lustre filesystem") and in this issue: https://xgitlab.cels.anl.gov/darshan/darshan/-/issues/270.

I was trying to use Darshan 3.2.1 on Frontera at TACC, and my test program kept locking up. After a few failures, the system admin let me know that my runs were crashing each node I tested on.

This is one of the error messages they found:
[187544.836144] Lustre: test_darshan_lo: using old ioctl(LL_IOC_LOV_GETSTRIPE) on [0x200022388:0x1776:0x0], use llapi_layout_get_by_path()
[187544.848624] usercopy: kernel memory exposure attempt detected from ffff8d46b11910c0 (kmalloc-64) (48032 bytes)

I'll disable the Lustre module for now and keep an eye out for new releases.
