[Swift-devel] Re: [Fwd: TeraGrid News: UC/ANL GPFS and PVFS file systems unavailable]
Ti Leggett
leggett at ci.uchicago.edu
Mon Sep 10 11:10:29 CDT 2007
I don't think this user was related to the swift project.
On Sep 10, 2007, at 10:53 AM, Ioan Raicu wrote:
> Hi,
> Can you provide us more details in terms of which user (at least
> offline) has been causing this? Also, can you give us a sample job
> description that was causing the problems? For example the command/
> arguments or offending process. Maybe it's something we can fix, if
> it's a runaway app.
> Thanks,
> Ioan
>
> Ti Leggett wrote:
>> For the last few weeks a particular user has been running nodes out of
>> memory, causing the kernel to start randomly killing processes.
>> This leaves the nodes in quite weird states, but in many cases the
>> resource manager still thinks they're ok to schedule jobs on. The
>> best thing to do when you find this or any problem on the TG is to
>> file a ticket with help at teragrid.org giving at the least your
>> userid, the jobid, and the nodes you were running on and, if you
>> can determine it, the particular problematic node.
>>
>> On Sep 10, 2007, at 9:58 AM, Michael Wilde wrote:
>>
>>> Ti, just an fyi:
>>>
>>> as you work on this, I'd like to mention that Ioan, Nika and I
>>> have been plagued for quite a while now (4 weeks or more I think
>>> for Ioan and Nika) by an *occasional* node that seems to return
>>> "Stale NFS file handle" on access to scratchgpfs1 files.
>>>
>>> We've been working around this, but I wonder if something that
>>> occasionally knocks out a few nodes' access to gpfs has now
>>> happened en masse?
>>>
>>> All: Moving forward, everyone who encounters this (or any other
>>> system problems) should file a trouble ticket right away so the
>>> bad nodes can be fixed.
>>>
>>> Thanks,
>>>
>>> Mike
>>>
>>>
>>> -------- Original Message --------
>>> Subject: TeraGrid News: UC/ANL GPFS and PVFS file systems
>>> unavailable
>>> Date: Mon, 10 Sep 2007 06:48:21 -0700 (PDT)
>>> From: news at teragrid.org
>>>
>>>
>>> UC/ANL GPFS and PVFS file systems unavailable
>>>
>>> Systems: UC/ANL
>>> Posted on Sep 10 2007, 13:46:33 (GMT/UTC) by Ti Leggett
>>>
>>> The GPFS (local and WAN) and PVFS scratch file systems are
>>> currently unavailable. We are working to re-establish them.
>>>
>>> _______________________________________________________________
>>> This message can also be found at
>>> http://news.teragrid.org/announcements/20070910_01.php. To unsubscribe
>>> or change the categories to which you are subscribed, go to
>>> http://news.teragrid.org/user.php#manage.
>>>
>>>
>>
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>
>
> --
> ============================================
> Ioan Raicu
> Ph.D. Student
> ============================================
> Distributed Systems Laboratory
> Computer Science Department
> University of Chicago
> 1100 E. 58th Street, Ryerson Hall
> Chicago, IL 60637
> ============================================
> Email: iraicu at cs.uchicago.edu
> Web: http://www.cs.uchicago.edu/~iraicu
> http://dsl.cs.uchicago.edu/
> ============================================
>