[Swift-devel] Re: [Fwd: TeraGrid News: UC/ANL GPFS and PVFS file systems unavailable]

Ioan Raicu iraicu at cs.uchicago.edu
Mon Sep 10 10:53:40 CDT 2007


Hi,
Can you give us more details about which user has been causing this 
(offline, at least)?  Also, can you give us a sample job description 
that caused the problems, for example the command/arguments or the 
offending process?  Maybe it's something we can fix, if it's a runaway 
app.
Thanks,
Ioan

Ti Leggett wrote:
> For the last few weeks a particular user has been running nodes out of 
> memory, causing the kernel to start randomly killing processes. This 
> leaves the nodes in quite weird states, but in many cases the resource 
> manager still thinks they're ok to schedule jobs on. The best thing to 
> do when you find this or any problem on the TG is to file a ticket 
> with help at teragrid.org giving, at the least, your userid, the jobid, 
> the nodes you were running on, and, if you can determine it, the 
> particular problematic node.
>
> On Sep 10, 2007, at 9:58 AM, Michael Wilde wrote:
>
>> Ti, just an fyi:
>>
>> as you work on this, I'd like to mention that Ioan, Nika and I have 
>> been plagued for quite a while now (4 weeks or more I think for Ioan 
>> and Nika) by an *occasional* node that seems to return "Stale NFS 
>> file handle" on access to scratchgpfs1 files.
>>
>> We've been working around this, but I wonder if something that 
>> occasionally knocks out a few nodes' access to gpfs has now happened 
>> en masse?
>>
>> All: Moving forward, everyone who encounters this (or any other 
>> system problems) should file a trouble ticket right away so the bad 
>> nodes can be fixed.
>>
>> Thanks,
>>
>> Mike
>>
>>
>> -------- Original Message --------
>> Subject: TeraGrid News: UC/ANL GPFS and PVFS file systems unavailable
>> Date: Mon, 10 Sep 2007 06:48:21 -0700 (PDT)
>> From: news at teragrid.org
>>
>>
>> UC/ANL GPFS and PVFS file systems unavailable
>>
>> Systems: UC/ANL
>> Posted on Sep 10 2007, 13:46:33 (GMT/UTC) by Ti Leggett
>>
>> The GPFS (local and WAN) and PVFS scratch file systems are currently 
>> unavailable. We are working to re-establish them.
>>
>> _______________________________________________________________
>> This message can also be found at 
>> http://news.teragrid.org/announcements/20070910_01.php.  To 
>> unsubscribe or change the categories to which you are subscribed, go 
>> to http://news.teragrid.org/user.php#manage.
>>
>>
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================



