[Swift-devel] Re: [Fwd: TeraGrid News: UC/ANL GPFS and PVFS file systems unavailable]

Ti Leggett leggett at ci.uchicago.edu
Mon Sep 10 11:10:29 CDT 2007


I don't think this user was related to the Swift project.

On Sep 10, 2007, at 10:53 AM, Ioan Raicu wrote:

> Hi,
> Can you give us more details about which user has been causing this
> (offline, at least)?  Also, can you give us a sample job description
> that was causing the problems, for example the command and arguments
> or the offending process?  Maybe it's something we can fix, if it's a
> runaway app.
> Thanks,
> Ioan
>
> Ti Leggett wrote:
>> For the last few weeks a particular user has been running nodes out of
>> memory, causing the kernel to start randomly killing processes.
>> This leaves the nodes in quite weird states, but in many cases the
>> resource manager still thinks they're OK to schedule jobs on. The
>> best thing to do when you find this or any other problem on the TG is
>> to file a ticket with help at teragrid.org giving, at the least, your
>> userid, the jobid, and the nodes you were running on and, if you
>> can determine it, the particular problematic node.
>>
>> On Sep 10, 2007, at 9:58 AM, Michael Wilde wrote:
>>
>>> Ti, just an fyi:
>>>
>>> As you work on this, I'd like to mention that Ioan, Nika and I
>>> have been plagued for quite a while now (4 weeks or more, I think,
>>> for Ioan and Nika) by an *occasional* node that seems to return
>>> "Stale NFS file handle" on access to scratchgpfs1 files.
>>>
>>> We've been working around this, but I wonder if something that
>>> occasionally knocks out a few nodes' access to GPFS has now
>>> happened en masse?
>>>
>>> All: Moving forward, everyone who encounters this (or any other  
>>> system problems) should file a trouble ticket right away so the  
>>> bad nodes can be fixed.
>>>
>>> Thanks,
>>>
>>> Mike
>>>
>>>
>>> -------- Original Message --------
>>> Subject: TeraGrid News: UC/ANL GPFS and PVFS file systems  
>>> unavailable
>>> Date: Mon, 10 Sep 2007 06:48:21 -0700 (PDT)
>>> From: news at teragrid.org
>>>
>>>
>>> UC/ANL GPFS and PVFS file systems unavailable
>>>
>>> Systems: UC/ANL
>>> Posted on Sep 10 2007, 13:46:33 (GMT/UTC) by Ti Leggett
>>>
>>> The GPFS (local and WAN) and PVFS scratch file systems are
>>> currently unavailable. We are working to re-establish them.
>>>
>>> _______________________________________________________________
>>> This message can also be found at
>>> http://news.teragrid.org/announcements/20070910_01.php.  To unsubscribe
>>> or change the categories to which you are subscribed, go to
>>> http://news.teragrid.org/user.php#manage.
>>>
>>>
>>
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>
>
> -- 
> ============================================
> Ioan Raicu
> Ph.D. Student
> ============================================
> Distributed Systems Laboratory
> Computer Science Department
> University of Chicago
> 1100 E. 58th Street, Ryerson Hall
> Chicago, IL 60637
> ============================================
> Email: iraicu at cs.uchicago.edu
> Web:   http://www.cs.uchicago.edu/~iraicu
>       http://dsl.cs.uchicago.edu/
> ============================================
>



