[Swift-user] Swift is stuck with 5K jobs

Andriy Fedorov fedorov at bwh.harvard.edu
Mon Mar 14 13:30:47 CDT 2011


Michael,

This is a very good observation.

The problem is that one has to know approximately how long the entire
run of the Swift script will take, including the time spent waiting in
the queue for computing resources. I do not know how such an estimate
can be reliably obtained.

IMHO, submission from the head node is OK, since it occupies only one
CPU. However, I believe processes that run on the head node for more
than 30 minutes are terminated automatically, so submission from the
head node may not work in all cases.

Any other ideas?

--
Andriy Fedorov, Ph.D.

Research Fellow
Brigham and Women's Hospital
Harvard Medical School
75 Francis Street
Boston, MA 02115 USA
fedorov at bwh.harvard.edu
(617) 525-6258 (office)



On Mon, Mar 14, 2011 at 13:45, Michael Wilde <wilde at mcs.anl.gov> wrote:
> Andriy, All,
>
> On systems such as TeraGrid hosts, where the login hosts are frequently heavily loaded, we should verify two things: that you can obtain a single interactive compute node via qsub -I on which to run the swift command (ideally under screen, to make re-attachment easy), and that from there Swift can run jobs using the Coaster-over-PBS provider configuration.
>
> I suspect (and hope) that any cluster node on, say, abe, queenbee, and ranger can also run qsub and qstat. We should test and document that, but in the meantime, Andriy, can you try that approach? I *think* that it should be identical to running from a login host.
>
> What I want to avoid is causing too heavy a load on any login host and in the process getting Swift banned or having it associated with causing system problems.
>
> Thanks and regards,
>
> - Mike
>
>
> ----- Original Message -----
>> On Mon, 2011-03-14 at 11:06 -0400, Andriy Fedorov wrote:
>> > Am I hitting some limit? Is 5K jobs too much?
>>
>> Shouldn't be, but if you have the coaster service running in local
>> mode, that might do the trick.
>>
>> >
>> > How do I terminate swift now not to waste cycles of the head node?
>>
>> kill -9 <pidOfJavaProcess>
>>
>>
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
>


