[Swift-devel] bug 53
Ian Foster
itf at mcs.anl.gov
Tue Sep 18 13:40:29 CDT 2007
We should never be losing time because of lack of suitable hardware.
Ian
Sent via BlackBerry from T-Mobile
-----Original Message-----
From: Ioan Raicu <iraicu at cs.uchicago.edu>
Date: Tue, 18 Sep 2007 13:27:34
To:itf at mcs.anl.gov
Cc:Mike Wilde <wilde at mcs.anl.gov>, swift-devel at ci.uchicago.edu
Subject: Re: [Swift-devel] bug 53
IMO, the biggest hurdle for large workflows will be memory (I recommend 2GB+). If the jobs are short and end up pushing hundreds of jobs/sec to Falkon for prolonged periods of time, having multiple processors might also be important.
Ioan
Ian Foster wrote:
It seems ridiculous to me that we are still using a student-supported machine to run major applications. Surely we should have one highly capable, well-maintained machine for this? And this shouldn't be a "suggestion" but a clear policy.

-----Original Message-----
From: Ioan Raicu <iraicu at cs.uchicago.edu>
Date: Tue, 18 Sep 2007 13:13:08
To:Michael Wilde <wilde at mcs.anl.gov>
Cc:swift-devel at ci.uchicago.edu
Subject: Re: [Swift-devel] bug 53

Michael Wilde wrote:
It's not clear when this happened, as Nika and Ioan's workflow submission from viper has afaik been mostly through Falkon for quite a while now. Nika, perhaps you can shift back to trying the two Falkon approaches (with higher prio on testing Ioan's retry code) in the meantime.

Ioan, is CI Support / Ti supporting viper, or are you the "sysadmin" Ben is referring to?

Yes, I am viper's support. viper is my department office machine.

I've also suggested in the past that we focus on using evitable and terminable (and swift03/04) as our main submit hosts, primarily for support and coordination reasons. Is this a good time to try the GRAM/non-Falkon workflow there?

Sure, but watch out for the large MolDyn runs as 1GB or less of memory is not enough for 244 mol runs.

Ioan

- Mike

Ben Clifford wrote:
sounds like viper had firewall configuration changed recently. viper sysadmin needs to help debug basic job submission with simple globus tools before that machine is worth using again.

On Tue, 18 Sep 2007, Michael Wilde wrote:
does the cog equivalent of globus_tcp_source_range also need to be set? is that only for gridftp, or gram as well? or could this be a gridftp hang?
- mike

Ben Clifford wrote:
can you submit a job using globus-job-run?

On Tue, 18 Sep 2007, Veronika Nefedova wrote:
I set tcp.port.range in swift properties but even a simple helloworld workflow hangs (the submit host doesn't receive the notification from the compute host that the job has finished).

tcp.port.range=50000,60000

Not sure what else has changed on viper? It used to be a very good submit host, I never had any problems with it );
Nika

On Sep 18, 2007, at 9:13 AM, Mihael Hategan wrote:
Should pick that one. If not ~/.globus/cog.properties -> tcp.port.range=begin,end

On Tue, 2007-09-18 at 07:42 +0000, Ben Clifford wrote:
Not sure if cog picks up the GLOBUS_whatever environment variables. Mihael presumably knows.

On Mon, 17 Sep 2007, Ioan Raicu wrote:
There is a firewall on viper. Ports 50000 - 60000 are open for TCP. You might want to set the TCP_PORT_RANGE (I am not sure this is the exact environment variable, but something like that) to be between 50K and 60K ports to ensure that GT4 uses one of these open ports.
Ioan

Veronika Nefedova wrote:
The same. You can check the job's status in its log on viper in ~nefedova/alamines/MolDyn-244-loops-20070917-1356-h95gxij8.log. The job is still running (i.e. hanging) with the same symptom as before: the first job is done and then nothing else gets submitted (the submit host doesn't receive any notification that the job has finished).
Nika

On Sep 17, 2007, at 9:51 AM, Mihael Hategan wrote:
On Mon, 2007-09-17 at 09:41 -0500, Veronika Nefedova wrote:
I did 'svn up' in cog directory and then did 'ant dist' in the same directory.

'ant dist' should be done in the swift directory.

My 'svn info' gives me r1740.

On Sep 17, 2007, at 8:55 AM, Mihael Hategan wrote:
Did you update cog?

On Mon, 2007-09-17 at 08:38 -0500, Veronika Nefedova wrote:
No, I've tried with r1740, it still hung (timed out). The log is on viper:/home/nefedova/alamines/MolDyn-244-loops-20070914-1834-pvhyji75.log
Nika

On Sep 15, 2007, at 10:59 AM, Mihael Hategan wrote:
On Sat, 2007-09-15 at 09:06 +0000, Ben Clifford wrote:
On Fri, 14 Sep 2007, Mihael Hategan wrote:
On Thu, 2007-09-13 at 16:41 -0500, Mihael Hategan wrote:
Ok, so there's something in. That something was throttling a bit too much (not just jobs, but all tasks on that site). I need to take a second look at it.

Is that fixed by cog r1740? It looks like that commit is intended to.

It's an attempt to fix it, but it needs to be confirmed by Nika.

_______________________________________________
Swift-devel mailing list
Swift-devel at ci.uchicago.edu
http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
--
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web: http://www.cs.uchicago.edu/~iraicu
     http://dsl.cs.uchicago.edu/
============================================
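
For anyone retracing the port-range check discussed in this thread, here is a minimal Python sketch. It assumes a properties-style text containing a line such as tcp.port.range=50000,60000 (the setting Nika and Mihael mention) and compares it with the 50000-60000 TCP window that Ioan says is open on viper. The helper names parse_port_range and within are hypothetical and are not part of CoG or Swift. The sketch only checks the configured numbers; it does not show whether the firewall actually lets the job-completion notification through.

```python
from typing import Optional, Tuple

PortRange = Tuple[int, int]

# Firewall window quoted in the thread ("Ports 50000 - 60000 are open for TCP").
FIREWALL_OPEN: PortRange = (50000, 60000)


def parse_port_range(properties_text: str,
                     key: str = "tcp.port.range") -> Optional[PortRange]:
    """Return (begin, end) from a 'key=begin,end' line, or None if the key is absent."""
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            begin_text, comma, end_text = value.partition(",")
            if not comma:
                raise ValueError(f"expected 'begin,end', got {value!r}")
            begin, end = int(begin_text), int(end_text)
            if not (0 < begin <= end <= 65535):
                raise ValueError(f"invalid port range: {begin},{end}")
            return begin, end
    return None


def within(configured: PortRange, allowed: PortRange) -> bool:
    """True if every port in `configured` also lies inside `allowed`."""
    return allowed[0] <= configured[0] and configured[1] <= allowed[1]


if __name__ == "__main__":
    # Hypothetical file content; the real file would be ~/.globus/cog.properties
    # or the Swift properties file mentioned in the thread.
    sample = "# sample properties\ntcp.port.range=50000,60000\n"
    configured = parse_port_range(sample)
    if configured is None:
        print("tcp.port.range is not set")
    else:
        print("configured:", configured,
              "inside firewall window:", within(configured, FIREWALL_OPEN))
```

If the configured range falls inside the open window and jobs still hang, the problem is probably elsewhere, which is consistent with Ben's suggestion to first test basic submission with globus-job-run.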