From aespinosa at cs.uchicago.edu Tue Nov 3 16:56:26 2009
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Tue, 3 Nov 2009 16:56:26 -0600
Subject: [Swift-user] cobalt can't find wrapperlogs
Message-ID: <50b07b4b0911031456u5979ca09vcfda75cf35d062b0@mail.gmail.com>
Hi,
I'm using a cobalt-only sites.xml to launch MPI jobs from the
BlueGene. But when I inspected the workdir, no job directories were
created.
swift session:
Swift svn swift-r3186 cog-r2577
RunID: run0
Progress:
Progress: Submitted:1
Progress: Active:1
Progress: Active:1
Progress: Active:1
Progress: Active:1
Progress: Active:1
Progress: Active:1
Progress: Checking status:1
Failed to transfer wrapper log from mpitest-run0/info/p on INTREPID
Progress: Submitted:1
Progress: Active:1
Progress: Active:1
Progress: Active:1
Progress: Active:1
Progress: Active:1
Progress: Active:1
Progress: Active:1
Progress: Checking status:1
Failed to transfer wrapper log from mpitest-run0/info/r on INTREPID
Progress: Submitted:1
Progress: Submitted:1
Progress: Active:1
Progress: Active:1
Progress: Active:1
Progress: Active:1
Progress: Active:1
Progress: Active:1
Progress: Checking status:1
Failed to transfer wrapper log from mpitest-run0/info/t on INTREPID
Execution failed:
Exception in hello:
Arguments: []
Host: INTREPID
Directory: mpitest-run0/jobs/t/hello-tlf9cyij
stderr.txt:
stdout.txt:
----
Caused by:
No status file was found. Check the shared filesystem on INTREPID
listing of workdir:
intrepid-fs0/users/espinosa/scratch/mpi_runs/mpitest-run0> find .
.
./shared
./shared/_swiftwrap
./shared/_swiftseq
./kickstart
./status
./info
./200173.cobaltlog
./200174.cobaltlog
./200175.cobaltlog
./200176.cobaltlog
./200177.cobaltlog
sites.xml:
64
HTCScienceApps
20
vn
prod-devel
/intrepid-fs0/users/espinosa/scratch/mpi_runs
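(The archive stripped the XML tags from the sites.xml above, leaving only the values. A cobalt site entry of that era would have looked roughly like the sketch below; the element and profile-key names are reconstructed from the Swift site-catalog format and are assumptions, only the values appear in the original message.)

```xml
<pool handle="INTREPID">
  <!-- Key names below are assumptions; the value-to-key mapping is
       inferred from the cqsub options used elsewhere in this thread
       (-n 64, -t 20, - -mode vn, -q prod-devel). -->
  <execution provider="cobalt" url="localhost"/>
  <profile namespace="globus" key="count">64</profile>
  <profile namespace="globus" key="project">HTCScienceApps</profile>
  <profile namespace="globus" key="maxtime">20</profile>
  <profile namespace="globus" key="mode">vn</profile>
  <profile namespace="globus" key="queue">prod-devel</profile>
  <workdirectory>/intrepid-fs0/users/espinosa/scratch/mpi_runs</workdirectory>
</pool>
```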
Where do you think these jobdirs were created? I have also attached
the Swift log to this email.
-Allan
--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpitest-run0.log
Type: text/x-log
Size: 45705 bytes
Desc: not available
URL:
From hategan at mcs.anl.gov Tue Nov 3 16:58:18 2009
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 03 Nov 2009 16:58:18 -0600
Subject: [Swift-user] cobalt can't find wrapperlogs
In-Reply-To: <50b07b4b0911031456u5979ca09vcfda75cf35d062b0@mail.gmail.com>
References: <50b07b4b0911031456u5979ca09vcfda75cf35d062b0@mail.gmail.com>
Message-ID: <1257289098.18763.0.camel@localhost>
Make sure you set GLOBUS_HOSTNAME to the IP of eth0 before running.
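(A sketch of one way to do that; the awk pattern assumes the 2009-era "inet addr:" ifconfig output format, and the interface name may differ on your login node.)

```shell
#!/bin/sh
# Extract the first IPv4 address from ifconfig-style output
# (lines of the form "inet addr:172.17.5.144 Bcast:... Mask:...").
parse_inet_addr() {
    awk '/inet addr:/ { sub(/.*addr:/, ""); sub(/ .*/, ""); print; exit }'
}

# Point GLOBUS_HOSTNAME at eth0's address before launching swift.
if command -v /sbin/ifconfig >/dev/null 2>&1; then
    GLOBUS_HOSTNAME=$(/sbin/ifconfig eth0 2>/dev/null | parse_inet_addr)
    export GLOBUS_HOSTNAME
fi
```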
On Tue, 2009-11-03 at 16:56 -0600, Allan Espinosa wrote:
> I'm using a cobalt-only sites.xml to launch MPI jobs from the
> BlueGene. But when I inspected the workdir, no job directories were
> created.
From aespinosa at cs.uchicago.edu Tue Nov 3 17:23:52 2009
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Tue, 3 Nov 2009 17:23:52 -0600
Subject: [Swift-user] cobalt can't find wrapperlogs
In-Reply-To: <1257289098.18763.0.camel@localhost>
References: <50b07b4b0911031456u5979ca09vcfda75cf35d062b0@mail.gmail.com>
<1257289098.18763.0.camel@localhost>
Message-ID: <50b07b4b0911031523r40db0c27r62ce1576ccaef2e1@mail.gmail.com>
I did.
$ ifconfig eth0; echo $GLOBUS_HOSTNAME; ./demompi.sh
eth0 Link encap:Ethernet HWaddr 00:14:5E:9C:0D:82
inet addr:172.17.5.144 Bcast:172.31.255.255 Mask:255.240.0.0
inet6 addr: fe80::214:5eff:fe9c:d82/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
RX packets:88471902 errors:0 dropped:54 overruns:0 frame:222
TX packets:84299690 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:67184589504 (64072.2 Mb) TX bytes:69406208974 (66190.9 Mb)
Interrupt:33
172.17.5.144
Swift svn swift-r3186 cog-r2577
RunID: run0
Progress:
Progress: Stage in:1
Progress: Submitted:1
Progress: Active:1
Progress: Active:1
Progress: Active:1
Progress: Active:1
Progress: Active:1
Progress: Active:1
Progress: Active:1
Progress: Checking status:1
Failed to transfer wrapper log from mpitest-run0/info/a on INTREPID
Progress: Submitted:1
Progress: Active:1
Progress: Active:1
...
...
I also set it using the "env" namespace:
172.17.5.144
Yet it doesn't seem to be reflected in the cobalt logs:
workdir$grep GLOBUS *.cobaltlog
$
thanks,
-Allan
2009/11/3 Mihael Hategan:
> Make sure you set GLOBUS_HOSTNAME to the IP of eth0 before running.
From hategan at mcs.anl.gov Tue Nov 3 17:31:29 2009
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 03 Nov 2009 17:31:29 -0600
Subject: [Swift-user] cobalt can't find wrapperlogs
In-Reply-To: <50b07b4b0911031523r40db0c27r62ce1576ccaef2e1@mail.gmail.com>
References: <50b07b4b0911031456u5979ca09vcfda75cf35d062b0@mail.gmail.com>
<1257289098.18763.0.camel@localhost>
<50b07b4b0911031523r40db0c27r62ce1576ccaef2e1@mail.gmail.com>
Message-ID: <1257291089.19332.3.camel@localhost>
Oh, no coasters. I see.
I don't think the swift wrapper will work on CNK.
From aespinosa at cs.uchicago.edu Tue Nov 3 17:34:27 2009
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Tue, 3 Nov 2009 17:34:27 -0600
Subject: [Swift-user] cobalt can't find wrapperlogs
In-Reply-To: <1257291089.19332.3.camel@localhost>
References: <50b07b4b0911031456u5979ca09vcfda75cf35d062b0@mail.gmail.com>
<1257289098.18763.0.camel@localhost>
<50b07b4b0911031523r40db0c27r62ce1576ccaef2e1@mail.gmail.com>
<1257291089.19332.3.camel@localhost>
Message-ID: <50b07b4b0911031534yd0e88f1o979577e1d15dbee1@mail.gmail.com>
Ahh, right.

The way of running MPI jobs on Intrepid is different from the procedure
recommended on the Swift webpage: on the BlueGene, the actual
"mpirun ./a.out" is equivalent to "cqsub ./a.out" itself.
2009/11/3 Mihael Hategan:
> Oh, no coasters. I see.
>
> I don't think the swift wrapper will work on CNK.
--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago
From hategan at mcs.anl.gov Tue Nov 3 17:41:49 2009
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 03 Nov 2009 17:41:49 -0600
Subject: [Swift-user] cobalt can't find wrapperlogs
In-Reply-To: <50b07b4b0911031534yd0e88f1o979577e1d15dbee1@mail.gmail.com>
References: <50b07b4b0911031456u5979ca09vcfda75cf35d062b0@mail.gmail.com>
<1257289098.18763.0.camel@localhost>
<50b07b4b0911031523r40db0c27r62ce1576ccaef2e1@mail.gmail.com>
<1257291089.19332.3.camel@localhost>
<50b07b4b0911031534yd0e88f1o979577e1d15dbee1@mail.gmail.com>
Message-ID: <1257291709.19637.2.camel@localhost>
On Tue, 2009-11-03 at 17:34 -0600, Allan Espinosa wrote:
> ahh right.
>
> the way of running mpi jobs on Intrepid is different from what's the
> recommended procedure in the swift webpage. on the bluegene the
> actual "mpirun ./a.out" is equivalent to "cqsub ./a.out" itself.
I know.
But you can run CNK executables on ZeptoOS.

And I assume that since the Swift wrapper is started by mpirun, the MPI
environment will be set up, so it should also apply to the executable
that the wrapper forks.

So I'd say just change to ZeptoOS and try again. I suspect it might
work.
From wilde at mcs.anl.gov Wed Nov 4 12:54:13 2009
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 04 Nov 2009 12:54:13 -0600
Subject: [Swift-user] How to use the swift command "text mode monitor" -tui ?
Message-ID: <4AF1CDD5.9070007@mcs.anl.gov>
How does one switch between the various tabs in the Swift command's
"-tui" text-mode monitor?

I'm on a MacBook and have tried number keys, alt-number, function keys,
etc., but nothing seems to be recognized.

The only key it seems to respond to is Enter in response to "OK" upon
script completion, which causes Swift and the TUI to exit.
From hategan at mcs.anl.gov Wed Nov 4 13:02:15 2009
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 04 Nov 2009 13:02:15 -0600
Subject: [Swift-user] How to use the swift command "text mode monitor"
-tui ?
In-Reply-To: <4AF1CDD5.9070007@mcs.anl.gov>
References: <4AF1CDD5.9070007@mcs.anl.gov>
Message-ID: <1257361335.6036.4.camel@localhost>
On Wed, 2009-11-04 at 12:54 -0600, Michael Wilde wrote:
> How does one switch between the various tabs on the Swift command's
> "-tui" text mode monitor?
>
> Im on a Macbook, and tried number keys, alt-number, function keys, etc,
> but nothing seems to be recognized.
The function keys should work on standard terminals (including the OS X
terminal).
There may be certain configurations for which things don't work (I heard
reports of GNU screen interfering with things on OS X).
So it would be helpful if you could mention exactly what kind of
configuration you're using.
From aespinosa at cs.uchicago.edu Wed Nov 4 20:12:54 2009
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Wed, 4 Nov 2009 20:12:54 -0600
Subject: [Swift-user] hack to run mpi jobs on bluegene/p
Message-ID: <50b07b4b0911041812k768dde9and6b1508a149487f7@mail.gmail.com>
I made a hackish wrapper script around the app you want to run:

cat hello_wrapper.sh
#!/bin/bash
echo "hello world"

# Submit the real MPI binary through Cobalt and capture the job id.
jobid=$(qsub -t 20 -q prod-devel -n 64 --mode vn -o stdout.file \
        /home/espinosa/experiments/mpitest/hello)

getstatus() {
    qstat | grep "$jobid" | awk '{ print $5 }'
}

echo "$jobid"

# Poll until the Cobalt job reaches the "exiting" state. The quotes
# guard against an empty status while the job is still queued.
stat=$(getstatus)
while [ "$stat" != "exiting" ]; do
    stat=$(getstatus)
    sleep 1
done
sample workflow:
> cat mpitest.swift
type file;

app (file out) hello() {
    hello;
}

file output <"stdout.file">;

output = hello();
Obviously we can't do stderr=@filename(x) and the like, but we still
get the progress and restartability features from the output files we
are expecting. We would also need separate command-line processing to
split "wrapper" args from the real program args.
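That split could be sketched with a hypothetical "--" convention separating wrapper options from the real program's arguments (the convention and the names here are illustrative, not part of the original hack):

```shell
#!/bin/bash
# Hypothetical splitter: everything before "--" is for the wrapper
# (e.g. qsub options); everything after goes to the real program.
split_args() {
    wrapper_args=()
    while [ $# -gt 0 ]; do
        if [ "$1" = "--" ]; then
            shift
            break
        fi
        wrapper_args+=("$1")
        shift
    done
    prog_args=("$@")
}

split_args -t 20 -q prod-devel -- ./a.out input.dat
echo "wrapper: ${wrapper_args[*]}"
echo "program: ${prog_args[*]}"
```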
enjoy! :)
-Allan
--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago
From iraicu at cs.uchicago.edu Thu Nov 5 12:57:23 2009
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Thu, 05 Nov 2009 12:57:23 -0600
Subject: [Swift-user] hack to run mpi jobs on bluegene/p
In-Reply-To: <50b07b4b0911041812k768dde9and6b1508a149487f7@mail.gmail.com>
References: <50b07b4b0911041812k768dde9and6b1508a149487f7@mail.gmail.com>
Message-ID: <4AF32013.6010205@cs.uchicago.edu>
Hi Allan,
I don't know if I understand your statement correctly. Are you saying
that you got Swift to run MPI jobs on the BG/P? If yes, with what
provider? Coasters? Falkon? I don't think it was Falkon, as Falkon
doesn't have support for allocating jobs with N processors. Do Coasters
support this kind of allocation of multiple processors at the same
time? If yes, I'd like to hear more about this and how you got MPI to
run on the BG/P through Swift.
Thanks,
Ioan
Allan Espinosa wrote:
> I made some hackish wrapper scripts to the app you want to run:
> [...]
--
=================================================================
Ioan Raicu, Ph.D.
NSF/CRA Computing Innovation Fellow
=================================================================
Center for Ultra-scale Computing and Information Security (CUCIS)
Department of Electrical Engineering and Computer Science
Northwestern University
2145 Sheridan Rd, Tech M384
Evanston, IL 60208-3118
=================================================================
Cel: 1-847-722-0876
Tel: 1-847-491-8163
Email: iraicu at eecs.northwestern.edu
Web: http://www.eecs.northwestern.edu/~iraicu/
https://wiki.cucis.eecs.northwestern.edu/
=================================================================
=================================================================
From fangfang at uchicago.edu Thu Nov 12 18:12:00 2009
From: fangfang at uchicago.edu (Fangfang Xia)
Date: Thu, 12 Nov 2009 16:12:00 -0800
Subject: [Swift-user] RAxML error msgs
Message-ID: <4bc7a37b0911121612s3a18502bv18bd3211f37b3a14@mail.gmail.com>
Hi,
I am trying to run RAxML with Swift on Surveyor and am getting the
following error messages. Could you help me with this? Thanks.
directory:
~fangfang/work/jgi/phylo/test.raxml/
command line:
swift -tc.file tc.data -sites.file sites.xml raxmlex1.swift
Failed to transfer wrapper log from
raxmlex1-20091112-1726-ct8vv8tf/info/m on surveyor
Progress: Submitted:7
Failed to connect: Connection timed out at
/home/fangfang/.globus/coasters/cscript7914625247065953287.pl line
129.
Failed to connect: Connection timed out at
/home/fangfang/.globus/coasters/cscript7914625247065953287.pl line
129.
Failed to connect: Connection timed out at
/home/fangfang/.globus/coasters/cscript7914625247065953287.pl line
129.
Progress: Submitted:6 Active:1
Failed to transfer wrapper log from
raxmlex1-20091112-1726-ct8vv8tf/info/p on surveyor
Progress: Submitted:7
Progress: Submitted:7
Progress: Submitted:7
Progress: Submitted:7
Progress: Submitted:7
Progress: Submitted:7
Progress: Submitted:7
Progress: Submitted:7
Progress: Submitted:7
Progress: Submitted:7
Progress: Submitted:7
Progress: Submitted:7
Progress: Submitted:7
Progress: Submitted:7
Progress: Submitted:7
Progress: Submitted:7
Progress: Submitted:7
Progress: Submitted:7
Progress: Submitted:7
Progress: Submitted:7
Progress: Submitted:7
Progress: Submitted:7
Progress: Submitted:7
Progress: Submitted:7
Progress: Submitted:7
Progress: Submitted:7
Worker task failed: 1112-260539-000002Block task ended prematurely
Statement unlikely to be reached at
/home/fangfang/.globus/coasters/cscript7914625247065953287.pl line
592.
(Maybe you meant system() when you said exec()?)
Statement unlikely to be reached at
/home/fangfang/.globus/coasters/cscript7914625247065953287.pl line
592.
(Maybe you meant system() when you said exec()?)
Statement unlikely to be reached at
/home/fangfang/.globus/coasters/cscript7914625247065953287.pl line
592.
(Maybe you meant system() when you said exec()?)
Statement unlikely to be reached at
/home/fangfang/.globus/coasters/cscript7914625247065953287.pl line
592.
...
From hategan at mcs.anl.gov Thu Nov 12 20:05:56 2009
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 12 Nov 2009 20:05:56 -0600
Subject: [Swift-user] RAxML error msgs
In-Reply-To: <4bc7a37b0911121612s3a18502bv18bd3211f37b3a14@mail.gmail.com>
References: <4bc7a37b0911121612s3a18502bv18bd3211f37b3a14@mail.gmail.com>
Message-ID: <1258077956.29015.1.camel@localhost>
Make sure you set GLOBUS_HOSTNAME=172.17.3.16 before starting swift and
that you have true in sites.xml.
Mihael
On Thu, 2009-11-12 at 16:12 -0800, Fangfang Xia wrote:
> I am trying to run RAxML with Swift on Surveyor and am getting the
> following error messages. Could you help me with this? Thanks.
From yecartes at gmail.com Mon Nov 16 02:35:58 2009
From: yecartes at gmail.com (Allan Espinosa)
Date: Mon, 16 Nov 2009 02:35:58 -0600
Subject: [Swift-user] Re: Copy of parallel scripting paper
In-Reply-To: <1258360189.11500.4.camel@localhost>
References: <1258073372.2595.7.camel@localhost>
<50b07b4b0911121658j44867c43p131a7c330362099a@mail.gmail.com>
<1258074413.2595.14.camel@localhost>
<50b07b4b0911121708m481d9ccfx8030e01f4a52304f@mail.gmail.com>
<1258078206.2595.15.camel@localhost>
<50b07b4b0911121811n5ad685c5vb75be232c1231d0b@mail.gmail.com>
<1258343957.8340.4.camel@localhost>
<50b07b4b0911160014n6ba89dd1ia990925ff46fd1d8@mail.gmail.com>
<1258359764.11500.3.camel@localhost>
<1258360189.11500.4.camel@localhost>
Message-ID: <50b07b4b0911160035m46695fbbx3bfc451afb9d9c54@mail.gmail.com>
Hi William,

Yeah, typically I place a tc.data in my current directory and then
specify it with the "-tc.file ./tc.data" option.

I guess we can run on PBS/Torque. Here's a sample sites.xml config:
/home/aespinosa
2.02
1.98
fast
01:00:00
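(The XML tags were again stripped by the archive. A PBS entry along these lines might have looked like the sketch below; the element and key names are assumptions from the Swift site-catalog format, and which score is initialScore versus jobThrottle is a guess.)

```xml
<pool handle="torque-cluster">
  <!-- Only the values appear in the original message; key names and the
       value-to-key mapping here are assumed. -->
  <execution provider="pbs" url="none"/>
  <workdirectory>/home/aespinosa</workdirectory>
  <profile namespace="karajan" key="initialScore">2.02</profile>
  <profile namespace="karajan" key="jobThrottle">1.98</profile>
  <profile namespace="globus" key="queue">fast</profile>
  <profile namespace="globus" key="maxwalltime">01:00:00</profile>
</pool>
```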
My initialScore and jobThrottle settings enable me to submit a maximum
of 200 jobs at a time.

Btw, I'm cc'ing the swift-user mailing list. Hope you enjoy tinkering
around with Swift!
-Allan
2009/11/16 William Emmanuel S. Yu:
>
> I found the tc.data file. Geez... it's in a strange place! The etc folder
> in the distribution directory.
>
> On Mon, 2009-11-16 at 16:22 +0800, William Emmanuel Yu wrote:
>> On Mon, 2009-11-16 at 02:14 -0600, Allan Espinosa wrote:
>> > I guess that works. But if you have a scheduler, you should use it.
>> > We have support for PBS, Cobalt and Condor. Any unsupported scheduler
>> > can be interfaced with Globus GRAM2 or GRAM4. From my perspective,
>> > the cluster sysads should make it easy for their users to submit jobs :)
>> >
>> OK, right now that system uses Torque and isn't really maintained at
>> this point. So it was a real challenge using it. But I managed to write
>> a script with bash and mpirun that got the job done. However, I did
>> encounter a lot of dependency issues. Haha. As expected.
>>
>> Do you have a document on Torque integration? At least a quick one.
>>
>> Btw, I am using the default swift binary on my laptop with no installed
>> scheduler. I ran into the following error:
>>
>> "Could not find any valid host for task "Task(type=UNKNOWN,
>> identity=urn:cog-1258359742934)" with constraints {tr=convert,
>> filenames=[Ljava.lang.String;@2de41d, trfqn=convert,
>> filecache=org.griphyn.vdl.karajan.lib.cache.CacheMapAdapter at db4bcf}"
>>
>> Which is strange, as I am not using any scheduler.
>>
>> Thanks!
>>
>> > 2009/11/15 William Emmanuel Yu:
>> > >
>> > > After reading the article, I think I have a better appreciation of the
>> > > problems you are trying to solve. Let me try to read more about Swift.
>> > >
>> > > What is the easiest way to run Swift on a small cluster (say 64 cores
>> > > on 16 nodes)? Can I just do an NFS and passwordless-SSH thing? I think
>> > > I will try to help a buddy of mine with his thesis on aquatic
>> > > migrations in UP as practice.
>> > >
>> > > Thanks.
>> > >
>> > > On Thu, 2009-11-12 at 20:11 -0600, Allan Espinosa wrote:
>> > >> You can try the Swift tutorial designed for localhost:
>> > >>
>> > >> http://www.ci.uchicago.edu/swift/guides/tutorial.php
>> > >>
>> > >> 2009/11/12 William Emmanuel Yu:
>> > >> > Let me review first if this is easier to teach... but if you have a
>> > >> > cookbook for a quick local install, then that would also be cool.
>> > >> >
>> > >> > On Thu, 2009-11-12 at 19:08 -0600, Allan Espinosa wrote:
>> > >> >> It's user- and setup-easy when you use a local scheduler (run from
>> > >> >> localhost). I can help you with the setup on localhost or via ssh.
>> > >> >> Some of my colleagues have also tried using this over Amazon EC2.
>> > >> >>
>> > >> >> -Allan
>> > >> >>
>> > >> >> 2009/11/12 William Emmanuel Yu:
>> > >> >> > Interesting... but this is setup-heavy but user-easy, right?
>> > >> >> >
>> > >> >> > On Thu, 2009-11-12 at 18:58 -0600, Allan Espinosa wrote:
>> > >> >> >> Hi William,
>> > >> >> >>
>> > >> >> >> See attached file.
>> > >> >> >>
>> > >> >> >> We are using Swift (http://www.ci.uchicago.edu/swift). It has
>> > >> >> >> adapters to cluster schedulers like PBS, GRAM2, GRAM4, Cobalt
>> > >> >> >> and Condor. But you can use ssh and local (fork) for a start.
>> > >> >> >>
>> > >> >> >> We couple it to Falkon
>> > >> >> >> (http://dev.globus.org/wiki/Incubator/Falkon) for faster job
>> > >> >> >> throughput with short-time jobs and deployments on
>> > >> >> >> supercomputers.
>> > >> >> >>
>> > >> >> >> -Allan
>> > >> >> >>
>> > >> >> >> 2009/11/12 William Emmanuel Yu:
>> > >> >> >> > Can I get a copy of this paper? What tools are you using now?
>> > >> >> >> >
>> > >> >> >> > --
>> > >> >> >> >  -------------------------------------------------------
>> > >> >> >> > William Emmanuel S. Yu
>> > >> >> >> > Department of Information Systems and Computer Science
>> > >> >> >> > Ateneo de Manila University
>> > >> >> >> > email  :  wyu at ateneo dot edu
>> > >> >> >> > blog   :  http://hip2b2.yutivo.org/
>> > >> >> >> > web    :  http://CNG.ateneo.edu/cng/wyu/
>> > >> >> >> > phone  :  +63(2)4266001 loc. 4186
>> > >> >> >> > GPG    :  http://CNG.ateneo.net/cng/wyu/wyy.pgp
>> > >> >> >> >
>> > >> >> >> > Confidentiality Issue: This message is intended only for the
>> > >> >> >> > use of the addressee and may contain information that is
>> > >> >> >> > privileged and confidential. If you are not the intended
>> > >> >> >> > recipient, you are hereby notified that any use or
>> > >> >> >> > dissemination of this communication is strictly prohibited.
>> > >> >> >> > If you have received this communication in error, please
>> > >> >> >> > notify us immediately by reply and delete this message from
>> > >> >> >> > your system.
> --
>  -------------------------------------------------------
> William Emmanuel S. Yu
> Novare Technologies Inc.
> 6th Floor Peninsula Court Building,
> Makati Avenue corner Paseo de Roxas Avenue,
> Makati City, 1226 Philippines
> email  :  william dot yu at novare dot com dot hk
> web    :  www.novare.com.hk
--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago
From william.yu at novare.com.hk Mon Nov 16 02:48:44 2009
From: william.yu at novare.com.hk (William Emmanuel S. Yu)
Date: Mon, 16 Nov 2009 16:48:44 +0800
Subject: [Swift-user] Re: Copy of parallel scripting paper
In-Reply-To: <50b07b4b0911160035m46695fbbx3bfc451afb9d9c54@mail.gmail.com>
References: <1258073372.2595.7.camel@localhost>
<50b07b4b0911121658j44867c43p131a7c330362099a@mail.gmail.com>
<1258074413.2595.14.camel@localhost>
<50b07b4b0911121708m481d9ccfx8030e01f4a52304f@mail.gmail.com>
<1258078206.2595.15.camel@localhost>
<50b07b4b0911121811n5ad685c5vb75be232c1231d0b@mail.gmail.com>
<1258343957.8340.4.camel@localhost>
<50b07b4b0911160014n6ba89dd1ia990925ff46fd1d8@mail.gmail.com>
<1258359764.11500.3.camel@localhost>
<1258360189.11500.4.camel@localhost>
<50b07b4b0911160035m46695fbbx3bfc451afb9d9c54@mail.gmail.com>
Message-ID: <1258361324.11500.12.camel@localhost>
Hey. Thanks for the quick response. I do have to note that I haven't
gone through much of the documentation yet. But, these tidbits
definitely help.
Again thanks.
On Mon, 2009-11-16 at 02:35 -0600, Allan Espinosa wrote:
> Hi william,
>
> Yeah. Typically I place a tc.data in my current directory and then
> specify it with the "-tc.file ./tc.data" option.
>
> I guess we can run on PBS Torque. Here's a sample sites.xml config:
>
>
>
>
>
> /home/aespinosa
>
> 2.02
> 1.98
>
> fast
> 01:00:00
>
>
>
> My initialScore and jobThrottle settings enable me to submit a maximum
> of 200 jobs at a time.
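The archiver stripped the XML tags from the sites.xml above, leaving only the values. For reference, a PBS pool of this vintage looked roughly like the sketch below; the element and profile names follow the Swift site-catalog schema, and the mapping of the surviving values (2.02, 1.98, fast, 01:00:00, /home/aespinosa) onto them is an assumption, not a reconstruction of the original file:

```xml
<!-- Hypothetical reconstruction: tag names from the Swift site catalog
     schema; the value-to-key mapping is assumed. -->
<pool handle="pbs-site">
  <execution provider="pbs" url="none"/>
  <profile namespace="karajan" key="initialScore">2.02</profile>
  <profile namespace="karajan" key="jobThrottle">1.98</profile>
  <profile namespace="globus" key="queue">fast</profile>
  <profile namespace="globus" key="maxwalltime">01:00:00</profile>
  <workdirectory>/home/aespinosa</workdirectory>
</pool>
```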
>
>
> Btw, I'm cc'ing you on the swift-user mailing list. Hope you enjoy
> tinkering around with Swift!
>
> -Allan
>
> 2009/11/16 William Emmanuel S. Yu :
> >
> > I found the tc.data file. Geez... it's in a strange place! The etc folder
> > in the distribution directory.
> >
> > On Mon, 2009-11-16 at 16:22 +0800, William Emmanuel Yu wrote:
> >> On Mon, 2009-11-16 at 02:14 -0600, Allan Espinosa wrote:
> >> > I guess that works. But if you have a scheduler, you should use it.
> >> > We have support for PBS, Cobalt and Condor. Any unsupported scheduler
> >> > can be interfaced with Globus GRAM2 or GRAM4. From my perspective,
> >> > the cluster sysads should make it easy for their users to submit jobs :)
> >> >
> >> OK, right now that system uses Torque and isn't really maintained at
> >> this point, so it was a real challenge using it. But I managed to write
> >> a script with bash and mpirun that got the job done. However, I did
> >> encounter a lot of dependency issues. Haha. As expected.
> >>
> >> Do you have a document on torque integration? At least a quick one.
> >>
> >> Btw, I am using the default swift binary on my laptop with no installed
> >> scheduler. I ran into the following error:
> >>
> >> "Could not find any valid host for task "Task(type=UNKNOWN,
> >> identity=urn:cog-1258359742934)" with constraints {tr=convert,
> >> filenames=[Ljava.lang.String;@2de41d, trfqn=convert,
> >> filecache=org.griphyn.vdl.karajan.lib.cache.CacheMapAdapter at db4bcf}"
> >>
> >> Which is strange as I am not using any scheduler.
> >>
> >> Thanks!
> >>
> >>
> >>
> >> > 2009/11/15 William Emmanuel Yu :
> >> > >
> >> > > After reading the article, I think I have a better appreciation of the
> >> > > problems you are trying to solve. Let me try to read about SWIFT more.
> >> > >
> >> > > What is the easiest way to run Swift on a small cluster (say 64 cores on
> >> > > 16 nodes)? Can I just do an NFS and passwordless-SSH thing? I think I
> >> > > will try to help a buddy of mine with his thesis on aquatic migrations
> >> > > in UP as practice.
> >> > >
> >> > > Thanks.
> >> > >
> >> > > On Thu, 2009-11-12 at 20:11 -0600, Allan Espinosa wrote:
> >> > >> you can try the swift tutorial designed for localhost:
> >> > >>
> >> > >> http://www.ci.uchicago.edu/swift/guides/tutorial.php
> >> > >>
> >> > >> 2009/11/12 William Emmanuel Yu :
> >> > >> > let me review first if this is easier to teach ... but, if you have a
> >> > >> > cookbook for a quick local install then that would also be cool.
> >> > >> >
> >> > >> > On Thu, 2009-11-12 at 19:08 -0600, Allan Espinosa wrote:
> >> > >> >> It's easy on both the user and setup sides when you use a local
> >> > >> >> scheduler (run from localhost). I can help you with the setup on
> >> > >> >> localhost or via ssh.
> >> > >> >> Some of my colleagues have also tried using this over Amazon EC2.
> >> > >> >>
> >> > >> >> -Allan
> >> > >> >>
> >> > >> >> 2009/11/12 William Emmanuel Yu :
> >> > >> >> >
> >> > >> >> > interesting... but this is setup-heavy but user-easy, right?
> >> > >> >> >
> >> > >> >> > On Thu, 2009-11-12 at 18:58 -0600, Allan Espinosa wrote:
> >> > >> >> >> hi william
> >> > >> >> >>
> >> > >> >> >> see attached file.
> >> > >> >> >>
> >> > >> >> >> We are using Swift (http://www.ci.uchicago.edu/swift). It has
> >> > >> >> >> adapters to cluster schedulers like PBS, GRAM2, GRAM4, Cobalt and
> >> > >> >> >> Condor. But you can use ssh and local (fork) for a start.
> >> > >> >> >>
> >> > >> >> >> We couple it to Falkon (http://dev.globus.org/wiki/Incubator/Falkon)
> >> > >> >> >> for faster job throughput in short-time jobs and deployments on
> >> > >> >> >> supercomputers.
> >> > >> >> >>
> >> > >> >> >> -Allan
> >> > >> >> >>
> >> > >> >> >> 2009/11/12 William Emmanuel Yu :
> >> > >> >> >> >
> >> > >> >> >> > Can I get a copy of this paper? What tools are you using now?
> >> > >> >> >> >
> >> > >> >> >> > --
> >> > >> >> >> > -------------------------------------------------------
> >> > >> >> >> > William Emmanuel S. Yu
> >> > >> >> >> > Department of Information Systems and Computer Science
> >> > >> >> >> > Ateneo de Manila University
> >> > >> >> >> > email : wyu at ateneo dot edu
> >> > >> >> >> > blog : http://hip2b2.yutivo.org/
> >> > >> >> >> > web : http://CNG.ateneo.edu/cng/wyu/
> >> > >> >> >> > phone : +63(2)4266001 loc. 4186
> >> > >> >> >> > GPG : http://CNG.ateneo.net/cng/wyu/wyy.pgp
> >> > >> >> >> >
> >> > >> >> >> > Confidentiality Issue: This message is intended only for the use of the
> >> > >> >> >> > addressee and may contain information that is privileged and
> >> > >> >> >> > confidential. If you are not the intended recipient, you are hereby
> >> > >> >> >> > notified that any use or dissemination of this communication is strictly
> >> > >> >> >> > prohibited. If you have received this communication in error, please
> >> > >> >> >> > notify us immediately by reply and delete this message from your system.
> >> > >> >> >> >
> >> > >> >> >> >
> >> > >> >> >>
> >> > >> >> >>
> >> > >> >> >>
--
-------------------------------------------------------
William Emmanuel S. Yu
Novare Technologies Inc.
6th Floor Peninsula Court Building,
Makati Avenue corner Paseo de Roxas Avenue,
Makati City, 1226 Philippines
email : william dot yu at novare dot com dot hk
web : www.novare.com.hk
From yecartes at gmail.com Tue Nov 17 03:38:42 2009
From: yecartes at gmail.com (Allan Espinosa)
Date: Tue, 17 Nov 2009 03:38:42 -0600
Subject: [Swift-user] Re: Copy of parallel scripting paper
In-Reply-To: <1258440713.4998.7.camel@localhost>
References: <1258073372.2595.7.camel@localhost>
<50b07b4b0911121658j44867c43p131a7c330362099a@mail.gmail.com>
<1258074413.2595.14.camel@localhost>
<50b07b4b0911121708m481d9ccfx8030e01f4a52304f@mail.gmail.com>
<1258078206.2595.15.camel@localhost>
<50b07b4b0911121811n5ad685c5vb75be232c1231d0b@mail.gmail.com>
<1258343957.8340.4.camel@localhost>
<50b07b4b0911160014n6ba89dd1ia990925ff46fd1d8@mail.gmail.com>
<1258440713.4998.7.camel@localhost>
Message-ID: <50b07b4b0911170138k497cb092lc5daea3ef1fff896@mail.gmail.com>
The command on the first line of the app() body (gcc here) has to be
registered in tc.data. Typically we deploy the programs we want at the
systems level, but compiling within the workflow is in itself an
interesting approach to running stuff. Also, if you load-balance the
application across systems with different architectures in the second
stage, the other site will not be utilized since the mismatched binaries
will create a lot of failed jobs.
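Concretely, a tc.data entry for the compile step might look like this (the site name and gcc path are assumptions, using the same one-line format as the other tc.data examples in this archive):

```
localhost  gcc  /usr/bin/gcc  null  null  null
```

The generated hello.exe is the harder part: it does not exist when the workflow starts, so it cannot be registered in tc.data ahead of time, which is why the run() step fails.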
-Allan
2009/11/17 William Emmanuel Yu :
>
> Hey,
>
> Here is an intermediate Swift question: how do I run a program that I
> just compiled? I would like to do something like what is intended by
> the program below. Of course, this program does not work because I
> can't run the generated exefile.
>
> --- start script ---
>
> type sourcefile;
> type exefile;
> type outputfile;
>
> (exefile e) compile(sourcefile s) {
>   app {
>     gcc "-o" @filename(e) @filename(s);
>   }
> }
>
> (outputfile o) run(exefile e) {
>   app {
>     @filename(e) stdout=@filename(o);
>   }
> }
>
> exefile efile <"hello.exe">;
> sourcefile sfile <"hello.c">;
> outputfile ofile <"hello.out">;
>
> efile = compile(sfile);
> ofile = run(efile);
>
> --- end script ---
>
> Of course, the main idea is that I won't be compiling and running one
> file per script but a full directory of C source files that I want to
> compile and run.
>
> Thanks!
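For the whole-directory case, the usual Swift pattern is a foreach over a mapped file array. A sketch in the same style as the script above; the mapper choices and output-file naming are assumptions and untested:

```
type sourcefile;
type exefile;

(exefile e) compile(sourcefile s) {
  app {
    gcc "-o" @filename(e) @filename(s);
  }
}

// Map every .c file in the current directory into an array,
// then compile each element independently.
sourcefile sources[] <filesys_mapper; pattern="*.c">;

foreach s, i in sources {
  exefile e <single_file_mapper; file=@strcat("prog", i, ".exe")>;
  e = compile(s);
}
```

Running each freshly built binary still runs into the tc.data registration issue Allan describes, since the executables do not exist when the workflow starts.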
>
--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago
From wilde at mcs.anl.gov Tue Nov 17 10:20:37 2009
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 17 Nov 2009 08:20:37 -0800
Subject: [Swift-user] How to set GLOBUS_HOSTNAME for ALCF Viz systems?
Message-ID: <4B02CD55.8010607@mcs.anl.gov>
What interface should GLOBUS_HOSTNAME be set to, eg on Eureka?
eth4 10. net
eth5 140. net
myri0 172. net
I'm guessing myri0 here, for the compute node to reach the login node?
- Mike
eth4 Link encap:Ethernet HWaddr 00:30:48:D1:0B:A2
inet addr:10.40.9.151 Bcast:10.40.255.255 Mask:255.255.0.0
inet6 addr: fe80::230:48ff:fed1:ba2/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:430006791 errors:0 dropped:105781973 overruns:0
frame:0
TX packets:1487287 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:28555408302 (27232.5 Mb) TX bytes:701260634 (668.7 Mb)
Memory:d9540000-d9560000
eth5 Link encap:Ethernet HWaddr 00:30:48:D1:0B:A3
inet addr:140.221.82.124 Bcast:140.221.82.255
Mask:255.255.255.0
inet6 addr: fe80::230:48ff:fed1:ba3/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:81373909 errors:0 dropped:0 overruns:0 frame:0
TX packets:166797316 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:6260202432 (5970.1 Mb) TX bytes:235189046440
(224293.7 Mb)
Memory:d9580000-d95a0000
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:9421943 errors:0 dropped:0 overruns:0 frame:0
TX packets:9421943 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:3160789166 (3014.3 Mb) TX bytes:3160789166 (3014.3 Mb)
myri0 Link encap:Ethernet HWaddr 00:60:DD:46:F9:02
inet addr:172.17.9.151 Bcast:172.31.255.255 Mask:255.240.0.0
inet6 addr: fe80::260:ddff:fe46:f902/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
RX packets:252551665 errors:0 dropped:0 overruns:0 frame:0
TX packets:141201312 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:809041321883 (771561.9 Mb) TX bytes:556774673980
(530981.7 Mb)
Interrupt:210
From aespinosa at cs.uchicago.edu Tue Nov 17 10:40:27 2009
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Tue, 17 Nov 2009 10:40:27 -0600
Subject: [Swift-user] How to set GLOBUS_HOSTNAME for ALCF Viz systems?
In-Reply-To: <4B02CD55.8010607@mcs.anl.gov>
References: <4B02CD55.8010607@mcs.anl.gov>
Message-ID: <50b07b4b0911170840v661b42a8n61be4bad287de474@mail.gmail.com>
I thought myri0 was the interface that connects to the DDN.
2009/11/17 Michael Wilde :
> What interface should GLOBUS_HOSTNAME be set to, eg on Eureka?
>
> eth4 10. net
> eth5 140. net
> myri0 172. net
>
> Im guessing myri0 here, for the compute node to reach the login node???
>
> - Mike
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>
>
--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago
From wilde at mcs.anl.gov Tue Nov 17 10:56:37 2009
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 17 Nov 2009 08:56:37 -0800
Subject: [Swift-user] How to set GLOBUS_HOSTNAME for ALCF Viz systems?
In-Reply-To: <50b07b4b0911170840v661b42a8n61be4bad287de474@mail.gmail.com>
References: <4B02CD55.8010607@mcs.anl.gov>
<50b07b4b0911170840v661b42a8n61be4bad287de474@mail.gmail.com>
Message-ID: <4B02D5C5.5010303@mcs.anl.gov>
I'm unsure - it has the same IP network address as the suggested setting
for GLOBUS_HOSTNAME that Mihael gave for Surveyor.
On 11/17/09 8:40 AM, Allan Espinosa wrote:
> I thought myri0 was the interface that connects to the DDN.
>
> 2009/11/17 Michael Wilde :
>> What interface should GLOBUS_HOSTNAME be set to, eg on Eureka?
>>
>> eth4 10. net
>> eth5 140. net
>> myri0 172. net
>>
>> Im guessing myri0 here, for the compute node to reach the login node???
>>
>> - Mike
From wilde at mcs.anl.gov Tue Nov 17 11:07:42 2009
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 17 Nov 2009 09:07:42 -0800
Subject: [Swift-user] coasters on eureka - block task ends prematurely
Message-ID: <4B02D85E.7000807@mcs.anl.gov>
I'm getting the following on Eureka for a 1-job cat sanity test of coasters:
eur$ swift -tc.file tc -sites.file sites.xml cats.swift
Swift svn swift-r3186 cog-r2577
RunID: 20091117-1031-71txxj43
Progress:
Worker task failed: 1117-311023-000000Block task ended prematurely
Progress: Active:1
Failed to transfer wrapper log from cats-20091117-1031-71txxj43/info/q
on coast
Progress: Submitted:1
Worker task failed: 1117-311023-000001Block task ended prematurely
Progress: Active:1
Failed to transfer wrapper log from cats-20091117-1031-71txxj43/info/s
on coast
Progress: Submitted:1
Worker task failed: 1117-311023-000002Block task ended prematurely
Progress: Active:1
Failed to transfer wrapper log from cats-20091117-1031-71txxj43/info/u
on coast
Execution failed:
Exception in cat:
Arguments: []
Host: coast
Directory: cats-20091117-1031-71txxj43/jobs/u/cat-u6s7zkjj
stderr.txt:
stdout.txt:
----
Caused by:
Task failed: 1117-311023-000002Block task ended prematurely
Cleaning up...
Shutting down service at https://10.40.9.151:58810
Got channel MetaChannel: 845296226 -> null
+ Done
eur$
--
tc is:
coast cat /bin/cat null null null
sites.xml is:
1
1
8
1
JGI-Pilot
zeptoos
1800
true
0.63
100000
/home/wilde/swiftwork
/scratch
--
I've also tested with maxtime 3000 as in prior examples from Mihael.
Latest logs are on Eureka in:
eur$ pwd
/home/wilde/swift/lab
eur$ ls *log
23683.cobaltlog 23684.cobaltlog cats-20091117-1101-oapf33ye.0.rlog
cats-20091117-1101-oapf33ye.log swift.log
eur$
Moving logs to logs/ as I test further.
First sign of trouble (that I can see) in the log above (*ye.log) is:
2009-11-17 11:01:48,582-0600 INFO BlockQueueProcessor Plan time: 1
2009-11-17 11:01:50,785-0600 INFO BlockQueueProcessor Updated
allocsize: 8.66447649575794
2009-11-17 11:01:50,786-0600 INFO BlockQueueProcessor allocsize =
8.66447649575794, queuedsize = 1.0660596665516473, qsz = 1
2009-11-17 11:01:50,786-0600 INFO BlockQueueProcessor Plan time: 1
2009-11-17 11:01:51,940-0600 DEBUG TaskImpl Task(type=JOB_SUBMISSION,
identity=urn:cog-1258477276784) setting status to Completed
2009-11-17 11:01:51,941-0600 INFO Block Block task status changed:
Completed
2009-11-17 11:01:51,941-0600 WARN Block Worker task failed:
1117-011117-000000Block task ended prematurely
--
- Mike
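The archiver stripped the XML tags from the sites.xml above, leaving only the values. For reference, a coasters pool of this vintage looked roughly like the sketch below; the element and profile names follow the Swift site-catalog schema, and the mapping of the surviving values (1800, 0.63, 100000, the work directories) onto them is an assumption, not a reconstruction of the original file:

```xml
<!-- Hypothetical reconstruction, not the original file. -->
<pool handle="coast">
  <execution provider="coaster" url="localhost" jobmanager="local:cobalt"/>
  <profile namespace="globus" key="maxtime">1800</profile>
  <profile namespace="karajan" key="jobThrottle">0.63</profile>
  <profile namespace="karajan" key="initialScore">100000</profile>
  <workdirectory>/home/wilde/swiftwork</workdirectory>
</pool>
```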
From hategan at mcs.anl.gov Tue Nov 17 13:31:07 2009
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 17 Nov 2009 13:31:07 -0600
Subject: [Swift-user] Re: How to set GLOBUS_HOSTNAME for ALCF Viz systems?
In-Reply-To: <4B02CD55.8010607@mcs.anl.gov>
References: <4B02CD55.8010607@mcs.anl.gov>
Message-ID: <1258486267.11276.1.camel@localhost>
On Tue, 2009-11-17 at 08:20 -0800, Michael Wilde wrote:
> What interface should GLOBUS_HOSTNAME be set to, eg on Eureka?
I don't know. Whatever interface the CNs can contact the LN on.
If there is any symmetry between eureka and intrepid, it would be
172.x.x.x.
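If the 172.x guess is right, the setting itself is just an environment variable on the login node. The concrete address below is taken from the myri0 entry in the ifconfig output earlier in the thread and is an assumption, not a confirmed answer:

```shell
# Hypothetical: use the address of the interface the compute nodes
# can reach the login node on (myri0 / 172.17.9.151 on Eureka).
export GLOBUS_HOSTNAME=172.17.9.151
```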
From hategan at mcs.anl.gov Mon Nov 30 14:55:02 2009
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 30 Nov 2009 14:55:02 -0600
Subject: [Swift-user] code branch
Message-ID: <1259614502.26099.26.camel@localhost>
Hello,
I branched the cog and swift codes. This was done in order to meet both
the needs of users who use Swift on a regular basis as well as our needs
to commit "researchy" code that may not be as stable.
I added a note on the downloads page
(http://www.ci.uchicago.edu/swift/downloads/index.php) which contains
information on how to access the stable branch(es). Here's the short
version:
https://cogkit.svn.sourceforge.net/svnroot/cogkit/branches/4.1.7/src/cog
https://svn.ci.uchicago.edu/svn/vdl2/branches/1.0 swift
The development code continues to be available at the previous locations
in the repositories.
Mihael
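To follow the stable branches, the checkouts would look like this (the destination directory names are arbitrary; the URLs are the ones announced above):

```
svn co https://cogkit.svn.sourceforge.net/svnroot/cogkit/branches/4.1.7/src/cog cog
svn co https://svn.ci.uchicago.edu/svn/vdl2/branches/1.0 swift
```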
From iraicu at cs.uchicago.edu Mon Nov 30 17:17:43 2009
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Mon, 30 Nov 2009 17:17:43 -0600
Subject: [Swift-user] CFP: IEEE Transactions on Parallel and Distributed
Systems, Special
Issue on Many-Task Computing on Grids and Supercomputers
Message-ID: <4B145297.6050400@cs.uchicago.edu>
Call for Papers
---------------------------------------------------------------------------------------
IEEE Transactions on Parallel and Distributed Systems
Special Issue on Many-Task Computing on Grids and Supercomputers
http://dsl.cs.uchicago.edu/TPDS_MTC/
=======================================================================================
The Special Issue on Many-Task Computing (MTC) will provide the scientific community with a
dedicated forum, within the prestigious IEEE Transactions on Parallel and Distributed
Systems Journal, for presenting new research, development, and deployment efforts of
loosely coupled large scale applications on large scale clusters, Grids, Supercomputers,
and Cloud Computing infrastructure. MTC, the focus of the special issue, encompasses
loosely coupled applications, which are generally composed of many tasks (both
independent and dependent tasks) to achieve some larger application goal. This special
issue will cover challenges that can hamper efficiency and utilization in running
applications on large-scale systems, such as local resource manager scalability and
granularity, efficient utilization of the raw hardware, parallel file system contention
and scalability, data management, I/O management, reliability at scale, and application
scalability. We welcome paper submissions on all topics related to MTC on large scale
systems. For more information on this special issue, please see
http://dsl.cs.uchicago.edu/TPDS_MTC/.
Scope
---------------------------------------------------------------------------------------
This special issue will focus on the ability to manage and execute large scale
applications on today's largest clusters, Grids, and Supercomputers. Clusters with tens
of thousands of processor cores, Grids (e.g. TeraGrid) with a dozen sites and 100K+
processors, and supercomputers with up to 200K processors (e.g. IBM BlueGene/L and
BlueGene/P, Cray XT5, Sun Constellation) are all now available to the broader
scientific community for open science research. Large clusters and
supercomputers have traditionally been high performance computing (HPC) systems, as
they are efficient at executing tightly coupled parallel jobs within a particular
machine with low-latency interconnects; the applications typically use message passing
interface (MPI) to achieve the needed inter-process communication. On the other hand,
Grids have been the preferred platform for more loosely coupled applications that tend
to be managed and executed through workflow systems, commonly known to fit in the
high-throughput computing (HTC) paradigm.
Many-task computing (MTC) aims to bridge the gap between two computing paradigms, HTC
and HPC. MTC is reminiscent of HTC, but it differs in its emphasis on using many
computing resources over short periods of time to accomplish many computational tasks
(i.e. including both dependent and independent tasks), where the primary metrics are
measured in seconds (e.g. FLOPS, tasks/s, MB/s I/O rates), as opposed to operations
(e.g. jobs) per month. MTC denotes high-performance computations comprising multiple
distinct activities, coupled via file system operations. Tasks may be small or large,
uniprocessor or multiprocessor, compute-intensive or data-intensive. The set of tasks
may be static or dynamic, homogeneous or heterogeneous, loosely coupled or tightly
coupled. The aggregate number of tasks, quantity of computing, and volumes of data may
be extremely large. MTC includes loosely coupled applications that are generally
communication-intensive but not naturally expressed using the standard message passing
interface (MPI) commonly found in HPC, drawing attention to the many computations that are
heterogeneous but not "happily" parallel.
There is more to HPC than tightly coupled MPI, and more to HTC than embarrassingly
parallel long-running jobs. Like HPC applications, and science itself, applications
are becoming increasingly complex, opening new opportunities to apply HPC in new ways
if we broaden our perspective. Some applications comprise so many
simple tasks that managing them is hard. Applications that operate on or produce
large amounts of data need sophisticated data management in order to scale. There
exist applications that involve many tasks, each composed of tightly coupled MPI
tasks. Loosely coupled applications often have dependencies among tasks, and typically
use files for inter-process communication. Efficient support for these sorts of
applications on existing large scale systems will involve substantial technical
challenges and will have big impact on science.
Today's existing HPC systems are a viable platform to host MTC applications. However,
some challenges arise in large scale applications when run on large scale systems,
which can hamper the efficiency and utilization of these large scale systems. These
challenges include local resource manager scalability and granularity, efficient
utilization of the raw hardware, parallel file system contention and scalability, data
management, I/O management, reliability at scale, application scalability, and
understanding the limitations of the HPC systems in order to identify good candidate
MTC applications. Furthermore, because of its loosely coupled nature, the MTC paradigm
applies naturally to the emerging Cloud Computing paradigm, which industry is adopting
as the next wave of technological advancement to reduce operational costs while
improving efficiency in large scale infrastructures.
For an interesting discussion in a blog by Ian Foster on the difference between MTC and
HTC, please see his blog at http://ianfoster.typepad.com/blog/2008/07/many-tasks-comp.html.
The guest editors have also published several papers highly relevant to this special issue.
One paper, titled "Toward Loosely Coupled Programming on Petascale Systems", was
published at the IEEE/ACM Supercomputing 2008 (SC08) conference; the second, titled
"Many-Task Computing for Grids and Supercomputers", was published at the IEEE
Workshop on Many-Task Computing on Grids and Supercomputers 2008 (MTAGS08). To see last
year's workshop program agenda, and accepted papers and presentations, please see
http://dsl.cs.uchicago.edu/MTAGS08/. To see this year's workshop web site, see
http://dsl.cs.uchicago.edu/MTAGS09/.
Topics
---------------------------------------------------------------------------------------
Topics of interest include, but are not limited to:
* Compute Resource Management in large scale clusters, large Grids, Supercomputers,
or Cloud Computing infrastructure
o Scheduling
o Job execution frameworks
o Local resource manager extensions
o Performance evaluation of resource managers in use on large scale systems
o Challenges and opportunities in running many-task workloads on HPC systems
o Challenges and opportunities in running many-task workloads on Cloud
Computing infrastructure
* Data Management in large scale Grid and Supercomputer environments:
o Data-Aware Scheduling
o Parallel File System performance and scalability in large deployments
o Distributed file systems
o Data caching frameworks and techniques
* Large-Scale Workflow Systems
o Workflow system performance and scalability analysis
o Scalability of workflow systems
o Workflow infrastructure and e-Science middleware
o Programming Paradigms and Models
* Large-Scale Many-Task Applications
o Large-scale many-task applications
o Large-scale many-task data-intensive applications
o Large-scale high throughput computing (HTC) applications
o Quasi-supercomputing applications, deployments, and experiences
Paper Submission and Publication
---------------------------------------------------------------------------------------
Authors are invited to submit papers presenting unpublished, original work of not more
than 14 pages of double-column text, single-spaced in 9.5-point type on 8.5 x 11 inch
pages with 0.5 inch margins
(http://www2.computer.org/portal/c/document_library/get_file?uuid=02e1509b-5526-4658-afb2-fe8b35044552&groupId=525767).
Papers will be peer-reviewed, and accepted papers will be published in the IEEE digital
library. Submitted articles must not have been previously published or currently
submitted for journal publication elsewhere. As an author, you are responsible for
understanding and adhering to our submission guidelines. You can access them by clicking
on the following web link: http://www.computer.org/mc/tpds/author.htm. Please thoroughly
read these before submitting your manuscript.
Please submit your paper to Manuscript Central at http://cs-ieee.manuscriptcentral.com/.
Please feel free to contact the Peer Review Publications Coordinator, Annissia Bryant at
tpds at computer.org or the guest editors at foster at anl.gov, iraicu at cs.uchicago.edu, or
yozha at microsoft.com if you have any questions. For more information on this special issue,
please see http://dsl.cs.uchicago.edu/TPDS_MTC/.
Important Dates
---------------------------------------------------------------------------------------
* Abstract Due: December 14th, 2009
* Papers Due: December 21st, 2009
* First Round Decisions: February 22nd, 2010
* Major Revisions if needed: April 19th, 2010
* Second Round Decisions: May 24th, 2010
* Minor Revisions if needed: June 7th, 2010
* Final Decision: June 21st, 2010
* Publication Date: November, 2010
Guest Editors and Potential Reviewers
---------------------------------------------------------------------------------------
Special Issue Guest Editors
* Ian Foster, University of Chicago & Argonne National Laboratory
* Ioan Raicu, Northwestern University
* Yong Zhao, Microsoft
--
=================================================================
Ioan Raicu, Ph.D.
NSF/CRA Computing Innovation Fellow
=================================================================
Center for Ultra-scale Computing and Information Security (CUCIS)
Department of Electrical Engineering and Computer Science
Northwestern University
2145 Sheridan Rd, Tech M384
Evanston, IL 60208-3118
=================================================================
Cell: 1-847-722-0876
Tel: 1-847-491-8163
Email: iraicu at eecs.northwestern.edu
Web: http://www.eecs.northwestern.edu/~iraicu/
https://wiki.cucis.eecs.northwestern.edu/
=================================================================