From aespinosa at cs.uchicago.edu  Tue Nov  3 16:56:26 2009
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Tue, 3 Nov 2009 16:56:26 -0600
Subject: [Swift-user] cobalt can't find wrapperlogs
Message-ID: <50b07b4b0911031456u5979ca09vcfda75cf35d062b0@mail.gmail.com>

Hi,

I'm using a cobalt-only sites.xml to launch MPI jobs from the BlueGene.
But when I inspected the workdir, no job directories were created.

swift session:

Swift svn swift-r3186 cog-r2577

RunID: run0
Progress:
Progress:  Submitted:1
Progress:  Active:1
Progress:  Active:1
Progress:  Active:1
1Progress:  Active:1
Progress:  Active:1
Progress:  Active:1
Progress:  Checking status:1
Failed to transfer wrapper log from mpitest-run0/info/p on INTREPID
Progress:  Submitted:1
Progress:  Active:1
Progress:  Active:1
Progress:  Active:1
Progress:  Active:1
Progress:  Active:1
Progress:  Active:1
Progress:  Active:1
Progress:  Checking status:1
Failed to transfer wrapper log from mpitest-run0/info/r on INTREPID
Progress:  Submitted:1
Progress:  Submitted:1
Progress:  Active:1
Progress:  Active:1
Progress:  Active:1
Progress:  Active:1
Progress:  Active:1
Progress:  Active:1
Progress:  Checking status:1
Failed to transfer wrapper log from mpitest-run0/info/t on INTREPID
Execution failed:
        Exception in hello:
Arguments: []
Host: INTREPID
Directory: mpitest-run0/jobs/t/hello-tlf9cyij
stderr.txt:

stdout.txt:

----

Caused by:
        No status file was found. Check the shared filesystem on INTREPID

listing of workdir:

intrepid-fs0/users/espinosa/scratch/mpi_runs/mpitest-run0> find .
.
./shared
./shared/_swiftwrap
./shared/_swiftseq
./kickstart
./status
./info
./200173.cobaltlog
./200174.cobaltlog
./200175.cobaltlog
./200176.cobaltlog
./200177.cobaltlog

sites.xml:

  64
  HTCScienceApps
  20
  vn
  prod-devel
  /intrepid-fs0/users/espinosa/scratch/mpi_runs

Where do you think these jobdirs were created?  I have also attached
the swift log in this email.

-Allan

--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago

From hategan at mcs.anl.gov  Tue Nov  3 16:58:18 2009
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 03 Nov 2009 16:58:18 -0600
Subject: [Swift-user] cobalt can't find wrapperlogs
In-Reply-To: <50b07b4b0911031456u5979ca09vcfda75cf35d062b0@mail.gmail.com>
References: <50b07b4b0911031456u5979ca09vcfda75cf35d062b0@mail.gmail.com>
Message-ID: <1257289098.18763.0.camel@localhost>

Make sure you set GLOBUS_HOSTNAME to the IP of eth0 before running.

On Tue, 2009-11-03 at 16:56 -0600, Allan Espinosa wrote:
> I'm using a cobalt-only sites.xml to launch MPI jobs from the
> BlueGene. But when I inspected the workdir, no job directories were
> created.
From aespinosa at cs.uchicago.edu  Tue Nov  3 17:23:52 2009
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Tue, 3 Nov 2009 17:23:52 -0600
Subject: [Swift-user] cobalt can't find wrapperlogs
In-Reply-To: <1257289098.18763.0.camel@localhost>
References: <50b07b4b0911031456u5979ca09vcfda75cf35d062b0@mail.gmail.com>
	<1257289098.18763.0.camel@localhost>
Message-ID: <50b07b4b0911031523r40db0c27r62ce1576ccaef2e1@mail.gmail.com>

I did.

$ ifconfig eth0; echo $GLOBUS_HOSTNAME; ./demompi.sh
eth0      Link encap:Ethernet  HWaddr 00:14:5E:9C:0D:82
          inet addr:172.17.5.144  Bcast:172.31.255.255  Mask:255.240.0.0
          inet6 addr: fe80::214:5eff:fe9c:d82/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:88471902 errors:0 dropped:54 overruns:0 frame:222
          TX packets:84299690 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:67184589504 (64072.2 Mb)  TX bytes:69406208974 (66190.9 Mb)
          Interrupt:33
172.17.5.144
Swift svn swift-r3186 cog-r2577

RunID: run0
Progress:
Progress:  Stage in:1
Progress:  Submitted:1
Progress:  Active:1
Progress:  Active:1
Progress:  Active:1
Progress:  Active:1
Progress:  Active:1
Progress:  Active:1
Progress:  Active:1
Progress:  Checking status:1
Failed to transfer wrapper log from mpitest-run0/info/a on INTREPID
Progress:  Submitted:1
Progress:  Active:1
Progress:  Active:1
...
...

I also set it using the "env" namespace:

  172.17.5.144

Yet it doesn't seem to be reflected in the cobalt logs:

workdir$ grep GLOBUS *.cobaltlog
$

thanks,
-Allan

2009/11/3 Mihael Hategan :
> Make sure you set GLOBUS_HOSTNAME to the IP of eth0 before running.
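The env-namespace profile Allan mentions would read roughly as follows
in sites.xml. This is a sketch: "env" is Swift's standard profile
namespace for setting environment variables in the job environment, and
the value is the eth0 address from Allan's transcript; the actual
element did not survive the archive.

  <profile namespace="env" key="GLOBUS_HOSTNAME">172.17.5.144</profile>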
From hategan at mcs.anl.gov  Tue Nov  3 17:31:29 2009
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 03 Nov 2009 17:31:29 -0600
Subject: [Swift-user] cobalt can't find wrapperlogs
In-Reply-To: <50b07b4b0911031523r40db0c27r62ce1576ccaef2e1@mail.gmail.com>
References: <50b07b4b0911031456u5979ca09vcfda75cf35d062b0@mail.gmail.com>
	<1257289098.18763.0.camel@localhost>
	<50b07b4b0911031523r40db0c27r62ce1576ccaef2e1@mail.gmail.com>
Message-ID: <1257291089.19332.3.camel@localhost>

Oh, no coasters. I see.

I don't think the swift wrapper will work on CNK.

From aespinosa at cs.uchicago.edu  Tue Nov  3 17:34:27 2009
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Tue, 3 Nov 2009 17:34:27 -0600
Subject: [Swift-user] cobalt can't find wrapperlogs
In-Reply-To: <1257291089.19332.3.camel@localhost>
References: <50b07b4b0911031456u5979ca09vcfda75cf35d062b0@mail.gmail.com>
	<1257289098.18763.0.camel@localhost>
	<50b07b4b0911031523r40db0c27r62ce1576ccaef2e1@mail.gmail.com>
	<1257291089.19332.3.camel@localhost>
Message-ID: <50b07b4b0911031534yd0e88f1o979577e1d15dbee1@mail.gmail.com>

Ahh, right.

The way of running MPI jobs on Intrepid is different from the procedure
recommended on the Swift web page: on the BlueGene, the actual
"mpirun ./a.out" is equivalent to "cqsub ./a.out" itself.

2009/11/3 Mihael Hategan :
> Oh, no coasters. I see.
>
> I don't think the swift wrapper will work on CNK.

--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago
From hategan at mcs.anl.gov  Tue Nov  3 17:41:49 2009
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 03 Nov 2009 17:41:49 -0600
Subject: [Swift-user] cobalt can't find wrapperlogs
In-Reply-To: <50b07b4b0911031534yd0e88f1o979577e1d15dbee1@mail.gmail.com>
References: <50b07b4b0911031456u5979ca09vcfda75cf35d062b0@mail.gmail.com>
	<1257289098.18763.0.camel@localhost>
	<50b07b4b0911031523r40db0c27r62ce1576ccaef2e1@mail.gmail.com>
	<1257291089.19332.3.camel@localhost>
	<50b07b4b0911031534yd0e88f1o979577e1d15dbee1@mail.gmail.com>
Message-ID: <1257291709.19637.2.camel@localhost>

On Tue, 2009-11-03 at 17:34 -0600, Allan Espinosa wrote:
> ahh right.
>
> the way of running mpi jobs on Intrepid is different from what's the
> recommended procedure in the swift webpage. on the bluegene the
> actual "mpirun ./a.out" is equivalent to "cqsub ./a.out" itself.

I know. But you can run CNK executables on ZeptoOS. And I assume that
since the swift wrapper is started by mpirun, the MPI environment will
be set up, so it should apply to the executable that the wrapper forks.

So I'd say just change to ZeptoOS and try again. I suspect it might work.
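The switch Mihael suggests is normally expressed in sites.xml as a
compute-node kernel profile. A minimal sketch, assuming the Cobalt
provider's kernelprofile key (the key name is an assumption here; the
zeptoos value does show up, markup-stripped, in Michael Wilde's
sites.xml later in this archive):

  <profile namespace="globus" key="kernelprofile">zeptoos</profile>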
From wilde at mcs.anl.gov  Wed Nov  4 12:54:13 2009
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 04 Nov 2009 12:54:13 -0600
Subject: [Swift-user] How to use the swift command "text mode monitor" -tui ?
Message-ID: <4AF1CDD5.9070007@mcs.anl.gov>

How does one switch between the various tabs on the Swift command's
"-tui" text mode monitor?

I'm on a MacBook, and tried number keys, alt-number, function keys,
etc., but nothing seems to be recognized. The only keys it seems to
respond to is hitting enter in response to "OK" upon script completion,
which causes swift and the tui to exit.

From hategan at mcs.anl.gov  Wed Nov  4 13:02:15 2009
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 04 Nov 2009 13:02:15 -0600
Subject: [Swift-user] How to use the swift command "text mode monitor" -tui ?
In-Reply-To: <4AF1CDD5.9070007@mcs.anl.gov>
References: <4AF1CDD5.9070007@mcs.anl.gov>
Message-ID: <1257361335.6036.4.camel@localhost>

On Wed, 2009-11-04 at 12:54 -0600, Michael Wilde wrote:
> How does one switch between the various tabs on the Swift command's
> "-tui" text mode monitor?
>
> I'm on a MacBook, and tried number keys, alt-number, function keys,
> etc., but nothing seems to be recognized.

The function keys should work on standard terminals (including the OS X
terminal). There may be certain configurations for which things don't
work (I heard reports of GNU screen interfering with things on OS X).
So it would be helpful if you could mention exactly what kind of
configuration you're using.

From aespinosa at cs.uchicago.edu  Wed Nov  4 20:12:54 2009
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Wed, 4 Nov 2009 20:12:54 -0600
Subject: [Swift-user] hack to run mpi jobs on bluegene/p
Message-ID: <50b07b4b0911041812k768dde9and6b1508a149487f7@mail.gmail.com>

I made some hackish wrapper scripts for the app you want to run:

$ cat hello_wrapper.sh
#!/bin/bash
# Submit the real MPI binary through Cobalt, then poll until it finishes.
echo "hello world"
jobid=`qsub -t 20 -q prod-devel -n 64 --mode vn -o stdout.file \
  /home/espinosa/experiments/mpitest/hello`

# Cobalt's job state is column 5 of the qstat output.
getstatus(){
  qstat | grep "$jobid" | awk '{ print $5 }'
}

echo $jobid
stat=`getstatus`
# Quoting $stat matters: once the job leaves the queue, getstatus
# returns an empty string and an unquoted test would error out.
while [ "$stat" != "exiting" ]; do
  stat=`getstatus`
  sleep 1
done

sample workflow:

> cat mpitest.swift
type file;

app (file out) hello() {
  hello;
}

file output<"stdout.file">;

output = hello();

Obviously we can't do stderr=@filename(x) and the like, but we still
get the progress and restartability features from the output files we
are expecting. We would also need separate command-line processing to
split "wrapper" args from real program args.

enjoy! :)
-Allan

--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago
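For Swift to launch a hack like this, the wrapper itself, not the MPI
binary, is what gets listed in the transformation catalog. A sketch of
the tc.data line, following the same six-column layout shown elsewhere
in this archive; the INTREPID site handle and the wrapper path are
taken from Allan's messages, and the exact entry is illustrative:

  INTREPID	hello	/home/espinosa/experiments/mpitest/hello_wrapper.sh	null	null	null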
From iraicu at cs.uchicago.edu  Thu Nov  5 12:57:23 2009
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Thu, 05 Nov 2009 12:57:23 -0600
Subject: [Swift-user] hack to run mpi jobs on bluegene/p
In-Reply-To: <50b07b4b0911041812k768dde9and6b1508a149487f7@mail.gmail.com>
References: <50b07b4b0911041812k768dde9and6b1508a149487f7@mail.gmail.com>
Message-ID: <4AF32013.6010205@cs.uchicago.edu>

Hi Allan,
I don't know if I understand your statement correctly. Are you saying
that you got Swift to run MPI jobs on the BG/P? If yes, with what
provider? Coasters? Falkon? I don't think it was Falkon, as Falkon
doesn't have support for allocating jobs with N processors. Does
Coaster support this kind of allocation, of multiple processors at the
same time? If yes, I'd like to hear more about this, and how you got
MPI to run on the BG/P through Swift.

Thanks,
Ioan

Allan Espinosa wrote:
> I made some hackish wrapper scripts for the app you want to run:

--
=================================================================
Ioan Raicu, Ph.D.
NSF/CRA Computing Innovation Fellow
=================================================================
Center for Ultra-scale Computing and Information Security (CUCIS)
Department of Electrical Engineering and Computer Science
Northwestern University
2145 Sheridan Rd, Tech M384
Evanston, IL 60208-3118
=================================================================
Cel:   1-847-722-0876
Tel:   1-847-491-8163
Email: iraicu at eecs.northwestern.edu
Web:   http://www.eecs.northwestern.edu/~iraicu/
       https://wiki.cucis.eecs.northwestern.edu/
=================================================================

From fangfang at uchicago.edu  Thu Nov 12 18:12:00 2009
From: fangfang at uchicago.edu (Fangfang Xia)
Date: Thu, 12 Nov 2009 16:12:00 -0800
Subject: [Swift-user] RAxML error msgs
Message-ID: <4bc7a37b0911121612s3a18502bv18bd3211f37b3a14@mail.gmail.com>

Hi,

I am trying to run RAxML with Swift on Surveyor and am getting the
following error messages. Could you help me with this? Thanks.

directory:
~fangfang/work/jgi/phylo/test.raxml/

command line:
swift -tc.file tc.data -sites.file sites.xml raxmlex1.swift

Failed to transfer wrapper log from
raxmlex1-20091112-1726-ct8vv8tf/info/m on surveyor
Progress:  Submitted:7

Failed to connect: Connection timed out at
/home/fangfang/.globus/coasters/cscript7914625247065953287.pl line 129.
Failed to connect: Connection timed out at
/home/fangfang/.globus/coasters/cscript7914625247065953287.pl line 129.
Failed to connect: Connection timed out at
/home/fangfang/.globus/coasters/cscript7914625247065953287.pl line 129.

Progress:  Submitted:6 Active:1
Failed to transfer wrapper log from
raxmlex1-20091112-1726-ct8vv8tf/info/p on surveyor
Progress:  Submitted:7
Progress:  Submitted:7
Progress:  Submitted:7
Progress:  Submitted:7
Progress:  Submitted:7
Progress:  Submitted:7
Progress:  Submitted:7
Progress:  Submitted:7
Progress:  Submitted:7
Progress:  Submitted:7
Progress:  Submitted:7
Progress:  Submitted:7
Progress:  Submitted:7
Progress:  Submitted:7
Progress:  Submitted:7
Progress:  Submitted:7
Progress:  Submitted:7
Progress:  Submitted:7
Progress:  Submitted:7
Progress:  Submitted:7
Progress:  Submitted:7
Progress:  Submitted:7
Progress:  Submitted:7
Progress:  Submitted:7
Progress:  Submitted:7
Progress:  Submitted:7
Worker task failed: 1112-260539-000002Block task ended prematurely

Statement unlikely to be reached at
/home/fangfang/.globus/coasters/cscript7914625247065953287.pl line 592.
(Maybe you meant system() when you said exec()?)
Statement unlikely to be reached at
/home/fangfang/.globus/coasters/cscript7914625247065953287.pl line 592.
(Maybe you meant system() when you said exec()?)
Statement unlikely to be reached at
/home/fangfang/.globus/coasters/cscript7914625247065953287.pl line 592.
(Maybe you meant system() when you said exec()?)
Statement unlikely to be reached at
/home/fangfang/.globus/coasters/cscript7914625247065953287.pl line 592.

...

From hategan at mcs.anl.gov  Thu Nov 12 20:05:56 2009
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 12 Nov 2009 20:05:56 -0600
Subject: [Swift-user] RAxML error msgs
In-Reply-To: <4bc7a37b0911121612s3a18502bv18bd3211f37b3a14@mail.gmail.com>
References: <4bc7a37b0911121612s3a18502bv18bd3211f37b3a14@mail.gmail.com>
Message-ID: <1258077956.29015.1.camel@localhost>

Make sure you set GLOBUS_HOSTNAME=172.17.3.16 before starting swift and
that you have true in sites.xml.
Mihael

On Thu, 2009-11-12 at 16:12 -0800, Fangfang Xia wrote:
> Hi,
>
> I am trying to run RAxML with Swift on Surveyor and am getting the
> following error messages. Could you help me with this? Thanks.

From yecartes at gmail.com  Mon Nov 16 02:35:58 2009
From: yecartes at gmail.com (Allan Espinosa)
Date: Mon, 16 Nov 2009 02:35:58 -0600
Subject: [Swift-user] Re: Copy of parallel scripting paper
In-Reply-To: <1258360189.11500.4.camel@localhost>
References: <1258073372.2595.7.camel@localhost>
	<50b07b4b0911121658j44867c43p131a7c330362099a@mail.gmail.com>
	<1258074413.2595.14.camel@localhost>
	<50b07b4b0911121708m481d9ccfx8030e01f4a52304f@mail.gmail.com>
	<1258078206.2595.15.camel@localhost>
	<50b07b4b0911121811n5ad685c5vb75be232c1231d0b@mail.gmail.com>
	<1258343957.8340.4.camel@localhost>
	<50b07b4b0911160014n6ba89dd1ia990925ff46fd1d8@mail.gmail.com>
	<1258359764.11500.3.camel@localhost>
	<1258360189.11500.4.camel@localhost>
Message-ID: <50b07b4b0911160035m46695fbbx3bfc451afb9d9c54@mail.gmail.com>

Hi William,

Yeah, typically I place a tc.data in my current directory and then
specify it with the "-tc.file ./tc.data" option.

I guess we can run on PBS Torque. Here's a sample sites.xml config:

  /home/aespinosa
  2.02
  1.98
  fast
  01:00:00

My initialScore and jobThrottle settings let me submit a maximum of
200 jobs at a time.
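A sketch of how that configuration is typically spelled out, assuming
Swift's pbs execution provider and the usual karajan/globus profile
namespaces; the mapping of 2.02 to initialScore and 1.98 to jobThrottle
is a guess from the order of the values in Allan's mail, and the pool
handle is made up. With Swift's throttles, roughly jobThrottle x 100 + 1
jobs can be active at once, which is where the ~200-job ceiling comes
from.

  <pool handle="mycluster">
    <execution provider="pbs" url="localhost"/>
    <profile namespace="karajan" key="initialScore">2.02</profile>
    <profile namespace="karajan" key="jobThrottle">1.98</profile>
    <profile namespace="globus" key="queue">fast</profile>
    <profile namespace="globus" key="maxwalltime">01:00:00</profile>
    <workdirectory>/home/aespinosa</workdirectory>
  </pool>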
Btw, I'm cc'ing you on the swift-user mailing list. Hope you enjoy
tinkering around with Swift!

-Allan

2009/11/16 William Emmanuel S. Yu :
>
> I found the tc.data file. geez... its in a strange place! The etc
> folder in the distribution directory.
>
> On Mon, 2009-11-16 at 16:22 +0800, William Emmanuel Yu wrote:
>> On Mon, 2009-11-16 at 02:14 -0600, Allan Espinosa wrote:
>> > I guess that works. But if you have a scheduler, you should use it.
>> > We have support for pbs, cobalt and condor. Any unsupported
>> > scheduler can be interfaced with Globus GRAM2 or GRAM4. From my
>> > perspective, the cluster sysads should make it easy for their users
>> > to submit jobs :)
>> >
>> > 2009/11/15 William Emmanuel Yu :
>> > > After reading the article, I think I have a better appreciation
>> > > of the problems you are trying to solve. Let me try to read about
>> > > Swift more.
>> > >
>> > > What is the easiest way to run Swift on a small cluster (say 64
>> > > cores on 16 nodes)? Can I just do an NFS and passwordless-SSH
>> > > thing? I think I will try to help a buddy of mine with his thesis
>> > > on aquatic migrations in UP as practice.
>> > >
>> > > On Thu, 2009-11-12 at 20:11 -0600, Allan Espinosa wrote:
>> > >> you can try the swift tutorial designed for localhost:
>> > >>
>> > >> http://www.ci.uchicago.edu/swift/guides/tutorial.php
>> > >>
>> > >> 2009/11/12 William Emmanuel Yu :
>> > >> > let me review first if this is easier to teach ... but, if you
>> > >> > have a cookbook for a quick local install then that would also
>> > >> > be cool.
>> > >> >
>> > >> > On Thu, 2009-11-12 at 19:08 -0600, Allan Espinosa wrote:
>> > >> >> its user and setup easy when you use a local scheduler (run
>> > >> >> from localhost). I can help you on the setup on localhost or
>> > >> >> via ssh. Some of my colleagues have also tried using this
>> > >> >> over Amazon EC2.
>> > >> >>
>> > >> >> 2009/11/12 William Emmanuel Yu :
>> > >> >> > interesting.. but this is setup heavy but user easy right?
>> > >> >> >
>> > >> >> > On Thu, 2009-11-12 at 18:58 -0600, Allan Espinosa wrote:
>> > >> >> >> hi william,
>> > >> >> >>
>> > >> >> >> see attached file.
>> > >> >> >>
>> > >> >> >> We are using Swift (http://www.ci.uchicago.edu/swift). It
>> > >> >> >> has adapters to cluster schedulers like PBS, GRAM2, GRAM4,
>> > >> >> >> Cobalt and Condor. But you can use ssh and local (fork)
>> > >> >> >> for a start.
>> > >> >> >>
>> > >> >> >> We couple it to Falkon
>> > >> >> >> (http://dev.globus.org/wiki/Incubator/Falkon) for faster
>> > >> >> >> job throughput in short-time jobs and deployments on
>> > >> >> >> supercomputers.
>> > >> >> >>
>> > >> >> >> 2009/11/12 William Emmanuel Yu :
>> > >> >> >> > Can I get a copy of this paper? What tools are you using
>> > >> >> >> > now?
>>
>> Ok, right now that system uses Torque and isn't really maintained at
>> this point, so it was a real challenge using it. But I managed to
>> write a script with bash and mpirun that got the job done. However,
>> I did encounter a lot of dependency issues. Haha. As expected.
>>
>> Do you have a document on torque integration? At least a quick one.
>>
>> Btw, I am using the default swift binary on my laptop with no
>> installed scheduler. I ran into the following error:
>>
>> "Could not find any valid host for task "Task(type=UNKNOWN,
>> identity=urn:cog-1258359742934)" with constraints {tr=convert,
>> filenames=[Ljava.lang.String;@2de41d, trfqn=convert,
>> filecache=org.griphyn.vdl.karajan.lib.cache.CacheMapAdapter at db4bcf}"
>>
>> Which is strange as I am not using any scheduler.
>>
>> Thanks!
From william.yu at novare.com.hk  Mon Nov 16 02:48:44 2009
From: william.yu at novare.com.hk (William Emmanuel S. Yu)
Date: Mon, 16 Nov 2009 16:48:44 +0800
Subject: [Swift-user] Re: Copy of parallel scripting paper
In-Reply-To: <50b07b4b0911160035m46695fbbx3bfc451afb9d9c54@mail.gmail.com>
References: <1258073372.2595.7.camel@localhost>
	<50b07b4b0911121658j44867c43p131a7c330362099a@mail.gmail.com>
	<1258074413.2595.14.camel@localhost>
	<50b07b4b0911121708m481d9ccfx8030e01f4a52304f@mail.gmail.com>
	<1258078206.2595.15.camel@localhost>
	<50b07b4b0911121811n5ad685c5vb75be232c1231d0b@mail.gmail.com>
	<1258343957.8340.4.camel@localhost>
	<50b07b4b0911160014n6ba89dd1ia990925ff46fd1d8@mail.gmail.com>
	<1258359764.11500.3.camel@localhost>
	<1258360189.11500.4.camel@localhost>
	<50b07b4b0911160035m46695fbbx3bfc451afb9d9c54@mail.gmail.com>
Message-ID: <1258361324.11500.12.camel@localhost>

Hey. Thanks for the quick response. I do have to note that I haven't
gone through much of the documentation yet, but these tidbits
definitely help. Again, thanks.

On Mon, 2009-11-16 at 02:35 -0600, Allan Espinosa wrote:
> Hi William,
>
> Yeah, typically I place a tc.data in my current directory and then
> specify it with the "-tc.file ./tc.data" option.

--
-------------------------------------------------------
William Emmanuel S. Yu
Novare Technologies Inc.
6th Floor Peninsula Court Building,
Makati Avenue corner Paseo de Roxas Avenue,
Makati City, 1226 Philippines
email  :  william dot yu at novare dot com dot hk
web    :  www.novare.com.hk

Confidentiality Issue: This message is intended only for the use of the
addressee and may contain information that is privileged and
confidential. If you are not the intended recipient, you are hereby
notified that any use or dissemination of this communication is strictly
prohibited. If you have received this communication in error, please
notify us immediately by reply and delete this message from your system.
> >> > >> >> >> > > >> > >> >> >> > -- > >> > >> >> >> > ?------------------------------------------------------- > >> > >> >> >> > William Emmanuel S. Yu (???) > >> > >> >> >> > Department of Information Systems and Computer Science > >> > >> >> >> > Ateneo de Manila University > >> > >> >> >> > email : wyu at ateneo dot edu > >> > >> >> >> > blog : http://hip2b2.yutivo.org/ > >> > >> >> >> > web : http://CNG.ateneo.edu/cng/wyu/ > >> > >> >> >> > phone : +63(2)4266001 loc. 4186 > >> > >> >> >> > GPG : http://CNG.ateneo.net/cng/wyu/wyy.pgp > >> > >> >> >> > > >> > >> >> >> > Confidentiality Issue: This message is intended only for the use of the > >> > >> >> >> > addressee and may contain information that is privileged and > >> > >> >> >> > confidential. If you are not the intended recipient, you are hereby > >> > >> >> >> > notified that any use or dissemination of this communication is strictly > >> > >> >> >> > prohibited. If you have received this communication in error, please > >> > >> >> >> > notify us immediately by reply and delete this message from your system. > >> > >> >> >> > > >> > >> >> >> > > >> > >> >> >> > >> > >> >> >> > >> > >> >> >> > >> > >> >> > -- > >> > >> >> > ?------------------------------------------------------- > >> > >> >> > William Emmanuel S. Yu (???) > >> > >> >> > Department of Information Systems and Computer Science > >> > >> >> > Ateneo de Manila University > >> > >> >> > email : wyu at ateneo dot edu > >> > >> >> > blog : http://hip2b2.yutivo.org/ > >> > >> >> > web : http://CNG.ateneo.edu/cng/wyu/ > >> > >> >> > phone : +63(2)4266001 loc. 4186 > >> > >> >> > GPG : http://CNG.ateneo.net/cng/wyu/wyy.pgp > >> > >> >> > > >> > >> >> > Confidentiality Issue: This message is intended only for the use of the > >> > >> >> > addressee and may contain information that is privileged and > >> > >> >> > confidential. If you are not the intended recipient, you are hereby > >> > >> >> > notified that any use or dissemination of this communication is strictly > >> > >> >> > prohibited. If you have received this communication in error, please > >> > >> >> > notify us immediately by reply and delete this message from your system. > >> > >> >> > > >> > >> >> > > >> > >> >> > >> > >> >> > >> > >> >> > >> > >> > -- > >> > >> > ?------------------------------------------------------- > >> > >> > William Emmanuel S. Yu (???) > >> > >> > Department of Information Systems and Computer Science > >> > >> > Ateneo de Manila University > >> > >> > email : wyu at ateneo dot edu > >> > >> > blog : http://hip2b2.yutivo.org/ > >> > >> > web : http://CNG.ateneo.edu/cng/wyu/ > >> > >> > phone : +63(2)4266001 loc. 4186 > >> > >> > GPG : http://CNG.ateneo.net/cng/wyu/wyy.pgp > >> > >> > > >> > >> > Confidentiality Issue: This message is intended only for the use of the > >> > >> > addressee and may contain information that is privileged and > >> > >> > confidential. If you are not the intended recipient, you are hereby > >> > >> > notified that any use or dissemination of this communication is strictly > >> > >> > prohibited. If you have received this communication in error, please > >> > >> > notify us immediately by reply and delete this message from your system. > >> > >> > > >> > >> > > >> > >> > >> > >> > >> > >> > >> > > -- > >> > > ?------------------------------------------------------- > >> > > William Emmanuel S. Yu (???) 
From yecartes at gmail.com  Tue Nov 17 03:38:42 2009
From: yecartes at gmail.com (Allan Espinosa)
Date: Tue, 17 Nov 2009 03:38:42 -0600
Subject: [Swift-user] Re: Copy of parallel scripting paper
In-Reply-To: <1258440713.4998.7.camel@localhost>
References: <1258073372.2595.7.camel@localhost>
	<50b07b4b0911121658j44867c43p131a7c330362099a@mail.gmail.com>
	<1258074413.2595.14.camel@localhost>
	<50b07b4b0911121708m481d9ccfx8030e01f4a52304f@mail.gmail.com>
	<1258078206.2595.15.camel@localhost>
	<50b07b4b0911121811n5ad685c5vb75be232c1231d0b@mail.gmail.com>
	<1258343957.8340.4.camel@localhost>
	<50b07b4b0911160014n6ba89dd1ia990925ff46fd1d8@mail.gmail.com>
	<1258440713.4998.7.camel@localhost>
Message-ID: <50b07b4b0911170138k497cb092lc5daea3ef1fff896@mail.gmail.com>

The program named on the first line of the app() function body must be
installed in tc.data. Typically we deploy the programs we want at the
system level, but this in itself is an interesting approach to running
things.

Also, if you are load-balancing the application across sites with
different architectures, the second stage will not be able to use the
other site, since it will create a lot of failed jobs.
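Concretely, the first token of each app body has to resolve through
tc.data. For William's script, quoted below, the compile step could be
registered like this, following the tc.data layout used elsewhere in
this archive (the localhost site handle and the gcc path are
assumptions):

  localhost	gcc	/usr/bin/gcc	null	null	null

The run step is the problem case: its executable is @filename(e), a
file produced at run time, so there is no fixed tc.data entry for it.
That is the limitation Allan describes.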
-Allan

2009/11/17 William Emmanuel Yu :
>
> Hey,
>
> Here is an intermediate swift question. How do I run a program that I
> just recently compiled? I would like to do something like what is
> intended by the program below. Of course, this program does not work
> because I can't run the generated exefile.
>
> --- start script ---
>
> type sourcefile;
> type exefile;
> type outputfile;
>
> (exefile e) compile(sourcefile s) {
>   app {
>     gcc "-o" @filename(e) @filename(s);
>   }
> }
>
> (outputfile o) run(exefile e) {
>   app {
>     @filename(e) stdout=@filename(o);
>   }
> }
>
> exefile efile <"hello.exe">;
> sourcefile sfile <"hello.c">;
> outputfile ofile <"hello.out">;
>
> efile = compile(sfile);
> ofile = run(efile);
>
> --- end script ---
>
> Of course, the main idea is that I won't be compiling and running one
> file per script but a full directory of C source files that I want to
> compile and run.
>
> Thanks!
If you are not the intended recipient, you are hereby >> > notified that any use or dissemination of this communication is strictly >> > prohibited. ?If you have received this communication in error, please >> > notify us immediately by reply and delete this message from your system. >> > >> > >> >> >> > -- > ?------------------------------------------------------- > William Emmanuel S. Yu (???) > Department of Information Systems and Computer Science > Ateneo de Manila University > email ?: ?wyu at ateneo dot edu > blog ? : ?http://hip2b2.yutivo.org/ > web ? ?: ?http://CNG.ateneo.edu/cng/wyu/ > phone ?: ?+63(2)4266001 loc. 4186 > GPG ? ?: ?http://CNG.ateneo.net/cng/wyu/wyy.pgp > > Confidentiality Issue: ?This message is intended only for the use of the > addressee and may contain information that is privileged and > confidential. If you are not the intended recipient, you are hereby > notified that any use or dissemination of this communication is strictly > prohibited. ?If you have received this communication in error, please > notify us immediately by reply and delete this message from your system. > > -- Allan M. Espinosa PhD student, Computer Science University of Chicago From wilde at mcs.anl.gov Tue Nov 17 10:20:37 2009 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 17 Nov 2009 08:20:37 -0800 Subject: [Swift-user] How to set GLOBUS_HOSTNAME for ALCF Viz systems? Message-ID: <4B02CD55.8010607@mcs.anl.gov> What interface should GLOBUS_HOSTNAME be set to, eg on Eureka? eth4 10. net eth5 140. net myri0 172. net Im guessing myri0 here, for the compute node to reach the login node??? - Mike eth4 Link encap:Ethernet HWaddr 00:30:48:D1:0B:A2 inet addr:10.40.9.151 Bcast:10.40.255.255 Mask:255.255.0.0 inet6 addr: fe80::230:48ff:fed1:ba2/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:430006791 errors:0 dropped:105781973 overruns:0 frame:0 TX packets:1487287 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:28555408302 (27232.5 Mb) TX bytes:701260634 (668.7 Mb) Memory:d9540000-d9560000 eth5 Link encap:Ethernet HWaddr 00:30:48:D1:0B:A3 inet addr:140.221.82.124 Bcast:140.221.82.255 Mask:255.255.255.0 inet6 addr: fe80::230:48ff:fed1:ba3/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:81373909 errors:0 dropped:0 overruns:0 frame:0 TX packets:166797316 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:6260202432 (5970.1 Mb) TX bytes:235189046440 (224293.7 Mb) Memory:d9580000-d95a0000 lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:9421943 errors:0 dropped:0 overruns:0 frame:0 TX packets:9421943 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:3160789166 (3014.3 Mb) TX bytes:3160789166 (3014.3 Mb) myri0 Link encap:Ethernet HWaddr 00:60:DD:46:F9:02 inet addr:172.17.9.151 Bcast:172.31.255.255 Mask:255.240.0.0 inet6 addr: fe80::260:ddff:fe46:f902/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1 RX packets:252551665 errors:0 dropped:0 overruns:0 frame:0 TX packets:141201312 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:809041321883 (771561.9 Mb) TX bytes:556774673980 (530981.7 Mb) Interrupt:210 From aespinosa at cs.uchicago.edu Tue Nov 17 10:40:27 2009 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Tue, 17 Nov 2009 10:40:27 -0600 Subject: [Swift-user] How to set GLOBUS_HOSTNAME for ALCF Viz systems? 
From aespinosa at cs.uchicago.edu  Tue Nov 17 10:40:27 2009
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Tue, 17 Nov 2009 10:40:27 -0600
Subject: [Swift-user] How to set GLOBUS_HOSTNAME for ALCF Viz systems?
In-Reply-To: <4B02CD55.8010607@mcs.anl.gov>
References: <4B02CD55.8010607@mcs.anl.gov>
Message-ID: <50b07b4b0911170840v661b42a8n61be4bad287de474@mail.gmail.com>

I thought myri0 was the interface that connects to the DDN.

2009/11/17 Michael Wilde :
> What interface should GLOBUS_HOSTNAME be set to, e.g. on Eureka?
>
>  eth4   10. net
>  eth5   140. net
>  myri0  172. net
>
> I'm guessing myri0 here, for the compute node to reach the login node???

--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago

From wilde at mcs.anl.gov  Tue Nov 17 10:56:37 2009
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 17 Nov 2009 08:56:37 -0800
Subject: [Swift-user] How to set GLOBUS_HOSTNAME for ALCF Viz systems?
In-Reply-To: <50b07b4b0911170840v661b42a8n61be4bad287de474@mail.gmail.com>
References: <4B02CD55.8010607@mcs.anl.gov>
	<50b07b4b0911170840v661b42a8n61be4bad287de474@mail.gmail.com>
Message-ID: <4B02D5C5.5010303@mcs.anl.gov>

I'm unsure - it has the same IP net address as the suggested setting
for GLOBUS_HOSTNAME that Mihael gave for Surveyor.

On 11/17/09 8:40 AM, Allan Espinosa wrote:
> I thought myri0 was the interface that connects to the DDN.
>>
>> eth4   10. net
>> eth5   140. net
>> myri0  172. net
>>
>> I'm guessing myri0 here, for the compute node to reach the login node???
>>
>> - Mike
>>
>>
>> eth4      Link encap:Ethernet  HWaddr 00:30:48:D1:0B:A2
>>           inet addr:10.40.9.151  Bcast:10.40.255.255  Mask:255.255.0.0
>>           inet6 addr: fe80::230:48ff:fed1:ba2/64 Scope:Link
>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>           RX packets:430006791 errors:0 dropped:105781973 overruns:0 frame:0
>>           TX packets:1487287 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:1000
>>           RX bytes:28555408302 (27232.5 Mb)  TX bytes:701260634 (668.7 Mb)
>>           Memory:d9540000-d9560000
>>
>> eth5      Link encap:Ethernet  HWaddr 00:30:48:D1:0B:A3
>>           inet addr:140.221.82.124  Bcast:140.221.82.255  Mask:255.255.255.0
>>           inet6 addr: fe80::230:48ff:fed1:ba3/64 Scope:Link
>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>           RX packets:81373909 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:166797316 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:1000
>>           RX bytes:6260202432 (5970.1 Mb)  TX bytes:235189046440 (224293.7 Mb)
>>           Memory:d9580000-d95a0000
>>
>> lo        Link encap:Local Loopback
>>           inet addr:127.0.0.1  Mask:255.0.0.0
>>           inet6 addr: ::1/128 Scope:Host
>>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>>           RX packets:9421943 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:9421943 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:0
>>           RX bytes:3160789166 (3014.3 Mb)  TX bytes:3160789166 (3014.3 Mb)
>>
>> myri0     Link encap:Ethernet  HWaddr 00:60:DD:46:F9:02
>>           inet addr:172.17.9.151  Bcast:172.31.255.255  Mask:255.240.0.0
>>           inet6 addr: fe80::260:ddff:fe46:f902/64 Scope:Link
>>           UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
>>           RX packets:252551665 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:141201312 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:1000
>>           RX bytes:809041321883 (771561.9 Mb)  TX bytes:556774673980 (530981.7 Mb)
>>           Interrupt:210
>>
>> _______________________________________________
>> Swift-user mailing list
>> Swift-user at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>>
>

From wilde at mcs.anl.gov Tue Nov 17 11:07:42 2009
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 17 Nov 2009 09:07:42 -0800
Subject: [Swift-user] coasters on eureka - block task ends prematurely
Message-ID: <4B02D85E.7000807@mcs.anl.gov>

I'm getting the following on eureka for a 1-job cat sanity test of coasters:

eur$ swift -tc.file tc -sites.file sites.xml cats.swift
Swift svn swift-r3186 cog-r2577

RunID: 20091117-1031-71txxj43
Progress:
Worker task failed: 1117-311023-000000Block task ended prematurely
Progress:  Active:1
Failed to transfer wrapper log from cats-20091117-1031-71txxj43/info/q on coast
Progress:  Submitted:1
Worker task failed: 1117-311023-000001Block task ended prematurely
Progress:  Active:1
Failed to transfer wrapper log from cats-20091117-1031-71txxj43/info/s on coast
Progress:  Submitted:1
Worker task failed: 1117-311023-000002Block task ended prematurely
Progress:  Active:1
Failed to transfer wrapper log from cats-20091117-1031-71txxj43/info/u on coast
Execution failed:
    Exception in cat:
Arguments: []
Host: coast
Directory: cats-20091117-1031-71txxj43/jobs/u/cat-u6s7zkjj
stderr.txt:

stdout.txt:

----

Caused by:
    Task failed: 1117-311023-000002Block task ended prematurely

Cleaning up...
Shutting down service at https://10.40.9.151:58810
Got channel MetaChannel: 845296226 -> null
+ Done
eur$

--

tc is:

coast cat /bin/cat null null null

sites.xml is:

1 1 8 1 JGI-Pilot zeptoos 1800 true 0.63 100000 /home/wilde/swiftwork /scratch

--

I've also tested with maxtime 3000 as in prior examples from Mihael.

Latest logs are on Eureka in:

eur$ pwd
/home/wilde/swift/lab
eur$ ls *log
23683.cobaltlog  23684.cobaltlog  cats-20091117-1101-oapf33ye.0.rlog
cats-20091117-1101-oapf33ye.log  swift.log
eur$

Moving logs to logs/ as I test further.

First sign of trouble (that I can see) in the log above (*ye.log) is:

2009-11-17 11:01:48,582-0600 INFO  BlockQueueProcessor Plan time: 1
2009-11-17 11:01:50,785-0600 INFO  BlockQueueProcessor Updated allocsize: 8.66447649575794
2009-11-17 11:01:50,786-0600 INFO  BlockQueueProcessor allocsize = 8.66447649575794, queuedsize = 1.0660596665516473, qsz = 1
2009-11-17 11:01:50,786-0600 INFO  BlockQueueProcessor Plan time: 1
2009-11-17 11:01:51,940-0600 DEBUG TaskImpl Task(type=JOB_SUBMISSION, identity=urn:cog-1258477276784) setting status to Completed
2009-11-17 11:01:51,941-0600 INFO  Block Block task status changed: Completed
2009-11-17 11:01:51,941-0600 WARN  Block Worker task failed: 1117-011117-000000Block task ended prematurely

--

- Mike

From hategan at mcs.anl.gov Tue Nov 17 13:31:07 2009
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 17 Nov 2009 13:31:07 -0600
Subject: [Swift-user] Re: How to set GLOBUS_HOSTNAME for ALCF Viz systems?
In-Reply-To: <4B02CD55.8010607@mcs.anl.gov>
References: <4B02CD55.8010607@mcs.anl.gov>
Message-ID: <1258486267.11276.1.camel@localhost>

On Tue, 2009-11-17 at 08:20 -0800, Michael Wilde wrote:
> What interface should GLOBUS_HOSTNAME be set to, e.g. on Eureka?

I don't know. Whatever interface the CNs can contact the LN on. If there
is any symmetry between eureka and intrepid, it would be 172.x.x.x.

From hategan at mcs.anl.gov Mon Nov 30 14:55:02 2009
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 30 Nov 2009 14:55:02 -0600
Subject: [Swift-user] code branch
Message-ID: <1259614502.26099.26.camel@localhost>

Hello,

I branched the cog and swift codes. This was done in order to meet both
the needs of users who use Swift on a regular basis and our need to
commit "researchy" code that may not be as stable.

I added a note on the downloads page
(http://www.ci.uchicago.edu/swift/downloads/index.php) which contains
information on how to access the stable branch(es).

Here's the short version:
https://cogkit.svn.sourceforge.net/svnroot/cogkit/branches/4.1.7/src/cog
https://svn.ci.uchicago.edu/svn/vdl2/branches/1.0 (swift)

The development code continues to be available at the previous locations
in the repositories.
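For example, a checkout of the stable code could look like this (a sketch;
the target directory names "cog" and "swift" here are only illustrative):

  # sketch: check out the stable branches into illustrative directories
  svn co https://cogkit.svn.sourceforge.net/svnroot/cogkit/branches/4.1.7/src/cog cog
  svn co https://svn.ci.uchicago.edu/svn/vdl2/branches/1.0 swift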
Mihael

From iraicu at cs.uchicago.edu Mon Nov 30 17:17:43 2009
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Mon, 30 Nov 2009 17:17:43 -0600
Subject: [Swift-user] CFP: IEEE Transactions on Parallel and Distributed Systems, Special Issue on Many-Task Computing on Grids and Supercomputers
Message-ID: <4B145297.6050400@cs.uchicago.edu>

Call for Papers
---------------------------------------------------------------------------------------
IEEE Transactions on Parallel and Distributed Systems
Special Issue on Many-Task Computing on Grids and Supercomputers
http://dsl.cs.uchicago.edu/TPDS_MTC/
=======================================================================================
The Special Issue on Many-Task Computing (MTC) will provide the scientific
community a dedicated forum, within the prestigious IEEE Transactions on
Parallel and Distributed Systems journal, for presenting new research,
development, and deployment efforts of loosely coupled large-scale
applications on large-scale clusters, Grids, Supercomputers, and Cloud
Computing infrastructure. MTC, the focus of the special issue, encompasses
loosely coupled applications, which are generally composed of many tasks
(both independent and dependent) to achieve some larger application goal.
This special issue will cover challenges that can hamper efficiency and
utilization in running applications on large-scale systems, such as local
resource manager scalability and granularity, efficient utilization of the
raw hardware, parallel file system contention and scalability, data
management, I/O management, reliability at scale, and application
scalability. We welcome paper submissions on all topics related to MTC on
large-scale systems. For more information on this special issue, please see
http://dsl.cs.uchicago.edu/TPDS_MTC/.

Scope
---------------------------------------------------------------------------------------
This special issue will focus on the ability to manage and execute
large-scale applications on today's largest clusters, Grids, and
Supercomputers. Clusters with tens of thousands of processor cores are
readily available; Grids (e.g. the TeraGrid) with a dozen sites and 100K+
processors, and supercomputers with up to 200K processors (e.g. IBM
BlueGene/L and BlueGene/P, Cray XT5, Sun Constellation), are all now open
to the broader scientific community for open science research. Large
clusters and supercomputers have traditionally been high-performance
computing (HPC) systems, as they are efficient at executing tightly coupled
parallel jobs within a particular machine with low-latency interconnects;
the applications typically use the Message Passing Interface (MPI) to
achieve the needed inter-process communication. On the other hand, Grids
have been the preferred platform for more loosely coupled applications that
tend to be managed and executed through workflow systems, commonly known to
fit the high-throughput computing (HTC) paradigm.

Many-task computing (MTC) aims to bridge the gap between these two computing
paradigms, HTC and HPC. MTC is reminiscent of HTC, but it differs in its
emphasis on using many computing resources over short periods of time to
accomplish many computational tasks (both dependent and independent), where
the primary metrics are measured in seconds (e.g. FLOPS, tasks/s, MB/s I/O
rates), as opposed to operations (e.g. jobs) per month. MTC denotes
high-performance computations comprising multiple distinct activities,
coupled via file system operations.
Tasks may be small or large, uniprocessor or multiprocessor,
compute-intensive or data-intensive. The set of tasks may be static or
dynamic, homogeneous or heterogeneous, loosely coupled or tightly coupled.
The aggregate number of tasks, quantity of computing, and volumes of data
may be extremely large. MTC includes loosely coupled applications that are
generally communication-intensive but not naturally expressed using the
standard Message Passing Interface commonly found in HPC, drawing attention
to the many computations that are heterogeneous but not "happily" parallel.
There is more to HPC than tightly coupled MPI, and more to HTC than
embarrassingly parallel long-running jobs. Like HPC applications, and
science itself, applications are becoming increasingly complex, opening new
doors for many opportunities to apply HPC in new ways if we broaden our
perspective. Some applications have so many simple tasks that merely
managing them is hard. Applications that operate on or produce large
amounts of data need sophisticated data management in order to scale. There
exist applications that involve many tasks, each composed of tightly
coupled MPI tasks. Loosely coupled applications often have dependencies
among tasks, and typically use files for inter-process communication.
Efficient support for these sorts of applications on existing large-scale
systems will involve substantial technical challenges and will have a big
impact on science.

Today's HPC systems are a viable platform to host MTC applications.
However, when large-scale applications run on large-scale systems,
challenges arise that can hamper the efficiency and utilization of those
systems. These challenges range from local resource manager scalability
and granularity to efficient utilization of the raw hardware, parallel
file system contention and scalability, data management, I/O management,
reliability at scale, and application scalability, as well as
understanding the limitations of HPC systems in order to identify good
candidate MTC applications. Furthermore, due to its loosely coupled
nature, the MTC paradigm applies naturally to the emerging Cloud Computing
paradigm, which is being adopted by industry as the next wave of
technological advancement to reduce operational costs while improving
efficiency in large-scale infrastructures.

For an interesting discussion by Ian Foster on the difference between MTC
and HTC, please see his blog at
http://ianfoster.typepad.com/blog/2008/07/many-tasks-comp.html. The
proposed editors have also published several papers highly relevant to this
special issue. One paper, titled "Toward Loosely Coupled Programming on
Petascale Systems", was published at the IEEE/ACM Supercomputing 2008
(SC08) conference; the second paper, titled "Many-Task Computing for Grids
and Supercomputers", was published in the IEEE Workshop on Many-Task
Computing on Grids and Supercomputers 2008 (MTAGS08). To see last year's
workshop program agenda, and accepted papers and presentations, please see
http://dsl.cs.uchicago.edu/MTAGS08/. To see this year's workshop web site,
see http://dsl.cs.uchicago.edu/MTAGS09/.
Topics
---------------------------------------------------------------------------------------
Topics of interest include, but are not limited to:

* Compute Resource Management in large scale clusters, large Grids, Supercomputers, or Cloud Computing infrastructure
  o Scheduling
  o Job execution frameworks
  o Local resource manager extensions
  o Performance evaluation of resource managers in use on large scale systems
  o Challenges and opportunities in running many-task workloads on HPC systems
  o Challenges and opportunities in running many-task workloads on Cloud Computing infrastructure
* Data Management in large scale Grid and Supercomputer environments:
  o Data-Aware Scheduling
  o Parallel File System performance and scalability in large deployments
  o Distributed file systems
  o Data caching frameworks and techniques
* Large-Scale Workflow Systems
  o Workflow system performance and scalability analysis
  o Scalability of workflow systems
  o Workflow infrastructure and e-Science middleware
  o Programming Paradigms and Models
* Large-Scale Many-Task Applications
  o Large-scale many-task applications
  o Large-scale many-task data-intensive applications
  o Large-scale high throughput computing (HTC) applications
  o Quasi-supercomputing applications, deployments, and experiences

Paper Submission and Publication
---------------------------------------------------------------------------------------
Authors are invited to submit papers with unpublished, original work of not
more than 14 pages of double-column text, using single-spaced 9.5 point type
on 8.5 x 11 inch pages with 0.5 inch margins
(http://www2.computer.org/portal/c/document_library/get_file?uuid=02e1509b-5526-4658-afb2-fe8b35044552&groupId=525767).
Papers will be peer-reviewed, and accepted papers will be published in the
IEEE digital library. Submitted articles must not have been previously
published or be currently submitted for journal publication elsewhere. As an
author, you are responsible for understanding and adhering to our submission
guidelines. You can access them at the following web link:
http://www.computer.org/mc/tpds/author.htm. Please thoroughly read these
before submitting your manuscript. Please submit your paper to Manuscript
Central at http://cs-ieee.manuscriptcentral.com/. Please feel free to
contact the Peer Review Publications Coordinator, Annissia Bryant, at
tpds at computer.org, or the guest editors at foster at anl.gov,
iraicu at cs.uchicago.edu, or yozha at microsoft.com if you have any
questions. For more information on this special issue, please see
http://dsl.cs.uchicago.edu/TPDS_MTC/.

Important Dates
---------------------------------------------------------------------------------------
* Abstract Due:              December 14th, 2009
* Papers Due:                December 21st, 2009
* First Round Decisions:     February 22nd, 2010
* Major Revisions if needed: April 19th, 2010
* Second Round Decisions:    May 24th, 2010
* Minor Revisions if needed: June 7th, 2010
* Final Decision:            June 21st, 2010
* Publication Date:          November, 2010

Guest Editors and Potential Reviewers
---------------------------------------------------------------------------------------
Special Issue Guest Editors
* Ian Foster, University of Chicago & Argonne National Laboratory
* Ioan Raicu, Northwestern University
* Yong Zhao, Microsoft

--
=================================================================
Ioan Raicu, Ph.D.
NSF/CRA Computing Innovation Fellow
=================================================================
Center for Ultra-scale Computing and Information Security (CUCIS)
Department of Electrical Engineering and Computer Science
Northwestern University
2145 Sheridan Rd, Tech M384
Evanston, IL 60208-3118
=================================================================
Cel:   1-847-722-0876
Tel:   1-847-491-8163
Email: iraicu at eecs.northwestern.edu
Web:   http://www.eecs.northwestern.edu/~iraicu/
       https://wiki.cucis.eecs.northwestern.edu/
=================================================================