From benc at hawaga.org.uk Wed Oct 1 03:10:28 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 1 Oct 2008 08:10:28 +0000 (GMT)
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To: <1222830081.9463.4.camel@localhost>
References: <48E2E314.6040809@uchicago.edu>
    <1222830081.9463.4.camel@localhost>
Message-ID:

I was playing with this last week so have a patch in my stack:

cd cog/
wget http://www.ci.uchicago.edu/~benc/return-codes-1
patch -p1 < ./return-codes-1

This changes Swift behaviour but does not do anything to provider-deef,
which might or might not work correctly at the moment.

Running the tests in tests/misc/ by running ./run in that directory will
help you test whether its working still or not - check they pass before
you apply the patch, and then check afterwards too.

--

From zhaozhang at uchicago.edu Wed Oct 1 16:52:54 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Wed, 01 Oct 2008 16:52:54 -0500
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To:
References: <48E2E314.6040809@uchicago.edu>
    <1222830081.9463.4.camel@localhost>
Message-ID: <48E3F136.6080404@uchicago.edu>

Thanks, Ben.

First I tested the tests/misc/run, and it is ok.
Then I applied the patch, then swift failed:
Failed to transfer wrapper log from sleep-20081001-1558-in93l5j4/info/n/s/bgp0
Failed to transfer wrapper log from sleep-20081001-1558-in93l5j4/info/t/s/bgp0
Failed to transfer wrapper log from sleep-20081001-1558-in93l5j4/info/1/t/bgp0
Failed to transfer wrapper log from sleep-20081001-1558-in93l5j4/info/0/t/bgp0
sleep failed
sleep failed
sleep failed
Exception in sleep:
Arguments: [30]
Host: bgp0
Directory: sleep-20081001-1558-in93l5j4/jobs/g/s/sleep-gsski80j
stderr.txt:

stdout.txt:

zhao

Ben Clifford wrote:
> I was playing with this last week so have a patch in my stack:
>
> cd cog/
> wget http://www.ci.uchicago.edu/~benc/return-codes-1
> patch -p1 < ./return-codes-1
>
> This changes Swift behaviour but does not do anything to provider-deef,
> which might or might not work correctly at the moment.
>
> Running the tests in tests/misc/ by running ./run in that directory will
> help you test whether its working still or not - check they pass before
> you apply the patch, and then check afterwards too.
>

From zhaozhang at uchicago.edu Wed Oct 1 16:53:18 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Wed, 01 Oct 2008 16:53:18 -0500
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To: <1222830081.9463.4.camel@localhost>
References: <48E2E314.6040809@uchicago.edu>
    <1222830081.9463.4.camel@localhost>
Message-ID: <48E3F14E.5000306@uchicago.edu>

Thanks, Mihael

I will work on a plan with more details.

zhao

Mihael Hategan wrote:
> On Tue, 2008-09-30 at 21:40 -0500, Zhao Zhang wrote:
>
>> Hi, All
>>
>> I am trying to optimize the swift performance on BGP, I finished it for
>> the input phase, but suffering the poor performance at the output phase,
>> which is exactly the status file creation process, as you could tell from
>> the following picture. In this test, I ran sleep_30 jobs, which is
>> expected to finish in 30 seconds.
>>
>> I am wondering if we could use falkon return code instead of the status
>> file? Thanks.
>>
>
> Yes you could.
>
> You would have to do the following:
> 1. Remove the relevant part from the wrapper (touching of the success
> file and sticking failure info in the failure file)
> 2. Comment out the checkJobStatus() call in vdl-int.k (around line 415)
> 3. Make the deef provider set a fault on the task (should be a
> JobException) when the exit code is not 0
> 4. Make the wrapper exit with a non-zero exit code when there is a
> problem
>
> If this is too brief, let me know, and I'll give you more details.
>

From benc at hawaga.org.uk Wed Oct 1 17:43:28 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 1 Oct 2008 22:43:28 +0000 (GMT)
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To: <48E3F136.6080404@uchicago.edu>
References: <48E2E314.6040809@uchicago.edu>
    <1222830081.9463.4.camel@localhost>
    <48E3F136.6080404@uchicago.edu>
Message-ID:

ok, try the same compiled code with the localhost provider running those
tests. If those pass, then there may be something not working as necessary
in provider-deef or falkon (the step in Mihaels instructions about getting
falkon+provider-deef to return process exit codes). If not, then there is
some other problem.

On Wed, 1 Oct 2008, Zhao Zhang wrote:

> Thanks, Ben.
>
> First I tested the tests/misc/run, and it is ok.
> Then I applied the patch, then swift failed:
> Failed to transfer wrapper log from sleep-20081001-1558-in93l5j4/info/n/s/bgp0
> Failed to transfer wrapper log from sleep-20081001-1558-in93l5j4/info/t/s/bgp0
> Failed to transfer wrapper log from sleep-20081001-1558-in93l5j4/info/1/t/bgp0
> Failed to transfer wrapper log from sleep-20081001-1558-in93l5j4/info/0/t/bgp0
> sleep failed
> sleep failed
> sleep failed
> Exception in sleep:
> Arguments: [30]
> Host: bgp0
> Directory: sleep-20081001-1558-in93l5j4/jobs/g/s/sleep-gsski80j
> stderr.txt:
>
> stdout.txt:
>
> zhao

From benc at hawaga.org.uk Thu Oct 2 09:32:26 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 2 Oct 2008 14:32:26 +0000 (GMT)
Subject: [Swift-devel] swift 0.7 release plan - early nov
Message-ID:

Late october / early november is about the time for a two-monthly Swift
release.

There hasn't been much new functionality per-se since 0.6, and I don't
expect there to be; but there's been a bunch of bug-fixing and I expect
there to be more of that.

--

From zhaozhang at uchicago.edu Thu Oct 2 14:37:32 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Thu, 02 Oct 2008 14:37:32 -0500
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To:
References: <48E2E314.6040809@uchicago.edu>
    <1222830081.9463.4.camel@localhost>
    <48E3F136.6080404@uchicago.edu>
Message-ID: <48E522FC.9070109@uchicago.edu>

After applying the patch, I rebuild falkon, and run the tests. It failed.

zzhang at login6.surveyor:/home/falkon/cog/modules/vdsk/tests/misc> ./run
Removing files from previous runs
Running test 130-fmri
Swift svn swift-r2169 (Swift modified locally) cog-r2125

RunID: 20081002-1429-v0e0f3a3
Progress:
touch started
touch started
touch started
touch started
Sorted: [localhost:0.000(1.000):0/1 overload: 0]
Sorted: [localhost:0.000(1.000):1/1 overload: 0]
Failed to transfer wrapper log from 130-fmri-20081002-1429-v0e0f3a3/info/4/localhost
Failed to transfer wrapper log from 130-fmri-20081002-1429-v0e0f3a3/info/3/localhost
Sorted: [localhost:-2.500(0.257):0/1 overload: 0]
Failed to transfer wrapper log from 130-fmri-20081002-1429-v0e0f3a3/info/7/localhost
Sorted: [localhost:-4.200(0.129):0/1 overload: 0]
Progress: Stage in:1 Failed but can retry:3
Failed to transfer wrapper log from 130-fmri-20081002-1429-v0e0f3a3/info/9/localhost
Sorted: [localhost:-5.900(0.079):0/1 overload: 0]
Progress: Stage in:1 Failed but can retry:3
Failed to transfer wrapper log from 130-fmri-20081002-1429-v0e0f3a3/info/b/localhost
SWIFT RETURN CODE NON-ZERO
test clusters ended with return value 1

zhao

Ben Clifford wrote:
> ok, try the same compiled code with the localhost provider running those
> tests. If those pass, then there may be something not working as necessary
> in provider-deef or falkon (the step in Mihaels instructions about getting
> falkon+provider-deef to return process exit codes). If not, then there is
> some other problem.
>

From hategan at mcs.anl.gov Thu Oct 2 14:42:46 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 02 Oct 2008 14:42:46 -0500
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To: <48E522FC.9070109@uchicago.edu>
References: <48E2E314.6040809@uchicago.edu>
    <1222830081.9463.4.camel@localhost>
    <48E3F136.6080404@uchicago.edu>
    <48E522FC.9070109@uchicago.edu>
Message-ID: <1222976566.23246.0.camel@localhost>

It would be useful if you could post the logs.

On Thu, 2008-10-02 at 14:37 -0500, Zhao Zhang wrote:
> After applying the patch, I rebuild falkon, and run the tests. It failed.
>
> zzhang at login6.surveyor:/home/falkon/cog/modules/vdsk/tests/misc> ./run
> Removing files from previous runs
> Running test 130-fmri
> Swift svn swift-r2169 (Swift modified locally) cog-r2125
>
> RunID: 20081002-1429-v0e0f3a3
> Progress:
> touch started
> touch started
> touch started
> touch started
> Sorted: [localhost:0.000(1.000):0/1 overload: 0]
> Sorted: [localhost:0.000(1.000):1/1 overload: 0]
> Failed to transfer wrapper log from 130-fmri-20081002-1429-v0e0f3a3/info/4/localhost
> Failed to transfer wrapper log from 130-fmri-20081002-1429-v0e0f3a3/info/3/localhost
> Sorted: [localhost:-2.500(0.257):0/1 overload: 0]
> Failed to transfer wrapper log from 130-fmri-20081002-1429-v0e0f3a3/info/7/localhost
> Sorted: [localhost:-4.200(0.129):0/1 overload: 0]
> Progress: Stage in:1 Failed but can retry:3
> Failed to transfer wrapper log from 130-fmri-20081002-1429-v0e0f3a3/info/9/localhost
> Sorted: [localhost:-5.900(0.079):0/1 overload: 0]
> Progress: Stage in:1 Failed but can retry:3
> Failed to transfer wrapper log from 130-fmri-20081002-1429-v0e0f3a3/info/b/localhost
> SWIFT RETURN CODE NON-ZERO
> test clusters ended with return value 1
>
> zhao

From zhaozhang at uchicago.edu Thu Oct 2 15:15:09 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Thu, 02 Oct 2008 15:15:09 -0500
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To: <1222976566.23246.0.camel@localhost>
References: <48E2E314.6040809@uchicago.edu>
    <1222830081.9463.4.camel@localhost>
    <48E3F136.6080404@uchicago.edu>
    <48E522FC.9070109@uchicago.edu>
    <1222976566.23246.0.camel@localhost>
Message-ID: <48E52BCD.5010708@uchicago.edu>

I rerun it, and the log is at
http://www.ci.uchicago.edu/~zzhang/130-fmri-20081002-1511-s2ezc795.log

zhao

Mihael Hategan wrote:
> It would be useful if you could post the logs.
>

From hategan at mcs.anl.gov Thu Oct 2 15:24:10 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 02 Oct 2008 15:24:10 -0500
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To: <48E52BCD.5010708@uchicago.edu>
References: <48E2E314.6040809@uchicago.edu>
    <1222830081.9463.4.camel@localhost>
    <48E3F136.6080404@uchicago.edu>
    <48E522FC.9070109@uchicago.edu>
    <1222976566.23246.0.camel@localhost>
    <48E52BCD.5010708@uchicago.edu>
Message-ID: <1222979050.25306.0.camel@localhost>

2008-10-02 15:12:14,275-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION
jobid=touch-zdy94a0j - Application exception: vdl:execute @ vdl-int.k,
line: 395 does not support a 'jobid' argument.
There's something wrong with the version of swift that you have. What is
it?

On Thu, 2008-10-02 at 15:15 -0500, Zhao Zhang wrote:
> I rerun it, and the log is at
> http://www.ci.uchicago.edu/~zzhang/130-fmri-20081002-1511-s2ezc795.log
>
> zhao

From zhaozhang at uchicago.edu Thu Oct 2 15:28:51 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Thu, 02 Oct 2008 15:28:51 -0500
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To: <1222979050.25306.0.camel@localhost>
References: <48E2E314.6040809@uchicago.edu>
    <1222830081.9463.4.camel@localhost>
    <48E3F136.6080404@uchicago.edu>
    <48E522FC.9070109@uchicago.edu>
    <1222976566.23246.0.camel@localhost>
    <48E52BCD.5010708@uchicago.edu>
    <1222979050.25306.0.camel@localhost>
Message-ID: <48E52F03.8090800@uchicago.edu>

zzhang at login6.surveyor:/home/falkon/cog/modules/vdsk/tests/misc> svn info
Path: .
URL: https://svn.ci.uchicago.edu/svn/vdl2/trunk/tests/misc
Repository Root: https://svn.ci.uchicago.edu/svn/vdl2
Repository UUID: e2bb083e-7f23-0410-b3a8-8253ac9ef6d8
Revision: 2169
Node Kind: directory
Schedule: normal
Last Changed Author: benc
Last Changed Rev: 2153
Last Changed Date: 2008-07-28 01:57:52 -0500 (Mon, 28 Jul 2008)

so that is 2169

Mihael Hategan wrote:
> 2008-10-02 15:12:14,275-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION
> jobid=touch-zdy94a0j - Application exception: vdl:execute @ vdl-int.k,
> line: 395 does not support a 'jobid' argument.
>
> There's something wrong with the version of swift that you have. What is
> it?
>

From hategan at mcs.anl.gov Thu Oct 2 15:33:52 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 02 Oct 2008 15:33:52 -0500
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To: <48E52F03.8090800@uchicago.edu>
References: <48E2E314.6040809@uchicago.edu>
    <1222830081.9463.4.camel@localhost>
    <48E3F136.6080404@uchicago.edu>
    <48E522FC.9070109@uchicago.edu>
    <1222976566.23246.0.camel@localhost>
    <48E52BCD.5010708@uchicago.edu>
    <1222979050.25306.0.camel@localhost>
    <48E52F03.8090800@uchicago.edu>
Message-ID: <1222979632.25487.0.camel@localhost>

That's pretty old. The current revision is 2249.

Alternatively you could remove the jobid argument to execute.

On Thu, 2008-10-02 at 15:28 -0500, Zhao Zhang wrote:
> zzhang at login6.surveyor:/home/falkon/cog/modules/vdsk/tests/misc> svn info
> Path: .
> URL: https://svn.ci.uchicago.edu/svn/vdl2/trunk/tests/misc
> Repository Root: https://svn.ci.uchicago.edu/svn/vdl2
> Repository UUID: e2bb083e-7f23-0410-b3a8-8253ac9ef6d8
> Revision: 2169
> Node Kind: directory
> Schedule: normal
> Last Changed Author: benc
> Last Changed Rev: 2153
> Last Changed Date: 2008-07-28 01:57:52 -0500 (Mon, 28 Jul 2008)
>
> so that is 2169

From zhaozhang at uchicago.edu Thu Oct 2 16:52:38 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Thu, 02 Oct 2008 16:52:38 -0500
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To: <1222979632.25487.0.camel@localhost>
References: <48E2E314.6040809@uchicago.edu>
    <1222830081.9463.4.camel@localhost>
    <48E3F136.6080404@uchicago.edu>
    <48E522FC.9070109@uchicago.edu>
    <1222976566.23246.0.camel@localhost>
    <48E52BCD.5010708@uchicago.edu>
    <1222979050.25306.0.camel@localhost>
    <48E52F03.8090800@uchicago.edu>
    <1222979632.25487.0.camel@localhost>
Message-ID: <48E542A6.4070506@uchicago.edu>

Hi, Mihael

I update my swift, and applied the patch. Things are working well for
the sanity test in tests/misc/run.
But failed when I run it with falkon on BGP. The log is at
http://www.ci.uchicago.edu/~zzhang/sleep-20081002-1635-21n9ho6b.log

The error is like this:
Failed to transfer wrapper log from sleep-20081002-1635-21n9ho6b/info/9/d/bgp0
Failed to transfer wrapper log from sleep-20081002-1635-21n9ho6b/info/c/c/bgp0
Failed to transfer wrapper log from sleep-20081002-1635-21n9ho6b/info/8/e/bgp0
Failed to transfer wrapper log from sleep-20081002-1635-21n9ho6b/info/m/i/bgp0
Failed to transfer wrapper log from sleep-20081002-1635-21n9ho6b/info/q/e/bgp0
Failed to transfer wrapper log from sleep-20081002-1635-21n9ho6b/info/0/e/bgp0
Failed to transfer wrapper log from sleep-20081002-1635-21n9ho6b/info/5/e/bgp0
Failed to transfer wrapper log from sleep-20081002-1635-21n9ho6b/info/5/i/bgp0
Failed to transfer wrapper log from sleep-20081002-1635-21n9ho6b/info/2/j/bgp0
Failed to transfer wrapper log from sleep-20081002-1635-21n9ho6b/info/u/i/bgp0
Failed to transfer wrapper log from sleep-20081002-1635-21n9ho6b/info/1/j/bgp0
Failed to transfer wrapper log from sleep-20081002-1635-21n9ho6b/info/9/e/bgp0

zhao

Mihael Hategan wrote:
> That's pretty old. The current revision is 2249.
>
> Alternatively you could remove the jobid argument to execute.
>
>>>>>>>> Then I applied the patch, then swift failed: >>>>>>>> Failed to transfer wrapper log from sleep-20081001-1558-in93l5j4/info/n/s/bgp0 >>>>>>>> Failed to transfer wrapper log from sleep-20081001-1558-in93l5j4/info/t/s/bgp0 >>>>>>>> Failed to transfer wrapper log from sleep-20081001-1558-in93l5j4/info/1/t/bgp0 >>>>>>>> Failed to transfer wrapper log from sleep-20081001-1558-in93l5j4/info/0/t/bgp0 >>>>>>>> sleep failed >>>>>>>> sleep failed >>>>>>>> sleep failed >>>>>>>> Exception in sleep: >>>>>>>> Arguments: [30] >>>>>>>> Host: bgp0 >>>>>>>> Directory: sleep-20081001-1558-in93l5j4/jobs/g/s/sleep-gsski80j >>>>>>>> stderr.txt: >>>>>>>> >>>>>>>> stdout.txt: >>>>>>>> >>>>>>>> zhao >>>>>>>> >>>>>>>> Ben Clifford wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> I was playing with this last week so have a patch in my stack: >>>>>>>>> >>>>>>>>> cd cog/ >>>>>>>>> wget http://www.ci.uchicago.edu/~benc/return-codes-1 >>>>>>>>> patch -p1 < ./return-codes-1 >>>>>>>>> >>>>>>>>> This changes Swift behaviour but does not do anything to provider-deef, >>>>>>>>> which might or might not work correctly at the moment. >>>>>>>>> >>>>>>>>> Running the tests in tests/misc/ by running ./run in that directory will >>>>>>>>> help you test whether its working still or not - check they pass before you >>>>>>>>> apply the patch, and then check afterwards too. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>> >>>>> >>>>> >>> >>> > > > From hategan at mcs.anl.gov Thu Oct 2 17:03:31 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 02 Oct 2008 17:03:31 -0500 Subject: [Swift-devel] could swift use return code from falkon as a success notification? 
In-Reply-To: <48E542A6.4070506@uchicago.edu>
References: <48E2E314.6040809@uchicago.edu> <1222830081.9463.4.camel@localhost> <48E3F136.6080404@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222976566.23246.0.camel@localhost> <48E52BCD.5010708@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu>
Message-ID: <1222985011.27033.2.camel@localhost>

On Thu, 2008-10-02 at 16:52 -0500, Zhao Zhang wrote:
> Hi, Mihael
>
> I update my swift, and applied the patch. Things are working well for
> the sanity test in tests/misc/run.
> But failed when I run it with falkon on BGP.
>
> The log is at
> http://www.ci.uchicago.edu/~zzhang/sleep-20081002-1635-21n9ho6b.log
>
> The error is like this:
> Failed to transfer wrapper log from
> sleep-20081002-1635-21n9ho6b/info/9/d/bgp0

We need to fix that. That's not the actual error. The error is

2008-10-02 16:36:38,212-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION
jobid=sleep-od9m7a0j - Application exception: No status file was found.
Check the shared filesystem on bgp0

So it doesn't look like the patch made it through. Where did you apply
it?

You should apply it in the source tree and then you'll have to
re-compile swift.

From zhaozhang at uchicago.edu Thu Oct 2 17:06:28 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Thu, 02 Oct 2008 17:06:28 -0500
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To: <1222985011.27033.2.camel@localhost>
References: <48E2E314.6040809@uchicago.edu> <1222830081.9463.4.camel@localhost> <48E3F136.6080404@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222976566.23246.0.camel@localhost> <48E52BCD.5010708@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost>
Message-ID: <48E545E4.2080405@uchicago.edu>

I applied it in /home/falkon/cog
and ran swift -sites.file ./sites.xml -tc.file ./tc.data sleep.swift in
/home/falkon/cog/module/vdsk

zhao

Mihael Hategan wrote:
> On Thu, 2008-10-02 at 16:52 -0500, Zhao Zhang wrote:
>
>> Hi, Mihael
>>
>> I update my swift, and applied the patch. Things are working well for
>> the sanity test in tests/misc/run.
>> But failed when I run it with falkon on BGP.
>>
>> The log is at
>> http://www.ci.uchicago.edu/~zzhang/sleep-20081002-1635-21n9ho6b.log
>>
>> The error is like this:
>> Failed to transfer wrapper log from
>> sleep-20081002-1635-21n9ho6b/info/9/d/bgp0
>>
>
> We need to fix that. That's not the actual error. The error is
>
> 2008-10-02 16:36:38,212-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION
> jobid=sleep-od9m7a0j - Application exception: No status file was found.
> Check the shared filesystem on bgp0
>
> So it doesn't look like the patch made it through. Where did you apply
> it?
>
> You should apply it in the source tree and then you'll have to
> re-compile swift.
>

From hategan at mcs.anl.gov Thu Oct 2 17:16:12 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 02 Oct 2008 17:16:12 -0500
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To: <48E545E4.2080405@uchicago.edu>
References: <48E2E314.6040809@uchicago.edu> <1222830081.9463.4.camel@localhost> <48E3F136.6080404@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222976566.23246.0.camel@localhost> <48E52BCD.5010708@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu>
Message-ID: <1222985772.27276.2.camel@localhost>

On Thu, 2008-10-02 at 17:06 -0500, Zhao Zhang wrote:
> I applied it in /home/falkon/cog
> and run swift -sites.file ./sites.xml -tc.file ./tc.data sleep.swift in
> /home/falkon/cog/module/vdsk

Please type the following and post the output:

cd /home/falkon/cog/module/vdsk
which swift
grep -n "check" dist/vdsk-svn/libexec/vdl-int.k

From zhaozhang at uchicago.edu Thu Oct 2 17:17:54 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Thu, 02 Oct 2008 17:17:54 -0500
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To: <1222985772.27276.2.camel@localhost>
References: <48E2E314.6040809@uchicago.edu> <1222830081.9463.4.camel@localhost> <48E3F136.6080404@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222976566.23246.0.camel@localhost> <48E52BCD.5010708@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost>
Message-ID: <48E54892.8080602@uchicago.edu>

Mihael Hategan wrote:
> On Thu, 2008-10-02 at 17:06 -0500, Zhao Zhang wrote:
>
>> I applied it in /home/falkon/cog
>> and run swift -sites.file ./sites.xml -tc.file ./tc.data sleep.swift in
>> /home/falkon/cog/module/vdsk
>>
>
> Please type the following and post the output:
>
> cd /home/falkon/cog/module/vdsk
> which swift
>
zzhang at login6.surveyor:/home/falkon/cog/modules/vdsk> which swift
/home/falkon/cog/modules/vdsk/dist/vdsk-svn/bin/swift

> grep -n "check" dist/vdsk-svn/libexec/vdl-int.k
>
zzhang at login6.surveyor:/home/falkon/cog/modules/vdsk> grep -n "check" dist/vdsk-svn/libexec/vdl-int.k
63: element(checkJobStatus, [rhost, wfdir, jobid, tr, jobdir]

From hategan at mcs.anl.gov Thu Oct 2 17:24:33 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 02 Oct 2008 17:24:33 -0500
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To: <48E54892.8080602@uchicago.edu>
References: <48E2E314.6040809@uchicago.edu> <1222830081.9463.4.camel@localhost> <48E3F136.6080404@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222976566.23246.0.camel@localhost> <48E52BCD.5010708@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu>
Message-ID: <1222986273.27446.2.camel@localhost>

On Thu, 2008-10-02 at 17:17 -0500, Zhao Zhang wrote:
>
> Mihael Hategan wrote:
> > On Thu, 2008-10-02 at 17:06 -0500, Zhao Zhang wrote:
> >
> >> I applied it in /home/falkon/cog
> >> and run swift -sites.file ./sites.xml -tc.file ./tc.data sleep.swift in
> >> /home/falkon/cog/module/vdsk
> >>
> >
> > Please type the following and post the output:
> >
> > cd /home/falkon/cog/module/vdsk
> > which swift
> >
> zzhang at login6.surveyor:/home/falkon/cog/modules/vdsk> which swift
> /home/falkon/cog/modules/vdsk/dist/vdsk-svn/bin/swift
>
> > grep -n "check" dist/vdsk-svn/libexec/vdl-int.k
> >
> zzhang at login6.surveyor:/home/falkon/cog/modules/vdsk> grep -n "check"
> dist/vdsk-svn/libexec/vdl-int.k
> 63: element(checkJobStatus, [rhost, wfdir, jobid, tr, jobdir]
>

Do you have anything in $CLASSPATH?

From zhaozhang at uchicago.edu Thu Oct 2 17:25:26 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Thu, 02 Oct 2008 17:25:26 -0500
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To: <1222986273.27446.2.camel@localhost>
References: <48E2E314.6040809@uchicago.edu> <1222830081.9463.4.camel@localhost> <48E3F136.6080404@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222976566.23246.0.camel@localhost> <48E52BCD.5010708@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost>
Message-ID: <48E54A56.7040805@uchicago.edu>

yep, a lot of things there.

zhao

Mihael Hategan wrote:
> On Thu, 2008-10-02 at 17:17 -0500, Zhao Zhang wrote:
>
>> Mihael Hategan wrote:
>>
>>> On Thu, 2008-10-02 at 17:06 -0500, Zhao Zhang wrote:
>>>
>>>> I applied it in /home/falkon/cog
>>>> and run swift -sites.file ./sites.xml -tc.file ./tc.data sleep.swift in
>>>> /home/falkon/cog/module/vdsk
>>>>
>>> Please type the following and post the output:
>>>
>>> cd /home/falkon/cog/module/vdsk
>>> which swift
>>>
>> zzhang at login6.surveyor:/home/falkon/cog/modules/vdsk> which swift
>> /home/falkon/cog/modules/vdsk/dist/vdsk-svn/bin/swift
>>
>>> grep -n "check" dist/vdsk-svn/libexec/vdl-int.k
>>>
>> zzhang at login6.surveyor:/home/falkon/cog/modules/vdsk> grep -n "check"
>> dist/vdsk-svn/libexec/vdl-int.k
>> 63: element(checkJobStatus, [rhost, wfdir, jobid, tr, jobdir]
>>
>
> Do you have anything in $CLASSPATH?
>

From hategan at mcs.anl.gov Thu Oct 2 17:25:48 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 02 Oct 2008 17:25:48 -0500
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To: <1222986273.27446.2.camel@localhost>
References: <48E2E314.6040809@uchicago.edu> <1222830081.9463.4.camel@localhost> <48E3F136.6080404@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222976566.23246.0.camel@localhost> <48E52BCD.5010708@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost>
Message-ID: <1222986348.27446.4.camel@localhost>

On Thu, 2008-10-02 at 17:24 -0500, Mihael Hategan wrote:
> On Thu, 2008-10-02 at 17:17 -0500, Zhao Zhang wrote:
> >
> > Mihael Hategan wrote:
> > > On Thu, 2008-10-02 at 17:06 -0500, Zhao Zhang wrote:
> > >
> > >> I applied it in /home/falkon/cog
> > >> and run swift -sites.file ./sites.xml -tc.file ./tc.data sleep.swift in
> > >> /home/falkon/cog/module/vdsk
> > >>
> > >
> > > Please type the following and post the output:
> > >
> > > cd /home/falkon/cog/module/vdsk
> > > which swift
> > >
> > zzhang at login6.surveyor:/home/falkon/cog/modules/vdsk> which swift
> > /home/falkon/cog/modules/vdsk/dist/vdsk-svn/bin/swift
> >
> > > grep -n "check" dist/vdsk-svn/libexec/vdl-int.k
> > >
> > zzhang at login6.surveyor:/home/falkon/cog/modules/vdsk> grep -n "check"
> > dist/vdsk-svn/libexec/vdl-int.k
> > 63: element(checkJobStatus, [rhost, wfdir, jobid, tr, jobdir]
> >
>
> Do you have anything in $CLASSPATH?

Also, do you have anything in $SWIFT_HOME?

i.e. what is the output of the following:

echo $CLASSPATH
echo $SWIFT_HOME

From zhaozhang at uchicago.edu Thu Oct 2 17:27:59 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Thu, 02 Oct 2008 17:27:59 -0500
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To: <1222986348.27446.4.camel@localhost> References: <48E2E314.6040809@uchicago.edu> <1222830081.9463.4.camel@localhost> <48E3F136.6080404@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222976566.23246.0.camel@localhost> <48E52BCD.5010708@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> Message-ID: <48E54AEF.6000403@uchicago.edu> Mihael Hategan wrote: > On Thu, 2008-10-02 at 17:24 -0500, Mihael Hategan wrote: > >> On Thu, 2008-10-02 at 17:17 -0500, Zhao Zhang wrote: >> >>> Mihael Hategan wrote: >>> >>>> On Thu, 2008-10-02 at 17:06 -0500, Zhao Zhang wrote: >>>> >>>> >>>>> I applied it in /home/falkon/cog >>>>> and run swift -sites.file ./sites.xml -tc.file ./tc.data sleep.swift in >>>>> /home/falkon/cog/module/vdsk >>>>> >>>>> >>>> Please type the following and post the output: >>>> >>>> cd /home/falkon/cog/module/vdsk >>>> which swift >>>> >>>> >>> zzhang at login6.surveyor:/home/falkon/cog/modules/vdsk> which swift >>> /home/falkon/cog/modules/vdsk/dist/vdsk-svn/bin/swift >>> >>> >>>> grep -n "check" dist/vdsk-svn/libexec/vdl-int.k >>>> >>>> >>> zzhang at login6.surveyor:/home/falkon/cog/modules/vdsk> grep -n "check" >>> dist/vdsk-svn/libexec/vdl-int.k >>> 63: element(checkJobStatus, [rhost, wfdir, jobid, tr, jobdir] >>> >>> >> Do you have anything in $CLASSPATH? >> > > Also, do you have anything in $SWIFT_HOME? > > i.e. 
what is the output of the following: > > echo $CLASSPATH > zzhang at login6.surveyor:~/swift/etc> echo $CLASSPATH .:/home/falkon/falkon/container:/home/falkon/falkon/container/build/classes:/home/falkon/falkon/container/lib/addressing-1.0.jar:/home/falkon/falkon/container/lib/axis.jar:/home/falkon/falkon/container/lib/axis-url.jar:/home/falkon/falkon/container/lib/bootstrap.jar:/home/falkon/falkon/container/lib/cog-axis.jar:/home/falkon/falkon/container/lib/cog-jglobus.jar:/home/falkon/falkon/container/lib/cog-tomcat.jar:/home/falkon/falkon/container/lib/cog-url.jar:/home/falkon/falkon/container/lib/commonj.jar:/home/falkon/falkon/container/lib/commons-beanutils.jar:/home/falkon/falkon/container/lib/commons-cli-2.0.jar:/home/falkon/falkon/container/lib/commons-collections-3.0.jar:/home/falkon/falkon/container/lib/commons-digester.jar:/home/falkon/falkon/container/lib/commons-discovery.jar:/home/falkon/falkon/container/lib/commons-logging.jar:/home/falkon/falkon/container/lib/concurrent.jar:/home/falkon/falkon/container/lib/cryptix32.jar:/home/falkon/falkon/container/lib/cryptix-asn1.jar:/home/falkon/falkon/container/lib/cryptix.jar:/home/falkon/falkon/container/lib/globus_delegation_service.jar:/home/falkon/falkon/container/lib/globus_delegation_stubs.jar:/home/falkon/falkon/container/lib/globus_usage_core.jar:/home/falkon/falkon/container/lib/globus_usage_packets_common.jar:/home/falkon/falkon/container/lib/globus_wsrf_mds_aggregator_stubs.jar:/home/falkon/falkon/container/lib/globus_wsrf_rendezvous_stubs.jar:/home/falkon/falkon/container/lib/globus_wsrf_rft_stubs.jar:/home/falkon/falkon/container/lib/globus_wsrf_tools_test.jar:/home/falkon/falkon/container/lib/gram-client.jar:/home/falkon/falkon/container/lib/gram-stubs.jar:/home/falkon/falkon/container/lib/gram-utils.jar:/home/falkon/falkon/container/lib/jaxrpc.jar:/home/falkon/falkon/container/lib/jce-jdk13-125.jar:/home/falkon/falkon/container/lib/jgss.jar:/home/falkon/falkon/container/lib/junit.jar:/home/falko
n/falkon/container/lib/log4j-1.2.8.jar:/home/falkon/falkon/container/lib/naming-common.jar:/home/falkon/falkon/container/lib/naming-factory.jar:/home/falkon/falkon/container/lib/naming-java.jar:/home/falkon/falkon/container/lib/naming-resources.jar:/home/falkon/falkon/container/lib/nom_tam.jar:/home/falkon/falkon/container/lib/opensaml.jar:/home/falkon/falkon/container/lib/org_globus_GenericPortal_common.jar:/home/falkon/falkon/container/lib/org_globus_GenericPortal_services_core_WS.jar:/home/falkon/falkon/container/lib/org_globus_GenericPortal_services_core_WS_stubs.jar:/home/falkon/falkon/container/lib/puretls.jar:/home/falkon/falkon/container/lib/resolver.jar:/home/falkon/falkon/container/lib/saaj.jar:/home/falkon/falkon/container/lib/servlet.jar:/home/falkon/falkon/container/lib/swing-layout-1.0.jar:/home/falkon/falkon/container/lib/wsdl4j.jar:/home/falkon/falkon/container/lib/wsrf_common.jar:/home/falkon/falkon/container/lib/wsrf_core.jar:/home/falkon/falkon/container/lib/wsrf_core_registry.jar:/home/falkon/falkon/container/lib/wsrf_core_registry_stubs.jar:/home/falkon/falkon/container/lib/wsrf_core_samples_authzService.jar:/home/falkon/falkon/container/lib/wsrf_core_samples_authzService_stubs.jar:/home/falkon/falkon/container/lib/wsrf_core_samples_counter.jar:/home/falkon/falkon/container/lib/wsrf_core_samples_counter_stubs.jar:/home/falkon/falkon/container/lib/wsrf_core_samples_mgmt.jar:/home/falkon/falkon/container/lib/wsrf_core_samples_mgmt_stubs.jar:/home/falkon/falkon/container/lib/wsrf_core_stubs.jar:/home/falkon/falkon/container/lib/wsrf_mds_index_stubs.jar:/home/falkon/falkon/container/lib/wsrf_mds_usefulrp_schema_stubs.jar:/home/falkon/falkon/container/lib/wsrf_provider_jce.jar:/home/falkon/falkon/container/lib/wsrf_test_interop.jar:/home/falkon/falkon/container/lib/wsrf_test_interop_stubs.jar:/home/falkon/falkon/container/lib/wsrf_test.jar:/home/falkon/falkon/container/lib/wsrf_test_unit.jar:/home/falkon/falkon/container/lib/wsrf_test_unit_stubs.jar:
/home/falkon/falkon/container/lib/wsrf_tools.jar:/home/falkon/falkon/container/lib/wss4j.jar:/home/falkon/falkon/container/lib/xalan.jar:/home/falkon/falkon/container/lib/xercesImpl.jar:/home/falkon/falkon/container/lib/xml-apis.jar:/home/falkon/falkon/container/lib/xmlsec.jar

> echo $SWIFT_HOME
>
zzhang at login6.surveyor:~/swift/etc> echo $SWIFT_HOME
/home/falkon/cog/modules/vdsk/dist/vdsk-svn/

From hategan at mcs.anl.gov Thu Oct 2 17:35:02 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 02 Oct 2008 17:35:02 -0500
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To: <48E54AEF.6000403@uchicago.edu>
References: <48E2E314.6040809@uchicago.edu> <1222830081.9463.4.camel@localhost> <48E3F136.6080404@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222976566.23246.0.camel@localhost> <48E52BCD.5010708@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> <48E54AEF.6000403@uchicago.edu>
Message-ID: <1222986902.27710.3.camel@localhost>

That doesn't make much sense.

Your swift looks ok, your environment looks ok, yet your run seems to be
using a different vdl-int.k, one that isn't patched.

Are you sure that in your last run you used this swift and not something
else?

From zhaozhang at uchicago.edu Thu Oct 2 18:41:47 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Thu, 02 Oct 2008 18:41:47 -0500
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To: <1222986902.27710.3.camel@localhost>
References: <48E2E314.6040809@uchicago.edu> <1222830081.9463.4.camel@localhost> <48E3F136.6080404@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222976566.23246.0.camel@localhost> <48E52BCD.5010708@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> <48E54AEF.6000403@uchicago.edu> <1222986902.27710.3.camel@localhost>
Message-ID: <48E55C3B.5040207@uchicago.edu>

I am pretty sure, well, I will double check.

zhao

Mihael Hategan wrote:
> That doesn't make much sense.
>
> Your swift looks ok, your environment looks ok, yet your run seems to be
> using a different vdl-int.k, one that isn't patched.
>
> Are you sure that in your last run you used this swift and not something
> else?
> > On Thu, 2008-10-02 at 17:27 -0500, Zhao Zhang wrote: > >> Mihael Hategan wrote: >> >>> On Thu, 2008-10-02 at 17:24 -0500, Mihael Hategan wrote: >>> >>> >>>> On Thu, 2008-10-02 at 17:17 -0500, Zhao Zhang wrote: >>>> >>>> >>>>> Mihael Hategan wrote: >>>>> >>>>> >>>>>> On Thu, 2008-10-02 at 17:06 -0500, Zhao Zhang wrote: >>>>>> >>>>>> >>>>>> >>>>>>> I applied it in /home/falkon/cog >>>>>>> and run swift -sites.file ./sites.xml -tc.file ./tc.data sleep.swift in >>>>>>> /home/falkon/cog/module/vdsk >>>>>>> >>>>>>> >>>>>>> >>>>>> Please type the following and post the output: >>>>>> >>>>>> cd /home/falkon/cog/module/vdsk >>>>>> which swift >>>>>> >>>>>> >>>>>> >>>>> zzhang at login6.surveyor:/home/falkon/cog/modules/vdsk> which swift >>>>> /home/falkon/cog/modules/vdsk/dist/vdsk-svn/bin/swift >>>>> >>>>> >>>>> >>>>>> grep -n "check" dist/vdsk-svn/libexec/vdl-int.k >>>>>> >>>>>> >>>>>> >>>>> zzhang at login6.surveyor:/home/falkon/cog/modules/vdsk> grep -n "check" >>>>> dist/vdsk-svn/libexec/vdl-int.k >>>>> 63: element(checkJobStatus, [rhost, wfdir, jobid, tr, jobdir] >>>>> >>>>> >>>>> >>>> Do you have anything in $CLASSPATH? >>>> >>>> >>> Also, do you have anything in $SWIFT_HOME? >>> >>> i.e. 
what is the output of the following: >>> >>> echo $CLASSPATH >>> >>> >> zzhang at login6.surveyor:~/swift/etc> echo $CLASSPATH >> .:/home/falkon/falkon/container:/home/falkon/falkon/container/build/classes:/home/falkon/falkon/container/lib/addressing-1.0.jar:/home/falkon/falkon/container/lib/axis.jar:/home/falkon/falkon/container/lib/axis-url.jar:/home/falkon/falkon/container/lib/bootstrap.jar:/home/falkon/falkon/container/lib/cog-axis.jar:/home/falkon/falkon/container/lib/cog-jglobus.jar:/home/falkon/falkon/container/lib/cog-tomcat.jar:/home/falkon/falkon/container/lib/cog-url.jar:/home/falkon/falkon/container/lib/commonj.jar:/home/falkon/falkon/container/lib/commons-beanutils.jar:/home/falkon/falkon/container/lib/commons-cli-2.0.jar:/home/falkon/falkon/container/lib/commons-collections-3.0.jar:/home/falkon/falkon/container/lib/commons-digester.jar:/home/falkon/falkon/container/lib/commons-discovery.jar:/home/falkon/falkon/container/lib/commons-logging.jar:/home/falkon/falkon/container/lib/concurrent.jar:/home/falkon/falkon/container/lib/cryptix32.jar:/home/falkon/falkon/container/lib/cryptix-asn1.jar:/home/falkon/falkon/container/lib/cryptix.jar:/home/falkon/falkon/container/lib/globus_delegation_service.jar:/home/falkon/falkon/container/lib/globus_delegation_stubs.jar:/home/falkon/falkon/container/lib/globus_usage_core.jar:/home/falkon/falkon/container/lib/globus_usage_packets_common.jar:/home/falkon/falkon/container/lib/globus_wsrf_mds_aggregator_stubs.jar:/home/falkon/falkon/container/lib/globus_wsrf_rendezvous_stubs.jar:/home/falkon/falkon/container/lib/globus_wsrf_rft_stubs.jar:/home/falkon/falkon/container/lib/globus_wsrf_tools_test.jar:/home/falkon/falkon/container/lib/gram-client.jar:/home/falkon/falkon/container/lib/gram-stubs.jar:/home/falkon/falkon/container/lib/gram-utils.jar:/home/falkon/falkon/container/lib/jaxrpc.jar:/home/falkon/falkon/container/lib/jce-jdk13-125.jar:/home/falkon/falkon/container/lib/jgss.jar:/home/falkon/falkon/container/lib/junit
.jar:/home/falkon/falkon/container/lib/log4j-1.2.8.jar:/home/falkon/falkon/container/lib/naming-common.jar:/home/falkon/falkon/container/lib/naming-factory.jar:/home/falkon/falkon/container/lib/naming-java.jar:/home/falkon/falkon/container/lib/naming-resources.jar:/home/falkon/falkon/container/lib/nom_tam.jar:/home/falkon/falkon/container/lib/opensaml.jar:/home/falkon/falkon/container/lib/org_globus_GenericPortal_common.jar:/home/falkon/falkon/container/lib/org_globus_GenericPortal_services_core_WS.jar:/home/falkon/falkon/container/lib/org_globus_GenericPortal_services_core_WS_stubs.jar:/home/falkon/falkon/container/lib/puretls.jar:/home/falkon/falkon/container/lib/resolver.jar:/home/falkon/falkon/container/lib/saaj.jar:/home/falkon/falkon/container/lib/servlet.jar:/home/falkon/falkon/container/lib/swing-layout-1.0.jar:/home/falkon/falkon/container/lib/wsdl4j.jar:/home/falkon/falkon/container/lib/wsrf_common.jar:/home/falkon/falkon/container/lib/wsrf_core.jar:/home/falkon/falkon/container/lib/wsrf_core_registry.jar:/home/falkon/falkon/container/lib/wsrf_core_registry_stubs.jar:/home/falkon/falkon/container/lib/wsrf_core_samples_authzService.jar:/home/falkon/falkon/container/lib/wsrf_core_samples_authzService_stubs.jar:/home/falkon/falkon/container/lib/wsrf_core_samples_counter.jar:/home/falkon/falkon/container/lib/wsrf_core_samples_counter_stubs.jar:/home/falkon/falkon/container/lib/wsrf_core_samples_mgmt.jar:/home/falkon/falkon/container/lib/wsrf_core_samples_mgmt_stubs.jar:/home/falkon/falkon/container/lib/wsrf_core_stubs.jar:/home/falkon/falkon/container/lib/wsrf_mds_index_stubs.jar:/home/falkon/falkon/container/lib/wsrf_mds_usefulrp_schema_stubs.jar:/home/falkon/falkon/container/lib/wsrf_provider_jce.jar:/home/falkon/falkon/container/lib/wsrf_test_interop.jar:/home/falkon/falkon/container/lib/wsrf_test_interop_stubs.jar:/home/falkon/falkon/container/lib/wsrf_test.jar:/home/falkon/falkon/container/lib/wsrf_test_unit.jar:/home/falkon/falkon/container/lib/wsrf_test
_unit_stubs.jar:/home/falkon/falkon/container/lib/wsrf_tools.jar:/home/falkon/falkon/container/lib/wss4j.jar:/home/falkon/falkon/container/lib/xalan.jar:/home/falkon/falkon/container/lib/xercesImpl.jar:/home/falkon/falkon/container/lib/xml-apis.jar:/home/falkon/falkon/container/lib/xmlsec.jar >> >> >>> echo $SWIFT_HOME >>> >>> >> zzhang at login6.surveyor:~/swift/etc> echo $SWIFT_HOME >> /home/falkon/cog/modules/vdsk/dist/vdsk-svn/ >> >>> >>> > > > From benc at hawaga.org.uk Thu Oct 2 19:34:06 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 3 Oct 2008 00:34:06 +0000 (GMT) Subject: [Swift-devel] could swift use return code from falkon as a success notification? In-Reply-To: <48E55C3B.5040207@uchicago.edu> References: <48E2E314.6040809@uchicago.edu> <1222830081.9463.4.camel@localhost> <48E3F136.6080404@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222976566.23246.0.camel@localhost> <48E52BCD.5010708@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> <48E54AEF.6000403@uchicago.edu> <1222986902.27710.3.camel@localhost> <48E55C3B.5040207@uchicago.edu> Message-ID: In one shell session type all the commands that Mihael told you to type, also: echo $PATH and: swift /home/falkon/cog/module/vdsk/examples/first.swift and paste the entire shell session into an email without deleting anything, without closing any window, without even changing to a different window or getting up to get a drink or anything else at all. -- From zhaozhang at uchicago.edu Thu Oct 2 19:49:09 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Thu, 02 Oct 2008 19:49:09 -0500 Subject: [Swift-devel] could swift use return code from falkon as a success notification? 
In-Reply-To: References: <48E2E314.6040809@uchicago.edu> <1222830081.9463.4.camel@localhost> <48E3F136.6080404@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222976566.23246.0.camel@localhost> <48E52BCD.5010708@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> <48E54AEF.6000403@uchicago.edu> <1222986902.27710.3.camel@localhost> <48E55C3B.5040207@uchicago.edu> Message-ID: <48E56C05.3040807@uchicago.edu> echo $CLASSPATH zzhang at login6.surveyor:/home/falkon/cog/modules/vdsk/dist/vdsk-svn/examples/vdsk> echo $CLASSPATH .:/home/falkon/falkon/container:/home/falkon/falkon/container/build/classes:/home/falkon/falkon/container/lib/addressing-1.0.jar:/home/falkon/falkon/container/lib/axis.jar:/home/falkon/falkon/container/lib/axis-url.jar:/home/falkon/falkon/container/lib/bootstrap.jar:/home/falkon/falkon/container/lib/cog-axis.jar:/home/falkon/falkon/container/lib/cog-jglobus.jar:/home/falkon/falkon/container/lib/cog-tomcat.jar:/home/falkon/falkon/container/lib/cog-url.jar:/home/falkon/falkon/container/lib/commonj.jar:/home/falkon/falkon/container/lib/commons-beanutils.jar:/home/falkon/falkon/container/lib/commons-cli-2.0.jar:/home/falkon/falkon/container/lib/commons-collections-3.0.jar:/home/falkon/falkon/container/lib/commons-digester.jar:/home/falkon/falkon/container/lib/commons-discovery.jar:/home/falkon/falkon/container/lib/commons-logging.jar:/home/falkon/falkon/container/lib/concurrent.jar:/home/falkon/falkon/container/lib/cryptix32.jar:/home/falkon/falkon/container/lib/cryptix-asn1.jar:/home/falkon/falkon/container/lib/cryptix.jar:/home/falkon/falkon/container/lib/globus_delegation_service.jar:/home/falkon/falkon/container/lib/globus_delegation_stubs.jar:
/home/falkon/falkon/container/lib/globus_usage_core.jar:/home/falkon/falkon/container/lib/globus_usage_packets_common.jar:/home/falkon/falkon/container/lib/globus_wsrf_mds_aggregator_stubs.jar:/home/falkon/falkon/container/lib/globus_wsrf_rendezvous_stubs.jar:/home/falkon/falkon/container/lib/globus_wsrf_rft_stubs.jar:/home/falkon/falkon/container/lib/globus_wsrf_tools_test.jar:/home/falkon/falkon/container/lib/gram-client.jar:/home/falkon/falkon/container/lib/gram-stubs.jar:/home/falkon/falkon/container/lib/gram-utils.jar:/home/falkon/falkon/container/lib/jaxrpc.jar:/home/falkon/falkon/container/lib/jce-jdk13-125.jar:/home/falkon/falkon/container/lib/jgss.jar:/home/falkon/falkon/container/lib/junit.jar:/home/falkon/falkon/container/lib/log4j-1.2.8.jar:/home/falkon/falkon/container/lib/naming-common.jar:/home/falkon/falkon/container/lib/naming-factory.jar:/home/falkon/falkon/container/lib/naming-java.jar:/home/falkon/falkon/container/lib/naming-resources.jar:/home/falkon/falkon/container/lib/nom_tam.jar:/home/falkon/falkon/container/lib/opensaml.jar:/home/falkon/falkon/container/lib/org_globus_GenericPortal_common.jar:/home/falkon/falkon/container/lib/org_globus_GenericPortal_services_core_WS.jar:/home/falkon/falkon/container/lib/org_globus_GenericPortal_services_core_WS_stubs.jar:/home/falkon/falkon/container/lib/puretls.jar:/home/falkon/falkon/container/lib/resolver.jar:/home/falkon/falkon/container/lib/saaj.jar:/home/falkon/falkon/container/lib/servlet.jar:/home/falkon/falkon/container/lib/swing-layout-1.0.jar:/home/falkon/falkon/container/lib/wsdl4j.jar:/home/falkon/falkon/container/lib/wsrf_common.jar:/home/falkon/falkon/container/lib/wsrf_core.jar:/home/falkon/falkon/container/lib/wsrf_core_registry.jar:/home/falkon/falkon/container/lib/wsrf_core_registry_stubs.jar:/home/falkon/falkon/container/lib/wsrf_core_samples_authzService.jar:/home/falkon/falkon/container/lib/wsrf_core_samples_authzService_stubs.jar:/home/falkon/falkon/container/lib/wsrf_core_samples_co
unter.jar:/home/falkon/falkon/container/lib/wsrf_core_samples_counter_stubs.jar:/home/falkon/falkon/container/lib/wsrf_core_samples_mgmt.jar:/home/falkon/falkon/container/lib/wsrf_core_samples_mgmt_stubs.jar:/home/falkon/falkon/container/lib/wsrf_core_stubs.jar:/home/falkon/falkon/container/lib/wsrf_mds_index_stubs.jar:/home/falkon/falkon/container/lib/wsrf_mds_usefulrp_schema_stubs.jar:/home/falkon/falkon/container/lib/wsrf_provider_jce.jar:/home/falkon/falkon/container/lib/wsrf_test_interop.jar:/home/falkon/falkon/container/lib/wsrf_test_interop_stubs.jar:/home/falkon/falkon/container/lib/wsrf_test.jar:/home/falkon/falkon/container/lib/wsrf_test_unit.jar:/home/falkon/falkon/container/lib/wsrf_test_unit_stubs.jar:/home/falkon/falkon/container/lib/wsrf_tools.jar:/home/falkon/falkon/container/lib/wss4j.jar:/home/falkon/falkon/container/lib/xalan.jar:/home/falkon/falkon/container/lib/xercesImpl.jar:/home/falkon/falkon/container/lib/xml-apis.jar:/home/falkon/falkon/container/lib/xmlsec.jar echo $SWIFT_HOME zzhang at login6.surveyor:/home/falkon/cog/modules/vdsk/dist/vdsk-svn/examples/vdsk> echo $SWIFT_HOME /home/falkon/cog/modules/vdsk/dist/vdsk-svn/ Ben Clifford wrote: > In one shell session type all the commands that Mihael told you to type, > also: > > echo $PATH > zzhang at login6.surveyor:/home/falkon/cog/modules/vdsk/dist/vdsk-svn/examples/vdsk> echo $PATH 
/home/zzhang/xar/bin:/home/falkon/cog/modules/vdsk/dist/vdsk-svn//bin:/home/falkon/falkon/bin:/home/falkon/falkon/service:/home/falkon/falkon/worker:/home/falkon/falkon/client:/home/falkon/falkon/monitor:/home/falkon/falkon/webserver:/home/falkon/falkon/ploticus/src:/home/falkon/falkon/apache-ant-1.7.0:/home/falkon/falkon/apache-ant-1.7.0/bin:/usr/lib/jvm/java:/usr/lib/jvm/java/bin:/home/falkon/falkon/container:/home/falkon/falkon/container/bin:/bin:/usr/sbin:/etc:/usr/X11R6/bin:/usr/bin:/sbin:/usr/local/bin:/bgsys/drivers/ppcfloor/bin:/bgsys/drivers/ppcfloor/comm/bin:/dbhome/bgpdb2c/sqllib/lib:/opt/ibmcmp/vac/bg/9.0/bin:/opt/ibmcmp/vacpp/bg/9.0/bin:/opt/ibmcmp/xlf/bg/11.1/bin:/software/common/apps/mpiscripts:/software/common/apps/clusterbank-0.3.2/wrap:/software/common/apps/projects-list/bin:/home/zzhang/bin/linux-sles10-ppc64:/home/zzhang/bin:.:/software/common/apps/misc-scripts:/bgsys/drivers/ppcfloor/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin > and: > > swift /home/falkon/cog/module/vdsk/examples/first.swift > zzhang at login6.surveyor:/home/falkon/cog/modules/vdsk/dist/vdsk-svn/examples/vdsk> swift first.swift Swift svn swift-r2249 (Swift modified locally) cog-r2216 RunID: 20081002-1944-9o23gci1 Progress: echo started Sorted: [localhost:0.000(1.000):0/1 overload: 0] echo completed Final status: Finished successfully:1 the log file is at http://www.ci.uchicago.edu/~zzhang/first-20081002-1944-9o23gci1.log zhao > and paste the entire shell session into an email without deleting > anything, without closing any window, without even changing to a > different window or getting up to get a drink or anything else at all. > > From hategan at mcs.anl.gov Thu Oct 2 19:55:31 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 02 Oct 2008 19:55:31 -0500 Subject: [Swift-devel] could swift use return code from falkon as a success notification? 
In-Reply-To: <48E56C05.3040807@uchicago.edu> References: <48E2E314.6040809@uchicago.edu> <1222830081.9463.4.camel@localhost> <48E3F136.6080404@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222976566.23246.0.camel@localhost> <48E52BCD.5010708@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> <48E54AEF.6000403@uchicago.edu> <1222986902.27710.3.camel@localhost> <48E55C3B.5040207@uchicago.edu> <48E56C05.3040807@uchicago.edu> Message-ID: <1222995331.30093.2.camel@localhost> Now do the exact same thing (that means everything Ben said), but instead of "swift /home/falkon/cog/module/vdsk/examples/first.swift" run "swift -sites.file ./sites.xml -tc.file ./tc.data sleep.swift" From benc at hawaga.org.uk Thu Oct 2 20:06:34 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 3 Oct 2008 01:06:34 +0000 (GMT) Subject: [Swift-devel] could swift use return code from falkon as a success notification? 
In-Reply-To: <48E56C05.3040807@uchicago.edu> References: <48E2E314.6040809@uchicago.edu> <1222830081.9463.4.camel@localhost> <48E3F136.6080404@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222976566.23246.0.camel@localhost> <48E52BCD.5010708@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> <48E54AEF.6000403@uchicago.edu> <1222986902.27710.3.camel@localhost> <48E55C3B.5040207@uchicago.edu> <48E56C05.3040807@uchicago.edu> Message-ID: On Thu, 2 Oct 2008, Zhao Zhang wrote: > Swift svn swift-r2249 (Swift modified locally) cog-r2216 ok, so you got a recent swift there - more recent even than what I have on my laptop. Now in the same window where you got the above versions (if you closed the window, run first.swift again and check you get exactly the same version string), please paste exactly what you type to make Swift run through falkon. -- From zhaozhang at uchicago.edu Thu Oct 2 21:54:59 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Thu, 02 Oct 2008 21:54:59 -0500 Subject: [Swift-devel] could swift use return code from falkon as a success notification? 
In-Reply-To: <1222995331.30093.2.camel@localhost> References: <48E2E314.6040809@uchicago.edu> <1222830081.9463.4.camel@localhost> <48E3F136.6080404@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222976566.23246.0.camel@localhost> <48E52BCD.5010708@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> <48E54AEF.6000403@uchicago.edu> <1222986902.27710.3.camel@localhost> <48E55C3B.5040207@uchicago.edu> <48E56C05.3040807@uchicago.edu> <1222995331.30093.2.camel@localhost> Message-ID: <48E58983.8090307@uchicago.edu> yep, I use the same linux session, get an allocation of 64 CN on BGP. still get the same error Failed to transfer wrapper log from sleep-20081002-2151-53sor77c/info/7/m/bgp0 Failed to transfer wrapper log from sleep-20081002-2151-53sor77c/info/a/l/bgp0 Failed to transfer wrapper log from sleep-20081002-2151-53sor77c/info/q/l/bgp0 Failed to transfer wrapper log from sleep-20081002-2151-53sor77c/info/o/l/bgp0 Failed to transfer wrapper log from sleep-20081002-2151-53sor77c/info/f/m/bgp0 Failed to transfer wrapper log from sleep-20081002-2151-53sor77c/info/4/m/bgp0 Sorted: [bgp0:35.918(66.664):208/534 overload: 0] Sorted: [bgp0:35.918(66.664):209/534 overload: 0] Sorted: [bgp0:35.818(66.590):209/533 overload: 0] Sorted: [bgp0:35.818(66.590):210/533 overload: 0] Sorted: [bgp0:35.818(66.590):211/533 overload: 0] Failed to transfer wrapper log from sleep-20081002-2151-53sor77c/info/m/l/bgp0 Progress: Stage in:1 Submitted:244 Failed but can retry:11 Sorted: [bgp0:34.918(65.904):212/528 overload: 0] Failed to transfer wrapper log from sleep-20081002-2151-53sor77c/info/1/l/bgp0 Failed to transfer wrapper log from sleep-20081002-2151-53sor77c/info/u/k/bgp0 Failed 
to transfer wrapper log from sleep-20081002-2151-53sor77c/info/e/l/bgp0 Failed to transfer wrapper log from sleep-20081002-2151-53sor77c/info/2/l/bgp0 Sorted: [bgp0:33.426(64.706):210/518 overload: 0] Failed to transfer wrapper log from sleep-20081002-2151-53sor77c/info/9/l/bgp0 Sorted: [bgp0:33.529(64.791):210/519 overload: 0] Sorted: [bgp0:33.529(64.791):211/519 overload: 0] Sorted: [bgp0:33.334(64.629):210/518 overload: 0] Sorted: [bgp0:33.334(64.629):211/518 overload: 0] Sorted: [bgp0:33.334(64.629):212/518 overload: 0] Waiting for notification for 0 ms Received notification with 1 messages Waiting for notification for 0 ms Received notification with 1 messages Progress: Submitted:255 Active:1 Failed to transfer wrapper log from sleep-20081002-2151-53sor77c/info/t/m/bgp0 Failed to transfer wrapper log from sleep-20081002-2151-53sor77c/info/v/m/bgp0 sleep failed sleep failed Execution failed: Exception in sleep: Arguments: [30] Host: bgp0 Directory: sleep-20081002-2151-53sor77c/jobs/t/m/sleep-tms9ka0j stderr.txt: stdout.txt: ---- Caused by: No status file was found. Check the shared filesystem on bgp0 zhao Mihael Hategan wrote: > Now do the exact same thing (that means everything Ben said), but > instead of "swift /home/falkon/cog/module/vdsk/examples/first.swift" > run "swift -sites.file ./sites.xml -tc.file ./tc.data sleep.swift" > > > > > From zhaozhang at uchicago.edu Thu Oct 2 21:55:36 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Thu, 02 Oct 2008 21:55:36 -0500 Subject: [Swift-devel] could swift use return code from falkon as a success notification? 
In-Reply-To: <48E58983.8090307@uchicago.edu> References: <48E2E314.6040809@uchicago.edu> <1222830081.9463.4.camel@localhost> <48E3F136.6080404@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222976566.23246.0.camel@localhost> <48E52BCD.5010708@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> <48E54AEF.6000403@uchicago.edu> <1222986902.27710.3.camel@localhost> <48E55C3B.5040207@uchicago.edu> <48E56C05.3040807@uchicago.edu> <1222995331.30093.2.camel@localhost> <48E58983.8090307@uchicago.edu> Message-ID: <48E589A8.5070604@uchicago.edu> oops, the log is at http://www.ci.uchicago.edu/~zzhang/sleep-20081002-2151-53sor77c.log Zhao Zhang wrote: > yep, I use the same linux session, get an allocation of 64 CN on BGP. 
> > still get the same error > Failed to transfer wrapper log from > sleep-20081002-2151-53sor77c/info/7/m/bgp0 > Failed to transfer wrapper log from > sleep-20081002-2151-53sor77c/info/a/l/bgp0 > Failed to transfer wrapper log from > sleep-20081002-2151-53sor77c/info/q/l/bgp0 > Failed to transfer wrapper log from > sleep-20081002-2151-53sor77c/info/o/l/bgp0 > Failed to transfer wrapper log from > sleep-20081002-2151-53sor77c/info/f/m/bgp0 > Failed to transfer wrapper log from > sleep-20081002-2151-53sor77c/info/4/m/bgp0 > Sorted: [bgp0:35.918(66.664):208/534 overload: 0] > Sorted: [bgp0:35.918(66.664):209/534 overload: 0] > Sorted: [bgp0:35.818(66.590):209/533 overload: 0] > Sorted: [bgp0:35.818(66.590):210/533 overload: 0] > Sorted: [bgp0:35.818(66.590):211/533 overload: 0] > Failed to transfer wrapper log from > sleep-20081002-2151-53sor77c/info/m/l/bgp0 > Progress: Stage in:1 Submitted:244 Failed but can retry:11 > Sorted: [bgp0:34.918(65.904):212/528 overload: 0] > Failed to transfer wrapper log from > sleep-20081002-2151-53sor77c/info/1/l/bgp0 > Failed to transfer wrapper log from > sleep-20081002-2151-53sor77c/info/u/k/bgp0 > Failed to transfer wrapper log from > sleep-20081002-2151-53sor77c/info/e/l/bgp0 > Failed to transfer wrapper log from > sleep-20081002-2151-53sor77c/info/2/l/bgp0 > Sorted: [bgp0:33.426(64.706):210/518 overload: 0] > Failed to transfer wrapper log from > sleep-20081002-2151-53sor77c/info/9/l/bgp0 > Sorted: [bgp0:33.529(64.791):210/519 overload: 0] > Sorted: [bgp0:33.529(64.791):211/519 overload: 0] > Sorted: [bgp0:33.334(64.629):210/518 overload: 0] > Sorted: [bgp0:33.334(64.629):211/518 overload: 0] > Sorted: [bgp0:33.334(64.629):212/518 overload: 0] > Waiting for notification for 0 ms > Received notification with 1 messages > Waiting for notification for 0 ms > Received notification with 1 messages > Progress: Submitted:255 Active:1 > Failed to transfer wrapper log from > sleep-20081002-2151-53sor77c/info/t/m/bgp0 > Failed to 
transfer wrapper log from > sleep-20081002-2151-53sor77c/info/v/m/bgp0 > sleep failed > sleep failed > Execution failed: > Exception in sleep: > Arguments: [30] > Host: bgp0 > Directory: sleep-20081002-2151-53sor77c/jobs/t/m/sleep-tms9ka0j > stderr.txt: > > stdout.txt: > > ---- > > Caused by: > No status file was found. Check the shared filesystem on bgp0 > > zhao > > Mihael Hategan wrote: >> Now do the exact same thing (that means everything Ben said), but >> instead of "swift /home/falkon/cog/module/vdsk/examples/first.swift" >> run "swift -sites.file ./sites.xml -tc.file ./tc.data sleep.swift" >> >> >> >> >> > From hategan at mcs.anl.gov Thu Oct 2 22:17:42 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 02 Oct 2008 22:17:42 -0500 Subject: [Swift-devel] could swift use return code from falkon as a success notification? In-Reply-To: <48E58983.8090307@uchicago.edu> References: <48E2E314.6040809@uchicago.edu> <1222830081.9463.4.camel@localhost> <48E3F136.6080404@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222976566.23246.0.camel@localhost> <48E52BCD.5010708@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> <48E54AEF.6000403@uchicago.edu> <1222986902.27710.3.camel@localhost> <48E55C3B.5040207@uchicago.edu> <48E56C05.3040807@uchicago.edu> <1222995331.30093.2.camel@localhost> <48E58983.8090307@uchicago.edu> Message-ID: <1223003862.32358.2.camel@localhost> Please type the commands mentioned earlier and paste in an email their output and everything else between that and the part where you run swift. On Thu, 2008-10-02 at 21:54 -0500, Zhao Zhang wrote: > yep, I use the same linux session, get an allocation of 64 CN on BGP. 
> > still get the same error > Failed to transfer wrapper log from > sleep-20081002-2151-53sor77c/info/7/m/bgp0 > Failed to transfer wrapper log from > sleep-20081002-2151-53sor77c/info/a/l/bgp0 > Failed to transfer wrapper log from > sleep-20081002-2151-53sor77c/info/q/l/bgp0 > Failed to transfer wrapper log from > sleep-20081002-2151-53sor77c/info/o/l/bgp0 > Failed to transfer wrapper log from > sleep-20081002-2151-53sor77c/info/f/m/bgp0 > Failed to transfer wrapper log from > sleep-20081002-2151-53sor77c/info/4/m/bgp0 > Sorted: [bgp0:35.918(66.664):208/534 overload: 0] > Sorted: [bgp0:35.918(66.664):209/534 overload: 0] > Sorted: [bgp0:35.818(66.590):209/533 overload: 0] > Sorted: [bgp0:35.818(66.590):210/533 overload: 0] > Sorted: [bgp0:35.818(66.590):211/533 overload: 0] > Failed to transfer wrapper log from > sleep-20081002-2151-53sor77c/info/m/l/bgp0 > Progress: Stage in:1 Submitted:244 Failed but can retry:11 > Sorted: [bgp0:34.918(65.904):212/528 overload: 0] > Failed to transfer wrapper log from > sleep-20081002-2151-53sor77c/info/1/l/bgp0 > Failed to transfer wrapper log from > sleep-20081002-2151-53sor77c/info/u/k/bgp0 > Failed to transfer wrapper log from > sleep-20081002-2151-53sor77c/info/e/l/bgp0 > Failed to transfer wrapper log from > sleep-20081002-2151-53sor77c/info/2/l/bgp0 > Sorted: [bgp0:33.426(64.706):210/518 overload: 0] > Failed to transfer wrapper log from > sleep-20081002-2151-53sor77c/info/9/l/bgp0 > Sorted: [bgp0:33.529(64.791):210/519 overload: 0] > Sorted: [bgp0:33.529(64.791):211/519 overload: 0] > Sorted: [bgp0:33.334(64.629):210/518 overload: 0] > Sorted: [bgp0:33.334(64.629):211/518 overload: 0] > Sorted: [bgp0:33.334(64.629):212/518 overload: 0] > Waiting for notification for 0 ms > Received notification with 1 messages > Waiting for notification for 0 ms > Received notification with 1 messages > Progress: Submitted:255 Active:1 > Failed to transfer wrapper log from > sleep-20081002-2151-53sor77c/info/t/m/bgp0 > Failed to 
transfer wrapper log from > sleep-20081002-2151-53sor77c/info/v/m/bgp0 > sleep failed > sleep failed > Execution failed: > Exception in sleep: > Arguments: [30] > Host: bgp0 > Directory: sleep-20081002-2151-53sor77c/jobs/t/m/sleep-tms9ka0j > stderr.txt: > > stdout.txt: > > ---- > > Caused by: > No status file was found. Check the shared filesystem on bgp0 > > zhao > > Mihael Hategan wrote: > > Now do the exact same thing (that means everything Ben said), but > > instead of "swift /home/falkon/cog/module/vdsk/examples/first.swift" > > run "swift -sites.file ./sites.xml -tc.file ./tc.data sleep.swift" > > > > > > > > > > From zhaozhang at uchicago.edu Thu Oct 2 22:46:03 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Thu, 02 Oct 2008 22:46:03 -0500 Subject: [Swift-devel] could swift use return code from falkon as a success notification? In-Reply-To: <1223003862.32358.2.camel@localhost> References: <48E2E314.6040809@uchicago.edu> <1222830081.9463.4.camel@localhost> <48E3F136.6080404@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222976566.23246.0.camel@localhost> <48E52BCD.5010708@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> <48E54AEF.6000403@uchicago.edu> <1222986902.27710.3.camel@localhost> <48E55C3B.5040207@uchicago.edu> <48E56C05.3040807@uchicago.edu> <1222995331.30093.2.camel@localhost> <48E58983.8090307@uchicago.edu> <1223003862.32358.2.camel@localhost> Message-ID: <48E5957B.3040604@uchicago.edu> zzhang at login6.surveyor:~> echo $SWIFT_HOME /home/falkon/cog/modules/vdsk/dist/vdsk-svn/ zzhang at login6.surveyor:~> echo $CLASSPATH 
.:/home/falkon/falkon/container:/home/falkon/falkon/container/build/classes:/home/falkon/falkon/container/lib/addressing-1.0.jar:/home/falkon/falkon/container/lib/axis.jar:/home/falkon/falkon/container/lib/axis-url.jar:/home/falkon/falkon/container/lib/bootstrap.jar:/home/falkon/falkon/container/lib/cog-axis.jar:/home/falkon/falkon/container/lib/cog-jglobus.jar:/home/falkon/falkon/container/lib/cog-tomcat.jar:/home/falkon/falkon/container/lib/cog-url.jar:/home/falkon/falkon/container/lib/commonj.jar:/home/falkon/falkon/container/lib/commons-beanutils.jar:/home/falkon/falkon/container/lib/commons-cli-2.0.jar:/home/falkon/falkon/container/lib/commons-collections-3.0.jar:/home/falkon/falkon/container/lib/commons-digester.jar:/home/falkon/falkon/container/lib/commons-discovery.jar:/home/falkon/falkon/container/lib/commons-logging.jar:/home/falkon/falkon/container/lib/concurrent.jar:/home/falkon/falkon/container/lib/cryptix32.jar:/home/falkon/falkon/container/lib/cryptix-asn1.jar:/home/falkon/falkon/container/lib/cryptix.jar:/home/falkon/falkon/container/lib/globus_delegation_service.jar:/home/falkon/falkon/container/lib/globus_delegation_stubs.jar:/home/falkon/falkon/container/lib/globus_usage_core.jar:/home/falkon/falkon/container/lib/globus_usage_packets_common.jar:/home/falkon/falkon/container/lib/globus_wsrf_mds_aggregator_stubs.jar:/home/falkon/falkon/container/lib/globus_wsrf_rendezvous_stubs.jar:/home/falkon/falkon/container/lib/globus_wsrf_rft_stubs.jar:/home/falkon/falkon/container/lib/globus_wsrf_tools_test.jar:/home/falkon/falkon/container/lib/gram-client.jar:/home/falkon/falkon/container/lib/gram-stubs.jar:/home/falkon/falkon/container/lib/gram-utils.jar:/home/falkon/falkon/container/lib/jaxrpc.jar:/home/falkon/falkon/container/lib/jce-jdk13-125.jar:/home/falkon/falkon/container/lib/jgss.jar:/home/falkon/falkon/container/lib/junit.jar:/home/falkon/falkon/container/lib/log4j-1.2.8.jar:/home/falkon/falkon/container/lib/naming-common.jar:/home/falkon/falkon/con
tainer/lib/naming-factory.jar:/home/falkon/falkon/container/lib/naming-java.jar:/home/falkon/falkon/container/lib/naming-resources.jar:/home/falkon/falkon/container/lib/nom_tam.jar:/home/falkon/falkon/container/lib/opensaml.jar:/home/falkon/falkon/container/lib/org_globus_GenericPortal_common.jar:/home/falkon/falkon/container/lib/org_globus_GenericPortal_services_core_WS.jar:/home/falkon/falkon/container/lib/org_globus_GenericPortal_services_core_WS_stubs.jar:/home/falkon/falkon/container/lib/puretls.jar:/home/falkon/falkon/container/lib/resolver.jar:/home/falkon/falkon/container/lib/saaj.jar:/home/falkon/falkon/container/lib/servlet.jar:/home/falkon/falkon/container/lib/swing-layout-1.0.jar:/home/falkon/falkon/container/lib/wsdl4j.jar:/home/falkon/falkon/container/lib/wsrf_common.jar:/home/falkon/falkon/container/lib/wsrf_core.jar:/home/falkon/falkon/container/lib/wsrf_core_registry.jar:/home/falkon/falkon/container/lib/wsrf_core_registry_stubs.jar:/home/falkon/falkon/container/lib/wsrf_core_samples_authzService.jar:/home/falkon/falkon/container/lib/wsrf_core_samples_authzService_stubs.jar:/home/falkon/falkon/container/lib/wsrf_core_samples_counter.jar:/home/falkon/falkon/container/lib/wsrf_core_samples_counter_stubs.jar:/home/falkon/falkon/container/lib/wsrf_core_samples_mgmt.jar:/home/falkon/falkon/container/lib/wsrf_core_samples_mgmt_stubs.jar:/home/falkon/falkon/container/lib/wsrf_core_stubs.jar:/home/falkon/falkon/container/lib/wsrf_mds_index_stubs.jar:/home/falkon/falkon/container/lib/wsrf_mds_usefulrp_schema_stubs.jar:/home/falkon/falkon/container/lib/wsrf_provider_jce.jar:/home/falkon/falkon/container/lib/wsrf_test_interop.jar:/home/falkon/falkon/container/lib/wsrf_test_interop_stubs.jar:/home/falkon/falkon/container/lib/wsrf_test.jar:/home/falkon/falkon/container/lib/wsrf_test_unit.jar:/home/falkon/falkon/container/lib/wsrf_test_unit_stubs.jar:/home/falkon/falkon/container/lib/wsrf_tools.jar:/home/falkon/falkon/container/lib/wss4j.jar:/home/falkon/falkon/c
ontainer/lib/xalan.jar:/home/falkon/falkon/container/lib/xercesImpl.jar:/home/falkon/falkon/container/lib/xml-apis.jar:/home/falkon/falkon/container/lib/xmlsec.jar zzhang at login6.surveyor:~> echo $PATH /home/zzhang/xar/bin:/home/falkon/cog/modules/vdsk/dist/vdsk-svn//bin:/home/falkon/falkon/bin:/home/falkon/falkon/service:/home/falkon/falkon/worker:/home/falkon/falkon/client:/home/falkon/falkon/monitor:/home/falkon/falkon/webserver:/home/falkon/falkon/ploticus/src:/home/falkon/falkon/apache-ant-1.7.0:/home/falkon/falkon/apache-ant-1.7.0/bin:/usr/lib/jvm/java:/usr/lib/jvm/java/bin:/home/falkon/falkon/container:/home/falkon/falkon/container/bin:/bin:/usr/sbin:/etc:/usr/X11R6/bin:/usr/bin:/sbin:/usr/local/bin:/bgsys/drivers/ppcfloor/bin:/bgsys/drivers/ppcfloor/comm/bin:/dbhome/bgpdb2c/sqllib/lib:/opt/ibmcmp/vac/bg/9.0/bin:/opt/ibmcmp/vacpp/bg/9.0/bin:/opt/ibmcmp/xlf/bg/11.1/bin:/software/common/apps/mpiscripts:/software/common/apps/clusterbank-0.3.2/wrap:/software/common/apps/projects-list/bin:/home/zzhang/bin/linux-sles10-ppc64:/home/zzhang/bin:.:/software/common/apps/misc-scripts:/bgsys/drivers/ppcfloor/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin zzhang at login6.surveyor:~/swift/etc> swift -sites.file ./sites.xml -tc.file ./tc.data sleep.swift Swift svn swift-r2249 (Swift modified locally) cog-r2216 RunID: 20081002-2244-4274vop7 Failed to transfer wrapper log from sleep-20081002-2244-4274vop7/info/r/e/bgp0 Failed to transfer wrapper log from sleep-20081002-2244-4274vop7/info/3/f/bgp0 Failed to transfer wrapper log from sleep-20081002-2244-4274vop7/info/z/c/bgp0 Sorted: [bgp0:619.143(97.660):254/782 overload: 0] zhao From hategan at mcs.anl.gov Thu Oct 2 23:46:34 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 02 Oct 2008 23:46:34 -0500 Subject: [Swift-devel] could swift use return code from falkon as a success notification? 
In-Reply-To: <48E5957B.3040604@uchicago.edu>
References: <48E2E314.6040809@uchicago.edu> <1222830081.9463.4.camel@localhost> <48E3F136.6080404@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222976566.23246.0.camel@localhost> <48E52BCD.5010708@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> <48E54AEF.6000403@uchicago.edu> <1222986902.27710.3.camel@localhost> <48E55C3B.5040207@uchicago.edu> <48E56C05.3040807@uchicago.edu> <1222995331.30093.2.camel@localhost> <48E58983.8090307@uchicago.edu> <1223003862.32358.2.camel@localhost> <48E5957B.3040604@uchicago.edu>
Message-ID: <1223009194.1243.1.camel@localhost>

On Thu, 2008-10-02 at 22:46 -0500, Zhao Zhang wrote:
> zzhang at login6.surveyor:~> echo $SWIFT_HOME
> /home/falkon/cog/modules/vdsk/dist/vdsk-svn/
>
> zzhang at login6.surveyor:~/swift/etc> swift -sites.file ./sites.xml
> -tc.file ./tc.data sleep.swift

Please paste the output of the following:

ls ~/swift/etc

From zhaozhang at uchicago.edu Thu Oct 2 23:48:35 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Thu, 02 Oct 2008 23:48:35 -0500
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To: <1223009194.1243.1.camel@localhost>
References: <48E2E314.6040809@uchicago.edu> <48E3F136.6080404@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222976566.23246.0.camel@localhost> <48E52BCD.5010708@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> <48E54AEF.6000403@uchicago.edu> <1222986902.27710.3.camel@localhost> <48E55C3B.5040207@uchicago.edu> <48E56C05.3040807@uchicago.edu> <1222995331.30093.2.camel@localhost> <48E58983.8090307@uchicago.edu> <1223003862.32358.2.camel@localhost> <48E5957B.3040604@uchicago.edu> <1223009194.1243.1.camel@localhost>
Message-ID: <48E5A423.4080200@uchicago.edu>

zzhang at login6.surveyor:~/swift/etc> ls -l ~/swift/etc
total 158848
-rw-r--r-- 1 zzhang users 48 2008-07-31 13:21 dock2-20080731-1321-lrv4n1gg.0.rlog
-rw-r--r-- 1 zzhang users 22852 2008-07-31 13:21 dock2-20080731-1321-lrv4n1gg.log
-rw-r--r-- 1 zzhang users 0 2008-07-31 13:24 dock2-20080731-1324-mr1yjdaf.0.rlog
drwxr-xr-x 2 zzhang users 131072 2008-07-31 13:24 dock2-20080731-1324-mr1yjdaf.d
-rw-r--r-- 1 zzhang users 36321 2008-07-31 13:24 dock2-20080731-1324-mr1yjdaf.log
-rw-r--r-- 1 zzhang users 29826 2008-07-31 13:29 dock2-20080731-1329-77k1u9yf.log
-rw-r--r-- 1 zzhang users 30070 2008-07-31 13:32 dock2-20080731-1332-8m8lpaq1.log
-rw-r--r-- 1 zzhang users 13462 2008-07-31 13:33 dock2-20080731-1333-bq7avf63.log
-rw-r--r-- 1 zzhang users 32335 2008-07-31 13:49 dock2-20080731-1348-2elgi2i5.log
-rw-r--r-- 1 zzhang users 49104 2008-07-31 13:52 dock2-20080731-1351-lslry6kf.log
-rw-r--r-- 1 zzhang users 52184 2008-08-01 11:36 dock2-20080801-1135-rgtyk5n9.log
-rw-r--r-- 1 zzhang users 52421 2008-08-01 15:25 dock2-20080801-1525-956krv41.log
-rw-r--r-- 1 zzhang users 5301 2008-07-31 13:48 dock2.kml
-rw-r--r-- 1 zzhang users 448 2008-07-31 13:47 dock2.swift
-rw-r--r-- 1 zzhang users 439 2008-07-31 13:20 dock2.swift~
-rw-r--r-- 1 zzhang users 3058 2008-07-31 13:48 dock2.xml
-rw-r--r-- 1 zzhang users 179 2008-07-21 15:24 first.swift
-rw-r--r-- 1 zzhang users 51604 2008-09-23 15:09 log
-rwxr-xr-x 1 zzhang users 45 2008-10-01 11:07 makedir
-rw-r--r-- 1 zzhang users 174 2008-10-01 10:30 #makedir#
-rwxr-xr-x 1 zzhang users 35 2008-10-01 11:04 makedir~
-rw-r--r-- 1 zzhang users 8850 2008-10-01 10:34 makeworkload
-rw-r--r-- 1 zzhang users 301 2008-07-31 13:51 paramlist
-rw-r--r-- 1 zzhang users 163 2008-07-31 13:47 paramlist~
drwxr-xr-x 2 zzhang users 131072 2008-09-23 13:23 report-sleep-20080923-1310-o4a1kfpf
drwxr-xr-x 2 zzhang users 131072 2008-09-23 15:13 report-sleep-20080923-1458-ohbjdd6c
drwxr-xr-x 2 zzhang users 131072 2008-09-30 11:20 report-sleep-20080930-1115-o4n7t837
drwxr-xr-x 2 zzhang users 131072 2008-09-30 12:05 report-sleep-20080930-1157-j9g7o4ab
drwxr-xr-x 2 zzhang users 131072 2008-09-30 14:39 report-sleep-20080930-1436-qmuf7e12
drwxr-xr-x 2 zzhang users 131072 2008-09-30 14:57 report-sleep-20080930-1446-u7l7yb6f
drwxr-xr-x 2 zzhang users 131072 2008-09-30 15:27 report-sleep-20080930-1523-v6eta39f
drwxr-xr-x 2 zzhang users 131072 2008-09-30 15:37 report-sleep-20080930-1533-265jfd55
drwxr-xr-x 2 zzhang users 131072 2008-09-30 15:59 report-sleep-20080930-1553-0mnvpezg
drwxr-xr-x 2 zzhang users 131072 2008-09-30 21:56 report-sleep-20080930-2151-essbrds2
drwxr-xr-x 2 zzhang users 131072 2008-09-30 22:01 report-sleep-20080930-2157-gv79pa0g
drwxr-xr-x 2 zzhang users 131072 2008-09-30 22:09 report-sleep-20080930-2206-4dtkc5k9
drwxr-xr-x 2 zzhang users 131072 2008-09-30 23:15 report-sleep-20080930-2311-pn6sbed7
drwxr-xr-x 2 zzhang users 131072 2008-10-01 11:20 report-sleep-20081001-1116-fjt1qtlg
drwxr-xr-x 2 zzhang users 131072 2008-10-01 13:43 report-sleep-20081001-1340-tc37c9cg
drwxr-xr-x 2 zzhang users 131072 2008-10-01 13:53 report-sleep-20081001-1348-6gygxi78
drwxr-xr-x 2 zzhang users 131072 2008-10-01 14:04 report-sleep-20081001-1400-7k9veu61
drwxr-xr-x 2 zzhang users 131072 2008-10-01 14:23 report-sleep-20081001-1418-yiksiatc
drwxr-xr-x 2 zzhang users 131072 2008-10-01 14:31 report-sleep-20081001-1428-e0zvs2ng
drwxr-xr-x 2 zzhang users 131072 2008-10-01 14:40 report-sleep-20081001-1437-3rp5z5k2
drwxr-xr-x 2 zzhang users 131072 2008-10-01 14:46 report-sleep-20081001-1443-9ap03bya
-rw-r--r-- 1 zzhang users 740 2008-07-31 21:50 #sites.xml#
-rw-r--r-- 1 zzhang users 741 2008-10-02 22:44 sites.xml
-rw-r--r-- 1 zzhang users 742 2008-10-02 21:47 sites.xml~
drwxr-xr-x 2 zzhang users 131072 2008-10-01 11:18 sleep-20081001-1116-fjt1qtlg.d
-rw-r--r-- 1 zzhang users 5978421 2008-10-01 11:18 sleep-20081001-1116-fjt1qtlg.log
drwxr-xr-x 2 zzhang users 131072 2008-10-01 13:42 sleep-20081001-1340-tc37c9cg.d
-rw-r--r-- 1 zzhang users 5965829 2008-10-01 13:42 sleep-20081001-1340-tc37c9cg.log
drwxr-xr-x 2 zzhang users 131072 2008-10-01 13:50 sleep-20081001-1348-6gygxi78.d
-rw-r--r-- 1 zzhang users 5965018 2008-10-01 13:50 sleep-20081001-1348-6gygxi78.log
drwxr-xr-x 2 zzhang users 131072 2008-10-01 14:02 sleep-20081001-1400-7k9veu61.d
-rw-r--r-- 1 zzhang users 5978042 2008-10-01 14:02 sleep-20081001-1400-7k9veu61.log
-rw-r--r-- 1 zzhang users 0 2008-10-01 14:12 sleep-20081001-1412-19d51n85.0.rlog
drwxr-xr-x 2 zzhang users 131072 2008-10-01 14:14 sleep-20081001-1412-19d51n85.d
-rw-r--r-- 1 zzhang users 9057141 2008-10-01 14:14 sleep-20081001-1412-19d51n85.log
drwxr-xr-x 2 zzhang users 131072 2008-10-01 14:20 sleep-20081001-1418-yiksiatc.d
-rw-r--r-- 1 zzhang users 5991840 2008-10-01 14:20 sleep-20081001-1418-yiksiatc.log
drwxr-xr-x 2 zzhang users 131072 2008-10-01 14:28 sleep-20081001-1426-1t0ia81e.d
-rw-r--r-- 1 zzhang users 5978206 2008-10-01 14:28 sleep-20081001-1426-1t0ia81e.log
drwxr-xr-x 2 zzhang users 131072 2008-10-01 14:30 sleep-20081001-1428-e0zvs2ng.d
-rw-r--r-- 1 zzhang users 5952062 2008-10-01 14:30 sleep-20081001-1428-e0zvs2ng.log
drwxr-xr-x 2 zzhang users 131072 2008-10-01 14:38 sleep-20081001-1437-3rp5z5k2.d
-rw-r--r-- 1 zzhang users 5978868 2008-10-01 14:38 sleep-20081001-1437-3rp5z5k2.log
-rw-r--r-- 1 zzhang users 5427040 2008-10-01 14:45 sleep-20081001-1443-9ap03bya.log
-rw-r--r-- 1 zzhang users 5426797 2008-10-01 14:51 sleep-20081001-1450-uki952id.log
-rw-r--r-- 1 zzhang users 48 2008-10-01 15:36 sleep-20081001-1534-tiiz3ndf.0.rlog
drwxr-xr-x 2 zzhang users 131072 2008-10-01 15:35 sleep-20081001-1534-tiiz3ndf.d
-rw-r--r-- 1 zzhang users 12152674 2008-10-01 15:36 sleep-20081001-1534-tiiz3ndf.log
-rw-r--r-- 1 zzhang users 0 2008-10-01 15:46 sleep-20081001-1546-pocfmzug.0.rlog
drwxr-xr-x 2 zzhang users 131072 2008-10-01 15:47 sleep-20081001-1546-pocfmzug.d
-rw-r--r-- 1 zzhang users 6042086 2008-10-01 15:47 sleep-20081001-1546-pocfmzug.log
-rw-r--r-- 1 zzhang users 0 2008-10-01 15:55 sleep-20081001-1555-6vluoyz3.0.rlog
drwxr-xr-x 2 zzhang users 131072 2008-10-01 15:57 sleep-20081001-1555-6vluoyz3.d
-rw-r--r-- 1 zzhang users 8518465 2008-10-01 15:57 sleep-20081001-1555-6vluoyz3.log
-rw-r--r-- 1 zzhang users 48 2008-10-01 15:59 sleep-20081001-1558-in93l5j4.0.rlog
drwxr-xr-x 2 zzhang users 131072 2008-10-01 15:59 sleep-20081001-1558-in93l5j4.d
-rw-r--r-- 1 zzhang users 10373721 2008-10-01 15:59 sleep-20081001-1558-in93l5j4.log
-rw-r--r-- 1 zzhang users 0 2008-10-02 16:35 sleep-20081002-1635-21n9ho6b.0.rlog
drwxr-xr-x 2 zzhang users 131072 2008-10-02 16:36 sleep-20081002-1635-21n9ho6b.d
-rw-r--r-- 1 zzhang users 8410292 2008-10-02 16:36 sleep-20081002-1635-21n9ho6b.log
-rw-r--r-- 1 zzhang users 0 2008-10-02 17:19 sleep-20081002-1719-gd0m85r6.0.rlog
drwxr-xr-x 2 zzhang users 131072 2008-10-02 17:20 sleep-20081002-1719-gd0m85r6.d
-rw-r--r-- 1 zzhang users 8423901 2008-10-02 17:20 sleep-20081002-1719-gd0m85r6.log
-rw-r--r-- 1 zzhang users 48 2008-10-02 21:53 sleep-20081002-2151-53sor77c.0.rlog
drwxr-xr-x 2 zzhang users 131072 2008-10-02 21:52 sleep-20081002-2151-53sor77c.d
-rw-r--r-- 1 zzhang users 12264428 2008-10-02 21:53 sleep-20081002-2151-53sor77c.log
-rw-r--r-- 1 zzhang users 48 2008-10-02 22:46 sleep-20081002-2244-4274vop7.0.rlog
drwxr-xr-x 2 zzhang users 131072 2008-10-02 22:45 sleep-20081002-2244-4274vop7.d
-rw-r--r-- 1 zzhang users 12279653 2008-10-02 22:46 sleep-20081002-2244-4274vop7.log
-rw-r--r-- 1 zzhang users 2022 2008-09-30 14:36 sleep.kml
-rw-r--r-- 1 zzhang users 152 2008-09-30 14:31 sleep.swift
-rw-r--r-- 1 zzhang users 151 2008-09-30 11:09 sleep.swift~
-rw-r--r-- 1 zzhang users 1016 2008-09-30 14:36 sleep.xml
-rw-r--r-- 1 zzhang users 3135 2008-10-02 22:44 swift.log
-rw-r--r-- 1 zzhang users 1819 2008-07-31 11:47 tc.data
-rw-r--r-- 1 zzhang users 15372 2008-10-01 14:31 vdl-int.k
-rw-r--r-- 1 zzhang users 5897 2008-10-01 14:42 wrapper.sh
-rw-r--r-- 1 zzhang users 5896 2008-10-01 14:33 wrapper.sh~

Mihael Hategan wrote:
> On Thu, 2008-10-02 at 22:46 -0500, Zhao Zhang wrote:
>
>> zzhang at login6.surveyor:~> echo $SWIFT_HOME
>> /home/falkon/cog/modules/vdsk/dist/vdsk-svn/
>>
>> zzhang at login6.surveyor:~/swift/etc> swift -sites.file ./sites.xml
>> -tc.file ./tc.data sleep.swift
>>
>
> Please paste the output of the following:
>
> ls ~/swift/etc
>
>
>

From hategan at mcs.anl.gov Fri Oct 3 00:02:56 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 03 Oct 2008 00:02:56 -0500
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To: <48E5A423.4080200@uchicago.edu>
References: <48E2E314.6040809@uchicago.edu> <48E3F136.6080404@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222976566.23246.0.camel@localhost> <48E52BCD.5010708@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> <48E54AEF.6000403@uchicago.edu> <1222986902.27710.3.camel@localhost> <48E55C3B.5040207@uchicago.edu> <48E56C05.3040807@uchicago.edu> <1222995331.30093.2.camel@localhost> <48E58983.8090307@uchicago.edu> <1223003862.32358.2.camel@localhost> <48E5957B.3040604@uchicago.edu> <1223009194.1243.1.camel@localhost> <48E5A423.4080200@uchicago.edu>
Message-ID: <1223010176.1562.6.camel@localhost>

On Thu, 2008-10-02 at 23:48 -0500, Zhao Zhang wrote:
> -rw-r--r-- 1 zzhang users 15372 2008-10-01 14:31 vdl-int.k

There you go. You have "." in your classpath, and you're running swift
from ~/swift/etc, which contains a bogus vdl-int.k.

There are a few options for you:
1. Remove "." from CLASSPATH
1.a. Remove everything from CLASSPATH. At least when running Swift, you
don't need anything there (unless you KNOW otherwise)
2. Run swift from another directory that does not contain some random
vdl-int.k (or any other swift libraries)
3. Remove vdl-int.k from the etc directory

And there is one option for me:
1. Swift/cog launchers using a pre-existing CLASSPATH has so far caused
only problems, and I have never heard of anybody having a legitimate use
for this behavior (which may be because people don't complain if they
don't have a problem). So I'm thinking of removing that. Any opinions?
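
Options 1 and 1.a above can be sketched in a POSIX shell roughly as
follows. This is only an illustration under stated assumptions, not part
of Swift or CoG: the helper names has_cwd_entry and strip_cwd_entry are
made up for this sketch, and the swift command line in the comments is
the one Zhao ran earlier in the thread.

```shell
#!/bin/sh
# Illustrative helpers only; the names are hypothetical and not part of Swift or CoG.

# Succeeds (exit 0) if the colon-separated path list in $1 has a "." entry.
has_cwd_entry() {
    case ":$1:" in
        *:.:*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Prints the path list in $1 with every "." entry removed (option 1).
# The body runs in a subshell, so the IFS and globbing changes stay local.
strip_cwd_entry() (
    set -f          # do not expand wildcards in path entries
    IFS=:
    out=
    for entry in $1; do
        if [ "$entry" != "." ]; then
            out="${out:+$out:}$entry"
        fi
    done
    printf '%s\n' "$out"
)

if has_cwd_entry "${CLASSPATH:-}"; then
    echo 'warning: CLASSPATH contains ".", so files in the current directory may be picked up' >&2
fi

# Option 1: keep CLASSPATH but drop the "." entry.
#   CLASSPATH=$(strip_cwd_entry "$CLASSPATH"); export CLASSPATH
# Option 1.a: run one command with CLASSPATH unset, leaving the login shell untouched.
#   ( unset CLASSPATH; swift -sites.file ./sites.xml -tc.file ./tc.data sleep.swift )
```

Running the swift command inside a subshell with CLASSPATH unset keeps
the change local to that one invocation, which may be preferable when
other tools in the same login session still rely on the variable.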
From benc at hawaga.org.uk Fri Oct 3 06:27:03 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 3 Oct 2008 11:27:03 +0000 (GMT)
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To: <1223010176.1562.6.camel@localhost>
References: <48E2E314.6040809@uchicago.edu> <48E3F136.6080404@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> <48E54AEF.6000403@uchicago.edu> <1222986902.27710.3.camel@localhost> <48E55C3B.5040207@uchicago.edu> <48E56C05.3040807@uchicago.edu> <1222995331.30093.2.camel@localhost> <48E58983.8090307@uchicago.edu> <1223003862.32358.2.camel@localhost> <48E5957B.3040604@uchicago.edu> <1223009194.1243.1.camel@localhost> <48E5A423.4080200@uchicago.edu> <1223010176.1562.6.camel@localhost>
Message-ID:

> There you go. You have "." in your classpath, and you're running swift
> from ~/swift/etc which contains a bogus vdl-int.k

That's hit someone before, maybe Nika when she was trying to use falkon?

> 1. Swift/cog launchers using a pre-existing CLASSPATH has so far caused
> only problems and I have never heard of anybody having a legitimate use
> for this behavior (which may be because people don't complain if they
> don't have a problem). So I'm thinking of removing that. Any opinions?

I was thinking very similar about SWIFT_HOME yesterday, which similarly
appears to only have use in screwing up installations.

CLASSPATH would have use if people started trying to do link/run time
loading of interesting things, rather than compiling stuff into the source
directory. But that doesn't happen.

I think I'm happy with both of those being ignored from the calling
environment, or at least a stern warning being given or both.

--

From zhaozhang at uchicago.edu Fri Oct 3 10:08:34 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Fri, 03 Oct 2008 10:08:34 -0500
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To: <1223010176.1562.6.camel@localhost>
References: <48E2E314.6040809@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222976566.23246.0.camel@localhost> <48E52BCD.5010708@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> <48E54AEF.6000403@uchicago.edu> <1222986902.27710.3.camel@localhost> <48E55C3B.5040207@uchicago.edu> <48E56C05.3040807@uchicago.edu> <1222995331.30093.2.camel@localhost> <48E58983.8090307@uchicago.edu> <1223003862.32358.2.camel@localhost> <48E5957B.3040604@uchicago.edu> <1223009194.1243.1.camel@localhost> <48E5A423.4080200@uchicago.edu> <1223010176.1562.6.camel@localhost>
Message-ID: <48E63572.3000204@uchicago.edu>

Cool. Removing everything doesn't solve the problem, but moving everything
to another dir does. I am testing this further now and will post the
results. Thanks, guys.

zhao

Mihael Hategan wrote:
> On Thu, 2008-10-02 at 23:48 -0500, Zhao Zhang wrote:
>
>> -rw-r--r-- 1 zzhang users 15372 2008-10-01 14:31 vdl-int.k
>>
>
> There you go. You have "." in your classpath, and you're running swift
> from ~/swift/etc which contains a bogus vdl-int.k
>
> There are a few options for you:
> 1. Remove "." from CLASSPATH
> 1.a. Remove everything from CLASSPATH. At least when running Swift, you
> don't need anything there (unless you KNOW otherwise)
> 2. Run swift from another directory that does not contain some random
> vdl-int.k (or any other swift libraries)
> 3. Remove vdl-int.k from the etc directory
>
> And there is one option for me:
> 1. Swift/cog launchers using a pre-existing CLASSPATH has so far caused
> only problems and I have never heard of anybody having a legitimate use
> for this behavior (which may be because people don't complain if they
> don't have a problem). So I'm thinking of removing that. Any opinions?
>
>
>

From benc at hawaga.org.uk Fri Oct 3 10:22:38 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 3 Oct 2008 15:22:38 +0000 (GMT)
Subject: [Swift-devel] bg/p /dev/shm slow filesystem acccess?
Message-ID:

When we looked at wrapper log plots last week when I was at the CI, there
were some very simple steps (such as making a directory on the worker node
in-memory filesystem) that were being logged as taking a long time.

This is something that I think is worth investigating. I can imagine at
least two possible causes:

i) one is that logging is taking a very long time in comparison to such
operations.

ii) another is that the /dev/shm filesystems actually are being slow.

Both of those situations seem to be a cause for further
investigation/concern/characterisation.

--

From zhaozhang at uchicago.edu Fri Oct 3 10:26:18 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Fri, 03 Oct 2008 10:26:18 -0500
Subject: [Swift-devel] Re: bg/p /dev/shm slow filesystem acccess?
In-Reply-To:
References:
Message-ID: <48E6399A.10301@uchicago.edu>

Thanks, Ben.

/dev/shm is a RAM disk, and in our experience it should not be slow.
I will investigate this further.

zhao

Ben Clifford wrote:
> When we looked at wrapper log plots last week when I was at the CI, there
> were some very simple steps (such as making a directory on the worker node
> in-memory filesystem) that were being logged as taking a long time.
>
> This is something that I think is worth investigating. I can imagine at
> least two possible causes:
>
> i) one is that logging is taking a very long time in comparison to such
> operations.
>
> ii) another is that the /dev/shm filesystems actually are being slow.
>
> Both of those situations seem to be a cause for further
> investigation/concern/characterisation.
>
>

From hategan at mcs.anl.gov Fri Oct 3 10:32:51 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 03 Oct 2008 10:32:51 -0500
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To:
References: <48E2E314.6040809@uchicago.edu> <48E3F136.6080404@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> <48E54AEF.6000403@uchicago.edu> <1222986902.27710.3.camel@localhost> <48E55C3B.5040207@uchicago.edu> <48E56C05.3040807@uchicago.edu> <1222995331.30093.2.camel@localhost> <48E58983.8090307@uchicago.edu> <1223003862.32358.2.camel@localhost> <48E5957B.3040604@uchicago.edu> <1223009194.1243.1.camel@localhost> <48E5A423.4080200@uchicago.edu> <1223010176.1562.6.camel@localhost>
Message-ID: <1223047971.2910.0.camel@localhost>

There are legitimate cases for setting the classpath. Not so much for
SWIFT_HOME.

Zhao, why do you set SWIFT_HOME?


On Fri, 2008-10-03 at 11:27 +0000, Ben Clifford wrote:
> > There you go. You have "." in your classpath, and you're running swift
> > from ~/swift/etc which contains a bogus vdl-int.k
>
> That's hit someone before, maybe Nika when she was trying to use falkon?
>
> > 1. Swift/cog launchers using a pre-existing CLASSPATH has so far caused
> > only problems and I have never heard of anybody having a legitimate use
> > for this behavior (which may be because people don't complain if they
> > don't have a problem). So I'm thinking of removing that. Any opinions?
>
> I was thinking very similar about SWIFT_HOME yesterday, which similarly
> appears to only have use in screwing up installations.
>
> CLASSPATH would have use if people started trying to do link/run time
> loading of interesting things, rather than compiling stuff into the source
> directory. But that doesn't happen.
>
> I think I'm happy with both of those being ignored from the calling
> environment, or at least a stern warning being given or both.
>

From zhaozhang at uchicago.edu Fri Oct 3 10:55:14 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Fri, 03 Oct 2008 10:55:14 -0500
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To: <1223047971.2910.0.camel@localhost>
References: <48E2E314.6040809@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> <48E54AEF.6000403@uchicago.edu> <1222986902.27710.3.camel@localhost> <48E55C3B.5040207@uchicago.edu> <48E56C05.3040807@uchicago.edu> <1222995331.30093.2.camel@localhost> <48E58983.8090307@uchicago.edu> <1223003862.32358.2.camel@localhost> <48E5957B.3040604@uchicago.edu> <1223009194.1243.1.camel@localhost> <48E5A423.4080200@uchicago.edu> <1223010176.1562.6.camel@localhost> <1223047971.2910.0.camel@localhost>
Message-ID: <48E64062.6050601@uchicago.edu>

I think it is from the Swift web page, some time ago.
zhao Mihael Hategan wrote: > There are legitimate cases for setting the classpath. Not so much for > SWIFT_HOME. > > Zhao, why do you set SWIFT_HOME? > > > On Fri, 2008-10-03 at 11:27 +0000, Ben Clifford wrote: > >>> There you go. You have "." in you classpath, and you're running swift >>> from ~/swift/etc which contains a bogus vdl-int.k >>> >> That's hit someone before, maybe Nika when she was trying to use falkon? >> >> >>> 1. Swift/cog launchers using a pre-existing CLASSPATH has so far caused >>> only problems and I have never heard of anybody having a legitimate use >>> for this behavior (which may be because people don't complain if they >>> don't have a problem). So I'm thinking of removing that. Any opinions? >>> >> I was thinking very similar about SWIFT_HOME yesterday, which similarly >> appears to only have use in screwing up installations. >> >> CLASSPATH would have use if people started trying to do link/run time >> loading of interesting things, rather than compiling stuff into the source >> directory. But that doesn't happen. >> >> I think I'm happy with both of those being ignored from the calling >> environment, or at least a stern warning being given or both. >> >> > > > From hategan at mcs.anl.gov Fri Oct 3 11:01:25 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 03 Oct 2008 11:01:25 -0500 Subject: [Swift-devel] could swift use return code from falkon as a success notification? 
In-Reply-To: <48E64062.6050601@uchicago.edu> References: <48E2E314.6040809@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> <48E54AEF.6000403@uchicago.edu> <1222986902.27710.3.camel@localhost> <48E55C3B.5040207@uchicago.edu> <48E56C05.3040807@uchicago.edu> <1222995331.30093.2.camel@localhost> <48E58983.8090307@uchicago.edu> <1223003862.32358.2.camel@localhost> <48E5957B.3040604@uchicago.edu> <1223009194.1243.1.camel@localhost> <48E5A423.4080200@uchicago.edu> <1223010176.1562.6.camel@localhost> <1223047971.2910.0.camel@localhost> <48E64062.6050601@uchicago.edu> Message-ID: <1223049685.3449.0.camel@localhost> Yes. I see that. Although the documentation does not say you should set it, the way it's structured seems to imply so. I think we should correct that. On Fri, 2008-10-03 at 10:55 -0500, Zhao Zhang wrote: > I think it is from swift web page some time ago. > > zhao > > Mihael Hategan wrote: > > There are legitimate cases for setting the classpath. Not so much for > > SWIFT_HOME. > > > > Zhao, why do you set SWIFT_HOME? > > > > > > On Fri, 2008-10-03 at 11:27 +0000, Ben Clifford wrote: > > > >>> There you go. You have "." in you classpath, and you're running swift > >>> from ~/swift/etc which contains a bogus vdl-int.k > >>> > >> That's hit someone before, maybe Nika when she was trying to use falkon? > >> > >> > >>> 1. Swift/cog launchers using a pre-existing CLASSPATH has so far caused > >>> only problems and I have never heard of anybody having a legitimate use > >>> for this behavior (which may be because people don't complain if they > >>> don't have a problem). So I'm thinking of removing that. Any opinions? 
> >>> > >> I was thinking very similar about SWIFT_HOME yesterday, which similarly > >> appears to only have use in screwing up installations. > >> > >> CLASSPATH would have use if people started trying to do link/run time > >> loading of interesting things, rather than compiling stuff into the source > >> directory. But that doesn't happen. > >> > >> I think I'm happy with both of those being ignored from the calling > >> environment, or at least a stern warning being given or both. > >> > >> > > > > > > From zhaozhang at uchicago.edu Fri Oct 3 12:56:08 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Fri, 03 Oct 2008 12:56:08 -0500 Subject: [Swift-devel] Thank you guys for the help In-Reply-To: <1223049685.3449.0.camel@localhost> References: <48E2E314.6040809@uchicago.edu> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> <48E54AEF.6000403@uchicago.edu> <1222986902.27710.3.camel@localhost> <48E55C3B.5040207@uchicago.edu> <48E56C05.3040807@uchicago.edu> <1222995331.30093.2.camel@localhost> <48E58983.8090307@uchicago.edu> <1223003862.32358.2.camel@localhost> <48E5957B.3040604@uchicago.edu> <1223009194.1243.1.camel@localhost> <48E5A423.4080200@uchicago.edu> <1223010176.1562.6.camel@localhost> <1223047971.2910.0.camel@localhost> <48E64062.6050601@uchicago.edu> <1223049685.3449.0.camel@localhost> Message-ID: <48E65CB8.5030109@uchicago.edu> Hi, All I got swift optimization on BGP for now. You could tell how it is more efficient comparing with the old test logs from http://www.ci.uchicago.edu/~zzhang/report-sleep-20080930-2311-pn6sbed7/ and http://www.ci.uchicago.edu/~zzhang/report-sleep-20081003-1246-hj5bau82/ for the same sleep_30 workload. 
I will test more on this, and make a stable version of swift on BGP, then push it for production. Thanks for your effort zhao From iraicu at cs.uchicago.edu Fri Oct 3 13:17:14 2008 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Fri, 03 Oct 2008 13:17:14 -0500 Subject: [Swift-devel] Thank you guys for the help In-Reply-To: <48E65CB8.5030109@uchicago.edu> References: <48E2E314.6040809@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> <48E54AEF.6000403@uchicago.edu> <1222986902.27710.3.camel@localhost> <48E55C3B.5040207@uchicago.edu> <48E56C05.3040807@uchicago.edu> <1222995331.30093.2.camel@localhost> <48E58983.8090307@uchicago.edu> <1223003862.32358.2.camel@localhost> <48E5957B.3040604@uchicago.edu> <1223009194.1243.1.camel@localhost> <48E5A423.4080200@uchicago.edu> <1223010176.1562.6.camel@localhost> <1223047971.2910.0.camel@localhost> <48E64062.6050601@uchicago.edu> <1223049685.3449.0.camel@localhost> <48E65CB8.5030109@uchicago.edu> Message-ID: <48E661AA.1010006@cs.uchicago.edu> Thats great, the original run was: 38/73/92 (min/med/max) and the new run after optimizations is: 36/43/47 (min/med/max) with the ideal time being 30 seconds. So, the median overhead was reduced from 43 seconds down to 13 seconds per task, on 256 processors. Great job! Ioan Zhao Zhang wrote: > Hi, All > > I got swift optimization on BGP for now. You could tell how it is more > efficient comparing with the old test logs from > http://www.ci.uchicago.edu/~zzhang/report-sleep-20080930-2311-pn6sbed7/ > > > and > http://www.ci.uchicago.edu/~zzhang/report-sleep-20081003-1246-hj5bau82/ > > > > for the same sleep_30 workload. > > I will test more on this, and make a stable version of swift on BGP, > then push it for production. 
> > Thanks for your effort > zhao > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -- =================================================== Ioan Raicu Ph.D. Candidate =================================================== Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 =================================================== Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dev.globus.org/wiki/Incubator/Falkon http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page =================================================== =================================================== From benc at hawaga.org.uk Fri Oct 3 13:45:46 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 3 Oct 2008 18:45:46 +0000 (GMT) Subject: [Swift-devel] Re: Thank you guys for the help In-Reply-To: <48E65CB8.5030109@uchicago.edu> References: <48E2E314.6040809@uchicago.edu> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> <48E54AEF.6000403@uchicago.edu> <1222986902.27710.3.camel@localhost> <48E55C3B.5040207@uchicago.edu> <48E56C05.3040807@uchicago.edu> <1223003862.32358.2.camel@localhost> <48E5957B.3040604@uchicago.edu> <1223009194.1243.1.camel@localhost> <48E5A423.4080200@uchicago.edu> <1223010176.1562.6.camel@localhost> <1223047971.2910.0.camel@localhost> <48E64062.6050601@uchicago.edu> <1223049685.3449.0.camel@localhost> <48E65CB8.5030109@uchicago.edu> Message-ID: The wrapper log graph here shows not much time taken apart from the execution: 
http://www.ci.uchicago.edu/~zzhang/report-sleep-20081003-1246-hj5bau82/info.zeroed-trailsx.png

The plot from your pre-patched version on 30th September:

http://www.ci.uchicago.edu/~zzhang/report-sleep-20080930-2311-pn6sbed7/info.zeroed-trailsx.png

looks like it's spending a lot more time doing other stuff.

It seems strange that changing the status file handling caused such a tightening of the other lines (representing time taken to do other wrapper.sh activities) on the graph, especially the times related to the jobdir handling (see previous message).

--

From zhaozhang at uchicago.edu Fri Oct 3 13:50:12 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Fri, 03 Oct 2008 13:50:12 -0500
Subject: [Swift-devel] Re: Thank you guys for the help
In-Reply-To:
References: <48E2E314.6040809@uchicago.edu> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> <48E54AEF.6000403@uchicago.edu> <1222986902.27710.3.camel@localhost> <48E55C3B.5040207@uchicago.edu> <48E56C05.3040807@uchicago.edu> <1223003862.32358.2.camel@localhost> <48E5957B.3040604@uchicago.edu> <1223009194.1243.1.camel@localhost> <48E5A423.4080200@uchicago.edu> <1223010176.1562.6.camel@localhost> <1223047971.2910.0.camel@localhost> <48E64062.6050601@uchicago.edu> <1223049685.3449.0.camel@localhost> <48E65CB8.5030109@uchicago.edu>
Message-ID: <48E66964.1020801@uchicago.edu>

Yep, I noticed this in the earlier tests too. I was trying to shorten this period of time:

logstate "RM_JOBDIR"
#rm -rf "$DIR" 2>&1 >& "$INFO"
#checkError 254 "Failed to remove job directory $DIR"

logstate "TOUCH_SUCCESS"

Even with the two lines above commented out, there was still a time gap between these two timestamps.
zhao Ben Clifford wrote: > The wrapper log graph here shows not much time taken apart from the > execution: > > http://www.ci.uchicago.edu/~zzhang/report-sleep-20081003-1246-hj5bau82/info.zeroed-trailsx.png > > The plot from your pre-patched version on 30th september: > > http://www.ci.uchicago.edu/~zzhang/report-sleep-20080930-2311-pn6sbed7/info.zeroed-trailsx.png > > looks like its spending a lot more time doing other stuff. > > It seems strange that changing the status file handling caused such a > tightening of the other lines (representing time taken to do other > wrapper.sh activities) on the graph, especially the times related to the > jobdir handling (see previous message). > > From hategan at mcs.anl.gov Fri Oct 3 13:56:44 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 03 Oct 2008 13:56:44 -0500 Subject: [Swift-devel] Re: Thank you guys for the help In-Reply-To: <48E66964.1020801@uchicago.edu> References: <48E2E314.6040809@uchicago.edu> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> <48E54AEF.6000403@uchicago.edu> <1222986902.27710.3.camel@localhost> <48E55C3B.5040207@uchicago.edu> <48E56C05.3040807@uchicago.edu> <1223003862.32358.2.camel@localhost> <48E5957B.3040604@uchicago.edu> <1223009194.1243.1.camel@localhost> <48E5A423.4080200@uchicago.edu> <1223010176.1562.6.camel@localhost> <1223047971.2910.0.camel@localhost> <48E64062.6050601@uchicago.edu> <1223049685.3449.0.camel@localhost> <48E65CB8.5030109@uchicago.edu> <48E66964.1020801@uchicago.edu> Message-ID: <1223060204.8311.0.camel@localhost> So how do you clean up after the jobs? On Fri, 2008-10-03 at 13:50 -0500, Zhao Zhang wrote: > yep, I notice this also from the tests before. 
I was trying to shorten > this period of time > logstate "RM_JOBDIR" > #rm -rf "$DIR" 2>&1 >& "$INFO" > #checkError 254 "Failed to remove job directory $DIR" > > logstate "TOUCH_SUCCESS" > > As I comment the above 2 lines out, there was stilll a time gap between > these two time stamp. > > zhao > > Ben Clifford wrote: > > The wrapper log graph here shows not much time taken apart from the > > execution: > > > > http://www.ci.uchicago.edu/~zzhang/report-sleep-20081003-1246-hj5bau82/info.zeroed-trailsx.png > > > > The plot from your pre-patched version on 30th september: > > > > http://www.ci.uchicago.edu/~zzhang/report-sleep-20080930-2311-pn6sbed7/info.zeroed-trailsx.png > > > > looks like its spending a lot more time doing other stuff. > > > > It seems strange that changing the status file handling caused such a > > tightening of the other lines (representing time taken to do other > > wrapper.sh activities) on the graph, especially the times related to the > > jobdir handling (see previous message). 
> > > > From zhaozhang at uchicago.edu Fri Oct 3 13:57:50 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Fri, 03 Oct 2008 13:57:50 -0500 Subject: [Swift-devel] Re: Thank you guys for the help In-Reply-To: <1223060204.8311.0.camel@localhost> References: <48E2E314.6040809@uchicago.edu> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> <48E54AEF.6000403@uchicago.edu> <1222986902.27710.3.camel@localhost> <48E55C3B.5040207@uchicago.edu> <48E56C05.3040807@uchicago.edu> <1223003862.32358.2.camel@localhost> <48E5957B.3040604@uchicago.edu> <1223009194.1243.1.camel@localhost> <48E5A423.4080200@uchicago.edu> <1223010176.1562.6.camel@localhost> <1223047971.2910.0.camel@localhost> <48E64062.6050601@uchicago.edu> <1223049685.3449.0.camel@localhost> <48E65CB8.5030109@uchicago.edu> <48E66964.1020801@uchicago.edu> <1223060204.8311.0.camel@localhost> Message-ID: <48E66B2E.6080006@uchicago.edu> I was just trying to see if this is a problem, I put these two lines back after my test. zhao Mihael Hategan wrote: > So how do you clean up after the jobs? > > On Fri, 2008-10-03 at 13:50 -0500, Zhao Zhang wrote: > >> yep, I notice this also from the tests before. I was trying to shorten >> this period of time >> logstate "RM_JOBDIR" >> #rm -rf "$DIR" 2>&1 >& "$INFO" >> #checkError 254 "Failed to remove job directory $DIR" >> >> logstate "TOUCH_SUCCESS" >> >> As I comment the above 2 lines out, there was stilll a time gap between >> these two time stamp. 
>> >> zhao >> >> Ben Clifford wrote: >> >>> The wrapper log graph here shows not much time taken apart from the >>> execution: >>> >>> http://www.ci.uchicago.edu/~zzhang/report-sleep-20081003-1246-hj5bau82/info.zeroed-trailsx.png >>> >>> The plot from your pre-patched version on 30th september: >>> >>> http://www.ci.uchicago.edu/~zzhang/report-sleep-20080930-2311-pn6sbed7/info.zeroed-trailsx.png >>> >>> looks like its spending a lot more time doing other stuff. >>> >>> It seems strange that changing the status file handling caused such a >>> tightening of the other lines (representing time taken to do other >>> wrapper.sh activities) on the graph, especially the times related to the >>> jobdir handling (see previous message). >>> >>> >>> > > > From benc at hawaga.org.uk Fri Oct 3 14:06:25 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 3 Oct 2008 19:06:25 +0000 (GMT) Subject: [Swift-devel] Re: Thank you guys for the help In-Reply-To: <48E66964.1020801@uchicago.edu> References: <48E2E314.6040809@uchicago.edu> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> <48E54AEF.6000403@uchicago.edu> <1222986902.27710.3.camel@localhost> <48E55C3B.5040207@uchicago.edu> <48E56C05.3040807@uchicago.edu> <1223009194.1243.1.camel@localhost> <48E5A423.4080200@uchicago.edu> <1223010176.1562.6.camel@localhost> <1223047971.2910.0.camel@localhost> <48E64062.6050601@uchicago.edu> <1223049685.3449.0.camel@localhost> <48E65CB8.5030109@uchicago.edu> <48E66964.1020801@uchicago.edu> Message-ID: On Fri, 3 Oct 2008, Zhao Zhang wrote: > logstate "RM_JOBDIR" > #rm -rf "$DIR" 2>&1 >& "$INFO" > #checkError 254 "Failed to remove job directory $DIR" > > logstate "TOUCH_SUCCESS" > > As I comment the above 2 lines out, there was 
still a time gap between these
> two time stamp.

Ideally there should be no overhead from making log entries.

On BG/P your date command only has 1s resolution, so it's fairly likely that sometimes you'll get a 1s difference, sometimes 0s.

If two log entries in a row without anything between are giving more than that, then you/we should investigate what's happening there.

--

From benc at hawaga.org.uk Sat Oct 4 05:02:28 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sat, 4 Oct 2008 10:02:28 +0000 (GMT)
Subject: [Swift-devel] could swift use return code from falkon as a success notification?
In-Reply-To: <1223049685.3449.0.camel@localhost>
References: <48E2E314.6040809@uchicago.edu> <48E522FC.9070109@uchicago.edu> <1222979050.25306.0.camel@localhost> <48E52F03.8090800@uchicago.edu> <1222979632.25487.0.camel@localhost> <48E542A6.4070506@uchicago.edu> <1222985011.27033.2.camel@localhost> <48E545E4.2080405@uchicago.edu> <1222985772.27276.2.camel@localhost> <48E54892.8080602@uchicago.edu> <1222986273.27446.2.camel@localhost> <1222986348.27446.4.camel@localhost> <48E54AEF.6000403@uchicago.edu> <1222986902.27710.3.camel@localhost> <48E55C3B.5040207@uchicago.edu> <48E56C05.3040807@uchicago.edu> <1223003862.32358.2.camel@localhost> <48E5957B.3040604@uchicago.edu> <1223009194.1243.1.camel@localhost> <48E5A423.4080200@uchicago.edu> <1223010176.1562.6.camel@localhost> <1223047971.2910.0.camel@localhost> <48E64062.6050601@uchicago.edu> <1223049685.3449.0.camel@localhost>
Message-ID:

r2267 removes most references to $SWIFT_HOME from the documentation in docs/ and adds a caution on the remaining reference.

On Fri, 3 Oct 2008, Mihael Hategan wrote:

> Yes. I see that. Although the documentation does not say you should set
> it, the way it's structured seems to imply so.
>
> I think we should correct that.
>
> On Fri, 2008-10-03 at 10:55 -0500, Zhao Zhang wrote:
> > I think it is from swift web page some time ago.
> > > > zhao > > > > Mihael Hategan wrote: > > > There are legitimate cases for setting the classpath. Not so much for > > > SWIFT_HOME. > > > > > > Zhao, why do you set SWIFT_HOME? > > > > > > > > > On Fri, 2008-10-03 at 11:27 +0000, Ben Clifford wrote: > > > > > >>> There you go. You have "." in you classpath, and you're running swift > > >>> from ~/swift/etc which contains a bogus vdl-int.k > > >>> > > >> That's hit someone before, maybe Nika when she was trying to use falkon? > > >> > > >> > > >>> 1. Swift/cog launchers using a pre-existing CLASSPATH has so far caused > > >>> only problems and I have never heard of anybody having a legitimate use > > >>> for this behavior (which may be because people don't complain if they > > >>> don't have a problem). So I'm thinking of removing that. Any opinions? > > >>> > > >> I was thinking very similar about SWIFT_HOME yesterday, which similarly > > >> appears to only have use in screwing up installations. > > >> > > >> CLASSPATH would have use if people started trying to do link/run time > > >> loading of interesting things, rather than compiling stuff into the source > > >> directory. But that doesn't happen. > > >> > > >> I think I'm happy with both of those being ignored from the calling > > >> environment, or at least a stern warning being given or both. > > >> > > >> > > > > > > > > > > > From iraicu at cs.uchicago.edu Sat Oct 4 15:17:43 2008 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Sat, 04 Oct 2008 15:17:43 -0500 Subject: [Swift-devel] Falkon history log plot, March 2008 - August 2008 Message-ID: <48E7CF67.5080607@cs.uchicago.edu> Hi all, I managed to create a small program that parsed all my saved logs that I had since March (probably not 100% complete), and created a summary plot of the Falkon tasks and allocated number of CPUs (every point on the plot corresponds to a 15 min average). You can see this plot at: http://dev.globus.org/wiki/Image:Falkon-logs-history-03-08-to-08-08.jpg. 
There were probably more logs (dating back to early 2007), but many were deleted long ago to save space in routine cleanup. For example, the 6 months' worth of raw logs in this plot consume about 30GB of disk space. Also, some of the older logs had different log formats, so even if I had some of the older logs, they were likely skipped as they didn't conform to the log format the simple parser was programmed to parse.

I will be at SC08 in Austin, Texas in November for the entire week, so if anyone wants to meet to discuss Falkon, or anything related to Falkon, let me know.

Cheers,
Ioan

From benc at hawaga.org.uk Mon Oct 6 00:09:34 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 6 Oct 2008 05:09:34 +0000 (GMT)
Subject: [Swift-devel] lots of very small files vs gridftp
In-Reply-To: <1222823030.7845.0.camel@localhost>
References: <1222731470.23121.21.camel@localhost> <2C47C9EB-948D-4E3E-BC83-402111B6C45E@mcs.anl.gov> <1222752219.29016.11.camel@localhost> <1222821944.6731.24.camel@localhost> <1222823030.7845.0.camel@localhost>
Message-ID:

On Tue, 30 Sep 2008, Mihael Hategan wrote:

> I was mostly thinking about the conceptual part and the implications. At
> least for small jobs, Falkon turned out to be the better option.

The ahead-of-time clustering code always seemed to be a bit buggy and I never saw it used with much success.
So I don't really have any quantitative performance comparison between that and running something like falkon/coasters. I'm pretty convinced that the falkon/coaster approach is better for job dispatch, but I have no numeric data from actual real application executions.

--

From hategan at mcs.anl.gov Mon Oct 6 00:18:21 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 06 Oct 2008 00:18:21 -0500
Subject: [Swift-devel] lots of very small files vs gridftp
In-Reply-To:
References: <1222731470.23121.21.camel@localhost> <2C47C9EB-948D-4E3E-BC83-402111B6C45E@mcs.anl.gov> <1222752219.29016.11.camel@localhost> <1222821944.6731.24.camel@localhost> <1222823030.7845.0.camel@localhost>
Message-ID: <1223270301.30448.5.camel@localhost>

On Mon, 2008-10-06 at 05:09 +0000, Ben Clifford wrote:
> On Tue, 30 Sep 2008, Mihael Hategan wrote:
>
> > I was mostly thinking about the conceptual part and the implications. At
> > least for small jobs, Falkon turned out to be the better option.
>
> The ahead-of-time clustering code always seemed to be a bit buggy and I
> never saw it used with much success. So I don't really have any
> quantitative performance comparison between that and running something
> like falkon/coasters. I'm pretty convinced that the falkon/coaster
> approach is better for job dispatch, but I have no numeric data from
> actual real application executions.
>

Ioan had some in a paper/presentation he did with Yong.
From iraicu at cs.uchicago.edu Mon Oct 6 08:17:37 2008 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 06 Oct 2008 08:17:37 -0500 Subject: [Swift-devel] lots of very small files vs gridftp In-Reply-To: <1223270301.30448.5.camel@localhost> References: <1222731470.23121.21.camel@localhost> <2C47C9EB-948D-4E3E-BC83-402111B6C45E@mcs.anl.gov> <1222752219.29016.11.camel@localhost> <1222821944.6731.24.camel@localhost> <1222823030.7845.0.camel@localhost> <1223270301.30448.5.camel@localhost> Message-ID: <48EA0FF1.5050308@cs.uchicago.edu> We have plenty of application use cases. For example, section 5.4 in http://people.cs.uchicago.edu/~iraicu/publications/2008_NOVA08_book-chapter_Swift.pdf, shows 3 apps, Montage, fMRI, and MolDyn, where Falkon outperformed GRAM or PBS with or without clustering. We also had a poster (http://people.cs.uchicago.edu/~iraicu/publications/2008_TG08_swift+hadoop-poster.pdf) which compares Swift+Falkon and Hadoop... and I recall clearly that Swift+Falkon did better than Swift+PBS with and without clustering for the data mining (word count and sort) apps we were running. Ioan Mihael Hategan wrote: > On Mon, 2008-10-06 at 05:09 +0000, Ben Clifford wrote: > >> On Tue, 30 Sep 2008, Mihael Hategan wrote: >> >> >>> I was mostly thinking about the conceptual part and the implications. At >>> least for small jobs, Falkon turned out to be the better option. >>> >> The ahead-of-time clsutering code always seemed to be a bit buggy and I >> never saw it used with much success. So I don't really have any >> quantitative performance comparison between that and running something >> like falkon/coasters. I'm pretty convinced that the falkon/coaster >> appraoch is better for job dispatch, but I have no numeric data from >> actual real application executions. >> >> > > Ioan had some in a paper/presentation he did with Yong. 

From benc at hawaga.org.uk Mon Oct 6 11:40:43 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 6 Oct 2008 16:40:43 +0000 (GMT)
Subject: [Swift-devel] lots of very small files vs gridftp
In-Reply-To: <1223270301.30448.5.camel@localhost>
References: <1222731470.23121.21.camel@localhost> <2C47C9EB-948D-4E3E-BC83-402111B6C45E@mcs.anl.gov> <1222752219.29016.11.camel@localhost> <1222821944.6731.24.camel@localhost> <1222823030.7845.0.camel@localhost> <1223270301.30448.5.camel@localhost>
Message-ID:

On Mon, 6 Oct 2008, Mihael Hategan wrote:

> Ioan had some in a paper/presentation he did with Yong.

Being a cynic, firstly I would be disinclined to believe results obtained to prove a point; and even more so, I'd be disinclined to believe results which claim some piece of swift works when I don't think it does very well.

So I maintain my original stance.
-- From iraicu at cs.uchicago.edu Mon Oct 6 11:48:44 2008 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 06 Oct 2008 11:48:44 -0500 Subject: [Swift-devel] lots of very small files vs gridftp In-Reply-To: References: <1222731470.23121.21.camel@localhost> <2C47C9EB-948D-4E3E-BC83-402111B6C45E@mcs.anl.gov> <1222752219.29016.11.camel@localhost> <1222821944.6731.24.camel@localhost> <1222823030.7845.0.camel@localhost> <1223270301.30448.5.camel@localhost> Message-ID: <48EA416C.7080202@cs.uchicago.edu> You said in an earlier email: > I'm pretty convinced that the falkon/coaster > appraoch is better for job dispatch, but I have no numeric data from > actual real application executions. So I pointed you to real application runs we made, which ended up in several papers, and shows "numeric data" for the improvements! If you want to believe these results, great, if not, thats fine too. Ioan > Ben Clifford wrote: > On Mon, 6 Oct 2008, Mihael Hategan wrote: > > >> Ioan had some in a paper/presentation he did with Yong. >> > > Being a cynic, firstly I would be disinclined to believe results obtained > to prove a point; > and even more so, I'd be disinclined to belive results > which claim some piece of swift works when I don't think it doesn't very > well. > > So I maintain my original stance. > > -- > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- =================================================== Ioan Raicu Ph.D. Candidate =================================================== Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 
58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: iraicu at cs.uchicago.edu
Web: http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================
===================================================

From benc at hawaga.org.uk Mon Oct 6 12:15:10 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 6 Oct 2008 17:15:10 +0000 (GMT)
Subject: [Swift-devel] log plots of start/end times
Message-ID:

The logs that come out of swift have multiple notions of 'start' and 'end'. Two are: the karajan job submission Active time and Completed time, and the wrapper log start and end times.

In a perfect world, these would be very close together. In reality they aren't. For gram2, they'll likely be way off (by about 30s); in gram4 they should be much closer, and I guess in the provisioning providers - falkon and coasters - they should be pretty close together.

In interactions with a couple of people in the past month I've had concern about lack of correlation (specifically how much can you rely on the first above to imply the second?) so I've added a couple of plots to the standard swift-plot-log plots that the log-processing module makes. These appear at the bottom of the info page on plots.

The first plots the difference between info and karajan start times for a job, with the x axis being info start time (to indicate slow-downs in state change that are tied to a particular time, as I think some gram4 notification slowdowns might be); and the second shows two cumulative plots for how much difference there is between the karajan and info times - in local mode, the start line is a little bit above the x=y line and the end line is a little bit below x=y.
With more interesting job submission systems I'd expect these lines to diverge a bit more; in the rumoured case of gram4 notifications being quite delayed in some cases, I'd expect some linear shift to the left(?). Mostly these are here in order to get a feel for how appropriate it is to regard the karajan start/end times as actual job execution times - I think often not, but this is intended to give me the pretty pictures to say yes or no. -- From benc at hawaga.org.uk Mon Oct 6 12:23:33 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 6 Oct 2008 17:23:33 +0000 (GMT) Subject: [Swift-devel] Re: log plots of start/end times In-Reply-To: References: Message-ID: actually, the equations are completely wrong in there. prize of desirability to whoever actually bothers to read and understand what's going on enough to explain. -- From benc at hawaga.org.uk Wed Oct 8 08:00:26 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 8 Oct 2008 13:00:26 +0000 (GMT) Subject: [Swift-devel] log plots of start/end times In-Reply-To: References: Message-ID: On Mon, 6 Oct 2008, Ben Clifford wrote: > In interactions with a couple of people in the past month I've had concern > about lack of correlation (specifically how much can you rely on the first > above to imply the second?) so I've added a couple of plots to the > standard swift-plot-log plots that the log-processing module makes. These > appear at the bottom of the info page on plots. I did a run of 3000 touch jobs against the UC teragrid site using gram4. The plots mentioned above are the last two on this page: http://www.ci.uchicago.edu/~benc/report-066-many-20081008-0620-tdnpx947/info.html The time on the client side for active and completion status changes differs from the worker node by some amount - a minute or so once a large number of jobs are going through. For the purposes of estimating jobs in progress, a simple delay on notification delivery shouldn't matter too much.
What is more interesting in that respect is that there is a larger delay for completion notifications than for start notifications. That means that using Active state as a way of estimating jobs actually running is going to over-estimate by some amount. In the plots above it looks like there's about 5..10s more delay in completion notifications compared to start notifications. The long delay in completion notifications will have an effect in slowing down job throughput through gram4 - stageout of output data and subsequent allocation of the site for another job will both be delayed. I've heard that in gt4.2, this notification delivery is a lot better, though in practice at gridka I saw severe notification delays when a room full of students hit a container so the future there is not all roses. I think for coasters and falkon, job completions will be indicated in a much more timely fashion - however I've not actually plotted the above graphs for runs of either. I think for falkon, Zhao has been keeping Swift -info logs for the purposes of debugging worker node performance, so there is enough information around to get these plots for Falkon already (by running the latest version of swift-plot-log). I'd be interested to see that, as a sanity check. -- From benc at hawaga.org.uk Wed Oct 8 08:05:49 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 8 Oct 2008 13:05:49 +0000 (GMT) Subject: [Swift-devel] log plots of start/end times In-Reply-To: References: Message-ID: Something else I find interesting in those plots is how maybe 300 jobs run really fast, then there's a sudden jump in the curve, and after that the curve is fairly smooth. That can be seen both on 'info duration histogram' and on 'how wrapper.sh is spending its time'. Not sure what causes that. Initially I'd guess something related to filesystem load.
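
The comparison discussed in this thread (karajan Active/Completed times versus wrapper log start/end times) can be sketched roughly as below. This is only an illustration, not how swift-plot-log computes its plots: the record layout and the field names `karajan_active`, `karajan_completed`, `info_start` and `info_end` are hypothetical, and all timestamps are assumed to be seconds on a common clock.

```python
def notification_lags(jobs):
    """Return (start_lags, end_lags) in seconds for a list of job records.

    Each job is a dict with the hypothetical keys karajan_active,
    karajan_completed, info_start and info_end.
    """
    start_lags = [j["karajan_active"] - j["info_start"] for j in jobs]
    end_lags = [j["karajan_completed"] - j["info_end"] for j in jobs]
    return start_lags, end_lags


def active_overestimate(jobs, t):
    """Jobs the client believes are Active at time t, minus jobs that the
    wrapper logs say were really running at time t."""
    seen_active = sum(
        1 for j in jobs if j["karajan_active"] <= t < j["karajan_completed"]
    )
    running = sum(1 for j in jobs if j["info_start"] <= t < j["info_end"])
    return seen_active - running


if __name__ == "__main__":
    # Made-up records, only to show the shape of the input.
    jobs = [
        {"info_start": 0, "info_end": 10, "karajan_active": 5, "karajan_completed": 20},
        {"info_start": 2, "info_end": 12, "karajan_active": 8, "karajan_completed": 24},
    ]
    print(notification_lags(jobs))
    print(active_overestimate(jobs, 15))
```

If completion notifications lag more than start notifications, `end_lags` will tend to be larger than `start_lags`, and `active_overestimate` will be positive once jobs have started finishing, which is the over-estimate described above.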
-- From hategan at mcs.anl.gov Wed Oct 8 12:16:38 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 08 Oct 2008 12:16:38 -0500 Subject: [Swift-devel] lots of very small files vs gridftp In-Reply-To: <48EA0FF1.5050308@cs.uchicago.edu> References: <1222731470.23121.21.camel@localhost> <2C47C9EB-948D-4E3E-BC83-402111B6C45E@mcs.anl.gov> <1222752219.29016.11.camel@localhost> <1222821944.6731.24.camel@localhost> <1222823030.7845.0.camel@localhost> <1223270301.30448.5.camel@localhost> <48EA0FF1.5050308@cs.uchicago.edu> Message-ID: <1223486198.19296.0.camel@localhost> On Mon, 2008-10-06 at 08:17 -0500, Ioan Raicu wrote: > We have plenty of application use cases. For example, section 5.4 in > http://people.cs.uchicago.edu/~iraicu/publications/2008_NOVA08_book-chapter_Swift.pdf, Were the falkon workers pre-allocated there? From iraicu at cs.uchicago.edu Wed Oct 8 12:33:14 2008 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Wed, 08 Oct 2008 12:33:14 -0500 Subject: [Swift-devel] lots of very small files vs gridftp In-Reply-To: <1223486198.19296.0.camel@localhost> References: <1222731470.23121.21.camel@localhost> <2C47C9EB-948D-4E3E-BC83-402111B6C45E@mcs.anl.gov> <1222752219.29016.11.camel@localhost> <1222821944.6731.24.camel@localhost> <1222823030.7845.0.camel@localhost> <1223270301.30448.5.camel@localhost> <48EA0FF1.5050308@cs.uchicago.edu> <1223486198.19296.0.camel@localhost> Message-ID: <48ECEEDA.2060402@cs.uchicago.edu> No, the times included the time to allocate nodes as well, but that wasn't really a significant part of those experiments due to the fact that for the shorter experiments, we only used 8 to 32 nodes (probably all got allocated in under 60 sec), and for the larger test, like MolDyn, the tests lasted hours, so the several minutes it took to allocate 200 processors was again not significant. 
In fact, in MolDyn, you can actually see that only 150 or so processors were allocated initially, and the rest up to the max of 216 processors were allocated incrementally as they became available. Ioan Mihael Hategan wrote: > On Mon, 2008-10-06 at 08:17 -0500, Ioan Raicu wrote: > >> We have plenty of application use cases. For example, section 5.4 in >> http://people.cs.uchicago.edu/~iraicu/publications/2008_NOVA08_book-chapter_Swift.pdf, >> > > Were the falkon workers pre-allocated there? > > > -- =================================================== Ioan Raicu Ph.D. Candidate =================================================== Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 =================================================== Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dev.globus.org/wiki/Incubator/Falkon http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page =================================================== =================================================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From iraicu at cs.uchicago.edu Thu Oct 9 03:50:51 2008 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Thu, 09 Oct 2008 03:50:51 -0500 Subject: [Swift-devel] Falkon history log plot, December 2007 - October 2008 In-Reply-To: <48E7CF67.5080607@cs.uchicago.edu> References: <48E7CF67.5080607@cs.uchicago.edu> Message-ID: <48EDC5EB.80605@cs.uchicago.edu> Hi again, I found more logs that dated back to December 2007, and found a few bugs in the way I was computing various metrics (when ill-formatted logs were found). Here is the new plot, for a 10 month summary. 
http://dev.globus.org/wiki/Image:Falkon-logs-history-12-07-to-10-08.jpg Cheers, Ioan Ioan Raicu wrote: > Hi all, > I managed to create a small program that parsed all my saved logs that > I had since March (probably not 100% complete), and created a summary > plot of the Falkon tasks and allocated number of CPUs (every point on > the plot corresponds to a 15 min average). You can see this plot at: > http://dev.globus.org/wiki/Image:Falkon-logs-history-03-08-to-08-08.jpg. > > There were probably more logs (that date back to early 2007), but many > were deleted long ago to save space in routine cleanup. For example, > the 6 months worth of raw logs in this plot consume about 30GB of disk > space. Also, some of the older logs had different log formats, so even > if I had some of the older logs, they were likely skipped as they > didn't conform to the log format the simple parser was programmed to > parse. > I will be at SC08 in Austin Texas in November for the entire week, so > if anyone wants to meet to discuss Falkon, or anything related to > Falkon, let me know. > > Cheers, > Ioan > -- =================================================== Ioan Raicu Ph.D. Candidate =================================================== Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 
58th Street, Ryerson Hall Chicago, IL 60637 =================================================== Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dev.globus.org/wiki/Incubator/Falkon http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page =================================================== =================================================== From benc at hawaga.org.uk Sat Oct 11 08:39:20 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Sat, 11 Oct 2008 13:39:20 +0000 (GMT) Subject: [Swift-devel] gram4 + coasters + environment Message-ID: When running through gram2 plain, gram4 plain, gram2 + coasters, on TG UC/ANL, as user 'sidgrid', I get the same environment for all three, a fairly well populated environment including a nice big path. When I run coasters + gram4, I get a very sparse environment, specifically with no PATH set. It's interesting that the environment doesn't get set when launching coasters through gram4, but that neither one of them on its own is sufficient to lose my environment. -- From benc at hawaga.org.uk Mon Oct 13 01:05:00 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 13 Oct 2008 06:05:00 +0000 (GMT) Subject: [Swift-devel] swift guts doc with many diagrams and graphs Message-ID: I started writing this document: http://www.ci.uchicago.edu/~benc/tmp/plot-tour/plot-tour.html It's an attempt at presenting how the internals of Swift work to someone who is not interested in the minor details of the code but still would like some understanding of what is going on, for example to understand performance graphs or to help when debugging execution problems. So the target audience is probably people around the level of skenny or Zhao. It is most definitely not a swift introductory tutorial - I present swift programs without any instruction on what they do or how to run them.
I intend to produce further sections on other layers within Swift, but I would be interested in any feedback on the general style/structure of the first two sections (on the execute and execute2 logs) -- From tiberius at ci.uchicago.edu Mon Oct 13 13:29:31 2008 From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun) Date: Mon, 13 Oct 2008 13:29:31 -0500 Subject: [Swift-devel] defining a type as an array of integers does not work Message-ID: This

    type row { int column[]; }

fails with the error: java.util.ConcurrentModificationException caught while invoking Java method: null -- Tiberiu (Tibi) Stef-Praun, PhD Computational Sciences Researcher Computation Institute 5640 S. Ellis Ave, #405 University of Chicago http://www-unix.mcs.anl.gov/~tiberius/ From hategan at mcs.anl.gov Mon Oct 13 15:09:59 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 13 Oct 2008 15:09:59 -0500 Subject: [Swift-devel] defining a type as an array of integers does not work In-Reply-To: References: Message-ID: <1223928599.28388.0.camel@localhost> On Mon, 2008-10-13 at 13:29 -0500, Tiberiu Stef-Praun wrote: > This > type row { int column[]; } > > fails with the error: > java.util.ConcurrentModificationException caught while invoking Java > method: null > Should be fixed in r2292. From abejan at ci.uchicago.edu Mon Oct 13 16:48:56 2008 From: abejan at ci.uchicago.edu (Alina Bejan) Date: Mon, 13 Oct 2008 16:48:56 -0500 Subject: [Swift-devel] Re: swift guts doc with many diagrams and graphs In-Reply-To: <20081013170003.EBCC52C0043@mail.ci.uchicago.edu> References: <20081013170003.EBCC52C0043@mail.ci.uchicago.edu> Message-ID: <48F3C248.1030308@ci.uchicago.edu> Hi Ben, I think such a guide is a very good resource for users. I have recently tried to make some graphs from some swift log files, but have been unsuccessful. Please tell me if the instructions below are still accurate, or if anything changed -- basically the 'report-abc' directory fails to be created.
"On a computer such as communicado, type: svn co https://svn.ci.uchicago.edu/svn/vdl2/log-processing cd log-processing/bin export PATH=$(pwd):$PATH swift-plot-log mylog-01234456-8922-abcdef.log This will give you a directory report-mylog-01234456-8922-abcdef/ I usually then mv report-mylog-01234456-8922-abcdef ~benc/public_html/ so that I can view it with a web browser at http://www.ci.uchicago.edu/~benc/report-mylog-01234456-8922-abcdef/ " Thanks, Alina swift-devel-request at ci.uchicago.edu wrote: > Send Swift-devel mailing list submissions to > swift-devel at ci.uchicago.edu > > To subscribe or unsubscribe via the World Wide Web, visit > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > or, via email, send a message with subject or body 'help' to > swift-devel-request at ci.uchicago.edu > > You can reach the person managing the list at > swift-devel-owner at ci.uchicago.edu > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Swift-devel digest..." > > > Today's Topics: > > 1. swift guts doc with many diagrams and graphs (Ben Clifford) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 13 Oct 2008 06:05:00 +0000 (GMT) > From: Ben Clifford > Subject: [Swift-devel] swift guts doc with many diagrams and graphs > To: swift-devel at ci.uchicago.edu > Message-ID: > Content-Type: TEXT/PLAIN; charset=US-ASCII > > > I started writing this document: > > http://www.ci.uchicago.edu/~benc/tmp/plot-tour/plot-tour.html > > Its an attempt at presenting how the internals of Swift work to someone > who is not interested in the minor details of the code but still would > like some undestanding of what is going on, for example to understand > performance graphs or to help when debugging execution problems. > > So the target audience is probably people around the level of skenny or > Zhao. 
> > It is most definitely not a swift introductory tutorial - I present swift > programs without any instruction on what they do or how to run them. > > I intend to produce further sections on other layers within Swift, but I > would be interested in any feedback on the general style/structure of the > first two sections (on the execute and execute2 logs) > > From benc at hawaga.org.uk Mon Oct 13 20:23:24 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 14 Oct 2008 01:23:24 +0000 (GMT) Subject: [Swift-devel] Re: swift guts doc with many diagrams and graphs In-Reply-To: <48F3C248.1030308@ci.uchicago.edu> References: <20081013170003.EBCC52C0043@mail.ci.uchicago.edu> <48F3C248.1030308@ci.uchicago.edu> Message-ID: On Mon, 13 Oct 2008, Alina Bejan wrote: > Please tell me if the instructions below are still accurate, or if anything > changed -- basically the 'report-abc' directory fails to being created. Probably the thing to do is make a report to either this or the users list with the following: a URL or filesystem path where we can see the log file; the output of running swift-plot-log. This guide isn't intended to contain basic user instructions for that - the page that you looked at before (with the below instructions) is. Those instructions that you pasted below should still work.
> "On a computer such as communicado, type: > > svn co https://svn.ci.uchicago.edu/svn/vdl2/log-processing > cd log-processing/bin > export PATH=$(pwd):$PATH > swift-plot-log mylog-01234456-8922-abcdef.log > > This will give you a directory report-mylog-01234456-8922-abcdef/ > > I usually then > > mv report-mylog-01234456-8922-abcdef ~benc/public_html/ > > so that I can view it with a web browser at > http://www.ci.uchicago.edu/~benc/report-mylog-01234456-8922-abcdef/ > " -- From benc at hawaga.org.uk Mon Oct 13 23:02:39 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 14 Oct 2008 04:02:39 +0000 (GMT) Subject: [Swift-devel] swift builds on my machiens but not at nmi build&test, with dependency problems Message-ID: When I run a build either on my laptop or communicado (my usual build directory on my laptop, two separate fresh checkouts on communicado) I can build ok. When the nmi b&t machines run builds they have been failing since cog r One example failure is here: http://nmi-s005.cs.wisc.edu:80/nmi/index.php?page=results/runDetails&runid=109313&opt_project=swift The output of my build is in http://www.ci.uchicago.edu/~benc/tmp/depfail-benc.out The output of the nmi build (in addition to being linked from the above nmi-s005 link) is in http://www.ci.uchicago.edu/~benc/tmp/depfail-nmi.out A diff of those two is in http://www.ci.uchicago.edu/~benc/tmp/depfail.diff So it looks like in the nmi case, coasters are getting built before karajan is attempted (or perhaps karajan is not attempted at all). In my case, karajan is built first. Not sure what is causing that. -- From hategan at mcs.anl.gov Mon Oct 13 23:13:41 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 13 Oct 2008 23:13:41 -0500 Subject: [Swift-devel] swift builds on my machiens but not at nmi build&test, with dependency problems In-Reply-To: References: Message-ID: <1223957621.2871.4.camel@localhost> I made a commit today that should have fixed it. 
It was my attempt at automatically building coasters, which caused a circular dependency. It may work if you have a previous build. On Tue, 2008-10-14 at 04:02 +0000, Ben Clifford wrote: > When I run a build either on my laptop or communicado (my usual build > directory on my laptop, two separate fresh checkouts on communicado) I can > build ok. > > When the nmi b&t machines run builds they have been failing since cog r > > One example failure is here: > http://nmi-s005.cs.wisc.edu:80/nmi/index.php?page=results/runDetails&runid=109313&opt_project=swift > > The output of my build is in > http://www.ci.uchicago.edu/~benc/tmp/depfail-benc.out > > The output of the nmi build (in addition to being linked from the above > nmi-s005 link) is in > http://www.ci.uchicago.edu/~benc/tmp/depfail-nmi.out > > A diff of those two is in > http://www.ci.uchicago.edu/~benc/tmp/depfail.diff > > So it looks like in the nmi case, coasters are getting built before > karajan is attempted (or perhaps karajan is not attempted at all). In my > case, karajan is built first. > > Not sure what is causing that. > From hategan at mcs.anl.gov Tue Oct 14 13:41:37 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 14 Oct 2008 13:41:37 -0500 Subject: [Swift-devel] readData2 Message-ID: <1224009697.10205.6.camel@localhost> Due to the inflexibility of readData (made obvious by something Tibi was trying to do), I added readData2 in r2299. It reads things from files which have a syntax similar to the one used by the ext mapper. Example:

    type vector {
        int columns[];
    }
    type matrix {
        vector rows[];
    }
    matrix m;
    m = readData2("readData2.in");

where readData2.in could contain:

    rows[0].columns[0] = 0
    rows[0].columns[1] = 2
    rows[0].columns[2] = 4
    rows[1].columns[0] = 1
    rows[1].columns[1] = 3
    rows[1].columns[2] = 5

So the syntax is basically: comment lines start with "#" and are ignored; empty lines are ignored; every other line is a path, then \hs* '=' \hs*, then a value (as in the example above), where \hs is any horizontal whitespace.
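
To make the file format above concrete, here is a minimal sketch of a reader for it, assuming each data line is a field path, an '=' with optional horizontal whitespace around it, and an integer value, as the example suggests. It is not the Swift implementation of readData2; the function names `parse_path` and `read_data2_sketch` are made up for illustration, and the result is plain nested dicts rather than Swift data.

```python
import re

# path, optional horizontal whitespace, '=', optional horizontal whitespace, value
_LINE = re.compile(r"^(?P<path>[^=]+?)[ \t]*=[ \t]*(?P<value>.*)$")
# a path step is either a field name or a bracketed integer index
_STEP = re.compile(r"([A-Za-z_]\w*)|\[(\d+)\]")


def parse_path(path):
    """Split 'rows[0].columns[1]' into ['rows', 0, 'columns', 1]."""
    return [name if name else int(index) for name, index in _STEP.findall(path)]


def read_data2_sketch(text):
    """Parse readData2-style text into nested dicts.

    Field names become string keys and array indices become integer keys.
    Values are assumed to be integers, as in the example in this thread.
    """
    root = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # empty lines and comment lines are ignored
        match = _LINE.match(line)
        if match is None:
            raise ValueError("not a 'path = value' line: %r" % raw)
        steps = parse_path(match.group("path"))
        if not steps:
            raise ValueError("empty path in line: %r" % raw)
        node = root
        for step in steps[:-1]:
            node = node.setdefault(step, {})
        node[steps[-1]] = int(match.group("value"))
    return root


if __name__ == "__main__":
    sample = "# a comment\n\nrows[0].columns[0] = 0\nrows[1].columns[0] = 1\n"
    print(read_data2_sketch(sample))
```

Nested dicts were chosen only to keep the sketch short; a real reader would map the paths onto the declared Swift types.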
From benc at hawaga.org.uk Wed Oct 15 05:38:47 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 15 Oct 2008 10:38:47 +0000 (GMT) Subject: [Swift-devel] readData2 In-Reply-To: <1224009697.10205.6.camel@localhost> References: <1224009697.10205.6.camel@localhost> Message-ID: I wrote a test for this, committed in r2301. I got a bit stuck with an issue that I thought wasn't a problem any more, but apparently still is. In the test, I initially said something like:

    matrix m = readData2(..)
    int s = m[1] + m[2];
    outfile = echo(s);

which hangs. Changing the s line to:

    int s;
    s=m[1]+m[2];

removes the hang - in the former case, the expression gets evaluated in the sequential bit where variables are defined; in the latter, the s variable is defined without a value and the expression evaluation happens in the parallel bit where data dependencies work right. A solution in this case would be to make int s= compile to the same definition + parallel code, rather than having that sequential. However, I think that will break other pieces of code - I think, for example, mapper parameters which rely on variables having their values defined 'before' (where 'before' is an implementation specific, rather than SwiftScript specific, temporal ordering). Language-wise, the appropriate behaviour is to be properly dataset dependent for all use of values in SwiftScript, I think - that's been discussed before, and is not trivial to do. But the above is one more quantum of motivation. -- From benc at hawaga.org.uk Wed Oct 15 17:53:17 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 15 Oct 2008 22:53:17 +0000 (GMT) Subject: [Swift-devel] swift builds on my machiens but not at nmi build&test, with dependency problems In-Reply-To: <1223957621.2871.4.camel@localhost> References: <1223957621.2871.4.camel@localhost> Message-ID: yep, the past few days of builds have not had this problem, it looks like.
On Mon, 13 Oct 2008, Mihael Hategan wrote: > I made a commit today that should have fixed it. > > It was my attempt at automatically building coasters, which caused a > circular dependency. It may work if you have a previous build. > > On Tue, 2008-10-14 at 04:02 +0000, Ben Clifford wrote: > > When I run a build either on my laptop or communicado (my usual build > > directory on my laptop, two separate fresh checkouts on communicado) I can > > build ok. > > > > When the nmi b&t machines run builds they have been failing since cog r > > > > One example failure is here: > > http://nmi-s005.cs.wisc.edu:80/nmi/index.php?page=results/runDetails&runid=109313&opt_project=swift > > > > The output of my build is in > > http://www.ci.uchicago.edu/~benc/tmp/depfail-benc.out > > > > The output of the nmi build (in addition to being linked from the above > > nmi-s005 link) is in > > http://www.ci.uchicago.edu/~benc/tmp/depfail-nmi.out > > > > A diff of those two is in > > http://www.ci.uchicago.edu/~benc/tmp/depfail.diff > > > > So it looks like in the nmi case, coasters are getting built before > > karajan is attempted (or perhaps karajan is not attempted at all). In my > > case, karajan is built first. > > > > Not sure what is causing that. > > > > From hategan at mcs.anl.gov Wed Oct 15 17:58:23 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 15 Oct 2008 17:58:23 -0500 Subject: [Swift-devel] swift builds on my machiens but not at nmi build&test, with dependency problems In-Reply-To: References: <1223957621.2871.4.camel@localhost> Message-ID: <1224111503.22170.0.camel@localhost> On Wed, 2008-10-15 at 22:53 +0000, Ben Clifford wrote: > yep, the past few days of builds have not had this problem, it looks like. > > On Mon, 13 Oct 2008, Mihael Hategan wrote: > > > I made a commit today that should have fixed it. > > > > It was my attempt at automatically building coasters, which caused a > > circular dependency. It may work if you have a previous build. 
Though this could still be done without causing funny things if coasters were a direct dependency of Swift. From iraicu at cs.uchicago.edu Thu Oct 16 13:44:32 2008 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Thu, 16 Oct 2008 13:44:32 -0500 Subject: [Swift-devel] plotting using gkrellm Message-ID: <48F78B90.1090209@cs.uchicago.edu> Hi all, I have been thinking about how to plot performance data from Falkon at runtime while experiments are running. Initially, I tried ploticus and a simple web server, which would regenerate plots every 60 seconds, but it was a fragile solution, which didn't always work well. I have always liked how the gkrellm monitor looked and performed, and started thinking about what API it might have. After 5 min of searching through google, I found a plug-in for it that allows you to read an arbitrary text file and plot numerical values by gkrellm :) After a few hours with fiddling with the format of the log files to be read by gkrellm, and the appearance of gkrellm, here is what I have: http://people.cs.uchicago.edu/~iraicu/projects/Falkon/plots/Falkon-1M-gkrellm.jpg where running 1M tasks, of 60 seconds each, on an emulated BG/P with 160K CPUs. The graph represents about 8 minutes of real time, in which the experiment started from scratch, and completed. Since it was relatively little effort, and the results are quite nice (especially for demos, I am thinking of SC), I think it would be nice if Swift tried to export some of its log info in the right format to be able to view the data in gkrellm in real time as the experiment progresses. 
Here is an example of the log format, which takes the first value of the last line in the falkon-log.txt file, and creates a new file gkrellm-log.txt with the correctly formatted log, ready for gkrellm to read with the fchart plugin (http://lasr.cs.ucla.edu/geoff/gkrellm-fchart.html):

    tail -n 1 falkon-log.txt | awk '{
        # first field of the last log line, in the layout read by fchart
        print $1;
        print $1;
        print "NORMAL";
        print $1;
        print "!!TOOLTIP!!";
        print "!!EOF!!";
    }' > gkrellm-log.tmp
    cp gkrellm-log.tmp gkrellm-log.txt

I simply have this code in a while loop that goes on forever, and sleeps 1 second between iterations. Cheers, Ioan -- =================================================== Ioan Raicu Ph.D. Candidate =================================================== Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 =================================================== Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dev.globus.org/wiki/Incubator/Falkon http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page =================================================== =================================================== From iraicu at cs.uchicago.edu Thu Oct 16 14:09:02 2008 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Thu, 16 Oct 2008 14:09:02 -0500 Subject: [Swift-devel] running 1 billion tasks through Falkon Message-ID: <48F7914E.2000203@cs.uchicago.edu> Hi again, Up to a few days ago, the largest test I ever did with Falkon, in terms of number of tasks, was 20 million tasks, which worked as expected. For the sake of pushing Falkon, perhaps to the point where it might break, I tried the 20M task experiment again, but now with 1B (billion) tasks. Note, this 1 billion tasks is from a single invocation of the Falkon command line client. On an orthogonal issue, I noticed that on simple sleep 0 tasks, I can't seem to saturate my 8 CPU cores where I run the service, and usually, I get 1~2 cores utilized.
So, I decided to run 4 Falkon services on the same machine, and use the load-balancing client (designed for the BG/P) and ran 1B tasks (from another node with dual CPUs) across these 4 services (running on the 8-core node). Each service managed 32 CPUs in ANL/UC TG cluster, for a total of 128 CPUs, and each task was a simple sleep 0, with no I/O. Here is the plot of the run. http://people.cs.uchicago.edu/~iraicu/projects/Falkon/plots/Falkon-1B-4serv-128cpu.jpg The good news is that the test ran great, getting an average of 15558 tasks/sec. Now, the bad news is that the throughput seemed to drop from 17000 tasks/sec (at the beginning) to 15500 tasks/sec (at the end). The explanation for the drop in throughput comes from the memory management in Java on the client side, which apparently was spending 5~10 seconds in garbage collection every 60 seconds or so, and the amount of free heap space was monotonically decreasing, on average. See the graph (http://people.cs.uchicago.edu/~iraicu/projects/Falkon/plots/Falkon-1B-4serv-128cpu-mem.jpg), which shows the free heap size decreased down to around 200MB (from the max of 1536MB). At the current rate of the potential memory leak I have in the client, the free heap would get diminished to 0MB by 1.5 billion tasks. I just thought this was an interesting experiment, which revealed a memory leak in the client, that did not show up in the smaller tests I had done so far. Cheers, Ioan -- =================================================== Ioan Raicu Ph.D. Candidate =================================================== Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E.
58th Street, Ryerson Hall Chicago, IL 60637 =================================================== Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dev.globus.org/wiki/Incubator/Falkon http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page =================================================== =================================================== From foster at anl.gov Thu Oct 16 13:53:25 2008 From: foster at anl.gov (Ian Foster) Date: Thu, 16 Oct 2008 13:53:25 -0500 Subject: [Swift-devel] plotting using gkrellm In-Reply-To: <48F78B90.1090209@cs.uchicago.edu> References: <48F78B90.1090209@cs.uchicago.edu> Message-ID: <66CC7AC9-ECC1-4E68-B120-0071E7F6177F@anl.gov> wow, very nice From foster at anl.gov Thu Oct 16 14:27:37 2008 From: foster at anl.gov (Ian Foster) Date: Thu, 16 Oct 2008 14:27:37 -0500 Subject: [Swift-devel] running 1 billion tasks through Falkon In-Reply-To: <48F7914E.2000203@cs.uchicago.edu> References: <48F7914E.2000203@cs.uchicago.edu> Message-ID: <51400A33-0BC4-4BA4-99BB-49B7B97ED07C@anl.gov> Way to go -- GigaJobs! On Oct 16, 2008, at 2:09 PM, Ioan Raicu wrote: > Hi again, > Up to a few days ago, the largest test I ever did with Falkon, in > terms of number of tasks, was 20 million tasks, which worked as > expected. For the sake of pushing Falkon, perhaps to the point where > it might break, I tried the 20M task experiment again, but now with > 1B (billion) tasks. Note, this 1 billion tasks is from a single > invocation of the Falkon command line client. > > On an orthogonal issue, I noticed that on simple sleep 0 tasks, I > can't seem to saturate my 8 CPU cores where I run the service, and > usually, I get 1~2 cores utilized. So, I decided to run 4 Falkon > services on the same machine, and use the load-balancing client > (designed for the BG/P) and ran 1B tasks (from another node with > dual CPUs) across these 4 services (running on the 8-core node). 
> Each service managed 32 CPUs in ANL/UC TG cluster, for a total of > 128 CPUs, and each task was a simple sleep 0, with no I/O. > > Here is the plot of the run. > http://people.cs.uchicago.edu/~iraicu/projects/Falkon/plots/Falkon-1B-4serv-128cpu.jpg > > The good news is that the test ran great, getting an average of > 15558 tasks/sec. Now, the bad news is that the throughput seemed to > drop from 17000 tasks/sec (at the beginning) to 15500 tasks/sec (at > the end). The explanation to the drop in throughput come from the > memory management in Java on the client side, which apparently was > spending 5~10 seconds in garbage collection every 60 seconds or so, > and the amount of free heap space was monotonically decreasing, on > average. See the graph (http://people.cs.uchicago.edu/~iraicu/projects/Falkon/plots/Falkon-1B-4serv-128cpu-mem.jpg > ), which shows the free heap size decreased down to around 200MB > (from the max of 1536MB). At the current rate of the potential > memory leak I have in the client, the free heap would get diminished > to 0MB by 1.5 billion tasks. > > I just thought this was an interesting experiment, which revealed a > memory leak in the client, that did not show up in the smaller tests > I had done so far. > > Cheers, > Ioan > > -- > =================================================== > Ioan Raicu > Ph.D. Candidate > =================================================== > Distributed Systems Laboratory > Computer Science Department > University of Chicago > 1100 E. 
58th Street, Ryerson Hall
> Chicago, IL 60637
> ===================================================
> Email: iraicu at cs.uchicago.edu
> Web: http://www.cs.uchicago.edu/~iraicu
> http://dev.globus.org/wiki/Incubator/Falkon
> http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
> ===================================================
> ===================================================
>
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

From iraicu at cs.uchicago.edu Thu Oct 16 14:50:06 2008
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Thu, 16 Oct 2008 14:50:06 -0500
Subject: [Swift-devel] Re: plotting using gkrellm
In-Reply-To: <48F78B90.1090209@cs.uchicago.edu>
References: <48F78B90.1090209@cs.uchicago.edu>
Message-ID: <48F79AEE.5060203@cs.uchicago.edu>

BTW, my scripts currently work for a single service. Running in a distributed mode, as we do on the real BG/P where each service has its own separate logs, will require some aggregation before we can plot the logs with gkrellm. The biggest issue will be how often we can do this aggregation when the logs are spread out over potentially 640 directories and 640 files on GPFS, which might be under heavy load. My guess is that under light GPFS load and small-scale runs, perhaps once every second is OK, but as we scale up and GPFS load increases, I doubt we'll be able to aggregate the logs more often than once every minute or so. I am thinking that for the large-scale tests, if we want relatively real-time feedback (say every second), we can't rely on these hundreds of log files... and we'll need a more custom monitoring solution that, say, transmits the needed log info via TCP to an aggregator service every second. This way, as long as the aggregator service can handle the 640 clients with 1 message per second from each, we should be OK.
I think sending the needed log info via TCP messages will scale better than to append to files, and use some external program to parse many files. In the end, we'll just have to try what we have now, to see how well it works, and then try alternatives as we see the current techniques not working. Ioan Ioan Raicu wrote: > Hi all, > I have been thinking about how to plot performance data from Falkon at > runtime while experiments are running. Initially, I tried ploticus > and a simple web server, which would regenerate plots every 60 > seconds, but it was a fragile solution, which didn't always work well. > I have always liked how the gkrellm monitor looked and performed, and > started thinking about what API it might have. After 5 min of > searching through google, I found a plug-in for it that allows you to > read an arbitrary text file and plot numerical values by gkrellm :) > After a few hours with fiddling with the format of the log files to be > read by gkrellm, and the appearance of gkrellm, here is what I have: > http://people.cs.uchicago.edu/~iraicu/projects/Falkon/plots/Falkon-1M-gkrellm.jpg > > where running 1M tasks, of 60 seconds each, on an emulated BG/P with > 160K CPUs. The graph represents about 8 minutes of real time, in which > the experiment started from scratch, and completed. > > Since it was relatively little effort, and the results are quite nice > (especially for demos, I am thinking of SC), I think it would be nice > if Swift tried to export some of its log info in the right format to > be able to view the data in gkrellm in real time as the experiment > progresses. 
>
> Here is an example of the log format, which takes the first value of
> the last line in the falkon-log.txt file, and creates a new file
> gkrellm-log.txt with the correctly formatted log, ready for gkrellm to read
> with the fchart plugin
> (http://lasr.cs.ucla.edu/geoff/gkrellm-fchart.html):
> tail -n 1 falkon-log.txt | awk '
> {
> printf($1 "\n");
> printf($1 "\n");
> printf("NORMAL\n");
> printf($1 "\n");
> printf("!!TOOLTIP!!\n");
> printf("!!EOF!!\n");
> }' > gkrellm-log.tmp
> cp gkrellm-log.tmp gkrellm-log.txt
>
> I simply have this code in a while loop that goes on forever, and
> sleeps 1 second between iterations.
>
> Cheers,
> Ioan
>

--
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: iraicu at cs.uchicago.edu
Web: http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================
===================================================

From benc at hawaga.org.uk Thu Oct 16 19:24:56 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 17 Oct 2008 00:24:56 +0000 (GMT)
Subject: [Swift-devel] plotting using gkrellm
In-Reply-To: <48F78B90.1090209@cs.uchicago.edu>
References: <48F78B90.1090209@cs.uchicago.edu>
Message-ID:

The Swift executable itself at the moment doesn't output "total" style log lines, so I think the changes would need to be more than a formatting change. All the totaling and other derivative information comes from the log-processing module; but that can work on live log files to some extent and could easily wire into this.
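
The polling loop described in the quoted message (take the last line of a log, write the six-line fchart input, sleep one second) could be sketched roughly as follows. This is only a sketch under stated assumptions, not the script actually used: the function names emit_fchart and watch_log are hypothetical, the six-line layout is copied from the quoted awk example rather than checked against the fchart documentation, and the temporary file is renamed with mv instead of copied with cp so that a reader is less likely to see a half-written file.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): write fchart input for one sample.
# $1 = log file to sample, $2 = file that the gkrellm fchart plugin reads.
emit_fchart() {
    tail -n 1 "$1" | awk '{
        # Six-line layout copied from the quoted example; $1 is the first field.
        printf("%s\n", $1)
        printf("%s\n", $1)
        printf("NORMAL\n")
        printf("%s\n", $1)
        printf("!!TOOLTIP!!\n")
        printf("!!EOF!!\n")
    }' > "$2.tmp" && mv "$2.tmp" "$2"
}

# Hypothetical driver: resample once per second until interrupted.
watch_log() {
    while true; do
        emit_fchart "$1" "$2"
        sleep 1
    done
}

# Usage (file names taken from the quoted message):
#   watch_log falkon-log.txt gkrellm-log.txt
```

The same emit_fchart step could presumably be pointed at a line produced by the log-processing module instead of a Falkon log, since it only looks at the first field of the last line.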
-- From iraicu at cs.uchicago.edu Fri Oct 17 00:11:38 2008 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Fri, 17 Oct 2008 00:11:38 -0500 Subject: [Swift-devel] plotting using gkrellm In-Reply-To: References: <48F78B90.1090209@cs.uchicago.edu> Message-ID: <48F81E8A.1070408@cs.uchicago.edu> Right. In fact, I didn't modify Falkon either for this, I created a bash script that uses awk to parse the last line of some Falkon log to get the desired input to gkrellm. Ioan Ben Clifford wrote: > The Swift executable itself at the moment doesn't output "total" stye log > lines, so I think the changes would need to be more than a formatting > change. All the totaling and other derivative information comes from the > log-processing module; but that can work on live log files to some extent > and could easily wire into this. > > -- =================================================== Ioan Raicu Ph.D. Candidate =================================================== Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 =================================================== Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dev.globus.org/wiki/Incubator/Falkon http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page =================================================== =================================================== From zhaozhang at uchicago.edu Fri Oct 17 12:55:15 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Fri, 17 Oct 2008 12:55:15 -0500 Subject: [Swift-devel] swift performance optimization on BGP Message-ID: <48F8D183.2010303@uchicago.edu> Hi, All After the optimization last week, I was suffering the 2nd round task poor performance, as you could tell from the "info trails" picture on http://www.ci.uchicago.edu/~zzhang/report-sleep-20081016-1509-gqr8tea9/ with bunch of tests and verification, I found the poor performance comes from the info log transfer. 
Thus I tried the collective IO for info log transfer, and it solves the problem, I get such a beautiful picture: I am working more to integrate the Collective IO system along with swift on BGP. zhao -------------- next part -------------- A non-text attachment was scrubbed... Name: info-trails.png Type: image/png Size: 5628 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: info-trails1.png Type: image/png Size: 4989 bytes Desc: not available URL: From hategan at mcs.anl.gov Fri Oct 17 13:02:50 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 17 Oct 2008 13:02:50 -0500 Subject: [Swift-devel] swift performance optimization on BGP In-Reply-To: <48F8D183.2010303@uchicago.edu> References: <48F8D183.2010303@uchicago.edu> Message-ID: <1224266570.28757.2.camel@localhost> On Fri, 2008-10-17 at 12:55 -0500, Zhao Zhang wrote: > Hi, All > > After the optimization last week, I was suffering the 2nd round task > poor performance, as you could tell > from the "info trails" picture on > http://www.ci.uchicago.edu/~zzhang/report-sleep-20081016-1509-gqr8tea9/ > > > with bunch of tests and verification, I found the poor performance comes > from the info log transfer. Thus I tried > the collective IO for info log transfer, and it solves the problem, I > get such a beautiful picture: > > I am working more to integrate the Collective IO system along with swift > on BGP. If you were to write a fileresource provider (in the same spirit that the falkon execution provider is written), that may be all you need to do. 
From zhaozhang at uchicago.edu Fri Oct 17 13:03:12 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Fri, 17 Oct 2008 13:03:12 -0500 Subject: [Swift-devel] swift performance optimization on BGP In-Reply-To: <1224266570.28757.2.camel@localhost> References: <48F8D183.2010303@uchicago.edu> <1224266570.28757.2.camel@localhost> Message-ID: <48F8D360.8030504@uchicago.edu> yep, that is what I am thinking about. zhao Mihael Hategan wrote: > On Fri, 2008-10-17 at 12:55 -0500, Zhao Zhang wrote: > >> Hi, All >> >> After the optimization last week, I was suffering the 2nd round task >> poor performance, as you could tell >> from the "info trails" picture on >> http://www.ci.uchicago.edu/~zzhang/report-sleep-20081016-1509-gqr8tea9/ >> >> >> with bunch of tests and verification, I found the poor performance comes >> from the info log transfer. Thus I tried >> the collective IO for info log transfer, and it solves the problem, I >> get such a beautiful picture: >> >> I am working more to integrate the Collective IO system along with swift >> on BGP. >> > > If you were to write a fileresource provider (in the same spirit that > the falkon execution provider is written), that may be all you need to > do. > > > > From iraicu at cs.uchicago.edu Fri Oct 17 13:04:16 2008 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Fri, 17 Oct 2008 13:04:16 -0500 Subject: [Swift-devel] Re: swift performance optimization on BGP In-Reply-To: <48F8D183.2010303@uchicago.edu> References: <48F8D183.2010303@uchicago.edu> Message-ID: <48F8D3A0.3010207@cs.uchicago.edu> Is this the right summary? > Total number of events: 512 > Shortest event (s): 30 > Longest event (s): 43 > Total duration of all events (s): 17906 > Mean event duration (s): 34.97265625 > Standard deviation of event duration (s): 2.67205488703656 > Maximum number of events at one time: 262 > which shows that it takes ~35 sec for Swift to run a 30 sec task? 
5 sec overhead sounds pretty good, if we can keep it this low as we scale up to more CPUs :) Good job! Ioan Zhao Zhang wrote: > Hi, All > > After the optimization last week, I was suffering the 2nd round task > poor performance, as you could tell > from the "info trails" picture on > http://www.ci.uchicago.edu/~zzhang/report-sleep-20081016-1509-gqr8tea9/ > > with bunch of tests and verification, I found the poor performance > comes from the info log transfer. Thus I tried > the collective IO for info log transfer, and it solves the problem, I > get such a beautiful picture: > > I am working more to integrate the Collective IO system along with > swift on BGP. > > zhao > > ------------------------------------------------------------------------ > > > ------------------------------------------------------------------------ > -- =================================================== Ioan Raicu Ph.D. Candidate =================================================== Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 =================================================== Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dev.globus.org/wiki/Incubator/Falkon http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page =================================================== =================================================== -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/png Size: 5628 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: image/png Size: 4989 bytes Desc: not available URL: From zhaozhang at uchicago.edu Fri Oct 17 13:06:20 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Fri, 17 Oct 2008 13:06:20 -0500 Subject: [Swift-devel] Re: swift performance optimization on BGP In-Reply-To: <48F8D3A0.3010207@cs.uchicago.edu> References: <48F8D183.2010303@uchicago.edu> <48F8D3A0.3010207@cs.uchicago.edu> Message-ID: <48F8D41C.40206@uchicago.edu> that was the one I want to show the problem. The optimized results is here http://www.ci.uchicago.edu/~zzhang/report-sleep-20081016-1808-96bsfgec/ Total number of events: 512 Shortest event (s): 30 Longest event (s): 33 Total duration of all events (s): 15865 Mean event duration (s): 30.986328125 Standard deviation of event duration (s): 0.784249612581342 Maximum number of events at one time: 277 zhao Ioan Raicu wrote: > Is this the right summary? >> Total number of events: 512 >> Shortest event (s): 30 >> Longest event (s): 43 >> Total duration of all events (s): 17906 >> Mean event duration (s): 34.97265625 >> Standard deviation of event duration (s): 2.67205488703656 >> Maximum number of events at one time: 262 >> > which shows that it takes ~35 sec for Swift to run a 30 sec task? 5 > sec overhead sounds pretty good, if we can keep it this low as we > scale up to more CPUs :) > > Good job! > > Ioan > > Zhao Zhang wrote: >> Hi, All >> >> After the optimization last week, I was suffering the 2nd round task >> poor performance, as you could tell >> from the "info trails" picture on >> http://www.ci.uchicago.edu/~zzhang/report-sleep-20081016-1509-gqr8tea9/ >> >> >> with bunch of tests and verification, I found the poor performance >> comes from the info log transfer. Thus I tried >> the collective IO for info log transfer, and it solves the problem, I >> get such a beautiful picture: >> >> I am working more to integrate the Collective IO system along with >> swift on BGP. 
>> >> zhao >> >> ------------------------------------------------------------------------ >> >> >> ------------------------------------------------------------------------ >> > > -- > =================================================== > Ioan Raicu > Ph.D. Candidate > =================================================== > Distributed Systems Laboratory > Computer Science Department > University of Chicago > 1100 E. 58th Street, Ryerson Hall > Chicago, IL 60637 > =================================================== > Email: iraicu at cs.uchicago.edu > Web: http://www.cs.uchicago.edu/~iraicu > http://dev.globus.org/wiki/Incubator/Falkon > http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page > =================================================== > =================================================== > > From iraicu at cs.uchicago.edu Fri Oct 17 13:09:07 2008 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Fri, 17 Oct 2008 13:09:07 -0500 Subject: [Swift-devel] Re: swift performance optimization on BGP In-Reply-To: <48F8D41C.40206@uchicago.edu> References: <48F8D183.2010303@uchicago.edu> <48F8D3A0.3010207@cs.uchicago.edu> <48F8D41C.40206@uchicago.edu> Message-ID: <48F8D4C3.1070904@cs.uchicago.edu> wow, 1 second overhead ;)! That's fantastic! Just out of curiosity, can you try a 1 rack run (16X more CPUs), with 8192 tasks? Will the overheads remain stable? Ioan Zhao Zhang wrote: > that was the one I want to show the problem. > The optimized results is here > http://www.ci.uchicago.edu/~zzhang/report-sleep-20081016-1808-96bsfgec/ > > Total number of events: 512 > Shortest event (s): 30 > Longest event (s): 33 > Total duration of all events (s): 15865 > Mean event duration (s): 30.986328125 > Standard deviation of event duration (s): 0.784249612581342 > Maximum number of events at one time: 277 > > zhao > > Ioan Raicu wrote: >> Is this the right summary? 
>>> Total number of events: 512 >>> Shortest event (s): 30 >>> Longest event (s): 43 >>> Total duration of all events (s): 17906 >>> Mean event duration (s): 34.97265625 >>> Standard deviation of event duration (s): 2.67205488703656 >>> Maximum number of events at one time: 262 >>> >> which shows that it takes ~35 sec for Swift to run a 30 sec task? 5 >> sec overhead sounds pretty good, if we can keep it this low as we >> scale up to more CPUs :) >> >> Good job! >> >> Ioan >> >> Zhao Zhang wrote: >>> Hi, All >>> >>> After the optimization last week, I was suffering the 2nd round task >>> poor performance, as you could tell >>> from the "info trails" picture on >>> http://www.ci.uchicago.edu/~zzhang/report-sleep-20081016-1509-gqr8tea9/ >>> >>> >>> with bunch of tests and verification, I found the poor performance >>> comes from the info log transfer. Thus I tried >>> the collective IO for info log transfer, and it solves the problem, >>> I get such a beautiful picture: >>> >>> I am working more to integrate the Collective IO system along with >>> swift on BGP. >>> >>> zhao >>> >>> ------------------------------------------------------------------------ >>> >>> >>> >>> ------------------------------------------------------------------------ >>> >>> >> >> -- >> =================================================== >> Ioan Raicu >> Ph.D. Candidate >> =================================================== >> Distributed Systems Laboratory >> Computer Science Department >> University of Chicago >> 1100 E. 58th Street, Ryerson Hall >> Chicago, IL 60637 >> =================================================== >> Email: iraicu at cs.uchicago.edu >> Web: http://www.cs.uchicago.edu/~iraicu >> http://dev.globus.org/wiki/Incubator/Falkon >> http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page >> =================================================== >> =================================================== >> >> > -- =================================================== Ioan Raicu Ph.D. 
Candidate =================================================== Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 =================================================== Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dev.globus.org/wiki/Incubator/Falkon http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page =================================================== =================================================== From zhaozhang at uchicago.edu Fri Oct 17 13:57:09 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Fri, 17 Oct 2008 13:57:09 -0500 Subject: [Swift-devel] Re: swift performance optimization on BGP In-Reply-To: <48F8D4C3.1070904@cs.uchicago.edu> References: <48F8D183.2010303@uchicago.edu> <48F8D3A0.3010207@cs.uchicago.edu> <48F8D41C.40206@uchicago.edu> <48F8D4C3.1070904@cs.uchicago.edu> Message-ID: <48F8E005.9050100@uchicago.edu> yep, I will try that after I got ready for Mike's presentation next Wed. zhao Ioan Raicu wrote: > wow, 1 second overhead ;)! That's fantastic! Just out of curiosity, > can you try a 1 rack run (16X more CPUs), with 8192 tasks? Will the > overheads remain stable? > > Ioan > > Zhao Zhang wrote: >> that was the one I want to show the problem. >> The optimized results is here >> http://www.ci.uchicago.edu/~zzhang/report-sleep-20081016-1808-96bsfgec/ >> >> Total number of events: 512 >> Shortest event (s): 30 >> Longest event (s): 33 >> Total duration of all events (s): 15865 >> Mean event duration (s): 30.986328125 >> Standard deviation of event duration (s): 0.784249612581342 >> Maximum number of events at one time: 277 >> >> zhao >> >> Ioan Raicu wrote: >>> Is this the right summary? 
>>>> Total number of events: 512 >>>> Shortest event (s): 30 >>>> Longest event (s): 43 >>>> Total duration of all events (s): 17906 >>>> Mean event duration (s): 34.97265625 >>>> Standard deviation of event duration (s): 2.67205488703656 >>>> Maximum number of events at one time: 262 >>>> >>> which shows that it takes ~35 sec for Swift to run a 30 sec task? 5 >>> sec overhead sounds pretty good, if we can keep it this low as we >>> scale up to more CPUs :) >>> >>> Good job! >>> >>> Ioan >>> >>> Zhao Zhang wrote: >>>> Hi, All >>>> >>>> After the optimization last week, I was suffering the 2nd round >>>> task poor performance, as you could tell >>>> from the "info trails" picture on >>>> http://www.ci.uchicago.edu/~zzhang/report-sleep-20081016-1509-gqr8tea9/ >>>> >>>> >>>> with bunch of tests and verification, I found the poor performance >>>> comes from the info log transfer. Thus I tried >>>> the collective IO for info log transfer, and it solves the problem, >>>> I get such a beautiful picture: >>>> >>>> I am working more to integrate the Collective IO system along with >>>> swift on BGP. >>>> >>>> zhao >>>> >>>> ------------------------------------------------------------------------ >>>> >>>> >>>> >>>> ------------------------------------------------------------------------ >>>> >>>> >>> >>> -- >>> =================================================== >>> Ioan Raicu >>> Ph.D. Candidate >>> =================================================== >>> Distributed Systems Laboratory >>> Computer Science Department >>> University of Chicago >>> 1100 E. 
58th Street, Ryerson Hall
>>> Chicago, IL 60637
>>> ===================================================
>>> Email: iraicu at cs.uchicago.edu
>>> Web: http://www.cs.uchicago.edu/~iraicu
>>> http://dev.globus.org/wiki/Incubator/Falkon
>>> http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
>>> ===================================================
>>> ===================================================
>>>
>>>
>>
>

From benc at hawaga.org.uk Mon Oct 20 00:37:42 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 20 Oct 2008 05:37:42 +0000 (GMT)
Subject: [Swift-devel] plotting using gkrellm
In-Reply-To: <48F81E8A.1070408@cs.uchicago.edu>
References: <48F78B90.1090209@cs.uchicago.edu> <48F81E8A.1070408@cs.uchicago.edu>
Message-ID:

It took a little while to get the fchart plugin built on OS X, but after that it was fairly straightforward to get some simple Swift log data going into gkrellm. Seems like a fairly pretty way to show things as a run progresses.

--

From tiberius at ci.uchicago.edu Tue Oct 21 11:22:14 2008
From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun)
Date: Tue, 21 Oct 2008 11:22:14 -0500
Subject: [Swift-devel] Small issue
Message-ID:

type file{};
file fileArray[];
trace(@strcat("Files: ",@filenames(fileArray)));

Result:
SwiftScript trace: Files: blafoobar

Expected result:
SwiftScript trace: Files: bla foo bar

Issue to report: it seems that trace prints results differently for an ext array, compared with all the other mappers (which leave spaces between the elements of the array).

Tibi

--
Tiberiu (Tibi) Stef-Praun, PhD
Computational Sciences Researcher
Computation Institute
5640 S.
Ellis Ave, #405
University of Chicago
http://www-unix.mcs.anl.gov/~tiberius/

From benc at hawaga.org.uk Wed Oct 22 00:54:54 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 22 Oct 2008 05:54:54 +0000 (GMT)
Subject: [Swift-devel] @filenames (was: Small issue)
In-Reply-To:
References:
Message-ID:

> trace(@strcat("Files: ",@filenames(fileArray)));

@filenames is a funny function - it doesn't have a Swift data type for its return value(!). In particular, it doesn't return a SwiftScript array. It's intended for use in an app {} block to have one expression evaluate to multiple command-line parameters.

If you want the filenames printed as a single string with space separations, you can say @filename(fileArray) (at least according to the user guide); but use @filenames in app {} block parameters.

My recent thinking on this has been that @filenames should return a SwiftScript array rather than its present secret magic datatype (actually a karajan list, I think, or something like that), and things like app {} and trace modified to cope with that.

--

From benc at hawaga.org.uk Wed Oct 22 12:29:11 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 22 Oct 2008 17:29:11 +0000 (GMT)
Subject: [Swift-devel] app {} syntax
Message-ID:

I added a new syntax for declaring app invocations. You can now say

app (file myfile) p() {
  echo "hi" stdout=@myfile;
}

instead of

(file myfile) p() {
  app {
    echo "hi" stdout=@myfile;
  }
}

The old syntax still works, though I think it should be made to disappear in the next 6 months or so.

--

From wilde at mcs.anl.gov Wed Oct 22 13:14:57 2008
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 22 Oct 2008 13:14:57 -0500
Subject: [Swift-devel] app {} syntax
In-Reply-To:
References:
Message-ID: <48FF6DA1.8020509@mcs.anl.gov>

That's excellent!

On 10/22/08 12:29 PM, Ben Clifford wrote:
> I added a new syntax for declaring app invocations.
You can now say > > > app (file myfile) p() { > echo "hi" stdout=@myfile; > } > > instead of > > (file myfile) p() { > app { > echo "hi" stdout=@myfile; > } > } > > The old syntax still works, though I think it should be made to disappear > in the next 6 months or so. > From bugzilla-daemon at mcs.anl.gov Wed Oct 22 23:14:55 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Wed, 22 Oct 2008 23:14:55 -0500 (CDT) Subject: [Swift-devel] [Bug 61] semantics of [*] and multi-return-values need clarifying In-Reply-To: Message-ID: <20081023041455.629B1164B1@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=61 ------- Comment #1 from benc at hawaga.org.uk 2008-10-22 23:14 ------- For point 2, I think the appropriate behaviour is to change this 'multiple values' type to the SwiftScript data type string[], and make consequent modification to app {} to take string arrays to mean multiple values. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. You are the assignee for the bug, or are watching the assignee. From zhaozhang at uchicago.edu Thu Oct 30 17:01:22 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Thu, 30 Oct 2008 17:01:22 -0500 Subject: [Swift-devel] Q about wrapper.sh Message-ID: <490A2EB2.3000709@uchicago.edu> Hi, I got a problem when running swift and falkon together on BGP. Here is the description. If there are multiple input files from swift, the command falkon would receive is shared/wrapper.sh thetaworker-9ypial1j -jobdir 9/y -e /home/falkon/economics/bin/runWorker-bgp.sh -out stdout.txt -err stderr.txt -i -d iofiles -if *iofiles/DataInit1.txt|iofiles/ApproxData1.txt* -of -k -a 1 iofiles/DataInit1.txt iofiles/ApproxData1.txt the delimiter is "|" When the following code in wrapper tried to parse "-if", it only take one file out, what kind of change do I need to change? 
Thanks

best wishes
zhangzhao

getarg() {
    NAME=$1
    shift
    VALUE=""
    SHIFTCOUNT=0
    if [ "$1" == "$NAME" ]; then
        shift
        let "SHIFTCOUNT=$SHIFTCOUNT+1"
        while [ "${1:0:1}" != "-" ] && [ "$#" != "0" ]; do
            VALUE="$VALUE $1"
            shift
            let "SHIFTCOUNT=$SHIFTCOUNT+1"
        done
    else
        fail 254 "Missing $NAME argument"
    fi
    VALUE="${VALUE:1}"
}

From zhaozhang at uchicago.edu Thu Oct 30 17:02:59 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Thu, 30 Oct 2008 17:02:59 -0500
Subject: [Swift-devel] Re: Q about wrapper.sh
In-Reply-To: <490A2EB2.3000709@uchicago.edu>
References: <490A2EB2.3000709@uchicago.edu>
Message-ID: <490A2F13.90706@uchicago.edu>

I am also attaching one info file:

dev/shm/swift-info/k/k # cat thetaworker-kk8c9l1j-info
Progress 2008-10-30 21:25:13 LOG_START
Missing -of argument
_____________________________________________________________________________
uname -a
_____________________________________________________________________________
Linux (none) 2.6.19.2 #1 SMP Thu Jul 31 17:08:44 CDT 2008 ppc unknown
_____________________________________________________________________________
id
_____________________________________________________________________________
uid=0(root) gid=0(root)
_____________________________________________________________________________
env
_____________________________________________________________________________
PLOTICUS_HOME=/home/falkon/users/zzhang/0671/ploticus
FALKON_CLIENT_HOME=/home/falkon/users/zzhang/0671/client
GLOBUS_OPTIONS_MISC=-Xms512M -Xmx512M -Xss128K
GLOBUS_LOCATION=/home/falkon/users/zzhang/0671/container
FALKON_CONFIG=/home/falkon/users/zzhang/0671/config
GLOBUS_PATH=/home/falkon/users/zzhang/0671/container
ANT_HOME=/home/falkon/users/zzhang/0671/apache-ant-1.7.0
LD_LIBRARY_PATH=.:/lib:/fuse/lib:/fuse/usr/lib:/home/falkon/users/zzhang/0671/container/lib:/lib:/fuse/lib:/fuse/usr/lib
FALKON_LOGS=/home/falkon/users/zzhang/0671/logs
FALKON_CLIENT_WORKLOADS_HOME=/home/falkon/users/zzhang/0671/workloads
FALKON_HOME=/home/falkon/users/zzhang/0671 PATH=/home/falkon/users/zzhang/0671/bin:/home/falkon/users/zzhang/0671/service:/home/falkon/users/zzhang/0671/worker:/home/falkon/users/zzhang/0671/client:/home/falkon/users/zzhang/0671/monitor:/home/falkon/users/zzhang/0671/webserver:/home/falkon/users/zzhang/0671/ploticus/src:/home/falkon/users/zzhang/0671/apache-ant-1.7.0:/home/falkon/users/zzhang/0671/apache-ant-1.7.0/bin:/home/falkon/users/zzhang/0671/ibm-java2-ppc-50/jre:/home/falkon/users/zzhang/0671/ibm-java2-ppc-50/jre/bin:/home/falkon/users/zzhang/0671/container:/home/falkon/users/zzhang/0671/container/bin:/home/falkon/users/zzhang/0671/cog/modules/vdsk/dist/vdsk-svn/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:. PWD=/fuse/gpfs/home/falkon/economics/tibi-20081030-1624-9mvoav3f JAVA_HOME=/home/falkon/users/zzhang/0671/ibm-java2-ppc-50/jre COBALT_JOBID=78356 SWIFT_HOME=/home/falkon/users/zzhang/0671/cog/modules/vdsk/dist/vdsk-svn SHLVL=2 GLOBUS_TCP_PORT_RANGE=50000,59999 FALKON_WWW_HOME=/home/falkon/users/zzhang/0671/webserver BG_SIZE=1 FALKON_MONITOR_HOME=/home/falkon/users/zzhang/0671/monitor FALKON_WORKER_HOME=/home/falkon/users/zzhang/0671/worker FALKON_SERVICE_HOME=/home/falkon/users/zzhang/0671/service CONTROL_INIT=4195440,0,1,0,44 FALKON_ROOT=/home/falkon/users/zzhang _=/bin/env _____________________________________________________________________________ df _____________________________________________________________________________ Filesystem 1k-blocks Used Available Use% Mounted on none 782528 896 781632 0% /dev/shm _____________________________________________________________________________ /proc/cpuinfo _____________________________________________________________________________ processor : 0 cpu : 450 Blue Gene/P DD2 revision : 24.128 (pvr 5213 1880) bogomips : 1700.00 processor : 1 cpu : 450 Blue Gene/P DD2 revision : 24.128 (pvr 5213 1880) bogomips : 1700.00 processor : 2 cpu : 450 Blue Gene/P DD2 revision : 24.128 (pvr 5213 1880) bogomips : 
1700.00 processor : 3 cpu : 450 Blue Gene/P DD2 revision : 24.128 (pvr 5213 1880) bogomips : 1700.00 total bogomips : 6800.00 vendor : IBM machine : Blue Gene _____________________________________________________________________________ /proc/meminfo _____________________________________________________________________________ MemTotal: 1565120 kB MemFree: 1517696 kB Buffers: 0 kB Cached: 27072 kB SwapCached: 0 kB Active: 26240 kB Inactive: 8256 kB SwapTotal: 0 kB SwapFree: 0 kB Dirty: 0 kB Writeback: 0 kB AnonPages: 5888 kB Mapped: 4544 kB Slab: 6400 kB SReclaimable: 896 kB SUnreclaim: 5504 kB PageTables: 1984 kB NFS_Unstable: 0 kB Bounce: 0 kB CommitLimit: 782528 kB Committed_AS: 69376 kB VmallocTotal: 212928 kB VmallocUsed: 512 kB VmallocChunk: 212416 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 Hugepagesize: 16384 kB _____________________________________________________________________________ command line _____________________________________________________________________________ thetaworker-kk8c9l1j -jobdir k/k -e /home/falkon/economics/bin/runWorker-bgp.sh -out stdout.txt -err stderr.txt -i -d iofiles -if iofiles/DataInit1.txt Zhao Zhang wrote: > Hi, > > I got a problem when running swift and falkon together on BGP. Here is > the description. > If there are multiple input files from swift, the command falkon would > receive is > shared/wrapper.sh thetaworker-9ypial1j -jobdir 9/y -e > /home/falkon/economics/bin/runWorker-bgp.sh -out stdout.txt -err > stderr.txt -i -d iofiles -if > *iofiles/DataInit1.txt|iofiles/ApproxData1.txt* -of -k -a 1 > iofiles/DataInit1.txt iofiles/ApproxData1.txt > the delimiter is "|" > > When the following code in wrapper tried to parse "-if", it only take > one file out, what kind of change do I need to change? 
Thanks
>
> best wishes
> zhangzhao
>
> getarg() {
>     NAME=$1
>     shift
>     VALUE=""
>     SHIFTCOUNT=0
>     if [ "$1" == "$NAME" ]; then
>         shift
>         let "SHIFTCOUNT=$SHIFTCOUNT+1"
>         while [ "${1:0:1}" != "-" ] && [ "$#" != "0" ]; do
>             VALUE="$VALUE $1"
>             shift
>             let "SHIFTCOUNT=$SHIFTCOUNT+1"
>         done
>     else
>         fail 254 "Missing $NAME argument"
>     fi
>     VALUE="${VALUE:1}"
> }
>
>

From hategan at mcs.anl.gov Thu Oct 30 17:31:07 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 30 Oct 2008 17:31:07 -0500
Subject: [Swift-devel] Q about wrapper.sh
In-Reply-To: <490A2EB2.3000709@uchicago.edu>
References: <490A2EB2.3000709@uchicago.edu>
Message-ID: <1225405867.10494.2.camel@localhost>

On Thu, 2008-10-30 at 17:01 -0500, Zhao Zhang wrote:
> Hi,
>
> I got a problem when running swift and falkon together on BGP. Here is
> the description.
> If there are multiple input files from swift, the command falkon would
> receive is
> shared/wrapper.sh thetaworker-9ypial1j -jobdir 9/y -e
> /home/falkon/economics/bin/runWorker-bgp.sh -out stdout.txt -err
> stderr.txt -i -d iofiles -if
> *iofiles/DataInit1.txt|iofiles/ApproxData1.txt*

Is that a literal "*"?

> -of -k -a 1
> iofiles/DataInit1.txt iofiles/ApproxData1.txt
> the delimiter is "|"

Delimiter for what?

From hategan at mcs.anl.gov Thu Oct 30 17:32:17 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 30 Oct 2008 17:32:17 -0500
Subject: [Swift-devel] Re: Q about wrapper.sh
In-Reply-To: <490A2F13.90706@uchicago.edu>
References: <490A2EB2.3000709@uchicago.edu> <490A2F13.90706@uchicago.edu>
Message-ID: <1225405937.10494.5.camel@localhost>

On Thu, 2008-10-30 at 17:02 -0500, Zhao Zhang wrote:
> I am also attaching one info file:
>
> dev/shm/swift-info/k/k # cat thetaworker-kk8c9l1j-info
> Progress 2008-10-30 21:25:13 LOG_START
> Missing -of argument
[...]
> thetaworker-kk8c9l1j -jobdir k/k -e > /home/falkon/economics/bin/runWorker-bgp.sh -out stdout.txt -err > stderr.txt -i -d iofiles -if iofiles/DataInit1.txt > > "-of" does seem to be missing. Did you make any changes to vdl-int.k? From hategan at mcs.anl.gov Thu Oct 30 17:33:46 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 30 Oct 2008 17:33:46 -0500 Subject: [Swift-devel] Q about wrapper.sh In-Reply-To: <1225405867.10494.2.camel@localhost> References: <490A2EB2.3000709@uchicago.edu> <1225405867.10494.2.camel@localhost> Message-ID: <1225406026.10494.7.camel@localhost> On Thu, 2008-10-30 at 17:31 -0500, Mihael Hategan wrote: > On Thu, 2008-10-30 at 17:01 -0500, Zhao Zhang wrote: > > Hi, > > > > I got a problem when running swift and falkon together on BGP. Here is > > the description. > > If there are multiple input files from swift, the command falkon would > > receive is > > shared/wrapper.sh thetaworker-9ypial1j -jobdir 9/y -e > > /home/falkon/economics/bin/runWorker-bgp.sh -out stdout.txt -err > > stderr.txt -i -d iofiles -if > > *iofiles/DataInit1.txt|iofiles/ApproxData1.txt* > > Is that a literal "*"? > > > -of -k -a 1 > > iofiles/DataInit1.txt iofiles/ApproxData1.txt > > the delimiter is "|" > > Delimiter for what? Nevermind. I see what you mean. However, I'm still unsure how the "*" got there, or what the problem is. From zhaozhang at uchicago.edu Thu Oct 30 17:34:40 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Thu, 30 Oct 2008 17:34:40 -0500 Subject: [Swift-devel] Q about wrapper.sh In-Reply-To: <1225406026.10494.7.camel@localhost> References: <490A2EB2.3000709@uchicago.edu> <1225405867.10494.2.camel@localhost> <1225406026.10494.7.camel@localhost> Message-ID: <490A3680.9060507@uchicago.edu> "*" should not be there, I want to make them bold, and thunderbird place "*" there. 
zhao Mihael Hategan wrote: > On Thu, 2008-10-30 at 17:31 -0500, Mihael Hategan wrote: > >> On Thu, 2008-10-30 at 17:01 -0500, Zhao Zhang wrote: >> >>> Hi, >>> >>> I got a problem when running swift and falkon together on BGP. Here is >>> the description. >>> If there are multiple input files from swift, the command falkon would >>> receive is >>> shared/wrapper.sh thetaworker-9ypial1j -jobdir 9/y -e >>> /home/falkon/economics/bin/runWorker-bgp.sh -out stdout.txt -err >>> stderr.txt -i -d iofiles -if >>> *iofiles/DataInit1.txt|iofiles/ApproxData1.txt* >>> >> Is that a literal "*"? >> >> >>> -of -k -a 1 >>> iofiles/DataInit1.txt iofiles/ApproxData1.txt >>> the delimiter is "|" >>> >> Delimiter for what? >> > > Nevermind. I see what you mean. > > However, I'm still unsure how the "*" got there, or what the problem is. > > > From zhaozhang at uchicago.edu Thu Oct 30 17:39:05 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Thu, 30 Oct 2008 17:39:05 -0500 Subject: [Swift-devel] Q about wrapper.sh In-Reply-To: <1225406026.10494.7.camel@localhost> References: <490A2EB2.3000709@uchicago.edu> <1225405867.10494.2.camel@localhost> <1225406026.10494.7.camel@localhost> Message-ID: <490A3789.2010902@uchicago.edu> I made a small script with that getarg() part, and tried parse "-if iofiles/DataInit1.txt|iofiles/ApproxData1.txt" bash-3.1# ./getarg.sh iofiles/DataInit1.txt|iofiles/ApproxData1.txt This output is right, but the same code in wrapper.sh doesn't work like this. The vdl-int.k on BGP is a customized one, but I am sure the thing I changed is only the jobdir part, change a single letter jobdir to two level jobdir. zhao Mihael Hategan wrote: > On Thu, 2008-10-30 at 17:31 -0500, Mihael Hategan wrote: > >> On Thu, 2008-10-30 at 17:01 -0500, Zhao Zhang wrote: >> >>> Hi, >>> >>> I got a problem when running swift and falkon together on BGP. Here is >>> the description. 
>>> If there are multiple input files from swift, the command falkon would >>> receive is >>> shared/wrapper.sh thetaworker-9ypial1j -jobdir 9/y -e >>> /home/falkon/economics/bin/runWorker-bgp.sh -out stdout.txt -err >>> stderr.txt -i -d iofiles -if >>> *iofiles/DataInit1.txt|iofiles/ApproxData1.txt* >>> >> Is that a literal "*"? >> >> >>> -of -k -a 1 >>> iofiles/DataInit1.txt iofiles/ApproxData1.txt >>> the delimiter is "|" >>> >> Delimiter for what? >> > > Nevermind. I see what you mean. > > However, I'm still unsure how the "*" got there, or what the problem is. > > > From hategan at mcs.anl.gov Thu Oct 30 17:40:36 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 30 Oct 2008 17:40:36 -0500 Subject: [Swift-devel] Q about wrapper.sh In-Reply-To: <490A3680.9060507@uchicago.edu> References: <490A2EB2.3000709@uchicago.edu> <1225405867.10494.2.camel@localhost> <1225406026.10494.7.camel@localhost> <490A3680.9060507@uchicago.edu> Message-ID: <1225406436.16669.3.camel@localhost> On Thu, 2008-10-30 at 17:34 -0500, Zhao Zhang wrote: > "*" should not be there, I want to make them bold, and thunderbird place > "*" there. Ok. Does the wrapper log in the second email correspond to the thing you mentioned in the first email? If yes, then the problem isn't in the wrapper. The wrapper is receiving the command up to the first "|". This may be because some layer somewhere does a pipe when it sees the "|". 
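[Editorial sketch, not part of the original thread.] The hypothesis above, that some layer treats the "|" as a pipe, can be reproduced in isolation. The script below assumes the intermediate layer hands the whole command line to a shell as a single string; the thread has not established that at this point. The stand-in script fake_wrapper.sh and the names workdir and ARGS_LOG are hypothetical and only record which arguments arrive.

```shell
#!/bin/sh
# Sketch only: a stand-in for wrapper.sh that records the arguments it was given.
# workdir, fake_wrapper.sh and ARGS_LOG are hypothetical names.
workdir=$(mktemp -d)
ARGS_LOG="$workdir/args.log"
export ARGS_LOG

cat > "$workdir/fake_wrapper.sh" <<'EOF'
#!/bin/sh
printf '%s\n' "$*" > "$ARGS_LOG"
EOF

# Case 1: the command line is passed as one string to a shell (assumption).
# The unquoted "|" turns the string into a pipeline, so the stand-in wrapper
# only receives the words before the "|"; the rest is run as a second command
# (which does not exist here, hence the discarded error).
cmdline="sh $workdir/fake_wrapper.sh -if iofiles/DataInit1.txt|iofiles/ApproxData1.txt -of"
sh -c "$cmdline" 2>/dev/null || true
seen_via_shell=$(cat "$ARGS_LOG")

# Case 2: the wrapper is started with an explicit argument list, so the "|"
# is just a character inside a single argument.
sh "$workdir/fake_wrapper.sh" -if "iofiles/DataInit1.txt|iofiles/ApproxData1.txt" -of
seen_direct=$(cat "$ARGS_LOG")

echo "via shell: $seen_via_shell"
echo "direct:    $seen_direct"
rm -rf "$workdir"
```

In the first case the recorded arguments should stop at the first file name, which matches the truncated command line in the wrapper info file; in the second case everything up to and including -of should arrive intact.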
From hategan at mcs.anl.gov Thu Oct 30 17:41:58 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 30 Oct 2008 17:41:58 -0500 Subject: [Swift-devel] Q about wrapper.sh In-Reply-To: <490A3789.2010902@uchicago.edu> References: <490A2EB2.3000709@uchicago.edu> <1225405867.10494.2.camel@localhost> <1225406026.10494.7.camel@localhost> <490A3789.2010902@uchicago.edu> Message-ID: <1225406518.16669.5.camel@localhost> On Thu, 2008-10-30 at 17:39 -0500, Zhao Zhang wrote: > I made a small script with that getarg() part, and tried parse "-if > iofiles/DataInit1.txt|iofiles/ApproxData1.txt" > > bash-3.1# ./getarg.sh > iofiles/DataInit1.txt|iofiles/ApproxData1.txt > > This output is right, but the same code in wrapper.sh doesn't work like > this. Heh. Not quite. Something in between swift and the wrapper invocation does funny things when it sees a "|". How does the worker invoke executables? From zhaozhang at uchicago.edu Thu Oct 30 18:35:47 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Thu, 30 Oct 2008 18:35:47 -0500 Subject: [Swift-devel] Q about wrapper.sh In-Reply-To: <1225406518.16669.5.camel@localhost> References: <490A2EB2.3000709@uchicago.edu> <1225405867.10494.2.camel@localhost> <1225406026.10494.7.camel@localhost> <490A3789.2010902@uchicago.edu> <1225406518.16669.5.camel@localhost> Message-ID: <490A44D3.8000104@uchicago.edu> using system function in c library. zhao Mihael Hategan wrote: > On Thu, 2008-10-30 at 17:39 -0500, Zhao Zhang wrote: > >> I made a small script with that getarg() part, and tried parse "-if >> iofiles/DataInit1.txt|iofiles/ApproxData1.txt" >> >> bash-3.1# ./getarg.sh >> iofiles/DataInit1.txt|iofiles/ApproxData1.txt >> >> This output is right, but the same code in wrapper.sh doesn't work like >> this. >> > > Heh. Not quite. Something in between swift and the wrapper invocation > does funny things when it sees a "|". How does the worker invoke > executables? 
> > > From hategan at mcs.anl.gov Thu Oct 30 18:40:45 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 30 Oct 2008 18:40:45 -0500 Subject: [Swift-devel] Q about wrapper.sh In-Reply-To: <490A44D3.8000104@uchicago.edu> References: <490A2EB2.3000709@uchicago.edu> <1225405867.10494.2.camel@localhost> <1225406026.10494.7.camel@localhost> <490A3789.2010902@uchicago.edu> <1225406518.16669.5.camel@localhost> <490A44D3.8000104@uchicago.edu> Message-ID: <1225410045.23926.2.camel@localhost> I have a feeling we've had this discussion before, but system() invokes a subshell (i.e. /bin/bash) to execute your command, and will do all kinds of redirection nastiness. Use fork/execve instead. On Thu, 2008-10-30 at 18:35 -0500, Zhao Zhang wrote: > using system function in c library. > > zhao > > Mihael Hategan wrote: > > On Thu, 2008-10-30 at 17:39 -0500, Zhao Zhang wrote: > > > >> I made a small script with that getarg() part, and tried parse "-if > >> iofiles/DataInit1.txt|iofiles/ApproxData1.txt" > >> > >> bash-3.1# ./getarg.sh > >> iofiles/DataInit1.txt|iofiles/ApproxData1.txt > >> > >> This output is right, but the same code in wrapper.sh doesn't work like > >> this. > >> > > > > Heh. Not quite. Something in between swift and the wrapper invocation > > does funny things when it sees a "|". How does the worker invoke > > executables? > > > > > > From zhaozhang at uchicago.edu Thu Oct 30 18:41:49 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Thu, 30 Oct 2008 18:41:49 -0500 Subject: [Swift-devel] Q about wrapper.sh In-Reply-To: <1225410045.23926.2.camel@localhost> References: <490A2EB2.3000709@uchicago.edu> <1225405867.10494.2.camel@localhost> <1225406026.10494.7.camel@localhost> <490A3789.2010902@uchicago.edu> <1225406518.16669.5.camel@localhost> <490A44D3.8000104@uchicago.edu> <1225410045.23926.2.camel@localhost> Message-ID: <490A463D.6090507@uchicago.edu> ahha, I remembered that, also we had a temporary solution some time before, right? 
zhao Mihael Hategan wrote: > I have a feeling we've had this discussion before, but system() invokes > a subshell (i.e. /bin/bash) to execute your command, and will do all > kinds of redirection nastiness. > > Use fork/execve instead. > > On Thu, 2008-10-30 at 18:35 -0500, Zhao Zhang wrote: > >> using system function in c library. >> >> zhao >> >> Mihael Hategan wrote: >> >>> On Thu, 2008-10-30 at 17:39 -0500, Zhao Zhang wrote: >>> >>> >>>> I made a small script with that getarg() part, and tried parse "-if >>>> iofiles/DataInit1.txt|iofiles/ApproxData1.txt" >>>> >>>> bash-3.1# ./getarg.sh >>>> iofiles/DataInit1.txt|iofiles/ApproxData1.txt >>>> >>>> This output is right, but the same code in wrapper.sh doesn't work like >>>> this. >>>> >>>> >>> Heh. Not quite. Something in between swift and the wrapper invocation >>> does funny things when it sees a "|". How does the worker invoke >>> executables? >>> >>> >>> >>> > > > From hategan at mcs.anl.gov Thu Oct 30 18:46:45 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 30 Oct 2008 18:46:45 -0500 Subject: [Swift-devel] Q about wrapper.sh In-Reply-To: <490A463D.6090507@uchicago.edu> References: <490A2EB2.3000709@uchicago.edu> <1225405867.10494.2.camel@localhost> <1225406026.10494.7.camel@localhost> <490A3789.2010902@uchicago.edu> <1225406518.16669.5.camel@localhost> <490A44D3.8000104@uchicago.edu> <1225410045.23926.2.camel@localhost> <490A463D.6090507@uchicago.edu> Message-ID: <1225410405.29779.0.camel@localhost> On Thu, 2008-10-30 at 18:41 -0500, Zhao Zhang wrote: > ahha, I remembered that, also we had a temporary solution some time > before, right? Probably to change the separator to something else. 
From zhaozhang at uchicago.edu Thu Oct 30 18:47:12 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Thu, 30 Oct 2008 18:47:12 -0500
Subject: [Swift-devel] Q about wrapper.sh
In-Reply-To: <1225410405.29779.0.camel@localhost>
References: <490A2EB2.3000709@uchicago.edu> <1225405867.10494.2.camel@localhost> <1225406026.10494.7.camel@localhost> <490A3789.2010902@uchicago.edu> <1225406518.16669.5.camel@localhost> <490A44D3.8000104@uchicago.edu> <1225410045.23926.2.camel@localhost> <490A463D.6090507@uchicago.edu> <1225410405.29779.0.camel@localhost>
Message-ID: <490A4780.2080908@uchicago.edu>

is that in vdl-int.k? if this is the case, I could manage it myself, I
think.

zhao

Mihael Hategan wrote:
> On Thu, 2008-10-30 at 18:41 -0500, Zhao Zhang wrote:
>
>> ahha, I remembered that, also we had a temporary solution some time
>> before, right?
>>
>
> Probably to change the separator to something else.
>
>
>
>

From hategan at mcs.anl.gov Thu Oct 30 18:52:58 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 30 Oct 2008 18:52:58 -0500
Subject: [Swift-devel] Q about wrapper.sh
In-Reply-To: <490A4780.2080908@uchicago.edu>
References: <490A2EB2.3000709@uchicago.edu> <1225405867.10494.2.camel@localhost> <1225406026.10494.7.camel@localhost> <490A3789.2010902@uchicago.edu> <1225406518.16669.5.camel@localhost> <490A44D3.8000104@uchicago.edu> <1225410045.23926.2.camel@localhost> <490A463D.6090507@uchicago.edu> <1225410405.29779.0.camel@localhost> <490A4780.2080908@uchicago.edu>
Message-ID: <1225410778.513.1.camel@localhost>

On Thu, 2008-10-30 at 18:47 -0500, Zhao Zhang wrote:
> is that in vdl-int.k?

seems so. But also in wrapper.sh.

> if this is the case, I could manage it myself, I
> think.

see flatten() around line 50.

Change
for(i, butLast(...), if(isList(i) flatten(i) i), "|") last(...)

to
for(i, butLast(...), if(isList(i) flatten(i) i), "") last(...)
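[Editorial sketch, not part of the original thread.] Because the separator is said to matter on the wrapper.sh side as well, the following shows how the two conventions would differ for a getarg()-style parser. It is a simplified POSIX sh rendering of the getarg() idea quoted earlier in the thread, not the real wrapper.sh; collect_flag_values is a hypothetical name, and splitting on "|" through IFS is an assumption about how the wrapper consumes the value.

```shell
#!/bin/sh
# Simplified sketch of the getarg() idea; collect_flag_values is a
# hypothetical name and is not taken from wrapper.sh.
collect_flag_values() {
    flag=$1
    shift
    VALUE=""
    if [ "$1" != "$flag" ]; then
        echo "Missing $flag argument" >&2
        return 254
    fi
    shift
    # Gather words until the next one that starts with "-".
    while [ "$#" -gt 0 ]; do
        case $1 in
            -*) break ;;
        esac
        VALUE="${VALUE:+$VALUE }$1"
        shift
    done
}

# Blank-separated file names: every file is its own argument.
collect_flag_values -if -if iofiles/DataInit1.txt iofiles/ApproxData1.txt -of
blank_value=$VALUE

# "|"-separated file names: one argument that has to be split afterwards
# (assumed here to be done with IFS).
collect_flag_values -if -if "iofiles/DataInit1.txt|iofiles/ApproxData1.txt" -of
pipe_value=$VALUE
old_ifs=$IFS
IFS="|"
set -f                 # no globbing while splitting
set -- $pipe_value
set +f
IFS=$old_ifs
pipe_count=$#
pipe_first=$1
pipe_second=$2

echo "blank-separated: $blank_value"
echo "pipe-separated:  $pipe_value -> $pipe_count files: $pipe_first, $pipe_second"
```

Either convention yields both file names as long as the arguments reach the wrapper intact. The blank-separated form survives a shell layer unquoted, but it cannot represent file names that themselves contain blanks.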
From hategan at mcs.anl.gov Thu Oct 30 18:54:09 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 30 Oct 2008 18:54:09 -0500 Subject: [Swift-devel] Q about wrapper.sh In-Reply-To: <1225410778.513.1.camel@localhost> References: <490A2EB2.3000709@uchicago.edu> <1225405867.10494.2.camel@localhost> <1225406026.10494.7.camel@localhost> <490A3789.2010902@uchicago.edu> <1225406518.16669.5.camel@localhost> <490A44D3.8000104@uchicago.edu> <1225410045.23926.2.camel@localhost> <490A463D.6090507@uchicago.edu> <1225410405.29779.0.camel@localhost> <490A4780.2080908@uchicago.edu> <1225410778.513.1.camel@localhost> Message-ID: <1225410849.1994.0.camel@localhost> On Thu, 2008-10-30 at 18:52 -0500, Mihael Hategan wrote: > On Thu, 2008-10-30 at 18:47 -0500, Zhao Zhang wrote: > > is that in vdl-int.k? > > seems so. But also in wrapper.sh. > > > if this is the case, I could manage it myself, I > > think. > > see flatten() around line 50. > > Change > for(i, butLast(...), if(isList(i) flatten(i) i), "|") last(...) > > to > for(i, butLast(...), if(isList(i) flatten(i) i), > "") last(...) > You should fix it properly though. Otherwise we may keep having variations on this discussion. 
From zhaozhang at uchicago.edu Thu Oct 30 18:54:30 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Thu, 30 Oct 2008 18:54:30 -0500 Subject: [Swift-devel] Q about wrapper.sh In-Reply-To: <1225410778.513.1.camel@localhost> References: <490A2EB2.3000709@uchicago.edu> <1225405867.10494.2.camel@localhost> <1225406026.10494.7.camel@localhost> <490A3789.2010902@uchicago.edu> <1225406518.16669.5.camel@localhost> <490A44D3.8000104@uchicago.edu> <1225410045.23926.2.camel@localhost> <490A463D.6090507@uchicago.edu> <1225410405.29779.0.camel@localhost> <490A4780.2080908@uchicago.edu> <1225410778.513.1.camel@localhost> Message-ID: <490A4936.2020909@uchicago.edu> ok, got it, I know in wrapper.sh there is IFS, which is "|" zhao Mihael Hategan wrote: > On Thu, 2008-10-30 at 18:47 -0500, Zhao Zhang wrote: > >> is that in vdl-int.k? >> > > seems so. But also in wrapper.sh. > > >> if this is the case, I could manage it myself, I >> think. >> > > see flatten() around line 50. > > Change > for(i, butLast(...), if(isList(i) flatten(i) i), "|") last(...) > > to > for(i, butLast(...), if(isList(i) flatten(i) i), > "") last(...) > > > > From zhaozhang at uchicago.edu Thu Oct 30 18:55:28 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Thu, 30 Oct 2008 18:55:28 -0500 Subject: [Swift-devel] Q about wrapper.sh In-Reply-To: <1225410849.1994.0.camel@localhost> References: <490A2EB2.3000709@uchicago.edu> <1225405867.10494.2.camel@localhost> <1225406026.10494.7.camel@localhost> <490A3789.2010902@uchicago.edu> <1225406518.16669.5.camel@localhost> <490A44D3.8000104@uchicago.edu> <1225410045.23926.2.camel@localhost> <490A463D.6090507@uchicago.edu> <1225410405.29779.0.camel@localhost> <490A4780.2080908@uchicago.edu> <1225410778.513.1.camel@localhost> <1225410849.1994.0.camel@localhost> Message-ID: <490A4970.8030601@uchicago.edu> sure, we need this immediately for a run on BGP, after that, I will carefully document this, and propose the solution. 
change system() to execv()+fork() or something else. zhao Mihael Hategan wrote: > On Thu, 2008-10-30 at 18:52 -0500, Mihael Hategan wrote: > >> On Thu, 2008-10-30 at 18:47 -0500, Zhao Zhang wrote: >> >>> is that in vdl-int.k? >>> >> seems so. But also in wrapper.sh. >> >> >>> if this is the case, I could manage it myself, I >>> think. >>> >> see flatten() around line 50. >> >> Change >> for(i, butLast(...), if(isList(i) flatten(i) i), "|") last(...) >> >> to >> for(i, butLast(...), if(isList(i) flatten(i) i), >> "") last(...) >> >> > > You should fix it properly though. Otherwise we may keep having > variations on this discussion. > > > From benc at hawaga.org.uk Thu Oct 30 19:08:37 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 31 Oct 2008 00:08:37 +0000 (GMT) Subject: [Swift-devel] Q about wrapper.sh In-Reply-To: <1225410849.1994.0.camel@localhost> References: <490A2EB2.3000709@uchicago.edu> <1225405867.10494.2.camel@localhost> <1225406026.10494.7.camel@localhost> <490A3789.2010902@uchicago.edu> <1225406518.16669.5.camel@localhost> <490A44D3.8000104@uchicago.edu> <1225410045.23926.2.camel@localhost> <490A463D.6090507@uchicago.edu> <1225410405.29779.0.camel@localhost> <490A4780.2080908@uchicago.edu> <1225410778.513.1.camel@localhost> <1225410849.1994.0.camel@localhost> Message-ID: On Thu, 30 Oct 2008, Mihael Hategan wrote: > You should fix it properly though. Otherwise we may keep having > variations on this discussion. Swift could be changed to use something more bulletproof too. 
-- From hategan at mcs.anl.gov Thu Oct 30 19:12:58 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 30 Oct 2008 19:12:58 -0500 Subject: [Swift-devel] Q about wrapper.sh In-Reply-To: References: <490A2EB2.3000709@uchicago.edu> <1225405867.10494.2.camel@localhost> <1225406026.10494.7.camel@localhost> <490A3789.2010902@uchicago.edu> <1225406518.16669.5.camel@localhost> <490A44D3.8000104@uchicago.edu> <1225410045.23926.2.camel@localhost> <490A463D.6090507@uchicago.edu> <1225410405.29779.0.camel@localhost> <490A4780.2080908@uchicago.edu> <1225410778.513.1.camel@localhost> <1225410849.1994.0.camel@localhost> Message-ID: <1225411978.14126.0.camel@localhost> On Fri, 2008-10-31 at 00:08 +0000, Ben Clifford wrote: > On Thu, 30 Oct 2008, Mihael Hategan wrote: > > > You should fix it properly though. Otherwise we may keep having > > variations on this discussion. > > Swift could be changed to use something more bulletproof too. Of course. Except it would be more like "swift could be changed to use blanks". From benc at hawaga.org.uk Thu Oct 30 19:22:57 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 31 Oct 2008 00:22:57 +0000 (GMT) Subject: [Swift-devel] 0.7rc1 coming up Message-ID: I'm going to make Swift 0.7rc1 in the next day or so. Please don't commit anything too likely to break things. -- From benc at hawaga.org.uk Thu Oct 30 19:34:21 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 31 Oct 2008 00:34:21 +0000 (GMT) Subject: [Swift-devel] more data-ordered execution Message-ID: I've been working on moving all assignments and mapper parameter evaluation into a data-ordered mode, rather than the present situation where some expressions are evaluated as they are first seen and therefore must be fully evaluatable at that point (without any yet-to-be-initialized values). 
This is a problem which has been around since the dawn of Swift, and it seems to trip people up with some regularity (knowledge of what causes this problem and what doesn't is somewhat contingent on knowing 'too much' about the innards of the Swift implementation).

I'm making steady progress, but encountering plenty of stuff on the way that is stopping this from being a quick fix.

Array assignments: at present, a = [1,2,3] doesn't work or mean anything outside of a variable declaration. So you can say:

int a[] = [1,2,3];

but not:

int a[];
a=[1,2,3];

I have done some work to fix that, though I'm still a little hazy on what the correct semantics should be for complex data structures involving files.

Having got assignment mostly working, I then change assignment behaviour so that initialisations in variable declarations are compiled to the same code as if their initialisation assignment was separate (so int a = 7; now evaluates to the code that would have been previously produced by int a; a = 7; )

For mapper parameters, I make a variable for each mapper parameter, which is then initialized (as with any variable now) in proper data-dependent order; and modify data node handling so that mappers are not initialized until all their parameter variables are closed. This is perhaps not the most efficient way to do things - at the moment I'm interested in getting correct behaviour rather than efficiency.

--

From hategan at mcs.anl.gov Thu Oct 30 20:55:24 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 30 Oct 2008 20:55:24 -0500
Subject: [Swift-devel] more data-ordered execution
In-Reply-To:
References:
Message-ID: <1225418124.32189.17.camel@localhost>

On Fri, 2008-10-31 at 00:34 +0000, Ben Clifford wrote:
[...]
> I have done some work to fix that, though I'm still a little hazy on what the
> correct semantics should be for complex data structures involving files.

Right. That seems to always have caused problems when we thought about it.
However, and I think what you say below expresses this fairly well, the correct model is something along the following lines:

A Swift variable is a (mapper, value) tuple: V = (M, X), where both M and X are futures. The current model treats only X as a future, but not M (hence the need to deal with M in a different way).

A declaration should not bind either M or X. Instead V.M = SomeMapperInstance(parameters*) and V.X should happen concurrently (whereas only V.X does at present).

[...]
>
> For mapper parameters, I make a variable for each mapper parameter, which
> is then initialized (as with any variable now) in proper data-dependent
> order; and modify data node handling so that mappers are not initialized
> until all their parameter variables are closed.

Right. You can think of a mapper as a function of parameters, all of which can be futures. In principle, this is a generalized function with futures as arguments, which only fully evaluates when all (required) parameters are bound.

> This is perhaps not the
> most efficient way to do things - at the moment I'm interested in getting
> correct behaviour rather than efficiency.

From benc at hawaga.org.uk Fri Oct 31 00:15:58 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 31 Oct 2008 05:15:58 +0000 (GMT)
Subject: [Swift-devel] swift 0.7rc1 - please test
Message-ID:

I have put Swift 0.7rc1 online at:

http://www.ci.uchicago.edu/~benc/vdsk-0.7-rc1.tar.gz

Please test and report back either way. If you only intend to test one release candidate, rc1 is the one you should test.

If there are no fatal bugs, then this will become the 0.7 release roughly 7 days from the time of this message.

As before, and for the same reasons, I'm going to ignore the dev.globus release procedure.
--

From benc at hawaga.org.uk Fri Oct 31 03:21:34 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 31 Oct 2008 08:21:34 +0000 (GMT)
Subject: [Swift-devel] Re: swift 0.7rc1 - please test
In-Reply-To:
References:
Message-ID:

I ran the site tests on 0.7rc1. Some stuff worked. Some stuff didn't. The below failures don't bother me excessively for release, but I'll investigate them more deeply.

These sites failed:

fletch-condor-gram2.xml
fletch-fork-gram2.xml
osg-edu.cs.wisc.edu-condor.xml
tgncsa-hg-pbs-gram2.xml
tgncsa-hg-pbs-gram4.xml
tgpurdue-condor-gram2.xml
tgpurdue-condor-gram4.xml
tp-pbs-gram2.xml
UCLA_Saxon_Tier3-fork.xml

These sites worked:

osg-edu.cs.wisc.edu-fork.xml
tgncsa-hg-fork-gram2.xml
tgncsa-hg-fork-gram4.xml
tgpurdue-fork-gram2.xml
tgpurdue-fork-gram4.xml
tgtacc-fork-gram2.xml
tgtacc-lsf-gram2.xml
tguc-fork-gram2.xml
tguc-fork-gram4.xml
tguc-pbs-gram2-syntax1.xml
tguc-pbs-gram2.xml
tguc-pbs-gram4.xml
tp-fork-gram2.xml
tp-fork-gram4.xml

--