From jwells2 at twcny.rr.com Sun Mar 2 16:22:54 2008
From: jwells2 at twcny.rr.com (Jeffrey Wells)
Date: Sun, 2 Mar 2008 17:22:54 -0500
Subject: [Swift-user] Submitting first.swift to another node
Message-ID: <006901c87cb3$f62577e0$6401a8c0@twcny.rr.com>

All;

I am new to Swift and I am trying to implement a simple test on our grid. Any help or direction would be appreciated.

Jeff Wells

When I invoke the swift command, the workflow starts but hangs at the last line below:

Recompilation suppressed.
Using sites file: /home/wells/Desktop/sites.xml
Using tc.data: /usr/local/swift/etc/tc.data
Swift v0.3 r1319 (modified locally)
Swift v0.3 r1319 (modified locally)
RunID: 20080302-1719-dyd3zjg5
RunID: 20080302-1719-dyd3zjg5
echo started
START thread=0 tr=echo
START host=ruth - Initializing shared directory
Task(type=FILE_OPERATION, identity=urn:0-1204496396858) setting status to Active
Task(type=FILE_OPERATION, identity=urn:0-1204496396858) setting status to Completed
Task(type=FILE_TRANSFER, identity=urn:0-1204496396861) setting status to Submitted
Task(type=FILE_TRANSFER, identity=urn:0-1204496396861) setting status to Active
Task(type=FILE_TRANSFER, identity=urn:0-1204496396861) setting status to Completed
Task(type=FILE_TRANSFER, identity=urn:0-1204496396865) setting status to Submitted
Task(type=FILE_TRANSFER, identity=urn:0-1204496396865) setting status to Active
Task(type=FILE_TRANSFER, identity=urn:0-1204496396865) setting status to Completed
Task(type=FILE_OPERATION, identity=urn:0-1204496396868) setting status to Active
Task(type=FILE_OPERATION, identity=urn:0-1204496396868) setting status to Completed
Task(type=FILE_OPERATION, identity=urn:0-1204496396870) setting status to Active
Task(type=FILE_OPERATION, identity=urn:0-1204496396870) setting status to Completed
Task(type=FILE_OPERATION, identity=urn:0-1204496396872) setting status to Active
Task(type=FILE_OPERATION, identity=urn:0-1204496396872) setting status to Completed
END host=ruth - Done initializing shared directory
THREAD_ASSOCIATION jobid=echo-01e588pi thread=0 host=ruth
START jobid=echo-01e588pi host=ruth - Initializing directory structure
Creating directory structure in first-20080302-1719-dyd3zjg5/shared (first-20080302-1719-dyd3zjg5/shared/)
Task(type=FILE_OPERATION, identity=urn:0-1204496396874) setting status to Active
Task(type=FILE_OPERATION, identity=urn:0-1204496396874) setting status to Completed
END jobid=echo-01e588pi - Done initializing directory structure
START jobid=echo-01e588pi - Staging in files
END jobid=echo-01e588pi - Staging in finished
JOB_START jobid=echo-01e588pi tr=echo arguments=[Hello, world!] tmpdir=first-20080302-1719-dyd3zjg5/echo-01e588pi host=ruth

*********

I modified sites.xml so that the first.swift script would run on our remote machine.

/tmp

My command line is:

swift -sites.file sites.xml first.swift -verbose -debug

The exception is:

2008-03-02 16:27:29,523-0500 DEBUG Loader Recompilation suppressed.
2008-03-02 16:27:31,929-0500 INFO unknown Using sites file: /home/wells/Desktop/sites.xml
2008-03-02 16:27:31,936-0500 INFO unknown Using tc.data: /usr/local/swift/etc/tc.data
2008-03-02 16:27:33,338-0500 INFO unknown Swift v0.3 r1319 (modified locally)
2008-03-02 16:27:33,341-0500 INFO unknown RunID: 20080302-1627-dzk4seng
2008-03-02 16:27:33,574-0500 INFO vdl:execute START thread=0 tr=echo
2008-03-02 16:27:33,604-0500 INFO vdl:initshareddir START host=ruth - Initializing shared directory
2008-03-02 16:27:35,633-0500 DEBUG TaskImpl Task(type=FILE_OPERATION, identity=urn:0-1204493250899) setting status to Active
2008-03-02 16:27:35,660-0500 DEBUG TaskImpl Task(type=FILE_OPERATION, identity=urn:0-1204493250899) setting status to Completed
2008-03-02 16:27:35,700-0500 DEBUG TaskImpl Task(type=FILE_TRANSFER, identity=urn:0-1204493250902) setting status to Submitted
2008-03-02 16:27:35,702-0500 DEBUG TaskImpl Task(type=FILE_TRANSFER, identity=urn:0-1204493250902) setting status to Active
2008-03-02 16:27:36,269-0500 DEBUG TaskImpl Task(type=FILE_TRANSFER, identity=urn:0-1204493250902) setting status to Completed
2008-03-02 16:27:36,272-0500 DEBUG TaskImpl Task(type=FILE_TRANSFER, identity=urn:0-1204493250906) setting status to Submitted
2008-03-02 16:27:36,273-0500 DEBUG TaskImpl Task(type=FILE_TRANSFER, identity=urn:0-1204493250906) setting status to Active
2008-03-02 16:27:36,486-0500 DEBUG TaskImpl Task(type=FILE_TRANSFER, identity=urn:0-1204493250906) setting status to Completed
2008-03-02 16:27:36,488-0500 DEBUG TaskImpl Task(type=FILE_OPERATION, identity=urn:0-1204493250909) setting status to Active
2008-03-02 16:27:36,496-0500 DEBUG TaskImpl Task(type=FILE_OPERATION, identity=urn:0-1204493250909) setting status to Completed
2008-03-02 16:27:36,498-0500 DEBUG TaskImpl Task(type=FILE_OPERATION, identity=urn:0-1204493250911) setting status to Active
2008-03-02 16:27:36,513-0500 DEBUG TaskImpl Task(type=FILE_OPERATION, identity=urn:0-1204493250911) setting status to Completed
2008-03-02 16:27:36,516-0500 DEBUG TaskImpl Task(type=FILE_OPERATION, identity=urn:0-1204493250913) setting status to Active
2008-03-02 16:27:36,521-0500 DEBUG TaskImpl Task(type=FILE_OPERATION, identity=urn:0-1204493250913) setting status to Completed
2008-03-02 16:27:36,524-0500 INFO vdl:initshareddir END host=ruth - Done initializing shared directory
2008-03-02 16:27:36,527-0500 DEBUG vdl:execute2 THREAD_ASSOCIATION jobid=echo-kv0268pi thread=0 host=ruth
2008-03-02 16:27:36,539-0500 INFO vdl:createdirset START jobid=echo-kv0268pi host=ruth - Initializing directory structure
2008-03-02 16:27:36,544-0500 INFO vdl:createdirs Creating directory structure in first-20080302-1627-dzk4seng/shared (first-20080302-1627-dzk4seng/shared/)
2008-03-02 16:27:36,545-0500 DEBUG TaskImpl Task(type=FILE_OPERATION, identity=urn:0-1204493250915) setting status to Active
2008-03-02 16:27:36,549-0500 DEBUG TaskImpl Task(type=FILE_OPERATION, identity=urn:0-1204493250915) setting status to Completed
2008-03-02 16:27:36,551-0500 INFO vdl:createdirset END jobid=echo-kv0268pi - Done initializing directory structure
2008-03-02 16:27:36,552-0500 INFO vdl:dostagein START jobid=echo-kv0268pi - Staging in files
2008-03-02 16:27:36,554-0500 INFO vdl:dostagein END jobid=echo-kv0268pi - Staging in finished
2008-03-02 16:27:36,556-0500 DEBUG vdl:execute2 JOB_START jobid=echo-kv0268pi tr=echo arguments=[Hello, world!] tmpdir=first-20080302-1627-dzk4seng/echo-kv0268pi host=ruth
2008-03-02 16:30:36,747-0500 DEBUG TaskImpl Task(type=JOB_SUBMISSION, identity=urn:0-1204493250917) setting status to Failed null
2008-03-02 16:30:36,761-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION jobid=echo-kv0268pi - Application exception: null
    task:execute @ vdl-int.k, line: 360
    sys:sequential @ vdl-int.k, line: 353
    sys:try @ vdl-int.k, line: 352
    task:allocatehost @ vdl-int.k, line: 334
    vdl:execute2 @ execute-default.k, line: 23
    sys:restartonerror @ execute-default.k, line: 21
    sys:sequential @ execute-default.k, line: 19
    sys:try @ execute-default.k, line: 18
    sys:if @ execute-default.k, line: 17
    sys:then @ execute-default.k, line: 16
    sys:if @ execute-default.k, line: 15
    vdl:execute @ first.kml, line: 16
    greeting @ first.kml, line: 43
    vdl:mainp @ first.kml, line: 42
    mainp @ vdl.k, line: 144
    vdl:mains @ first.kml, line: 41
    vdl:mains @ first.kml, line: 41
    rlog:restartlog @ first.kml, line: 39
    kernel:project @ first.kml, line: 2
    first-20080302-1627-dzk4seng
Caused by: java.lang.ClassCastException
    at org.globus.gsi.bc.BouncyCastleCertProcessingFactory.createCertificate(BouncyCastleCertProcessingFactory.java:165)
    at org.globus.gsi.bc.BouncyCastleCertProcessingFactory.createCertificate(BouncyCastleCertProcessingFactory.java:92)
    at org.globus.gsi.gssapi.GlobusGSSContextImpl.initSecContext(GlobusGSSContextImpl.java:571)
    at org.globus.gsi.gssapi.net.GssSocket.authenticateClient(GssSocket.java:107)
    at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:145)
    at org.globus.gsi.gssapi.net.GssSocket.getOutputStream(GssSocket.java:166)
    at org.globus.gram.Gram.request(Gram.java:315)
    at org.globus.gram.GramJob.request(GramJob.java:262)
    at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:136)
    at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submit(JobSubmissionTaskHandler.java:92)
    at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:54)
    at org.globus.cog.karajan.scheduler.submitQueue.NonBlockingSubmit.run(NonBlockingSubmit.java:83)
    at edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:431)
    at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:643)
    at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:668)
    at java.lang.Thread.run(Thread.java:534)
2008-03-02 16:30:36,781-0500 DEBUG TaskImpl Task(type=FILE_TRANSFER, identity=urn:0-1204493250919) setting status to Submitted
2008-03-02 16:30:36,782-0500 DEBUG TaskImpl Task(type=FILE_TRANSFER, identity=urn:0-1204493250919) setting status to Active
2008-03-02 16:30:37,271-0500 DEBUG TaskImpl Task(type=FILE_TRANSFER, identity=urn:0-1204493250919) setting status to Failed Exception in getFile
2008-03-02 16:30:37,275-0500 DEBUG TaskImpl Task(type=FILE_OPERATION, identity=urn:0-1204493250923) setting status to Active
2008-03-02 16:30:37,276-0500 DEBUG TaskImpl Task(type=FILE_OPERATION, identity=urn:0-1204493250923) setting status to Completed
2008-03-02 16:30:37,281-0500 DEBUG TaskImpl Task(type=FILE_TRANSFER, identity=urn:0-1204493250927) setting status to Submitted
2008-03-02 16:30:37,282-0500 DEBUG TaskImpl Task(type=FILE_TRANSFER, identity=urn:0-1204493250927) setting status to Active
2008-03-02 16:30:37,506-0500 DEBUG TaskImpl Task(type=FILE_TRANSFER, identity=urn:0-1204493250927) setting status to Failed Exception in getFile
2008-03-02 16:30:37,508-0500 DEBUG TaskImpl Task(type=FILE_OPERATION, identity=urn:0-1204493250930) setting status to Active
2008-03-02 16:30:37,509-0500 DEBUG TaskImpl Task(type=FILE_OPERATION, identity=urn:0-1204493250930) setting status to Completed
2008-03-02 16:30:37,521-0500 DEBUG vdl:execute2 THREAD_ASSOCIATION jobid=echo-lv0268pi thread=0 host=ruth
2008-03-02 16:30:37,525-0500 INFO vdl:createdirset START jobid=echo-lv0268pi host=ruth - Initializing directory structure
2008-03-02 16:30:37,526-0500 INFO vdl:createdirset END jobid=echo-lv0268pi - Done initializing directory structure
2008-03-02 16:30:37,528-0500 INFO vdl:dostagein START jobid=echo-lv0268pi - Staging in files
2008-03-02 16:30:37,528-0500 INFO vdl:dostagein END jobid=echo-lv0268pi - Staging in finished
2008-03-02 16:30:37,529-0500 DEBUG vdl:execute2 JOB_START jobid=echo-lv0268pi tr=echo arguments=[Hello, world!] tmpdir=first-20080302-1627-dzk4seng/echo-lv0268pi host=ruth
2008-03-02 16:33:37,633-0500 DEBUG TaskImpl Task(type=JOB_SUBMISSION, identity=urn:0-1204493250934) setting status to Failed null
2008-03-02 16:33:37,635-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION jobid=echo-lv0268pi - Application exception: null
    task:execute @ vdl-int.k, line: 360
    sys:sequential @ vdl-int.k, line: 353
    sys:try @ vdl-int.k, line: 352
    task:allocatehost @ vdl-int.k, line: 334
    vdl:execute2 @ execute-default.k, line: 23
    sys:restartonerror @ execute-default.k, line: 21
    sys:sequential @ execute-default.k, line: 19
    sys:try @ execute-default.k, line: 18
    sys:if @ execute-default.k, line: 17
    sys:then @ execute-default.k, line: 16
    sys:if @ execute-default.k, line: 15
    vdl:execute @ first.kml, line: 16
    greeting @ first.kml, line: 43
    vdl:mainp @ first.kml, line: 42
    mainp @ vdl.k, line: 144
    vdl:mains @ first.kml, line: 41
    vdl:mains @ first.kml, line: 41
    rlog:restartlog @ first.kml, line: 39
    kernel:project @ first.kml, line: 2
    first-20080302-1627-dzk4seng
Caused by: java.lang.ClassCastException
    at org.globus.gsi.bc.BouncyCastleCertProcessingFactory.createCertificate(BouncyCastleCertProcessingFactory.java:165)
    at org.globus.gsi.bc.BouncyCastleCertProcessingFactory.createCertificate(BouncyCastleCertProcessingFactory.java:92)
    at org.globus.gsi.gssapi.GlobusGSSContextImpl.initSecContext(GlobusGSSContextImpl.java:571)
    at org.globus.gsi.gssapi.net.GssSocket.authenticateClient(GssSocket.java:107)
    at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:145)
    at org.globus.gsi.gssapi.net.GssSocket.getOutputStream(GssSocket.java:166)
    at org.globus.gram.Gram.request(Gram.java:315)
    at org.globus.gram.GramJob.request(GramJob.java:262)
    at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:136)
    at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submit(JobSubmissionTaskHandler.java:92)
    at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:54)
    at org.globus.cog.karajan.scheduler.submitQueue.NonBlockingSubmit.run(NonBlockingSubmit.java:83)
    at edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:431)
    at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:643)
    at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:668)
    at java.lang.Thread.run(Thread.java:534)
2008-03-02 16:33:37,640-0500 DEBUG TaskImpl Task(type=FILE_TRANSFER, identity=urn:0-1204493250936) setting status to Submitted
2008-03-02 16:33:37,641-0500 DEBUG TaskImpl Task(type=FILE_TRANSFER, identity=urn:0-1204493250936) setting status to Active
2008-03-02 16:33:38,060-0500 DEBUG TaskImpl Task(type=FILE_TRANSFER, identity=urn:0-1204493250936) setting status to Failed Exception in getFile
2008-03-02 16:33:38,064-0500 DEBUG TaskImpl Task(type=FILE_OPERATION, identity=urn:0-1204493250940) setting status to Active
2008-03-02 16:33:38,065-0500 DEBUG TaskImpl Task(type=FILE_OPERATION, identity=urn:0-1204493250940) setting status to Completed
2008-03-02 16:33:38,068-0500 DEBUG TaskImpl Task(type=FILE_TRANSFER, identity=urn:0-1204493250944) setting status to Submitted
2008-03-02 16:33:38,069-0500 DEBUG TaskImpl Task(type=FILE_TRANSFER, identity=urn:0-1204493250944) setting status to Active
2008-03-02 16:33:38,098-0500 DEBUG TaskImpl Task(type=FILE_TRANSFER, identity=urn:0-1204493250944) setting status to Failed Exception in getFile
2008-03-02 16:33:38,101-0500 DEBUG TaskImpl Task(type=FILE_OPERATION, identity=urn:0-1204493250947) setting status to Active
2008-03-02 16:33:38,102-0500 DEBUG TaskImpl Task(type=FILE_OPERATION, identity=urn:0-1204493250947) setting status to Completed
2008-03-02 16:33:38,113-0500 DEBUG vdl:execute2 THREAD_ASSOCIATION jobid=echo-mv0268pi thread=0 host=ruth
2008-03-02 16:33:38,117-0500 INFO vdl:createdirset START jobid=echo-mv0268pi host=ruth - Initializing directory structure
2008-03-02 16:33:38,119-0500 INFO vdl:createdirset END jobid=echo-mv0268pi - Done initializing directory structure
2008-03-02 16:33:38,120-0500 INFO vdl:dostagein START jobid=echo-mv0268pi - Staging in files
2008-03-02 16:33:38,120-0500 INFO vdl:dostagein END jobid=echo-mv0268pi - Staging in finished
2008-03-02 16:33:38,121-0500 DEBUG vdl:execute2 JOB_START jobid=echo-mv0268pi tr=echo arguments=[Hello, world!] tmpdir=first-20080302-1627-dzk4seng/echo-mv0268pi host=ruth
2008-03-02 16:36:38,203-0500 DEBUG TaskImpl Task(type=JOB_SUBMISSION, identity=urn:0-1204493250951) setting status to Failed null
2008-03-02 16:36:38,205-0500 DEBUG vdl:execute2 APPLICATION_EXCEPTION jobid=echo-mv0268pi - Application exception: null
    task:execute @ vdl-int.k, line: 360
    sys:sequential @ vdl-int.k, line: 353
    sys:try @ vdl-int.k, line: 352
    task:allocatehost @ vdl-int.k, line: 334
    vdl:execute2 @ execute-default.k, line: 23
    sys:restartonerror @ execute-default.k, line: 21
    sys:sequential @ execute-default.k, line: 19
    sys:try @ execute-default.k, line: 18
    sys:if @ execute-default.k, line: 17
    sys:then @ execute-default.k, line: 16
    sys:if @ execute-default.k, line: 15
    vdl:execute @ first.kml, line: 16
    greeting @ first.kml, line: 43
    vdl:mainp @ first.kml, line: 42
    mainp @ vdl.k, line: 144
    vdl:mains @ first.kml, line: 41
    vdl:mains @ first.kml, line: 41
    rlog:restartlog @ first.kml, line: 39
    kernel:project @ first.kml, line: 2
    first-20080302-1627-dzk4seng
Caused by: java.lang.ClassCastException
    at org.globus.gsi.bc.BouncyCastleCertProcessingFactory.createCertificate(BouncyCastleCertProcessingFactory.java:165)
    at org.globus.gsi.bc.BouncyCastleCertProcessingFactory.createCertificate(BouncyCastleCertProcessingFactory.java:92)
    at org.globus.gsi.gssapi.GlobusGSSContextImpl.initSecContext(GlobusGSSContextImpl.java:571)
    at org.globus.gsi.gssapi.net.GssSocket.authenticateClient(GssSocket.java:107)
    at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:145)
    at org.globus.gsi.gssapi.net.GssSocket.getOutputStream(GssSocket.java:166)
    at org.globus.gram.Gram.request(Gram.java:315)
    at org.globus.gram.GramJob.request(GramJob.java:262)
    at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:136)
    at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submit(JobSubmissionTaskHandler.java:92)
    at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:54)
    at org.globus.cog.karajan.scheduler.submitQueue.NonBlockingSubmit.run(NonBlockingSubmit.java:83)
    at edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:431)
    at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:643)
    at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:668)
    at java.lang.Thread.run(Thread.java:534)
2008-03-02 16:36:38,213-0500 DEBUG TaskImpl Task(type=FILE_TRANSFER, identity=urn:0-1204493250953) setting status to Submitted
2008-03-02 16:36:38,214-0500 DEBUG TaskImpl Task(type=FILE_TRANSFER, identity=urn:0-1204493250953) setting status to Active
2008-03-02 16:36:38,703-0500 DEBUG TaskImpl Task(type=FILE_TRANSFER, identity=urn:0-1204493250953) setting status to Failed Exception in getFile
2008-03-02 16:36:38,706-0500 DEBUG TaskImpl Task(type=FILE_OPERATION, identity=urn:0-1204493250957) setting status to Active
2008-03-02 16:36:38,707-0500 DEBUG TaskImpl Task(type=FILE_OPERATION, identity=urn:0-1204493250957) setting status to Completed
2008-03-02 16:36:38,710-0500 DEBUG TaskImpl Task(type=FILE_TRANSFER, identity=urn:0-1204493250961) setting status to Submitted
2008-03-02 16:36:38,711-0500 DEBUG TaskImpl Task(type=FILE_TRANSFER, identity=urn:0-1204493250961) setting status to Active
2008-03-02 16:36:38,923-0500 DEBUG TaskImpl Task(type=FILE_TRANSFER, identity=urn:0-1204493250961) setting status to Failed Exception in getFile
2008-03-02 16:36:38,926-0500 DEBUG TaskImpl Task(type=FILE_OPERATION, identity=urn:0-1204493250964) setting status to Active
2008-03-02 16:36:38,928-0500 DEBUG TaskImpl Task(type=FILE_OPERATION, identity=urn:0-1204493250964) setting status to Completed
2008-03-02 16:36:38,934-0500 INFO vdl:execute END_FAILURE thread=0 tr=echo
2008-03-02 16:36:38,942-0500 DEBUG VDL2ExecutionContext Exception in echo:
Arguments: [Hello, world!]
Host: ruth
Directory: first-20080302-1627-dzk4seng/echo-mv0268pi
stderr.txt:

stdout.txt:

----

Exception in echo:
Arguments: [Hello, world!]
Host: ruth
Directory: first-20080302-1627-dzk4seng/echo-mv0268pi
stderr.txt:

stdout.txt:

----

    sys:exception @ vdl-int.k, line: 416
    sys:throw @ vdl-int.k, line: 415
    sys:catch @ vdl-int.k, line: 393
    sys:try @ vdl-int.k, line: 352
    task:allocatehost @ vdl-int.k, line: 334
    vdl:execute2 @ execute-default.k, line: 23
    sys:restartonerror @ execute-default.k, line: 21
    sys:sequential @ execute-default.k, line: 19
    sys:try @ execute-default.k, line: 18
    sys:if @ execute-default.k, line: 17
    sys:then @ execute-default.k, line: 16
    sys:if @ execute-default.k, line: 15
    vdl:execute @ first.kml, line: 16
    greeting @ first.kml, line: 43
    vdl:mainp @ first.kml, line: 42
    mainp @ vdl.k, line: 144
    vdl:mains @ first.kml, line: 41
    vdl:mains @ first.kml, line: 41
    rlog:restartlog @ first.kml, line: 39
    kernel:project @ first.kml, line: 2
    first-20080302-1627-dzk4seng
Caused by: null
    task:execute @ vdl-int.k, line: 360
    sys:sequential @ vdl-int.k, line: 353
    sys:try @ vdl-int.k, line: 352
    task:allocatehost @ vdl-int.k, line: 334
    vdl:execute2 @ execute-default.k, line: 23
    sys:restartonerror @ execute-default.k, line: 21
    sys:sequential @ execute-default.k, line: 19
    sys:try @ execute-default.k, line: 18
    sys:if @ execute-default.k, line: 17
    sys:then @ execute-default.k, line: 16
    sys:if @ execute-default.k, line: 15
    vdl:execute @ first.kml, line: 16
    greeting @ first.kml, line: 43
    vdl:mainp @ first.kml, line: 42
    mainp @ vdl.k, line: 144
    vdl:mains @ first.kml, line: 41
    vdl:mains @ first.kml, line: 41
    rlog:restartlog @ first.kml, line: 39
    kernel:project @ first.kml, line: 2
    first-20080302-1627-dzk4seng
Caused by: java.lang.ClassCastException
    at org.globus.gsi.bc.BouncyCastleCertProcessingFactory.createCertificate(BouncyCastleCertProcessingFactory.java:165)
    at org.globus.gsi.bc.BouncyCastleCertProcessingFactory.createCertificate(BouncyCastleCertProcessingFactory.java:92)
    at org.globus.gsi.gssapi.GlobusGSSContextImpl.initSecContext(GlobusGSSContextImpl.java:571)
    at org.globus.gsi.gssapi.net.GssSocket.authenticateClient(GssSocket.java:107)
    at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:145)
    at org.globus.gsi.gssapi.net.GssSocket.getOutputStream(GssSocket.java:166)
    at org.globus.gram.Gram.request(Gram.java:315)
    at org.globus.gram.GramJob.request(GramJob.java:262)
    at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:136)
    at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submit(JobSubmissionTaskHandler.java:92)
    at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:54)
    at org.globus.cog.karajan.scheduler.submitQueue.NonBlockingSubmit.run(NonBlockingSubmit.java:83)
    at edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:431)
    at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:643)
    at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:668)
    at java.lang.Thread.run(Thread.java:534)
    at org.globus.cog.karajan.workflow.nodes.functions.KException.function(KException.java:29)
    at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:45)
    at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:192)
    at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:33)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:335)
    at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123)
    at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:173)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:299)
    at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
    at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:46)
    at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:51)
    at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:27)
    at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.executeChildren(AbstractFunction.java:40)
    at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:240)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:281)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:393)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:332)
    at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:227)
    at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:123)
    at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:97)
    at org.globus.cog.karajan.workflow.events.EventWorker.run(EventWorker.java:69)
Caused by: null
    task:execute @ vdl-int.k, line: 360
    sys:sequential @ vdl-int.k, line: 353
    sys:try @ vdl-int.k, line: 352
    task:allocatehost @ vdl-int.k, line: 334
    vdl:execute2 @ execute-default.k, line: 23
    sys:restartonerror @ execute-default.k, line: 21
    sys:sequential @ execute-default.k, line: 19
    sys:try @ execute-default.k, line: 18
    sys:if @ execute-default.k, line: 17
    sys:then @ execute-default.k, line: 16
    sys:if @ execute-default.k, line: 15
    vdl:execute @ first.kml, line: 16
    greeting @ first.kml, line: 43
    vdl:mainp @ first.kml, line: 42
    mainp @ vdl.k, line: 144
    vdl:mains @ first.kml, line: 41
    vdl:mains @ first.kml, line: 41
    rlog:restartlog @ first.kml, line: 39
    kernel:project @ first.kml, line: 2
    first-20080302-1627-dzk4seng
Caused by: java.lang.ClassCastException
    at org.globus.gsi.bc.BouncyCastleCertProcessingFactory.createCertificate(BouncyCastleCertProcessingFactory.java:165)
    at org.globus.gsi.bc.BouncyCastleCertProcessingFactory.createCertificate(BouncyCastleCertProcessingFactory.java:92)
    at org.globus.gsi.gssapi.GlobusGSSContextImpl.initSecContext(GlobusGSSContextImpl.java:571)
    at org.globus.gsi.gssapi.net.GssSocket.authenticateClient(GssSocket.java:107)
    at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:145)
    at org.globus.gsi.gssapi.net.GssSocket.getOutputStream(GssSocket.java:166)
    at org.globus.gram.Gram.request(Gram.java:315)
    at org.globus.gram.GramJob.request(GramJob.java:262)
    at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:136)
    at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submit(JobSubmissionTaskHandler.java:92)
    at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:54)
    at org.globus.cog.karajan.scheduler.submitQueue.NonBlockingSubmit.run(NonBlockingSubmit.java:83)
    at edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:431)
    at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:643)
    at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:668)
    at java.lang.Thread.run(Thread.java:534)
    at org.globus.cog.karajan.workflow.events.FailureNotificationEvent.<init>(FailureNotificationEvent.java:36)
    at org.globus.cog.karajan.workflow.events.FailureNotificationEvent.<init>(FailureNotificationEvent.java:42)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.failImmediately(FlowNode.java:152)
    at org.globus.cog.karajan.workflow.nodes.grid.GridExec.taskFailed(GridExec.java:300)
    at org.globus.cog.karajan.workflow.nodes.grid.AbstractGridNode.statusChanged(AbstractGridNode.java:271)
    at org.globus.cog.karajan.scheduler.AbstractScheduler.fireJobStatusChangeEvent(AbstractScheduler.java:165)
    at org.globus.cog.karajan.scheduler.LateBindingScheduler.statusChanged(LateBindingScheduler.java:647)
    at org.globus.cog.karajan.scheduler.WeightedHostScoreScheduler.statusChanged(WeightedHostScoreScheduler.java:367)
    at org.griphyn.vdl.karajan.VDSAdaptiveScheduler.statusChanged(VDSAdaptiveScheduler.java:403)
    at org.globus.cog.abstraction.impl.common.task.TaskImpl.setStatus(TaskImpl.java:216)
    at org.globus.cog.karajan.scheduler.submitQueue.NonBlockingSubmit.notifyPreviousQueue(NonBlockingSubmit.java:74)
    at org.globus.cog.karajan.scheduler.submitQueue.AbstractSubmitQueue.submitCompleted(AbstractSubmitQueue.java:35)
    at org.globus.cog.karajan.scheduler.submitQueue.NonBlockingSubmit.notifyPreviousQueue(NonBlockingSubmit.java:58)
    at org.globus.cog.karajan.scheduler.submitQueue.AbstractSubmitQueue.submitCompleted(AbstractSubmitQueue.java:35)
    at org.globus.cog.karajan.scheduler.submitQueue.NonBlockingSubmit.notifyPreviousQueue(NonBlockingSubmit.java:58)
    at org.globus.cog.karajan.scheduler.submitQueue.AbstractSubmitQueue.submitCompleted(AbstractSubmitQueue.java:35)
    at org.globus.cog.karajan.scheduler.submitQueue.NonBlockingSubmit.notifyPreviousQueue(NonBlockingSubmit.java:58)
    at org.globus.cog.karajan.scheduler.submitQueue.NullQueue.submitCompleted(NullQueue.java:18)
    at org.globus.cog.karajan.scheduler.submitQueue.NonBlockingSubmit.notifyPreviousQueue(NonBlockingSubmit.java:58)
    at org.globus.cog.karajan.scheduler.submitQueue.NonBlockingSubmit.run(NonBlockingSubmit.java:87)
    at edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:431)
    at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:643)
    at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:668)
    at java.lang.Thread.run(Thread.java:534)
Caused by: java.lang.ClassCastException
    at org.globus.gsi.bc.BouncyCastleCertProcessingFactory.createCertificate(BouncyCastleCertProcessingFactory.java:165)
    at org.globus.gsi.bc.BouncyCastleCertProcessingFactory.createCertificate(BouncyCastleCertProcessingFactory.java:92)
    at org.globus.gsi.gssapi.GlobusGSSContextImpl.initSecContext(GlobusGSSContextImpl.java:571)
    at org.globus.gsi.gssapi.net.GssSocket.authenticateClient(GssSocket.java:107)
    at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:145)
    at org.globus.gsi.gssapi.net.GssSocket.getOutputStream(GssSocket.java:166)
    at org.globus.gram.Gram.request(Gram.java:315)
    at org.globus.gram.GramJob.request(GramJob.java:262)
    at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:136)
    at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submit(JobSubmissionTaskHandler.java:92)
    at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:54)
    at org.globus.cog.karajan.scheduler.submitQueue.NonBlockingSubmit.run(NonBlockingSubmit.java:83)
    ... 5 more
2008-03-02 16:36:39,103-0500 DEBUG Loader Swift finished - workflow had errors

From benc at hawaga.org.uk Sun Mar 2 21:03:08 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 3 Mar 2008 03:03:08 +0000 (GMT)
Subject: [Swift-user] Submitting first.swift to another node
In-Reply-To: <006901c87cb3$f62577e0$6401a8c0@twcny.rr.com>
References: <006901c87cb3$f62577e0$6401a8c0@twcny.rr.com>

Your site definition should look like this:

should have major="4" (assuming you are using GRAM4/WS-GRAM on ruth.cs.sunyit.edu - the URL makes it look like you are)

-- 

From jamalphd at gmail.com Tue Mar 4 07:52:43 2008
From: jamalphd at gmail.com (J A)
Date: Tue, 4 Mar 2008 08:52:43 -0500
Subject: [Swift-user] Several questions on swift

Hi All:

I am new to Swift and have several questions:

1. I have two programs that run for a long time, and I would like to use grid computing so that I get the results faster. One program is written in C# and the other in C++; both run on a Windows machine. How can I use Swift to run my programs?
2. Can Swift run on a Windows platform?
3. When using Swift, do I still need MPI code in my programs?
4. I have access to grid machines at school. If I install Swift in my account there, is there a way to use it from my Windows machine?

Thank you in advance for your help.

J. A.
URL: From benc at hawaga.org.uk Tue Mar 4 17:33:18 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 4 Mar 2008 23:33:18 +0000 (GMT) Subject: [Swift-user] Swift log processing code Message-ID: Over the past few months, I've been developing log processing and analysis code that takes in various log files from Swift runs and uses them to make various plots. I have written a small note on how to download and use these tools, so that others can experiment with them: http://www.ci.uchicago.edu/swift/guides/log-processing.php Much of the output is rather rough and poorly documented, however I'm quite happy to explain stuff on these lists if/when people have questions. -- From quanpt at cs.uchicago.edu Mon Mar 17 10:43:52 2008 From: quanpt at cs.uchicago.edu (Quan Tran Pham) Date: Mon, 17 Mar 2008 10:43:52 -0500 Subject: [Swift-user] Too many open files problem Message-ID: <4290b6c60803170843p2fad9a54q39bf8333e179b336@mail.gmail.com> Hi all, I did find a thread about this problem "Too many open files" in the swift-user previous mails, but I could not find any solution yet. Can someone please tell me what is the problem/solution? I used: Swift v0.3-dev r1684 on tp-login2.ci (ulimit said: open files = 1024) Thank you very much -- Quan Tran Pham PhD Student Department of Computer Science University of Chicago 1100 E 58th Street, Chicago, IL 60637 Office: Ryerson 178 Phone: (773)702-4227 Fax: (773)702-8487 quanpt at cs.uchicago.edu --- -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: log_toomanyopenfile.tbz Type: application/octet-stream Size: 62399 bytes Desc: not available URL: From benc at hawaga.org.uk Mon Mar 17 16:12:13 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 17 Mar 2008 21:12:13 +0000 (GMT) Subject: [Swift-user] Swift v0.4 released Message-ID: Swift 0.4 is released. 

You can download it from http://www.ci.uchicago.edu/swift/downloads/

In addition, there are a few pages of release notes detailing the substantial changes since v0.3 here:
http://www.ci.uchicago.edu/swift/packages/release-notes-0.4.txt

--

From benc at hawaga.org.uk Mon Mar 17 17:01:53 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 17 Mar 2008 22:01:53 +0000 (GMT)
Subject: [Swift-user] google summer of code
Message-ID:

The Globus Alliance was accepted as a Google Summer of Code mentor organization. Under that umbrella, interested students can work on Swift-related projects.

See http://dev.globus.org/wiki/Google_Summer_of_Code_2008_Ideas for more information - there are a few Swift-related projects listed there, but Google encourages students to also come up with their own.

--

From quanpt at cs.uchicago.edu Mon Mar 17 21:35:42 2008
From: quanpt at cs.uchicago.edu (Quan Tran Pham)
Date: Mon, 17 Mar 2008 21:35:42 -0500
Subject: [Swift-user] Fwd: Too many open files problem
In-Reply-To: <4290b6c60803170843p2fad9a54q39bf8333e179b336@mail.gmail.com>
References: <4290b6c60803170843p2fad9a54q39bf8333e179b336@mail.gmail.com>
Message-ID: <4290b6c60803171935q1a68fddeg5dd53f2db4099bc7@mail.gmail.com>

I checked my message on the web (http://mail.ci.uchicago.edu/pipermail/swift-user/2008-March/000257.html) and somehow the email content disappeared (the attachment is still accessible), so I am resending the message to the group. Hopefully someone can help me with the problem.

Thank you very much
Quan

---------- Forwarded message ----------
From: Quan Tran Pham
Date: Mon, Mar 17, 2008 at 10:43 AM
Subject: Too many open files problem
To: swift-user at ci.uchicago.edu

Hi all,

I did find a thread about this problem "Too many open files" in the swift-user previous mails, but I could not find any solution yet. Can someone please tell me what is the problem/solution?

I used:
Swift v0.3-dev r1684
on tp-login2.ci (ulimit said: open files = 1024)

Thank you very much

--
Quan Tran Pham
PhD Student
Department of Computer Science
University of Chicago
1100 E 58th Street, Chicago, IL 60637
Office: Ryerson 178
Phone: (773)702-4227
Fax: (773)702-8487
quanpt at cs.uchicago.edu

From benc at hawaga.org.uk Mon Mar 17 21:43:05 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 18 Mar 2008 02:43:05 +0000 (GMT)
Subject: [Swift-user] Fwd: Too many open files problem
In-Reply-To: <4290b6c60803171935q1a68fddeg5dd53f2db4099bc7@mail.gmail.com>
References: <4290b6c60803170843p2fad9a54q39bf8333e179b336@mail.gmail.com> <4290b6c60803171935q1a68fddeg5dd53f2db4099bc7@mail.gmail.com>
Message-ID:

Hi.

Can you describe your problem some more:

Does it happen with every swift script (for example, with the example first.swift)?

Does it happen when you run a large workflow? If you make that run smaller (eg fewer input files) does the problem go away? At what point?

Can you send the swiftscript source code that you are using?

> From: Quan Tran Pham
> Date: Mon, Mar 17, 2008 at 10:43 AM
>
>
> Hi all,
>
> I did find a thread about this problem "Too many open files" in the
> swift-user previous mails, but I could not find any solution yet. Can
> someone please tell me what is the problem/solution?
>
> I used:
> Swift v0.3-dev r1684
> on tp-login2.ci (ulimit said: open files = 1024)
>
> Thank you very much
>
>

From quanpt at cs.uchicago.edu Mon Mar 17 22:07:07 2008
From: quanpt at cs.uchicago.edu (Quan Tran Pham)
Date: Mon, 17 Mar 2008 22:07:07 -0500
Subject: [Swift-user] Fwd: Too many open files problem
In-Reply-To:
References: <4290b6c60803170843p2fad9a54q39bf8333e179b336@mail.gmail.com> <4290b6c60803171935q1a68fddeg5dd53f2db4099bc7@mail.gmail.com>
Message-ID: <4290b6c60803172007r3d26cff8hfdf2fdf641d99f59@mail.gmail.com>

Hi,

On Mon, Mar 17, 2008 at 9:43 PM, Ben Clifford wrote:
>
> Hi.
> Can you describe your problem some more:
>
> Does it happen with every swift script (for example, with the example
> first.swift)?
> If you make that run smaller (eg fewer input files) does the problem go
> away? At what point?

It happens with many of my scripts. It does not always happen: sometimes I can run the script repeatedly and succeed, and sometimes, with the same input, I get this error.

>
> Does it happen when you run a large workflow?

It seems to happen more often with larger input (still using the same workflow, same number of submitted jobs), but I cannot confirm this.

>
> Can you send the swiftscript source code that you are using?

Sure, attached (there was some incorrect dependency between the sort procedure and the fileList var, but you can assume all input files are correctly available for the workflow).

Thank you very much

--
Quan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 08.sort.swift
Type: application/octet-stream
Size: 1405 bytes
Desc: not available
URL:

From benc at hawaga.org.uk Mon Mar 17 22:37:03 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 18 Mar 2008 03:37:03 +0000 (GMT)
Subject: [Swift-user] Fwd: Too many open files problem
In-Reply-To: <4290b6c60803172007r3d26cff8hfdf2fdf641d99f59@mail.gmail.com>
References: <4290b6c60803170843p2fad9a54q39bf8333e179b336@mail.gmail.com> <4290b6c60803171935q1a68fddeg5dd53f2db4099bc7@mail.gmail.com> <4290b6c60803172007r3d26cff8hfdf2fdf641d99f59@mail.gmail.com>
Message-ID:

do you always run with the PBS provider?
have you tried any other provider (eg the local execution or GRAM providers) and if so, have you had this error?
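
"Too many open files" is the message for the EMFILE error, which a process receives once it holds as many file descriptors as its soft RLIMIT_NOFILE limit allows; that soft limit is the 1024 that `ulimit` reported on tp-login2. The minimal Python sketch below is a generic diagnostic, not a Swift tool: it reads that limit and counts the descriptors a process currently has open. The `/proc`-based count assumes a Linux host.

```python
import os
import resource


def open_file_limits():
    """Return the (soft, hard) RLIMIT_NOFILE limits of the current process.

    The soft limit is the value `ulimit -n` prints; opening more descriptors
    than that fails with EMFILE ("Too many open files").
    """
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def count_open_fds(pid="self"):
    """Count descriptors open in a process, or return None without /proc.

    Linux-only: each entry in /proc/<pid>/fd is one open descriptor. When
    pid is "self", the count includes the descriptor used for the listing.
    """
    fd_dir = f"/proc/{pid}/fd"
    if not os.path.isdir(fd_dir):
        return None
    return len(os.listdir(fd_dir))


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"open files: soft={soft} hard={hard} in use={count_open_fds()}")
```

Passing another process ID as `pid`, such as that of the Swift JVM (permissions permitting), shows how close that process is to the limit while a workflow runs.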
--

From quanpt at cs.uchicago.edu Mon Mar 17 22:42:28 2008
From: quanpt at cs.uchicago.edu (Quan Tran Pham)
Date: Mon, 17 Mar 2008 22:42:28 -0500
Subject: [Swift-user] Fwd: Too many open files problem
In-Reply-To:
References: <4290b6c60803170843p2fad9a54q39bf8333e179b336@mail.gmail.com> <4290b6c60803171935q1a68fddeg5dd53f2db4099bc7@mail.gmail.com> <4290b6c60803172007r3d26cff8hfdf2fdf641d99f59@mail.gmail.com>
Message-ID: <4290b6c60803172042x1515b856o4b222498a5b6a0dd@mail.gmail.com>

I never have this problem with the local provider (on my local machine). Other than that, I work only with PBS on teraport.

On Mon, Mar 17, 2008 at 10:37 PM, Ben Clifford wrote:
> do you always run with the PBS provider?
> have you tried any other provider (eg the local execution or GRAM
> providers) and if so, have you had this error?
> --
>
>

--
Quan

From benc at hawaga.org.uk Tue Mar 18 06:40:49 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 18 Mar 2008 11:40:49 +0000 (GMT)
Subject: [Swift-user] Fwd: Too many open files problem
In-Reply-To: <4290b6c60803172042x1515b856o4b222498a5b6a0dd@mail.gmail.com>
References: <4290b6c60803170843p2fad9a54q39bf8333e179b336@mail.gmail.com> <4290b6c60803171935q1a68fddeg5dd53f2db4099bc7@mail.gmail.com> <4290b6c60803172007r3d26cff8hfdf2fdf641d99f59@mail.gmail.com> <4290b6c60803172042x1515b856o4b222498a5b6a0dd@mail.gmail.com>
Message-ID:

Can you try using swift from here:

/home/benc/cog/modules/vdsk/dist/vdsk-svn/bin/

I made a tweak in the PBS provider.

--

From piccoli at fnal.gov Tue Mar 18 14:54:34 2008
From: piccoli at fnal.gov (Luciano Piccoli)
Date: Tue, 18 Mar 2008 14:54:34 -0500
Subject: [Swift-user] iterate syntax
Message-ID: <47E01DFA.1030003@fnal.gov>

Hi,

I just checked out the latest swift and it looks like the iterate syntax changed.

This code runs with revision 1673:

...
iterate i {
  int trajectory = i;
  ...
} until (@extractint(u0[i+1]) == @extractint(u0[i]));
...

Using revision 1735 I get this message:

Could not start execution.
Variable i is undefined

Defining the variable before the iterate block does not help:

Could not start execution.
Variable i is already defined.

What would be the correct way to use iterate?

Thanks,
Luciano

From benc at hawaga.org.uk Tue Mar 18 15:59:14 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 18 Mar 2008 20:59:14 +0000 (GMT)
Subject: [Swift-user] iterate syntax
In-Reply-To: <47E01DFA.1030003@fnal.gov>
References: <47E01DFA.1030003@fnal.gov>
Message-ID:

On Tue, 18 Mar 2008, Luciano Piccoli wrote:

> I just checked out the latest swift and it looks like the iterate syntax
> changed.

I recently added some more compile-time checking of how variables are used. The below should work, but it appears that you have found a bug in that compile-time checking. You should not need to change how you use iterate.

I'll open a bug for this.

--

From benc at hawaga.org.uk Tue Mar 18 17:21:41 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 18 Mar 2008 22:21:41 +0000 (GMT)
Subject: [Swift-user] iterate syntax
In-Reply-To: <47E01DFA.1030003@fnal.gov>
References: <47E01DFA.1030003@fnal.gov>
Message-ID:

On Tue, 18 Mar 2008, Luciano Piccoli wrote:

> I just checked out the latest swift and it looks like the iterate syntax
> changed.

r1736 contains some adjustments to the way compile-time checking is done for iterate statements. Please can you try that?

--

From piccoli at fnal.gov Wed Mar 19 10:38:51 2008
From: piccoli at fnal.gov (Luciano Piccoli)
Date: Wed, 19 Mar 2008 10:38:51 -0500
Subject: [Swift-user] iterate syntax
In-Reply-To:
References: <47E01DFA.1030003@fnal.gov>
Message-ID: <47E1338B.3090000@fnal.gov>

Thanks, the iterate works now. However, the following script does not work. Is this case included in the nested statement blocks deprecated features from the release notes?

---
type file {}

(file t) echo(int m) {
  app {
    echo m stdout=@filename(t);
  }
}

(file t) nested_echo(int m) {
  t = echo (m);
}

file a[];
a[0] = nested_echo(0);
---

Luciano

Ben Clifford wrote:
> On Tue, 18 Mar 2008, Luciano Piccoli wrote:
>
>
>> I just checked out the latest swift and it looks like the iterate syntax
>> changed.
>>
>
> r1736 contains some adjustments to the way compile-time checking is done
> for iterate statements. Please can you try that?
>
>

From benc at hawaga.org.uk Wed Mar 19 11:27:33 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 19 Mar 2008 16:27:33 +0000 (GMT)
Subject: [Swift-user] iterate syntax
In-Reply-To: <47E1338B.3090000@fnal.gov>
References: <47E01DFA.1030003@fnal.gov> <47E1338B.3090000@fnal.gov>
Message-ID:

> Thanks, the iterate works now. However the following script does not work. Is
> this case included in the nested statement blocks deprecated features from the
> release notes?

What error do you see there?

--

From piccoli at fnal.gov Wed Mar 19 11:33:43 2008
From: piccoli at fnal.gov (Luciano Piccoli)
Date: Wed, 19 Mar 2008 11:33:43 -0500
Subject: [Swift-user] iterate syntax
In-Reply-To:
References: <47E01DFA.1030003@fnal.gov> <47E1338B.3090000@fnal.gov>
Message-ID: <47E14067.8030905@fnal.gov>

Ben Clifford wrote:
>> Thanks, the iterate works now. However the following script does not work. Is
>> this case included in the nested statement blocks deprecated features from the
>> release notes?
>>
>
> What error do you see there?
>

Short version:
=========

-bash-3.00$ swift -tc.file tc.data simple_iterate.swift
Swift vsvn swift-r1736 cog-r1936

RunID: 20080319-1129-e2lfrmi9
Progress:
echo started
echo completed
Execution failed: java.lang.NullPointerException
    at java.util.StringTokenizer.<init>(StringTokenizer.java:146)
    at java.util.StringTokenizer.<init>(StringTokenizer.java:162)
    at org.griphyn.vdl.karajan.lib.PartialCloseDataset.function(PartialCloseDataset.java:74)
    at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:65)
    at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:51)
    at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:27)
    at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:240)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:281)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:393)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:332)
    at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:227)
    at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125)
    at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99)
    at org.globus.cog.karajan.workflow.events.EventWorker.run(EventWorker.java:69)

Long version (using -d):
================

...
null
    vdl:partialclosedataset @ simple_iterate.kml, line: 60
    sys:sequential @ simple_iterate.kml, line: 54
    nested_echo @ simple_iterate.kml, line: 79
    sys:sequential @ simple_iterate.kml, line: 78
    vdl:mainp @ simple_iterate.kml, line: 77
    mainp @ vdl.k, line: 150
    vdl:mains @ simple_iterate.kml, line: 75
    vdl:mains @ simple_iterate.kml, line: 75
    rlog:restartlog @ simple_iterate.kml, line: 73
    kernel:project @ simple_iterate.kml, line: 2
    simple_iterate-20080319-1130-eytnjkp2
Caused by: java.lang.NullPointerException
    at java.util.StringTokenizer.<init>(StringTokenizer.java:146)
    at java.util.StringTokenizer.<init>(StringTokenizer.java:162)
    at org.griphyn.vdl.karajan.lib.PartialCloseDataset.function(PartialCloseDataset.java:74)
    at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:65)
    at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:51)
    at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:27)
    at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:240)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:281)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:393)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:332)
    at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:227)
    at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125)
    at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99)
    at org.globus.cog.karajan.workflow.events.EventWorker.run(EventWorker.java:69)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:362)
    at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:227)
    at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125)
    at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99)
    at org.globus.cog.karajan.workflow.events.EventWorker.run(EventWorker.java:69)
Caused by: java.lang.NullPointerException
    at java.util.StringTokenizer.<init>(StringTokenizer.java:146)
    at java.util.StringTokenizer.<init>(StringTokenizer.java:162)
    at org.griphyn.vdl.karajan.lib.PartialCloseDataset.function(PartialCloseDataset.java:74)
    at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:65)
    at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:51)
    at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:27)
    at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:240)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:281)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:393)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:332)
    ... 4 more
Execution failed: java.lang.NullPointerException
    at java.util.StringTokenizer.<init>(StringTokenizer.java:146)
    at java.util.StringTokenizer.<init>(StringTokenizer.java:162)
    at org.griphyn.vdl.karajan.lib.PartialCloseDataset.function(PartialCloseDataset.java:74)
    at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:65)
    at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:51)
    at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:27)
    at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:240)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:281)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:393)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:332)
    at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:227)
    at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125)
    at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99)
    at org.globus.cog.karajan.workflow.events.EventWorker.run(EventWorker.java:69)
Detailed exception:
null
    vdl:partialclosedataset @ simple_iterate.kml, line: 60
    sys:sequential @ simple_iterate.kml, line: 54
    nested_echo @ simple_iterate.kml, line: 79
    sys:sequential @ simple_iterate.kml, line: 78
    vdl:mainp @ simple_iterate.kml, line: 77
    mainp @ vdl.k, line: 150
    vdl:mains @ simple_iterate.kml, line: 75
    vdl:mains @ simple_iterate.kml, line: 75
    rlog:restartlog @ simple_iterate.kml, line: 73
    kernel:project @ simple_iterate.kml, line: 2
    simple_iterate-20080319-1130-eytnjkp2
Caused by: java.lang.NullPointerException
    at java.util.StringTokenizer.<init>(StringTokenizer.java:146)
    at java.util.StringTokenizer.<init>(StringTokenizer.java:162)
    at org.griphyn.vdl.karajan.lib.PartialCloseDataset.function(PartialCloseDataset.java:74)
    at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:65)
    at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:51)
    at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:27)
    at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:240)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:281)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:393)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:332)
    at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:227)
    at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125)
    at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99)
    at org.globus.cog.karajan.workflow.events.EventWorker.run(EventWorker.java:69)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:362)
    at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:227)
    at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:125)
    at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java:99)
    at org.globus.cog.karajan.workflow.events.EventWorker.run(EventWorker.java:69)
Caused by: java.lang.NullPointerException
    at java.util.StringTokenizer.<init>(StringTokenizer.java:146)
    at java.util.StringTokenizer.<init>(StringTokenizer.java:162)
    at org.griphyn.vdl.karajan.lib.PartialCloseDataset.function(PartialCloseDataset.java:74)
    at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:65)
    at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:51)
    at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:27)
    at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:240)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:281)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:393)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:332)
    ... 4 more
Swift finished with errors

From benc at hawaga.org.uk Wed Mar 19 12:43:06 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 19 Mar 2008 17:43:06 +0000 (GMT)
Subject: [Swift-user] iterate syntax
In-Reply-To: <47E14067.8030905@fnal.gov>
References: <47E01DFA.1030003@fnal.gov> <47E1338B.3090000@fnal.gov> <47E14067.8030905@fnal.gov>
Message-ID:

Try r1744. I think I have fixed this. I've also added the code you supplied to our test suite.

--

From piccoli at fnal.gov Wed Mar 19 13:54:51 2008
From: piccoli at fnal.gov (Luciano Piccoli)
Date: Wed, 19 Mar 2008 13:54:51 -0500
Subject: [Swift-user] iterate syntax
In-Reply-To:
References: <47E01DFA.1030003@fnal.gov> <47E1338B.3090000@fnal.gov> <47E14067.8030905@fnal.gov>
Message-ID: <47E1617B.1030904@fnal.gov>

Thank you.
It works fine now.

Luciano

Ben Clifford wrote:
> Try r1744. I think I have fixed this. I've also added the code you
> supplied to our test suite.
>
>

From piccoli at fnal.gov Wed Mar 19 17:14:06 2008
From: piccoli at fnal.gov (Luciano Piccoli)
Date: Wed, 19 Mar 2008 17:14:06 -0500
Subject: [Swift-user] Swift v0.4 released (external mapper example)
In-Reply-To:
References:
Message-ID: <47E1902E.6010700@fnal.gov>

Is there an example of usage of the new external mapper? I grep'd through the examples and did not find any reference.

Thanks,
Luciano

Ben Clifford wrote:
> Swift 0.4 is released.
>
> You can download it from http://www.ci.uchicago.edu/swift/downloads/
>
> In addition, there are a few pages of release notes detailing the
> substantial changes since v0.3 here:
> http://www.ci.uchicago.edu/swift/packages/release-notes-0.4.txt
>
>

From wilde at mcs.anl.gov Wed Mar 19 17:41:19 2008
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 19 Mar 2008 17:41:19 -0500
Subject: [Swift-user] Swift v0.4 released (external mapper example)
In-Reply-To: <47E1902E.6010700@fnal.gov>
References: <47E1902E.6010700@fnal.gov>
Message-ID: <47E1968F.8070301@mcs.anl.gov>

Here's a trivial example, perhaps not the clearest, but runnable:

The ext mapper is a script that returns two columns, a Swift expression and a physical name:

Swift expression is an index or field reference expression relative to a structured swift object (array or structure, possibly nested).

Physical name is the filename or gridftp URI of the physical object to be mapped to that swift expression.

In the example below, mapping is done to a simple array, so the swift expressions are [0], [1], etc. But they could be, for example, [0].data, [0].info, etc., if you were mapping an array of structures with two fields, .info and .data.

Ben or Mihael, please clarify or correct.

- Mike

UC64$ cat awf8.swift
type pcapfile;
type angleout;
type anglecenter;

(angleout ofile, anglecenter cfile) angle4 (pcapfile ifile)
{
  app { angle4 @ifile @ofile @cfile; }
}

pcapfile pcapfiles[];
// gsiftp://tp-osg.ci.uchicago.edu// disks/ci-gpfs/angle/ spool_1/ anl2-1182294000-dump.1.167.pcap.gz

angleout of[] ;
anglecenter cf[] ;

foreach pf,i in pcapfiles {
  (of[i],cf[i]) = angle4(pf);
}
UC64$ cat map1
#! /bin/sh
awk

> Is there an example of usage of the new external mapper? I grep'd
> through the examples and did not find any reference.
> Thanks,
> Luciano
>
> Ben Clifford wrote:
>> Swift 0.4 is released.
>>
>> You can download it from http://www.ci.uchicago.edu/swift/downloads/
>>
>> In addition, there are a few pages of release notes detailing the
>> substantial changes since v0.3 here:
>> http://www.ci.uchicago.edu/swift/packages/release-notes-0.4.txt
>>
>>
>
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>
>

From benc at hawaga.org.uk Wed Mar 19 22:20:58 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 20 Mar 2008 03:20:58 +0000 (GMT)
Subject: [Swift-user] Swift v0.4 released (external mapper example)
In-Reply-To: <47E1902E.6010700@fnal.gov>
References: <47E1902E.6010700@fnal.gov>
Message-ID:

On Wed, 19 Mar 2008, Luciano Piccoli wrote:

> Is there an example of usage of the new external mapper? I grep'd through the
> examples and did not find any reference.

Here's Mihael's original message from last October giving details:
http://mail.ci.uchicago.edu/pipermail/swift-devel/2007-October/002121.html

I'll move some of that information into the user guide.
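
Going only by Mike's description above (each output line pairs a Swift expression such as `[0]` or `[0].data` with a file name or gridftp URI), a mapper script could be sketched in Python as below. This is a hypothetical illustration: the whitespace-separated columns, the `.pcap.gz` filter and the directory argument are assumptions, and how Swift invokes the script and passes parameters to it is not covered here; the message Ben links above has the actual details.

```python
import os
import sys


def mapping_lines(paths, fields=None):
    """Yield 'swift-expression physical-name' lines for the given paths.

    With no fields, the k-th path maps to array element [k].
    With fields such as ("data", "info"), consecutive paths fill the fields
    of one structure before moving on: [0].data, [0].info, [1].data, ...
    """
    if not fields:
        for i, path in enumerate(paths):
            yield f"[{i}] {path}"
        return
    for k, path in enumerate(paths):
        index, field = divmod(k, len(fields))
        yield f"[{index}].{fields[field]} {path}"


if __name__ == "__main__":
    # Hypothetical convention: the first argument is the directory to scan.
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    names = sorted(n for n in os.listdir(directory) if n.endswith(".pcap.gz"))
    for line in mapping_lines(os.path.join(directory, n) for n in names):
        print(line)
```

Run against a directory such as `/path/to/spool` (a placeholder), it would print lines like `[0] /path/to/spool/a.pcap.gz`; calling `mapping_lines` with `fields=("data", "info")` instead yields `[0].data`/`[0].info` pairs for an array of two-field structures, matching Mike's second case.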
--

From wilde at mcs.anl.gov Thu Mar 20 14:38:45 2008
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 20 Mar 2008 14:38:45 -0500
Subject: [Swift-user] Specifying 32/64 bit hosts on uc-teragrid
Message-ID: <47E2BD45.4070804@mcs.anl.gov>

Rob,

I forgot to mention: the UC-Teragrid cluster has both 32 and 64-bit compute nodes.

tg-login is a 64-bit host. tg-viz-login is a 32 bit host.

When you run swift in this cluster, unless your app can run on both architectures, you need to specify in your sites or tc files which arch to run on.

Can someone point Rob to info on how/where to specify this?

From benc at hawaga.org.uk Thu Mar 20 18:01:16 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 20 Mar 2008 23:01:16 +0000 (GMT)
Subject: [Swift-user] Specifying 32/64 bit hosts on uc-teragrid
In-Reply-To: <47E2BD45.4070804@mcs.anl.gov>
References: <47E2BD45.4070804@mcs.anl.gov>
Message-ID:

On Thu, 20 Mar 2008, Michael Wilde wrote:

> tg-login is a 64-bit host. tg-viz-login is a 32 bit host.
>
> When you run swift in this cluster, unless your app can run on both
> architectures, you need to specify in your sites or tc files which arch to run
> on.
>
> Can someone point Rob to info on how/where to specify this?

Specify a host_types profile key in the site catalog entry or tc.data entry for your site/applications.

Here's an example of a site entry that will force everything to the ia64-compute nodes, using GRAM4.

/home/kubal/Swift_Runs
ia64-compute

From wilde at mcs.anl.gov Fri Mar 21 11:17:28 2008
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Fri, 21 Mar 2008 11:17:28 -0500
Subject: [Swift-user] Specifying 32/64 bit hosts on uc-teragrid
In-Reply-To: <643833.70394.qm@web52306.mail.re2.yahoo.com>
References: <643833.70394.qm@web52306.mail.re2.yahoo.com>
Message-ID: <47E3DF98.4080207@mcs.anl.gov>

Mike, this userguide section says how to use your own .swift file to override one in the $SWIFT_HOME tree:

http://www.ci.uchicago.edu/swift/guides/userguide.php#engineconfiguration

"Various aspects of the behavior of the Swift Engine can be configured through properties. The Swift Engine recognizes a global, per installation properties file which can be found in $SWIFT_HOME/etc/swift.properties and a user properties file which can be created by each user in ~/.swift/swift.properties. The Swift Engine will first load the global properties file. It will then try to load the user properties file. If a user properties file is found, individual properties explicitly set in that file will override the respective properties in the global properties file. Furthermore, some of the properties can be overridden directly using command line arguments to the swift command."

- Mike

On 3/21/08 11:05 AM, Mike Kubal wrote:
> I'm using your swift to avoid the 'cannot execute
> binary' error. Can you set your throttle to 4 and I'll
> run again. Thanks.
>
>
> --- Ben Clifford wrote:
>
>> If you run with the throttle set to 4 or so, then
>> the run should happen in
>> an hour or so; if it's hanging after a certain number
>> of jobs you should
>> get that pretty quickly at that time scale.
>> --
>>
>>
>
>

From wilde at mcs.anl.gov Tue Mar 25 09:46:14 2008
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 25 Mar 2008 09:46:14 -0500
Subject: [Swift-user] How to wait on functions that return no data?
Message-ID: <47E91036.9070806@mcs.anl.gov>

For the petro-model app I'm working on, it would be interesting to run the parameter sweep in "map reduce" manner, in which each invocation bites off a portion of the parameter space and processes it, resulting in a set of result tuples. Each run of the model will produce a set of tuples, and when all are done, we want to aggregate and plot the tuples.

While with batching this is not strictly needed, it would be interesting to let the model results accumulate on the local filesystem (as in this case they are small) and collect them either at the end of the run, or periodically and perhaps asynchronously during the run.

To do this, we'd want to write the model invocation as a swift function with only scalar numeric parameters, and no output.

The question is how to call a zero-returns function in a swift foreach() loop, and embed that foreach() in a function that doesn't return until all members of the foreach() have been processed.

I haven't tried to code this yet, because I can't think of a way to express it in swift, due to the data-dependency semantics.

In the example below, I want collectResults() to get invoked after all the runam() calls complete in doall().

Anyone have any ideas?

This is a low-priority question, just food for thought, as the batched way of running this parameter sweep should be straightforward and efficient.
Mike // Amiga-Mars Parameter Sweep type amout; runam (string id , string p1, string p2) // no ret val { app { runam3 id p1 p2 ; } } type params { string id; string p1; string p2; }; doall(params p[]) { foreach pset in p { runam(pset.id, pset.p1, pset.p2); } // waitTillAllDone(); // want to block here till all above finish, // but no data to wait on. any way to // achieve this??? } // Main params p[]; p = readdata("paramlist"); doall(p); amout amdata ; amdata = collectResults(); // ^^^ Want collectresults to run AFTER all runam() calls finish // in the doall() function. From hategan at mcs.anl.gov Tue Mar 25 10:00:54 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 25 Mar 2008 10:00:54 -0500 Subject: [Swift-user] How to wait on functions that return no data? In-Reply-To: <47E91036.9070806@mcs.anl.gov> References: <47E91036.9070806@mcs.anl.gov> Message-ID: <1206457254.19756.5.camel@blabla.mcs.anl.gov> On Tue, 2008-03-25 at 09:46 -0500, Michael Wilde wrote: > For the petro-model app Im working on, it would be interesting to run > the parameter sweep in "map reduce" manner, in which each invocation > bites off a portion of the parameter space and processes it, resulting > in a set of result tuples. Each run of the model will produce a set of > tuples, and when all are done, we want to aggregate and plot the tuples. > > While with batching this is not strictly needed, it would be interesting > to let the model results accumulate on the local filesystem (as in this > case they are small) and collect them either at the end of the run, or > periodically and perhaps asynchronously during the run. > > To do this, we'd want to write the model invocation as a swift function > with only scalar numeric parameters, and no output. That assertion I'm not sure about. > > The question is how to call a zero-returns function in a swift foreach() > loop, and embed that foreach() in a function that doesnt return until > all members of the foreach() have been processed. 
The very notion of "return" as it would appear in a strict language doesn't make much sense in Swift, so I'm not quite sure. > > I havent tried to code this yet, because I cant think of a way to > express it in swift, due to the data-dependency semantics. > > In the example below, I want collectResults() to get invoked after all > the runam() calls complete in doall(). results = doall(); collectResults(results); Mihael > > Anyone have any ideas? > > This is a low-priority question, just food for thought, as the batched > way of running this parameter sweep should be straightforward and efficient. > > Mike > > > > // Amiga-Mars Parameter Sweep > > type amout; > > runam (string id , string p1, string p2) // no ret val > { > app { runam3 id p1 p2 ; } > } > > type params { > string id; > string p1; > string p2; > }; > > doall(params p[]) > { > foreach pset in p { > runam(pset.id, pset.p1, pset.p2); > } > // waitTillAllDone(); > // want to block here till all above finish, > // but no data to wait on. any way to > // achieve this??? > } > > // Main > > params p[]; > p = readdata("paramlist"); > doall(p); > amout amdata ; > amdata = collectResults(); > > // ^^^ Want collectresults to run AFTER all runam() calls finish > // in the doall() function. > > > > > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > From wilde at mcs.anl.gov Tue Mar 25 10:14:40 2008 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 25 Mar 2008 10:14:40 -0500 Subject: [Swift-user] How to wait on functions that return no data? In-Reply-To: <1206457254.19756.5.camel@blabla.mcs.anl.gov> References: <47E91036.9070806@mcs.anl.gov> <1206457254.19756.5.camel@blabla.mcs.anl.gov> Message-ID: <47E916E0.2020103@mcs.anl.gov> >> In the example below, I want collectResults() to get invoked after all >> the runam() calls complete in doall(). 
>
> results = doall();
> collectResults(results);
>
> Mihael

But that's the problem: doall() does not in this example return results. If it would return an artificial result, how would we get such a return to wait until all the runam() calls made within the foreach() have completed?

Each of the runam() calls runs a small model, and in this proposed scenario would leave those results on a local disk for later collection, either in a single shared file that many invocations would append to, or in a set of files.

Then collectResults() would run a job that collects all the data when done.

One approach can be to have collectResults() just run iteratively until it has collected a sufficient number of results. I.e., to have it not depend on Swift to find out when all the runam() calls have completed. That might work.

- Mike

From hategan at mcs.anl.gov Tue Mar 25 10:23:44 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 25 Mar 2008 10:23:44 -0500
Subject: [Swift-user] How to wait on functions that return no data?
In-Reply-To: <47E916E0.2020103@mcs.anl.gov>
References: <47E91036.9070806@mcs.anl.gov> <1206457254.19756.5.camel@blabla.mcs.anl.gov> <47E916E0.2020103@mcs.anl.gov>
Message-ID: <1206458624.20974.7.camel@blabla.mcs.anl.gov>

On Tue, 2008-03-25 at 10:14 -0500, Michael Wilde wrote:
> >> In the example below, I want collectResults() to get invoked after all
> >> the runam() calls complete in doall().
> >
> > results = doall();
> > collectResults(results);
> >
> > Mihael
>
> But that's the problem: doall() does not in this example return results.

Then it should be fixed.

> If it would return an artificial result, how would we get such a return
> to wait until all the runam() calls made within the foreach() have completed?
>
> Each of the runam() calls runs a small model, and in this proposed
> scenario would leave those results on a local disk for later collection,
> either in a single shared file that many invocations would append to, or
> in a set of files.

I don't think the solution to performance problems in Swift is to hack stuff like that.

> Then collectResults() would run a job that collects all the data when done.
>
> One approach can be to have collectResults() just run iteratively until
> it has collected a sufficient number of results. I.e., to have it not
> depend on Swift to find out when all the runam() calls have completed.
> That might work.

Don't use Swift then. Seriously. If you don't want to express things in a dataflow-oriented way, and are not satisfied with its performance for the given problem, don't use it.

Mihael

From wilde at mcs.anl.gov Tue Mar 25 10:45:55 2008
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 25 Mar 2008 10:45:55 -0500
Subject: [Swift-user] How to wait on functions that return no data?
In-Reply-To: <1206458624.20974.7.camel@blabla.mcs.anl.gov>
References: <47E91036.9070806@mcs.anl.gov> <1206457254.19756.5.camel@blabla.mcs.anl.gov> <47E916E0.2020103@mcs.anl.gov> <1206458624.20974.7.camel@blabla.mcs.anl.gov>
Message-ID: <47E91E33.3020700@mcs.anl.gov>

Your view has merits in terms of language purity, but I disagree with it.

This was posed as an academic question, and I think it's interesting to discuss.

The point here is that there's an application that could best be done by batching up its output, and in fact perhaps by using the map-reduce representation of tuples for that output.

It's still driven by dataflow and data dependencies, just not the simplistic lock-step dependencies that Swift implements today.

For example, one way to address the problem is to say that batching of function calls, the way Swift does today, is helpful but ignores the problem that small tasks often have small data inputs and outputs, and that these should be batched along with the job execution.

That would leave Swift language semantics unchanged, but the implementation would get more efficient and could handle finer-grained tasks.
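The paragraph above proposes batching the small inputs and outputs of fine-grained tasks together with the jobs themselves. Here is a minimal Python sketch of that idea. It is an illustration only and does not describe how Swift actually batches jobs. `make_batches`, `run_batch`, `sweep` and `toy_model` are hypothetical names, and each "job" is simulated by a local function call. Each job carries its tasks' inputs with it and returns all of their result tuples at once, so five tasks cost three jobs instead of five separate submissions and transfers.

```python
from typing import Callable, List, Tuple

Params = Tuple[str, int, int]   # hypothetical (id, p1, p2) parameter tuple
Result = Tuple[str, int]        # hypothetical (id, value) result tuple


def make_batches(items: List[Params], size: int) -> List[List[Params]]:
    # Split the parameter sweep into chunks of at most `size` tasks.
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_batch(batch: List[Params], model: Callable[[Params], Result]) -> List[Result]:
    # One simulated "job": its inputs travel inline with it, and all of its
    # outputs come back together instead of one transfer per task.
    return [model(p) for p in batch]


def sweep(items: List[Params], model: Callable[[Params], Result], size: int) -> List[Result]:
    results: List[Result] = []
    for batch in make_batches(items, size):
        results.extend(run_batch(batch, model))
    return results


def toy_model(p: Params) -> Result:
    # Hypothetical stand-in for the runam model.
    ident, a, b = p
    return (ident, a * b)


params = [(f"run{i}", i, i + 1) for i in range(5)]
print(len(make_batches(params, 2)))  # 3 jobs instead of 5
print(sweep(params, toy_model, 2))
```

The batch size is the main design knob: larger batches amortize per-job overhead but leave fewer independent jobs to run in parallel.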
An even more efficient and interesting approach, fully in keeping with the language as it stands today, would be to allow tuples to be expressed as inputs and outputs, and to have Swift efficiently and automatically route (and batch) tuples in and out of jobs.

So I view what I was asking for here as a prototype or exploration of that direction. It would be good to test the performance of an implementation that streamed output tuples into a subsequent ("reduce") stage of processing, before we even consider what the language and/or implementation would need to do for such a case.

On 3/25/08 10:23 AM, Mihael Hategan wrote:
...
> Don't use Swift then. Seriously. If you don't want to express things in
> a dataflow-oriented way, and are not satisfied with its performance for
> the given problem, don't use it.

I want to express things as dataflow, with high performance, in Swift.

Mike

From hategan at mcs.anl.gov Tue Mar 25 11:01:11 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 25 Mar 2008 11:01:11 -0500
Subject: [Swift-user] How to wait on functions that return no data?
In-Reply-To: <47E91E33.3020700@mcs.anl.gov>
References: <47E91036.9070806@mcs.anl.gov> <1206457254.19756.5.camel@blabla.mcs.anl.gov> <47E916E0.2020103@mcs.anl.gov> <1206458624.20974.7.camel@blabla.mcs.anl.gov> <47E91E33.3020700@mcs.anl.gov>
Message-ID: <1206460871.20974.18.camel@blabla.mcs.anl.gov>

I think there is some confusion here between language and implementation.
The language can express the problem just fine. That's why I'm saying you should change doall() to return an array with all the outputs. It's the implementation that behaves in a very poor way if the applications are very fine-grained.

You seem to be trying to solve the problem by:
1. Doing some magic with the way files are moved around
2. Convincing Swift that it should work without knowing about data dependencies, despite the fact that it only works properly if it knows about all data dependencies. By definition.

There is some middle ground here. It may be possible to let Swift know what the data dependencies are, but also prevent it from dealing with certain files, by marking them as "virtual" (or whatever the term).

Mihael

From quanpt at cs.uchicago.edu Tue Mar 25 11:01:24 2008
From: quanpt at cs.uchicago.edu (Quan Tran Pham)
Date: Tue, 25 Mar 2008 11:01:24 -0500
Subject: [Swift-user] Using swift with falkon on teraport
Message-ID: <4290b6c60803250901i189169enb421f87abc60380@mail.gmail.com>

Hi,

I just wonder if anyone has run Swift with Falkon on teraport? How do you configure Swift: sites.xml (no sample for Falkon), tc.data (no need to change?)? I found a link about Swift and Falkon here http://dev.globus.org/wiki/Incubator/Falkon#Project_Branches , but the link to the article has no content.

I am trying to run Falkon on tp-login, and run Swift on that same machine to submit jobs to Falkon to run on teraport.
Thank you very much Quan Pham From wilde at mcs.anl.gov Tue Mar 25 11:18:42 2008 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 25 Mar 2008 11:18:42 -0500 Subject: [Swift-user] Using swift with falkon on teraport In-Reply-To: <4290b6c60803250901i189169enb421f87abc60380@mail.gmail.com> References: <4290b6c60803250901i189169enb421f87abc60380@mail.gmail.com> Message-ID: <47E925E2.1050606@mcs.anl.gov> Hi Quan, I'm doing something similar at the moment on machines at Argonne. Do you already have Falkon built? (I'm using the attached file of notes that I compiled from Ioan). I run swift and falkon together on a host that has access to the cluster shared filesystem, which in your case would be tp-login (or better yet a cluster node that you can allocate using qsub -I, as as not to over-tax a login host). I use the local data provider, so that swift uses direct shared-filesystem access to move data back and forth and do directory and status file management. Here's my sites file: Below is my working doc of info form Ioan and Zhao, also attached in word. - Mike /home/wilde/swiftwork Compiling Swift with Falkon support: when you build Swift, add the -Dwith-provider-deef option: cd ${FALKON_ROOT}/cog/modules/vdsk/ ant -Dwith-provider-deef redist Security Note BGexec supports no security they connect back to the Falkon service and get work from there they don't have any server sockets so someone would have to hijack the connections and fake the service for them to inject jobs to the workers... 
if the workers would have had server sockets listening on some ports then it would be different but they are simple clients that only generate outgoing connections to a specific IP the service IP and the Falkon service can run on the same box with Swift behind a firewall with only 3 ports open Java Needs IA64 nodes require Java 1.4 work up to 1.6 Falkon Tarball wget http://people.cs.uchicago.edu/~iraicu/source/falkon-r83.tgz tar xfz falkon-r83.tgz cd falkon-r83/ source falkon.env if you want to re-build (not needed for this tar ball) falkon-clean.sh falkon-build.sh Building Falkon The SVN archive has grown rather large recently, and some of the directories (i.e. workloads and AstroPortal) make up the largest part of the contents. With its current organization, here is how you would do a minimal checkout (~43MB, Falkon User Guide, Section 2.1, http://dev.globus.org/images/0/0e/Falkon_User_Guide_v2.pdf), and compile: export ANT_HOME=/home/wilde/ant/dist svn co https://svn.globus.org/repos/falkon -N cd falkon svn co https://svn.globus.org/repos/falkon/bin source falkon.env falkon-checkout-minimal.sh source falkon.env falkon-build.sh This checkout takes 62 seconds for me, and the compile takes 43 seconds. BTW, the entire thing (including all .svn dirs and compiled) is 148MB after a clean checkout and compilation. Starting Falkon On screen 1: cd falkon-r83 source falkon.env falkon-service-stdout.sh 50001 config/Falkon-TCPCore.config On screen 2: cd falkon-r83 source falkon.env falkon-worker-stdout.sh localhost 50001 at this point, you have the service running... 
press any key and enter at the worker to terminate BGexec?s on sico: The file: /home/iraicu/java/svn/falkon/worker/ServiceName.txt points each BGexec to where the service is running so you need to update that file prior to starting the BGexecs with the IP of the service then to start them: cd ~iraicu/java/svn/falkon/worker ./run.drp-slurm.sh 6 60 this would start 6 BGexecs for 60 minutes you might need to copy over the BGexec source (1 file) and compile it on the SiCo itself and the starting scripts (2 of them) Testing: create a 3rd screen cd falkon-r83 source falkon.env falkon-client.sh 140.221.37.30 50001 workloads/sleep/sleep_1x10 the IP can also be localhost at this point Debugging and Logs here are the logs you need to make sure you capture when running in debug mode: cd ~/java/svn/falkon/config cat Falkon-TCPCore.config GenericPortalWS=falkon_task_submission_history.txt GenericPortalWS_perf_per_sec=falkon_summary.txt GenericPortalWS_taskPerf=falkon_task_perf.txt GenericPortalWS_task=falkon_task_status.txt When running in normal mode (when we know things work fine), we just need: cd ~/java/svn/falkon/config cat Falkon-TCPCore.config GenericPortalWS_perf_per_sec=falkon_summary.txt GenericPortalWS_taskPerf=falkon_task_perf.txt In the event that we can't figure out things from the Swift and Falkon service logs, we might have to enable worker side logs as well, which you do from the run.worker-c.sh (or run.worker-c-ram.sh) script(s). On 3/25/08 11:01 AM, Quan Tran Pham wrote: > Hi, > > I just wonder if anyone has run Swift with Falkon on teraport? How do > you config Swift (sites.xml (no sample on falkon), tc.data (no need to > change?)). I find a link about Swift and Falkon here > http://dev.globus.org/wiki/Incubator/Falkon#Project_Branches , but the > link to the article has no content. > > I am try ing to: run falkon on tp-login, run swift on that same > machine to submit jobs to falkon to run on teraport. 
> > Thank you very much > > Quan Pham > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > -------------- next part -------------- A non-text attachment was scrubbed... Name: Falkon.SiCo.FromIoan.2008.0311.doc Type: application/msword Size: 37376 bytes Desc: not available URL: From quanpt at cs.uchicago.edu Tue Mar 25 11:36:22 2008 From: quanpt at cs.uchicago.edu (Quan Tran Pham) Date: Tue, 25 Mar 2008 11:36:22 -0500 Subject: [Swift-user] Using swift with falkon on teraport In-Reply-To: <47E925E2.1050606@mcs.anl.gov> References: <4290b6c60803250901i189169enb421f87abc60380@mail.gmail.com> <47E925E2.1050606@mcs.anl.gov> Message-ID: <4290b6c60803250936x3fe6d3a0tf64567d222e86a08@mail.gmail.com> Thank you Mike, I have Falkon built already, now I should recompile swift. I think the Swift with Falkon part should be put up on "http://dev.globus.org/wiki/Incubator/Falkon#Project_Branches", at "Running Swift Applications over Falkon" so that others can have some reference. Regards Quan Pham On Tue, Mar 25, 2008 at 11:18 AM, Michael Wilde wrote: > Hi Quan, > > I'm doing something similar at the moment on machines at Argonne. > > Do you already have Falkon built? (I'm using the attached file of notes > that I compiled from Ioan). > > I run swift and falkon together on a host that has access to the cluster > shared filesystem, which in your case would be tp-login (or better yet a > cluster node that you can allocate using qsub -I, as as not to over-tax > a login host). > > I use the local data provider, so that swift uses direct > shared-filesystem access to move data back and forth and do directory > and status file management. > > Here's my sites file: > > > Below is my working doc of info form Ioan and Zhao, also attached in word. 
>
> - Mike
>

From iraicu at cs.uchicago.edu  Tue Mar 25 13:42:19 2008
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Tue, 25 Mar 2008 13:42:19 -0500
Subject: [Swift-user] Using swift with falkon on teraport
In-Reply-To: <4290b6c60803250936x3fe6d3a0tf64567d222e86a08@mail.gmail.com>
References: <4290b6c60803250901i189169enb421f87abc60380@mail.gmail.com> <47E925E2.1050606@mcs.anl.gov> <4290b6c60803250936x3fe6d3a0tf64567d222e86a08@mail.gmail.com>
Message-ID: <47E9478B.8020904@cs.uchicago.edu>

Hi Pham,

I finally updated the Falkon site to include instructions for setting up
Swift and Falkon together: http://dev.globus.org/wiki/Incubator/Falkon/Swift.
You'll also have to update Falkon to the latest revision, R115, so that you
have the build scripts the instructions rely on. Try them out, and let me
know if there are any bugs in the instructions (I haven't actually tried
them verbatim).

Ioan

-- 
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================
===================================================

From benc at hawaga.org.uk  Tue Mar 25 17:45:29 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 25 Mar 2008 22:45:29 +0000 (GMT)
Subject: [Swift-user] How to wait on functions that return no data?
In-Reply-To: <47E91036.9070806@mcs.anl.gov>
References: <47E91036.9070806@mcs.anl.gov>
Message-ID: 

On Tue, 25 Mar 2008, Michael Wilde wrote:

> While with batching this is not strictly needed, it would be interesting to
> let the model results accumulate on the local filesystem (as in this case they
> are small) and collect them either at the end of the run, or periodically and
> perhaps asynchronously during the run.

Results do, to some extent, get collected asynchronously at the moment: when
you run a procedure, there are three relevant steps (stagein, run, stageout),
and these can be interleaved with the same steps from other procedure
invocations.
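A rough way to see why stageouts can lag far behind the jobs: the toy model below (an illustration under stated assumptions, not Swift's actual scheduler) puts every job's stagein on a single FIFO transfer queue up front, performs one transfer at a time, treats each run as instantaneous, and only queues a job's stageout once that job has run. The `simulate` function and the single-slot FIFO queue are hypothetical simplifications made for this sketch.

```sh
#!/bin/sh
# Toy model of a transfer throttle with ONE slot and a FIFO queue.
# Illustration only; this is not how Swift schedules transfers internally.
# Assumptions: all stageins are queued at start, runs are instantaneous,
# and a job's stageout is queued only after that job has run.
simulate() {
  njobs=$1
  queue=""
  order=""
  i=1
  while [ "$i" -le "$njobs" ]; do
    queue="$queue in$i"              # enqueue every stagein up front
    i=$((i + 1))
  done
  while [ -n "$queue" ]; do
    set -- $queue                    # split the queue into words
    item=$1                          # take the head of the queue
    shift
    queue="$*"
    order="$order $item"             # record the order transfers happen in
    case $item in
      in*) queue="$queue out${item#in}" ;;  # job ran; now queue its stageout
    esac
  done
  printf '%s\n' "${order# }"
}

simulate 3
```

Running `simulate 3` prints `in1 in2 in3 out1 out2 out3`: no stageout happens until every queued stagein has gone through, so with many jobs the first output file can appear long after the jobs themselves report completion.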
We've even had incorrect bug reports about this, along the lines of: "I see
that 100 jobs are reported as completed by the job-execution part of the
logs, but I see only one output file." What has been happening there is that
the full permissible file-transfer load is being used for stageins, with
stageouts happening much later ('asynchronously').

By batching, do you mean 'results from multiple procedure invocations going
into a single file'?

-- 

From piccoli at fnal.gov  Wed Mar 26 14:15:38 2008
From: piccoli at fnal.gov (Luciano Piccoli)
Date: Wed, 26 Mar 2008 14:15:38 -0500
Subject: [Swift-user] iterate doesn't stop
Message-ID: <47EAA0DA.1010607@fnal.gov>

Hi,

The following simple Swift script works fine:

-- iterate.swift -------
int N=4;
iterate i {
  print (i);
} until (i == N);
---------------------

bash-3.00$ swift iterate.swift
Swift vsvn swift-r1744 cog-r1936

RunID: 20080326-1408-b4l23enc
Progress:
0
1
2
3
4
Final status:

However, if the number of iterations N is read from the command line, it does
not stop:

-- iterate.swift -------
int N=@arg("N");
iterate i {
  print (i);
} until (i == N);
----------------------

bash-3.00$ swift iterate.swift -N=4
Swift vsvn swift-r1744 cog-r1936

RunID: 20080326-1409-bv0q7jm2
Progress:
0
1
2
3
4
5
...
69348
69349
...

Any ideas why?

Thanks,
Luciano

From benc at hawaga.org.uk  Wed Mar 26 20:43:17 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 27 Mar 2008 01:43:17 +0000 (GMT)
Subject: [Swift-user] iterate doesn't stop
In-Reply-To: <47EAA0DA.1010607@fnal.gov>
References: <47EAA0DA.1010607@fnal.gov>
Message-ID: 

On Wed, 26 Mar 2008, Luciano Piccoli wrote:

> However, if the number of iterations N is read from the command line it does
> not stop:
>
> -- iterate.swift -------
> int N=@arg("N");
> iterate i {
>   print (i);
> } until (i == N);
> Any ideas why?

That shouldn't even compile, because the types aren't correct. @arg returns
a string, "5", which is never the same as the number 5.
Not enough type checking is going on there; it should fail here:

> int N=@arg("N");

However, it doesn't, because the compiler doesn't type-check enough.

I just committed (in r1772) a function, @toint, that you can use like this:

int N=@toint(@arg("N"));

I've used it in the past for exactly this purpose.

-- 

From wilde at mcs.anl.gov  Sat Mar 29 07:22:33 2008
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sat, 29 Mar 2008 07:22:33 -0500
Subject: [Swift-user] debug log not being produced
Message-ID: <47EE3489.2040607@mcs.anl.gov>

I'm not getting a swift .log file on the BGP after rebuilding
swift at r1771 with provider-deef.

I'm backtracking and debugging now, but does anyone know of recent
changes in this area, or of what I might have done wrong?

(I lost access to many files when my sicortex NFS client host crashed
and then went into limited-access mode, so I may have misconfigured things.)

Does anyone see a cause?

Thanks,

Mike

-- I'm running swift with this script:

WORKFLOW=$1
shift

site $* >sites.xml

( echo Swift script $WORKFLOW.swift starting at `date`
  echo running on sites: $*
  echo
  swift \
    -sites.file ./sites.xml \
    -tc.file ./tc.data \
    $WORKFLOW.swift
  echo
  echo Swift Script $WORKFLOW.swift ended at `date` with exit code $?
) >swift.out 2>&1

-- etc/log4j.properties

bg$ cat log4j.properties
# Set root category priority to WARN and its appenders to CONSOLE and FILE.
log4j.rootCategory=INFO, CONSOLE, FILE

log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.Threshold=INFO
log4j.appender.CONSOLE.layout.ConversionPattern=%m%n

log4j.appender.FILE=org.apache.log4j.FileAppender
log4j.appender.FILE.File=swift.log
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSSZZZZZ} %-5p %c{1} %m%n

log4j.logger.swift=DEBUG

log4j.logger.org.globus.swift.trace=INFO

log4j.logger.org.griphyn.vdl.karajan.Loader=DEBUG
log4j.logger.org.globus.cog.karajan.workflow.events.WorkerSweeper=WARN
log4j.logger.org.globus.cog.karajan.workflow.nodes.FlowNode=WARN
log4j.logger.org.globus.cog.karajan.scheduler.WeightedHostScoreScheduler=DEBUG
log4j.logger.org.griphyn.vdl.toolkit.VDLt2VDLx=DEBUG
log4j.logger.org.griphyn.vdl.karajan.VDL2ExecutionContext=DEBUG
log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG
log4j.logger.org.griphyn.vdl.karajan.lib.GetFieldValue=DEBUG
log4j.logger.org.griphyn.vdl.engine.Karajan=INFO

-- ~/.swift/swift.properties

sitedir.keep=true
wrapperlog.always.transfer=true
lazy.errors=true
execution.retries=0

#kickstart.always.transfer=true

throttle.submit=off
throttle.host.submit=off
throttle.transfers=20
throttle.file.operations=20
throttle.score.job.factor=1000000

-- run dir after a swift run - no .log file!
bg$ cd ~/mars/run01
bg$ ls -lt
total 1280
-rw-r--r--  1 wilde users    908 Mar 28 17:20 swift.out
-rw-r--r--  1 wilde users     48 Mar 28 17:20 amps2-20080328-1720-6jit1g59.0.rlog
drwxr-xr-x  2 wilde users 131072 Mar 28 17:20 amps2-20080328-1720-6jit1g59.d
-rw-r--r--  1 wilde users   7859 Mar 28 17:20 amps2.kml
-rw-r--r--  1 wilde users   4397 Mar 28 17:20 amps2.xml
-rw-r--r--  1 wilde users    422 Mar 28 17:20 sites.xml
-rw-r--r--  1 wilde users    103 Mar 28 16:02 paramlist
-rw-r--r--  1 wilde users   1432 Mar 28 14:48 tc.data
-rw-r--r--  1 wilde users    654 Mar 28 14:36 amps2.swift
bg$

From iraicu at cs.uchicago.edu  Sat Mar 29 08:19:43 2008
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Sat, 29 Mar 2008 08:19:43 -0500
Subject: [Swift-user] debug log not being produced
In-Reply-To: <47EE3489.2040607@mcs.anl.gov>
References: <47EE3489.2040607@mcs.anl.gov>
Message-ID: <47EE41EF.40007@cs.uchicago.edu>

Did you compile Swift with

ant -Dwith-provider-deef redist

in the cog/modules/vdsk dir? Also, make sure you don't compile the deef
provider the old way:

ant -Ddist.dir=../vdsk/dist/vdsk-0.3-dev/ dist

in cog/modules/provider-deef.

Ioan

From hategan at mcs.anl.gov  Sat Mar 29 08:27:57 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 29 Mar 2008 08:27:57 -0500
Subject: [Swift-user] debug log not being produced
In-Reply-To: <47EE3489.2040607@mcs.anl.gov>
References: <47EE3489.2040607@mcs.anl.gov>
Message-ID: <1206797277.21645.2.camel@blabla.mcs.anl.gov>

log4j.properties looks fine.

What's in amps2-20080328-1720-6jit1g59.d and what does swift.out say?

Mihael

From wilde at mcs.anl.gov  Sat Mar 29 09:25:10 2008
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sat, 29 Mar 2008 09:25:10 -0500
Subject: [Swift-user] debug log not being produced
In-Reply-To: <1206797277.21645.2.camel@blabla.mcs.anl.gov>
References: <47EE3489.2040607@mcs.anl.gov> <1206797277.21645.2.camel@blabla.mcs.anl.gov>
Message-ID: <47EE5146.4010803@mcs.anl.gov>

I rebuilt a clean swift from r1773 from scratch, without deef, on both
terminable and the bgp, then tested with a simple "swift first.swift" in
examples/vdsk.

On terminable I get the log file fine; on the bgp I don't.

On 3/29/08 8:27 AM, Mihael Hategan wrote:
> log4j.properties looks fine.
>
> What's in amps2-20080328-1720-6jit1g59.d and what does swift.out say?

The *.d dir was empty.
That swift.out said:

Swift script amps2.swift starting at Fri Mar 28 17:20:12 CDT 2008
running on sites: bgps

Swift svn swift-r1771 cog-r1953

RunID: 20080328-1720-6jit1g59
Progress:
runam6 started
error: Notification(int timeout): socket = new ServerSocket(recvPort); Address already in use
error: Notification(int timeout): socket = new ServerSocket(recvPort); Address already in use
2008-03-28 17:20:27,788 WARN submitQueue.NonBlockingSubmit [pool-1-thread-1,notifyPreviousQueue:71] Warning: Task handler throws exception and also sets status
runam6 failed
Final status: Failed:1
The following errors have occurred:
1. Application "runam6" failed (Task failed)
        Arguments: "000000, 0.200000, 0.000391, 0.204419, 0.200000, 0.000391, 0.204419, 10"
        Host: bgps
        Directory: amps2-20080328-1720-6jit1g59/jobs/a/runam6-a9750gqi
        STDERR:
        STDOUT:

Swift Script amps2.swift ended at Fri Mar 28 17:20:28 CDT 2008 with exit code 0

--
It was that error I was going after, and I couldn't get the log file.

--
For the test I did on a fresh clean build of 1773 on the bgp, swift
stdout/err gave:

bg$ $s/bin/swift first.swift
Swift svn swift-r1771 cog-r1953

RunID: 20080329-0913-xdviok1a
Progress:
echo started
echo completed
Final status:  Finished successfully:1
bg$ ls
anonymous.swift      default.dtm       first.xml         helloworld_named.dtm  regexp.swift
array_index.dtm      diamond.dtm       fixedarray.swift  manyparam.swift       tutorial
array_iteration.dtm  file_counter.dtm  foreach.swift     parameter.swift       types.swift
array_wildcard.dtm   first.kml         hello.txt         q16.txt
arraymapper.dtm      first.swift       helloworld.dtm    range.dtm
bg$ pwd
/home/wilde/swift/rev/1773c/examples/vdsk
bg$

--

I'm using this Java:

bg$ echo $JAVA_HOME
/home/falkon/java
bg$ which java
/home/falkon/java/bin/java
bg$ java -version
java version "1.5.0"
Java(TM) 2 Runtime Environment, Standard Edition (build pxp64dev-20070511(SR5))
IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Linux ppc64-64 j9vmxp6423-20070426 (JIT enabled)
J9VM - 20070420_12448_BHdSMr
JIT  - 20070419_1806_r8
GC   - 200704_19)
JCL  - 20070511
bg$

bg$ ls -l $JAVA_HOME
lrwxrwxrwx 1 wilde falkon 32 Mar 26 15:18 /home/falkon/java -> /home/falkon/ibm-java2-ppc64-50/

--

- Mike

From wilde at mcs.anl.gov  Sat Mar 29 09:39:28 2008
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sat, 29 Mar 2008 09:39:28 -0500
Subject: [Swift-user] debug log not being produced
In-Reply-To: <47EE5146.4010803@mcs.anl.gov>
References: <47EE3489.2040607@mcs.anl.gov> <1206797277.21645.2.camel@blabla.mcs.anl.gov> <47EE5146.4010803@mcs.anl.gov>
Message-ID: <47EE54A0.1080906@mcs.anl.gov>

Sorry, I didn't say clearly what's going on here: with a clean 1773 build,
running first.swift (hello world), I get the log file fine on terminable,
and I get no log file on the bgp.

I will test on the bgp with the alternative Java 1.4 release in /usr/bin
when I get a chance.

- mike

From wilde at mcs.anl.gov  Sat Mar 29 11:15:40 2008
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sat, 29 Mar 2008 11:15:40 -0500
Subject: [Swift-user] debug log not being produced
In-Reply-To: <47EE5146.4010803@mcs.anl.gov>
References: <47EE3489.2040607@mcs.anl.gov> <1206797277.21645.2.camel@blabla.mcs.anl.gov> <47EE5146.4010803@mcs.anl.gov>
Message-ID: <47EE6B2C.6030507@mcs.anl.gov>

With swift rebuilt under the native Java 1.4, the BGP produces the Swift
log just fine.

Ioan, where did the Java 1.5 in /home/falkon come from, and is it needed
to make Falkon run correctly?

- Mike

From benc at hawaga.org.uk  Sat Mar 29 13:24:39 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sat, 29 Mar 2008 18:24:39 +0000 (GMT)
Subject: [Swift-user] RSL multi valued key
Message-ID: 

gram2+condor allows specification of condor requirements like this:

&(executable=/bin/true)(condorsubmit=(Requirements "Arch == ""WINNT5"""))

which (when put in a file 1.rsl) will successfully submit a job to GRAM
using this command:

globusrun -f ./1.rsl -r fletch.bsd.uchicago.edu

and will put the specified requirement on the job in the condor queue.

I can't immediately figure out whether multi-valued RSL sequences (or
whatever they are called) can be specified in Swift (i.e. how to specify
the above condorsubmit attribute). I think not, but I haven't really played
around with it.
-- From hategan at mcs.anl.gov Sat Mar 29 18:19:41 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 29 Mar 2008 18:19:41 -0500 Subject: [Swift-user] RSL multi valued key In-Reply-To: References: Message-ID: <1206832781.31366.3.camel@blabla.mcs.anl.gov> On Sat, 2008-03-29 at 18:24 +0000, Ben Clifford wrote: > gram2+condor allows specification of condor requirements like this: > > &(executable=/bin/true)(condorsubmit=(Requirements "Arch == ""WINNT5""")) > > which (when put in a file 1.rsl) will successfully submit a job to GRAM > using this command: > > globusrun -f ./1.rsl -r fletch.bsd.uchicago.edu > > which will put the specified requirement on the job in the condor queue. > > I can't immediately figure out of multivalue RSL sequences (or whatever > they are called) can be specified in Swift (i.e. how to specify the above > condorsubmit attribute). I think not, but I haven't really played round > with it. For the gt2 provider all GLOBUS:: attributes will make it into an rsl argument named . Coincidentally (or not), skenny mentioned a similar thing yesterday. I'm not yet sure if it worked for her or not. > From iraicu at cs.uchicago.edu Sat Mar 29 18:30:45 2008 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Sat, 29 Mar 2008 18:30:45 -0500 Subject: [Swift-user] debug log not being produced In-Reply-To: <47EE6B2C.6030507@mcs.anl.gov> References: <47EE3489.2040607@mcs.anl.gov> <1206797277.21645.2.camel@blabla.mcs.anl.gov> <47EE5146.4010803@mcs.anl.gov> <47EE6B2C.6030507@mcs.anl.gov> Message-ID: <47EED125.8080909@cs.uchicago.edu> Hi, There is nothing Java 1.5-specific in Falkon, and I have run it many times with 1.4. If Java 1.4 runs OK on the BG/P for Swift, Falkon should also be OK. Ioan Michael Wilde wrote: > Rebuilt with the native Java 1.4, the BGP produces the Swift log just > fine. > > Ioan, where did the Java 1.5 in /home/falkon come from, and is it > needed to make Falkon run correctly?
> > - Mike > > > On 3/29/08 9:25 AM, Michael Wilde wrote: >> I rebuild a clean swift from r1773 from scratch, without deef, on >> both terminable and the bgp, then tested with a simple "swift >> first.swift" in examples/vdsk. >> >> On terminable I get the log file fine, on the bgp I dont. >> >> >> >> On 3/29/08 8:27 AM, Mihael Hategan wrote: >>> log4j.properties looks fine. >>> >>> What's in amps2-20080328-1720-6jit1g59.d and what does swift.out say? >> >> The *.d dir was empty. That swift.out said: >> >> Swift script amps2.swift starting at Fri Mar 28 17:20:12 CDT 2008 >> running on sites: bgps >> >> Swift svn swift-r1771 cog-r1953 >> >> RunID: 20080328-1720-6jit1g59 >> Progress: >> runam6 started >> error: Notification(int timeout): socket = new >> ServerSocket(recvPort); Address already in use >> error: Notification(int timeout): socket = new >> ServerSocket(recvPort); Address already in use >> 2008-03-28 17:20:27,788 WARN submitQueue.NonBlockingSubmit >> [pool-1-thread-1,notifyPreviousQueue:71] Warning: Task handler throws >> exception and also sets status >> runam6 failed >> Final status: Failed:1 >> The following errors have occurred: >> 1. Application "runam6" failed (Task failed) >> Arguments: "000000, 0.200000, 0.000391, 0.204419, 0.200000, >> 0.000391, 0.204419, 10" >> Host: bgps >> Directory: amps2-20080328-1720-6jit1g59/jobs/a/runam6-a9750gqi >> STDERR: >> STDOUT: >> >> Swift Script amps2.swift ended at Fri Mar 28 17:20:28 CDT 2008 with >> exit code 0 >> >> -- >> it was that error that I was going after, and couldn't get the log file. 
>> >> -- >> For the test I did on a fresh clean build of 1773 on the bgp, swift >> stdout/err gave: >> >> bg$ $s/bin/swift first.swift >> Swift svn swift-r1771 cog-r1953 >> >> RunID: 20080329-0913-xdviok1a >> Progress: >> echo started >> echo completed >> Final status: Finished successfully:1 >> bg$ ls >> anonymous.swift default.dtm first.xml >> helloworld_named.dtm regexp.swift >> array_index.dtm diamond.dtm fixedarray.swift >> manyparam.swift tutorial >> array_iteration.dtm file_counter.dtm foreach.swift >> parameter.swift types.swift >> array_wildcard.dtm first.kml hello.txt q16.txt >> arraymapper.dtm first.swift helloworld.dtm range.dtm >> bg$ pwd >> /home/wilde/swift/rev/1773c/examples/vdsk >> bg$ >> >> -- >> >> Im using this java: >> >> bg$ echo $JAVA_HOME >> /home/falkon/java >> bg$ which java >> /home/falkon/java/bin/java >> bg$ java -version >> java version "1.5.0" >> Java(TM) 2 Runtime Environment, Standard Edition (build >> pxp64dev-20070511(SR5)) >> IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 Linux ppc64-64 >> j9vmxp6423-20070426 (JIT enabled) >> J9VM - 20070420_12448_BHdSMr >> JIT - 20070419_1806_r8 >> GC - 200704_19) >> JCL - 20070511 >> bg$ >> >> bg$ ls -l $JAVA_HOME >> lrwxrwxrwx 1 wilde falkon 32 Mar 26 15:18 /home/falkon/java -> >> /home/falkon/ibm-java2-ppc64-50/ >> >> -- >> >> - Mike >> >> >> >>> >>> Mihael >>> >>> On Sat, 2008-03-29 at 07:22 -0500, Michael Wilde wrote: >>>> Im not getting a swift .log file on the BGP after rebuilding >>>> swift to 1771 with provider-deef. >>>> >>>> Im backtracking and debugging now, but does anyone know of recent >>>> changes in this area, or of what I might have done wrong? >>>> >>>> (I lost access to many files when my sicortex NFS client host >>>> crashed and then went into limited access mode, so I may have >>>> mis-configured things) >>>> >>>> Does anyone see a cause? 
>>>> >>>> Thanks, >>>> >>>> Mike >>>> >>>> -- Im running swift with this script: >>>> >>>> WORKFLOW=$1 >>>> shift >>>> >>>> site $* >sites.xml >>>> >>>> ( echo Swift script $WORKFLOW.swift starting at `date` >>>> echo running on sites: $* >>>> echo >>>> swift \ >>>> -sites.file ./sites.xml \ >>>> -tc.file ./tc.data \ >>>> $WORKFLOW.swift >>>> echo >>>> echo Swift Script $WORKFLOW.swift ended at `date` with exit code $? >>>> ) >swift.out 2>&1 >>>> >>>> -- etc/log4j.properties >>>> >>>> bg$ cat log4j.properties >>>> # Set root category priority to WARN and its appenders to CONSOLE >>>> and FILE. >>>> log4j.rootCategory=INFO, CONSOLE, FILE >>>> >>>> log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender >>>> log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout >>>> log4j.appender.CONSOLE.Threshold=INFO >>>> log4j.appender.CONSOLE.layout.ConversionPattern=%m%n >>>> >>>> log4j.appender.FILE=org.apache.log4j.FileAppender >>>> log4j.appender.FILE.File=swift.log >>>> log4j.appender.FILE.layout=org.apache.log4j.PatternLayout >>>> log4j.appender.FILE.layout.ConversionPattern=%d{yyyy-MM-dd >>>> HH:mm:ss,SSSZZZZZ} %-5p %c{1} %m%n >>>> >>>> log4j.logger.swift=DEBUG >>>> >>>> log4j.logger.org.globus.swift.trace=INFO >>>> >>>> log4j.logger.org.griphyn.vdl.karajan.Loader=DEBUG >>>> log4j.logger.org.globus.cog.karajan.workflow.events.WorkerSweeper=WARN >>>> log4j.logger.org.globus.cog.karajan.workflow.nodes.FlowNode=WARN >>>> log4j.logger.org.globus.cog.karajan.scheduler.WeightedHostScoreScheduler=DEBUG >>>> >>>> log4j.logger.org.griphyn.vdl.toolkit.VDLt2VDLx=DEBUG >>>> log4j.logger.org.griphyn.vdl.karajan.VDL2ExecutionContext=DEBUG >>>> log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG >>>> >>>> log4j.logger.org.griphyn.vdl.karajan.lib.GetFieldValue=DEBUG >>>> log4j.logger.org.griphyn.vdl.engine.Karajan=INFO >>>> >>>> -- ~/.swift/swift.properties >>>> >>>> sitedir.keep=true >>>> wrapperlog.always.transfer=true >>>> lazy.errors=true >>>> 
execution.retries=0 >>>> >>>> #kickstart.always.transfer=true >>>> >>>> throttle.submit=off >>>> throttle.host.submit=off >>>> throttle.transfers=20 >>>> throttle.file.operations=20 >>>> throttle.score.job.factor=1000000 >>>> >>>> -- run dir after a swift run - no .log file! >>>> >>>> bg$ cd ~/mars/run01 >>>> bg$ ls -lt >>>> total 1280 >>>> -rw-r--r-- 1 wilde users 908 Mar 28 17:20 swift.out >>>> -rw-r--r-- 1 wilde users 48 Mar 28 17:20 >>>> amps2-20080328-1720-6jit1g59.0.rlog >>>> drwxr-xr-x 2 wilde users 131072 Mar 28 17:20 >>>> amps2-20080328-1720-6jit1g59.d >>>> -rw-r--r-- 1 wilde users 7859 Mar 28 17:20 amps2.kml >>>> -rw-r--r-- 1 wilde users 4397 Mar 28 17:20 amps2.xml >>>> -rw-r--r-- 1 wilde users 422 Mar 28 17:20 sites.xml >>>> -rw-r--r-- 1 wilde users 103 Mar 28 16:02 paramlist >>>> -rw-r--r-- 1 wilde users 1432 Mar 28 14:48 tc.data >>>> -rw-r--r-- 1 wilde users 654 Mar 28 14:36 amps2.swift >>>> bg$ >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Swift-user mailing list >>>> Swift-user at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user >>>> >>> >>> >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user >> >> > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > -- =================================================== Ioan Raicu Ph.D. Candidate =================================================== Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 
58th Street, Ryerson Hall Chicago, IL 60637 =================================================== Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dev.globus.org/wiki/Incubator/Falkon http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page =================================================== =================================================== From benc at hawaga.org.uk Sat Mar 29 19:12:01 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Sun, 30 Mar 2008 00:12:01 +0000 (GMT) Subject: [Swift-user] RSL multi valued key In-Reply-To: <1206832781.31366.3.camel@blabla.mcs.anl.gov> References: <1206832781.31366.3.camel@blabla.mcs.anl.gov> Message-ID: On Sat, 29 Mar 2008, Mihael Hategan wrote: > For the gt2 provider all GLOBUS:: attributes will make it into an > rsl argument named . I know that. It gives a single string value. eg. something like this: (condorsubmit= "(Requirements "Arch == ""WINNT5""")" What I'm after here is a multiple valued RSL key, something like the way the 'environment' RSL parameter is used: (condorsubmit=(Requirements "Arch == ""WINNT5""") > Coincidentally (or not), skenny mentioned a similar thing yesterday. I'm > not yet sure if it worked for her or not. not coincidental. this is the same. I can do it from a GRAM rsl file (what I put in the original post) but I can't get that RSL generated from within Swift - I can only get a single string literal RSL value which doesn't work. 
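The distinction Ben draws is one of RSL syntax. A value can be a single double-quoted string literal, in which a literal double quote is written as two double quotes. It can also be a parenthesised sequence of values, as in the working `condorsubmit` example. The minimal bash sketch below builds both forms as plain text. `rsl_quote` and `rsl_relation` are hypothetical helpers, and the sketch does not reproduce the exact text that Swift's gt2 provider emits:

```shell
#!/bin/bash
# rsl_quote STRING (hypothetical helper): emit STRING as an RSL
# double-quoted literal, doubling every embedded double quote.
rsl_quote() {
  local q='"' escaped
  escaped=${1//$q/$q$q}
  printf '%s%s%s' "$q" "$escaped" "$q"
}

# rsl_relation NAME VALUE (hypothetical helper): emit (NAME=VALUE) with
# VALUE inserted verbatim, so VALUE may be a literal or a value sequence.
rsl_relation() {
  printf '(%s=%s)' "$1" "$2"
}

req='Arch == "WINNT5"'

# Multi-valued form (a value sequence), as in the 1.rsl that globusrun accepted:
rsl_relation condorsubmit "(Requirements $(rsl_quote "$req"))"; echo

# Single-string form: the whole sequence collapsed into one quoted literal.
rsl_relation condorsubmit "$(rsl_quote "(Requirements $(rsl_quote "$req"))")"; echo
```

The first line prints `(condorsubmit=(Requirements "Arch == ""WINNT5"""))`, which is the relation from the original post. The second line shows the general shape of a single string-literal value, which GRAM passes along as one string rather than as a sequence.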
-- From quanpt at cs.uchicago.edu Sat Mar 29 19:57:35 2008 From: quanpt at cs.uchicago.edu (Quan Tran Pham) Date: Sat, 29 Mar 2008 19:57:35 -0500 Subject: [Swift-user] debug log not being produced In-Reply-To: <47EE54A0.1080906@mcs.anl.gov> References: <47EE3489.2040607@mcs.anl.gov> <1206797277.21645.2.camel@blabla.mcs.anl.gov> <47EE5146.4010803@mcs.anl.gov> <47EE54A0.1080906@mcs.anl.gov> Message-ID: <4290b6c60803291757x28aa95a2naf2464825d6d637e@mail.gmail.com> Hi, I am in the same situation: the log is not written out for my swift script, which ran fine before. The script was fine with Swift v0.3-dev r1684 (it did produce the log). My new swift + falkon: Swift svn swift-r1773 cog-r1953 (compiled following http://dev.globus.org/wiki/Incubator/Falkon/Swift) I am running swift on Teraport tp-login2.ci JRE: Standard Edition (build 1.5.0_06-b05) Should I use some explicit flags for swift so that it will build the log? Thank you very much Quan Pham On Sat, Mar 29, 2008 at 9:39 AM, Michael Wilde wrote: > sorry, i didnt say clearly whats going on here: > > with a clean 1773 build, running first.swift (hello world), on > terminable I get the log file fine, and on the bgp i get no log file. > > I will test on the bgp with the alternative Java 1.4 release in /usr/bin > when I get a chance. > > - mike > From iraicu at cs.uchicago.edu Sat Mar 29 19:58:37 2008 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Sat, 29 Mar 2008 19:58:37 -0500 Subject: [Swift-user] debug log not being produced In-Reply-To: <4290b6c60803291757x28aa95a2naf2464825d6d637e@mail.gmail.com> References: <47EE3489.2040607@mcs.anl.gov> <1206797277.21645.2.camel@blabla.mcs.anl.gov> <47EE5146.4010803@mcs.anl.gov> <47EE54A0.1080906@mcs.anl.gov> <4290b6c60803291757x28aa95a2naf2464825d6d637e@mail.gmail.com> Message-ID: <47EEE5BD.50305@cs.uchicago.edu> Hi, The fix for Mike on the BG/P was to switch from Java 1.5 to Java 1.4. Not sure if this will also fix your problem, but it's worth a try.
Falkon will compile just fine with 1.4 (don't forget to do falkon-clean.sh first), and Swift should also be OK with 1.4. Ioan Quan Tran Pham wrote: > Hi, > > I am in the same situation, the log is not written out with my swift > script that run fine before. > The script was fine with Swift v0.3-dev r1684 (it did produce the log). > > My new swift + falkon: Swift svn swift-r1773 cog-r1953 (compiled > follow http://dev.globus.org/wiki/Incubator/Falkon/Swift) > I am running swift on Teraport tp-login2.ci > JRE: Standard Edition (build 1.5.0_06-b05) > > Should I use some explicit flags for swift so that it will build the log? > > Thank you very much > > Quan Pham > > On Sat, Mar 29, 2008 at 9:39 AM, Michael Wilde wrote: > >> sorry, i didnt say clearly whats going on here: >> >> with a clean 1773 build, running first.swift (hello world), on >> terminable I get the log file fine, and on the bgp i get no log file. >> >> I will test on the bgp with the alternative Java 1.4 release in /usr/bin >> when I get a chance. >> >> - mike >> >>
From benc at hawaga.org.uk Sat Mar 29 22:56:34 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Sun, 30 Mar 2008 03:56:34 +0000 (GMT) Subject: [Swift-user] debug log not being produced In-Reply-To: <4290b6c60803291757x28aa95a2naf2464825d6d637e@mail.gmail.com> References: <47EE3489.2040607@mcs.anl.gov> <1206797277.21645.2.camel@blabla.mcs.anl.gov> <47EE5146.4010803@mcs.anl.gov> <47EE54A0.1080906@mcs.anl.gov> <4290b6c60803291757x28aa95a2naf2464825d6d637e@mail.gmail.com> Message-ID: On Sat, 29 Mar 2008, Quan Tran Pham wrote: > My new swift + falkon: Swift svn swift-r1773 cog-r1953 (compiled > follow http://dev.globus.org/wiki/Incubator/Falkon/Swift) I don't see how those instructions build provider-deef. -- From iraicu at cs.uchicago.edu Sat Mar 29 23:03:13 2008 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Sat, 29 Mar 2008 23:03:13 -0500 Subject: [Swift-user] debug log not being produced In-Reply-To: References: <47EE3489.2040607@mcs.anl.gov> <1206797277.21645.2.camel@blabla.mcs.anl.gov> <47EE5146.4010803@mcs.anl.gov> <47EE54A0.1080906@mcs.anl.gov> <4290b6c60803291757x28aa95a2naf2464825d6d637e@mail.gmail.com> Message-ID: <47EF1101.80208@cs.uchicago.edu> One of the steps did: > falkon-checkout-swift.sh This script can be found at: https://svn.globus.org/repos/falkon/bin/falkon-build-swift.sh which contains a line > ant -Dwith-provider-deef redist Is that what you were looking for?
Ioan Ben Clifford wrote: > On Sat, 29 Mar 2008, Quan Tran Pham wrote: > > >> My new swift + falkon: Swift svn swift-r1773 cog-r1953 (compiled >> follow http://dev.globus.org/wiki/Incubator/Falkon/Swift) >> > > I don't see how those instructions build provider-deef. > > From benc at hawaga.org.uk Sat Mar 29 23:26:17 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Sun, 30 Mar 2008 04:26:17 +0000 (GMT) Subject: [Swift-user] debug log not being produced In-Reply-To: <47EF1101.80208@cs.uchicago.edu> References: <47EE3489.2040607@mcs.anl.gov> <1206797277.21645.2.camel@blabla.mcs.anl.gov> <47EE5146.4010803@mcs.anl.gov> <47EE54A0.1080906@mcs.anl.gov> <4290b6c60803291757x28aa95a2naf2464825d6d637e@mail.gmail.com> <47EF1101.80208@cs.uchicago.edu> Message-ID: On Sat, 29 Mar 2008, Ioan Raicu wrote: > which contains a line > > ant -Dwith-provider-deef redist > Is that what you were looking for? It is - my SVN checkout was outdated...
-- From benc at hawaga.org.uk Sat Mar 29 23:29:59 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Sun, 30 Mar 2008 04:29:59 +0000 (GMT) Subject: [Swift-user] debug log not being produced In-Reply-To: <4290b6c60803291757x28aa95a2naf2464825d6d637e@mail.gmail.com> References: <47EE3489.2040607@mcs.anl.gov> <1206797277.21645.2.camel@blabla.mcs.anl.gov> <47EE5146.4010803@mcs.anl.gov> <47EE54A0.1080906@mcs.anl.gov> <4290b6c60803291757x28aa95a2naf2464825d6d637e@mail.gmail.com> Message-ID: I just tried to look at your cog checkout in ~quanpt/cog but you have permissions set to not allow that so I can't look. $ ls -ld ~quanpt/cog drwx------ 12 quanpt ci-users 4096 Mar 29 22:51 /home/quanpt/cog On Sat, 29 Mar 2008, Quan Tran Pham wrote: > Hi, > > I am in the same situation, the log is not written out with my swift > script that run fine before. > The script was fine with Swift v0.3-dev r1684 (it did produce the log). > > My new swift + falkon: Swift svn swift-r1773 cog-r1953 (compiled > follow http://dev.globus.org/wiki/Incubator/Falkon/Swift) > I am running swift on Teraport tp-login2.ci > JRE: Standard Edition (build 1.5.0_06-b05) > > Should I use some explicit flags for swift so that it will build the log? > > Thank you very much > > Quan Pham > > On Sat, Mar 29, 2008 at 9:39 AM, Michael Wilde wrote: > > sorry, i didnt say clearly whats going on here: > > > > with a clean 1773 build, running first.swift (hello world), on > > terminable I get the log file fine, and on the bgp i get no log file. > > > > I will test on the bgp with the alternative Java 1.4 release in /usr/bin > > when I get a chance. 
> > > > - mike > From quanpt at cs.uchicago.edu Sun Mar 30 00:17:24 2008 From: quanpt at cs.uchicago.edu (Quan Tran Pham) Date: Sun, 30 Mar 2008 00:17:24 -0500 Subject: [Swift-user] debug log not being produced In-Reply-To: <47EEE5BD.50305@cs.uchicago.edu> References: <47EE3489.2040607@mcs.anl.gov> <1206797277.21645.2.camel@blabla.mcs.anl.gov> <47EE5146.4010803@mcs.anl.gov> <47EE54A0.1080906@mcs.anl.gov> <4290b6c60803291757x28aa95a2naf2464825d6d637e@mail.gmail.com> <47EEE5BD.50305@cs.uchicago.edu> Message-ID: <4290b6c60803292217p3b55f3a5h95e42d5574de55d@mail.gmail.com> Hi, I tried playing with different Java versions for a while, but I still cannot make Swift output the log on tp-login2 (with either Java 1.5 or Java 1.4). Should I rebuild ant as well? Everything else looks ok with Swift over Falkon. Thanks Quan On Sat, Mar 29, 2008 at 7:58 PM, Ioan Raicu wrote: > Hi, > The fix for Mike on the BG/P was to switch from Java 1.5 to Java 1.4. > Not sure if this will also fix your problem, but its worth a try. > Falkon will compile just fine with 1.4 (don't forget to do > falkon-clean.sh first), and Swift should also be OK with 1.4. > > Ioan > From hategan at mcs.anl.gov Sun Mar 30 00:32:21 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 30 Mar 2008 00:32:21 -0500 Subject: [Swift-user] RSL multi valued key In-Reply-To: References: <1206832781.31366.3.camel@blabla.mcs.anl.gov> Message-ID: <1206855141.31605.1.camel@blabla.mcs.anl.gov> On Sun, 2008-03-30 at 00:12 +0000, Ben Clifford wrote: > On Sat, 29 Mar 2008, Mihael Hategan wrote: > > > For the gt2 provider all GLOBUS:: attributes will make it into an > > rsl argument named . > > I know that. It gives a single string value. > > eg.
something like this: > (condorsubmit= "(Requirements "Arch == ""WINNT5""")" > > What I'm after here is a multiple valued RSL key, something like the way > the 'environment' RSL parameter is used: > > (condorsubmit=(Requirements "Arch == ""WINNT5""") I see. I don't think there is a way to do it nicely at this time. > > > > Coincidentally (or not), skenny mentioned a similar thing yesterday. I'm > > not yet sure if it worked for her or not. > > not coincidental. this is the same. I can do it from a GRAM rsl file (what > I put in the original post) but I can't get that RSL generated from within > Swift - I can only get a single string literal RSL value which doesn't > work. > From benc at hawaga.org.uk Sun Mar 30 05:51:53 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Sun, 30 Mar 2008 10:51:53 +0000 (GMT) Subject: [Swift-user] debug log not being produced In-Reply-To: <4290b6c60803292217p3b55f3a5h95e42d5574de55d@mail.gmail.com> References: <47EE3489.2040607@mcs.anl.gov> <1206797277.21645.2.camel@blabla.mcs.anl.gov> <47EE5146.4010803@mcs.anl.gov> <47EE54A0.1080906@mcs.anl.gov> <4290b6c60803291757x28aa95a2naf2464825d6d637e@mail.gmail.com> <47EEE5BD.50305@cs.uchicago.edu> <4290b6c60803292217p3b55f3a5h95e42d5574de55d@mail.gmail.com> Message-ID: what does 'which swift' output for you on that machine? On Sun, 30 Mar 2008, Quan Tran Pham wrote: > Hi, > > I tried to playing with different Java version for a while, but still > cannot make Swift output the log on tp-login2 (Java 1.5 and Java 1.4). > Should I rebuild ant as well? Everything else looks ok with Swift over > Falkon. > > Thanks > > Quan > > On Sat, Mar 29, 2008 at 7:58 PM, Ioan Raicu wrote: > > Hi, > > The fix for Mike on the BG/P was to switch from Java 1.5 to Java 1.4. > > Not sure if this will also fix your problem, but its worth a try. > > Falkon will compile just fine with 1.4 (don't forget to do > > falkon-clean.sh first), and Swift should also be OK with 1.4. 
> > > > Ioan > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > From wilde at mcs.anl.gov Sun Mar 30 07:48:24 2008 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sun, 30 Mar 2008 07:48:24 -0500 Subject: [Swift-user] debug log not being produced In-Reply-To: References: <47EE3489.2040607@mcs.anl.gov> <1206797277.21645.2.camel@blabla.mcs.anl.gov> <47EE5146.4010803@mcs.anl.gov> <47EE54A0.1080906@mcs.anl.gov> <4290b6c60803291757x28aa95a2naf2464825d6d637e@mail.gmail.com> <47EEE5BD.50305@cs.uchicago.edu> <4290b6c60803292217p3b55f3a5h95e42d5574de55d@mail.gmail.com> Message-ID: <47EF8C18.7060301@mcs.anl.gov> Its possible that using Java 1.4 is not the solution after all. The tests I ran using Java 1.4 were done using "ant dist" rather than the provider-deef. Now I've rebuilt swift and falkon, with Java 1.4, and I once again have the problem of no swift log being produced. Its possible that the problem is in fact that building swift with provider-deef is causing the no-log problem. I'll explore further. - Mike On 3/30/08 5:51 AM, Ben Clifford wrote: > what does 'which swift' output for you on that machine? > > On Sun, 30 Mar 2008, Quan Tran Pham wrote: > >> Hi, >> >> I tried to playing with different Java version for a while, but still >> cannot make Swift output the log on tp-login2 (Java 1.5 and Java 1.4). >> Should I rebuild ant as well? Everything else looks ok with Swift over >> Falkon. >> >> Thanks >> >> Quan >> >> On Sat, Mar 29, 2008 at 7:58 PM, Ioan Raicu wrote: >>> Hi, >>> The fix for Mike on the BG/P was to switch from Java 1.5 to Java 1.4. >>> Not sure if this will also fix your problem, but its worth a try. >>> Falkon will compile just fine with 1.4 (don't forget to do >>> falkon-clean.sh first), and Swift should also be OK with 1.4. 
>>> >>> Ioan >>> >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user >> >> > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > From wilde at mcs.anl.gov Sun Mar 30 07:50:40 2008 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sun, 30 Mar 2008 07:50:40 -0500 Subject: [Swift-user] debug log not being produced In-Reply-To: <47EF8C18.7060301@mcs.anl.gov> References: <47EE3489.2040607@mcs.anl.gov> <1206797277.21645.2.camel@blabla.mcs.anl.gov> <47EE5146.4010803@mcs.anl.gov> <47EE54A0.1080906@mcs.anl.gov> <4290b6c60803291757x28aa95a2naf2464825d6d637e@mail.gmail.com> <47EEE5BD.50305@cs.uchicago.edu> <4290b6c60803292217p3b55f3a5h95e42d5574de55d@mail.gmail.com> <47EF8C18.7060301@mcs.anl.gov> Message-ID: <47EF8CA0.8040007@mcs.anl.gov> Please ignore the last message (below). I discovered another mistake of mine and need to correct things before I can say whats happening. - Mike On 3/30/08 7:48 AM, Michael Wilde wrote: > Its possible that using Java 1.4 is not the solution after all. > The tests I ran using Java 1.4 were done using "ant dist" rather than > the provider-deef. > > Now I've rebuilt swift and falkon, with Java 1.4, and I once again have > the problem of no swift log being produced. > > Its possible that the problem is in fact that building swift with > provider-deef is causing the no-log problem. > > I'll explore further. > > - Mike > > On 3/30/08 5:51 AM, Ben Clifford wrote: >> what does 'which swift' output for you on that machine? >> >> On Sun, 30 Mar 2008, Quan Tran Pham wrote: >> >>> Hi, >>> >>> I tried to playing with different Java version for a while, but still >>> cannot make Swift output the log on tp-login2 (Java 1.5 and Java 1.4). >>> Should I rebuild ant as well? Everything else looks ok with Swift over >>> Falkon. 
>>> >>> Thanks >>> >>> Quan >>> >>> On Sat, Mar 29, 2008 at 7:58 PM, Ioan Raicu >>> wrote: >>>> Hi, >>>> The fix for Mike on the BG/P was to switch from Java 1.5 to Java 1.4. >>>> Not sure if this will also fix your problem, but its worth a try. >>>> Falkon will compile just fine with 1.4 (don't forget to do >>>> falkon-clean.sh first), and Swift should also be OK with 1.4. >>>> >>>> Ioan >>>> >>> _______________________________________________ >>> Swift-user mailing list >>> Swift-user at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user >>> >>> >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user >> >> > From quanpt at cs.uchicago.edu Sun Mar 30 10:47:07 2008 From: quanpt at cs.uchicago.edu (Quan Tran Pham) Date: Sun, 30 Mar 2008 10:47:07 -0500 Subject: [Swift-user] debug log not being produced In-Reply-To: References: <47EE3489.2040607@mcs.anl.gov> <1206797277.21645.2.camel@blabla.mcs.anl.gov> <47EE5146.4010803@mcs.anl.gov> <47EE54A0.1080906@mcs.anl.gov> <4290b6c60803291757x28aa95a2naf2464825d6d637e@mail.gmail.com> <47EEE5BD.50305@cs.uchicago.edu> <4290b6c60803292217p3b55f3a5h95e42d5574de55d@mail.gmail.com> Message-ID: <4290b6c60803300847va5c28c4qadf3774a036188e5@mail.gmail.com> Hi Ben, $ which swift ~/bin/vdsk/bin/swift $ ls -ld ~/bin/vdsk lrwxrwxrwx 1 quanpt ci-users 44 Mar 29 11:18 /home/quanpt/bin/vdsk -> /home/quanpt/cog/modules/vdsk/dist/vdsk-svn/ Also, I see the version of Swift and Cog when trying to run swift, hence I think I use the correct version (the incorrect version on my account should not work with Falkon) Thanks Quan On Sun, Mar 30, 2008 at 5:51 AM, Ben Clifford wrote: > > what does 'which swift' output for you on that machine? 
> From iraicu at cs.uchicago.edu Sun Mar 30 19:02:08 2008 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Sun, 30 Mar 2008 19:02:08 -0500 Subject: [Swift-user] 1) disable retry mechanism and 2) continue on failure? Message-ID: <47F02A00.6090203@cs.uchicago.edu> Hi all, We have a workflow (a simple for loop) that is failing due to > runam6 failed > Execution failed: > Exception in runam6: > Arguments: [000000, 0.200000, 0.000391, 0.204419, 0.200000, 0.000391, > 0.204419, 10] > Host: bgp > Directory: amps2-20080330-1849-hnpls37c/jobs/y/runam6-yvkudjqi > stderr.txt: mkdir: cannot create directory `am.000000': File exists > mkdir: cannot create directory `source': File exists > mkdir: cannot create directory `scendat': File exists > mkdir: cannot create directory `scendat/bgtestcases': File exists > mkdir: cannot create directory `out_dir': File exists > mkdir: cannot create directory `rftxtdat': File exists Here is the command we invoke as seen by Falkon: > execute: /bin/bash shared/wrapper.sh runam6-xvkudjqi -jobdir x -e > /tmp/runam6 -out stdout.txt -err stderr.txt -i -d -if -of > amdi.000000 -k -a 000000 0.200000 0.000391 0.204419 0.200000 0.000391 > 0.204419 10 > 1206921034.45020: executing taskID urn:0-1-1-1206920989652 /bin/bash > shared/wrapper.sh runam6-xvkudjqi -jobdir x -e /tmp/runam6 -out > stdout.txt -err stderr.txt -i -d -if -of amdi.000000 -k -a 000000 > 0.200000 0.000391 0.204419 0.200000 0.000391 0.204419 10 ... completed > with exit code 0 in 40585.011719 ms! > sendResults: urn:0-1-1-1206920989652#0 Things look fine when we look at the output manually, but somehow Swift doesn't think so. BTW, the application execution time is about 40 seconds, and we are running things on a single CPU at the moment, so I don't think its a concurrency issue. The same problem appears, even if we run both Swift, Falkon, and the Falkon worker on the same node. 
Either way, we need to get some results tonight for a talk tomorrow, and we don't have the time to fix the real problem, whatever it may be. So, here are the 2 questions I have. 1) How do we disable the retry mechanism, to make sure that Swift won't retry failed jobs? 2) How do we configure Swift to continue sending all tasks it is able to (in our case, it should be all tasks, as we only have 1 for loop, with no data dependencies between iterations), although all tasks will eventually fail? The motivation for these questions is so we can do a large run via Swift and let the Falkon exit codes guide us to whether tasks failed or were successful. Our hopes are that things are actually executing fine at the application (we'll do some sanity checks to make sure), and that somehow Swift is reporting errors due to some reason we don't understand. If the application indeed runs successfully, we could produce some graphs of the run from the Falkon logs, and get those results for the talk tomorrow. We'll then have to figure out exactly why the error is happening, and how to fix that, but that seems out of the scope of our work for the next 12 hours. One last thing. The Swift and Falkon installs we have from SVN (updated today) passed sanity checks... we could run sleep jobs just fine. But this app unfortunately doesn't. Thanks, Ioan -- =================================================== Ioan Raicu Ph.D. Candidate =================================================== Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 
58th Street, Ryerson Hall Chicago, IL 60637 =================================================== Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dev.globus.org/wiki/Incubator/Falkon http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page =================================================== =================================================== From quanpt at cs.uchicago.edu Sun Mar 30 19:23:34 2008 From: quanpt at cs.uchicago.edu (Quan Tran Pham) Date: Sun, 30 Mar 2008 19:23:34 -0500 Subject: [Swift-user] 1) disable retry mechanism and 2) continue on failure? In-Reply-To: <47F02A00.6090203@cs.uchicago.edu> References: <47F02A00.6090203@cs.uchicago.edu> Message-ID: <4290b6c60803301723h414d3a73sce77da85304a8c31@mail.gmail.com> Hi, > 1) How do we disable the retry mechanism, to make sure that Swift won't > retry failed jobs? I think you want to modify "execution.retries" in swift.properties. However, please remember that once a job fails (cannot retry anymore), the whole workflow stops. I think you need to change the Swift code to get "continue on failure". Or you might want to "cheat" Swift by adding some trigger jobs in front of your workflow? Then Swift will send many jobs at once after those triggers I believe. Regards Quan From benc at hawaga.org.uk Sun Mar 30 20:35:59 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 31 Mar 2008 01:35:59 +0000 (GMT) Subject: [Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure? In-Reply-To: <47F02A00.6090203@cs.uchicago.edu> References: <47F02A00.6090203@cs.uchicago.edu> Message-ID: On Sun, 30 Mar 2008, Ioan Raicu wrote: > > runam6 failed > > Directory: amps2-20080330-1849-hnpls37c/jobs/y/runam6-yvkudjqi > > stderr.txt: mkdir: cannot create directory `am.000000': File exists I think when I've seen that error before, it's not been swift-level retries that have been happening - when Swift retries a job, it gets a different identifier ('yvkudjqi' in the above).
If a job gets partly executed and then retried by the underlying execution mechanism below swift (e.g. any part of cog downwards) then the above will happen. Does falkon ever try to retry a job that it's been given if it thinks something went wrong? If so, that might cause a problem here - what needs to happen is that the failure gets reported all the way back to swift for swift to do a retry. Another cause might be duplicate job IDs generated within swift (the 'yvkudjqi' string again) but that would be very unusual (as in, I've never seen that happen). > 1) How do we disable the retry mechanism, to make sure that Swift won't retry > failed jobs? What Quan said - set execution.retries=0 in swift.properties > 2) How do we configure Swift to continue sending all tasks it is able to (in > our case, it should be all tasks, as we only have 1 for loop, with no data > dependencies between iterations), although all tasks will eventually fail? throttle.score.job.factor=off I think will do what you want. -- From benc at hawaga.org.uk Sun Mar 30 20:46:28 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 31 Mar 2008 01:46:28 +0000 (GMT) Subject: [Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure? In-Reply-To: <47F02A00.6090203@cs.uchicago.edu> References: <47F02A00.6090203@cs.uchicago.edu> Message-ID: On Sun, 30 Mar 2008, Ioan Raicu wrote: hmm. this: > > Directory: amps2-20080330-1849-hnpls37c/jobs/y/runam6-yvkudjqi doesn't match up with this: > Here is the command we invoke as seen by Falkon: > > execute: /bin/bash shared/wrapper.sh runam6-xvkudjqi -jobdir x -e as in, swift is expecting to run job id yvkudjqi and you are claiming that falkon is running the wrapper script associated with job id xvkudjqi. That seems wrong - did you cut and paste the wrong log line or is there some mismatch going on in the code? Can you check?
Swift should make at most one submission for any job id (that looks like yvkudjqi), so if you see Falkon trying to run code for any one job ID more than once, something is awry. Also, I remembered another time I've seen errors like this: that mkdir/already-exists problem was to do with broken file mappings, but it looks like you are inputting only numerical values and outputting a single file, so I think that won't be a problem here. > > /tmp/runam6 -out stdout.txt -err stderr.txt -i -d -if -of amdi.000000 -k > > -a 000000 0.200000 0.000391 0.204419 0.200000 0.000391 0.204419 10 > > 1206921034.45020: executing taskID urn:0-1-1-1206920989652 /bin/bash > > shared/wrapper.sh runam6-xvkudjqi -jobdir x -e /tmp/runam6 -out stdout.txt > > -err stderr.txt -i -d -if -of amdi.000000 -k -a 000000 0.200000 0.000391 > > 0.204419 0.200000 0.000391 0.204419 10 ... completed with exit code 0 in > > 40585.011719 ms! > > sendResults: urn:0-1-1-1206920989652#0 > Things look fine when we look at the output manually, but somehow Swift > doesn't think so. BTW, the application execution time is about 40 seconds, > and we are running things on a single CPU at the moment, so I don't think its > a concurrency issue. The same problem appears, even if we run both Swift, > Falkon, and the Falkon worker on the same node. Either way, we need to get > some results tonight for a talk tomorrow, and we don't have the time to fix > the real problem, whatever it may be. > So, here are the 2 questions I have. > 1) How do we disable the retry mechanism, to make sure that Swift won't retry > failed jobs? > 2) How do we configure Swift to continue sending all tasks it is able to (in > our case, it should be all tasks, as we only have 1 for loop, with no data > dependencies between iterations), although all tasks will eventually fail? > > The motivation for these questions is so we can do a large run via Swift and > let the Falkon exit codes guide us to whether tasks failed or were successful.
> Our hopes are that things are actually executing fine at the application > (we'll do some sanity checks to make sure), and that somehow Swift is > reporting errors due to some reason we don't understand. If the application > indeed runs successfully, we could produce some graphs of the run from the > Falkon logs, and get those results for the talk tomorrow. We'll then have to > figure out exactly why the error is happening, and how to fix that, but that > seems out of the scope of our work for the next 12 hours. > > One last thing. The Swift and Falkon installs we have from SVN (updated > today) passed sanity checks... we could run sleep jobs just fine. But this > app unfortunately doesn't. > Thanks, > Ioan > > From zhaozhang at uchicago.edu Sun Mar 30 21:06:24 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Sun, 30 Mar 2008 21:06:24 -0500 Subject: [Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure? In-Reply-To: References: <47F02A00.6090203@cs.uchicago.edu> Message-ID: <47F04720.1010207@uchicago.edu> Thanks, Ben Ben Clifford wrote: > On Sun, 30 Mar 2008, Ioan Raicu wrote: > > >>> runam6 failed >>> > > >>> Directory: amps2-20080330-1849-hnpls37c/jobs/y/runam6-yvkudjqi >>> stderr.txt: mkdir: cannot create directory `am.000000': File exists >>> > > I think when I've seen that error before, it's not been swift-level retries > that have been happening - when Swift retries a job, it gets a different > identifier ('yvkudjqi' in the above). If a job gets partly executed and > then retried by the underlying execution mechanism below swift (e.g. any > part of cog downwards) then the above will happen. > > Does falkon ever try to retry a job that it's been given if it thinks > something went wrong? If so, that might cause a problem here - what needs > to happen is that the failure gets reported all the way back to swift for > swift to do a retry. > Nope, falkon doesn't do any retries for now.
> Another cause might be duplicate job IDs generated within swift (the > 'yvkudjqi' string again) but that would be very unusual (as in, I've never > seen that happen) > > >> 1) How do we disable the retry mechanism, to make sure that Swift won't retry >> failed jobs? >> > > What Quan said - set execution.retries=0 in swift.properties > > >> 2) How do we configure Swift to continue sending all tasks it is able to (in >> our case, it should be all tasks, as we only have 1 for loop, with no data >> dependencies between iterations), although all tasks will eventually fail? >> > > throttle.score.job.factor=off > > I think will do what you want. > ok, I will try this. best wishes zhangzhao -------------- next part -------------- An HTML attachment was scrubbed... URL: From benc at hawaga.org.uk Sun Mar 30 21:06:48 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 31 Mar 2008 02:06:48 +0000 (GMT) Subject: [Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure? In-Reply-To: <47F02A00.6090203@cs.uchicago.edu> References: <47F02A00.6090203@cs.uchicago.edu> Message-ID: On Sun, 30 Mar 2008, Ioan Raicu wrote: > 2) How do we configure Swift to continue sending all tasks it is able to (in > our case, it should be all tasks, as we only have 1 for loop, with no data > dependencies between iterations), although all tasks will eventually fail? I misunderstood this as a rate-limiting question. I think you want lazy.errors=true -- From zhaozhang at uchicago.edu Sun Mar 30 21:36:40 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Sun, 30 Mar 2008 21:36:40 -0500 Subject: [Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure? In-Reply-To: References: <47F02A00.6090203@cs.uchicago.edu> Message-ID: <47F04E38.60207@uchicago.edu> Hi, Ben Further on the run with error, here is the log. I am also attaching the swift script written by Mike. Thanks so much. 
best wishes zhangzhao Progress 2008-03-30 21:30:33.116711000-0500 LOG_START _____________________________________________________________________________ Wrapper _____________________________________________________________________________ DIR=jobs/3/runam6-32gakjqi EXEC=/tmp/runam6 STDIN= STDOUT=stdout.txt STDERR=stderr.txt DIRS= INF= OUTF=amdi.000000 KICKSTART= ARGS=000000 0.200000 0.000391 0.204419 0.200000 0.000391 0.204419 10 Progress 2008-03-30 21:30:33.134726000-0500 CREATE_JOBDIR Created job directory: jobs/3/runam6-32gakjqi Progress 2008-03-30 21:30:33.151468000-0500 CREATE_INPUTDIR Progress 2008-03-30 21:30:33.159241000-0500 LINK_INPUTS Progress 2008-03-30 21:30:33.167019000-0500 EXECUTE Progress 2008-03-30 21:31:10.505777000-0500 EXECUTE_DONE Job ran successfully The following output files were not created by the application: amdi.000000 Ben Clifford wrote: > On Sun, 30 Mar 2008, Ioan Raicu wrote: > > >> 2) How do we configure Swift to continue sending all tasks it is able to (in >> our case, it should be all tasks, as we only have 1 for loop, with no data >> dependencies between iterations), although all tasks will eventually fail? >> > > I misunderstood this as a rate-limiting question. I think you want > lazy.errors=true > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: amps2.swift URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: paramlist URL: From benc at hawaga.org.uk Sun Mar 30 21:48:55 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 31 Mar 2008 02:48:55 +0000 (GMT) Subject: [Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure? In-Reply-To: <47F04E38.60207@uchicago.edu> References: <47F02A00.6090203@cs.uchicago.edu> <47F04E38.60207@uchicago.edu> Message-ID: > > Further on the run with error, here is the log. 
I am also attaching the swift > script written by Mike. Thanks so much. > > DIR=jobs/3/runam6-32gakjqi This job identifier doesn't match up with either of the two job identifiers in the earlier message. Useful logs would be: the swift .log files for a run, and the corresponding falkon logs for that run. Also, a list of all the command lines that falkon tried to execute - not sure if those are in the above falkon logs or not, but that would be useful. -- From benc at hawaga.org.uk Sun Mar 30 21:58:41 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 31 Mar 2008 02:58:41 +0000 (GMT) Subject: [Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure? In-Reply-To: <47F04E38.60207@uchicago.edu> References: <47F02A00.6090203@cs.uchicago.edu> <47F04E38.60207@uchicago.edu> Message-ID: also, what is in: /tmp/runam6 -- From benc at hawaga.org.uk Sun Mar 30 22:01:40 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 31 Mar 2008 03:01:40 +0000 (GMT) Subject: [Swift-user] RSL multi valued key In-Reply-To: <1206855141.31605.1.camel@blabla.mcs.anl.gov> References: <1206832781.31366.3.camel@blabla.mcs.anl.gov> <1206855141.31605.1.camel@blabla.mcs.anl.gov> Message-ID: I made this hack which allows me to specify condor requirements through GRAM2 as a swift profile key. It's not particularly nice but it should allow progress to be made on skenny's project: http://www.ci.uchicago.edu/~benc/tmp/condor-req-2 apply it like this: cd cog/modules/provider-gt2/src/org/globus/cog/abstraction/impl/execution patch -p0 < condor-req-2 and rebuild with 'ant redist' in the swift directory.
Then use a profile key like this to specify requirements: Arch = "WINNT5" perhaps something like the following will match the two architectures in use on fletch: Arch == "X86_64" || Arch == "INTEL" -- From benc at hawaga.org.uk Sun Mar 30 23:41:25 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 31 Mar 2008 04:41:25 +0000 (GMT) Subject: [Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure? In-Reply-To: <47F04E38.60207@uchicago.edu> References: <47F02A00.6090203@cs.uchicago.edu> <47F04E38.60207@uchicago.edu> Message-ID: Those mkdir errors come from here in the runam6 application script. > > > stderr.txt: mkdir: cannot create directory `am.000000': File exists The application script is trying to make its own working directory on local filesystem, based on an identifier passed in as the first commandline parameter. (The identifier in the above is 000000). The identifier is read from column one of the input data file: id xmin xinc xmax ymin yinc ymax delay 000000 0.200000 0.000391 0.204419 0.200000 0.000391 0.204419 10 000000 0.200000 0.000391 0.204419 0.200000 0.000391 0.204419 10 in which you should see two jobs with the same identifier. This is probably an invalid parameter file because of that. Try fixing that and see what happens. This temporary directory handling is pretty ugly - it should be a couple lines change to wrapper.sh to get similar functionality using the existing swift temporary directory handling - change the path to /tmp and use cp instead of ln -s. That way you can take advantage of Swift's existing unique job IDs and error handling too. -- From zhaozhang at uchicago.edu Mon Mar 31 00:24:26 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Mon, 31 Mar 2008 00:24:26 -0500 Subject: [Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure?
In-Reply-To: References: <47F02A00.6090203@cs.uchicago.edu> <47F04E38.60207@uchicago.edu> Message-ID: <47F0758A.6030809@uchicago.edu> Thanks Ben After I changed the definition of the runam() function in the swift script, it seems everything is working now. OLD: (amout ofile ) runam (string id, string xmin, string xinc, string xmax, string ymin, string yinc, string ymax, string delay) NEW: runam (string id, string xmin, string xinc, string xmax, string ymin, string yinc, string ymax, string delay) since it is no longer asking for an output file. I could see that all tasks are successful. Another issue: Do you know where to set the path that swift is trying to copy wrapper.sh from? Thanks so much. zhao Ben Clifford wrote: > Those mkdir errors come from here in the runam6 application script. > > >>>> stderr.txt: mkdir: cannot create directory `am.000000': File exists >>>> > > The application script is trying to make its own working directory on > local filesystem, based on an identifier passed in as the first > commandline parameter. (The identifier in the above is 000000). > > The identifier is read from column one of the input data file: > > id xmin xinc xmax ymin yinc ymax delay > 000000 0.200000 0.000391 0.204419 0.200000 0.000391 0.204419 10 > 000000 0.200000 0.000391 0.204419 0.200000 0.000391 0.204419 10 > > in which you should see two jobs with the same identifier. This is > probably an invalid parameter file because of that. > > Try fixing that and see what happens. > > This temporary directory handling is pretty ugly - it should be a couple > lines change to wrapper.sh to get similar functionality using the existing > swift temporary directory handling - change the path to /tmp and use cp > instead of ln -s. That way you can take advantage of Swift's existing > unique job IDs and error handling too. > > -------------- next part -------------- An HTML attachment was scrubbed...
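Ben's diagnosis is that the parameter file repeats the id 000000, so two jobs try to create the same am.000000 directory. A minimal shell sketch for catching that before a run follows. It assumes the file is whitespace-separated with a header row, as in the excerpt Ben quotes, and is named paramlist (the name of the scrubbed attachment). The helper name duplicate_ids is hypothetical and not part of Swift.

```shell
#!/bin/sh
# duplicate_ids FILE: print each id that appears more than once in column
# one of FILE, skipping the header row and blank lines (hypothetical helper).
duplicate_ids() {
  awk 'NR > 1 && NF > 0 { count[$1]++ }
       END { for (id in count) if (count[id] > 1) print id }' "$1" | sort
}

# Check the parameter file before submitting the workflow.
if [ -f paramlist ]; then
  dups=$(duplicate_ids paramlist)
  if [ -n "$dups" ]; then
    echo "repeated ids found in paramlist:" >&2
    echo "$dups" >&2
    exit 1
  fi
fi
```

Run against the two-row excerpt Ben quotes, duplicate_ids would print 000000.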
URL: From benc at hawaga.org.uk Mon Mar 31 00:42:51 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 31 Mar 2008 05:42:51 +0000 (GMT) Subject: [Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure? In-Reply-To: <47F0758A.6030809@uchicago.edu> References: <47F02A00.6090203@cs.uchicago.edu> <47F04E38.60207@uchicago.edu> <47F0758A.6030809@uchicago.edu> Message-ID: > After I changed the definition of runam() function in the swift script, it > seem everything is working now. I'm not sure that the below actually is working correctly from a swift perspective - it doesn't match up with my understanding of what the problem is/was. > Another Issue: Do you know where to set the path that swift is trying to copy > wrapper.sh from ? It copies it from libexec/ in your swift install directory. What are you trying to do? -- From zhaozhang at uchicago.edu Mon Mar 31 00:46:39 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Mon, 31 Mar 2008 00:46:39 -0500 Subject: [Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure? In-Reply-To: References: <47F02A00.6090203@cs.uchicago.edu> <47F04E38.60207@uchicago.edu> <47F0758A.6030809@uchicago.edu> Message-ID: <47F07ABF.20701@uchicago.edu> Hi, Ben Ben Clifford wrote: >> After I changed the definition of runam() function in the swift script, it >> seem everything is working now. >> > > I'm not sure that the below actually is working correctly from a swift > perspective - it doesn't match up with my understanding of what the > problem is/was. > > It is ok for now. >> Another Issue: Do you know where to set the path that swift is trying to copy >> wrapper.sh from ? >> > > It copies it from libexec/ in your swift install directory. What are you > trying to do? > > Because we are running swift and falkon on BGP, and the read/write from GPFS is not that fast, so we need to do a data caching. Precopy the wrapper.sh to Compute Nodes's shared memory. 
So I was asking where does swift copy wrapper.sh from. thanks zhao -------------- next part -------------- An HTML attachment was scrubbed... URL: From iraicu at cs.uchicago.edu Mon Mar 31 00:51:11 2008 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 31 Mar 2008 00:51:11 -0500 Subject: [Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure? In-Reply-To: <47F07ABF.20701@uchicago.edu> References: <47F02A00.6090203@cs.uchicago.edu> <47F04E38.60207@uchicago.edu> <47F0758A.6030809@uchicago.edu> <47F07ABF.20701@uchicago.edu> Message-ID: <47F07BCF.8070003@cs.uchicago.edu> I think what Zhao is trying to say is that it's working well enough for tonight, but in the longer term (i.e. next week), we'll have to get to the bottom of why Swift can't find the output file. It might be something to do with the fact that we are using local ram disk for lots of stuff, and our app scripts aren't putting the app output back to the right place on GPFS. We'll keep digging in the next few days, and pose more questions if we can't find a solution. For now, we are OK, thanks for the help! It looks like we should be able to get a run in with Swift on the BG/P tonight after all. Ioan Zhao Zhang wrote: > Hi, Ben > > Ben Clifford wrote: >>> After I changed the definition of runam() function in the swift script, it >>> seem everything is working now. >>> >> >> I'm not sure that the below actually is working correctly from a swift >> perspective - it doesn't match up with my understanding of what the >> problem is/was. >> >> > It is ok for now. >>> Another Issue: Do you know where to set the path that swift is trying to copy >>> wrapper.sh from ? >>> >> >> It copies it from libexec/ in your swift install directory. What are you >> trying to do? >> >> > Because we are running swift and falkon on BGP, and the read/write > from GPFS is not that fast, so we need to do a data caching. Precopy > the wrapper.sh to Compute Nodes's shared memory.
So I was asking where > does swift copy wrapper.sh from. > > thanks > > zhao -- =================================================== Ioan Raicu Ph.D. Candidate =================================================== Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 =================================================== Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dev.globus.org/wiki/Incubator/Falkon http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page =================================================== =================================================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From benc at hawaga.org.uk Mon Mar 31 02:34:23 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 31 Mar 2008 07:34:23 +0000 (GMT) Subject: [Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure? In-Reply-To: References: <47F02A00.6090203@cs.uchicago.edu> <47F04E38.60207@uchicago.edu> Message-ID: On Mon, 31 Mar 2008, Ben Clifford wrote: > This temporary directory handling is pretty ugly - it should be a couple > lines change to wrapper.sh to get similar functionality using the existing > swift temporary directory handling - change the path to /tmp and use cp > instead of ln -s. That way you can take advantage of Swift's existing > unique job IDs and error handling too. Attached are three patches that will apply against svn r1775: The first puts temporary directories in /tmp rather than on shared fs. http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-dirs-on-tmp The second copies the application file to the worker in each job execution (though doesn't do any worker-node caching of such between jobs) http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-dirs-mv-executable The third creates the worker node log on /tmp and copies it at the end.
http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-log-locally All three modify wrapper.sh and should be applied in the above order. With the first two patches, the timestamps in the usual info logs will provide information about how long the copies take, in the same way that they usually indicate times for other execution stages. -- From zhaozhang at uchicago.edu Mon Mar 31 02:37:12 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Mon, 31 Mar 2008 02:37:12 -0500 Subject: [Swift-user] Re: [falkon-user] 1) disable retry mechanism and 2) continue on failure? In-Reply-To: References: <47F02A00.6090203@cs.uchicago.edu> <47F04E38.60207@uchicago.edu> Message-ID: <47F094A8.5020102@uchicago.edu> Thanks, Ben zhao Ben Clifford wrote: > On Mon, 31 Mar 2008, Ben Clifford wrote: > > >> This temporary directory handling is pretty ugly - it should be a couple >> lines change to wrapper.sh to get similar functionality using the existing >> swift temporary directory handling - change the path to /tmp and use cp >> instead of ln -s. That way you can take advantage of Swift's existing >> unique job IDs and error handling too. >> > > Attached are three patches that will apply against svn r1775: > > The first puts temporary directories in /tmp rather than on shared fs. > http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-dirs-on-tmp > > The second copies the application file to the worker in each job execution > (though doesn't do any worker-node caching of such between jobs) > http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-dirs-mv-executable > > The third creates the worker node log on /tmp and copies it at the end. > http://www.ci.uchicago.edu/~benc/tmp/wrapper-tmp-log-locally > > All three modify wrapper.sh and should be applied in the above order. > > With the first two patches, the timestamps in the usual info logs will > provide information about how long the copies take, in the same way that > they usually indicate times for other execution stages.
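Ben's three patches are only linked here, so their contents are not shown. The idea they describe can be sketched as a standalone shell function: a per-job directory on node-local /tmp, the executable copied in with cp rather than linked with ln -s, and the log written locally and copied back at the end. This is a hypothetical illustration, not the patched wrapper.sh. The function name run_job_locally, its arguments, and the log lines are all assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of the idea behind Ben's wrapper.sh patches; this is
# NOT the patched wrapper.sh. Names, arguments and log lines are assumptions.

# run_job_locally JOBID EXECUTABLE SHARED_LOG_DIR [ARGS...]
#   Runs EXECUTABLE with ARGS inside a fresh directory on local /tmp,
#   copies a small progress log to SHARED_LOG_DIR, then cleans up.
#   Returns the application's exit status.
run_job_locally() {
  jobid=$1; exe=$2; logdir=$3
  shift 3
  # Private working directory per job on local disk, not the shared fs.
  workdir=$(mktemp -d "${TMPDIR:-/tmp}/$jobid.XXXXXX") || return 1
  log="$workdir/$jobid.log"
  # Copy the executable to local disk instead of symlinking to shared fs.
  cp "$exe" "$workdir/" || { rm -rf "$workdir"; return 1; }
  prog="$workdir/$(basename "$exe")"
  chmod +x "$prog"
  echo "$(date) EXECUTE $prog" >> "$log"
  (cd "$workdir" && "$prog" "$@")
  status=$?
  echo "$(date) EXECUTE_DONE status=$status" >> "$log"
  # Write the log to the shared location once, at the end of the job.
  cp "$log" "$logdir/"
  rm -rf "$workdir"
  return "$status"
}
```

Because each call gets its own mktemp directory, an application that creates fixed names such as am.000000 relative to its working directory would not collide with another job running the same id.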
> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Mon Mar 31 03:52:40 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 31 Mar 2008 03:52:40 -0500 Subject: [Swift-user] 1) disable retry mechanism and 2) continue on failure? In-Reply-To: <4290b6c60803301723h414d3a73sce77da85304a8c31@mail.gmail.com> References: <47F02A00.6090203@cs.uchicago.edu> <4290b6c60803301723h414d3a73sce77da85304a8c31@mail.gmail.com> Message-ID: <1206953560.10515.0.camel@blabla.mcs.anl.gov> On Sun, 2008-03-30 at 19:23 -0500, Quan Tran Pham wrote: > Hi, > > > 1) How do we disable the retry mechanism, to make sure that Swift won't > > retry failed jobs? > I think you want to modify "execution.retries" in swift.properties. > However, please remember that once a job fails (cannot retry anymore), > the whole workflow stops. Not necessarily. You can enable lazy errors. > > I think you need to change the Swift code to get "continue on > failure". Or you might want to "cheat" Swift by adding some trigger > jobs in front of your workflow? Then Swift will send many jobs at once > after those triggers I believe.
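For reference, the two settings that answer Ioan's questions, as given by Quan, Ben and Mihael in this thread, written as swift.properties lines. The comments paraphrase how the Swift user guide describes these properties; check them against the guide for the Swift version in use.

```properties
# swift.properties: settings named in this thread

# Do not resubmit a job after it fails.
execution.retries=0

# Keep running jobs that do not depend on a failed one, instead of
# stopping the whole run at the first error.
lazy.errors=true
```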
> > Regards > > Quan > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > From benc at hawaga.org.uk Mon Mar 31 04:08:00 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 31 Mar 2008 09:08:00 +0000 (GMT) Subject: [Swift-user] debug log not being produced In-Reply-To: <4290b6c60803300847va5c28c4qadf3774a036188e5@mail.gmail.com> References: <47EE3489.2040607@mcs.anl.gov> <1206797277.21645.2.camel@blabla.mcs.anl.gov> <47EE5146.4010803@mcs.anl.gov> <47EE54A0.1080906@mcs.anl.gov> <4290b6c60803291757x28aa95a2naf2464825d6d637e@mail.gmail.com> <47EEE5BD.50305@cs.uchicago.edu> <4290b6c60803292217p3b55f3a5h95e42d5574de55d@mail.gmail.com> <4290b6c60803300847va5c28c4qadf3774a036188e5@mail.gmail.com> Message-ID: On Sun, 30 Mar 2008, Quan Tran Pham wrote: > $ which swift > ~/bin/vdsk/bin/swift I just tried this and it makes log files for me. Please can you do this: mkdir ~/benc1 cd ~/benc1 cp ~benc/cog/modules/vdsk/examples/first.swift . swift first.swift and then paste all the output for that to me, and also make sure I have access to that directory. 
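Ben's reproduction steps can be run as one script that also reports whether a log appeared. This is a sketch under an assumption the thread does not state: that Swift writes its log into the current directory as <script>-<RunID>.log, e.g. first-20080331-0939-yhsmzmo9.log for the RunID Quan reports. The helper has_swift_log is hypothetical.

```shell
#!/bin/sh
# has_swift_log DIR SCRIPTNAME: succeed if DIR holds at least one
# SCRIPTNAME-*.log file (assumed Swift log naming; hypothetical helper).
has_swift_log() {
  for f in "$1/$2"-*.log; do
    # If the glob matched nothing, $f is the literal pattern and -e fails.
    [ -e "$f" ] && return 0
  done
  return 1
}

# Ben's steps, run only where swift and his example script are available.
example=~benc/cog/modules/vdsk/examples/first.swift
if command -v swift >/dev/null 2>&1 && [ -f "$example" ]; then
  mkdir -p ~/benc1 && cd ~/benc1 || exit 1
  cp "$example" .
  swift first.swift
  if has_swift_log . first; then
    echo "log file produced"
  else
    echo "no log file produced"
  fi
fi
```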
-- From quanpt at cs.uchicago.edu Mon Mar 31 09:42:27 2008 From: quanpt at cs.uchicago.edu (Quan Tran Pham) Date: Mon, 31 Mar 2008 09:42:27 -0500 Subject: [Swift-user] debug log not being produced In-Reply-To: References: <47EE3489.2040607@mcs.anl.gov> <1206797277.21645.2.camel@blabla.mcs.anl.gov> <47EE5146.4010803@mcs.anl.gov> <47EE54A0.1080906@mcs.anl.gov> <4290b6c60803291757x28aa95a2naf2464825d6d637e@mail.gmail.com> <47EEE5BD.50305@cs.uchicago.edu> <4290b6c60803292217p3b55f3a5h95e42d5574de55d@mail.gmail.com> <4290b6c60803300847va5c28c4qadf3774a036188e5@mail.gmail.com> Message-ID: <4290b6c60803310742n678489d7ja09e534974ef6b62@mail.gmail.com> The output is: 09:39 at tp-login2 ~/benc1 $ swift first.swift Swift svn swift-r1773 cog-r1953 RunID: 20080331-0939-yhsmzmo9 Progress: echo started echo completed Final status: Finished successfully:1 On Mon, Mar 31, 2008 at 4:08 AM, Ben Clifford wrote: > > On Sun, 30 Mar 2008, Quan Tran Pham wrote: > > > > $ which swift > > ~/bin/vdsk/bin/swift > > I just tried this and it makes log files for me. > > Please can you do this: > > mkdir ~/benc1 > cd ~/benc1 > cp ~benc/cog/modules/vdsk/examples/first.swift . > swift first.swift > > and then paste all the output for that to me, and also make sure I have > access to that directory. 
> > -- From benc at hawaga.org.uk Mon Mar 31 09:59:14 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 31 Mar 2008 14:59:14 +0000 (GMT) Subject: [Swift-user] debug log not being produced In-Reply-To: <4290b6c60803310742n678489d7ja09e534974ef6b62@mail.gmail.com> References: <47EE3489.2040607@mcs.anl.gov> <1206797277.21645.2.camel@blabla.mcs.anl.gov> <47EE5146.4010803@mcs.anl.gov> <47EE54A0.1080906@mcs.anl.gov> <4290b6c60803291757x28aa95a2naf2464825d6d637e@mail.gmail.com> <47EEE5BD.50305@cs.uchicago.edu> <4290b6c60803292217p3b55f3a5h95e42d5574de55d@mail.gmail.com> <4290b6c60803300847va5c28c4qadf3774a036188e5@mail.gmail.com> <4290b6c60803310742n678489d7ja09e534974ef6b62@mail.gmail.com> Message-ID: type: env and send me the output -- From quanpt at cs.uchicago.edu Mon Mar 31 10:33:34 2008 From: quanpt at cs.uchicago.edu (Quan Tran Pham) Date: Mon, 31 Mar 2008 10:33:34 -0500 Subject: [Swift-user] debug log not being produced In-Reply-To: References: <47EE3489.2040607@mcs.anl.gov> <47EE54A0.1080906@mcs.anl.gov> <4290b6c60803291757x28aa95a2naf2464825d6d637e@mail.gmail.com> <47EEE5BD.50305@cs.uchicago.edu> <4290b6c60803292217p3b55f3a5h95e42d5574de55d@mail.gmail.com> <4290b6c60803300847va5c28c4qadf3774a036188e5@mail.gmail.com> <4290b6c60803310742n678489d7ja09e534974ef6b62@mail.gmail.com> Message-ID: <4290b6c60803310833v5a66c5a3pd3122beada07e0c7@mail.gmail.com> attached, or you can find it in /home/quanpt/env.txt On Mon, Mar 31, 2008 at 9:59 AM, Ben Clifford wrote: > type: env and send me the output > -- > -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: env.txt URL: From benc at hawaga.org.uk Mon Mar 31 10:47:38 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 31 Mar 2008 15:47:38 +0000 (GMT) Subject: [Swift-user] debug log not being produced In-Reply-To: <4290b6c60803310833v5a66c5a3pd3122beada07e0c7@mail.gmail.com> References: <47EE3489.2040607@mcs.anl.gov> <47EE54A0.1080906@mcs.anl.gov> <4290b6c60803291757x28aa95a2naf2464825d6d637e@mail.gmail.com> <47EEE5BD.50305@cs.uchicago.edu> <4290b6c60803292217p3b55f3a5h95e42d5574de55d@mail.gmail.com> <4290b6c60803300847va5c28c4qadf3774a036188e5@mail.gmail.com> <4290b6c60803310742n678489d7ja09e534974ef6b62@mail.gmail.com> <4290b6c60803310833v5a66c5a3pd3122beada07e0c7@mail.gmail.com> Message-ID: type: unset CLASSPATH and run that same swift command in ~/benc1 again -- From quanpt at cs.uchicago.edu Mon Mar 31 11:49:33 2008 From: quanpt at cs.uchicago.edu (Quan Tran Pham) Date: Mon, 31 Mar 2008 11:49:33 -0500 Subject: [Swift-user] debug log not being produced In-Reply-To: References: <47EE3489.2040607@mcs.anl.gov> <47EEE5BD.50305@cs.uchicago.edu> <4290b6c60803292217p3b55f3a5h95e42d5574de55d@mail.gmail.com> <4290b6c60803300847va5c28c4qadf3774a036188e5@mail.gmail.com> <4290b6c60803310742n678489d7ja09e534974ef6b62@mail.gmail.com> <4290b6c60803310833v5a66c5a3pd3122beada07e0c7@mail.gmail.com> Message-ID: <4290b6c60803310949j6b4472a9o9665435c85fc222b@mail.gmail.com> Oh, it is ok now, the log is produced. What was wrong in the classpath? Was it the jar for Falkon that is incompatible with Swift? 
On Mon, Mar 31, 2008 at 10:47 AM, Ben Clifford wrote: > > type: > > unset CLASSPATH > > and run that same swift command in ~/benc1 again From hategan at mcs.anl.gov Mon Mar 31 12:10:41 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 31 Mar 2008 12:10:41 -0500 Subject: [Swift-user] debug log not being produced In-Reply-To: <4290b6c60803310949j6b4472a9o9665435c85fc222b@mail.gmail.com> References: <47EE3489.2040607@mcs.anl.gov> <47EEE5BD.50305@cs.uchicago.edu> <4290b6c60803292217p3b55f3a5h95e42d5574de55d@mail.gmail.com> <4290b6c60803300847va5c28c4qadf3774a036188e5@mail.gmail.com> <4290b6c60803310742n678489d7ja09e534974ef6b62@mail.gmail.com> <4290b6c60803310833v5a66c5a3pd3122beada07e0c7@mail.gmail.com> <4290b6c60803310949j6b4472a9o9665435c85fc222b@mail.gmail.com> Message-ID: <1206983441.19312.1.camel@blabla.mcs.anl.gov> On Mon, 2008-03-31 at 11:49 -0500, Quan Tran Pham wrote: > Oh, it is ok now, the log is produced. What was wrong in the > classpath? Was it the jar for Falkon that is incompatible with Swift? Something like that. Java was picking up the wrong log4j.properties file because it was in one of the jar files in CLASSPATH. > > On Mon, Mar 31, 2008 at 10:47 AM, Ben Clifford wrote: > > > > type: > > > > unset CLASSPATH > > > > and run that same swift command in ~/benc1 again
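A quick way to confirm this kind of problem, before resorting to `unset CLASSPATH`, is to list which CLASSPATH entries carry their own log4j.properties. The sketch below is a hypothetical helper (`find_resource_on_classpath` is not part of Swift or log4j), written in Python under these assumptions: the conflicting file sits at the top level of a jar or directory named in CLASSPATH, and Java wildcard entries such as `lib/*` are not expanded.

```python
import os
import zipfile


def find_resource_on_classpath(resource, classpath=None):
    """Return the CLASSPATH entries that contain `resource`, in listed order.

    Hypothetical diagnostic helper. Directory entries are checked for a file
    at that relative path; other entries are opened as zip archives (jars are
    zip files). Missing paths and non-archive files are skipped. Wildcard
    entries like 'lib/*' are NOT expanded.
    """
    if classpath is None:
        classpath = os.environ.get("CLASSPATH", "")
    hits = []
    for entry in classpath.split(os.pathsep):
        if not entry:
            continue  # ignore empty segments such as a trailing ':'
        if os.path.isdir(entry):
            if os.path.isfile(os.path.join(entry, resource)):
                hits.append(entry)
        elif zipfile.is_zipfile(entry):  # returns False for missing files
            with zipfile.ZipFile(entry) as zf:
                if resource in zf.namelist():
                    hits.append(entry)
    return hits


if __name__ == "__main__":
    # Print every CLASSPATH entry that would supply its own log4j.properties.
    for path in find_resource_on_classpath("log4j.properties"):
        print(path)
```

Run it in the same shell environment in which swift fails to write its log. Every entry it prints is a candidate for shadowing the intended configuration, since a Java class loader resource lookup generally returns the first match in classpath order. How the swift launcher combines CLASSPATH with its own jars is not shown in this thread, so treat the output as a list of suspects rather than a definitive answer.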