From aespinosa at cs.uchicago.edu Wed Dec 1 14:17:22 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Wed, 1 Dec 2010 14:17:22 -0600
Subject: [Swift-user] reducing SetFieldValue logging levels
Message-ID: 

Hi,

I set the logging level of SetFieldValue to NONE but still receive its log entries:

$ grep SetFieldValue postproc-20101201-1412-58dz3i1h.log | head -n 10
2010-12-01 14:12:26,567-0600 DEBUG SetFieldValue Setting org.griphyn.vdl.mapping.RootDataNode identifier dataset:20101201-1412-y0ba3ap8:720000000048 type int with no value at dataset=num_time_steps (not closed) to 3000
2010-12-01 14:12:26,567-0600 DEBUG SetFieldValue Setting org.griphyn.vdl.mapping.RootDataNode identifier dataset:20101201-1412-y0ba3ap8:720000000049 type string with no value at dataset=spectra_period1 (not closed) to all
2010-12-01 14:12:26,567-0600 DEBUG SetFieldValue Setting org.griphyn.vdl.mapping.RootDataNode identifier dataset:20101201-1412-y0ba3ap8:720000000050 type float with no value at dataset=filter_highhz (not closed) to 5.0
2010-12-01 14:12:26,567-0600 DEBUG SetFieldValue Setting org.griphyn.vdl.mapping.RootDataNode identifier dataset:20101201-1412-y0ba3ap8:720000000053 type string with no value at dataset=datadir (not closed) to gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/Results
2010-12-01 14:12:26,567-0600 DEBUG SetFieldValue Setting org.griphyn.vdl.mapping.RootDataNode identifier dataset:20101201-1412-y0ba3ap8:720000000051 type float with no value at dataset=simulation_timeskip (not closed) to 0.1
2010-12-01 14:12:26,568-0600 DEBUG SetFieldValue Setting org.griphyn.vdl.mapping.RootDataNode identifier dataset:20101201-1412-y0ba3ap8:720000000052 type int with no value at dataset=run_id (not closed) to 664
2010-12-01 14:12:26,568-0600 DEBUG SetFieldValue Setting org.griphyn.vdl.mapping.RootDataNode identifier dataset:20101201-1412-y0ba3ap8:720000000057 type string with no value at dataset=swift#mapper#17045 (not closed) to gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles
2010-12-01 14:12:26,570-0600 DEBUG SetFieldValue Setting org.griphyn.vdl.mapping.RootDataNode identifier dataset:20101201-1412-y0ba3ap8:720000000056 type string with no value at dataset=swift#mapper#17044 (not closed) to org.griphyn.vdl.mapping.DataNode identifier dataset:20101201-1412-y0ba3ap8:720000000062 type string with no value at dataset=site path=.name (not closed)

Do I have some conflicting logging config here?
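For reference, assuming the stock log4j 1.x that Swift bundles: NONE is not a recognized level (the recognized names are ALL, TRACE, DEBUG, INFO, WARN, ERROR, FATAL and OFF), and an unrecognized name quietly falls back to DEBUG rather than failing, which would produce exactly the DEBUG entries above. A minimal Java sketch of that behavior; the category name is taken from the properties file quoted below, everything else is illustrative:

    import org.apache.log4j.Level;
    import org.apache.log4j.Logger;

    public class LevelCheck {
        public static void main(String[] args) {
            // Unrecognized level names do not fail; they default to DEBUG.
            System.out.println(Level.toLevel("NONE")); // prints DEBUG
            System.out.println(Level.toLevel("OFF"));  // prints OFF

            // OFF is the level that actually silences a category.
            Logger.getLogger("org.griphyn.vdl.karajan.lib.SetFieldValue")
                  .setLevel(Level.OFF);
        }
    }

On the configuration side, the equivalent change would be replacing NONE with OFF (or WARN/ERROR) in the last three lines of the properties file below.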
My log4j.properties:

log4j.rootCategory=INFO, CONSOLE, FILE

log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.Threshold=INFO
log4j.appender.CONSOLE.layout.ConversionPattern=%m%n

log4j.appender.FILE=org.apache.log4j.RollingFileAppender
log4j.appender.FILE.MaxFileSize=1GB
log4j.appender.FILE.File=swift.log
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSSZZZZZ} %-5p %c{1} %m%n

log4j.logger.swift=INFO
log4j.logger.org.apache.axis.utils=ERROR
log4j.logger.org.globus.swift.trace=INFO
log4j.logger.org.griphyn.vdl.karajan.Loader=DEBUG
log4j.logger.org.globus.cog.karajan.workflow.events.WorkerSweeper=WARN
log4j.logger.org.globus.cog.karajan.workflow.nodes.FlowNode=WARN
log4j.logger.org.globus.cog.karajan.scheduler.WeightedHostScoreScheduler=DEBUG
log4j.logger.org.griphyn.vdl.toolkit.VDLt2VDLx=DEBUG
log4j.logger.org.griphyn.vdl.karajan.VDL2ExecutionContext=DEBUG
log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=INFO
#log4j.logger.org.griphyn.vdl.karajan.lib.GetFieldValue=DEBUG
log4j.logger.org.griphyn.vdl.engine.Karajan=INFO
log4j.logger.org.globus.cog.abstraction.coaster.rlog=DEBUG
# log4j.logger.org.globus.swift.data.Director=DEBUG
#log4j.logger.swift=DEBUG
log4j.logger.org.griphyn.vdl.karajan.lib=NONE
log4j.logger.org.griphyn.vdl.karajan.lib.SetFieldValue=NONE
log4j.logger.org.griphyn.vdl.mapping.AbstractDataNode=NONE

-- 
Allan M. Espinosa
PhD student, Computer Science
University of Chicago

From aespinosa at cs.uchicago.edu Wed Dec 1 20:38:21 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Wed, 1 Dec 2010 20:38:21 -0600
Subject: [Swift-user] GC overhead limit exceeded
Message-ID: 

I have my heap set to 4GB; does this mean I still need more memory?
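For context on the failure pasted below: on HotSpot, "GC overhead limit exceeded" is raised when the JVM has spent roughly 98% of its time in garbage collection while recovering less than about 2% of the heap, so the heap is effectively exhausted regardless of the nominal HEAPMAX/-Xmx value. A small, hypothetical diagnostic (not part of Swift) that prints heap pressure while a run is in progress:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;

    public class HeapWatch {
        public static void main(String[] args) throws InterruptedException {
            MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
            while (true) {
                MemoryUsage heap = mem.getHeapMemoryUsage();
                // used vs. max shows how close the run is to the configured ceiling
                System.out.printf("heap used: %d MB / max: %d MB%n",
                        heap.getUsed() >> 20, heap.getMax() >> 20);
                Thread.sleep(10000);
            }
        }
    }

Taking a heap dump on failure (-XX:+HeapDumpOnOutOfMemoryError, or jmap -dump on a live run) and browsing it with jhat is usually the quicker way to see which data structures dominate a run this size.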
HEAPMAX=4096M -Allan Progress: Finished in previous run:4002 Uncaught exception: java.lang.OutOfMemoryError: GC overhead limit exceeded in swiftscript:strcat @ postproc.kml, line: 407 java.lang.OutOfMemoryError: GC overhead limit exceeded at edu.emory.mathcs.backport.java.util.concurrent.AbstractExecutorService.newTaskFor(AbstractExecutorService.java:59) at edu.emory.mathcs.backport.java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:78) at org.globus.cog.karajan.workflow.events.EventBus._post(EventBus.java:74) at org.globus.cog.karajan.workflow.events.EventBus.post(EventBus.java:97) at org.globus.cog.karajan.workflow.nodes.FlowNode.fireControlEvent(FlowNode.java:187) at org.globus.cog.karajan.workflow.nodes.FlowNode.startElement(FlowNode.java:467) at org.globus.cog.karajan.workflow.nodes.Sequential.startElement(Sequential.java:84) at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:57) at org.globus.cog.karajan.workflow.nodes.Sequential.childCompleted(Sequential.java:44) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:195) at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:32) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:340) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:181) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:309) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:50) at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:26) at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:238) at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:289) at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:402) at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:343) at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:230) at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:44) at edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:431) at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:166) at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:643) at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:668) at java.lang.Thread.run(Thread.java:619) Event was NotificationEvent:EXECUTION_COMPLETED Exception is: java.lang.OutOfMemoryError: GC overhead limit exceeded Near Karajan line: swiftscript:strcat @ postproc.kml, line: 407 Execution failed: Uncaught exception: java.lang.OutOfMemoryError: GC overhead limit exceeded Exception in thread "Overloaded Host Monitor" java.lang.OutOfMemoryError: Java heap space at java.util.HashMap.newKeyIterator(HashMap.java:840) at java.util.HashMap$KeySet.iterator(HashMap.java:874) at java.util.HashSet.iterator(HashSet.java:153) at 
org.globus.cog.karajan.scheduler.OverloadedHostMonitor.run(OverloadedHostMonitor.java:50) Progress: Finished in previous run:4002 -- Allan M. Espinosa PhD student, Computer Science University of Chicago From hategan at mcs.anl.gov Wed Dec 1 20:54:15 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 01 Dec 2010 18:54:15 -0800 Subject: [Swift-user] GC overhead limit exceeded In-Reply-To: References: Message-ID: <1291258455.32072.1.camel@blabla2.none> Something is very likely wrong if things can't fit in 4G. Usual suspects: -very large workflows. -lots of trace/tracef -Justin reported some problems with coaster I/O on the BGP, so it might be that (update to latest trunk and see if that solves it). Mihael On Wed, 2010-12-01 at 20:38 -0600, Allan Espinosa wrote: > I have my heap set to 4GB does this mean i still need more memory? > > HEAPMAX=4096M > > > -Allan > > Progress: Finished in previous run:4002 > Uncaught exception: java.lang.OutOfMemoryError: GC overhead limit > exceeded in swiftscript:strcat @ postproc.kml, line: 407 > java.lang.OutOfMemoryError: GC overhead limit exceeded > at edu.emory.mathcs.backport.java.util.concurrent.AbstractExecutorService.newTaskFor(AbstractExecutorService.java:59) > at edu.emory.mathcs.backport.java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:78) > at org.globus.cog.karajan.workflow.events.EventBus._post(EventBus.java:74) > at org.globus.cog.karajan.workflow.events.EventBus.post(EventBus.java:97) > at org.globus.cog.karajan.workflow.nodes.FlowNode.fireControlEvent(FlowNode.java:187) > at org.globus.cog.karajan.workflow.nodes.FlowNode.startElement(FlowNode.java:467) > at org.globus.cog.karajan.workflow.nodes.Sequential.startElement(Sequential.java:84) > at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:57) > at org.globus.cog.karajan.workflow.nodes.Sequential.childCompleted(Sequential.java:44) > at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:195) > at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:32) > at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:340) > at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) > at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:181) > at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:309) > at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:50) > at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:26) > at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:238) > at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:289) > at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:402) > at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:343) > at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:230) > at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) > at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:44) > at 
edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:431) > at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:166) > at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:643) > at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:668) > at java.lang.Thread.run(Thread.java:619) > Event was NotificationEvent:EXECUTION_COMPLETED > Exception is: java.lang.OutOfMemoryError: GC overhead limit exceeded > Near Karajan line: swiftscript:strcat @ postproc.kml, line: 407 > Execution failed: > Uncaught exception: java.lang.OutOfMemoryError: GC overhead > limit exceeded > Exception in thread "Overloaded Host Monitor" > java.lang.OutOfMemoryError: Java heap space > at java.util.HashMap.newKeyIterator(HashMap.java:840) > at java.util.HashMap$KeySet.iterator(HashMap.java:874) > at java.util.HashSet.iterator(HashSet.java:153) > at org.globus.cog.karajan.scheduler.OverloadedHostMonitor.run(OverloadedHostMonitor.java:50) > Progress: Finished in previous run:4002 > > -- > Allan M. Espinosa > PhD student, Computer Science > University of Chicago > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user From aespinosa at cs.uchicago.edu Wed Dec 1 21:05:59 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Wed, 1 Dec 2010 21:05:59 -0600 Subject: [Swift-user] GC overhead limit exceeded In-Reply-To: <1291258455.32072.1.camel@blabla2.none> References: <1291258455.32072.1.camel@blabla2.none> Message-ID: I do have a large workflow ~400k jobs and tons of mappers and primitive types that go along each one. I have no trace. my workflow is running on the latest trunk: swift-r3728 cog-r2948 Setting the heap to 8GB somehow solves the problem. Are there any tips to estimate heap usage? # of local variables, array sizes, etc.? 2010/12/1 Mihael Hategan : > Something is very likely wrong if things can't fit in 4G. > > Usual suspects: > -very large workflows. > -lots of trace/tracef > -Justin reported some problems with coaster I/O on the BGP, so it might > be that (update to latest trunk and see if that solves it). > > Mihael > > On Wed, 2010-12-01 at 20:38 -0600, Allan Espinosa wrote: >> I have my heap set to 4GB does this mean i still need more memory? >> >> HEAPMAX=4096M >> >> >> -Allan >> >> Progress: ?Finished in previous run:4002 >> Uncaught exception: java.lang.OutOfMemoryError: GC overhead limit >> exceeded in swiftscript:strcat @ postproc.kml, line: 407 >> java.lang.OutOfMemoryError: GC overhead limit exceeded >> ? ? ? ? at edu.emory.mathcs.backport.java.util.concurrent.AbstractExecutorService.newTaskFor(AbstractExecutorService.java:59) >> ? ? ? ? at edu.emory.mathcs.backport.java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:78) >> ? ? ? ? at org.globus.cog.karajan.workflow.events.EventBus._post(EventBus.java:74) >> ? ? ? ? at org.globus.cog.karajan.workflow.events.EventBus.post(EventBus.java:97) >> ? ? ? ? at org.globus.cog.karajan.workflow.nodes.FlowNode.fireControlEvent(FlowNode.java:187) >> ? ? ? ? at org.globus.cog.karajan.workflow.nodes.FlowNode.startElement(FlowNode.java:467) >> ? ? ? ? at org.globus.cog.karajan.workflow.nodes.Sequential.startElement(Sequential.java:84) >> ? ? ? ? at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:57) >> ? ? ? ? 
at org.globus.cog.karajan.workflow.nodes.Sequential.childCompleted(Sequential.java:44) >> ? ? ? ? at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.childCompleted(AbstractSequentialWithArguments.java:195) >> ? ? ? ? at org.globus.cog.karajan.workflow.nodes.Sequential.notificationEvent(Sequential.java:32) >> ? ? ? ? at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:340) >> ? ? ? ? at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) >> ? ? ? ? at org.globus.cog.karajan.workflow.nodes.FlowNode.fireNotificationEvent(FlowNode.java:181) >> ? ? ? ? at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:309) >> ? ? ? ? at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) >> ? ? ? ? at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) >> ? ? ? ? at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:50) >> ? ? ? ? at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:26) >> ? ? ? ? at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) >> ? ? ? ? at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:238) >> ? ? ? ? at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:289) >> ? ? ? ? at org.globus.cog.karajan.workflow.nodes.FlowNode.controlEvent(FlowNode.java:402) >> ? ? ? ? at org.globus.cog.karajan.workflow.nodes.FlowNode.event(FlowNode.java:343) >> ? ? ? ? at org.globus.cog.karajan.workflow.FlowElementWrapper.event(FlowElementWrapper.java:230) >> ? ? ? ? at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java:173) >> ? ? ? ? at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:44) >> ? ? ? ? at edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:431) >> ? ? ? ? at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:166) >> ? ? ? ? at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:643) >> ? ? ? ? at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:668) >> ? ? ? ? at java.lang.Thread.run(Thread.java:619) >> Event was NotificationEvent:EXECUTION_COMPLETED >> Exception is: java.lang.OutOfMemoryError: GC overhead limit exceeded >> Near Karajan line: swiftscript:strcat @ postproc.kml, line: 407 >> Execution failed: >> ? ? ? ? Uncaught exception: java.lang.OutOfMemoryError: GC overhead >> limit exceeded >> Exception in thread "Overloaded Host Monitor" >> java.lang.OutOfMemoryError: Java heap space >> ? ? ? ? at java.util.HashMap.newKeyIterator(HashMap.java:840) >> ? ? ? ? at java.util.HashMap$KeySet.iterator(HashMap.java:874) >> ? ? ? ? at java.util.HashSet.iterator(HashSet.java:153) >> ? ? ? ? at org.globus.cog.karajan.scheduler.OverloadedHostMonitor.run(OverloadedHostMonitor.java:50) >> Progress: ?Finished in previous run:4002 From hategan at mcs.anl.gov Wed Dec 1 21:12:15 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 01 Dec 2010 19:12:15 -0800 Subject: [Swift-user] GC overhead limit exceeded In-Reply-To: References: <1291258455.32072.1.camel@blabla2.none> Message-ID: <1291259535.32372.1.camel@blabla2.none> On Wed, 2010-12-01 at 21:05 -0600, Allan Espinosa wrote: > I do have a large workflow ~400k jobs and tons of mappers and > primitive types that go along each one. > > I have no trace. 
my workflow is running on the latest trunk: > swift-r3728 cog-r2948 > > > Setting the heap to 8GB somehow solves the problem. Are there any > tips to estimate heap usage? # of local variables, array sizes, etc.? Send me your workflow and I can take a look. Though it's likely going to happen after next week. Mihael From aespinosa at cs.uchicago.edu Thu Dec 2 09:10:26 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Thu, 2 Dec 2010 09:10:26 -0600 Subject: [Swift-user] 3rd party transfers Message-ID: I have a bunch of 3rd party gridftp transfers. Swift reports around 10k jobs being in the vdl:stagein at a time. After a while i get a couple of these errors. Does it look like i'm stressing the gridftp servers? my throttle.transfers=8 2010-12-02 02:22:06,008-0600 DEBUG DelegatedFileTransferHandler Starting service on gsiftp://gpn-hus 2010-12-02 02:22:06,008-0600 DEBUG DelegatedFileTransferHandler File transfer with resource local->r 2010-12-02 02:22:06,247-0600 DEBUG DelegatedFileTransferHandler Exception in transfer org.globus.cog.abstraction.impl.file.FileResourceException at org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(Abstr esource.java:51) at org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(Abstr esource.java:34) at org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp at org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D eTransferHandler.java:352) at org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin ngDelegatedFileTransferHandler.java:46) at org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi andler.java:489) at java.lang.Thread.run(Thread.java:619) Caused by: org.globus.ftp.exception.ServerException: Server refused performing the request. 
Custom m rror code 1) [Nested exception message: Custom message: Unexpected reply: 451 ocurred during retrie org.globus.ftp.exception.DataChannelException: setPassive() must match store() and setActive() - ret rror code 2) org.globus.ftp.exception.DataChannelException: setPassive() must match store() and setActive() - ret rror code 2) at org.globus.ftp.extended.GridFTPServerFacade.retrieve(GridFTPServerFacade.java:469) at org.globus.ftp.FTPClient.put(FTPClient.java:1294) at org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp at org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D eTransferHandler.java:352) at org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin ngDelegatedFileTransferHandler.java:46) at org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi andler.java:489) at java.lang.Thread.run(Thread.java:619) ] [Nested exception is org.globus.ftp.exception.UnexpectedReplyCodeException: Custom message: Unexp : 451 ocurred during retrieve() org.globus.ftp.exception.DataChannelException: setPassive() must match store() and setActive() - ret rror code 2) org.globus.ftp.exception.DataChannelException: setPassive() must match store() and setActive() - ret rror code 2) at org.globus.ftp.extended.GridFTPServerFacade.retrieve(GridFTPServerFacade.java:469) at org.globus.ftp.FTPClient.put(FTPClient.java:1294) at org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp at org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D eTransferHandler.java:352) at org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin ngDelegatedFileTransferHandler.java:46) at org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi andler.java:489) at java.lang.Thread.run(Thread.java:619) ] at org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerExceptio at org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerExceptio at org.globus.ftp.vanilla.TransferMonitor.run(TransferMonitor.java:195) ... 1 more -- Allan M. Espinosa PhD student, Computer Science University of Chicago From aespinosa at cs.uchicago.edu Mon Dec 6 14:57:13 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Mon, 6 Dec 2010 14:57:13 -0600 Subject: [Swift-user] workersPerNode on manual coasters Message-ID: if I set workersPerNode=16 does this mean there's 16 receivable jobs per worker.pl? I have 1 worker.pl connecting to my coaster service but only one job is being executed. -Allan -- Allan M. Espinosa PhD student, Computer Science University of Chicago From wilde at mcs.anl.gov Mon Dec 6 15:04:37 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 6 Dec 2010 15:04:37 -0600 (CST) Subject: [Swift-user] workersPerNode on manual coasters In-Reply-To: Message-ID: <16628357.178651.1291669477250.JavaMail.root@zimbra.anl.gov> ----- Original Message ----- > if I set workersPerNode=16 does this mean there's 16 receivable jobs > per worker.pl? My understanding is yes. > > I have 1 worker.pl connecting to my coaster service but only one job > is being executed. Can you check and/or send your sites file? Any chance you had a typo in that tag name? Also, I think you can verify from both the worker and service logs that the worker is "pulling" the right number of concurrent jobs. 
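For comparison, the workersPerNode setting normally lives in sites.xml as a globus-namespace profile entry; a minimal sketch (the pool handle, execution attributes and paths are placeholders, not values from this thread):

    <pool handle="example-coasters">
      <execution provider="coaster" jobmanager="local:local" url="localhost"/>
      <profile namespace="globus" key="workersPerNode">16</profile>
      <workdirectory>/tmp/swift.workdir</workdirectory>
    </pool>

If the key is misspelled it is typically just ignored rather than reported, which is why the typo question above is worth ruling out first.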
I think the worker may also echo the setting of workersPerNode in its log. Mihael may want to correct or clarify this.

- Mike

> -Allan
>
> --
> Allan M. Espinosa
> PhD student, Computer Science
> University of Chicago
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From aespinosa at cs.uchicago.edu Wed Dec 8 10:09:48 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Wed, 8 Dec 2010 10:09:48 -0600
Subject: [Swift-user] karajan/ cog webpage down?
Message-ID: 

Can't access the page recently: http://wiki.cogkit.org

-Allan

From aespinosa at cs.uchicago.edu Wed Dec 8 10:44:18 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Wed, 8 Dec 2010 10:44:18 -0600
Subject: [Swift-user] throttle transfers and vdl:stagein graphs
In-Reply-To: <1289277290.18134.12.camel@blabla2.none>
References: <1289277290.18134.12.camel@blabla2.none>
Message-ID: 

I see this in doStagein:

    uParallelFor(file, files
        provider := provider(file)
        srchost := hostname(file)
        srcdir := vdl:dirname(file)
        destdir := dircat(dir, reldirname(file))
        filename := basename(file)
        size := file:size("{srcdir}/{filename}", host=srchost, provider=provider)

        policy := cdm:query(query=file)
        log(LOG:DEBUG, "CDM: {file} : {policy}")

        doStageinFile(provider=provider, srchost=srchost, srcfile=filename,
                      srcdir=srcdir, desthost=host, destdir=destdir, size=size, policy=policy)
    )
    log(LOG:INFO, "END jobid={jobid} - Staging in finished")

Does this mean that there is actually no throttling going on for doStageinFile()? It does make sense, since my 400k-job workflow is still stuck for 5 hours staging in 23k files.

-Allan

2010/11/8 Mihael Hategan :
> On Mon, 2010-11-08 at 20:50 -0600, Allan Espinosa wrote:
>> Hi,
>>
>> In my workflow, I use the default throttle.transfers=4 .  But my
>> dostagein-total plot indicates that there are 72 stagein events going
>> on for around 90 seconds.  shouldn't there be a linear ramp up or a
>> saw-tooth pattern at the plateau because of having throttled
>> transfers?
>
> Lies. And statistics.
>
> The plot indicates that a number of instances of a certain portion of
> vdl-int is executing.
>
> If you look at that portion of vdl-int (i.e. between setprogress("Stage
> in") and setprogress("Submitting")) there are a few things happening,
> including directory creation.
>
> Essentially you are dealing with the following pattern:
>
> parallelFor(...
>   a()
>   throttle(4, b())
>   c()
> )
>
> The graph would show something like the parallelism in the invocation of
> the body of parallelFor. And it is quite possible that all a()
> invocations start well before any of the b() invocations start. The only
> accurate way to see the effect of the throttle is to trace the b()
> invocations, which you can probably do by looking at the status of file
> transfer tasks (by enabling the relevant logging stuff).
>
> Mihael

From aespinosa at cs.uchicago.edu Wed Dec 8 11:38:22 2010
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Wed, 8 Dec 2010 11:38:22 -0600
Subject: [Swift-user] provider staging remote service+workers
Message-ID: 

providerstaging=true
stagingMethod is default (proxy)

For proxy-based (the default) provider staging, is the coaster service the one that pulls the file?
If so, then in my case where the coaster service is in communicado and workers somewhere else (OSG), then my workers won't be able to access the files at all? Caused by: Job failed with an exit code of 520 Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 520 Progress: Submitted:69 Active:2 Failed:1 Finished in previous run:4 Exception in seispeak: Arguments: [stat=LGU, extract_sgt=0, slon=-119.06587, slat=34.10819, outputBinary=1, mergeOutput=1, ntout=3000, rupmod file=scratch/projects/tg/tera3d/CyberShake2007/ruptures/RuptureVariations/21/0/21_0.txt.variation-s0002-h0004, sgt_xfi le=gpfs/pads/swift/aespinosa/science/cybershake/Results/LGU/21/0/LGU_21_0_subfx.sgt, sgt_yfile=gpfs/pads/swift/aespino sa/science/cybershake/Results/LGU/21/0/LGU_21_0_subfy.sgt, seis_file=gpfs/pads/swift/aespinosa/science/cybershake/Resu lts/LGU/21/0/Seismogram_LGU_21_0_0016.grm, simulation_out_pointsX=2, simulation_out_pointsY=1, surfseis_rspectra_seism ogram_units=cmpersec, surfseis_rspectra_output_units=cmpersec2, surfseis_rspectra_output_type=aa, surfseis_rspectra_ap ply_byteswap=no, simulation_out_timesamples=3000, simulation_out_timeskip=0.1, surfseis_rspectra_period=all, surfseis _rspectra_apply_filter_highHZ=5, in=gpfs/pads/swift/aespinosa/science/cybershake/Results/LGU/21/0/Seismogram_LGU_21_0_ 0016.grm, out=gpfs/pads/swift/aespinosa/science/cybershake/Results/LGU/21/0/PeakVals_LGU_21_0_0016.bsa] Host: FNAL_FERMIGRID_fermigridosg1.fnal.gov Directory: postproc-proxy-stage00/jobs/b/seispeak-blf80q2kTODO: outs ---- Caused by: Job failed with an exit code of 520 Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 520 Exception in seispeak: Arguments: [stat=LGU, extract_sgt=0, slon=-119.06587, slat=34.10819, outputBinary=1, mergeOutput=1, ntout=3000, rupmodfile=scratch/projects/tg/tera3d/CyberShake2007/ruptures/RuptureVariations/21/0/21_0.txt.variation-s0001-h0001, sgt_xfile=gpfs/pads/swift/aespinosa/science/cybershake/Results/LGU/21/0/LGU_21_0_subfx.sgt, sgt_yfile=gpfs/pads/swift/aespinosa/science/cybershake/Results/LGU/21/0/LGU_21_0_subfy.sgt, seis_file=gpfs/pads/swift/aespinosa/science/cybershake/Results/LGU/21/0/Seismogram_LGU_21_0_0007.grm, simulation_out_pointsX=2, simulation_out_pointsY=1, surfseis_rspectra_seismogram_units=cmpersec, surfseis_rspectra_output_units=cmpersec2, surfseis_rspectra_output_type=aa, surfseis_rspectra_apply_byteswap=no, simulation_out_timesamples=3000, simulation_out_timeskip=0.1, surfseis_rspectra_period=all, surfseis_rspectra_apply_filter_highHZ=5, in=gpfs/pads/swift/aespinosa/science/cybershake/Results/LGU/21/0/Seismogram_LGU_21_0_0007.grm, out=gpfs/pads/swift/aespinosa/science/cybershake/Results/LGU/21/0/PeakVals_LGU_21_0_0007.bsa] Host: BNL-ATLAS_gridgk01.racf.bnl.gov Directory: postproc-proxy-stage00/jobs/1/seispeak-1mf80q2kTODO: outs ---- Caused by: J Thanks, -Allan -- Allan M. Espinosa PhD student, Computer Science University of Chicago From hategan at mcs.anl.gov Wed Dec 8 21:45:44 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 08 Dec 2010 19:45:44 -0800 Subject: [Swift-user] throttle transfers and vdl:stagein graphs In-Reply-To: References: <1289277290.18134.12.camel@blabla2.none> Message-ID: <1291866344.8057.0.camel@blabla2.none> Throttling happens in the scheduler. 
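The pattern from the earlier reply quoted below, where every iteration starts immediately but only a fixed number of actual transfers run at once, can be illustrated with a standalone Java sketch (a Semaphore analogy, not Swift/Karajan code; the counts 72 and 4 are taken from the quoted discussion):

    import java.util.concurrent.Semaphore;

    public class ThrottleDemo {
        public static void main(String[] args) {
            final Semaphore transferSlots = new Semaphore(4); // throttle.transfers=4

            for (int i = 0; i < 72; i++) {                    // every iteration starts right away: a()
                final int id = i;
                new Thread(new Runnable() {
                    public void run() {
                        transferSlots.acquireUninterruptibly(); // b(): at most 4 transfers in flight
                        try {
                            Thread.sleep(1000);                 // simulate the copy
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        } finally {
                            transferSlots.release();
                        }
                    }
                }).start();
                System.out.println("stage-in " + id + " started");
            }
        }
    }

This is why a coarse per-iteration plot can show 72 concurrent stage-ins while only 4 file-transfer tasks are ever in flight: the limit is applied where the transfer task is scheduled, not where doStageinFile() appears in vdl-int.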
On Wed, 2010-12-08 at 10:44 -0600, Allan Espinosa wrote: > I see this in doStagein: > > uParallelFor(file, files > provider := provider(file) > srchost := hostname(file) > srcdir := vdl:dirname(file) > destdir := dircat(dir, reldirname(file)) > filename := basename(file) > size := file:size("{srcdir}/{filename}", host=srchost, provider=provider) > > policy := cdm:query(query=file) > log(LOG:DEBUG, "CDM: {file} : {policy}") > > doStageinFile(provider=provider, srchost=srchost, srcfile=filename, > srcdir=srcdir, desthost=host, destdir=destdir, size=size, policy=policy) > ) > log(LOG:INFO, "END jobid={jobid} - Staging in finished") > > Does this mean that there is actually no throttling going on for > dostageinfile() ? It does make sense since my 400k-job workflow is > still stuck for 5 hours staging in 23k files. > > -Allan > > > 2010/11/8 Mihael Hategan : > > On Mon, 2010-11-08 at 20:50 -0600, Allan Espinosa wrote: > >> Hi, > >> > >> In my workflow, I use the default throttle.transfers=4 . But my > >> dostagein-total plot indicates that there are 72 stagein events going > >> on for around 90 seconds. shouldn't there be a linear ramp up or a > >> saw-tooth pattern at the plateau because of having throttled > >> transfers? > > > > Lies. And statistics. > > > > The plot indicates that a number of instances of a certain portion of > > vdl-int is executing. > > > > If you look at that portion of vdl-int (i.e. between setprogress("Stage > > in") and setprogress("Submitting")) there are a few things happening, > > including directory creation. > > > > Essentially you are dealing with the following pattern: > > > > parallelFor(... > > a() > > throttle(4, b()) > > c() > > ) > > > > The graph would show something like the parallelism in the invocation of > > the body of parallelFor. And it is quite possible that all a() > > invocations start well before any of the b() invocations start. The only > > accurate way to see the effect of the throttle is to trace the b() > > invocations, which you can probably do by looking at the status of file > > transfer tasks (by enabling the relevant logging stuff). > > > > Mihael From aespinosa at cs.uchicago.edu Thu Dec 9 13:12:59 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Thu, 9 Dec 2010 13:12:59 -0600 Subject: [Swift-user] throttle transfers and vdl:stagein graphs In-Reply-To: <1291866344.8057.0.camel@blabla2.none> References: <1289277290.18134.12.camel@blabla2.none> <1291866344.8057.0.camel@blabla2.none> Message-ID: Ahhh, I changed the throttling parameters and saw the behavior of vdl:stageinfile change. Thanks! -Allan 2010/12/8 Mihael Hategan : > Throttling happens in the scheduler. > > On Wed, 2010-12-08 at 10:44 -0600, Allan Espinosa wrote: >> I see this in doStagein: >> >> ? ? ? ? ? ? ? ? ? ? ? uParallelFor(file, files >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? provider := provider(file) >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? srchost := hostname(file) >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? srcdir := vdl:dirname(file) >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? destdir := dircat(dir, reldirname(file)) >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? filename := basename(file) >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? size := file:size("{srcdir}/{filename}", host=srchost, provider=provider) >> >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? policy := cdm:query(query=file) >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? log(LOG:DEBUG, "CDM: {file} : {policy}") >> >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? doStageinFile(provider=provider, srchost=srchost, srcfile=filename, >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 
srcdir=srcdir, desthost=host, destdir=destdir, size=size, policy=policy) >> ? ? ? ? ? ? ? ? ? ? ? ) >> ? ? ? ? ? ? ? ? ? ? ? log(LOG:INFO, "END jobid={jobid} - Staging in finished") >> >> Does this mean that there is actually no throttling going on for >> dostageinfile() ? ? It does make sense since my 400k-job workflow is >> still stuck for 5 hours staging in 23k files. >> >> -Allan >> >> >> 2010/11/8 Mihael Hategan : >> > On Mon, 2010-11-08 at 20:50 -0600, Allan Espinosa wrote: >> >> Hi, >> >> >> >> In my workflow, I use the default throttle.transfers=4 . ?But my >> >> dostagein-total plot indicates that there are 72 stagein events going >> >> on for around 90 seconds. ?shouldn't there be a linear ramp up or a >> >> saw-tooth pattern at the plateau because of having throttled >> >> transfers? >> > >> > Lies. And statistics. >> > >> > The plot indicates that a number of instances of a certain portion of >> > vdl-int is executing. >> > >> > If you look at that portion of vdl-int (i.e. between setprogress("Stage >> > in") and setprogress("Submitting")) there are a few things happening, >> > including directory creation. >> > >> > Essentially you are dealing with the following pattern: >> > >> > parallelFor(... >> > ?a() >> > ?throttle(4, b()) >> > ?c() >> > ) >> > >> > The graph would show something like the parallelism in the invocation of >> > the body of parallelFor. And it is quite possible that all a() >> > invocations start well before any of the b() invocations start. The only >> > accurate way to see the effect of the throttle is to trace the b() >> > invocations, which you can probably do by looking at the status of file >> > transfer tasks (by enabling the relevant logging stuff). >> > >> > Mihael From iraicu at cs.iit.edu Thu Dec 9 13:13:55 2010 From: iraicu at cs.iit.edu (Ioan Raicu) Date: Thu, 09 Dec 2010 13:13:55 -0600 Subject: [Swift-user] CFP: Workshop on Data Intensive Computing in the Clouds (DataCloud) 2011, deadline extended to January 3rd, 2011 Message-ID: <4D012A73.7060503@cs.iit.edu> --------------------------------------------------------------------------------- *** Call for Papers *** WORKSHOP ON DATA INTENSIVE COMPUTING IN THE CLOUDS (DATACLOUD 2011) In conjunction with IPDPS 2011, May 16, Anchorage, Alaska http://www.cse.buffalo.edu/faculty/tkosar/datacloud2011/index.php --------------------------------------------------------------------------------- The First International Workshop on Data Intensive Computing in the Clouds (DataCloud2011) will be held in conjunction with the 25th IEEE International Parallel and Distributed Computing Symposium (IPDPS 2011), in Anchorage, Alaska. Applications and experiments in all areas of science are becoming increasingly complex and more demanding in terms of their computational and data requirements. Some applications generate data volumes reaching hundreds of terabytes and even petabytes. As scientific applications become more data intensive, the management of data resources and dataflow between the storage and compute resources is becoming the main bottleneck. Analyzing, visualizing, and disseminating these large data sets has become a major challenge and data intensive computing is now considered as the "fourth paradigm" in scientific discovery after theoretical, experimental, and computational science. DataCloud2011 will provide the scientific community a dedicated forum for discussing new research, development, and deployment efforts in running data-intensive computing workloads on Cloud Computing infrastructures. 
The DataCloud2011 workshop will focus on the use of cloud-based technologies to meet the new data intensive scientific challenges that are not well served by the current supercomputers, grids or compute-intensive clouds. We believe the workshop will be an excellent place to help the community define the current state, determine future goals, and present architectures and services for future clouds supporting data intensive computing. TOPICS --------------------------------------------------------------------------------- - Data-intensive cloud computing applications, characteristics, challenges - Case studies of data intensive computing in the clouds - Performance evaluation of data clouds, data grids, and data centers - Energy-efficient data cloud design and management - Data placement, scheduling, and interoperability in the clouds - Accountability, QoS, and SLAs - Data privacy and protection in a public cloud environment - Distributed file systems for clouds - Data streaming and parallelization - New programming models for data-intensive cloud computing - Scalability issues in clouds - Social computing and massively social gaming - 3D Internet and implications - Future research challenges in data-intensive cloud computing IMPORTANT DATES --------------------------------------------------------------------------------- Paper submission: January 3rd, 2011 Acceptance notification: February 1st, 2011 Final papers due: February 15th, 2011 PAPER SUBMISSION --------------------------------------------------------------------------------- DataCloud2011 invites authors to submit original and unpublished technical papers. All submissions will be peer-reviewed and judged on correctness, originality, technical strength, significance, quality of presentation, and relevance to the workshop topics of interest. Submitted papers may not have appeared in or be under consideration for another workshop, conference or a journal, nor may they be under review or submitted to another forum during the DataCloud2011 review process. Submitted papers may not exceed 10 single-spaced double-column pages using 10-point size font on 8.5x11 inch pages (IEEE conference style, document templates can be found at ftp://pubftp.computer.org/Press/Outgoing/proceedings/instruct8.5x11.pdf and ftp://pubftp.computer.org/Press/Outgoing/proceedings/instruct8.5x11.doc), including figures, tables, and references. The final 10 page papers (PDF format) must be submitted online at https://cmt.research.microsoft.com/DataCloud2011/ before the deadline of January 3rd, 2011 at 11:59PM PST. Authors of the selected DataCloud2011 papers will be invited to submit extended versions of their workshop papers to the Journal of Grid Computing (published by Springer), Special Issue on "Data Intensive Computing in the Clouds." 
WORKSHOP and PROGRAM CHAIRS --------------------------------------------------------------------------------- Tevfik Kosar, University at Buffalo Ioan Raicu, Illinois Institute of Technology STEERING COMMITTEE --------------------------------------------------------------------------------- Ian Foster, Univ of Chicago & Argonne National Lab Geoffrey Fox, Indiana University James Hamilton, Amazon Web Services Manish Parashar, Rutgers University & NSF Dan Reed, Microsoft Research Rich Wolski, University of California, Santa Barbara Liang-Jie Zhang, IBM Research PROGRAM COMMITTEE --------------------------------------------------------------------------------- David Abramson, Monash University, Australia Roger Barga, Microsoft Research John Bent, Los Alamos National Laboratory Umit Catalyurek, Ohio State University Abhishek Chandra, University of Minnesota Rong N. Chang, IBM Research Alok Choudhary, Northwestern University Brian Cooper, Google Ewa Deelman, University of Southern California Murat Demirbas, University at Buffalo Adriana Iamnitchi, University of South Florida Maria Indrawan, Monash University, Australia Alexandru Iosup, Delft University of Technology, Netherlands Peter Kacsuk, Hungarian Academy of Sciences, Hungary Dan Katz, University of Chicago Steven Ko, University at Buffalo Gregor von Laszewski, Rochester Institute of Technology Erwin Laure, CERN, Switzerland Ignacio Llorente, Universidad Complutense de Madrid, Spain Reagan Moore, University of North Carolina Lavanya Ramakrishnan, Lawrence Berkeley National Laboratory Florian Schintke, Zuse Institute Berlin Ian Taylor, Cardiff University, UK Douglas Thain, University of Notre Dame Bernard Traversat, Oracle Yong Zhao, Univ of Electronic Science & Tech of China -- ================================================================= Ioan Raicu, Ph.D. Assistant Professor ================================================================= Computer Science Department Illinois Institute of Technology 10 W. 31st Street Stuart Building, Room 237D Chicago, IL 60616 ================================================================= Cel: 1-847-722-0876 Office: 1-312-567-5704 Email: iraicu at cs.iit.edu Web: http://www.cs.iit.edu/~iraicu/ ================================================================= ================================================================= From iraicu at cs.iit.edu Thu Dec 9 15:53:54 2010 From: iraicu at cs.iit.edu (Ioan Raicu) Date: Thu, 09 Dec 2010 15:53:54 -0600 Subject: [Swift-user] CFP: 2nd ACM Workshop on Scientific Cloud Computing (ScienceCloud) 2011, co-located with HPDC 2011 Message-ID: <4D014FF2.9000306@cs.iit.edu> --------------------------------------------------------------------------------- * ** Call for Papers *** 2nd Workshop on Scientific Cloud Computing (ScienceCloud) 2011 In conjunction with ACM HPDC 2011, June 8th, 2011, San Jose, California http://www.cs.iit.edu/~iraicu/ScienceCloud2011/ --------------------------------------------------------------------------------- The advent of computation can be compared, in terms of the breadth and depth of its impact on research and scholarship, to the invention of writing and the development of modern mathematics. Scientific Computing has already begun to change how science is done, enabling scientific breakthroughs through new kinds of experiments that would have been impossible only a decade ago. 
Today's science is generating datasets that are increasing exponentially in both complexity and volume, making their analysis, archival, and sharing one of the grand challenges of the 21st century. The support for data intensive computing is critical to advancing modern science as storage systems have experienced an increasing gap between their capacity and bandwidth by more than 10-fold over the last decade. There is an emerging need for advanced techniques to manipulate, visualize and interpret large datasets. Scientific computing involves a broad range of technologies, from high-performance computing (HPC) which is heavily focused on compute-intensive applications, high-throughput computing (HTC) which focuses on using many computing resources over long periods of time to accomplish its computational tasks, many-task computing (MTC) which aims to bridge the gap between HPC and HTC by focusing on using many resources over short periods of time, to data-intensive computing which is heavily focused on data distribution and harnessing data locality by scheduling of computations close to the data. The 2nd workshop on Scientific Cloud Computing (ScienceCloud) will provide the scientific community a dedicated forum for discussing new research, development, and deployment efforts in running these kinds of scientific computing workloads on Cloud Computing infrastructures. The ScienceCloud workshop will focus on the use of cloud-based technologies to meet new compute intensive and data intensive scientific challenges that are not well served by the current supercomputers, grids or commercial clouds. What architectural changes to the current cloud frameworks (hardware, operating systems, networking and/or programming models) are needed to support science? Dynamic information derived from remote instruments and coupled simulation and sensor ensembles are both important new science pathways and tremendous challenges for current HPC/HTC/MTC technologies. How can cloud technologies enable these new scientific approaches? How are scientists using clouds? Are there scientific HPC/HTC/MTC workloads that are suitable candidates to take advantage of emerging cloud computing resources with high efficiency? What benefits exist by adopting the cloud model, over clusters, grids, or supercomputers? What factors are limiting clouds use or would make them more usable/efficient? This workshop encourages interaction and cross-pollination between those developing applications, algorithms, software, hardware and networking, emphasizing scientific computing for such cloud platforms. We believe the workshop will be an excellent place to help the community define the current state, determine future goals, and define architectures and services for future science clouds. For more information about the workshop, please see http://www.cs.iit.edu/~iraicu/ScienceCloud2011/. To see last year's workshop program agenda, and accepted papers and presentations, please see http://dsl.cs.uchicago.edu/ScienceCloud2010/. 
TOPICS --------------------------------------------------------------------------------- # scientific computing applications * case studies on public, private and open source cloud computing * case studies comparing between cloud computing and cluster, grids, and/or supercomputers * performance evaluation # performance evaluation * real systems * cloud computing benchmarks * reliability of large systems # programming models and tools * map-reduce and its generalizations * many-task computing middleware and applications * integrating parallel programming frameworks with storage clouds * message passing interface (MPI) * service-oriented science applications # storage cloud architectures and implementations * distributed file systems * content distribution systems for large data * data caching frameworks and techniques * data management within and across data centers * data streaming applications * data-aware scheduling * data-intensive computing applications * eventual-consistency storage usage and management # compute resource management * dynamic resource provisioning * scheduling * techniques to manage many-core resources and/or GPUs # high-performance computing * high-performance I/O systems * interconnect and network interface architectures for HPC * multi-gigabit wide-area networking * scientific computing tradeoffs between clusters/grids/supercomputers and clouds * parallel file systems in dynamic environments # models, frameworks and systems for cloud security * implementation of access control and scalable isolation IMPORTANT DATES --------------------------------------------------------------------------------- Abstract submission: January 25th, 2011 Paper submission: February 1st, 2011 Acceptance notification: February 28th, 2011 Final papers due: March 24th, 2011 Workshop date: June 8th, 2011 PAPER SUBMISSION --------------------------------------------------------------------------------- Authors are invited to submit papers with unpublished, original work of not more than 10 pages of double column text using single spaced 10 point size on 8.5 x 11 inch pages (including all text, figures, and references), as per ACM 8.5 x 11 manuscript guidelines (http://www.acm.org/publications/instructions_for_proceedings_volumes); document templates can be found at http://www.acm.org/sigs/publications/proceedings-templates. A 250 word abstract (PDF format) must be submitted online at https://cmt.research.microsoft.com/ScienceCloud2011/ before the deadline of January 25th, 2011 at 11:59PM PST; the final 5/10 page papers in PDF format will be due on February 1st, 2011 at 11:59PM PST. Papers will be peer-reviewed, and accepted papers will be published in the workshop proceedings as part of the ACM digital library. Notifications of the paper decisions will be sent out by February 28th, 2011. Selected excellent work will be invited to submit extended versions of the workshop paper to a special issue journal. Submission implies the willingness of at least one of the authors to register and present the paper. For more information, please visit http://www.cs.iit.edu/~iraicu/ScienceCloud2011/. 
WORKSHOP GENERAL CHAIRS --------------------------------------------------------------------------------- * Ioan Raicu, Illinois Institute of Technology * Pete Beckman, University of Chicago & Argonne National Laboratory * Ian Foster, University of Chicago & Argonne National Laboratory PROGRAM CHAIR --------------------------------------------------------------------------------- Yogesh Simmhan, University of Southern California STEERING COMMITTEE --------------------------------------------------------------------------------- * Dennis Gannon, Microsoft Research, USA * Robert Grossman, University of Chicago, USA * Kate Keahey, Nimbus, University of Chicago, Argonne National Laboratory, USA * Ed Lazowska, University of Washington & Computing Community Consortium, USA * Ignacio Llorente, Open Nebula, Universidad Complutense de Madrid, Spain * David O'Hallaron, Carnegie Mellon University & Intel Labs, USA * Jack Dongarra, University of Tennessee, USA * Geoffrey Fox, Indiana University, USA PROGRAM COMMITTEE --------------------------------------------------------------------------------- * Remzi Arpaci-Dusseau, University of Wisconsin, Madison * Roger Barga, Microsoft Research * Jeff Broughton, Lawrence Berkeley National Lab. * Rajkumar Buyya, University of Melbourne, Australia * Roy Campbell, Univ. of Illinois at Urbana Champaign * Henri Casanova, University of Hawaii at Manoa * Jeff Chase, Duke University * Alok Choudhary, Northwestern University * Bill Howe, University of Washington * Alexandru Iosup, Delft University of Technology, Netherlands * Shantenu Jha, Louisiana State University * Tevfik Kosar, Louisiana State University * Shiyong Lu, Wayne State University * Joe Mambretti, Northwestern University * David Martin, Argonne National Laboratory * Paolo Missier, University of Manchester, UK * Ruben Montero, Univ. Complutense de Madrid, Spain * Reagan Moore, Univ. of North Carolina, Chappel Hill * Jose Moreira, IBM Research * Jim Myers, NCSA * Viktor Prasanna, University of Southern California * Lavanya Ramakrishnan, Lawrence Berkeley Nat. Lab. * Matei Ripeanu, University of British Columbia, Canada * Josh Simons, VMWare * Marc Snir, University of Illinois at Urbana Champaign * Ion Stoica, University of California Berkeley * Daniel Zinn, University of California at Davis -- ================================================================= Ioan Raicu, Ph.D. Assistant Professor ================================================================= Computer Science Department Illinois Institute of Technology 10 W. 
31st Street Stuart Building, Room 237D Chicago, IL 60616 ================================================================= Cel: 1-847-722-0876 Office: 1-312-567-5704 Email: iraicu at cs.iit.edu Web: http://www.cs.iit.edu/~iraicu/ ================================================================= ================================================================= From iraicu at cs.iit.edu Thu Dec 9 16:45:37 2010 From: iraicu at cs.iit.edu (Ioan Raicu) Date: Thu, 09 Dec 2010 16:45:37 -0600 Subject: [Swift-user] CFP: The 20th International ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC) 2011 Message-ID: <4D015C11.2040909@cs.iit.edu> Call For Papers The 20th International ACM Symposium on High-Performance Parallel and Distributed Computing http://www.hpdc.org/2011/ San Jose, California, June 8-11, 2011 The ACM International Symposium on High-Performance Parallel and Distributed Computing is the premier conference for presenting the latest research on the design, implementation, evaluation, and use of parallel and distributed systems for high end computing. The 20th installment of HPDC will take place in San Jose, California, in the heart of Silicon Valley. This year, HPDC is affiliated with the ACM Federated Computing Research Conference, consisting of fifteen leading ACM conferences all in one week. HPDC will be held on June 9-11 (Thursday through Saturday) with affiliated workshops taking place on June 8th (Wednesday). Submissions are welcomed on all forms of high performance parallel and distributed computing, including but not limited to clusters, clouds, grids, utility computing, data-intensive computing, multicore and parallel computing. All papers will be reviewed by a distinguished program committee, with a strong preference for rigorous results obtained in operational parallel and distributed systems. All papers will be evaluated for correctness, originality, potential impact, quality of presentation, and interest and relevance to the conference. In addition to traditional technical papers, we also invite experience papers. Such papers should present operational details of a production high end system or application, and draw out conclusions gained from operating the system or application. The evaluation of experience papers will place a greater weight on the real-world impact of the system and the value of conclusions to future system designs. Topics of interest include, but are not limited to: ------------------------------------------------------------------------------- # Applications of parallel and distributed computing. # Systems, networks, and architectures for high end computing. # Parallel and multicore issues and opportunities. # Virtualization of machines, networks, and storage. # Programming languages and environments. # I/O, file systems, and data management. # Data intensive computing. # Resource management, scheduling, and load-balancing. # Performance modeling, simulation, and prediction. # Fault tolerance, reliability and availability. # Security, configuration, policy, and management issues. # Models and use cases for utility, grid, and cloud computing. Authors are invited to submit technical papers of at most 12 pages in PDF format, including all figures and references. Papers should be formatted in the ACM Proceedings Style and submitted via the conference web site. Accepted papers will appear in the conference proceedings, and will be incorporated into the ACM Digital Library. 
Papers must be self-contained and provide the technical substance required for the program committee to evaluate the paper's contribution. Papers should thoughtfully address all related work, particularly work presented at previous HPDC events. Submitted papers must be original work that has not appeared in and is not under consideration for another conference or a journal. See the ACM Prior Publication Policy for more details. Workshops ------------------------------------------------------------------------------- Seven workshops affiliated with HPDC will be held on Wednesday, June 8th. For more information, see the Workshops page at http://www.hpdc.org/2011/workshops.php. # ScienceCloud: 2nd Workshop on Scientific Cloud Computing # MapReduce: The Second International Workshop on MapReduce and its Applications # VTDC: Virtual Technologies in Distributed Computing # ECMLS: The Second International Emerging Computational Methods for the Life Sciences Workshop # LSAP: Workshop on Large-Scale System and Application Performance # DIDC: The Fourth International Workshop on Data-Intensive Distributed Computing # 3DAPAS: Workshop on Dynamic Distributed Data-Intensive Applications, Programming Abstractions, and Systems Important Dates ------------------------------------------------------------------------------- Technical Papers Due: 17 January 2011 PAPER DEADLINE EXTENDED: 24 January 2011 at 12:01 PM (NOON) Eastern Time Author Notifications: 28 February 2011 Final Papers Due: 24 March 2011 Conference Dates: 8-11 June 2011 Organization ------------------------------------------------------------------------------- General Chair Barney Maccabe, Oak Ridge National Laboratory Program Chair Douglas Thain, University of Notre Dame Workshops Chair Mike Lewis, Binghamton University Local Arrangements Chair Nick Wright, Lawrence Berkeley National Laboratory Student Activities Chairs Huaiming Song, Illinois Institute of Technology Hui Jin, Illinois Institute of Technology Publicity Chairs Alexandru Iosup, Delft University John Lange, University of Pittsburgh Ioan Raicu, Illinois Institute of Technology Yong Zhao, Microsoft Program Committee Kento Aida, National Institute of Informatics Henri Bal, Vrije Universiteit Roger Barga, Microsoft Jim Basney, NCSA John Bent, Los Alamos National Laboratory Ron Brightwell, Sandia National Laboratories Shawn Brown, Pittsburgh Supercomputer Center Claris Castillo, IBM Andrew A. Chien, UC San Diego and SDSC Ewa Deelman, USC Information Sciences Institute Peter Dinda, Northwestern University Scott Emrich, University of Notre Dame Dick Epema, Delft University of Technology Gilles Fedak, INRIA Renato Figuierdo, University of Florida Ian Foster, University of Chicago and Argonne National Laboratory Gabriele Garzoglio, Fermi National Accelerator Laboratory Rong Ge, Marquette University Sebastien Goasguen, Clemson University Kartik Gopalan, Binghamton University Dean Hildebrand, IBM Almaden Adriana Iamnitchi, University of South Florida Alexandru Iosup, Delft University of Technology Keith Jackson, Lawrence Berkeley Shantenu Jha, Louisiana State University Daniel S. 
Katz, University of Chicago and Argonne National Laboratory Thilo Kielmann, Vrije Universiteit Charles Killian, Purdue University Tevfik Kosar, Louisiana State University John Lange, University of Pittsburgh Mike Lewis, Binghamton University Barney Maccabe, Oak Ridge National Laboratory Grzegorz Malewicz, Google Satoshi Matsuoka, Tokyo Institute of Technology Jarek Nabrzyski, University of Notre Dame Manish Parashar, Rutgers University Beth Plale, Indiana University Ioan Raicu, Illinois Institute of Technology Philip Rhodes, University of Mississippi Matei Ripeanu, University of British Columbia Philip Roth, Oak Ridge National Laboratory Karsten Schwan, Georgia Tech Martin Swany, University of Delaware Jon Weissman, University of Minnesota Dongyan Xu, Purdue University Ken Yocum, UC San Diego Yong Zhao, Microsoft Steering Committee Henri Bal, Vrije Universiteit Andrew A. Chien, UC San Diego and SDSC Peter Dinda, Northwestern University Ian Foster, Argonne National Laboratory and University of Chicago Dennis Gannon, Microsoft Salim Hariri, University of Arizona Dieter Kranzlmueller, Ludwig-Maximilians-Univ. Muenchen Satoshi Matsuoka, Tokyo Institute of Technology Manish Parashar, Rutgers University Karsten Schwan, Georgia Tech Jon Weissman, University of Minnesota (Chair) -- ================================================================= Ioan Raicu, Ph.D. Assistant Professor ================================================================= Computer Science Department Illinois Institute of Technology 10 W. 31st Street Stuart Building, Room 237D Chicago, IL 60616 ================================================================= Cell: 1-847-722-0876 Office: 1-312-567-5704 Email: iraicu at cs.iit.edu Web: http://www.cs.iit.edu/~iraicu/ ================================================================= ================================================================= From aespinosa at cs.uchicago.edu Thu Dec 9 21:01:08 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Thu, 9 Dec 2010 21:01:08 -0600 Subject: [Swift-user] Re: 3rd party transfers In-Reply-To: References: Message-ID: I tried to make the tests more synthetic, using Mike's catsall workflow to stage ~3 MB data files in to 5 OSG sites. Swift seems to handle the transfers well when the originating files are local, but when it starts to use remote file objects I get all of these 3rd party transfer exceptions. My throttle for file transfers is 8 and for file operations is 10.
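For reference, those two throttles correspond to entries in swift.properties (or a file passed to swift via -config); a minimal sketch, assuming the stock property names rather than my exact file:

  # limit concurrent file transfers and concurrent remote file operations (e.g. mkdir/stat)
  throttle.transfers=8
  throttle.file.operations=10

Raising either value puts proportionally more simultaneous load on the GridFTP servers involved.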
2010-12-09 18:58:16,700-0600 DEBUG DelegatedFileTransferHandler File transfer with resource remote->tmp 2010-12-09 18:58:16,734-0600 DEBUG DelegatedFileTransferHandler Exception in transfer org.globus.cog.abstraction.impl.file.IrrecoverableResourceException: Exception in getFile at org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(AbstractFTP FileResource.java:62) at org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getFile(FileResourceImpl.java :401) at org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doSource(DelegatedFil eTransferHandler.java:269) at org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doSource(Cachi ngDelegatedFileTransferHandler.java:59) at org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFileTran sferHandler.java:486) at java.lang.Thread.run(Thread.java:619) Caused by: org.globus.cog.abstraction.impl.file.FileResourceException: Failed to retrieve file information about /projsmall/osg/data/engage/scec/swift_scratch/catsall-20101209-1839-pnazhid6/info/p/cat-p2em5s2k-in fo at org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(AbstractFTP FileResource.java:51) at org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getGridFile(FileResourceImpl. java:550) at org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getFile(FileResourceImpl.java :384) ... 4 more Caused by: org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message : Server refused MLST command (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed : globus_gridftp_server_file.c:globus_l_gfs_file_stat:389: 500-System error in stat: No such file or directory 500-A system call failed: No such file or directory 500 End.] [Nested exception is org.globus.ftp.exception.UnexpectedReplyCodeException: Custom message: Une xpected reply: 500-Command failed : globus_gridftp_server_file.c:globus_l_gfs_file_stat:389: 500-System error in stat: No such file or directory 500-A system call failed: No such file or directory 500 End.] at org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerException.java :101) at org.globus.ftp.FTPClient.mlst(FTPClient.java:643) at org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getGridFile(FileResourceImpl. java:546) ... 5 more I may have been stressing the source gridftp server too much (pads) that it cannot handle a throttle of 8 . But at this configuration, I get low transfer performance. When doing direct transfers, I was able to get better transfer rates until i start coking out gpfs at 10k stageins. My throttle for this configurations was 40 for both file transfers and file operations. 2010/12/2 Allan Espinosa : > I have ?a bunch of 3rd party gridftp transfers. ? Swift reports around > 10k jobs being in the vdl:stagein at a time. ?After a while i get a > couple of these errors. ?Does it look like i'm stressing the gridftp > servers? my throttle.transfers=8 > > 2010-12-02 02:22:06,008-0600 DEBUG DelegatedFileTransferHandler > Starting service on gsiftp://gpn-hus > 2010-12-02 02:22:06,008-0600 DEBUG DelegatedFileTransferHandler File > transfer with resource local->r > 2010-12-02 02:22:06,247-0600 DEBUG DelegatedFileTransferHandler > Exception in transfer > org.globus.cog.abstraction.impl.file.FileResourceException > ? ? ? 
?at org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(Abstr > esource.java:51) > ? ? ? ?at org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(Abstr > esource.java:34) > ? ? ? ?at org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > ? ? ? ?at org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > eTransferHandler.java:352) > ? ? ? ?at org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > ngDelegatedFileTransferHandler.java:46) > ? ? ? ?at org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > andler.java:489) > ? ? ? ?at java.lang.Thread.run(Thread.java:619) > Caused by: org.globus.ftp.exception.ServerException: Server refused > performing the request. Custom m > rror code 1) [Nested exception message: ?Custom message: Unexpected > reply: 451 ocurred during retrie > org.globus.ftp.exception.DataChannelException: setPassive() must match > store() and setActive() - ret > rror code 2) > org.globus.ftp.exception.DataChannelException: setPassive() must match > store() and setActive() - ret > rror code 2) > ? ? ? ?at org.globus.ftp.extended.GridFTPServerFacade.retrieve(GridFTPServerFacade.java:469) > ? ? ? ?at org.globus.ftp.FTPClient.put(FTPClient.java:1294) > ? ? ? ?at org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > ? ? ? ?at org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > eTransferHandler.java:352) > ? ? ? ?at org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > ngDelegatedFileTransferHandler.java:46) > ? ? ? ?at org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > andler.java:489) > ? ? ? ?at java.lang.Thread.run(Thread.java:619) > ] [Nested exception is > org.globus.ftp.exception.UnexpectedReplyCodeException: ?Custom > message: Unexp > : 451 ocurred during retrieve() > org.globus.ftp.exception.DataChannelException: setPassive() must match > store() and setActive() - ret > rror code 2) > org.globus.ftp.exception.DataChannelException: setPassive() must match > store() and setActive() - ret > rror code 2) > ? ? ? ?at org.globus.ftp.extended.GridFTPServerFacade.retrieve(GridFTPServerFacade.java:469) > ? ? ? ?at org.globus.ftp.FTPClient.put(FTPClient.java:1294) > ? ? ? ?at org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > ? ? ? ?at org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > eTransferHandler.java:352) > ? ? ? ?at org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > ngDelegatedFileTransferHandler.java:46) > ? ? ? ?at org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > andler.java:489) > ? ? ? ?at java.lang.Thread.run(Thread.java:619) > ] > ? ? ? ?at org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerExceptio > ? ? ? ?at org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerExceptio > ? ? ? ?at org.globus.ftp.vanilla.TransferMonitor.run(TransferMonitor.java:195) > ? ? ? ?... 1 more > -- Allan M. 
Espinosa PhD student, Computer Science University of Chicago From wilde at mcs.anl.gov Fri Dec 10 09:09:15 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 10 Dec 2010 09:09:15 -0600 (CST) Subject: [Swift-user] Re: 3rd party transfers In-Reply-To: Message-ID: <342749191.6884.1291993755279.JavaMail.root@zimbra.anl.gov> Allan, did you verify that each remote site you are talking to in this test is functional at low transaction rates using your current sites configuration? I.e., are you certain that the error below is due to load and not a site-related error? - Mike ----- Original Message ----- > I tried to have the tests more synthesized using Mike's catsall > workflow staging in ~3 MB data files to 5 OSG sites. Swift seem to > handle the transfer well when the originating files are local. But > when it starts to use remote file objects, I get all these 3rd party > transfer exceptions. my throttle for file transfers is 8 and for file > operations is 10. > > 2010-12-09 18:58:16,700-0600 DEBUG DelegatedFileTransferHandler File > transfer with resource remote->tmp > 2010-12-09 18:58:16,734-0600 DEBUG DelegatedFileTransferHandler > Exception in transfer > org.globus.cog.abstraction.impl.file.IrrecoverableResourceException: > Exception in getFile > at > org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(AbstractFTP > FileResource.java:62) > at > org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getFile(FileResourceImpl.java > :401) > at > org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doSource(DelegatedFil > eTransferHandler.java:269) > at > org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doSource(Cachi > ngDelegatedFileTransferHandler.java:59) > at > org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFileTran > sferHandler.java:486) > at java.lang.Thread.run(Thread.java:619) > Caused by: org.globus.cog.abstraction.impl.file.FileResourceException: > Failed to retrieve file information > about > /projsmall/osg/data/engage/scec/swift_scratch/catsall-20101209-1839-pnazhid6/info/p/cat-p2em5s2k-in > fo > at > org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(AbstractFTP > FileResource.java:51) > at > org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getGridFile(FileResourceImpl. > java:550) > at > org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getFile(FileResourceImpl.java > :384) > ... 4 more > Caused by: org.globus.ftp.exception.ServerException: Server refused > performing the request. Custom message > : Server refused MLST command (error code 1) [Nested exception > message: Custom message: Unexpected reply: > 500-Command failed : > globus_gridftp_server_file.c:globus_l_gfs_file_stat:389: > 500-System error in stat: No such file or directory > 500-A system call failed: No such file or directory > 500 End.] [Nested exception is > org.globus.ftp.exception.UnexpectedReplyCodeException: Custom > message: Une > xpected reply: 500-Command failed : > globus_gridftp_server_file.c:globus_l_gfs_file_stat:389: > 500-System error in stat: No such file or directory > 500-A system call failed: No such file or directory > 500 End.] 
> at > org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerException.java > :101) > at org.globus.ftp.FTPClient.mlst(FTPClient.java:643) > at > org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getGridFile(FileResourceImpl. > java:546) > ... 5 more > > > I may have been stressing the source gridftp server too much (pads) > that it cannot handle a throttle of 8 . But at this configuration, I > get low transfer performance. When doing direct transfers, I was able > to get better transfer rates until i start coking out gpfs at 10k > stageins. My throttle for this configurations was 40 for both file > transfers and file operations. > > > 2010/12/2 Allan Espinosa : > > I have a bunch of 3rd party gridftp transfers. Swift reports around > > 10k jobs being in the vdl:stagein at a time. After a while i get a > > couple of these errors. Does it look like i'm stressing the gridftp > > servers? my throttle.transfers=8 > > > > 2010-12-02 02:22:06,008-0600 DEBUG DelegatedFileTransferHandler > > Starting service on gsiftp://gpn-hus > > 2010-12-02 02:22:06,008-0600 DEBUG DelegatedFileTransferHandler File > > transfer with resource local->r > > 2010-12-02 02:22:06,247-0600 DEBUG DelegatedFileTransferHandler > > Exception in transfer > > org.globus.cog.abstraction.impl.file.FileResourceException > > ? ? ? ?at > > ? ? ? ?org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(Abstr > > esource.java:51) > > ? ? ? ?at > > ? ? ? ?org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(Abstr > > esource.java:34) > > ? ? ? ?at > > ? ? ? ?org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > > ? ? ? ?at > > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > > eTransferHandler.java:352) > > ? ? ? ?at > > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > > ngDelegatedFileTransferHandler.java:46) > > ? ? ? ?at > > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > > andler.java:489) > > ? ? ? ?at java.lang.Thread.run(Thread.java:619) > > Caused by: org.globus.ftp.exception.ServerException: Server refused > > performing the request. Custom m > > rror code 1) [Nested exception message: Custom message: Unexpected > > reply: 451 ocurred during retrie > > org.globus.ftp.exception.DataChannelException: setPassive() must > > match > > store() and setActive() - ret > > rror code 2) > > org.globus.ftp.exception.DataChannelException: setPassive() must > > match > > store() and setActive() - ret > > rror code 2) > > ? ? ? ?at > > ? ? ? ?org.globus.ftp.extended.GridFTPServerFacade.retrieve(GridFTPServerFacade.java:469) > > ? ? ? ?at org.globus.ftp.FTPClient.put(FTPClient.java:1294) > > ? ? ? ?at > > ? ? ? ?org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > > ? ? ? ?at > > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > > eTransferHandler.java:352) > > ? ? ? ?at > > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > > ngDelegatedFileTransferHandler.java:46) > > ? ? ? ?at > > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > > andler.java:489) > > ? ? ? 
?at java.lang.Thread.run(Thread.java:619) > > ] [Nested exception is > > org.globus.ftp.exception.UnexpectedReplyCodeException: Custom > > message: Unexp > > : 451 ocurred during retrieve() > > org.globus.ftp.exception.DataChannelException: setPassive() must > > match > > store() and setActive() - ret > > rror code 2) > > org.globus.ftp.exception.DataChannelException: setPassive() must > > match > > store() and setActive() - ret > > rror code 2) > > ? ? ? ?at > > ? ? ? ?org.globus.ftp.extended.GridFTPServerFacade.retrieve(GridFTPServerFacade.java:469) > > ? ? ? ?at org.globus.ftp.FTPClient.put(FTPClient.java:1294) > > ? ? ? ?at > > ? ? ? ?org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > > ? ? ? ?at > > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > > eTransferHandler.java:352) > > ? ? ? ?at > > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > > ngDelegatedFileTransferHandler.java:46) > > ? ? ? ?at > > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > > andler.java:489) > > ? ? ? ?at java.lang.Thread.run(Thread.java:619) > > ] > > ? ? ? ?at > > ? ? ? ?org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerExceptio > > ? ? ? ?at > > ? ? ? ?org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerExceptio > > ? ? ? ?at > > ? ? ? ?org.globus.ftp.vanilla.TransferMonitor.run(TransferMonitor.java:195) > > ? ? ? ?... 1 more > > > > > > -- > Allan M. Espinosa > PhD student, Computer Science > University of Chicago > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From aespinosa at cs.uchicago.edu Fri Dec 10 10:13:05 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Fri, 10 Dec 2010 10:13:05 -0600 Subject: [Swift-user] Re: 3rd party transfers In-Reply-To: <342749191.6884.1291993755279.JavaMail.root@zimbra.anl.gov> References: <342749191.6884.1291993755279.JavaMail.root@zimbra.anl.gov> Message-ID: Hi Mike. Yes. I had the workflow stagein 1, 10, 40 , 80, 400, 800, 2000, 8000, 30000 files. The throttles are the same for each run. Problems started to occur at around 800 files . For staging in local files, problems started to occur at 30000 files where vdl:dostagein hits gpfs too much. -Allan 2010/12/10 Michael Wilde : > Allan, did you verify that each remote site you are talking to in this test is functional at low transaction rates using your current sites configuration? > > I.e., are you certain that the error below is due to load and not a site-related error? > > - Mike > > > ----- Original Message ----- >> I tried to have the tests more synthesized using Mike's catsall >> workflow staging in ~3 MB data files to 5 OSG sites. Swift seem to >> handle the transfer well when the originating files are local. But >> when it starts to use remote file objects, I get all these 3rd party >> transfer exceptions. my throttle for file transfers is 8 and for file >> operations is 10. 
>> >> 2010-12-09 18:58:16,700-0600 DEBUG DelegatedFileTransferHandler File >> transfer with resource remote->tmp >> 2010-12-09 18:58:16,734-0600 DEBUG DelegatedFileTransferHandler >> Exception in transfer >> org.globus.cog.abstraction.impl.file.IrrecoverableResourceException: >> Exception in getFile >> at >> org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(AbstractFTP >> FileResource.java:62) >> at >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getFile(FileResourceImpl.java >> :401) >> at >> org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doSource(DelegatedFil >> eTransferHandler.java:269) >> at >> org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doSource(Cachi >> ngDelegatedFileTransferHandler.java:59) >> at >> org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFileTran >> sferHandler.java:486) >> at java.lang.Thread.run(Thread.java:619) >> Caused by: org.globus.cog.abstraction.impl.file.FileResourceException: >> Failed to retrieve file information >> about >> /projsmall/osg/data/engage/scec/swift_scratch/catsall-20101209-1839-pnazhid6/info/p/cat-p2em5s2k-in >> fo >> at >> org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(AbstractFTP >> FileResource.java:51) >> at >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getGridFile(FileResourceImpl. >> java:550) >> at >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getFile(FileResourceImpl.java >> :384) >> ... 4 more >> Caused by: org.globus.ftp.exception.ServerException: Server refused >> performing the request. Custom message >> : Server refused MLST command (error code 1) [Nested exception >> message: Custom message: Unexpected reply: >> 500-Command failed : >> globus_gridftp_server_file.c:globus_l_gfs_file_stat:389: >> 500-System error in stat: No such file or directory >> 500-A system call failed: No such file or directory >> 500 End.] [Nested exception is >> org.globus.ftp.exception.UnexpectedReplyCodeException: Custom >> message: Une >> xpected reply: 500-Command failed : >> globus_gridftp_server_file.c:globus_l_gfs_file_stat:389: >> 500-System error in stat: No such file or directory >> 500-A system call failed: No such file or directory >> 500 End.] >> at >> org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerException.java >> :101) >> at org.globus.ftp.FTPClient.mlst(FTPClient.java:643) >> at >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getGridFile(FileResourceImpl. >> java:546) >> ... 5 more >> >> >> I may have been stressing the source gridftp server too much (pads) >> that it cannot handle a throttle of 8 . But at this configuration, I >> get low transfer performance. When doing direct transfers, I was able >> to get better transfer rates until i start coking out gpfs at 10k >> stageins. My throttle for this configurations was 40 for both file >> transfers and file operations. >> >> >> 2010/12/2 Allan Espinosa : >> > I have a bunch of 3rd party gridftp transfers. Swift reports around >> > 10k jobs being in the vdl:stagein at a time. After a while i get a >> > couple of these errors. Does it look like i'm stressing the gridftp >> > servers? 
my throttle.transfers=8 >> > >> > 2010-12-02 02:22:06,008-0600 DEBUG DelegatedFileTransferHandler >> > Starting service on gsiftp://gpn-hus >> > 2010-12-02 02:22:06,008-0600 DEBUG DelegatedFileTransferHandler File >> > transfer with resource local->r >> > 2010-12-02 02:22:06,247-0600 DEBUG DelegatedFileTransferHandler >> > Exception in transfer >> > org.globus.cog.abstraction.impl.file.FileResourceException >> > ? ? ? ?at >> > ? ? ? ?org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(Abstr >> > esource.java:51) >> > ? ? ? ?at >> > ? ? ? ?org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(Abstr >> > esource.java:34) >> > ? ? ? ?at >> > ? ? ? ?org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp >> > ? ? ? ?at >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D >> > eTransferHandler.java:352) >> > ? ? ? ?at >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin >> > ngDelegatedFileTransferHandler.java:46) >> > ? ? ? ?at >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi >> > andler.java:489) >> > ? ? ? ?at java.lang.Thread.run(Thread.java:619) >> > Caused by: org.globus.ftp.exception.ServerException: Server refused >> > performing the request. Custom m >> > rror code 1) [Nested exception message: Custom message: Unexpected >> > reply: 451 ocurred during retrie >> > org.globus.ftp.exception.DataChannelException: setPassive() must >> > match >> > store() and setActive() - ret >> > rror code 2) >> > org.globus.ftp.exception.DataChannelException: setPassive() must >> > match >> > store() and setActive() - ret >> > rror code 2) >> > ? ? ? ?at >> > ? ? ? ?org.globus.ftp.extended.GridFTPServerFacade.retrieve(GridFTPServerFacade.java:469) >> > ? ? ? ?at org.globus.ftp.FTPClient.put(FTPClient.java:1294) >> > ? ? ? ?at >> > ? ? ? ?org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp >> > ? ? ? ?at >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D >> > eTransferHandler.java:352) >> > ? ? ? ?at >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin >> > ngDelegatedFileTransferHandler.java:46) >> > ? ? ? ?at >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi >> > andler.java:489) >> > ? ? ? ?at java.lang.Thread.run(Thread.java:619) >> > ] [Nested exception is >> > org.globus.ftp.exception.UnexpectedReplyCodeException: Custom >> > message: Unexp >> > : 451 ocurred during retrieve() >> > org.globus.ftp.exception.DataChannelException: setPassive() must >> > match >> > store() and setActive() - ret >> > rror code 2) >> > org.globus.ftp.exception.DataChannelException: setPassive() must >> > match >> > store() and setActive() - ret >> > rror code 2) >> > ? ? ? ?at >> > ? ? ? ?org.globus.ftp.extended.GridFTPServerFacade.retrieve(GridFTPServerFacade.java:469) >> > ? ? ? ?at org.globus.ftp.FTPClient.put(FTPClient.java:1294) >> > ? ? ? ?at >> > ? ? ? ?org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp >> > ? ? ? ?at >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D >> > eTransferHandler.java:352) >> > ? ? ? ?at >> > ? ? ? 
?org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin >> > ngDelegatedFileTransferHandler.java:46) >> > ? ? ? ?at >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi >> > andler.java:489) >> > ? ? ? ?at java.lang.Thread.run(Thread.java:619) >> > ] >> > ? ? ? ?at >> > ? ? ? ?org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerExceptio >> > ? ? ? ?at >> > ? ? ? ?org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerExceptio >> > ? ? ? ?at >> > ? ? ? ?org.globus.ftp.vanilla.TransferMonitor.run(TransferMonitor.java:195) >> > ? ? ? ?... 1 more >> > -- Allan M. Espinosa PhD student, Computer Science University of Chicago From wilde at mcs.anl.gov Fri Dec 10 10:17:58 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 10 Dec 2010 10:17:58 -0600 (CST) Subject: [Swift-user] Re: 3rd party transfers In-Reply-To: Message-ID: <1295051624.8206.1291997878173.JavaMail.root@zimbra.anl.gov> Did you try provider staging, which might be easier to throttle given that the staging endpoints are more under Swift's control? - MIke ----- Original Message ----- > Hi Mike. > > Yes. I had the workflow stagein 1, 10, 40 , 80, 400, 800, 2000, 8000, > 30000 files. The throttles are the same for each run. Problems > started to occur at around 800 files . > > For staging in local files, problems started to occur at 30000 files > where vdl:dostagein hits gpfs too much. > > -Allan > > > 2010/12/10 Michael Wilde : > > Allan, did you verify that each remote site you are talking to in > > this test is functional at low transaction rates using your current > > sites configuration? > > > > I.e., are you certain that the error below is due to load and not a > > site-related error? > > > > - Mike > > > > > > ----- Original Message ----- > >> I tried to have the tests more synthesized using Mike's catsall > >> workflow staging in ~3 MB data files to 5 OSG sites. Swift seem to > >> handle the transfer well when the originating files are local. But > >> when it starts to use remote file objects, I get all these 3rd > >> party > >> transfer exceptions. my throttle for file transfers is 8 and for > >> file > >> operations is 10. 
> >> > >> 2010-12-09 18:58:16,700-0600 DEBUG DelegatedFileTransferHandler > >> File > >> transfer with resource remote->tmp > >> 2010-12-09 18:58:16,734-0600 DEBUG DelegatedFileTransferHandler > >> Exception in transfer > >> org.globus.cog.abstraction.impl.file.IrrecoverableResourceException: > >> Exception in getFile > >> at > >> org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(AbstractFTP > >> FileResource.java:62) > >> at > >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getFile(FileResourceImpl.java > >> :401) > >> at > >> org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doSource(DelegatedFil > >> eTransferHandler.java:269) > >> at > >> org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doSource(Cachi > >> ngDelegatedFileTransferHandler.java:59) > >> at > >> org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFileTran > >> sferHandler.java:486) > >> at java.lang.Thread.run(Thread.java:619) > >> Caused by: > >> org.globus.cog.abstraction.impl.file.FileResourceException: > >> Failed to retrieve file information > >> about > >> /projsmall/osg/data/engage/scec/swift_scratch/catsall-20101209-1839-pnazhid6/info/p/cat-p2em5s2k-in > >> fo > >> at > >> org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(AbstractFTP > >> FileResource.java:51) > >> at > >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getGridFile(FileResourceImpl. > >> java:550) > >> at > >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getFile(FileResourceImpl.java > >> :384) > >> ... 4 more > >> Caused by: org.globus.ftp.exception.ServerException: Server refused > >> performing the request. Custom message > >> : Server refused MLST command (error code 1) [Nested exception > >> message: Custom message: Unexpected reply: > >> 500-Command failed : > >> globus_gridftp_server_file.c:globus_l_gfs_file_stat:389: > >> 500-System error in stat: No such file or directory > >> 500-A system call failed: No such file or directory > >> 500 End.] [Nested exception is > >> org.globus.ftp.exception.UnexpectedReplyCodeException: Custom > >> message: Une > >> xpected reply: 500-Command failed : > >> globus_gridftp_server_file.c:globus_l_gfs_file_stat:389: > >> 500-System error in stat: No such file or directory > >> 500-A system call failed: No such file or directory > >> 500 End.] > >> at > >> org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerException.java > >> :101) > >> at org.globus.ftp.FTPClient.mlst(FTPClient.java:643) > >> at > >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getGridFile(FileResourceImpl. > >> java:546) > >> ... 5 more > >> > >> > >> I may have been stressing the source gridftp server too much (pads) > >> that it cannot handle a throttle of 8 . But at this configuration, > >> I > >> get low transfer performance. When doing direct transfers, I was > >> able > >> to get better transfer rates until i start coking out gpfs at 10k > >> stageins. My throttle for this configurations was 40 for both file > >> transfers and file operations. > >> > >> > >> 2010/12/2 Allan Espinosa : > >> > I have a bunch of 3rd party gridftp transfers. Swift reports > >> > around > >> > 10k jobs being in the vdl:stagein at a time. After a while i get > >> > a > >> > couple of these errors. Does it look like i'm stressing the > >> > gridftp > >> > servers? 
my throttle.transfers=8 > >> > > >> > 2010-12-02 02:22:06,008-0600 DEBUG DelegatedFileTransferHandler > >> > Starting service on gsiftp://gpn-hus > >> > 2010-12-02 02:22:06,008-0600 DEBUG DelegatedFileTransferHandler > >> > File > >> > transfer with resource local->r > >> > 2010-12-02 02:22:06,247-0600 DEBUG DelegatedFileTransferHandler > >> > Exception in transfer > >> > org.globus.cog.abstraction.impl.file.FileResourceException > >> > ? ? ? ?at > >> > ? ? ? ?org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(Abstr > >> > esource.java:51) > >> > ? ? ? ?at > >> > ? ? ? ?org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(Abstr > >> > esource.java:34) > >> > ? ? ? ?at > >> > ? ? ? ?org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > >> > ? ? ? ?at > >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > >> > eTransferHandler.java:352) > >> > ? ? ? ?at > >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > >> > ngDelegatedFileTransferHandler.java:46) > >> > ? ? ? ?at > >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > >> > andler.java:489) > >> > ? ? ? ?at java.lang.Thread.run(Thread.java:619) > >> > Caused by: org.globus.ftp.exception.ServerException: Server > >> > refused > >> > performing the request. Custom m > >> > rror code 1) [Nested exception message: Custom message: > >> > Unexpected > >> > reply: 451 ocurred during retrie > >> > org.globus.ftp.exception.DataChannelException: setPassive() must > >> > match > >> > store() and setActive() - ret > >> > rror code 2) > >> > org.globus.ftp.exception.DataChannelException: setPassive() must > >> > match > >> > store() and setActive() - ret > >> > rror code 2) > >> > ? ? ? ?at > >> > ? ? ? ?org.globus.ftp.extended.GridFTPServerFacade.retrieve(GridFTPServerFacade.java:469) > >> > ? ? ? ?at org.globus.ftp.FTPClient.put(FTPClient.java:1294) > >> > ? ? ? ?at > >> > ? ? ? ?org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > >> > ? ? ? ?at > >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > >> > eTransferHandler.java:352) > >> > ? ? ? ?at > >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > >> > ngDelegatedFileTransferHandler.java:46) > >> > ? ? ? ?at > >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > >> > andler.java:489) > >> > ? ? ? ?at java.lang.Thread.run(Thread.java:619) > >> > ] [Nested exception is > >> > org.globus.ftp.exception.UnexpectedReplyCodeException: Custom > >> > message: Unexp > >> > : 451 ocurred during retrieve() > >> > org.globus.ftp.exception.DataChannelException: setPassive() must > >> > match > >> > store() and setActive() - ret > >> > rror code 2) > >> > org.globus.ftp.exception.DataChannelException: setPassive() must > >> > match > >> > store() and setActive() - ret > >> > rror code 2) > >> > ? ? ? ?at > >> > ? ? ? ?org.globus.ftp.extended.GridFTPServerFacade.retrieve(GridFTPServerFacade.java:469) > >> > ? ? ? ?at org.globus.ftp.FTPClient.put(FTPClient.java:1294) > >> > ? ? ? ?at > >> > ? ? ? ?org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > >> > ? ? ? ?at > >> > ? ? ? 
?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > >> > eTransferHandler.java:352) > >> > ? ? ? ?at > >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > >> > ngDelegatedFileTransferHandler.java:46) > >> > ? ? ? ?at > >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > >> > andler.java:489) > >> > ? ? ? ?at java.lang.Thread.run(Thread.java:619) > >> > ] > >> > ? ? ? ?at > >> > ? ? ? ?org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerExceptio > >> > ? ? ? ?at > >> > ? ? ? ?org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerExceptio > >> > ? ? ? ?at > >> > ? ? ? ?org.globus.ftp.vanilla.TransferMonitor.run(TransferMonitor.java:195) > >> > ? ? ? ?... 1 more > >> > > > -- > Allan M. Espinosa > PhD student, Computer Science > University of Chicago -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From aespinosa at cs.uchicago.edu Fri Dec 10 10:44:10 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Fri, 10 Dec 2010 10:44:10 -0600 Subject: provider staging to remote sites (was [Swift-user] Re: 3rd party transfers) Message-ID: Hi Mike, I'm having problems getting provider staging to work. I seems to pass files as absolute references: _____________________________________________________________________________ command line _____________________________________________________________________________ -e /bin/cat -out stdout.txt -err stderr.txt -i -d -if //gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/100/0/100_0.txt.variation-s0000-h0000 -of -k -cdmfile -status provider -a /gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/100/0/100_0.txt.variation-s0000-h0000 _____________________________________________________________________________ stdout _____________________________________________________________________________ _____________________________________________________________________________ stderr _____________________________________________________________________________ /bin/cat: /gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/100/0/100_0.txt.variation-s0000-h0000: No such file or directory But the remote site does not have /gpfs/pads . Should I be modifying my mappers to accomodate this? -Allan 2010/12/10 Michael Wilde : > Did you try provider staging, which might be easier to throttle given that the staging endpoints are more under Swift's control? > > - MIke > > ----- Original Message ----- >> Hi Mike. >> >> Yes. I had the workflow stagein 1, 10, 40 , 80, 400, 800, 2000, 8000, >> 30000 files. The throttles are the same for each run. Problems >> started to occur at around 800 files . >> >> For staging in local files, problems started to occur at 30000 files >> where vdl:dostagein hits gpfs too much. >> >> -Allan >> >> >> 2010/12/10 Michael Wilde : >> > Allan, did you verify that each remote site you are talking to in >> > this test is functional at low transaction rates using your current >> > sites configuration? >> > >> > I.e., are you certain that the error below is due to load and not a >> > site-related error? >> > >> > - Mike >> > >> > >> > ----- Original Message ----- >> >> I tried to have the tests more synthesized using Mike's catsall >> >> workflow staging in ~3 MB data files to 5 OSG sites. 
Swift seem to >> >> handle the transfer well when the originating files are local. But >> >> when it starts to use remote file objects, I get all these 3rd >> >> party >> >> transfer exceptions. my throttle for file transfers is 8 and for >> >> file >> >> operations is 10. >> >> >> >> 2010-12-09 18:58:16,700-0600 DEBUG DelegatedFileTransferHandler >> >> File >> >> transfer with resource remote->tmp >> >> 2010-12-09 18:58:16,734-0600 DEBUG DelegatedFileTransferHandler >> >> Exception in transfer >> >> org.globus.cog.abstraction.impl.file.IrrecoverableResourceException: >> >> Exception in getFile >> >> at >> >> org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(AbstractFTP >> >> FileResource.java:62) >> >> at >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getFile(FileResourceImpl.java >> >> :401) >> >> at >> >> org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doSource(DelegatedFil >> >> eTransferHandler.java:269) >> >> at >> >> org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doSource(Cachi >> >> ngDelegatedFileTransferHandler.java:59) >> >> at >> >> org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFileTran >> >> sferHandler.java:486) >> >> at java.lang.Thread.run(Thread.java:619) >> >> Caused by: >> >> org.globus.cog.abstraction.impl.file.FileResourceException: >> >> Failed to retrieve file information >> >> about >> >> /projsmall/osg/data/engage/scec/swift_scratch/catsall-20101209-1839-pnazhid6/info/p/cat-p2em5s2k-in >> >> fo >> >> at >> >> org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(AbstractFTP >> >> FileResource.java:51) >> >> at >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getGridFile(FileResourceImpl. >> >> java:550) >> >> at >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getFile(FileResourceImpl.java >> >> :384) >> >> ... 4 more >> >> Caused by: org.globus.ftp.exception.ServerException: Server refused >> >> performing the request. Custom message >> >> : Server refused MLST command (error code 1) [Nested exception >> >> message: Custom message: Unexpected reply: >> >> 500-Command failed : >> >> globus_gridftp_server_file.c:globus_l_gfs_file_stat:389: >> >> 500-System error in stat: No such file or directory >> >> 500-A system call failed: No such file or directory >> >> 500 End.] [Nested exception is >> >> org.globus.ftp.exception.UnexpectedReplyCodeException: Custom >> >> message: Une >> >> xpected reply: 500-Command failed : >> >> globus_gridftp_server_file.c:globus_l_gfs_file_stat:389: >> >> 500-System error in stat: No such file or directory >> >> 500-A system call failed: No such file or directory >> >> 500 End.] >> >> at >> >> org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerException.java >> >> :101) >> >> at org.globus.ftp.FTPClient.mlst(FTPClient.java:643) >> >> at >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getGridFile(FileResourceImpl. >> >> java:546) >> >> ... 5 more >> >> >> >> >> >> I may have been stressing the source gridftp server too much (pads) >> >> that it cannot handle a throttle of 8 . But at this configuration, >> >> I >> >> get low transfer performance. When doing direct transfers, I was >> >> able >> >> to get better transfer rates until i start coking out gpfs at 10k >> >> stageins. 
My throttle for this configurations was 40 for both file >> >> transfers and file operations. >> >> >> >> >> >> 2010/12/2 Allan Espinosa : >> >> > I have a bunch of 3rd party gridftp transfers. Swift reports >> >> > around >> >> > 10k jobs being in the vdl:stagein at a time. After a while i get >> >> > a >> >> > couple of these errors. Does it look like i'm stressing the >> >> > gridftp >> >> > servers? my throttle.transfers=8 >> >> > >> >> > 2010-12-02 02:22:06,008-0600 DEBUG DelegatedFileTransferHandler >> >> > Starting service on gsiftp://gpn-hus >> >> > 2010-12-02 02:22:06,008-0600 DEBUG DelegatedFileTransferHandler >> >> > File >> >> > transfer with resource local->r >> >> > 2010-12-02 02:22:06,247-0600 DEBUG DelegatedFileTransferHandler >> >> > Exception in transfer >> >> > org.globus.cog.abstraction.impl.file.FileResourceException >> >> > ? ? ? ?at >> >> > ? ? ? ?org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(Abstr >> >> > esource.java:51) >> >> > ? ? ? ?at >> >> > ? ? ? ?org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(Abstr >> >> > esource.java:34) >> >> > ? ? ? ?at >> >> > ? ? ? ?org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp >> >> > ? ? ? ?at >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D >> >> > eTransferHandler.java:352) >> >> > ? ? ? ?at >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin >> >> > ngDelegatedFileTransferHandler.java:46) >> >> > ? ? ? ?at >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi >> >> > andler.java:489) >> >> > ? ? ? ?at java.lang.Thread.run(Thread.java:619) >> >> > Caused by: org.globus.ftp.exception.ServerException: Server >> >> > refused >> >> > performing the request. Custom m >> >> > rror code 1) [Nested exception message: Custom message: >> >> > Unexpected >> >> > reply: 451 ocurred during retrie >> >> > org.globus.ftp.exception.DataChannelException: setPassive() must >> >> > match >> >> > store() and setActive() - ret >> >> > rror code 2) >> >> > org.globus.ftp.exception.DataChannelException: setPassive() must >> >> > match >> >> > store() and setActive() - ret >> >> > rror code 2) >> >> > ? ? ? ?at >> >> > ? ? ? ?org.globus.ftp.extended.GridFTPServerFacade.retrieve(GridFTPServerFacade.java:469) >> >> > ? ? ? ?at org.globus.ftp.FTPClient.put(FTPClient.java:1294) >> >> > ? ? ? ?at >> >> > ? ? ? ?org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp >> >> > ? ? ? ?at >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D >> >> > eTransferHandler.java:352) >> >> > ? ? ? ?at >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin >> >> > ngDelegatedFileTransferHandler.java:46) >> >> > ? ? ? ?at >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi >> >> > andler.java:489) >> >> > ? ? ? 
?at java.lang.Thread.run(Thread.java:619) >> >> > ] [Nested exception is >> >> > org.globus.ftp.exception.UnexpectedReplyCodeException: Custom >> >> > message: Unexp >> >> > : 451 ocurred during retrieve() >> >> > org.globus.ftp.exception.DataChannelException: setPassive() must >> >> > match >> >> > store() and setActive() - ret >> >> > rror code 2) >> >> > org.globus.ftp.exception.DataChannelException: setPassive() must >> >> > match >> >> > store() and setActive() - ret >> >> > rror code 2) >> >> > ? ? ? ?at >> >> > ? ? ? ?org.globus.ftp.extended.GridFTPServerFacade.retrieve(GridFTPServerFacade.java:469) >> >> > ? ? ? ?at org.globus.ftp.FTPClient.put(FTPClient.java:1294) >> >> > ? ? ? ?at >> >> > ? ? ? ?org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp >> >> > ? ? ? ?at >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D >> >> > eTransferHandler.java:352) >> >> > ? ? ? ?at >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin >> >> > ngDelegatedFileTransferHandler.java:46) >> >> > ? ? ? ?at >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi >> >> > andler.java:489) >> >> > ? ? ? ?at java.lang.Thread.run(Thread.java:619) >> >> > ] >> >> > ? ? ? ?at >> >> > ? ? ? ?org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerExceptio >> >> > ? ? ? ?at >> >> > ? ? ? ?org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerExceptio >> >> > ? ? ? ?at >> >> > ? ? ? ?org.globus.ftp.vanilla.TransferMonitor.run(TransferMonitor.java:195) >> >> > ? ? ? ?... 1 more >> >> > >> >> -- >> Allan M. Espinosa >> PhD student, Computer Science >> University of Chicago > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > > -- Allan M. Espinosa PhD student, Computer Science University of Chicago From wilde at mcs.anl.gov Fri Dec 10 11:20:38 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 10 Dec 2010 11:20:38 -0600 (CST) Subject: provider staging to remote sites (was [Swift-user] Re: 3rd party transfers) In-Reply-To: Message-ID: <435933201.8868.1292001638167.JavaMail.root@zimbra.anl.gov> Hi Allan, I vaguely recall similar issues with prior tests of provider staging. Based on Mihael's recommendation Ive been using the "proxy" mode. I dont have my head around all the modes at the moment (I did when I first looked at it). At any rate, in my tests using proxy mode, just on localhost, I did not run into any full-pathname problems: I used simple_mapper and unqualified partial pathnames. My test is on the CI net at: /home/wilde/swift/lab/tests/test.local.ps.sh and pasted below. We should build a similar test to validate proxy-mode provider staging on remote sites with coasters. Whoever gets to it first. See if using the pattern below gets you past this full-pathname problem. - Mike bri$ cat ./test.local.ps.sh #! 
/bin/bash

cat >tc <<END
localhost sh /bin/sh null null null
localhost cat /bin/cat null null null

END

cat >sites.xml <<END
  8
  1
  1
  .15
  10000
  proxy
  $PWD

END

cat >cf <<END
wrapperlog.always.transfer=true
sitedir.keep=true
execution.retries=0
lazy.errors=false
status.mode=provider
use.provider.staging=true
provider.staging.pin.swiftfiles=false

END

cat >pstest.swift <<EOF
type file;

app (file o) cat (file i)
{
  cat @i stdout=@o;
}

file infile[] ;
file outfile[] ;

foreach f, i in infile {
  outfile[i] = cat(f);
}

EOF

swift -config cf -tc.file tc -sites.file sites.xml pstest.swift
bri$

bri$ mkdir outdir
bri$ ls
indir/  outdir/  test.local.ps.sh
bri$ ls indir
f.0000.in  f.0001.in  f.0002.in  f.0003.in  f.0004.in
bri$ ls outdir
bri$ ./test.local.ps.sh
Swift svn swift-r3758 cog-r2951 (cog modified locally)

RunID: 20101210-1108-qsdi3mz6
Progress:
Progress:  Active:4  Finished successfully:1
Final status:  Finished successfully:5
bri$ ls outdir
f.0000.out  f.0001.out  f.0002.out  f.0003.out  f.0004.out
bri$

----- Original Message -----
> Hi Mike,
>
> I'm having problems getting provider staging to work. I seems to pass
> files as absolute references:
>
> _____________________________________________________________________________
>
> command line
> _____________________________________________________________________________
>
> -e /bin/cat -out stdout.txt -err stderr.txt -i -d -if
> //gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/100/0/100_0.txt.variation-s0000-h0000
> -of -k -cdmfile -status provider -a
> /gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/100/0/100_0.txt.variation-s0000-h0000
>
> _____________________________________________________________________________
>
> stdout
> _____________________________________________________________________________
>
>
> _____________________________________________________________________________
>
> stderr
> _____________________________________________________________________________
>
> /bin/cat:
> /gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/100/0/100_0.txt.variation-s0000-h0000:
> No such file or directory
>
> But the remote site does not have /gpfs/pads .
>
> Should I be modifying my mappers to accomodate this?
>
> -Allan
>
>
> 2010/12/10 Michael Wilde :
> > Did you try provider staging, which might be easier to throttle
> > given that the staging endpoints are more under Swift's control?
> >
> > - MIke
> >
> > ----- Original Message -----
> >> Hi Mike.
> >>
> >> Yes. I had the workflow stagein 1, 10, 40 , 80, 400, 800, 2000,
> >> 8000,
> >> 30000 files. The throttles are the same for each run. Problems
> >> started to occur at around 800 files .
> >>
> >> For staging in local files, problems started to occur at 30000
> >> files
> >> where vdl:dostagein hits gpfs too much.
> >>
> >> -Allan
> >>
> >>
> >> 2010/12/10 Michael Wilde :
> >> > Allan, did you verify that each remote site you are talking to in
> >> > this test is functional at low transaction rates using your
> >> > current
> >> > sites configuration?
> >> >
> >> > I.e., are you certain that the error below is due to load and not
> >> > a
> >> > site-related error?
> >> >
> >> > - Mike
> >> >
> >> >
> >> > ----- Original Message -----
> >> >> I tried to have the tests more synthesized using Mike's catsall
> >> >> workflow staging in ~3 MB data files to 5 OSG sites. Swift seem
> >> >> to
> >> >> handle the transfer well when the originating files are local.
> >> >> But
> >> >> when it starts to use remote file objects, I get all these 3rd
> >> >> party
> >> >> transfer exceptions. my throttle for file transfers is 8 and for
> >> >> file
> >> >> operations is 10.
> >> >> > >> >> 2010-12-09 18:58:16,700-0600 DEBUG DelegatedFileTransferHandler > >> >> File > >> >> transfer with resource remote->tmp > >> >> 2010-12-09 18:58:16,734-0600 DEBUG DelegatedFileTransferHandler > >> >> Exception in transfer > >> >> org.globus.cog.abstraction.impl.file.IrrecoverableResourceException: > >> >> Exception in getFile > >> >> at > >> >> org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(AbstractFTP > >> >> FileResource.java:62) > >> >> at > >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getFile(FileResourceImpl.java > >> >> :401) > >> >> at > >> >> org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doSource(DelegatedFil > >> >> eTransferHandler.java:269) > >> >> at > >> >> org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doSource(Cachi > >> >> ngDelegatedFileTransferHandler.java:59) > >> >> at > >> >> org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFileTran > >> >> sferHandler.java:486) > >> >> at java.lang.Thread.run(Thread.java:619) > >> >> Caused by: > >> >> org.globus.cog.abstraction.impl.file.FileResourceException: > >> >> Failed to retrieve file information > >> >> about > >> >> /projsmall/osg/data/engage/scec/swift_scratch/catsall-20101209-1839-pnazhid6/info/p/cat-p2em5s2k-in > >> >> fo > >> >> at > >> >> org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(AbstractFTP > >> >> FileResource.java:51) > >> >> at > >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getGridFile(FileResourceImpl. > >> >> java:550) > >> >> at > >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getFile(FileResourceImpl.java > >> >> :384) > >> >> ... 4 more > >> >> Caused by: org.globus.ftp.exception.ServerException: Server > >> >> refused > >> >> performing the request. Custom message > >> >> : Server refused MLST command (error code 1) [Nested exception > >> >> message: Custom message: Unexpected reply: > >> >> 500-Command failed : > >> >> globus_gridftp_server_file.c:globus_l_gfs_file_stat:389: > >> >> 500-System error in stat: No such file or directory > >> >> 500-A system call failed: No such file or directory > >> >> 500 End.] [Nested exception is > >> >> org.globus.ftp.exception.UnexpectedReplyCodeException: Custom > >> >> message: Une > >> >> xpected reply: 500-Command failed : > >> >> globus_gridftp_server_file.c:globus_l_gfs_file_stat:389: > >> >> 500-System error in stat: No such file or directory > >> >> 500-A system call failed: No such file or directory > >> >> 500 End.] > >> >> at > >> >> org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerException.java > >> >> :101) > >> >> at org.globus.ftp.FTPClient.mlst(FTPClient.java:643) > >> >> at > >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getGridFile(FileResourceImpl. > >> >> java:546) > >> >> ... 5 more > >> >> > >> >> > >> >> I may have been stressing the source gridftp server too much > >> >> (pads) > >> >> that it cannot handle a throttle of 8 . But at this > >> >> configuration, > >> >> I > >> >> get low transfer performance. When doing direct transfers, I was > >> >> able > >> >> to get better transfer rates until i start coking out gpfs at > >> >> 10k > >> >> stageins. My throttle for this configurations was 40 for both > >> >> file > >> >> transfers and file operations. 
> >> >> > >> >> > >> >> 2010/12/2 Allan Espinosa : > >> >> > I have a bunch of 3rd party gridftp transfers. Swift reports > >> >> > around > >> >> > 10k jobs being in the vdl:stagein at a time. After a while i > >> >> > get > >> >> > a > >> >> > couple of these errors. Does it look like i'm stressing the > >> >> > gridftp > >> >> > servers? my throttle.transfers=8 > >> >> > > >> >> > 2010-12-02 02:22:06,008-0600 DEBUG > >> >> > DelegatedFileTransferHandler > >> >> > Starting service on gsiftp://gpn-hus > >> >> > 2010-12-02 02:22:06,008-0600 DEBUG > >> >> > DelegatedFileTransferHandler > >> >> > File > >> >> > transfer with resource local->r > >> >> > 2010-12-02 02:22:06,247-0600 DEBUG > >> >> > DelegatedFileTransferHandler > >> >> > Exception in transfer > >> >> > org.globus.cog.abstraction.impl.file.FileResourceException > >> >> > ? ? ? ?at > >> >> > ? ? ? ?org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(Abstr > >> >> > esource.java:51) > >> >> > ? ? ? ?at > >> >> > ? ? ? ?org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(Abstr > >> >> > esource.java:34) > >> >> > ? ? ? ?at > >> >> > ? ? ? ?org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > >> >> > ? ? ? ?at > >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > >> >> > eTransferHandler.java:352) > >> >> > ? ? ? ?at > >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > >> >> > ngDelegatedFileTransferHandler.java:46) > >> >> > ? ? ? ?at > >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > >> >> > andler.java:489) > >> >> > ? ? ? ?at java.lang.Thread.run(Thread.java:619) > >> >> > Caused by: org.globus.ftp.exception.ServerException: Server > >> >> > refused > >> >> > performing the request. Custom m > >> >> > rror code 1) [Nested exception message: Custom message: > >> >> > Unexpected > >> >> > reply: 451 ocurred during retrie > >> >> > org.globus.ftp.exception.DataChannelException: setPassive() > >> >> > must > >> >> > match > >> >> > store() and setActive() - ret > >> >> > rror code 2) > >> >> > org.globus.ftp.exception.DataChannelException: setPassive() > >> >> > must > >> >> > match > >> >> > store() and setActive() - ret > >> >> > rror code 2) > >> >> > ? ? ? ?at > >> >> > ? ? ? ?org.globus.ftp.extended.GridFTPServerFacade.retrieve(GridFTPServerFacade.java:469) > >> >> > ? ? ? ?at org.globus.ftp.FTPClient.put(FTPClient.java:1294) > >> >> > ? ? ? ?at > >> >> > ? ? ? ?org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > >> >> > ? ? ? ?at > >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > >> >> > eTransferHandler.java:352) > >> >> > ? ? ? ?at > >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > >> >> > ngDelegatedFileTransferHandler.java:46) > >> >> > ? ? ? ?at > >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > >> >> > andler.java:489) > >> >> > ? ? ? 
?at java.lang.Thread.run(Thread.java:619) > >> >> > ] [Nested exception is > >> >> > org.globus.ftp.exception.UnexpectedReplyCodeException: Custom > >> >> > message: Unexp > >> >> > : 451 ocurred during retrieve() > >> >> > org.globus.ftp.exception.DataChannelException: setPassive() > >> >> > must > >> >> > match > >> >> > store() and setActive() - ret > >> >> > rror code 2) > >> >> > org.globus.ftp.exception.DataChannelException: setPassive() > >> >> > must > >> >> > match > >> >> > store() and setActive() - ret > >> >> > rror code 2) > >> >> > ? ? ? ?at > >> >> > ? ? ? ?org.globus.ftp.extended.GridFTPServerFacade.retrieve(GridFTPServerFacade.java:469) > >> >> > ? ? ? ?at org.globus.ftp.FTPClient.put(FTPClient.java:1294) > >> >> > ? ? ? ?at > >> >> > ? ? ? ?org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > >> >> > ? ? ? ?at > >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > >> >> > eTransferHandler.java:352) > >> >> > ? ? ? ?at > >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > >> >> > ngDelegatedFileTransferHandler.java:46) > >> >> > ? ? ? ?at > >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > >> >> > andler.java:489) > >> >> > ? ? ? ?at java.lang.Thread.run(Thread.java:619) > >> >> > ] > >> >> > ? ? ? ?at > >> >> > ? ? ? ?org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerExceptio > >> >> > ? ? ? ?at > >> >> > ? ? ? ?org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerExceptio > >> >> > ? ? ? ?at > >> >> > ? ? ? ?org.globus.ftp.vanilla.TransferMonitor.run(TransferMonitor.java:195) > >> >> > ? ? ? ?... 1 more > >> >> > > >> > >> -- > >> Allan M. Espinosa > >> PhD student, Computer Science > >> University of Chicago > > > > -- > > Michael Wilde > > Computation Institute, University of Chicago > > Mathematics and Computer Science Division > > Argonne National Laboratory > > > > > > > > > > -- > Allan M. Espinosa > PhD student, Computer Science > University of Chicago -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From aespinosa at cs.uchicago.edu Fri Dec 10 11:29:32 2010 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Fri, 10 Dec 2010 11:29:32 -0600 Subject: provider staging to remote sites (was [Swift-user] Re: 3rd party transfers) In-Reply-To: <435933201.8868.1292001638167.JavaMail.root@zimbra.anl.gov> References: <435933201.8868.1292001638167.JavaMail.root@zimbra.anl.gov> Message-ID: Hi Mike, I temporarily converted my absolute path references to relative one with symlinks. But jobs started to fail at 800 files: 2010-12-10 11:26:31,116-0600 INFO vdl:execute Exception in cat: Arguments: [RuptureVariations/100/3/100_3.txt.variation-s0002-h0003] Host: Firefly_ff-grid.unl.edu Directory: catsall-20101210-1126-g172ithb/jobs/k/cat-kxrvat2kTODO: outs ---- Caused by: Task failed: null java.lang.IllegalStateException: Timer already cancelled. 
at java.util.Timer.sched(Timer.java:354) at java.util.Timer.schedule(Timer.java:170) at org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:156) at org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:150) at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Sender.run(AbstractStreamKarajanChannel.java:253) 2010-12-10 11:26:31,116-0600 DEBUG ConfigProperty Getting property pgraph with host null 2010/12/10 Michael Wilde : > Hi Allan, > > I vaguely recall similar issues with prior tests of provider staging. Based on Mihael's recommendation Ive been using the "proxy" mode. I dont have my head around all the modes at the moment (I did when I first looked at it). > > At any rate, in my tests using proxy mode, just on localhost, I did not run into any full-pathname problems: I used simple_mapper and unqualified partial pathnames. > > My test is on the CI net at: /home/wilde/swift/lab/tests/test.local.ps.sh > and pasted below. ?We should build a similar test to validate proxy-mode provider staging on remote sites with coasters. ?Whoever gets to it first. > > See if using the pattern below gets you past this full-pathname problem. > > - Mike > > bri$ cat ./test.local.ps.sh > #! /bin/bash > > cat >tc < > localhost sh /bin/sh null null null > localhost cat /bin/cat null null null > > END > > cat >sites.xml < > > ? > ? ? > ? ?8 > ? ?1 > ? ?1 > ? ?.15 > ? ?10000 > ? ?proxy > ? ?$PWD > ? > > > END > > cat >cf < > wrapperlog.always.transfer=true > sitedir.keep=true > execution.retries=0 > lazy.errors=false > status.mode=provider > use.provider.staging=true > provider.staging.pin.swiftfiles=false > > END > > cat >pstest.swift < > type file; > > app (file o) cat (file i) > { > ?cat @i stdout=@o; > } > > file infile[] ?; > file outfile[] ; > > foreach f, i in infile { > ?outfile[i] = cat(f); > } > > EOF > > swift -config cf -tc.file tc -sites.file sites.xml pstest.swift > bri$ > > > bri$ mkdir outdir > bri$ ls > indir/ ?outdir/ ?test.local.ps.sh > bri$ ls indir > f.0000.in ?f.0001.in ?f.0002.in ?f.0003.in ?f.0004.in > bri$ ls outdir > bri$ ./test.local.ps.sh > Swift svn swift-r3758 cog-r2951 (cog modified locally) > > RunID: 20101210-1108-qsdi3mz6 > Progress: > Progress: ?Active:4 ?Finished successfully:1 > Final status: ?Finished successfully:5 > bri$ ls outdir > f.0000.out ?f.0001.out ?f.0002.out ?f.0003.out ?f.0004.out > bri$ > > > ----- Original Message ----- >> Hi Mike, >> >> I'm having problems getting provider staging to work. 
I seems to pass >> files as absolute references: >> >> _____________________________________________________________________________ >> >> command line >> _____________________________________________________________________________ >> >> -e /bin/cat -out stdout.txt -err stderr.txt -i -d -if >> //gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/100/0/100_0.txt.variation-s0000-h0000 >> -of -k -cdmfile -status provider -a >> /gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/100/0/100_0.txt.variation-s0000-h0000 >> >> _____________________________________________________________________________ >> >> stdout >> _____________________________________________________________________________ >> >> >> _____________________________________________________________________________ >> >> stderr >> _____________________________________________________________________________ >> >> /bin/cat: >> /gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/100/0/100_0.txt.variation-s0000-h0000: >> No such file or directory >> >> But the remote site does not have /gpfs/pads . >> >> Should I be modifying my mappers to accomodate this? >> >> -Allan >> >> >> 2010/12/10 Michael Wilde : >> > Did you try provider staging, which might be easier to throttle >> > given that the staging endpoints are more under Swift's control? >> > >> > - MIke >> > >> > ----- Original Message ----- >> >> Hi Mike. >> >> >> >> Yes. I had the workflow stagein 1, 10, 40 , 80, 400, 800, 2000, >> >> 8000, >> >> 30000 files. The throttles are the same for each run. Problems >> >> started to occur at around 800 files . >> >> >> >> For staging in local files, problems started to occur at 30000 >> >> files >> >> where vdl:dostagein hits gpfs too much. >> >> >> >> -Allan >> >> >> >> >> >> 2010/12/10 Michael Wilde : >> >> > Allan, did you verify that each remote site you are talking to in >> >> > this test is functional at low transaction rates using your >> >> > current >> >> > sites configuration? >> >> > >> >> > I.e., are you certain that the error below is due to load and not >> >> > a >> >> > site-related error? >> >> > >> >> > - Mike >> >> > >> >> > >> >> > ----- Original Message ----- >> >> >> I tried to have the tests more synthesized using Mike's catsall >> >> >> workflow staging in ~3 MB data files to 5 OSG sites. Swift seem >> >> >> to >> >> >> handle the transfer well when the originating files are local. >> >> >> But >> >> >> when it starts to use remote file objects, I get all these 3rd >> >> >> party >> >> >> transfer exceptions. my throttle for file transfers is 8 and for >> >> >> file >> >> >> operations is 10. 
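One way around the absolute-path problem shown above (and roughly what the symlink workaround mentioned later in this thread amounts to) is to launch the run from a directory that exposes the data under a relative name, so the mapper never hands provider staging a /gpfs/pads path. A sketch with an illustrative run directory; only the RuptureVariations path comes from this thread:

# sketch: give the mapper a relative path by linking the input tree into the run dir
mkdir -p "$HOME/runs/catsall" && cd "$HOME/runs/catsall"
ln -sfn /gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations RuptureVariations
# mappers can now name RuptureVariations/100/0/100_0.txt.variation-s0000-h0000,
# and provider staging recreates that relative path in the remote job directory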
>> >> >> >> >> >> 2010-12-09 18:58:16,700-0600 DEBUG DelegatedFileTransferHandler >> >> >> File >> >> >> transfer with resource remote->tmp >> >> >> 2010-12-09 18:58:16,734-0600 DEBUG DelegatedFileTransferHandler >> >> >> Exception in transfer >> >> >> org.globus.cog.abstraction.impl.file.IrrecoverableResourceException: >> >> >> Exception in getFile >> >> >> at >> >> >> org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(AbstractFTP >> >> >> FileResource.java:62) >> >> >> at >> >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getFile(FileResourceImpl.java >> >> >> :401) >> >> >> at >> >> >> org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doSource(DelegatedFil >> >> >> eTransferHandler.java:269) >> >> >> at >> >> >> org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doSource(Cachi >> >> >> ngDelegatedFileTransferHandler.java:59) >> >> >> at >> >> >> org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFileTran >> >> >> sferHandler.java:486) >> >> >> at java.lang.Thread.run(Thread.java:619) >> >> >> Caused by: >> >> >> org.globus.cog.abstraction.impl.file.FileResourceException: >> >> >> Failed to retrieve file information >> >> >> about >> >> >> /projsmall/osg/data/engage/scec/swift_scratch/catsall-20101209-1839-pnazhid6/info/p/cat-p2em5s2k-in >> >> >> fo >> >> >> at >> >> >> org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(AbstractFTP >> >> >> FileResource.java:51) >> >> >> at >> >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getGridFile(FileResourceImpl. >> >> >> java:550) >> >> >> at >> >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getFile(FileResourceImpl.java >> >> >> :384) >> >> >> ... 4 more >> >> >> Caused by: org.globus.ftp.exception.ServerException: Server >> >> >> refused >> >> >> performing the request. Custom message >> >> >> : Server refused MLST command (error code 1) [Nested exception >> >> >> message: Custom message: Unexpected reply: >> >> >> 500-Command failed : >> >> >> globus_gridftp_server_file.c:globus_l_gfs_file_stat:389: >> >> >> 500-System error in stat: No such file or directory >> >> >> 500-A system call failed: No such file or directory >> >> >> 500 End.] [Nested exception is >> >> >> org.globus.ftp.exception.UnexpectedReplyCodeException: Custom >> >> >> message: Une >> >> >> xpected reply: 500-Command failed : >> >> >> globus_gridftp_server_file.c:globus_l_gfs_file_stat:389: >> >> >> 500-System error in stat: No such file or directory >> >> >> 500-A system call failed: No such file or directory >> >> >> 500 End.] >> >> >> at >> >> >> org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerException.java >> >> >> :101) >> >> >> at org.globus.ftp.FTPClient.mlst(FTPClient.java:643) >> >> >> at >> >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getGridFile(FileResourceImpl. >> >> >> java:546) >> >> >> ... 5 more >> >> >> >> >> >> >> >> >> I may have been stressing the source gridftp server too much >> >> >> (pads) >> >> >> that it cannot handle a throttle of 8 . But at this >> >> >> configuration, >> >> >> I >> >> >> get low transfer performance. When doing direct transfers, I was >> >> >> able >> >> >> to get better transfer rates until i start coking out gpfs at >> >> >> 10k >> >> >> stageins. 
My throttle for this configurations was 40 for both >> >> >> file >> >> >> transfers and file operations. >> >> >> >> >> >> >> >> >> 2010/12/2 Allan Espinosa : >> >> >> > I have a bunch of 3rd party gridftp transfers. Swift reports >> >> >> > around >> >> >> > 10k jobs being in the vdl:stagein at a time. After a while i >> >> >> > get >> >> >> > a >> >> >> > couple of these errors. Does it look like i'm stressing the >> >> >> > gridftp >> >> >> > servers? my throttle.transfers=8 >> >> >> > >> >> >> > 2010-12-02 02:22:06,008-0600 DEBUG >> >> >> > DelegatedFileTransferHandler >> >> >> > Starting service on gsiftp://gpn-hus >> >> >> > 2010-12-02 02:22:06,008-0600 DEBUG >> >> >> > DelegatedFileTransferHandler >> >> >> > File >> >> >> > transfer with resource local->r >> >> >> > 2010-12-02 02:22:06,247-0600 DEBUG >> >> >> > DelegatedFileTransferHandler >> >> >> > Exception in transfer >> >> >> > org.globus.cog.abstraction.impl.file.FileResourceException >> >> >> > ? ? ? ?at >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(Abstr >> >> >> > esource.java:51) >> >> >> > ? ? ? ?at >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(Abstr >> >> >> > esource.java:34) >> >> >> > ? ? ? ?at >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp >> >> >> > ? ? ? ?at >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D >> >> >> > eTransferHandler.java:352) >> >> >> > ? ? ? ?at >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin >> >> >> > ngDelegatedFileTransferHandler.java:46) >> >> >> > ? ? ? ?at >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi >> >> >> > andler.java:489) >> >> >> > ? ? ? ?at java.lang.Thread.run(Thread.java:619) >> >> >> > Caused by: org.globus.ftp.exception.ServerException: Server >> >> >> > refused >> >> >> > performing the request. Custom m >> >> >> > rror code 1) [Nested exception message: Custom message: >> >> >> > Unexpected >> >> >> > reply: 451 ocurred during retrie >> >> >> > org.globus.ftp.exception.DataChannelException: setPassive() >> >> >> > must >> >> >> > match >> >> >> > store() and setActive() - ret >> >> >> > rror code 2) >> >> >> > org.globus.ftp.exception.DataChannelException: setPassive() >> >> >> > must >> >> >> > match >> >> >> > store() and setActive() - ret >> >> >> > rror code 2) >> >> >> > ? ? ? ?at >> >> >> > ? ? ? ?org.globus.ftp.extended.GridFTPServerFacade.retrieve(GridFTPServerFacade.java:469) >> >> >> > ? ? ? ?at org.globus.ftp.FTPClient.put(FTPClient.java:1294) >> >> >> > ? ? ? ?at >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp >> >> >> > ? ? ? ?at >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D >> >> >> > eTransferHandler.java:352) >> >> >> > ? ? ? ?at >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin >> >> >> > ngDelegatedFileTransferHandler.java:46) >> >> >> > ? ? ? ?at >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi >> >> >> > andler.java:489) >> >> >> > ? ? ? 
?at java.lang.Thread.run(Thread.java:619) >> >> >> > ] [Nested exception is >> >> >> > org.globus.ftp.exception.UnexpectedReplyCodeException: Custom >> >> >> > message: Unexp >> >> >> > : 451 ocurred during retrieve() >> >> >> > org.globus.ftp.exception.DataChannelException: setPassive() >> >> >> > must >> >> >> > match >> >> >> > store() and setActive() - ret >> >> >> > rror code 2) >> >> >> > org.globus.ftp.exception.DataChannelException: setPassive() >> >> >> > must >> >> >> > match >> >> >> > store() and setActive() - ret >> >> >> > rror code 2) >> >> >> > ? ? ? ?at >> >> >> > ? ? ? ?org.globus.ftp.extended.GridFTPServerFacade.retrieve(GridFTPServerFacade.java:469) >> >> >> > ? ? ? ?at org.globus.ftp.FTPClient.put(FTPClient.java:1294) >> >> >> > ? ? ? ?at >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp >> >> >> > ? ? ? ?at >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D >> >> >> > eTransferHandler.java:352) >> >> >> > ? ? ? ?at >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin >> >> >> > ngDelegatedFileTransferHandler.java:46) >> >> >> > ? ? ? ?at >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi >> >> >> > andler.java:489) >> >> >> > ? ? ? ?at java.lang.Thread.run(Thread.java:619) >> >> >> > ] >> >> >> > ? ? ? ?at >> >> >> > ? ? ? ?org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerExceptio >> >> >> > ? ? ? ?at >> >> >> > ? ? ? ?org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerExceptio >> >> >> > ? ? ? ?at >> >> >> > ? ? ? ?org.globus.ftp.vanilla.TransferMonitor.run(TransferMonitor.java:195) >> >> >> > ? ? ? ?... 1 more >> >> >> > >> >> >> >> -- >> >> Allan M. Espinosa >> >> PhD student, Computer Science >> >> University of Chicago >> > >> > -- >> > Michael Wilde >> > Computation Institute, University of Chicago >> > Mathematics and Computer Science Division >> > Argonne National Laboratory >> > >> > >> > >> >> >> >> -- >> Allan M. Espinosa >> PhD student, Computer Science >> University of Chicago > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > > -- Allan M. Espinosa PhD student, Computer Science University of Chicago From wilde at mcs.anl.gov Fri Dec 10 12:06:42 2010 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 10 Dec 2010 12:06:42 -0600 (CST) Subject: provider staging to remote sites (was [Swift-user] Re: 3rd party transfers) In-Reply-To: Message-ID: <2094268728.9174.1292004402804.JavaMail.root@zimbra.anl.gov> You should post logs and details of the failure to swift-devel for Mihael to diagnose. In the meantime, you should test between two more local machines - eg from bridled to communicado. Then to maybe more distant machines. You should check to make sure that you have not run /tmp out of space on the dest site. Perhaps you clobbered the dest node by driving its root fs out of space? I dont know how provider staging picks the dest directory: whether its hardwired to /tmp (which would be bad and need to get fixed) or if it honors the tag which would be great, in which case you should set that to $OSG_WN_TMP for OSG sites. 
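Pulling the configuration pieces in this exchange together: a sketch of the provider-staging properties (names copied from the pasted test further down, where the archive has flattened the heredocs) plus the site-level work-directory idea. The <workdirectory> element name is an assumption about which tag is meant above, since the archive stripped it:

# sketch: Swift properties enabling provider staging, as in the pasted test
cat > cf <<'END'
wrapperlog.always.transfer=true
sitedir.keep=true
execution.retries=0
lazy.errors=false
status.mode=provider
use.provider.staging=true
provider.staging.pin.swiftfiles=false
END

# If staging honors the per-site work directory, the OSG-friendly setting would
# look roughly like this inside the site's pool entry (element name assumed):
#   <workdirectory>$OSG_WN_TMP</workdirectory>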
At any rate, try this first under a more controlled environment where you can more closely observe both the client and server, and stress test that scenario first, much like I did on the localhost scenario. Then gradually move to OSG once you know how the provider staging mechanism behaves. - Mike ----- Original Message ----- > Hi Mike, > > I temporarily converted my absolute path references to relative one > with symlinks. But jobs started to fail at 800 files: > > 2010-12-10 11:26:31,116-0600 INFO vdl:execute Exception in cat: > Arguments: [RuptureVariations/100/3/100_3.txt.variation-s0002-h0003] > Host: Firefly_ff-grid.unl.edu > Directory: catsall-20101210-1126-g172ithb/jobs/k/cat-kxrvat2kTODO: > outs > ---- > > Caused by: Task failed: null > java.lang.IllegalStateException: Timer already cancelled. > at java.util.Timer.sched(Timer.java:354) > at java.util.Timer.schedule(Timer.java:170) > at > org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:156) > at > org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:150) > at > org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Sender.run(AbstractStreamKarajanChannel.java:253) > > > 2010-12-10 11:26:31,116-0600 DEBUG ConfigProperty Getting property > pgraph with host null > > > 2010/12/10 Michael Wilde : > > Hi Allan, > > > > I vaguely recall similar issues with prior tests of provider > > staging. Based on Mihael's recommendation Ive been using the "proxy" > > mode. I dont have my head around all the modes at the moment (I did > > when I first looked at it). > > > > At any rate, in my tests using proxy mode, just on localhost, I did > > not run into any full-pathname problems: I used simple_mapper and > > unqualified partial pathnames. > > > > My test is on the CI net at: > > /home/wilde/swift/lab/tests/test.local.ps.sh > > and pasted below. We should build a similar test to validate > > proxy-mode provider staging on remote sites with coasters. Whoever > > gets to it first. > > > > See if using the pattern below gets you past this full-pathname > > problem. > > > > - Mike > > > > bri$ cat ./test.local.ps.sh > > #! /bin/bash > > > > cat >tc < > > > localhost sh /bin/sh null null null > > localhost cat /bin/cat null null null > > > > END > > > > cat >sites.xml < > > > > > ? > > ? ? > ? ?jobmanager="local:local"/> > > ? ?8 > > ? ?1 > > ? ?1 > > ? ?.15 > > ? ?10000 > > ? ?proxy > > ? ?$PWD > > ? 
> > > > > > END > > > > cat >cf < > > > wrapperlog.always.transfer=true > > sitedir.keep=true > > execution.retries=0 > > lazy.errors=false > > status.mode=provider > > use.provider.staging=true > > provider.staging.pin.swiftfiles=false > > > > END > > > > cat >pstest.swift < > > > type file; > > > > app (file o) cat (file i) > > { > > ?cat @i stdout=@o; > > } > > > > file infile[] > suffix=".in">; > > file outfile[] > prefix="f.",suffix=".out">; > > > > foreach f, i in infile { > > ?outfile[i] = cat(f); > > } > > > > EOF > > > > swift -config cf -tc.file tc -sites.file sites.xml pstest.swift > > bri$ > > > > > > bri$ mkdir outdir > > bri$ ls > > indir/ outdir/ test.local.ps.sh > > bri$ ls indir > > f.0000.in f.0001.in f.0002.in f.0003.in f.0004.in > > bri$ ls outdir > > bri$ ./test.local.ps.sh > > Swift svn swift-r3758 cog-r2951 (cog modified locally) > > > > RunID: 20101210-1108-qsdi3mz6 > > Progress: > > Progress: Active:4 Finished successfully:1 > > Final status: Finished successfully:5 > > bri$ ls outdir > > f.0000.out f.0001.out f.0002.out f.0003.out f.0004.out > > bri$ > > > > > > ----- Original Message ----- > >> Hi Mike, > >> > >> I'm having problems getting provider staging to work. I seems to > >> pass > >> files as absolute references: > >> > >> _____________________________________________________________________________ > >> > >> command line > >> _____________________________________________________________________________ > >> > >> -e /bin/cat -out stdout.txt -err stderr.txt -i -d -if > >> //gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/100/0/100_0.txt.variation-s0000-h0000 > >> -of -k -cdmfile -status provider -a > >> /gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/100/0/100_0.txt.variation-s0000-h0000 > >> > >> _____________________________________________________________________________ > >> > >> stdout > >> _____________________________________________________________________________ > >> > >> > >> _____________________________________________________________________________ > >> > >> stderr > >> _____________________________________________________________________________ > >> > >> /bin/cat: > >> /gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/100/0/100_0.txt.variation-s0000-h0000: > >> No such file or directory > >> > >> But the remote site does not have /gpfs/pads . > >> > >> Should I be modifying my mappers to accomodate this? > >> > >> -Allan > >> > >> > >> 2010/12/10 Michael Wilde : > >> > Did you try provider staging, which might be easier to throttle > >> > given that the staging endpoints are more under Swift's control? > >> > > >> > - MIke > >> > > >> > ----- Original Message ----- > >> >> Hi Mike. > >> >> > >> >> Yes. I had the workflow stagein 1, 10, 40 , 80, 400, 800, 2000, > >> >> 8000, > >> >> 30000 files. The throttles are the same for each run. Problems > >> >> started to occur at around 800 files . > >> >> > >> >> For staging in local files, problems started to occur at 30000 > >> >> files > >> >> where vdl:dostagein hits gpfs too much. > >> >> > >> >> -Allan > >> >> > >> >> > >> >> 2010/12/10 Michael Wilde : > >> >> > Allan, did you verify that each remote site you are talking to > >> >> > in > >> >> > this test is functional at low transaction rates using your > >> >> > current > >> >> > sites configuration? > >> >> > > >> >> > I.e., are you certain that the error below is due to load and > >> >> > not > >> >> > a > >> >> > site-related error? 
> >> >> > > >> >> > - Mike > >> >> > > >> >> > > >> >> > ----- Original Message ----- > >> >> >> I tried to have the tests more synthesized using Mike's > >> >> >> catsall > >> >> >> workflow staging in ~3 MB data files to 5 OSG sites. Swift > >> >> >> seem > >> >> >> to > >> >> >> handle the transfer well when the originating files are > >> >> >> local. > >> >> >> But > >> >> >> when it starts to use remote file objects, I get all these > >> >> >> 3rd > >> >> >> party > >> >> >> transfer exceptions. my throttle for file transfers is 8 and > >> >> >> for > >> >> >> file > >> >> >> operations is 10. > >> >> >> > >> >> >> 2010-12-09 18:58:16,700-0600 DEBUG > >> >> >> DelegatedFileTransferHandler > >> >> >> File > >> >> >> transfer with resource remote->tmp > >> >> >> 2010-12-09 18:58:16,734-0600 DEBUG > >> >> >> DelegatedFileTransferHandler > >> >> >> Exception in transfer > >> >> >> org.globus.cog.abstraction.impl.file.IrrecoverableResourceException: > >> >> >> Exception in getFile > >> >> >> at > >> >> >> org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(AbstractFTP > >> >> >> FileResource.java:62) > >> >> >> at > >> >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getFile(FileResourceImpl.java > >> >> >> :401) > >> >> >> at > >> >> >> org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doSource(DelegatedFil > >> >> >> eTransferHandler.java:269) > >> >> >> at > >> >> >> org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doSource(Cachi > >> >> >> ngDelegatedFileTransferHandler.java:59) > >> >> >> at > >> >> >> org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFileTran > >> >> >> sferHandler.java:486) > >> >> >> at java.lang.Thread.run(Thread.java:619) > >> >> >> Caused by: > >> >> >> org.globus.cog.abstraction.impl.file.FileResourceException: > >> >> >> Failed to retrieve file information > >> >> >> about > >> >> >> /projsmall/osg/data/engage/scec/swift_scratch/catsall-20101209-1839-pnazhid6/info/p/cat-p2em5s2k-in > >> >> >> fo > >> >> >> at > >> >> >> org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(AbstractFTP > >> >> >> FileResource.java:51) > >> >> >> at > >> >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getGridFile(FileResourceImpl. > >> >> >> java:550) > >> >> >> at > >> >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getFile(FileResourceImpl.java > >> >> >> :384) > >> >> >> ... 4 more > >> >> >> Caused by: org.globus.ftp.exception.ServerException: Server > >> >> >> refused > >> >> >> performing the request. Custom message > >> >> >> : Server refused MLST command (error code 1) [Nested > >> >> >> exception > >> >> >> message: Custom message: Unexpected reply: > >> >> >> 500-Command failed : > >> >> >> globus_gridftp_server_file.c:globus_l_gfs_file_stat:389: > >> >> >> 500-System error in stat: No such file or directory > >> >> >> 500-A system call failed: No such file or directory > >> >> >> 500 End.] [Nested exception is > >> >> >> org.globus.ftp.exception.UnexpectedReplyCodeException: Custom > >> >> >> message: Une > >> >> >> xpected reply: 500-Command failed : > >> >> >> globus_gridftp_server_file.c:globus_l_gfs_file_stat:389: > >> >> >> 500-System error in stat: No such file or directory > >> >> >> 500-A system call failed: No such file or directory > >> >> >> 500 End.] 
> >> >> >> at > >> >> >> org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerException.java > >> >> >> :101) > >> >> >> at org.globus.ftp.FTPClient.mlst(FTPClient.java:643) > >> >> >> at > >> >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getGridFile(FileResourceImpl. > >> >> >> java:546) > >> >> >> ... 5 more > >> >> >> > >> >> >> > >> >> >> I may have been stressing the source gridftp server too much > >> >> >> (pads) > >> >> >> that it cannot handle a throttle of 8 . But at this > >> >> >> configuration, > >> >> >> I > >> >> >> get low transfer performance. When doing direct transfers, I > >> >> >> was > >> >> >> able > >> >> >> to get better transfer rates until i start coking out gpfs at > >> >> >> 10k > >> >> >> stageins. My throttle for this configurations was 40 for both > >> >> >> file > >> >> >> transfers and file operations. > >> >> >> > >> >> >> > >> >> >> 2010/12/2 Allan Espinosa : > >> >> >> > I have a bunch of 3rd party gridftp transfers. Swift > >> >> >> > reports > >> >> >> > around > >> >> >> > 10k jobs being in the vdl:stagein at a time. After a while > >> >> >> > i > >> >> >> > get > >> >> >> > a > >> >> >> > couple of these errors. Does it look like i'm stressing the > >> >> >> > gridftp > >> >> >> > servers? my throttle.transfers=8 > >> >> >> > > >> >> >> > 2010-12-02 02:22:06,008-0600 DEBUG > >> >> >> > DelegatedFileTransferHandler > >> >> >> > Starting service on gsiftp://gpn-hus > >> >> >> > 2010-12-02 02:22:06,008-0600 DEBUG > >> >> >> > DelegatedFileTransferHandler > >> >> >> > File > >> >> >> > transfer with resource local->r > >> >> >> > 2010-12-02 02:22:06,247-0600 DEBUG > >> >> >> > DelegatedFileTransferHandler > >> >> >> > Exception in transfer > >> >> >> > org.globus.cog.abstraction.impl.file.FileResourceException > >> >> >> > ? ? ? ?at > >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(Abstr > >> >> >> > esource.java:51) > >> >> >> > ? ? ? ?at > >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(Abstr > >> >> >> > esource.java:34) > >> >> >> > ? ? ? ?at > >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > >> >> >> > ? ? ? ?at > >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > >> >> >> > eTransferHandler.java:352) > >> >> >> > ? ? ? ?at > >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > >> >> >> > ngDelegatedFileTransferHandler.java:46) > >> >> >> > ? ? ? ?at > >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > >> >> >> > andler.java:489) > >> >> >> > ? ? ? ?at java.lang.Thread.run(Thread.java:619) > >> >> >> > Caused by: org.globus.ftp.exception.ServerException: Server > >> >> >> > refused > >> >> >> > performing the request. Custom m > >> >> >> > rror code 1) [Nested exception message: Custom message: > >> >> >> > Unexpected > >> >> >> > reply: 451 ocurred during retrie > >> >> >> > org.globus.ftp.exception.DataChannelException: setPassive() > >> >> >> > must > >> >> >> > match > >> >> >> > store() and setActive() - ret > >> >> >> > rror code 2) > >> >> >> > org.globus.ftp.exception.DataChannelException: setPassive() > >> >> >> > must > >> >> >> > match > >> >> >> > store() and setActive() - ret > >> >> >> > rror code 2) > >> >> >> > ? ? ? ?at > >> >> >> > ? ? ? 
?org.globus.ftp.extended.GridFTPServerFacade.retrieve(GridFTPServerFacade.java:469) > >> >> >> > ? ? ? ?at org.globus.ftp.FTPClient.put(FTPClient.java:1294) > >> >> >> > ? ? ? ?at > >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > >> >> >> > ? ? ? ?at > >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > >> >> >> > eTransferHandler.java:352) > >> >> >> > ? ? ? ?at > >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > >> >> >> > ngDelegatedFileTransferHandler.java:46) > >> >> >> > ? ? ? ?at > >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > >> >> >> > andler.java:489) > >> >> >> > ? ? ? ?at java.lang.Thread.run(Thread.java:619) > >> >> >> > ] [Nested exception is > >> >> >> > org.globus.ftp.exception.UnexpectedReplyCodeException: > >> >> >> > Custom > >> >> >> > message: Unexp > >> >> >> > : 451 ocurred during retrieve() > >> >> >> > org.globus.ftp.exception.DataChannelException: setPassive() > >> >> >> > must > >> >> >> > match > >> >> >> > store() and setActive() - ret > >> >> >> > rror code 2) > >> >> >> > org.globus.ftp.exception.DataChannelException: setPassive() > >> >> >> > must > >> >> >> > match > >> >> >> > store() and setActive() - ret > >> >> >> > rror code 2) > >> >> >> > ? ? ? ?at > >> >> >> > ? ? ? ?org.globus.ftp.extended.GridFTPServerFacade.retrieve(GridFTPServerFacade.java:469) > >> >> >> > ? ? ? ?at org.globus.ftp.FTPClient.put(FTPClient.java:1294) > >> >> >> > ? ? ? ?at > >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > >> >> >> > ? ? ? ?at > >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > >> >> >> > eTransferHandler.java:352) > >> >> >> > ? ? ? ?at > >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > >> >> >> > ngDelegatedFileTransferHandler.java:46) > >> >> >> > ? ? ? ?at > >> >> >> > ? ? ? ?org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > >> >> >> > andler.java:489) > >> >> >> > ? ? ? ?at java.lang.Thread.run(Thread.java:619) > >> >> >> > ] > >> >> >> > ? ? ? ?at > >> >> >> > ? ? ? ?org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerExceptio > >> >> >> > ? ? ? ?at > >> >> >> > ? ? ? ?org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerExceptio > >> >> >> > ? ? ? ?at > >> >> >> > ? ? ? ?org.globus.ftp.vanilla.TransferMonitor.run(TransferMonitor.java:195) > >> >> >> > ? ? ? ?... 1 more > >> >> >> > > >> >> > >> >> -- > >> >> Allan M. Espinosa > >> >> PhD student, Computer Science > >> >> University of Chicago > >> > > >> > -- > >> > Michael Wilde > >> > Computation Institute, University of Chicago > >> > Mathematics and Computer Science Division > >> > Argonne National Laboratory > >> > > >> > > >> > > >> > >> > >> > >> -- > >> Allan M. Espinosa > >> PhD student, Computer Science > >> University of Chicago > > > > -- > > Michael Wilde > > Computation Institute, University of Chicago > > Mathematics and Computer Science Division > > Argonne National Laboratory > > > > > > > > > > -- > Allan M. 
Espinosa > PhD student, Computer Science > University of Chicago -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Fri Dec 10 22:33:41 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 10 Dec 2010 20:33:41 -0800 Subject: [Swift-user] Re: 3rd party transfers In-Reply-To: References: <342749191.6884.1291993755279.JavaMail.root@zimbra.anl.gov> Message-ID: <1292042021.3760.5.camel@blabla2.none> I have seen GridFTP do that under high load. That is why throttles are in place for transfers. What I would do to confirm that the problem is with the server (which is my suspicion, since the complaint is about a server file) is to try to reproduce the problem with plain gridftp, but that may require writing some C code. Before that, however, I would make some traces of all the mlst commands issues and double checked if the parameters don't somehow get messed up. Mihael On Fri, 2010-12-10 at 10:13 -0600, Allan Espinosa wrote: > Hi Mike. > > Yes. I had the workflow stagein 1, 10, 40 , 80, 400, 800, 2000, 8000, > 30000 files. The throttles are the same for each run. Problems > started to occur at around 800 files . > > For staging in local files, problems started to occur at 30000 files > where vdl:dostagein hits gpfs too much. > > -Allan > > > 2010/12/10 Michael Wilde : > > Allan, did you verify that each remote site you are talking to in this test is functional at low transaction rates using your current sites configuration? > > > > I.e., are you certain that the error below is due to load and not a site-related error? > > > > - Mike > > > > > > ----- Original Message ----- > >> I tried to have the tests more synthesized using Mike's catsall > >> workflow staging in ~3 MB data files to 5 OSG sites. Swift seem to > >> handle the transfer well when the originating files are local. But > >> when it starts to use remote file objects, I get all these 3rd party > >> transfer exceptions. my throttle for file transfers is 8 and for file > >> operations is 10. 
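Short of the C harness Mihael mentions, a rough way to put comparable concurrent load on the same server with plain GridFTP tooling is a loop of globus-url-copy clients. A sketch, assuming globus-url-copy is on the PATH and a valid proxy exists; the source host is left as a variable to fill in, and the path and concurrency of 8 mirror numbers already quoted in this thread:

# sketch: issue N concurrent plain-GridFTP reads against the source server to see
# whether the 500 "System error in stat" replies appear without Swift in the loop
SRC_HOST=${SRC_HOST:?set this to the source GridFTP host used in the failing runs}
SRC="gsiftp://${SRC_HOST}//gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/100/0/100_0.txt.variation-s0000-h0000"
N=8   # match throttle.transfers from the failing runs

for i in $(seq 1 "$N"); do
  globus-url-copy "$SRC" "file:///tmp/gucp-test.$i" &
done
wait
ls -l /tmp/gucp-test.*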
> >> > >> 2010-12-09 18:58:16,700-0600 DEBUG DelegatedFileTransferHandler File > >> transfer with resource remote->tmp > >> 2010-12-09 18:58:16,734-0600 DEBUG DelegatedFileTransferHandler > >> Exception in transfer > >> org.globus.cog.abstraction.impl.file.IrrecoverableResourceException: > >> Exception in getFile > >> at > >> org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(AbstractFTP > >> FileResource.java:62) > >> at > >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getFile(FileResourceImpl.java > >> :401) > >> at > >> org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doSource(DelegatedFil > >> eTransferHandler.java:269) > >> at > >> org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doSource(Cachi > >> ngDelegatedFileTransferHandler.java:59) > >> at > >> org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFileTran > >> sferHandler.java:486) > >> at java.lang.Thread.run(Thread.java:619) > >> Caused by: org.globus.cog.abstraction.impl.file.FileResourceException: > >> Failed to retrieve file information > >> about > >> /projsmall/osg/data/engage/scec/swift_scratch/catsall-20101209-1839-pnazhid6/info/p/cat-p2em5s2k-in > >> fo > >> at > >> org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(AbstractFTP > >> FileResource.java:51) > >> at > >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getGridFile(FileResourceImpl. > >> java:550) > >> at > >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getFile(FileResourceImpl.java > >> :384) > >> ... 4 more > >> Caused by: org.globus.ftp.exception.ServerException: Server refused > >> performing the request. Custom message > >> : Server refused MLST command (error code 1) [Nested exception > >> message: Custom message: Unexpected reply: > >> 500-Command failed : > >> globus_gridftp_server_file.c:globus_l_gfs_file_stat:389: > >> 500-System error in stat: No such file or directory > >> 500-A system call failed: No such file or directory > >> 500 End.] [Nested exception is > >> org.globus.ftp.exception.UnexpectedReplyCodeException: Custom > >> message: Une > >> xpected reply: 500-Command failed : > >> globus_gridftp_server_file.c:globus_l_gfs_file_stat:389: > >> 500-System error in stat: No such file or directory > >> 500-A system call failed: No such file or directory > >> 500 End.] > >> at > >> org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerException.java > >> :101) > >> at org.globus.ftp.FTPClient.mlst(FTPClient.java:643) > >> at > >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getGridFile(FileResourceImpl. > >> java:546) > >> ... 5 more > >> > >> > >> I may have been stressing the source gridftp server too much (pads) > >> that it cannot handle a throttle of 8 . But at this configuration, I > >> get low transfer performance. When doing direct transfers, I was able > >> to get better transfer rates until i start coking out gpfs at 10k > >> stageins. My throttle for this configurations was 40 for both file > >> transfers and file operations. > >> > >> > >> 2010/12/2 Allan Espinosa : > >> > I have a bunch of 3rd party gridftp transfers. Swift reports around > >> > 10k jobs being in the vdl:stagein at a time. After a while i get a > >> > couple of these errors. Does it look like i'm stressing the gridftp > >> > servers? 
my throttle.transfers=8 > >> > > >> > 2010-12-02 02:22:06,008-0600 DEBUG DelegatedFileTransferHandler > >> > Starting service on gsiftp://gpn-hus > >> > 2010-12-02 02:22:06,008-0600 DEBUG DelegatedFileTransferHandler File > >> > transfer with resource local->r > >> > 2010-12-02 02:22:06,247-0600 DEBUG DelegatedFileTransferHandler > >> > Exception in transfer > >> > org.globus.cog.abstraction.impl.file.FileResourceException > >> > at > >> > org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(Abstr > >> > esource.java:51) > >> > at > >> > org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(Abstr > >> > esource.java:34) > >> > at > >> > org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > >> > at > >> > org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > >> > eTransferHandler.java:352) > >> > at > >> > org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > >> > ngDelegatedFileTransferHandler.java:46) > >> > at > >> > org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > >> > andler.java:489) > >> > at java.lang.Thread.run(Thread.java:619) > >> > Caused by: org.globus.ftp.exception.ServerException: Server refused > >> > performing the request. Custom m > >> > rror code 1) [Nested exception message: Custom message: Unexpected > >> > reply: 451 ocurred during retrie > >> > org.globus.ftp.exception.DataChannelException: setPassive() must > >> > match > >> > store() and setActive() - ret > >> > rror code 2) > >> > org.globus.ftp.exception.DataChannelException: setPassive() must > >> > match > >> > store() and setActive() - ret > >> > rror code 2) > >> > at > >> > org.globus.ftp.extended.GridFTPServerFacade.retrieve(GridFTPServerFacade.java:469) > >> > at org.globus.ftp.FTPClient.put(FTPClient.java:1294) > >> > at > >> > org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > >> > at > >> > org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > >> > eTransferHandler.java:352) > >> > at > >> > org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > >> > ngDelegatedFileTransferHandler.java:46) > >> > at > >> > org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > >> > andler.java:489) > >> > at java.lang.Thread.run(Thread.java:619) > >> > ] [Nested exception is > >> > org.globus.ftp.exception.UnexpectedReplyCodeException: Custom > >> > message: Unexp > >> > : 451 ocurred during retrieve() > >> > org.globus.ftp.exception.DataChannelException: setPassive() must > >> > match > >> > store() and setActive() - ret > >> > rror code 2) > >> > org.globus.ftp.exception.DataChannelException: setPassive() must > >> > match > >> > store() and setActive() - ret > >> > rror code 2) > >> > at > >> > org.globus.ftp.extended.GridFTPServerFacade.retrieve(GridFTPServerFacade.java:469) > >> > at org.globus.ftp.FTPClient.put(FTPClient.java:1294) > >> > at > >> > org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > >> > at > >> > org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > >> > eTransferHandler.java:352) > >> > at > >> > org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > >> > ngDelegatedFileTransferHandler.java:46) > >> > at > >> > 
org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > >> > andler.java:489) > >> > at java.lang.Thread.run(Thread.java:619) > >> > ] > >> > at > >> > org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerExceptio > >> > at > >> > org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerExceptio > >> > at > >> > org.globus.ftp.vanilla.TransferMonitor.run(TransferMonitor.java:195) > >> > ... 1 more > >> > > From hategan at mcs.anl.gov Sun Dec 12 14:45:21 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 12 Dec 2010 12:45:21 -0800 Subject: provider staging to remote sites (was [Swift-user] Re: 3rd party transfers) In-Reply-To: <435933201.8868.1292001638167.JavaMail.root@zimbra.anl.gov> References: <435933201.8868.1292001638167.JavaMail.root@zimbra.anl.gov> Message-ID: <1292186721.10265.1.camel@blabla2.none> Right. If you are using SFS, then paths should be absolute. Though I don't necessarily think that this is how things should be. Mihael On Fri, 2010-12-10 at 11:20 -0600, Michael Wilde wrote: > Hi Allan, > > I vaguely recall similar issues with prior tests of provider staging. Based on Mihael's recommendation Ive been using the "proxy" mode. I dont have my head around all the modes at the moment (I did when I first looked at it). > > At any rate, in my tests using proxy mode, just on localhost, I did not run into any full-pathname problems: I used simple_mapper and unqualified partial pathnames. > > My test is on the CI net at: /home/wilde/swift/lab/tests/test.local.ps.sh > and pasted below. We should build a similar test to validate proxy-mode provider staging on remote sites with coasters. Whoever gets to it first. > > See if using the pattern below gets you past this full-pathname problem. > > - Mike > > bri$ cat ./test.local.ps.sh > #! /bin/bash > > cat >tc < > localhost sh /bin/sh null null null > localhost cat /bin/cat null null null > > END > > cat >sites.xml < > > > > 8 > 1 > 1 > .15 > 10000 > proxy > $PWD > > > > END > > cat >cf < > wrapperlog.always.transfer=true > sitedir.keep=true > execution.retries=0 > lazy.errors=false > status.mode=provider > use.provider.staging=true > provider.staging.pin.swiftfiles=false > > END > > cat >pstest.swift < > type file; > > app (file o) cat (file i) > { > cat @i stdout=@o; > } > > file infile[] ; > file outfile[] ; > > foreach f, i in infile { > outfile[i] = cat(f); > } > > EOF > > swift -config cf -tc.file tc -sites.file sites.xml pstest.swift > bri$ > > > bri$ mkdir outdir > bri$ ls > indir/ outdir/ test.local.ps.sh > bri$ ls indir > f.0000.in f.0001.in f.0002.in f.0003.in f.0004.in > bri$ ls outdir > bri$ ./test.local.ps.sh > Swift svn swift-r3758 cog-r2951 (cog modified locally) > > RunID: 20101210-1108-qsdi3mz6 > Progress: > Progress: Active:4 Finished successfully:1 > Final status: Finished successfully:5 > bri$ ls outdir > f.0000.out f.0001.out f.0002.out f.0003.out f.0004.out > bri$ > > > ----- Original Message ----- > > Hi Mike, > > > > I'm having problems getting provider staging to work. 
I seems to pass > > files as absolute references: > > > > _____________________________________________________________________________ > > > > command line > > _____________________________________________________________________________ > > > > -e /bin/cat -out stdout.txt -err stderr.txt -i -d -if > > //gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/100/0/100_0.txt.variation-s0000-h0000 > > -of -k -cdmfile -status provider -a > > /gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/100/0/100_0.txt.variation-s0000-h0000 > > > > _____________________________________________________________________________ > > > > stdout > > _____________________________________________________________________________ > > > > > > _____________________________________________________________________________ > > > > stderr > > _____________________________________________________________________________ > > > > /bin/cat: > > /gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/100/0/100_0.txt.variation-s0000-h0000: > > No such file or directory > > > > But the remote site does not have /gpfs/pads . > > > > Should I be modifying my mappers to accomodate this? > > > > -Allan > > > > > > 2010/12/10 Michael Wilde : > > > Did you try provider staging, which might be easier to throttle > > > given that the staging endpoints are more under Swift's control? > > > > > > - MIke > > > > > > ----- Original Message ----- > > >> Hi Mike. > > >> > > >> Yes. I had the workflow stagein 1, 10, 40 , 80, 400, 800, 2000, > > >> 8000, > > >> 30000 files. The throttles are the same for each run. Problems > > >> started to occur at around 800 files . > > >> > > >> For staging in local files, problems started to occur at 30000 > > >> files > > >> where vdl:dostagein hits gpfs too much. > > >> > > >> -Allan > > >> > > >> > > >> 2010/12/10 Michael Wilde : > > >> > Allan, did you verify that each remote site you are talking to in > > >> > this test is functional at low transaction rates using your > > >> > current > > >> > sites configuration? > > >> > > > >> > I.e., are you certain that the error below is due to load and not > > >> > a > > >> > site-related error? > > >> > > > >> > - Mike > > >> > > > >> > > > >> > ----- Original Message ----- > > >> >> I tried to have the tests more synthesized using Mike's catsall > > >> >> workflow staging in ~3 MB data files to 5 OSG sites. Swift seem > > >> >> to > > >> >> handle the transfer well when the originating files are local. > > >> >> But > > >> >> when it starts to use remote file objects, I get all these 3rd > > >> >> party > > >> >> transfer exceptions. my throttle for file transfers is 8 and for > > >> >> file > > >> >> operations is 10. 
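As a quick way to quantify the "starts failing around 800 files" observation quoted above, the failure signatures are already in the client log, so counting them per run and comparing against the number of stage-ins attempted is enough to see the trend. A sketch; the log filename is illustrative, and the grep patterns are taken from the log excerpts in this thread:

# sketch: transfers attempted vs. transfers that failed, for one run log
LOG=catsall-20101209-1839-pnazhid6.log   # illustrative name for the client-side run log

echo "transfers started:  $(grep -c 'File transfer with resource' "$LOG")"
echo "transfer failures:  $(grep -c 'Exception in transfer' "$LOG")"
echo "MLST refusals:      $(grep -c 'Server refused MLST command' "$LOG")"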
> > >> >> > > >> >> 2010-12-09 18:58:16,700-0600 DEBUG DelegatedFileTransferHandler > > >> >> File > > >> >> transfer with resource remote->tmp > > >> >> 2010-12-09 18:58:16,734-0600 DEBUG DelegatedFileTransferHandler > > >> >> Exception in transfer > > >> >> org.globus.cog.abstraction.impl.file.IrrecoverableResourceException: > > >> >> Exception in getFile > > >> >> at > > >> >> org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(AbstractFTP > > >> >> FileResource.java:62) > > >> >> at > > >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getFile(FileResourceImpl.java > > >> >> :401) > > >> >> at > > >> >> org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doSource(DelegatedFil > > >> >> eTransferHandler.java:269) > > >> >> at > > >> >> org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doSource(Cachi > > >> >> ngDelegatedFileTransferHandler.java:59) > > >> >> at > > >> >> org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFileTran > > >> >> sferHandler.java:486) > > >> >> at java.lang.Thread.run(Thread.java:619) > > >> >> Caused by: > > >> >> org.globus.cog.abstraction.impl.file.FileResourceException: > > >> >> Failed to retrieve file information > > >> >> about > > >> >> /projsmall/osg/data/engage/scec/swift_scratch/catsall-20101209-1839-pnazhid6/info/p/cat-p2em5s2k-in > > >> >> fo > > >> >> at > > >> >> org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(AbstractFTP > > >> >> FileResource.java:51) > > >> >> at > > >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getGridFile(FileResourceImpl. > > >> >> java:550) > > >> >> at > > >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getFile(FileResourceImpl.java > > >> >> :384) > > >> >> ... 4 more > > >> >> Caused by: org.globus.ftp.exception.ServerException: Server > > >> >> refused > > >> >> performing the request. Custom message > > >> >> : Server refused MLST command (error code 1) [Nested exception > > >> >> message: Custom message: Unexpected reply: > > >> >> 500-Command failed : > > >> >> globus_gridftp_server_file.c:globus_l_gfs_file_stat:389: > > >> >> 500-System error in stat: No such file or directory > > >> >> 500-A system call failed: No such file or directory > > >> >> 500 End.] [Nested exception is > > >> >> org.globus.ftp.exception.UnexpectedReplyCodeException: Custom > > >> >> message: Une > > >> >> xpected reply: 500-Command failed : > > >> >> globus_gridftp_server_file.c:globus_l_gfs_file_stat:389: > > >> >> 500-System error in stat: No such file or directory > > >> >> 500-A system call failed: No such file or directory > > >> >> 500 End.] > > >> >> at > > >> >> org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerException.java > > >> >> :101) > > >> >> at org.globus.ftp.FTPClient.mlst(FTPClient.java:643) > > >> >> at > > >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getGridFile(FileResourceImpl. > > >> >> java:546) > > >> >> ... 5 more > > >> >> > > >> >> > > >> >> I may have been stressing the source gridftp server too much > > >> >> (pads) > > >> >> that it cannot handle a throttle of 8 . But at this > > >> >> configuration, > > >> >> I > > >> >> get low transfer performance. When doing direct transfers, I was > > >> >> able > > >> >> to get better transfer rates until i start coking out gpfs at > > >> >> 10k > > >> >> stageins. 
My throttle for this configurations was 40 for both > > >> >> file > > >> >> transfers and file operations. > > >> >> > > >> >> > > >> >> 2010/12/2 Allan Espinosa : > > >> >> > I have a bunch of 3rd party gridftp transfers. Swift reports > > >> >> > around > > >> >> > 10k jobs being in the vdl:stagein at a time. After a while i > > >> >> > get > > >> >> > a > > >> >> > couple of these errors. Does it look like i'm stressing the > > >> >> > gridftp > > >> >> > servers? my throttle.transfers=8 > > >> >> > > > >> >> > 2010-12-02 02:22:06,008-0600 DEBUG > > >> >> > DelegatedFileTransferHandler > > >> >> > Starting service on gsiftp://gpn-hus > > >> >> > 2010-12-02 02:22:06,008-0600 DEBUG > > >> >> > DelegatedFileTransferHandler > > >> >> > File > > >> >> > transfer with resource local->r > > >> >> > 2010-12-02 02:22:06,247-0600 DEBUG > > >> >> > DelegatedFileTransferHandler > > >> >> > Exception in transfer > > >> >> > org.globus.cog.abstraction.impl.file.FileResourceException > > >> >> > at > > >> >> > org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(Abstr > > >> >> > esource.java:51) > > >> >> > at > > >> >> > org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(Abstr > > >> >> > esource.java:34) > > >> >> > at > > >> >> > org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > > >> >> > at > > >> >> > org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > > >> >> > eTransferHandler.java:352) > > >> >> > at > > >> >> > org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > > >> >> > ngDelegatedFileTransferHandler.java:46) > > >> >> > at > > >> >> > org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > > >> >> > andler.java:489) > > >> >> > at java.lang.Thread.run(Thread.java:619) > > >> >> > Caused by: org.globus.ftp.exception.ServerException: Server > > >> >> > refused > > >> >> > performing the request. 
Custom m > > >> >> > rror code 1) [Nested exception message: Custom message: > > >> >> > Unexpected > > >> >> > reply: 451 ocurred during retrie > > >> >> > org.globus.ftp.exception.DataChannelException: setPassive() > > >> >> > must > > >> >> > match > > >> >> > store() and setActive() - ret > > >> >> > rror code 2) > > >> >> > org.globus.ftp.exception.DataChannelException: setPassive() > > >> >> > must > > >> >> > match > > >> >> > store() and setActive() - ret > > >> >> > rror code 2) > > >> >> > at > > >> >> > org.globus.ftp.extended.GridFTPServerFacade.retrieve(GridFTPServerFacade.java:469) > > >> >> > at org.globus.ftp.FTPClient.put(FTPClient.java:1294) > > >> >> > at > > >> >> > org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > > >> >> > at > > >> >> > org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > > >> >> > eTransferHandler.java:352) > > >> >> > at > > >> >> > org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > > >> >> > ngDelegatedFileTransferHandler.java:46) > > >> >> > at > > >> >> > org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > > >> >> > andler.java:489) > > >> >> > at java.lang.Thread.run(Thread.java:619) > > >> >> > ] [Nested exception is > > >> >> > org.globus.ftp.exception.UnexpectedReplyCodeException: Custom > > >> >> > message: Unexp > > >> >> > : 451 ocurred during retrieve() > > >> >> > org.globus.ftp.exception.DataChannelException: setPassive() > > >> >> > must > > >> >> > match > > >> >> > store() and setActive() - ret > > >> >> > rror code 2) > > >> >> > org.globus.ftp.exception.DataChannelException: setPassive() > > >> >> > must > > >> >> > match > > >> >> > store() and setActive() - ret > > >> >> > rror code 2) > > >> >> > at > > >> >> > org.globus.ftp.extended.GridFTPServerFacade.retrieve(GridFTPServerFacade.java:469) > > >> >> > at org.globus.ftp.FTPClient.put(FTPClient.java:1294) > > >> >> > at > > >> >> > org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > > >> >> > at > > >> >> > org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > > >> >> > eTransferHandler.java:352) > > >> >> > at > > >> >> > org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > > >> >> > ngDelegatedFileTransferHandler.java:46) > > >> >> > at > > >> >> > org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > > >> >> > andler.java:489) > > >> >> > at java.lang.Thread.run(Thread.java:619) > > >> >> > ] > > >> >> > at > > >> >> > org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerExceptio > > >> >> > at > > >> >> > org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerExceptio > > >> >> > at > > >> >> > org.globus.ftp.vanilla.TransferMonitor.run(TransferMonitor.java:195) > > >> >> > ... 1 more > > >> >> > > > >> > > >> -- > > >> Allan M. Espinosa > > >> PhD student, Computer Science > > >> University of Chicago > > > > > > -- > > > Michael Wilde > > > Computation Institute, University of Chicago > > > Mathematics and Computer Science Division > > > Argonne National Laboratory > > > > > > > > > > > > > > > > > -- > > Allan M. 
Espinosa > > PhD student, Computer Science > > University of Chicago > From hategan at mcs.anl.gov Sun Dec 12 14:50:23 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 12 Dec 2010 12:50:23 -0800 Subject: provider staging to remote sites (was [Swift-user] Re: 3rd party transfers) In-Reply-To: References: <435933201.8868.1292001638167.JavaMail.root@zimbra.anl.gov> Message-ID: <1292187023.10265.2.camel@blabla2.none> Ok. This is not related to coaster staging in particular, and I think I see what the problem is, so I'm hoping for a fix soon. Mihael On Fri, 2010-12-10 at 11:29 -0600, Allan Espinosa wrote: > Hi Mike, > > I temporarily converted my absolute path references to relative one > with symlinks. But jobs started to fail at 800 files: > > 2010-12-10 11:26:31,116-0600 INFO vdl:execute Exception in cat: > Arguments: [RuptureVariations/100/3/100_3.txt.variation-s0002-h0003] > Host: Firefly_ff-grid.unl.edu > Directory: catsall-20101210-1126-g172ithb/jobs/k/cat-kxrvat2kTODO: outs > ---- > > Caused by: Task failed: null > java.lang.IllegalStateException: Timer already cancelled. > at java.util.Timer.sched(Timer.java:354) > at java.util.Timer.schedule(Timer.java:170) > at org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:156) > at org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:150) > at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Sender.run(AbstractStreamKarajanChannel.java:253) > > > 2010-12-10 11:26:31,116-0600 DEBUG ConfigProperty Getting property > pgraph with host null > > > 2010/12/10 Michael Wilde : > > Hi Allan, > > > > I vaguely recall similar issues with prior tests of provider staging. Based on Mihael's recommendation Ive been using the "proxy" mode. I dont have my head around all the modes at the moment (I did when I first looked at it). > > > > At any rate, in my tests using proxy mode, just on localhost, I did not run into any full-pathname problems: I used simple_mapper and unqualified partial pathnames. > > > > My test is on the CI net at: /home/wilde/swift/lab/tests/test.local.ps.sh > > and pasted below. We should build a similar test to validate proxy-mode provider staging on remote sites with coasters. Whoever gets to it first. > > > > See if using the pattern below gets you past this full-pathname problem. > > > > - Mike > > > > bri$ cat ./test.local.ps.sh > > #! 
/bin/bash > > > > cat >tc < > > > localhost sh /bin/sh null null null > > localhost cat /bin/cat null null null > > > > END > > > > cat >sites.xml < > > > > > > > > > 8 > > 1 > > 1 > > .15 > > 10000 > > proxy > > $PWD > > > > > > > > END > > > > cat >cf < > > > wrapperlog.always.transfer=true > > sitedir.keep=true > > execution.retries=0 > > lazy.errors=false > > status.mode=provider > > use.provider.staging=true > > provider.staging.pin.swiftfiles=false > > > > END > > > > cat >pstest.swift < > > > type file; > > > > app (file o) cat (file i) > > { > > cat @i stdout=@o; > > } > > > > file infile[] ; > > file outfile[] ; > > > > foreach f, i in infile { > > outfile[i] = cat(f); > > } > > > > EOF > > > > swift -config cf -tc.file tc -sites.file sites.xml pstest.swift > > bri$ > > > > > > bri$ mkdir outdir > > bri$ ls > > indir/ outdir/ test.local.ps.sh > > bri$ ls indir > > f.0000.in f.0001.in f.0002.in f.0003.in f.0004.in > > bri$ ls outdir > > bri$ ./test.local.ps.sh > > Swift svn swift-r3758 cog-r2951 (cog modified locally) > > > > RunID: 20101210-1108-qsdi3mz6 > > Progress: > > Progress: Active:4 Finished successfully:1 > > Final status: Finished successfully:5 > > bri$ ls outdir > > f.0000.out f.0001.out f.0002.out f.0003.out f.0004.out > > bri$ > > > > > > ----- Original Message ----- > >> Hi Mike, > >> > >> I'm having problems getting provider staging to work. I seems to pass > >> files as absolute references: > >> > >> _____________________________________________________________________________ > >> > >> command line > >> _____________________________________________________________________________ > >> > >> -e /bin/cat -out stdout.txt -err stderr.txt -i -d -if > >> //gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/100/0/100_0.txt.variation-s0000-h0000 > >> -of -k -cdmfile -status provider -a > >> /gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/100/0/100_0.txt.variation-s0000-h0000 > >> > >> _____________________________________________________________________________ > >> > >> stdout > >> _____________________________________________________________________________ > >> > >> > >> _____________________________________________________________________________ > >> > >> stderr > >> _____________________________________________________________________________ > >> > >> /bin/cat: > >> /gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/100/0/100_0.txt.variation-s0000-h0000: > >> No such file or directory > >> > >> But the remote site does not have /gpfs/pads . > >> > >> Should I be modifying my mappers to accomodate this? > >> > >> -Allan > >> > >> > >> 2010/12/10 Michael Wilde : > >> > Did you try provider staging, which might be easier to throttle > >> > given that the staging endpoints are more under Swift's control? > >> > > >> > - MIke > >> > > >> > ----- Original Message ----- > >> >> Hi Mike. > >> >> > >> >> Yes. I had the workflow stagein 1, 10, 40 , 80, 400, 800, 2000, > >> >> 8000, > >> >> 30000 files. The throttles are the same for each run. Problems > >> >> started to occur at around 800 files . > >> >> > >> >> For staging in local files, problems started to occur at 30000 > >> >> files > >> >> where vdl:dostagein hits gpfs too much. > >> >> > >> >> -Allan > >> >> > >> >> > >> >> 2010/12/10 Michael Wilde : > >> >> > Allan, did you verify that each remote site you are talking to in > >> >> > this test is functional at low transaction rates using your > >> >> > current > >> >> > sites configuration? 
> >> >> > > >> >> > I.e., are you certain that the error below is due to load and not > >> >> > a > >> >> > site-related error? > >> >> > > >> >> > - Mike > >> >> > > >> >> > > >> >> > ----- Original Message ----- > >> >> >> I tried to have the tests more synthesized using Mike's catsall > >> >> >> workflow staging in ~3 MB data files to 5 OSG sites. Swift seem > >> >> >> to > >> >> >> handle the transfer well when the originating files are local. > >> >> >> But > >> >> >> when it starts to use remote file objects, I get all these 3rd > >> >> >> party > >> >> >> transfer exceptions. my throttle for file transfers is 8 and for > >> >> >> file > >> >> >> operations is 10. > >> >> >> > >> >> >> 2010-12-09 18:58:16,700-0600 DEBUG DelegatedFileTransferHandler > >> >> >> File > >> >> >> transfer with resource remote->tmp > >> >> >> 2010-12-09 18:58:16,734-0600 DEBUG DelegatedFileTransferHandler > >> >> >> Exception in transfer > >> >> >> org.globus.cog.abstraction.impl.file.IrrecoverableResourceException: > >> >> >> Exception in getFile > >> >> >> at > >> >> >> org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(AbstractFTP > >> >> >> FileResource.java:62) > >> >> >> at > >> >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getFile(FileResourceImpl.java > >> >> >> :401) > >> >> >> at > >> >> >> org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doSource(DelegatedFil > >> >> >> eTransferHandler.java:269) > >> >> >> at > >> >> >> org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doSource(Cachi > >> >> >> ngDelegatedFileTransferHandler.java:59) > >> >> >> at > >> >> >> org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFileTran > >> >> >> sferHandler.java:486) > >> >> >> at java.lang.Thread.run(Thread.java:619) > >> >> >> Caused by: > >> >> >> org.globus.cog.abstraction.impl.file.FileResourceException: > >> >> >> Failed to retrieve file information > >> >> >> about > >> >> >> /projsmall/osg/data/engage/scec/swift_scratch/catsall-20101209-1839-pnazhid6/info/p/cat-p2em5s2k-in > >> >> >> fo > >> >> >> at > >> >> >> org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(AbstractFTP > >> >> >> FileResource.java:51) > >> >> >> at > >> >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getGridFile(FileResourceImpl. > >> >> >> java:550) > >> >> >> at > >> >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getFile(FileResourceImpl.java > >> >> >> :384) > >> >> >> ... 4 more > >> >> >> Caused by: org.globus.ftp.exception.ServerException: Server > >> >> >> refused > >> >> >> performing the request. Custom message > >> >> >> : Server refused MLST command (error code 1) [Nested exception > >> >> >> message: Custom message: Unexpected reply: > >> >> >> 500-Command failed : > >> >> >> globus_gridftp_server_file.c:globus_l_gfs_file_stat:389: > >> >> >> 500-System error in stat: No such file or directory > >> >> >> 500-A system call failed: No such file or directory > >> >> >> 500 End.] [Nested exception is > >> >> >> org.globus.ftp.exception.UnexpectedReplyCodeException: Custom > >> >> >> message: Une > >> >> >> xpected reply: 500-Command failed : > >> >> >> globus_gridftp_server_file.c:globus_l_gfs_file_stat:389: > >> >> >> 500-System error in stat: No such file or directory > >> >> >> 500-A system call failed: No such file or directory > >> >> >> 500 End.] 
> >> >> >> at > >> >> >> org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerException.java > >> >> >> :101) > >> >> >> at org.globus.ftp.FTPClient.mlst(FTPClient.java:643) > >> >> >> at > >> >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getGridFile(FileResourceImpl. > >> >> >> java:546) > >> >> >> ... 5 more > >> >> >> > >> >> >> > >> >> >> I may have been stressing the source gridftp server too much > >> >> >> (pads) > >> >> >> that it cannot handle a throttle of 8 . But at this > >> >> >> configuration, > >> >> >> I > >> >> >> get low transfer performance. When doing direct transfers, I was > >> >> >> able > >> >> >> to get better transfer rates until i start coking out gpfs at > >> >> >> 10k > >> >> >> stageins. My throttle for this configurations was 40 for both > >> >> >> file > >> >> >> transfers and file operations. > >> >> >> > >> >> >> > >> >> >> 2010/12/2 Allan Espinosa : > >> >> >> > I have a bunch of 3rd party gridftp transfers. Swift reports > >> >> >> > around > >> >> >> > 10k jobs being in the vdl:stagein at a time. After a while i > >> >> >> > get > >> >> >> > a > >> >> >> > couple of these errors. Does it look like i'm stressing the > >> >> >> > gridftp > >> >> >> > servers? my throttle.transfers=8 > >> >> >> > > >> >> >> > 2010-12-02 02:22:06,008-0600 DEBUG > >> >> >> > DelegatedFileTransferHandler > >> >> >> > Starting service on gsiftp://gpn-hus > >> >> >> > 2010-12-02 02:22:06,008-0600 DEBUG > >> >> >> > DelegatedFileTransferHandler > >> >> >> > File > >> >> >> > transfer with resource local->r > >> >> >> > 2010-12-02 02:22:06,247-0600 DEBUG > >> >> >> > DelegatedFileTransferHandler > >> >> >> > Exception in transfer > >> >> >> > org.globus.cog.abstraction.impl.file.FileResourceException > >> >> >> > at > >> >> >> > org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(Abstr > >> >> >> > esource.java:51) > >> >> >> > at > >> >> >> > org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(Abstr > >> >> >> > esource.java:34) > >> >> >> > at > >> >> >> > org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > >> >> >> > at > >> >> >> > org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > >> >> >> > eTransferHandler.java:352) > >> >> >> > at > >> >> >> > org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > >> >> >> > ngDelegatedFileTransferHandler.java:46) > >> >> >> > at > >> >> >> > org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > >> >> >> > andler.java:489) > >> >> >> > at java.lang.Thread.run(Thread.java:619) > >> >> >> > Caused by: org.globus.ftp.exception.ServerException: Server > >> >> >> > refused > >> >> >> > performing the request. 
Custom m > >> >> >> > rror code 1) [Nested exception message: Custom message: > >> >> >> > Unexpected > >> >> >> > reply: 451 ocurred during retrie > >> >> >> > org.globus.ftp.exception.DataChannelException: setPassive() > >> >> >> > must > >> >> >> > match > >> >> >> > store() and setActive() - ret > >> >> >> > rror code 2) > >> >> >> > org.globus.ftp.exception.DataChannelException: setPassive() > >> >> >> > must > >> >> >> > match > >> >> >> > store() and setActive() - ret > >> >> >> > rror code 2) > >> >> >> > at > >> >> >> > org.globus.ftp.extended.GridFTPServerFacade.retrieve(GridFTPServerFacade.java:469) > >> >> >> > at org.globus.ftp.FTPClient.put(FTPClient.java:1294) > >> >> >> > at > >> >> >> > org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > >> >> >> > at > >> >> >> > org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > >> >> >> > eTransferHandler.java:352) > >> >> >> > at > >> >> >> > org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > >> >> >> > ngDelegatedFileTransferHandler.java:46) > >> >> >> > at > >> >> >> > org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > >> >> >> > andler.java:489) > >> >> >> > at java.lang.Thread.run(Thread.java:619) > >> >> >> > ] [Nested exception is > >> >> >> > org.globus.ftp.exception.UnexpectedReplyCodeException: Custom > >> >> >> > message: Unexp > >> >> >> > : 451 ocurred during retrieve() > >> >> >> > org.globus.ftp.exception.DataChannelException: setPassive() > >> >> >> > must > >> >> >> > match > >> >> >> > store() and setActive() - ret > >> >> >> > rror code 2) > >> >> >> > org.globus.ftp.exception.DataChannelException: setPassive() > >> >> >> > must > >> >> >> > match > >> >> >> > store() and setActive() - ret > >> >> >> > rror code 2) > >> >> >> > at > >> >> >> > org.globus.ftp.extended.GridFTPServerFacade.retrieve(GridFTPServerFacade.java:469) > >> >> >> > at org.globus.ftp.FTPClient.put(FTPClient.java:1294) > >> >> >> > at > >> >> >> > org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > >> >> >> > at > >> >> >> > org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > >> >> >> > eTransferHandler.java:352) > >> >> >> > at > >> >> >> > org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > >> >> >> > ngDelegatedFileTransferHandler.java:46) > >> >> >> > at > >> >> >> > org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > >> >> >> > andler.java:489) > >> >> >> > at java.lang.Thread.run(Thread.java:619) > >> >> >> > ] > >> >> >> > at > >> >> >> > org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerExceptio > >> >> >> > at > >> >> >> > org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerExceptio > >> >> >> > at > >> >> >> > org.globus.ftp.vanilla.TransferMonitor.run(TransferMonitor.java:195) > >> >> >> > ... 1 more > >> >> >> > > >> >> > >> >> -- > >> >> Allan M. Espinosa > >> >> PhD student, Computer Science > >> >> University of Chicago > >> > > >> > -- > >> > Michael Wilde > >> > Computation Institute, University of Chicago > >> > Mathematics and Computer Science Division > >> > Argonne National Laboratory > >> > > >> > > >> > > >> > >> > >> > >> -- > >> Allan M. 
Espinosa > >> PhD student, Computer Science > >> University of Chicago > > > > -- > > Michael Wilde > > Computation Institute, University of Chicago > > Mathematics and Computer Science Division > > Argonne National Laboratory > > > > > > > > > From hategan at mcs.anl.gov Sun Dec 12 15:05:46 2010 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 12 Dec 2010 13:05:46 -0800 Subject: provider staging to remote sites (was [Swift-user] Re: 3rd party transfers) In-Reply-To: <1292187023.10265.2.camel@blabla2.none> References: <435933201.8868.1292001638167.JavaMail.root@zimbra.anl.gov> <1292187023.10265.2.camel@blabla2.none> Message-ID: <1292187946.10265.3.camel@blabla2.none> Ok. I think this is fixed in cog trunk r2955. Mihael On Sun, 2010-12-12 at 12:50 -0800, Mihael Hategan wrote: > Ok. This is not related to coaster staging in particular, and I think I > see what the problem is, so I'm hoping for a fix soon. > > Mihael > > On Fri, 2010-12-10 at 11:29 -0600, Allan Espinosa wrote: > > Hi Mike, > > > > I temporarily converted my absolute path references to relative one > > with symlinks. But jobs started to fail at 800 files: > > > > 2010-12-10 11:26:31,116-0600 INFO vdl:execute Exception in cat: > > Arguments: [RuptureVariations/100/3/100_3.txt.variation-s0002-h0003] > > Host: Firefly_ff-grid.unl.edu > > Directory: catsall-20101210-1126-g172ithb/jobs/k/cat-kxrvat2kTODO: outs > > ---- > > > > Caused by: Task failed: null > > java.lang.IllegalStateException: Timer already cancelled. > > at java.util.Timer.sched(Timer.java:354) > > at java.util.Timer.schedule(Timer.java:170) > > at org.globus.cog.karajan.workflow.service.commands.Command.setupReplyTimeoutChecker(Command.java:156) > > at org.globus.cog.karajan.workflow.service.commands.Command.dataSent(Command.java:150) > > at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Sender.run(AbstractStreamKarajanChannel.java:253) > > > > > > 2010-12-10 11:26:31,116-0600 DEBUG ConfigProperty Getting property > > pgraph with host null > > > > > > 2010/12/10 Michael Wilde : > > > Hi Allan, > > > > > > I vaguely recall similar issues with prior tests of provider staging. Based on Mihael's recommendation Ive been using the "proxy" mode. I dont have my head around all the modes at the moment (I did when I first looked at it). > > > > > > At any rate, in my tests using proxy mode, just on localhost, I did not run into any full-pathname problems: I used simple_mapper and unqualified partial pathnames. > > > > > > My test is on the CI net at: /home/wilde/swift/lab/tests/test.local.ps.sh > > > and pasted below. We should build a similar test to validate proxy-mode provider staging on remote sites with coasters. Whoever gets to it first. > > > > > > See if using the pattern below gets you past this full-pathname problem. > > > > > > - Mike > > > > > > bri$ cat ./test.local.ps.sh > > > #! 
/bin/bash > > > > > > cat >tc < > > > > > localhost sh /bin/sh null null null > > > localhost cat /bin/cat null null null > > > > > > END > > > > > > cat >sites.xml < > > > > > > > > > > > > > > 8 > > > 1 > > > 1 > > > .15 > > > 10000 > > > proxy > > > $PWD > > > > > > > > > > > > END > > > > > > cat >cf < > > > > > wrapperlog.always.transfer=true > > > sitedir.keep=true > > > execution.retries=0 > > > lazy.errors=false > > > status.mode=provider > > > use.provider.staging=true > > > provider.staging.pin.swiftfiles=false > > > > > > END > > > > > > cat >pstest.swift < > > > > > type file; > > > > > > app (file o) cat (file i) > > > { > > > cat @i stdout=@o; > > > } > > > > > > file infile[] ; > > > file outfile[] ; > > > > > > foreach f, i in infile { > > > outfile[i] = cat(f); > > > } > > > > > > EOF > > > > > > swift -config cf -tc.file tc -sites.file sites.xml pstest.swift > > > bri$ > > > > > > > > > bri$ mkdir outdir > > > bri$ ls > > > indir/ outdir/ test.local.ps.sh > > > bri$ ls indir > > > f.0000.in f.0001.in f.0002.in f.0003.in f.0004.in > > > bri$ ls outdir > > > bri$ ./test.local.ps.sh > > > Swift svn swift-r3758 cog-r2951 (cog modified locally) > > > > > > RunID: 20101210-1108-qsdi3mz6 > > > Progress: > > > Progress: Active:4 Finished successfully:1 > > > Final status: Finished successfully:5 > > > bri$ ls outdir > > > f.0000.out f.0001.out f.0002.out f.0003.out f.0004.out > > > bri$ > > > > > > > > > ----- Original Message ----- > > >> Hi Mike, > > >> > > >> I'm having problems getting provider staging to work. I seems to pass > > >> files as absolute references: > > >> > > >> _____________________________________________________________________________ > > >> > > >> command line > > >> _____________________________________________________________________________ > > >> > > >> -e /bin/cat -out stdout.txt -err stderr.txt -i -d -if > > >> //gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/100/0/100_0.txt.variation-s0000-h0000 > > >> -of -k -cdmfile -status provider -a > > >> /gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/100/0/100_0.txt.variation-s0000-h0000 > > >> > > >> _____________________________________________________________________________ > > >> > > >> stdout > > >> _____________________________________________________________________________ > > >> > > >> > > >> _____________________________________________________________________________ > > >> > > >> stderr > > >> _____________________________________________________________________________ > > >> > > >> /bin/cat: > > >> /gpfs/pads/swift/aespinosa/science/cybershake/RuptureVariations/100/0/100_0.txt.variation-s0000-h0000: > > >> No such file or directory > > >> > > >> But the remote site does not have /gpfs/pads . > > >> > > >> Should I be modifying my mappers to accomodate this? > > >> > > >> -Allan > > >> > > >> > > >> 2010/12/10 Michael Wilde : > > >> > Did you try provider staging, which might be easier to throttle > > >> > given that the staging endpoints are more under Swift's control? > > >> > > > >> > - MIke > > >> > > > >> > ----- Original Message ----- > > >> >> Hi Mike. > > >> >> > > >> >> Yes. I had the workflow stagein 1, 10, 40 , 80, 400, 800, 2000, > > >> >> 8000, > > >> >> 30000 files. The throttles are the same for each run. Problems > > >> >> started to occur at around 800 files . > > >> >> > > >> >> For staging in local files, problems started to occur at 30000 > > >> >> files > > >> >> where vdl:dostagein hits gpfs too much. 
> > >> >> > > >> >> -Allan > > >> >> > > >> >> > > >> >> 2010/12/10 Michael Wilde : > > >> >> > Allan, did you verify that each remote site you are talking to in > > >> >> > this test is functional at low transaction rates using your > > >> >> > current > > >> >> > sites configuration? > > >> >> > > > >> >> > I.e., are you certain that the error below is due to load and not > > >> >> > a > > >> >> > site-related error? > > >> >> > > > >> >> > - Mike > > >> >> > > > >> >> > > > >> >> > ----- Original Message ----- > > >> >> >> I tried to have the tests more synthesized using Mike's catsall > > >> >> >> workflow staging in ~3 MB data files to 5 OSG sites. Swift seem > > >> >> >> to > > >> >> >> handle the transfer well when the originating files are local. > > >> >> >> But > > >> >> >> when it starts to use remote file objects, I get all these 3rd > > >> >> >> party > > >> >> >> transfer exceptions. my throttle for file transfers is 8 and for > > >> >> >> file > > >> >> >> operations is 10. > > >> >> >> > > >> >> >> 2010-12-09 18:58:16,700-0600 DEBUG DelegatedFileTransferHandler > > >> >> >> File > > >> >> >> transfer with resource remote->tmp > > >> >> >> 2010-12-09 18:58:16,734-0600 DEBUG DelegatedFileTransferHandler > > >> >> >> Exception in transfer > > >> >> >> org.globus.cog.abstraction.impl.file.IrrecoverableResourceException: > > >> >> >> Exception in getFile > > >> >> >> at > > >> >> >> org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(AbstractFTP > > >> >> >> FileResource.java:62) > > >> >> >> at > > >> >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getFile(FileResourceImpl.java > > >> >> >> :401) > > >> >> >> at > > >> >> >> org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doSource(DelegatedFil > > >> >> >> eTransferHandler.java:269) > > >> >> >> at > > >> >> >> org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doSource(Cachi > > >> >> >> ngDelegatedFileTransferHandler.java:59) > > >> >> >> at > > >> >> >> org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFileTran > > >> >> >> sferHandler.java:486) > > >> >> >> at java.lang.Thread.run(Thread.java:619) > > >> >> >> Caused by: > > >> >> >> org.globus.cog.abstraction.impl.file.FileResourceException: > > >> >> >> Failed to retrieve file information > > >> >> >> about > > >> >> >> /projsmall/osg/data/engage/scec/swift_scratch/catsall-20101209-1839-pnazhid6/info/p/cat-p2em5s2k-in > > >> >> >> fo > > >> >> >> at > > >> >> >> org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(AbstractFTP > > >> >> >> FileResource.java:51) > > >> >> >> at > > >> >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getGridFile(FileResourceImpl. > > >> >> >> java:550) > > >> >> >> at > > >> >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getFile(FileResourceImpl.java > > >> >> >> :384) > > >> >> >> ... 4 more > > >> >> >> Caused by: org.globus.ftp.exception.ServerException: Server > > >> >> >> refused > > >> >> >> performing the request. Custom message > > >> >> >> : Server refused MLST command (error code 1) [Nested exception > > >> >> >> message: Custom message: Unexpected reply: > > >> >> >> 500-Command failed : > > >> >> >> globus_gridftp_server_file.c:globus_l_gfs_file_stat:389: > > >> >> >> 500-System error in stat: No such file or directory > > >> >> >> 500-A system call failed: No such file or directory > > >> >> >> 500 End.] 
[Nested exception is > > >> >> >> org.globus.ftp.exception.UnexpectedReplyCodeException: Custom > > >> >> >> message: Une > > >> >> >> xpected reply: 500-Command failed : > > >> >> >> globus_gridftp_server_file.c:globus_l_gfs_file_stat:389: > > >> >> >> 500-System error in stat: No such file or directory > > >> >> >> 500-A system call failed: No such file or directory > > >> >> >> 500 End.] > > >> >> >> at > > >> >> >> org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerException.java > > >> >> >> :101) > > >> >> >> at org.globus.ftp.FTPClient.mlst(FTPClient.java:643) > > >> >> >> at > > >> >> >> org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.getGridFile(FileResourceImpl. > > >> >> >> java:546) > > >> >> >> ... 5 more > > >> >> >> > > >> >> >> > > >> >> >> I may have been stressing the source gridftp server too much > > >> >> >> (pads) > > >> >> >> that it cannot handle a throttle of 8 . But at this > > >> >> >> configuration, > > >> >> >> I > > >> >> >> get low transfer performance. When doing direct transfers, I was > > >> >> >> able > > >> >> >> to get better transfer rates until i start coking out gpfs at > > >> >> >> 10k > > >> >> >> stageins. My throttle for this configurations was 40 for both > > >> >> >> file > > >> >> >> transfers and file operations. > > >> >> >> > > >> >> >> > > >> >> >> 2010/12/2 Allan Espinosa : > > >> >> >> > I have a bunch of 3rd party gridftp transfers. Swift reports > > >> >> >> > around > > >> >> >> > 10k jobs being in the vdl:stagein at a time. After a while i > > >> >> >> > get > > >> >> >> > a > > >> >> >> > couple of these errors. Does it look like i'm stressing the > > >> >> >> > gridftp > > >> >> >> > servers? my throttle.transfers=8 > > >> >> >> > > > >> >> >> > 2010-12-02 02:22:06,008-0600 DEBUG > > >> >> >> > DelegatedFileTransferHandler > > >> >> >> > Starting service on gsiftp://gpn-hus > > >> >> >> > 2010-12-02 02:22:06,008-0600 DEBUG > > >> >> >> > DelegatedFileTransferHandler > > >> >> >> > File > > >> >> >> > transfer with resource local->r > > >> >> >> > 2010-12-02 02:22:06,247-0600 DEBUG > > >> >> >> > DelegatedFileTransferHandler > > >> >> >> > Exception in transfer > > >> >> >> > org.globus.cog.abstraction.impl.file.FileResourceException > > >> >> >> > at > > >> >> >> > org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(Abstr > > >> >> >> > esource.java:51) > > >> >> >> > at > > >> >> >> > org.globus.cog.abstraction.impl.file.ftp.AbstractFTPFileResource.translateException(Abstr > > >> >> >> > esource.java:34) > > >> >> >> > at > > >> >> >> > org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > > >> >> >> > at > > >> >> >> > org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > > >> >> >> > eTransferHandler.java:352) > > >> >> >> > at > > >> >> >> > org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > > >> >> >> > ngDelegatedFileTransferHandler.java:46) > > >> >> >> > at > > >> >> >> > org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > > >> >> >> > andler.java:489) > > >> >> >> > at java.lang.Thread.run(Thread.java:619) > > >> >> >> > Caused by: org.globus.ftp.exception.ServerException: Server > > >> >> >> > refused > > >> >> >> > performing the request. 
Custom m > > >> >> >> > rror code 1) [Nested exception message: Custom message: > > >> >> >> > Unexpected > > >> >> >> > reply: 451 ocurred during retrie > > >> >> >> > org.globus.ftp.exception.DataChannelException: setPassive() > > >> >> >> > must > > >> >> >> > match > > >> >> >> > store() and setActive() - ret > > >> >> >> > rror code 2) > > >> >> >> > org.globus.ftp.exception.DataChannelException: setPassive() > > >> >> >> > must > > >> >> >> > match > > >> >> >> > store() and setActive() - ret > > >> >> >> > rror code 2) > > >> >> >> > at > > >> >> >> > org.globus.ftp.extended.GridFTPServerFacade.retrieve(GridFTPServerFacade.java:469) > > >> >> >> > at org.globus.ftp.FTPClient.put(FTPClient.java:1294) > > >> >> >> > at > > >> >> >> > org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > > >> >> >> > at > > >> >> >> > org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > > >> >> >> > eTransferHandler.java:352) > > >> >> >> > at > > >> >> >> > org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > > >> >> >> > ngDelegatedFileTransferHandler.java:46) > > >> >> >> > at > > >> >> >> > org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > > >> >> >> > andler.java:489) > > >> >> >> > at java.lang.Thread.run(Thread.java:619) > > >> >> >> > ] [Nested exception is > > >> >> >> > org.globus.ftp.exception.UnexpectedReplyCodeException: Custom > > >> >> >> > message: Unexp > > >> >> >> > : 451 ocurred during retrieve() > > >> >> >> > org.globus.ftp.exception.DataChannelException: setPassive() > > >> >> >> > must > > >> >> >> > match > > >> >> >> > store() and setActive() - ret > > >> >> >> > rror code 2) > > >> >> >> > org.globus.ftp.exception.DataChannelException: setPassive() > > >> >> >> > must > > >> >> >> > match > > >> >> >> > store() and setActive() - ret > > >> >> >> > rror code 2) > > >> >> >> > at > > >> >> >> > org.globus.ftp.extended.GridFTPServerFacade.retrieve(GridFTPServerFacade.java:469) > > >> >> >> > at org.globus.ftp.FTPClient.put(FTPClient.java:1294) > > >> >> >> > at > > >> >> >> > org.globus.cog.abstraction.impl.file.gridftp.old.FileResourceImpl.putFile(FileResourceImp > > >> >> >> > at > > >> >> >> > org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.doDestination(D > > >> >> >> > eTransferHandler.java:352) > > >> >> >> > at > > >> >> >> > org.globus.cog.abstraction.impl.fileTransfer.CachingDelegatedFileTransferHandler.doDestin > > >> >> >> > ngDelegatedFileTransferHandler.java:46) > > >> >> >> > at > > >> >> >> > org.globus.cog.abstraction.impl.fileTransfer.DelegatedFileTransferHandler.run(DelegatedFi > > >> >> >> > andler.java:489) > > >> >> >> > at java.lang.Thread.run(Thread.java:619) > > >> >> >> > ] > > >> >> >> > at > > >> >> >> > org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerExceptio > > >> >> >> > at > > >> >> >> > org.globus.ftp.exception.ServerException.embedUnexpectedReplyCodeException(ServerExceptio > > >> >> >> > at > > >> >> >> > org.globus.ftp.vanilla.TransferMonitor.run(TransferMonitor.java:195) > > >> >> >> > ... 1 more > > >> >> >> > > > >> >> > > >> >> -- > > >> >> Allan M. 
Espinosa > > >> >> PhD student, Computer Science > > >> >> University of Chicago > > >> > > > >> > -- > > >> > Michael Wilde > > >> > Computation Institute, University of Chicago > > >> > Mathematics and Computer Science Division > > >> > Argonne National Laboratory > > >> > > > >> > > > >> > > > >> > > >> > > >> > > >> -- > > >> Allan M. Espinosa > > >> PhD student, Computer Science > > >> University of Chicago > > > > > > -- > > > Michael Wilde > > > Computation Institute, University of Chicago > > > Mathematics and Computer Science Division > > > Argonne National Laboratory > > > > > > > > > > > > > > > > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

From wilde at mcs.anl.gov Mon Dec 13 12:01:49 2010
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 13 Dec 2010 12:01:49 -0600 (CST)
Subject: [Swift-user] Using multicore servers as Swift pools
In-Reply-To: <1820609121.18056.1292262639864.JavaMail.root@zimbra.anl.gov>
Message-ID: <100795199.18204.1292263309508.JavaMail.root@zimbra.anl.gov>

Luiz,

An example of the config files you will need in order to use the 10 8-core 64-bit MCS compute servers as Swift pools is on the CI net under /home/wilde/swift/lab/{coasters.xml,auth.defaults.sample}

The servers (10 x 64 bit and 3 x 32 bit) are listed at:
http://wiki.mcs.anl.gov/IT/index.php/General_MCS_Questions#computeservers

Since these machines are behind a firewall, you can use the ~/.ssh/config example below (adapted as needed) to make them accessible as if they permitted direct login. You log in to login.mcs.anl.gov using your local ssh key, and then ssh ports are forwarded to each of the target machines that you want on the MCS network. The technique is explained at:
http://articles.techrepublic.com.com/5100-10878_11-6155832.html

This is one of the configurations we should test with new test scripts for Swift 0.91. I've pasted the files below as well.

- Mike

=== auth.defaults.sample ===

xlogin1.pads.ci.uchicago.edu.type=password
xlogin1.pads.ci.uchicago.edu.username=wilde

login.pads.ci.uchicago.edu.type=key
login.pads.ci.uchicago.edu.username=wilde
login.pads.ci.uchicago.edu.key=/home/wilde/.ssh/swift_rsa
login.pads.ci.uchicago.edu.passphrase=mypassphrasegoeshere

login.mcs.anl.gov.type=key
login.mcs.anl.gov.username=wilde
login.mcs.anl.gov.key=/home/wilde/.ssh/swift_rsa
login.mcs.anl.gov.passphrase=mypassphrasegoeshere

=== ~/.ssh/config, set up for forwarding one compute server, "crush" ===

Host *
ServerAliveInterval 15
#ControlMaster auto
#ControlPath ~/.ssh/ssh-connections/%r@%h:%p

# COMPUTEHOSTS='crush thwomp stomp crank steamroller grind churn trounce thrash vanquish'

Host mcs login.mcs.anl.gov
Hostname login.mcs.anl.gov
ForwardAgent yes
ForwardX11 no
LocalForward 19001 140.221.8.62:22

Host crush thwomp stomp crank steamroller grind churn trounce thrash vanquish
ForwardAgent yes
ForwardX11 no
Hostname localhost
NoHostAuthenticationForLocalhost yes

Host crush
Port 19001

=== coasters.xml ===

8 3500 1 1 1 .07 10000 /home/wilde/swiftwork/crush

8 3500 1 1 1 .31 10000 /home/wilde/swiftwork/thwomp

...etc for the rest of the 10 compute servers...

- Mike

From wozniak at mcs.anl.gov Mon Dec 27 21:09:44 2010
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Mon, 27 Dec 2010 21:09:44 -0600 (Central Standard Time)
Subject: [Swift-user] reducing SetFieldValue logging levels
In-Reply-To:
References:
Message-ID:

Hi Allan,

You just need to change NONE to OFF.
To track down log4j problems, you can run Swift like COG_OPTS=-Dlog4j.debug=true swift Justin On Wed, 1 Dec 2010, Allan Espinosa wrote: > Hi, > > I set the logging level of SetFieldValue to NONE but still receive its > log entries: > $ grep SetFieldValue postproc-20101201-1412-58dz3i1h.log | head -n 10 > 2010-12-01 14:12:26,567-0600 DEBUG SetFieldValue Setting > org.griphyn.vdl.mapping.RootDataNode identifier > dataset:20101201-1412-y0ba3ap8:720000000048 type int with no value at > dataset=num_time_steps (not closed) to 3000 > 2010-12-01 14:12:26,567-0600 DEBUG SetFieldValue Setting > org.griphyn.vdl.mapping.RootDataNode identifier > dataset:20101201-1412-y0ba3ap8:720000000049 type string with no value > at dataset=spectra_period1 (not closed) to all > 2010-12-01 14:12:26,567-0600 DEBUG SetFieldValue Setting > org.griphyn.vdl.mapping.RootDataNode identifier > dataset:20101201-1412-y0ba3ap8:720000000050 type float with no value > at dataset=filter_highhz (not closed) to 5.0 > 2010-12-01 14:12:26,567-0600 DEBUG SetFieldValue Setting > org.griphyn.vdl.mapping.RootDataNode identifier > dataset:20101201-1412-y0ba3ap8:720000000053 type string with no value > at dataset=datadir (not closed) to > gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/Results > 2010-12-01 14:12:26,567-0600 DEBUG SetFieldValue Setting > org.griphyn.vdl.mapping.RootDataNode identifier > dataset:20101201-1412-y0ba3ap8:720000000051 type float with no value > at dataset=simulation_timeskip (not closed) to 0.1 > 2010-12-01 14:12:26,568-0600 DEBUG SetFieldValue Setting > org.griphyn.vdl.mapping.RootDataNode identifier > dataset:20101201-1412-y0ba3ap8:720000000052 type int with no value at > dataset=run_id (not closed) to 664 > 2010-12-01 14:12:26,568-0600 DEBUG SetFieldValue Setting > org.griphyn.vdl.mapping.RootDataNode identifier > dataset:20101201-1412-y0ba3ap8:720000000057 type string with no value > at dataset=swift#mapper#17045 (not closed) to > gsiftp://gridftp.pads.ci.uchicago.edu//gpfs/pads/swift/aespinosa/science/cybershake/SgtFiles > 2010-12-01 14:12:26,570-0600 DEBUG SetFieldValue Setting > org.griphyn.vdl.mapping.RootDataNode identifier > dataset:20101201-1412-y0ba3ap8:720000000056 type string with no value > at dataset=swift#mapper#17044 (not closed) to > org.griphyn.vdl.mapping.DataNode identifier > dataset:20101201-1412-y0ba3ap8:720000000062 type string with no value > at dataset=site path=.name (not closed) > > Do i have some conflicting logging config here? 
my log4j.properties: > > log4j.rootCategory=INFO, CONSOLE, FILE > > log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender > log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout > log4j.appender.CONSOLE.Threshold=INFO > log4j.appender.CONSOLE.layout.ConversionPattern=%m%n > > log4j.appender.FILE=org.apache.log4j.RollingFileAppender > log4j.appender.FILE.MaxFileSize=1GB > log4j.appender.FILE.File=swift.log > log4j.appender.FILE.layout=org.apache.log4j.PatternLayout > log4j.appender.FILE.layout.ConversionPattern=%d{yyyy-MM-dd > HH:mm:ss,SSSZZZZZ} %-5p %c{1} %m%n > > log4j.logger.swift=INFO > > log4j.logger.org.apache.axis.utils=ERROR > > log4j.logger.org.globus.swift.trace=INFO > > log4j.logger.org.griphyn.vdl.karajan.Loader=DEBUG > log4j.logger.org.globus.cog.karajan.workflow.events.WorkerSweeper=WARN > log4j.logger.org.globus.cog.karajan.workflow.nodes.FlowNode=WARN > log4j.logger.org.globus.cog.karajan.scheduler.WeightedHostScoreScheduler=DEBUG > log4j.logger.org.griphyn.vdl.toolkit.VDLt2VDLx=DEBUG > log4j.logger.org.griphyn.vdl.karajan.VDL2ExecutionContext=DEBUG > log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=INFO > #log4j.logger.org.griphyn.vdl.karajan.lib.GetFieldValue=DEBUG > log4j.logger.org.griphyn.vdl.engine.Karajan=INFO > log4j.logger.org.globus.cog.abstraction.coaster.rlog=DEBUG > > # log4j.logger.org.globus.swift.data.Director=DEBUG > > #log4j.logger.swift=DEBUG > log4j.logger.org.griphyn.vdl.karajan.lib=NONE > log4j.logger.org.griphyn.vdl.karajan.lib.SetFieldValue=NONE > log4j.logger.org.griphyn.vdl.mapping.AbstractDataNode=NONE > > > -- Justin M Wozniak
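
[Editor's note on the fix above: OFF is the standard log4j level that silences a category, while NONE is not a recognized level name, so log4j falls back to a default level instead of disabling the logger, which is consistent with the DEBUG entries still appearing. Applying Justin's suggestion to the last three loggers in the quoted log4j.properties would look like the lines below; the rest of the file is unchanged:]

  log4j.logger.org.griphyn.vdl.karajan.lib=OFF
  log4j.logger.org.griphyn.vdl.karajan.lib.SetFieldValue=OFF
  log4j.logger.org.griphyn.vdl.mapping.AbstractDataNode=OFF

Running with -Dlog4j.debug=true, as Justin suggests, makes log4j print how it resolved its configuration, which helps confirm which properties file was actually loaded.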
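
[Editor's note on the coasters.xml excerpt in Mike's Dec 13 message: only the values survived in the archive, so the XML structure is not recoverable with certainty. Purely as a sketch, one pool entry for "crush" consistent with those values might have looked roughly like the following; every element and profile key name here is an assumption based on typical Swift coaster site files of that era, and only the numbers and the work directory path come from the message:]

  <pool handle="crush">
    <!-- the execution element is a guess; the original may have pointed the
         coaster service at crush directly or through the ssh tunnel above -->
    <execution provider="coaster" jobmanager="ssh:local" url="crush"/>
    <profile namespace="globus" key="workersPerNode">8</profile>
    <profile namespace="globus" key="maxtime">3500</profile>
    <profile namespace="globus" key="slots">1</profile>
    <profile namespace="globus" key="nodeGranularity">1</profile>
    <profile namespace="globus" key="maxNodes">1</profile>
    <profile namespace="karajan" key="jobThrottle">.07</profile>
    <profile namespace="karajan" key="initialScore">10000</profile>
    <filesystem provider="local"/>
    <workdirectory>/home/wilde/swiftwork/crush</workdirectory>
  </pool>

Under the same assumptions, the thwomp entry would differ only in the handle, the work directory, and the .31 throttle value.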
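
[Editor's note on the throttle settings discussed earlier in the thread (transfers throttled to 8, file operations to 10, raised to 40 for the direct-transfer runs): these are ordinary Swift properties and can be set in swift.properties or in a -config file such as the "cf" file in Mike's test. A minimal sketch with the values Allan mentions, as an illustration only:]

  throttle.transfers=8
  throttle.file.operations=10
  # the direct-transfer runs described in the thread used 40 for both
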
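
[Editor's note on pstest.swift in Mike's provider-staging test: the mapper expressions were stripped along with the rest of the angle-bracket markup. Since Mike notes he used simple_mapper and unqualified partial pathnames, the two array declarations plausibly looked something like the lines below; the exact mapper names and parameters are assumptions, the relevant point being that the paths are relative, which avoids the absolute /gpfs/pads paths that failed on the remote site:]

  file infile[] <filesys_mapper; location="indir", suffix=".in">;
  file outfile[] <simple_mapper; location="outdir", prefix="f.", suffix=".out">;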