[Swift-user] Swift running errors
lixi at uchicago.edu
Tue Feb 19 15:32:34 CST 2008
Hi,
I have two problems.
1. Today, whenever I try to run a Swift workflow on multiple OSG
sites, I encounter the following errors, which cause the run to
fail:
[lixi@login remote]$ swift -tc.file /home/lixi/swift/test/tc.data \
    -sites.file /home/lixi/swift/test/OSGEDU_Sites.xml workflowtest.swift
Swift v0.3-dev r1674 (modified locally)
RunID: 20080219-1447-1hztqje9
node started
Failed to transfer kickstart records from workflowtest-20080219-1447-1hztqje9/kickstart/8/CIT_CMS_T2Exception in getFile
task:transfer @ vdl-int.k, line: 322
sys:try @ vdl-int.k, line: 322
vdl:transferkickstartrec @ vdl-int.k, line: 409
sys:set @ vdl-int.k, line: 409
sys:sequential @ vdl-int.k, line: 409
sys:try @ vdl-int.k, line: 408
sys:else @ vdl-int.k, line: 407
sys:if @ vdl-int.k, line: 405
sys:set @ vdl-int.k, line: 404
sys:catch @ vdl-int.k, line: 396
sys:try @ vdl-int.k, line: 354
task:allocatehost @ vdl-int.k, line: 334
vdl:execute2 @ execute-default.k, line: 23
sys:restartonerror @ execute-default.k, line: 21
sys:sequential @ execute-default.k, line: 19
sys:try @ execute-default.k, line: 18
sys:if @ execute-default.k, line: 17
sys:then @ execute-default.k, line: 16
sys:if @ execute-default.k, line: 15
vdl:execute @ workflowtest.kml, line: 31
worknode @ workflowtest.kml, line: 79
sys:sequential @ workflowtest.kml, line: 78
sys:parallel @ workflowtest.kml, line: 77
vdl:mainp @ workflowtest.kml, line: 76
mainp @ vdl.k, line: 150
vdl:mains @ workflowtest.kml, line: 75
vdl:mains @ workflowtest.kml, line: 75
rlog:restartlog @ workflowtest.kml, line: 74
kernel:project @ workflowtest.kml, line: 2
workflowtest-20080219-1447-1hztqje9
Caused by:
org.globus.cog.abstraction.impl.file.FileResourceException: Exception in getFile
Caused by: org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message: (error code 1) [Nested exception message: Custom message: Unexpected reply: 500-Command failed. : globus_gridftp_server_file.c:globus_l_gfs_file_send:2190:
500-globus_l_gfs_file_open failed.
500-globus_gridftp_server_file.c:globus_l_gfs_file_open:1694:
500-globus_xio_register_open failed.
500-globus_xio_file_driver.c:globus_l_xio_file_open:438:
500-Unable to open file /raid2/osg-data/lixi/workflowtest-20080219-1447-1hztqje9/kickstart/8/node-8kgjdnoi-kickstart.xml
500-globus_xio_file_driver.c:globus_l_xio_file_open:381:
500-System error in open: No such file or directory
500-globus_xio: A system call failed: No such file or directory
500 End.] [Nested exception is org.globus.ftp.exception.UnexpectedReplyCodeException: Custom message: Unexpected reply: 500-Command failed. : globus_gridftp_server_file.c:globus_l_gfs_file_send:2190:
500-globus_l_gfs_file_open failed.
500-globus_gridftp_server_file.c:globus_l_gfs_file_open:1694:
500-globus_xio_register_open failed.
500-globus_xio_file_driver.c:globus_l_xio_file_open:438:
500-Unable to open file /raid2/osg-data/lixi/workflowtest-20080219-1447-1hztqje9/kickstart/8/node-8kgjdnoi-kickstart.xml
500-globus_xio_file_driver.c:globus_l_xio_file_open:381:
500-System error in open: No such file or directory
500-globus_xio: A system call failed: No such file or directory
500 End.]
2. When running a workflow that involves 1000 nodes, I
encounter the following error frequently, though not every
time:
...
node completed
node completed
node completed
node completed
node completed
node failed
Execution failed:
Exception in node:
Arguments: [_concurrent/intermediatefile-b5b5dc39-df70-4137-8149-c20f5d1af839-, out.0132.txt]
Host: localhost
Directory: workflowtest-20080219-1443-2qx4ctkc/jobs/6/node-64kddnoi
stderr.txt:
stdout.txt:
----
Caused by:
java.io.IOException: Too many open files
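For the second problem: "Too many open files" is how Java reports the EMFILE error, which a process receives when it reaches its per-process limit on open file descriptors. A JVM started from a login shell inherits that shell's limits, so a short script like the following (a sketch, assuming bash on the submit host) prints the soft and hard limits the swift process would start with:

```shell
#!/bin/bash
# Print the per-process open-file limits of the current shell.
# A JVM (such as the one swift runs in) launched from this shell inherits them.
soft=$(ulimit -Sn)   # soft limit: the limit processes actually run with
hard=$(ulimit -Hn)   # hard limit: the ceiling an unprivileged user may raise the soft limit to
echo "open-file limit: soft=$soft hard=$hard"
```

If the soft limit is below the hard limit, `ulimit -Sn` with a larger value (at most the hard limit) raises it for programs started afterwards from that shell. This is a diagnostic, not a confirmed fix for the failures above.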
Could you tell me what causes these errors and how I can
resolve them?
Thanks,
Xi