[Swift-user] Swift running errors

lixi at uchicago.edu
Tue Feb 19 15:32:34 CST 2008


Hi,

I have two problems. 

1. Today, when I try to run a Swift workflow on multiple OSG 
sites, I always encounter the following error, which causes 
the run to fail:
[lixi@login remote]$ swift -tc.file /home/lixi/swift/test/tc.data -sites.file /home/lixi/swift/test/OSGEDU_Sites.xml workflowtest.swift
Swift v0.3-dev r1674 (modified locally)

RunID: 20080219-1447-1hztqje9
node started
Failed to transfer kickstart records from workflowtest-20080219-1447-1hztqje9/kickstart/8/CIT_CMS_T2Exception in getFile
        task:transfer @ vdl-int.k, line: 322
        sys:try @ vdl-int.k, line: 322
        vdl:transferkickstartrec @ vdl-int.k, line: 409
        sys:set @ vdl-int.k, line: 409
        sys:sequential @ vdl-int.k, line: 409
        sys:try @ vdl-int.k, line: 408
        sys:else @ vdl-int.k, line: 407
        sys:if @ vdl-int.k, line: 405
        sys:set @ vdl-int.k, line: 404
        sys:catch @ vdl-int.k, line: 396
        sys:try @ vdl-int.k, line: 354
        task:allocatehost @ vdl-int.k, line: 334
        vdl:execute2 @ execute-default.k, line: 23
        sys:restartonerror @ execute-default.k, line: 21
        sys:sequential @ execute-default.k, line: 19
        sys:try @ execute-default.k, line: 18
        sys:if @ execute-default.k, line: 17
        sys:then @ execute-default.k, line: 16
        sys:if @ execute-default.k, line: 15
        vdl:execute @ workflowtest.kml, line: 31
        worknode @ workflowtest.kml, line: 79
        sys:sequential @ workflowtest.kml, line: 78
        sys:parallel @ workflowtest.kml, line: 77
        vdl:mainp @ workflowtest.kml, line: 76
        mainp @ vdl.k, line: 150
        vdl:mains @ workflowtest.kml, line: 75
        vdl:mains @ workflowtest.kml, line: 75
        rlog:restartlog @ workflowtest.kml, line: 74
        kernel:project @ workflowtest.kml, line: 2
        workflowtest-20080219-1447-1hztqje9
Caused by: org.globus.cog.abstraction.impl.file.FileResourceException: Exception in getFile
Caused by: org.globus.ftp.exception.ServerException: Server refused performing the request. Custom message:  (error code 1) [Nested exception message:  Custom message: Unexpected reply: 500-Command failed. : globus_gridftp_server_file.c:globus_l_gfs_file_send:2190:
500-globus_l_gfs_file_open failed.
500-globus_gridftp_server_file.c:globus_l_gfs_file_open:1694:
500-globus_xio_register_open failed.
500-globus_xio_file_driver.c:globus_l_xio_file_open:438:
500-Unable to open file /raid2/osg-data/lixi/workflowtest-20080219-1447-1hztqje9/kickstart/8/node-8kgjdnoi-kickstart.xml
500-globus_xio_file_driver.c:globus_l_xio_file_open:381:
500-System error in open: No such file or directory
500-globus_xio: A system call failed: No such file or directory
500 End.] [Nested exception is org.globus.ftp.exception.UnexpectedReplyCodeException:  Custom message: Unexpected reply: 500-Command failed. : globus_gridftp_server_file.c:globus_l_gfs_file_send:2190:
500-globus_l_gfs_file_open failed.
500-globus_gridftp_server_file.c:globus_l_gfs_file_open:1694:
500-globus_xio_register_open failed.
500-globus_xio_file_driver.c:globus_l_xio_file_open:438:
500-Unable to open file /raid2/osg-data/lixi/workflowtest-20080219-1447-1hztqje9/kickstart/8/node-8kgjdnoi-kickstart.xml
500-globus_xio_file_driver.c:globus_l_xio_file_open:381:
500-System error in open: No such file or directory
500-globus_xio: A system call failed: No such file or directory
500 End.]

2. When running a workflow that involves 1000 nodes, I 
encounter the following error very frequently, but not every 
time:
...
node completed
node completed
node completed
node completed
node completed
node failed
Execution failed:
        Exception in node:
Arguments: [_concurrent/intermediatefile-b5b5dc39-df70-4137-8149-c20f5d1af839-, out.0132.txt]
Host: localhost
Directory: workflowtest-20080219-1443-2qx4ctkc/jobs/6/node-64kddnoi
stderr.txt: 

stdout.txt: 

----

Caused by:
        java.io.IOException: Too many open files

Could you tell me why these errors occur and how to resolve 
them?

Thanks,

Xi


