[Swift-user] sort on large data

Jiada Tu jtu3 at hawk.iit.edu
Wed Oct 22 02:11:15 CDT 2014


Hi Yadu,

I have tested the newly posted s3fs tutorial and it works: I can now run my
wordCount on s3fs.

But there's one small problem.
I put all scripts and input files in /s3/wordCount-s3fs and set the work
directory to /s3/wordCount-s3fs. I then found that I can't use relative
file paths in my Swift script; I have to use absolute paths for all files.

If I use absolute paths for all input and output files, everything works
fine.

But if I use:

file infile[] <filesys_mapper;pattern="input/split-*", location=".">;


I get an exception:


----------------------------------------

Execution failed:
Exception in python:
    Arguments: [/s3/wordCount-s3fs/./wordCount.py, input/split-0006]
    Host: cloud-static
    Directory: wordCount-run001/jobs/g/python-g9k2d6zl
        exception @ swift-int-staging.k, line: 167
Caused by: Application /usr/bin/python failed with an exit code of 1

------- Application STDERR --------
wordcount error: file name "./input/split-0006" not exist.
Traceback (most recent call last):
  File "/s3/wordCount-s3fs/./wordCount.py", line 12, in <module>
    f=open(fileName, 'r')
IOError: [Errno 2] No such file or directory: './input/split-0006'
-----------------------------------
        exception @ swift-int-staging.k, line: 163
Caused by: Block task failed: Connection to worker lost
java.io.EOFException
        at org.globus.cog.coaster.channels.AbstractStreamCoasterChannel.readFromChannel(AbstractStreamCoasterChannel.java:253)
        at org.globus.cog.coaster.channels.AbstractStreamCoasterChannel.stepNIO(AbstractStreamCoasterChannel.java:186)
        at org.globus.cog.coaster.channels.NIOMultiplexer.loop(NIOMultiplexer.java:116)
        at org.globus.cog.coaster.channels.NIOMultiplexer.run(NIOMultiplexer.java:75)



        k:assign @ swift.k, line: 171
Caused by: Exception in python:
    Arguments: [/s3/wordCount-s3fs/./wordCount.py, input/split-0006]
    Host: cloud-static
    Directory: wordCount-run001/jobs/g/python-g9k2d6zl
        exception @ swift-int-staging.k, line: 167
Caused by: Application /usr/bin/python failed with an exit code of 1

------- Application STDERR --------
wordcount error: file name "./input/split-0006" not exist.
Traceback (most recent call last):
  File "/s3/wordCount-s3fs/./wordCount.py", line 12, in <module>
    f=open(fileName, 'r')
IOError: [Errno 2] No such file or directory: './input/split-0006'
-----------------------------------

        exception @ swift-int-staging.k, line: 163
Caused by: Block task failed: Connection to worker lost
java.io.EOFException
        at org.globus.cog.coaster.channels.AbstractStreamCoasterChannel.readFromChannel(AbstractStreamCoasterChannel.java:253)
        at org.globus.cog.coaster.channels.AbstractStreamCoasterChannel.stepNIO(AbstractStreamCoasterChannel.java:186)
        at org.globus.cog.coaster.channels.NIOMultiplexer.loop(NIOMultiplexer.java:116)
        at org.globus.cog.coaster.channels.NIOMultiplexer.run(NIOMultiplexer.java:75)


------------------------------------------------


This basically says that ./input/split-000 does not exist, but I'm sure
/s3/wordCount-s3fs/input/split-000 does exist. I'd also rather not enter
absolute paths for all files if possible.
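The log shows wordCount.py opening './input/split-0006' while the job's Directory is wordCount-run001/jobs/g/python-g9k2d6zl. One possible reading, which is an assumption the log does not confirm, is that the relative path is resolved against the job's working directory rather than against /s3/wordCount-s3fs. The minimal Python sketch below reproduces that behaviour with temporary directories; `resolve`, `project` and `job_dir` are hypothetical names for illustration only, not part of Swift or the s3fs tutorial.

```python
import os
import tempfile

def resolve(path, base_dir):
    """Hypothetical helper: return an absolute path, joining relative
    paths to base_dir instead of the process's current directory."""
    if os.path.isabs(path):
        return path
    return os.path.normpath(os.path.join(base_dir, path))

# Simulate the layout: the input lives under a project directory
# (standing in for /s3/wordCount-s3fs), but the application runs
# from a separate job directory.
project = tempfile.mkdtemp()
job_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(project, "input"))
with open(os.path.join(project, "input", "split-0006"), "w") as f:
    f.write("hello world\n")

os.chdir(job_dir)
rel = "./input/split-0006"
print(os.path.exists(rel))                    # resolved against job_dir
print(os.path.exists(resolve(rel, project)))  # resolved against project
```

Run as-is, the first check prints False and the second prints True, matching the "No such file or directory" error even though the file exists under the project directory.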


Any idea how to deal with this?


Thanks,

Jiada Tu