From iraicu at cs.iit.edu Sun Oct 5 08:38:58 2014
From: iraicu at cs.iit.edu (Ioan Raicu)
Date: Sun, 05 Oct 2014 08:38:58 -0500
Subject: [Swift-user] CFP: The 24th Int. ACM Symp. on High-Performance Parallel and Distributed Computing (HPDC) 2015
Message-ID: <543149F2.6080205@cs.iit.edu>

**** CALL FOR PAPERS ****

The 24th International ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC-2015)
Portland, Oregon, USA - June 15-19, 2015
http://www.hpdc.org/2015

The ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC) is the premier annual conference for presenting the latest research on the design, implementation, evaluation, and use of parallel and distributed systems for high-end computing. The 24th HPDC will take place in Portland, Oregon, the City of Roses, on June 15-19, 2015 (workshops on June 15-16, and the main conference on June 17-19).

**** IMPORTANT DATES ****

Abstracts (required) due: January 12, 2015
Full papers due: January 19, 2015 (no extensions)
Author rebuttal period: March 4-7, 2015
Author notifications: March 16, 2015
Final manuscripts: April 1, 2015

**** SCOPE AND TOPICS ****

Submissions are welcomed on high-performance parallel and distributed computing topics, including but not limited to: clouds, clusters, grids, big data, massively multicore, and global-scale computing systems. Submissions that focus on the architectures, systems, and networks of cloud infrastructures are particularly encouraged, as are experience reports of operational deployments that can provide insights for future research on HPDC applications and systems. All papers will be evaluated for their originality, technical depth and correctness, potential impact, relevance to the conference, and quality of presentation. Research papers must clearly demonstrate research contributions and novelty, while experience reports must clearly describe lessons learned and demonstrate impact.
In the context of high-performance parallel and distributed computing, the topics of interest include, but are not limited to:

- Systems, networks, and architectures
- Massively multicore systems
- Resource virtualization
- Programming languages and environments
- File and storage systems, I/O, and data management
- Resource management and scheduling, including energy-aware techniques
- Performance modeling and analysis
- Fault tolerance, reliability, and availability
- Data-intensive computing
- Applications and services that depend upon high-end computing

**** PAPER SUBMISSION GUIDELINES ****

Authors are invited to submit technical papers of at most 12 pages in PDF format, including figures and references. Papers should be formatted in the ACM Proceedings Style and submitted via the conference web site. No changes to the margins, spacing, or font sizes as specified by the style file are allowed. Accepted papers will appear in the conference proceedings, and will be incorporated into the ACM Digital Library. A limited number of papers will be accepted as posters.

Papers must be self-contained and provide the technical substance required for the program committee to evaluate their contributions. Submitted papers must be original work that has not appeared in and is not under consideration for another conference or a journal. See the ACM Prior Publication Policy for more details. Papers can be submitted at https://ssl.linklings.net/conferences/hpdc/.
**** HPDC'15 GENERAL CO-CHAIRS ****
Thilo Kielmann, VU University Amsterdam, The Netherlands

**** HPDC'15 PROGRAM CO-CHAIRS ****
Dean Hildebrand, IBM Research Almaden, USA
Michela Taufer, University of Delaware, USA

**** HPDC'15 WORKSHOP CHAIRS ****
Abhishek Chandra, University of Minnesota, Twin Cities, USA
Ioan Raicu, Illinois Institute of Technology and Argonne National Laboratory, USA

**** HPDC'15 POSTERS CHAIR ****
Ana-Maria Oprescu, VU University Amsterdam, The Netherlands

**** HPDC'15 PUBLICITY CHAIR ****
Ioan Raicu, Illinois Institute of Technology and Argonne National Laboratory, USA
Torsten Hoefler, ETH Zurich, Switzerland
Naoya Maruyama, RIKEN Advanced Institute for Computational Science, Japan

**** HPDC'15 PUBLICATIONS CHAIR ****
Antonino Tumeo, Pacific Northwest National Laboratory, USA

**** HPDC'15 TRAVEL AWARD CHAIR ****
Ming Zhao, Florida International University, USA

**** HPDC'15 WEBMASTER CHAIR ****
Kaveh Razavi, VU University Amsterdam, The Netherlands

**** HPDC'15 PROGRAM COMMITTEE ****
David Abramson, The University of Queensland, Australia
Dong Ahn, Lawrence Livermore National Laboratory, USA
Gabriel Antoniu, INRIA, France
Henri Bal, VU University Amsterdam, The Netherlands
Pavan Balaji, Argonne National Laboratory, USA
Michela Becchi, University of Missouri, USA
John Bent, EMC, USA
Greg Bronevetsky, Lawrence Livermore National Laboratory, USA
Ali Butt, Virginia Tech, USA
Franck Cappello, Argonne National Lab, USA
Abhishek Chandra, University of Minnesota, USA
Andrew A. Chien, University of Chicago, USA
Paolo Costa, Microsoft Research Cambridge, UK
Kei Davis, Los Alamos National Laboratory, USA
Peter Dinda, Northwestern University, USA
Dick Epema, Delft University of Technology, The Netherlands
Gilles Fedak, INRIA, France
Wuchun Feng, Virginia Tech, USA
Renato Figueiredo, University of Florida, USA
Clemens Grelck, University of Amsterdam, The Netherlands
Adriana Iamnitchi, University of South Florida, USA
Larry Kaplan, Cray Inc., USA
Kate Keahey, Argonne National Laboratory, USA
Dries Kimpe, Argonne National Laboratory, USA
Alice Koniges, Lawrence Berkeley National Laboratory, USA
Zhiling Lan, Illinois Institute of Technology, USA
John (Jack) Lange, University of Pittsburgh, USA
Gary Liu, Oak Ridge National Laboratory, USA
Jay Lofstead, Sandia National Laboratories, USA
Arthur Barney Maccabe, Oak Ridge National Laboratory, USA
Carlos Maltzahn, University of California, Santa Cruz, USA
Naoya Maruyama, RIKEN Advanced Institute for Comp. Science, Japan
Satoshi Matsuoka, Tokyo Inst. Technology, Japan
Timothy Mattson, Intel, USA
Kathryn Mohror, Lawrence Livermore National Laboratory, USA
Bogdan Nicolae, IBM Research, Ireland
Sangmi Pallickara, Colorado State University, USA
Manish Parashar, Rutgers University, USA
Lavanya Ramakrishnan, Lawrence Berkeley National Laboratory, USA
Raju Rangaswami, Florida International University, USA
Matei Ripeanu, University of British Columbia, Canada
Nagiza F. Samatova, North Carolina State University, USA
Prasenjit Sarkar, Independent Consultant, USA
Karsten Schwan, Georgia Institute of Technology, USA
Vasily Tarasov, IBM Research, USA
Kenjiro Taura, University of Tokyo, Japan
Douglas Thain, University of Notre Dame, USA
Ana Varbanescu, University of Amsterdam, The Netherlands
Richard Vuduc, Georgia Institute of Technology, USA
Jon Weissman, University of Minnesota, USA
Dongyan Xu, Purdue University, USA
Rui Zhang, IBM Research, USA

**** HPDC STEERING COMMITTEE ****
Franck Cappello, Argonne National Lab, USA and INRIA, France
Andrew A. Chien, University of Chicago, USA
Peter Dinda, Northwestern University, USA
Dick Epema, Delft University of Technology, The Netherlands
Renato Figueiredo, University of Florida, USA
Salim Hariri, University of Arizona, USA
Thilo Kielmann, VU University Amsterdam, The Netherlands
Arthur "Barney" Maccabe, Oak Ridge National Laboratory, USA
Manish Parashar, Rutgers University, USA
Matei Ripeanu, University of British Columbia, Canada
Karsten Schwan, Georgia Tech, USA
Doug Thain, University of Notre Dame, USA
Jon Weissman, University of Minnesota, USA (Chair)
Dongyan Xu, Purdue University, USA

--
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT)
Guest Research Faculty, Argonne National Laboratory (ANL)
=================================================================
Data-Intensive Distributed Systems Laboratory, CS/IIT
Distributed Systems Laboratory, MCS/ANL
=================================================================
Editor: IEEE TCC, Springer Cluster, Springer JoCCASA
Chair: IEEE/ACM MTAGS, ACM ScienceCloud
=================================================================
Cel: 1-847-722-0876
Office: 1-312-567-5704
Email: iraicu at cs.iit.edu
Web: http://www.cs.iit.edu/~iraicu/
Web: http://datasys.cs.iit.edu/
LinkedIn: http://www.linkedin.com/in/ioanraicu
Google: http://scholar.google.com/citations?user=jE73HYAAAAAJ
=================================================================
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ketan at mcs.anl.gov Fri Oct 10 15:55:08 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Fri, 10 Oct 2014 15:55:08 -0500
Subject: [Swift-user] set cleanup off
Message-ID:

I am running Cobalt jobs where I have a user quota of at most 1024 node runs per hour. When this quota is exceeded, the system will not allow any more job submissions. In this scenario, the cleanup operations fail after the run has completed, with the following error:

Final status: Fri, 10 Oct 2014 20:49:58+0000 Finished successfully:100
org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job (qsub reported an exit code of 1).
project: ExM
        at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:63)
        at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:45)
        at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:61)
        at org.globus.cog.abstraction.coaster.service.job.manager.LocalQueueProcessor.run(LocalQueueProcessor.java:40)
Caused by: org.globus.cog.abstraction.impl.scheduler.common.ProcessException: Could not submit job (qsub reported an exit code of 1).
project: ExM
        at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.start(AbstractExecutor.java:113)
        at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:53)
        ... 3 more
org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job (qsub reported an exit code of 1).
project: ExM

Is there any way to tell Swift/Coasters not to do cleanup? If so, is there any harm in doing so?

Thanks,
Ketan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hategan at mcs.anl.gov Fri Oct 10 19:41:28 2014
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 10 Oct 2014 17:41:28 -0700
Subject: [Swift-user] set cleanup off
In-Reply-To:
References:
Message-ID: <1412988088.23034.14.camel@echo>

Hi Ketan,

There is currently no way to disable the cleanup job, unless you run with provider staging, in which case there is no cleanup job. In some sense, the queue limitation below is itself a way of disabling the cleanup job, and, apart from the distasteful error messages, there should be no detrimental side effects. The harm in not doing cleanup is that you leave unneeded files on disk. You can, of course, clean up the work directory manually whenever you need or want to.
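[Editor's note: in Swift releases of this era, provider staging was typically switched on in swift.properties; the fragment below is a minimal sketch assuming the `use.provider.staging` and `provider.staging.pin.swiftfiles` properties documented in the Swift user guide, not configuration taken from this thread.]

```properties
# Hypothetical swift.properties fragment: with provider staging enabled,
# files move over the coaster channel itself and no cleanup job is submitted.
use.provider.staging=true
# Optional (assumption): whether staged Swift files are pinned/cached on workers.
provider.staging.pin.swiftfiles=false
```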
Mihael On Fri, 2014-10-10 at 15:55 -0500, Ketan Maheshwari wrote: > I am running cobalt jobs where I have a user quota of 1024 node runs max in > an hour. In cases where this exceeds, the system will not allow any more > job submission. > > In this scenario, the cleanup operations fail after the run has completed > with the following error: > > Final status:Fri, 10 Oct 2014 20:49:58+0000 Finished successfully:100 > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could > not submit job (qsub reported an exit code of 1). > project: ExM 'default' queue has been reached\n"> > > at > org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:63) > at > org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:45) > > at > org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:61) > at > org.globus.cog.abstraction.coaster.service.job.manager.LocalQueueProcessor.run(LocalQueueProcessor.java:40) > Caused by: > org.globus.cog.abstraction.impl.scheduler.common.ProcessException: Could > not submit job (qsub reported an exit code of 1). > project: ExM 'default' queue has been reached\n"> > > at > org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.start(AbstractExecutor.java:113) > at > org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:53) > ... 3 more > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could > not submit job (qsub reported an exit code of 1). > project: ExM 'default' queue has been reached\n"> > > > Is there any ways to tell Swift/Coasters to not do cleanup? If so, is there > any harm in doing so? 
>
> Thanks,
> Ketan
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user

From ggowda at hawk.iit.edu Mon Oct 13 11:02:46 2014
From: ggowda at hawk.iit.edu (Gagan Munisiddha Gowda)
Date: Mon, 13 Oct 2014 21:32:46 +0530
Subject: [Swift-user] NullPointerException while running Tutorial
Message-ID:

Hello,

I am facing issues while running through this tutorial: http://swiftlang.org/tutorials/cloud/swift-cloud-tutorial.tar.gz

I have set up the coaster conf to point to my workers and head node as mentioned in the docs.

I see that the following error is because of a bug (as mentioned here: https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=1321). It is mentioned that it has been fixed in 0.95 (which is not yet released). *I am using 0.95-RC6*

Looking forward to any help in resolving this issue. The error follows:

ubuntu at ip-XXX-XXX-XXX-XXX:~/swift-cloud-tutorial/part04$ swift p4.swift
*Swift 0.95 RC6 swift-r7900 cog-r3908*
RunID: run002
Warning: The @ syntax for function invocation is deprecated
Progress: Sun, 12 Oct 2014 16:52:13+0000
*Exception in thread "Scheduler" java.lang.NullPointerException*
        at org.globus.cog.abstraction.impl.common.task.TaskImpl.hashCode(TaskImpl.java:364)
        at java.util.HashMap.get(HashMap.java:317)
        at org.griphyn.vdl.karajan.VDSAdaptiveScheduler.failTask(VDSAdaptiveScheduler.java:400)
        at org.globus.cog.karajan.scheduler.LateBindingScheduler.run(LateBindingScheduler.java:266)
Progress: Sun, 12 Oct 2014 16:52:14+0000 Selecting site:10
No events in 1s.
Finding dependency loops...
Waiting threads:
Thread: R-5x2-3, waiting on simout (declared on line 24)
    assignment, p4, line 28
Thread: R-5-3x2, waiting on simout (declared on line 24)
    assignment, p4, line 28
Thread: R-5-0-3, waiting on simout (declared on line 24)
    assignment, p4, line 28
Thread: R-5-6-3, waiting on simout (declared on line 24)
    assignment, p4, line 28
Thread: R-5-8-3, waiting on simout (declared on line 24)
    assignment, p4, line 28
Thread: R-5-1-3, waiting on simout (declared on line 24)
    assignment, p4, line 28
Thread: R-5-7-3, waiting on simout (declared on line 24)
    assignment, p4, line 28
Thread: R-5-9-3, waiting on simout (declared on line 24)
    assignment, p4, line 28
Thread: R-6, waiting on sims (declared on line 21)
    swift:execute, p4, line 70
    analyze, p4, line 211
Thread: R-5-2-3, waiting on simout (declared on line 24)
    assignment, p4, line 28
Thread: R-5-4-3, waiting on simout (declared on line 24)
    assignment, p4, line 28
----
No dependency loops found.
The following threads are independently hung:
Thread: R-6, waiting on sims (declared on line 21)
    swift:execute, p4, line 70
    analyze, p4, line 211
----
Irrecoverable error found. Exiting.

--
Regards,
Gagan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From yadunand at uchicago.edu Mon Oct 13 11:35:06 2014
From: yadunand at uchicago.edu (Yadu Nand Babuji)
Date: Mon, 13 Oct 2014 11:35:06 -0500
Subject: [Swift-user] NullPointerException while running Tutorial
In-Reply-To:
References:
Message-ID: <543BFF3A.3060707@uchicago.edu>

Hi Gagan,

The tutorial that you are following is for the 0.95 versions of Swift. I would recommend that you use the tutorial listed here instead: https://github.com/yadudoc/cloud-tutorials

The null pointer bug has been fixed in Swift-0.95-RC7, which you may download from here: http://swift-lang.org/packages/swift-0.95-RC7.tar.gz

The tutorials for trunk are far more tested and stable, so I would strongly recommend using them if you are from Ioan's class at IIT.
Thanks, Yadu On 10/13/2014 11:02 AM, Gagan Munisiddha Gowda wrote: > Hello, > > I am facing issues while running through this tutorial : > http://swiftlang.org/tutorials/cloud/swift-cloud-tutorial.tar.gz > > > I have setup the coaster conf to point to my workers and head node as > mentioned in the docs. > > I see that the following error was because of a BUG (as mentioned here > : https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=1321) > > Its mentioned what it has been fixed in 0.95 (which is not yet > released). *I am using 0.95-RC6* > * > * > Looking forward for any help in resolving this issue. > > Following is the error: > > ubuntu at ip-XXX-XXX-XXX-XXX:~/swift-cloud-tutorial/part04$ swift p4.swift > *Swift 0.95 RC6 swift-r7900 cog-r3908* > RunID: run002 > Warning: The @ syntax for function invocation is deprecated > Progress: Sun, 12 Oct 2014 16:52:13+0000 > *Exception in thread "Scheduler" java.lang.NullPointerException* > at > org.globus.cog.abstraction.impl.common.task.TaskImpl.hashCode(TaskImpl.java:364) > at java.util.HashMap.get(HashMap.java:317) > at > org.griphyn.vdl.karajan.VDSAdaptiveScheduler.failTask(VDSAdaptiveScheduler.java:400) > at > org.globus.cog.karajan.scheduler.LateBindingScheduler.run(LateBindingScheduler.java:266) > Progress: Sun, 12 Oct 2014 16:52:14+0000 Selecting site:10 > No events in 1s. > Finding dependency loops... 
> > Waiting threads: > Thread: R-5x2-3, waiting on simout (declared on line 24) > assignment, p4, line 28 > > Thread: R-5-3x2, waiting on simout (declared on line 24) > assignment, p4, line 28 > > Thread: R-5-0-3, waiting on simout (declared on line 24) > assignment, p4, line 28 > > Thread: R-5-6-3, waiting on simout (declared on line 24) > assignment, p4, line 28 > > Thread: R-5-8-3, waiting on simout (declared on line 24) > assignment, p4, line 28 > > Thread: R-5-1-3, waiting on simout (declared on line 24) > assignment, p4, line 28 > > Thread: R-5-7-3, waiting on simout (declared on line 24) > assignment, p4, line 28 > > Thread: R-5-9-3, waiting on simout (declared on line 24) > assignment, p4, line 28 > > Thread: R-6, waiting on sims (declared on line 21) > swift:execute, p4, line 70 > analyze, p4, line 211 > > Thread: R-5-2-3, waiting on simout (declared on line 24) > assignment, p4, line 28 > > Thread: R-5-4-3, waiting on simout (declared on line 24) > assignment, p4, line 28 > > ---- > No dependency loops found. > > The following threads are independently hung: > Thread: R-6, waiting on sims (declared on line 21) > swift:execute, p4, line 70 > analyze, p4, line 211 > > ---- > > Irrecoverable error found. Exiting. > > > -- > Regards, > Gagan > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From yadudoc1729 at gmail.com Fri Oct 17 18:16:01 2014 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Fri, 17 Oct 2014 18:16:01 -0500 Subject: [Swift-user] Error in running p4.swift in cloud-tutorials In-Reply-To: References: Message-ID: Hi Raghav, I'm CC-ing the swift-user list, and encourage you to join the list. I just tried this from scratch, and I'm not able to reproduce the issue you are seeing. 
Could you send me a tar ball of the runNNN folder, and the cps*log from your /home/ubuntu/s3fs-fuse folder. Thanks, Yadu On Fri, Oct 17, 2014 at 5:30 PM, Raghav Kapoor wrote: > Hello Yadunand, > > My name is Raghav, I am a graduate student > at IIT. I have an assignment on swift. > > I am using the directions provided on your github page. > > https://github.com/yadudoc/cloud-tutorials > > What I have observed is, all the sample tutorials were running a day > before. > But from yesterday, p4.swift is not running on the cloud. > > I am getting the following error which I am pasting below: > > root at ip-172-31-20-129:/home/ubuntu/swift-cloud-tutorial/part04# swift > p4.swift > Swift trunk git-rev: 2d334140f2c288e5aeb3d354de0ecda35b4b3aac heads/master > 6130 (modified locally) > RunID: run001 > Progress: Fri, 17 Oct 2014 22:10:33+0000 > > Execution failed: > Exception in sort: > Arguments: [-n, unsorted.txt] > Host: cloud-static > Directory: p4-run001/jobs/0/sort-0abp6zyl > exception @ swift-int-staging.k, line: 167 > Caused by: null > Caused by: > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could > not submit job > Caused by: org.globus.cog.coaster.channels.ChannelException: Failed to > create socket > Caused by: java.net.ConnectException: Connection refused > k:assign @ swift.k, line: 171 > Caused by: Exception in sort: > Arguments: [-n, unsorted.txt] > Host: cloud-static > Directory: p4-run001/jobs/0/sort-0abp6zyl > exception @ swift-int-staging.k, line: 167 > Caused by: null > Caused by: > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could > not submit job > Caused by: org.globus.cog.coaster.channels.ChannelException: Failed to > create socket > Caused by: java.net.ConnectException: Connection refused > root at ip-172-31-20-129:/home/ubuntu/swift-cloud-tutorial/part04# > > > I think there is some problem with hostname and port numbers. it is not > specified correctly. 
> > I see that you have updated the repository with some changes that might be > the cause of this issue. > > I am referring to this commit specifically > > > https://github.com/yadudoc/cloud-tutorials/commit/d75ce87eb94fd8460b9b425c72445265e6074974 > > which was made a day ago. > > I am not sure what is the cause of this problem, > > Could you investigate and help me resolve the issue? > > Thanks a lot, > > Regards, > > Raghav > -- Yadu Nand B -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdivanji at hawk.iit.edu Sat Oct 18 00:00:00 2014 From: sdivanji at hawk.iit.edu (Sughosh Divanji) Date: Sat, 18 Oct 2014 00:00:00 -0500 Subject: [Swift-user] Fwd: Swift not running after restarting Amazon EC2 instance In-Reply-To: References: Message-ID: ---------- Forwarded message ---------- From: Sughosh Divanji Date: Fri, Oct 17, 2014 at 11:50 PM Subject: Swift not running after restarting Amazon EC2 instance To: swift-user at ci.uchicago.edu Cc: Raghav Kapoor , Arjun Nanjundappa < ananjun1 at hawk.iit.edu> Hi all, My name is Sughosh and I am a graduate student in CS from IIT Chicago. I am using swift for a homework assignment and facing this issue after restarting my amazon EC2 instances. root at ip-172-31-19-231:/home/ubuntu/wordcount# time swift wordcount.swift Swift trunk git-rev: 2d334140f2c288e5aeb3d354de0ecda35b4b3aac heads/master 6130 (modified locally) RunID: run004 Progress: Sat, 18 Oct 2014 04:36:15+0000 Progress: Sat, 18 Oct 2014 04:36:16+0000 Submitting:15 Failed but can retry:1 Progress: Sat, 18 Oct 2014 04:36:46+0000 Submitting:15 Failed but can retry:1 ^C real 0m59.366s user 0m4.248s sys 0m0.373s The same code works fine without any issues before reboot. I have attached the run logs. Please let me know what could be the issue. Thanks, Sughosh -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: run004.zip
Type: application/zip
Size: 4930 bytes
Desc: not available
URL:

From yadunand at uchicago.edu Sat Oct 18 00:53:01 2014
From: yadunand at uchicago.edu (Yadu Nand Babuji)
Date: Sat, 18 Oct 2014 00:53:01 -0500
Subject: [Swift-user] Fwd: Swift not running after restarting Amazon EC2 instance
In-Reply-To:
References:
Message-ID: <5442003D.6050707@uchicago.edu>

Hi Sughosh,

Could you describe the steps you took to restart the Amazon instances? Did you restart the headnode instance and all the worker instances? I do not see anything in the logs that jumps out.

-Yadu

On 10/18/2014 12:00 AM, Sughosh Divanji wrote:
>
> ---------- Forwarded message ----------
> From: *Sughosh Divanji*
> Date: Fri, Oct 17, 2014 at 11:50 PM
> Subject: Swift not running after restarting Amazon EC2 instance
> To: swift-user at ci.uchicago.edu
> Cc: Raghav Kapoor, Arjun Nanjundappa <ananjun1 at hawk.iit.edu>
>
> Hi all,
>
> My name is Sughosh and I am a graduate student in CS from IIT Chicago.
> I am using swift for a homework assignment and facing this issue after
> restarting my amazon EC2 instances.
>
> root at ip-172-31-19-231:/home/ubuntu/wordcount# time swift wordcount.swift
> Swift trunk git-rev: 2d334140f2c288e5aeb3d354de0ecda35b4b3aac
> heads/master 6130 (modified locally)
> RunID: run004
> Progress: Sat, 18 Oct 2014 04:36:15+0000
> Progress: Sat, 18 Oct 2014 04:36:16+0000 Submitting:15 Failed but can retry:1
> Progress: Sat, 18 Oct 2014 04:36:46+0000 Submitting:15 Failed but can retry:1
> ^C
> real 0m59.366s
> user 0m4.248s
> sys 0m0.373s
>
> The same code works fine without any issues before reboot. I have
> attached the run logs. Please let me know what could be the issue.
>
> Thanks,
> Sughosh
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jtu3 at hawk.iit.edu Sat Oct 18 16:18:44 2014
From: jtu3 at hawk.iit.edu (Jiada Tu)
Date: Sat, 18 Oct 2014 16:18:44 -0500
Subject: [Swift-user] sort on large data
Message-ID:

I am doing an assignment with Swift to sort large data. The data contains one record (string) per line. We need to sort the records based on ASCII code. The data is too large to fit in memory.

The large data file is on the head node, and I run the Swift script directly on the head node.

Here's what I plan to do:

1) Split the big file into 64MB files.
2) Let each worker task sort one 64MB file. Say, each task will call a "sort.py" (written by me). sort.py will output a list of files, say: "sorted-worker1-001; sorted-worker1-002; ...". The first file contains the records starting with 'a', the second those starting with 'b', etc.
3) Now we will have all records starting with 'a' in (sorted-worker1-001; sorted-worker2-001; ...); 'b' in (sorted-worker1-002; sorted-worker2-002; ...); and so on. Then I send all the files containing records starting with 'a' to a "reduce" worker task and let it merge them into one single file. Same for 'b', 'c', etc.
4) Now we get 26 files (a-z), each sorted internally.

Basically, what I am doing is simulating MapReduce: step 2 is the map and step 3 is the reduce.

Here are some problems:

1) For step 2, sort.py needs to output a list of files. How can a Swift app function handle a list of outputs?

app (file[] outfiles) sort (file[] infiles) {
    sort.py // how to put out files here?
}

2) As far as I know (I may be wrong), Swift will stage all the output files back to the local disk (here, the head node, since I run the Swift script directly on the head node). So the output files in step 2 will be staged back to the head node first, then staged from the head node to the worker nodes for step 3, and then the 26 files in step 4 will be staged back to the head node. I don't want this, because the network will be a huge bottleneck. Is there any way to tell the "reduce" worker to get data directly from the "map" worker?
Maybe a shared file system will help, but is there any way that the user can control the data staging between workers without using a shared file system?

Since I am new to Swift, I may be totally wrong and misunderstand what Swift does. If so, please correct me.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From yadunand at uchicago.edu Sat Oct 18 18:13:04 2014
From: yadunand at uchicago.edu (Yadu Nand Babuji)
Date: Sat, 18 Oct 2014 18:13:04 -0500
Subject: [Swift-user] sort on large data
In-Reply-To:
References:
Message-ID: <5442F400.9020805@uchicago.edu>

Hi Jiada Tu,

1) Here's an example of returning an array of files:

type file;
app (file outs[]) make_outputs (file script)
{
    bash @script;
}

file outputs[] ;
file script <"make_outputs.sh">; # This script creates a few files with outputs as prefix
(outputs) = make_outputs(script);

2) The products of a successful task execution must be visible to the headnode (where Swift runs): either through a shared filesystem (NFS, S3 mounted over s3fs, etc.), or they must be brought back over the network. But we can reduce the overhead of moving the results to the headnode and then to the workers for the reduce stage.

I understand that this is part of your assignment, so I will try to answer without getting too specific; at the same time, concepts from Hadoop do not necessarily work directly in this context. So here are some things to consider to get the best performance possible:

- Assuming that the texts contain 10K unique words, your sort program will generate a file containing at most 10K lines (which would be definitely under a MB). Is there any advantage in splitting this into smaller files?

- Since the final merge involves tiny files, you could very well do the reduce stage on the headnode and be quite efficient (you can define the reduce app only for site:local)

sites : [local, cloud-static]
site.local {
    ....
    app.reduce {
        executable : ${env.PWD}/reduce.py
    }
}

site.cloud-static {
    ....
    app.python {
        executable : /usr/bin/python
    }
}

This assumes that you are going to define your sorting app like this:

app (file freqs) sort (file sorting_script, file input ) {
    python @sorting_script @input;
}

- The real cost is in having the original text reach the workers; this can be made faster by:
    - A better headnode with better network/disk IO (I've measured 140Mbit/s between m1.medium nodes; c3.8xlarge comes with 975Mbit/s)
    - Using S3 with s3fs and having swift-workers pull data from S3, which is pretty scalable and removes the IO load from the headnode.

- Identify the optimal size of data chunks for your specific problem. Each chunk of data in this case comes with the overhead of starting a new remote task, sending the data, and bringing results back. Note that the result of a wordcount on a file, whether it is 1MB or 10GB, is still at most 1MB (with the earlier assumptions).

- Ensure that the data stays within the same datacenter, for cost as well as performance. By limiting the cluster to US-Oregon we already do this.

If you would like to attempt this using S3FS, let me know, I'll be happy to explain that in detail.

Thanks,
Yadu

On 10/18/2014 04:18 PM, Jiada Tu wrote:
> I am doing an assignment with swift to sort large data. The data
> contains one record (string) each line. We need to sort the records
> base on ascii code. The data is too large to fit in the memory.
>
> The large data file is in head node, and I run the swift script
> directly on head node.
>
> Here's what I plan to do:
>
> 1) split the big file into 64MB files
> 2) let each worker task sort one 64MB files. Say, each task will call
> a "sort.py" (written by me). sort.py will output a list of files,
> say:"sorted-worker1-001; sorted-worker1-002; ......". The first file
> contains the records started with 'a', the second started with 'b', etc.
> 3) now we will have all records started with 'a' in > (sorted-worker1-001;sorted-worker2-001;...); 'b' in > (sorted-worker1-002;sorted-worker2-002; ......); ...... Then I send > all the files contains records 'a' to a "reduce" worker task and let > it merge these files into one single file. Same to 'b', 'c', etc. > 4) now we get 26 files (a-z) with each sorted inside. > > Basically what I am doing is simulate Map-reduce. step 2 is map and > step 3 is reduce > > Here comes some problems: > 1) for step 2, sort.py need to output a list of files. How can swift > app function handles list of outputs? > app (file[] outfiles) sort (file[] infiles) { > sort.py // how to put out files here? > } > > 2) As I know (may be wrong), swift will stage all the output file back > to the local disk (here is the head node since I run the swift script > directly on headnode). So the output files in step 2 will be staged > back to head node first, then stage from head node to the worker nodes > to do the step 3, then stage the 26 files in step 4 back to head node. > I don't want it because the network will be a huge bottleneck. Is > there any way to tell the "reduce" worker to get data directly from > "map" worker? Maybe a shared file system will help, but is there any > way that user can control the data staging between workers without > using the shared file system? > > Since I am new to the swift, I may be totally wrong and > misunderstanding what swift do. If so, please correct me. > > > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 
From ggowda at hawk.iit.edu  Sun Oct 19 00:08:16 2014
From: ggowda at hawk.iit.edu (Gagan Munisiddha Gowda)
Date: Sun, 19 Oct 2014 10:38:16 +0530
Subject: [Swift-user] sort on large data
In-Reply-To: <5442F400.9020805@uchicago.edu>
References: <5442F400.9020805@uchicago.edu>
Message-ID: 

Hi Yadu,

I am heading in the same direction, trying to use a shared file system (S3 bucket / S3FS).

I have set up: *WORKER_INIT_SCRIPT=/path/to/mounts3fs.sh in
cloud-tutorials/ec2/configs* *(as mentioned in the tutorials)*

Though I am able to set up the passwd-s3fs file in the desired location
(using the mounts3fs.sh script), I see that the S3 bucket is not getting
mounted.

I have verified the passwd-s3fs file and mount point, and everything seems
to be created as expected. But one observation was that these files were
owned by 'root', since they were created through setup.sh.

So, I added more commands to change the permissions and made 'ubuntu' the
owner of all related files.

Even after all these changes, I see that the S3 bucket is still not mounted.

*PS: If I connect to the workers and run the s3fs command manually, it
does mount!*

sudo s3fs -o allow_other,gid=1000,use_cache=/home/ubuntu/cache ;

(tried with and without sudo)

Thanks for your help.

On Sun, Oct 19, 2014 at 4:43 AM, Yadu Nand Babuji wrote:

> Hi Jiada Tu,
>
> 1) Here's an example for returning an array of files :
>
> type file;
> app (file outs[]) make_outputs (file script)
> {
>    bash @script;
> }
>
> file outputs[] ;
> file script <"make_outputs.sh">;  # This script creates a few files
> with outputs as prefix
> (outputs) = make_outputs(script);
>
> 2) The products of a successful task execution must be visible to the
> headnode (where swift runs) either through a
> - shared filesystem (NFS, S3 mounted over s3fs, etc.), or
> - they must be brought back over the network.
> But we can reduce the overhead of moving the results to the headnode and
> then to the workers for the reduce stage.
>
> I understand that this is part of your assignment, so I will try to answer
> without getting too specific; at the same time, concepts from hadoop do not
> necessarily work directly in this context. So here are some things to
> consider to get the best performance possible:
>
> - Assuming that the texts contain 10K unique words, your sort program will
> generate a file containing at most 10K lines
> (which would definitely be under a MB). Is there any advantage in
> splitting this into smaller files ?
>
> - Since the final merge involves tiny files, you could very well do the
> reduce stage on the headnode and be quite efficient
> (you can define the reduce app only for site:local)
>
> sites : [local, cloud-static]
> site.local {
>     ....
>     app.reduce {
>         executable : ${env.PWD}/reduce.py
>     }
> }
>
> site.cloud-static {
>     ....
>     app.python {
>         executable : /usr/bin/python
>     }
> }
>
> This assumes that you are going to define your sorting app like this :
>
> app (file freqs) sort (file sorting_script, file input ) {
>     python @sorting_script @input;
> }
>
> - The real cost is in getting the original text to the workers; this can
> be made faster by :
>    - A better headnode with better network/disk IO (I've measured
> 140 Mbit/s between m1.medium nodes, c3.8xlarge comes with 975 Mbit/s)
>    - Using S3 with s3fs and having swift-workers pull data from S3, which
> is quite scalable, removing the IO load from the headnode.
>
> - Identify the optimal size for data chunks for your specific problem.
> Each chunk of data in this case comes with the overhead of starting
> a new remote task, sending the data and bringing results back. Note that
> the result of a wordcount on a file, whether it is 1MB or 10GB,
> is still at most ~1MB (with the earlier assumptions)
>
> - Ensure that the data stays within the same datacenter, for cost as well
> as performance. By limiting the cluster to US-Oregon we already do this.
>
> If you would like to attempt this using S3FS, let me know, I'll be happy
> to explain that in detail.
>
> Thanks,
> Yadu
>
> On 10/18/2014 04:18 PM, Jiada Tu wrote:
> [...]
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user

-- 
Regards,
Gagan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From jtu3 at hawk.iit.edu  Sun Oct 19 04:00:42 2014
From: jtu3 at hawk.iit.edu (Jiada Tu)
Date: Sun, 19 Oct 2014 04:00:42 -0500
Subject: [Swift-user] sort on large data
In-Reply-To: <544341D2.9020104@uchicago.edu>
References: <5442F400.9020805@uchicago.edu> <544341D2.9020104@uchicago.edu>
Message-ID: 

Hi Yadu,

Thanks for your answer! That's really helpful. I forgot to forward my last
question and your answer to the swift-user group, so if anybody has the
same confusion or is interested, please check below:

On Sat, Oct 18, 2014 at 11:45 PM, Yadu Nand Babuji wrote:

> Hi Jiada,
>
> Please find replies inline :
>
> On 10/18/2014 09:30 PM, Jiada Tu wrote:
>
> Thanks for your answer, Yadu. I have some more questions:
> 1) If I didn't misunderstand, the (outputs) will be staged back to the
> head node, right?
>
> Yes, in the current modes that you are using.
>
> 2)
> ---------------------------
> type file;
> app (file outs[]) make_outputs (file script)
> {
>    bash @script;
> }
>
> file outputs[] ;
> file script <"make_outputs.sh">;  # This script creates a few files
> with outputs as prefix
> (outputs) = make_outputs(script);
> ---------------------------
>
> If I have some later app function that takes the "outputs" files as
> input, will that app function wait until all possible outputs are
> generated?
>
> Yes! Swift is implicitly parallel, and the order of execution is based on
> the availability of dependent data items.
>
> For example:
>
> app (file outs[]) final_outputs (file script, file input[])
> {
>    bash @script @filenames(input)
> }
> foreach i in [0:100]
> {
>    file outputs[] ;
>    # I know ""outputs-"+@tostring(i)+"-"" may not work; please treat this
>    # as pseudo-code
>    (outputs) = make_outputs(script);
> }
>
> file script2 <"final_output">;  # takes multiple inputs and merges them
> into a file
> file inputs[] 
> file finoutput 
> (finoutput) = final_outputs(script2, inputs)
>
> final_outputs needs to take some output files from "every" single loop
> iteration as its input (the first iteration may generate "outputs-0-000",
> the second may generate "outputs-1-000", etc.).
>
> So, will the final_outputs() task be "blocked" until all make_outputs()
> tasks finish processing?
>
> Yes, final_outputs will block till the array that it depends on is closed.
>
> 3) Actually, wordCount is our first program, and sort is our second
> program, which gives "extra credits". You gave a great answer to another
> question I was also confused about.
>
> I hope I did not give too much away :)
>
> The output of the sort program will be 10GB, so it will not fit in memory.
> That's why I want to split the intermediate files and send them to several
> merge tasks. Each merge task will generate, say, a 100MB file. So the
> result of my sort program will be 10GB/100MB = 100 files, with file1
> having the smallest words and file100 having the largest words.
>
> From your answer, I believe this can be dealt with by using s3fs? So:
> 4) Yes, I want some help using s3fs.
>
> Since this is something that would be of general interest, I will update
> the github readme page with directions for how to run swift
> over s3fs acting as a shared filesystem.
>
> But are there any other general ways to deal with big-file sorting in
> swift? Can you give me a hint about what they would be? Like, how would
> you generally deal with this sorting problem?
>
> The simplest strategy I can think of is to split each chunk into, say,
> 100 buckets, and have the corresponding buckets from every chunk
> merge-sorted.
>
> Thanks,
> Jiada Tu
>
> On Sat, Oct 18, 2014 at 6:13 PM, Yadu Nand Babuji
> wrote:
>
>> Hi Jiada Tu,
>> [...]
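[Editor's note] Yadu's answer that final_outputs "will block till the array that it depends on is closed" is ordinary dataflow: it is the same pattern as collecting futures in other languages. A minimal Python analogy (the task bodies and file names are hypothetical placeholders, not Swift API):

```python
from concurrent.futures import ThreadPoolExecutor

def make_outputs(i):
    """'Map' app stand-in: produce this iteration's output names."""
    return ["outputs-%d-%03d" % (i, j) for j in range(3)]

def final_outputs(all_outputs):
    """'Reduce' app stand-in: can only run once every result is available."""
    return sorted(name for outs in all_outputs for name in outs)

with ThreadPoolExecutor(max_workers=4) as pool:
    # All make_outputs tasks run in parallel, like the foreach loop...
    futures = [pool.submit(make_outputs, i) for i in range(5)]
    # ...and collecting their results blocks, like Swift waiting for the
    # outputs[] array to close before final_outputs can fire.
    merged = final_outputs(f.result() for f in futures)

print(len(merged))  # 5 iterations x 3 files each = 15
```

The key difference is that in Swift this synchronization is implicit: declaring that final_outputs consumes the array is enough, with no explicit futures or joins.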
>>
>> _______________________________________________
>> Swift-user mailing list
>> Swift-user at ci.uchicago.edu
>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From yadunand at uchicago.edu  Mon Oct 20 20:03:11 2014
From: yadunand at uchicago.edu (Yadu Nand Babuji)
Date: Mon, 20 Oct 2014 20:03:11 -0500
Subject: [Swift-user] sort on large data
In-Reply-To: 
References: <5442F400.9020805@uchicago.edu>
Message-ID: <5445B0CF.2000502@uchicago.edu>

Hi,

@Jiada, Dongfang,

I've updated the README on the https://github.com/yadudoc/cloud-tutorials
page with documentation on how to use s3fs as a shared filesystem. I've
added configs and links to external documentation. Please try it, and let
me know if any of it is unclear or buggy.

I would also appreciate help from anyone in testing this.

@Gagan,
That was most likely a bug in my scripts, where the user script is
executed ahead of the installation of s3fs on the worker nodes.
Please try again, and if you see the same behavior, please let me know.

Thanks,
Yadu

On 10/19/2014 12:08 AM, Gagan Munisiddha Gowda wrote:
> Hi Yadu,
>
> I am heading in the same direction, trying to use a shared file
> system (S3 bucket / S3FS).
>
> I have set up: /WORKER_INIT_SCRIPT=/path/to/mounts3fs.sh in
> cloud-tutorials/ec2/configs/ /(as mentioned in the tutorials)/
>
> Though I am able to set up the passwd-s3fs file in the desired location
> (using the mounts3fs.sh script), I see that the S3 bucket is not getting
> mounted.
>
> I have verified the passwd-s3fs file and mount point, and everything
> seems to be created as expected. But one observation was that these
> files were owned by 'root', since they were created through setup.sh.
>
> So, I added more commands to change the permissions and made 'ubuntu'
> the owner of all related files.
>
> Even after all these changes, I see that the S3 bucket is still not
> mounted.
>
> *PS: If I connect to the workers and run the s3fs command manually, it
> does mount!*
>
> sudo s3fs -o allow_other,gid=1000,use_cache=/home/ubuntu/cache
> ;
>
> (tried with and without sudo)
>
> Thanks for your help.
>
> On Sun, Oct 19, 2014 at 4:43 AM, Yadu Nand Babuji
> wrote:
> [...]
>
> -- 
> Regards,
> Gagan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From vpabani at hawk.iit.edu  Mon Oct 20 22:20:16 2014
From: vpabani at hawk.iit.edu (Vivek Pabani)
Date: Mon, 20 Oct 2014 22:20:16 -0500
Subject: [Swift-user] swift-conf error | Issue with specification on local/cloud-static
Message-ID: 

Hello,

For a word count program, I need to run one bash script locally - on the
headnode, and one python script on all the worker nodes. Could you please
tell me how to specify which part of the swift program should be run on the
headnode, and which part on the worker nodes?

As of now, I have made the following changes in the swift.conf file:

------------------------------------------------------------------
sites: [local,cloud-static]      // This seems to have a problem, because
I get an error that python_remote cannot be found on Host : local.

site.local {
    filesystem {

    app.bash_local {
        executable: "/bin/bash"
    }
}

site.cloud-static {
    execution {

    app.python_remote {
        executable: "/usr/bin/python"
    }
}
------------------------------------------------------------------

My swift program uses only two executables: bash_local and python_remote.
When I run this, I get the error:
Caused by: Cannot find executable python_remote on site system path.

Please let me know what the correct config changes should be.
Thanks,

Vivek

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From yadudoc1729 at gmail.com  Mon Oct 20 23:02:57 2014
From: yadudoc1729 at gmail.com (Yadu Nand)
Date: Mon, 20 Oct 2014 23:02:57 -0500
Subject: [Swift-user] swift-conf error | Issue with specification on local/cloud-static
In-Reply-To: 
References: 
Message-ID: 

Hi Vivek,

Could you send us a tarball of the runNNN folder from a failed run please ?

-Yadu

On Mon, Oct 20, 2014 at 10:20 PM, Vivek Pabani wrote:
> [...]

-- 
Yadu Nand B
-------------- next part --------------
An HTML attachment was scrubbed...
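[Editor's note] For anyone hitting Vivek's error: the working config quoted earlier in this thread declares apps directly under each site, not nested inside `filesystem { }` or `execution { }` blocks, and Vivek's snippet also never closes those blocks. A plausible sketch of the fix, modeled on the thread's own example (the exact keys and the elided `....` sections depend on your Swift release, so treat this as a guess rather than a verified config):

```
sites: [local, cloud-static]

site.local {
    ....
    app.bash_local {
        executable: "/bin/bash"
    }
}

site.cloud-static {
    ....
    app.python_remote {
        executable: "/usr/bin/python"
    }
}
```

With each app declared only under the site that can run it, Swift should no longer look for python_remote on the local host.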
URL: From ggowda at hawk.iit.edu Mon Oct 20 23:28:33 2014 From: ggowda at hawk.iit.edu (Gagan Munisiddha Gowda) Date: Tue, 21 Oct 2014 09:58:33 +0530 Subject: [Swift-user] sort on large data In-Reply-To: <5445B0CF.2000502@uchicago.edu> References: <5442F400.9020805@uchicago.edu> <5445B0CF.2000502@uchicago.edu> Message-ID: Great Yadu ! Thanks for your help ! Regards, Gagan On 21/10/2014 6:33 am, "Yadu Nand Babuji" wrote: > Hi, > > @Jiada, Dongfang, > > I've updated the README on the https://github.com/yadudoc/cloud-tutorials > page with documentation on how to use > s3fs as a shared filesystem. I've added configs and links to external > documentation. Please try it, and let me know > if any of it is unclear or buggy. > > I would also appreciate help from anyone in testing this. > > @Gagan, > That was most likely a bug in my scripts, where the user script is > executed ahead of the installation of s3fs on the worker nodes. > Please try again, and if you see the same behavior, please let me know. > > Thanks, > Yadu > > On 10/19/2014 12:08 AM, Gagan Munisiddha Gowda wrote: > > Hi Yadu, > > I am in the same direction where I am trying to use a shared file system > (S3 bucket / S3FS). > > I have setup : *WORKER_INIT_SCRIPT=/path/to/mounts3fs.sh in > cloud-tutorials/ec2/configs** (as mentioned in the tutorials)* > > Though i am able to setup the passwd-s3fs file in the desired location > (using mounts3fs.sh script), i see that the S3 bucket is not getting > mounted. > > I have verified the passwd-s3fs file and mount point and all seems to be > created as expected. But, one observation was the owner of these files were > 'root' user as it was getting created through the setup.sh. > > So, i added more commands to change the permissions and made 'ubuntu' as > the owner for all related files. > > Even after all these changes, i see that the S3 bucket is still not > mounted. 
> > *PS: If i connect to the workers and run the s3fs command manually, it > does mount !* > > sudo s3fs -o allow_other,gid=1000,use_cache=/home/ubuntu/cache > ; > > (tried with and without sudo) > > Thanks for your help. > > > On Sun, Oct 19, 2014 at 4:43 AM, Yadu Nand Babuji > wrote: > >> Hi Jiada Tu, >> >> 1) Here's an example for returning an array of files : >> >> type file; >> app (file outs[]) make_outputs (file script) >> { >> bash @script; >> } >> >> file outputs[] ; >> file script <"make_outputs.sh">; # This script creates a few files >> with outputs as prefix >> (outputs) = make_outputs(script); >> >> 2) The products of a successful task execution, must be visible to the >> headnode (where swift runs) either through a >> - shared filesystem (NFS, S3 mounted over s3fs etc) or >> - must be brought back over the network. >> But, we can reduce the overhead in moving the results to the headnode and >> then to the workers for the reduce stage. >> >> I understand that this is part of your assignment, so I will try to >> answer without getting too specific, at the same time, >> concepts from hadoop do not necessarily work directly in this context. So >> here are some things to consider to get >> the best performance possible: >> >> - Assuming that the texts contain 10K unique words, your sort program >> will generate a file containing atmost 10K lines >> (which would be definitely under an MB). Is there any advantage into >> splitting this into smaller files ? >> >> - Since the final merge involves tiny files, you could very well do the >> reduce stage on the headnode and be quite efficient >> (you can define the reduce app only for site:local) >> >> sites : [local, cloud-static] >> site.local { >> .... >> app.reduce { >> executable : ${env.PWD}/reduce.py >> } >> } >> >> site.cloud-static { >> .... 
>> app.python { >> executable : /usr/bin/python >> } >> >> } >> >> This assumes that you are going to define your sorting app like this : >> >> app (file freqs) sort (file sorting_script, file input ) { >> python @sorting_script @input; >> } >> >> >> - The real cost is in having the original text reach the workers, this >> can be made faster by : >> - A better headnode with better network/disk IO (I've measured >> 140Mbit/s between m1.medium nodes, c3.8xlarge comes with 975Mbits/s) >> - Use S3 with S3fs and have swift-workers pull data from S3 which is >> pretty scalable, and remove the IO load from the headnode. >> >> - Identify the optimal size for data chunks for your specific problem. >> Each chunk of data in this case comes with the overhead of starting >> a new remote task, sending the data and bringing results back. Note >> that the result of a wordcount on a file whether it is 1Mb or 10Gb >> is still the atmost 1Mb (with earlier assumptions) >> >> - Ensure that the data with the same datacenter, for cost as well as >> performance. By limiting the cluster to US-Oregon we already do this. >> >> If you would like to attempt this using S3FS, let me know, I'll be happy >> to explain that in detail. >> >> Thanks, >> Yadu >> >> >> >> On 10/18/2014 04:18 PM, Jiada Tu wrote: >> >> I am doing an assignment with swift to sort large data. The data >> contains one record (string) each line. We need to sort the records base on >> ascii code. The data is too large to fit in the memory. >> >> The large data file is in head node, and I run the swift script >> directly on head node. >> >> Here's what I plan to do: >> >> 1) split the big file into 64MB files >> 2) let each worker task sort one 64MB files. Say, each task will call a >> "sort.py" (written by me). sort.py will output a list of files, >> say:"sorted-worker1-001; sorted-worker1-002; ......". The first file >> contains the records started with 'a', the second started with 'b', etc. 
>> 3) now we will have all records starting with 'a' in >> (sorted-worker1-001;sorted-worker2-001;...); 'b' in >> (sorted-worker1-002;sorted-worker2-002; ......); ...... Then I send all >> the files containing 'a' records to a "reduce" worker task and let it merge >> these files into one single file. Same for 'b', 'c', etc. >> 4) now we get 26 files (a-z), each sorted internally. >> >> Basically what I am doing is simulating Map-reduce: step 2 is map and >> step 3 is reduce. >> >> Here come some problems: >> 1) for step 2, sort.py needs to output a list of files. How can a swift app >> function handle a list of outputs? >> >> app (file[] outfiles) sort (file[] infiles) { >> sort.py // how to put out files here? >> } >> >> 2) As far as I know (I may be wrong), swift will stage all the output files back >> to the local disk (here the head node, since I run the swift script >> directly on the head node). So the output files in step 2 will be staged back to >> the head node first, then staged from the head node to the worker nodes to do >> step 3, then the 26 files in step 4 are staged back to the head node. I don't want >> that because the network will be a huge bottleneck. Is there any way to tell >> the "reduce" worker to get data directly from the "map" worker? Maybe a shared >> file system will help, but is there any way that the user can control the data >> staging between workers without using a shared file system? >> >> Since I am new to swift, I may be totally wrong and >> misunderstanding what swift does. If so, please correct me.
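[Archive note] Jiada's plan above (split, per-chunk sort into per-letter buckets, then a per-letter merge) can be sanity-checked outside Swift with a short Python sketch. The function names and the tiny in-memory "chunks" below are purely illustrative stand-ins for the 64MB files:

```python
import heapq

def sort_chunk(lines):
    """Map step: sort one chunk and bucket the records by first letter."""
    buckets = {}
    for line in sorted(lines):
        buckets.setdefault(line[0], []).append(line)
    return buckets  # letter -> sorted records from this chunk

def merge_buckets(sorted_lists):
    """Reduce step: k-way merge of already-sorted per-chunk buckets."""
    return list(heapq.merge(*sorted_lists))

# Two "64MB" chunks, shrunk to a few records for illustration
chunk1 = ["banana", "apple", "cherry"]
chunk2 = ["avocado", "blueberry", "apricot"]

b1 = sort_chunk(chunk1)
b2 = sort_chunk(chunk2)

# Merge the 'a' buckets produced by every map task into one sorted file
merged_a = merge_buckets([b1.get("a", []), b2.get("a", [])])
print(merged_a)  # ['apple', 'apricot', 'avocado']
```

Because each bucket is already sorted, the reduce step is a cheap streaming merge, which is why (as Yadu notes above) it can comfortably run on the headnode.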
>> >> >> >> >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user >> >> >> >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user >> > > > > -- > Regards, > Gagan > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtu3 at hawk.iit.edu Wed Oct 22 02:11:15 2014 From: jtu3 at hawk.iit.edu (Jiada Tu) Date: Wed, 22 Oct 2014 02:11:15 -0500 Subject: [Swift-user] sort on large data In-Reply-To: <5445B0CF.2000502@uchicago.edu> References: <5442F400.9020805@uchicago.edu> <5445B0CF.2000502@uchicago.edu> Message-ID: Hi Yadu, I have tested the newly posted tutorial on s3fs and it works. I can run my wordCount on s3fs now. But there's a little problem. I put all scripts and input files in /s3/wordCount-s3fs, and set the work directory to /s3/wordCount-s3fs. Then I found that I can't use relative paths for files in my swift script; I have to use absolute paths for all files. *If I use absolute paths for all input and output files, everything works fine. If I do: file infile[] ; It throws an exception: ---------------------------------------- Execution failed: Exception in python: Arguments: [/s3/wordCount-s3fs/./wordCount.py, input/split-0006] Host: cloud-static Directory: wordCount-run001/jobs/g/python-g9k2d6zl exception @ swift-int-staging.k, line: 167 Caused by: Application /usr/bin/python failed with an exit code of 1 ------- Application STDERR -------- wordcount error: file name "./input/split-0006" not exist.
Traceback (most recent call last): File "/s3/wordCount-s3fs/./wordCount.py", line 12, in f=open(fileName, 'r') IOError: [Errno 2] No such file or directory: './input/split-0006' ----------------------------------- exception @ swift-int-staging.k, line: 163 Caused by: Block task failed: Connection to worker lost java.io.EOFException at org.globus.cog.coaster.channels.AbstractStreamCoasterChannel.readFromChannel(AbstractStreamCoasterChannel.java:253) at org.globus.cog.coaster.channels.AbstractStreamCoasterChannel.stepNIO(AbstractStreamCoasterChannel.java:186) at org.globus.cog.coaster.channels.NIOMultiplexer.loop(NIOMultiplexer.java:116) at org.globus.cog.coaster.channels.NIOMultiplexer.run(NIOMultiplexer.java:75) k:assign @ swift.k, line: 171 Caused by: Exception in python: Arguments: [/s3/wordCount-s3fs/./wordCount.py, input/split-0006] Host: cloud-static Directory: wordCount-run001/jobs/g/python-g9k2d6zl exception @ swift-int-staging.k, line: 167 Caused by: Application /usr/bin/python failed with an exit code of 1 ------- Application STDERR -------- wordcount error: file name "./input/split-0006" not exist. Traceback (most recent call last): File "/s3/wordCount-s3fs/./wordCount.py", line 12, in f=open(fileName, 'r') IOError: [Errno 2] No such file or directory: './input/split-0006' ----------------------------------- exception @ swift-int-staging.k, line: 163 Caused by: Block task failed: Connection to worker lost java.io.EOFException at org.globus.cog.coaster.channels.AbstractStreamCoasterChannel.readFromChannel(AbstractStreamCoasterChannel.java:253) at org.globus.cog.coaster.channels.AbstractStreamCoasterChannel.stepNIO(AbstractStreamCoasterChannel.java:186) at org.globus.cog.coaster.channels.NIOMultiplexer.loop(NIOMultiplexer.java:116) at org.globus.cog.coaster.channels.NIOMultiplexer.run(NIOMultiplexer.java:75) ------------------------------------------------ Which basically says ./input/split-000 not exist. 
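[Archive note] The IOError above is what one would expect if relative paths are resolved against the per-task job directory (wordCount-run001/jobs/g/python-g9k2d6zl) rather than against the /s3/wordCount-s3fs work directory. A small Python sketch, with hypothetical temp directories standing in for the s3fs mount and the job directory, reproduces the effect and shows why resolving inputs to absolute paths early avoids it:

```python
import os
import tempfile

# Hypothetical stand-ins: 'mount' plays the role of /s3/wordCount-s3fs,
# 'jobdir' the per-task directory the worker runs the app from.
mount = tempfile.mkdtemp()
os.makedirs(os.path.join(mount, "input"))
with open(os.path.join(mount, "input", "split-0006"), "w") as f:
    f.write("data\n")

rel = os.path.join("input", "split-0006")
abs_path = os.path.join(mount, rel)  # resolved while the mount root is known

jobdir = tempfile.mkdtemp()
os.chdir(jobdir)  # the task process runs from its own job directory

print(os.path.exists(rel))       # False: the relative name resolves under jobdir
print(os.path.exists(abs_path))  # True: the absolute name is unambiguous
```

This matches the observation in the message: absolute paths work because they do not depend on the process's current working directory.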
I'm sure /s3/wordCount-s3fs/input/split-000 does exist. And I don't want to enter absolute paths for all files if possible. Any idea about how to deal with it? Thanks, Jiada Tu -------------- next part -------------- An HTML attachment was scrubbed... URL: From aanthon2 at hawk.iit.edu Thu Oct 23 20:59:24 2014 From: aanthon2 at hawk.iit.edu (Ajay Anthony) Date: Thu, 23 Oct 2014 21:59:24 -0400 Subject: [Swift-user] CS553: Stuck with Swift "bash application not available" issue. Please suggest Message-ID: Hi Swift Users, Please help me with the issue below. I have been stuck on it for two days. I am trying to run the split.sh script through the word.swift file, which will split a file into 16 chunks. split.sh: #! /bin/bash split -a 2 -d -n 16 small-dataset /home/ajay/CS553/Ass_2/Swift/cloud-tutorials/swift-cloud-tutorial/cloud-cat/input/small-dataset word.swift: type file; app (file out) split (file script) { bash @script stdout=@filename(out); } file script_file <"*split.sh*">; file output; output = split(script_file); I am facing the issue below: RunID: 20141023-2145-ur5u9gpa Progress: time: Thu, 23 Oct 2014 21:45:55 -0400 Execution failed: The application "bash" is not available for any site/pool in your tc.data catalog split, word.swift, line 10 The swift version I am using is: ajay at ubuntu:~/CS553/Ass_2/Swift/cloud-tutorials/swift-cloud-tutorial/cloud-cat$ swift -version Swift 0.94.1 swift-r7114 cog-r3803 My swift.conf file has the configuration below: sites: local site.local { filesystem { type: "local" URL: "localhost" } execution { type: "local" URL: "localhost" } workDirectory: /tmp/${env.USER} maxParallelTasks: 32 initialParallelTasks: 31 app.ALL {executable: "*"} } Thanks in advance. Regards, Ajay Anthony -------------- next part -------------- An HTML attachment was scrubbed...
URL: From hategan at mcs.anl.gov Thu Oct 23 21:46:25 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 23 Oct 2014 19:46:25 -0700 Subject: [Swift-user] CS553: Stuck with Swift "bash application not available" issue . Please suggest In-Reply-To: References: Message-ID: <1414118785.31177.2.camel@echo> Hi, You have some version mismatch there. > swift -version > Swift 0.94.1 swift-r7114 cog-r3803 However, the swift configuration format that you are using (swift.conf), is only available in trunk, and that is two versions above 0.94. So the question is how you got that swift version, and why that particular version. Mihael On Thu, 2014-10-23 at 21:59 -0400, Ajay Anthony wrote: > Hi Swift Users, > > Please help me with below issue. I am stuck on this since 2 days. > > I am trying to run split.sh script through word.swift file, which will > split a file in 16 chunks. > > split.sh: > #! /bin/bash > > split -a 2 -d -n 16 small-dataset > /home/ajay/CS553/Ass_2/Swift/cloud-tutorials/swift-cloud-tutorial/cloud-cat/input/small-dataset > > > > word.swift: > type file; > > app (file out) split (file script) > { > bash @script stdout=@filename(out); > } > > file script_file <"*split.sh*">; > file output; > output = split(script_file); > > > I am facing the below issue: > > RunID: 20141023-2145-ur5u9gpa > Progress: time: Thu, 23 Oct 2014 21:45:55 -0400 > Execution failed: > The application "bash" is not available for any site/pool in your > tc.data catalog > split, word.swift, line 10 > > > The script version I am using is : > ajay at ubuntu:~/CS553/Ass_2/Swift/cloud-tutorials/swift-cloud-tutorial/cloud-cat$ > swift -version > Swift 0.94.1 swift-r7114 cog-r3803 > > > My swift.conf file has below configurations: > > sites: local > > site.local { > filesystem { > type: "local" > URL: "localhost" > } > execution { > type: "local" > URL: "localhost" > } > workDirectory: /tmp/${env.USER} > maxParallelTasks: 32 > initialParallelTasks: 31 > app.ALL {executable: "*"} > } > > 
Thanks in Advance. > > Regards, > Ajay Anthony, > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user From yadudoc1729 at gmail.com Thu Oct 23 21:50:46 2014 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Thu, 23 Oct 2014 21:50:46 -0500 Subject: [Swift-user] CS553: Stuck with Swift "bash application not available" issue . Please suggest In-Reply-To: <1414118785.31177.2.camel@echo> References: <1414118785.31177.2.camel@echo> Message-ID: Hi Ajay, As Mihael pointed out, you are using the wrong swift version for the tutorials you are running. Please try the following steps on the headnode instance: cd /home/ubuntu/cloud-tutorials/swift-cloud-tutorial/ source setup.sh swift -version The swift version should output something like the following : Swift trunk git-rev: 2d334140f2c288e5aeb3d354de0ecda35b4b3aac heads/master If that is what you see, please retry the tutorials. -Yadu On Thu, Oct 23, 2014 at 9:46 PM, Mihael Hategan wrote: > Hi, > > You have some version mismatch there. > > > swift -version > > Swift 0.94.1 swift-r7114 cog-r3803 > > However, the swift configuration format that you are using (swift.conf), > is only available in trunk, and that is two versions above 0.94. > > So the question is how you got that swift version, and why that > particular version. > > Mihael > > On Thu, 2014-10-23 at 21:59 -0400, Ajay Anthony wrote: > > Hi Swift Users, > > > > Please help me with below issue. I am stuck on this since 2 days. > > > > I am trying to run split.sh script through word.swift file, which will > > split a file in 16 chunks. > > > > split.sh: > > #! 
/bin/bash > > > > split -a 2 -d -n 16 small-dataset > > > /home/ajay/CS553/Ass_2/Swift/cloud-tutorials/swift-cloud-tutorial/cloud-cat/input/small-dataset > > > > > > > > word.swift: > > type file; > > > > app (file out) split (file script) > > { > > bash @script stdout=@filename(out); > > } > > > > file script_file <"*split.sh*">; > > file output; > > output = split(script_file); > > > > > > I am facing the below issue: > > > > RunID: 20141023-2145-ur5u9gpa > > Progress: time: Thu, 23 Oct 2014 21:45:55 -0400 > > Execution failed: > > The application "bash" is not available for any site/pool in your > > tc.data catalog > > split, word.swift, line 10 > > > > > > The script version I am using is : > > ajay at ubuntu > :~/CS553/Ass_2/Swift/cloud-tutorials/swift-cloud-tutorial/cloud-cat$ > > swift -version > > Swift 0.94.1 swift-r7114 cog-r3803 > > > > > > My swift.conf file has below configurations: > > > > sites: local > > > > site.local { > > filesystem { > > type: "local" > > URL: "localhost" > > } > > execution { > > type: "local" > > URL: "localhost" > > } > > workDirectory: /tmp/${env.USER} > > maxParallelTasks: 32 > > initialParallelTasks: 31 > > app.ALL {executable: "*"} > > } > > > > Thanks in Advance. > > > > Regards, > > Ajay Anthony, > > _______________________________________________ > > Swift-user mailing list > > Swift-user at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > -- Yadu Nand B -------------- next part -------------- An HTML attachment was scrubbed... 
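[Archive note] The two version banners quoted in this thread make the mismatch easy to detect programmatically. A hedged sketch follows; the "trunk" substring check is a heuristic based on the banners seen here, not an official Swift interface:

```python
def supports_swift_conf(version_banner: str) -> bool:
    """Heuristic: trunk builds report 'trunk' in the version banner;
    numbered releases like 0.94.x predate the swift.conf format."""
    return "trunk" in version_banner.lower()

# Both strings are taken verbatim from this thread
old = "Swift 0.94.1 swift-r7114 cog-r3803"
new = "Swift trunk git-rev: 2d334140f2c288e5aeb3d354de0ecda35b4b3aac heads/master"

print(supports_swift_conf(old))  # False: expects tc.data, not swift.conf
print(supports_swift_conf(new))  # True
```

A wrapper script for the tutorials could run `swift -version` and refuse to proceed when this check fails, instead of surfacing the confusing "tc.data catalog" error.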
URL: From iraicu at cs.iit.edu Fri Oct 24 18:55:32 2014 From: iraicu at cs.iit.edu (Ioan Raicu) Date: Fri, 24 Oct 2014 18:55:32 -0500 Subject: [Swift-user] CFP: IEEE Cluster 2015 -- Chicago IL September 8-11 2015 Message-ID: <544AE6F4.7080506@cs.iit.edu> IEEE International Conference on Cluster Computing September 8-11, 2015 Chicago, IL, USA https://press3.mcs.anl.gov/ieeecluster2015/ ---------------------------------------------- ...Follow us on Facebook at https://www.facebook.com/ieee.cluster ...Follow us on Twitter at https://twitter.com/IEEECluster ...Follow us on Linkedin at https://www.linkedin.com/groups/IEEE-International-Conference-on-Cluster-7428925 ...Follow us on RenRen at http://page.renren.com/601871401 ---------------------------------------------- CALL FOR PAPERS Following the successes of the series of Cluster conferences, for 2015 we solicit high-quality original papers presenting work that advances the state-of-the-art in clusters and closely related fields. All papers will be rigorously peer-reviewed for their originality, technical depth and correctness, potential impact, relevance to the conference, and quality of presentation. Research papers must clearly demonstrate research contributions and novelty, while papers reporting experience must clearly describe lessons learned and impact, along with the utility of the approach compared to past approaches. PAPER TRACKS * Applications, Algorithms, and Libraries * Architecture, Networks/Communication, and Management * Programming and Systems Software * Data, Storage, and Visualization PROCEEDINGS: Proceedings of the conference and workshops will be available online when the conference starts and will be submitted to IEEE Xplore and for EI indexing.
SPECIAL JOURNAL ISSUE: The best papers of Cluster 2015 will be included in a Special Issue on advances in topics related to cluster computing of the Elsevier International Journal of Parallel Computing (PARCO), edited by Pavan Balaji, Satoshi Matsuoka, and Michela Taufer. This special issue is dedicated for the papers accepted in the Cluster 2015 conference. The submission to this special issue is by invitation only. IMPORTANT DATES September 27, 2014 .... Submissions open for Workshops January 1, 2015 ........... Submissions open for Papers, Posters, and Tutorials February 27, 2015 ....... Papers Submission Deadline April 23, 2015 ............... Papers Acceptance Notification May 1, 2015 ................. Posters Submission Deadline May 1, 2015 ................. Submissions open for Student Mentoring Program June 1, 2015 ................ Student Mentoring Program Notification (Round 1) June 15, 2015 .............. Posters Acceptance Notification June 15, 2015 .............. Student Mentoring Program Notification (Round 2) June 29, 2015 .............. Student Mentoring Program Notification (Round 3) July 13, 2015 ............... Student Mentoring Program Notification (Round 4) July 13, 2015 ............... Student Mentoring Program NSF Grant Notification August 1, 2015 ............ Camera-ready Copy Deadline for Papers, Posters, and Workshops Workshop/Tutorial proposals are selected and notifications are sent on a first-come basis. SUBMISSION GUIDELINES Authors are invited to submit papers electronically in PDF format. Submitted manuscripts should be structured as technical papers and may not exceed 10 letter-size (8.5 x 11) pages including figures, tables and references using the IEEE format for conference proceedings. Submissions not conforming to these guidelines may be returned without review. Authors should make sure that their file will print on a printer that uses letter-size (8.5 x 11) paper. The official language of the conference is English. 
All manuscripts will be reviewed and will be judged on correctness, originality, technical strength, significance, quality of presentation, and interest and relevance to the conference attendees. Paper submissions are limited to 10 pages in 2-column IEEE format including all figures and references. Submitted manuscripts exceeding this limit will be returned without review. For the final camera-ready version, authors with accepted papers may purchase additional pages at the following rate: 200 USD for each of up to two additional pages. See formatting templates for details: * LaTeX Package http://datasys.cs.iit.edu/events/CCGrid2014/IEEECS_confs_LaTeX.zip * Word Template http://datasys.cs.iit.edu/events/CCGrid2014/instruct8.5x11x2.doc Submitted papers must represent original unpublished research that is not currently under review for any other conference or journal. Papers not following these guidelines will be rejected without review and further action may be taken, including (but not limited to) notifications sent to the heads of the institutions of the authors and sponsors of the conference. Submissions received after the due date, exceeding the page limit, or not appropriately structured may not be considered. Authors may contact the conference chairs for more information. The proceedings will be published through the IEEE Computer Society Conference Publishing Services. ORGANIZATION: - General Co-chairs: Pavan Balaji (Argonne National Laboratory, USA), Michela Taufer (University of Delaware, USA) - Program Chair: Satoshi Matsuoka (Tokyo Institute of Technology, Japan) - Posters Chair: Seetharami Seelam (IBM, USA) - Proceedings Chair: Antonino Tumeo (Pacific Northwest National Laboratory, USA) - Workshops and Tutorials Chair: Bronis de Supinski (Lawrence Livermore National Laboratory, USA) - Panels Chair: Alice Koniges (Lawrence Berkeley National Laboratory, USA) - Mentoring Program Chair: Luc Bougé
(École Normale Supérieure de Rennes, France) - Track Chairs: * Applications, Algorithms, and Libraries - Richard Vuduc (Georgia Tech, USA) * Architecture, Networks/Communication, and Management - Todd Gamblin (Lawrence Livermore National Laboratory, USA) * Programming and System Software - Naoya Maruyama (Riken AICS, Japan) * Data, Storage, and Visualization - Gabriel Antoniu (INRIA, France) MORE INFORMATION: For more information, contact Pavan Balaji (balaji at anl.gov) or Michela Taufer (taufer at udel.edu). -- ================================================================= Ioan Raicu, Ph.D. Assistant Professor, Illinois Institute of Technology (IIT) Guest Research Faculty, Argonne National Laboratory (ANL) ================================================================= Data-Intensive Distributed Systems Laboratory, CS/IIT Distributed Systems Laboratory, MCS/ANL ================================================================= Editor: IEEE TCC, Springer Cluster, Springer JoCCASA Chair: IEEE/ACM MTAGS, ACM ScienceCloud ================================================================= Cell: 1-847-722-0876 Office: 1-312-567-5704 Email: iraicu at cs.iit.edu Web: http://www.cs.iit.edu/~iraicu/ Web: http://datasys.cs.iit.edu/ LinkedIn: http://www.linkedin.com/in/ioanraicu Google: http://scholar.google.com/citations?user=jE73HYAAAAAJ ================================================================= ================================================================= From ggowda at hawk.iit.edu Sat Oct 25 06:53:44 2014 From: ggowda at hawk.iit.edu (Gagan Munisiddha Gowda) Date: Sat, 25 Oct 2014 17:23:44 +0530 Subject: [Swift-user] Walltime Exceeded Message-ID: Hi, I am running a swift program on 10GB of data (split into 100MB chunks). I am using a Node.js script for processing. I did increase the walltime by adding the value below to the persistent-coaster and local-coaster files in /usr/local/bin/swift-trunk/etc/sites. 00:05:00 But, of no use.
The entire program runs for around 14 minutes when it throws this exception. Here's my error : 2014-10-25 10:29:04,113+0000 INFO Execute TASK_STATUS_CHANGE taskid=urn:R-3-1414232059111 status=5 Walltime exceeded 2014-10-25 10:29:04,121+0000 INFO LateBindingScheduler jobs queued: 0 2014-10-25 10:29:04,122+0000 DEBUG swift APPLICATION_EXCEPTION jobid=node-vvboibzl - Application exception: Walltime exceeded 2014-10-25 10:29:04,125+0000 INFO swift END_FAILURE thread=R-3 tr=node 2014-10-25 10:29:04,127+0000 INFO Loader Swift finished with errors Please find complete log file attached. -- Regards, Gagan -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: swift.log Type: text/x-log Size: 987717 bytes Desc: not available URL: From yadunand at uchicago.edu Sat Oct 25 12:45:13 2014 From: yadunand at uchicago.edu (Yadu Nand Babuji) Date: Sat, 25 Oct 2014 12:45:13 -0500 Subject: [Swift-user] Walltime Exceeded In-Reply-To: References: Message-ID: <544BE1A9.9000203@uchicago.edu> Hi Gagan, You are using swift-trunk on the cloud nodes, so you should be using the (swift.conf) config files for trunk. The configs are specified using swift.conf files, and you should be able to get a few examples from the cloud-tutorials/swift-cloud-tutorial folders. If you have the swift.conf file in your current directory you need not specify the conf file on the swift commandline, say "swift p5.swift" while in the cloud-tutorials/swift-cloud-tutorials/part05 directory. 
Here's a sample config with the walltime for ALL apps set to 15 minutes : sites: local site.local { filesystem { type: "local" URL: "localhost" } execution { type: "local" URL: "localhost" } workDirectory: /tmp/${env.USER}/swiftwork maxParallelTasks: 32 initialParallelTasks: 31 app.ALL { executable: "*" maxWallTime: "00:15:00" } } The swift-trunk userguide has more detailed documentation on the various configuration options that the swift.conf file takes: http://swift-lang.org/guides/trunk/userguide/userguide.html Thanks, Yadu On 10/25/2014 06:53 AM, Gagan Munisiddha Gowda wrote: > Hi, > > I am running a swift program on 10gb data (split into 100mb each). > > I am using a Node.js script for processing. > > I did increase the Walltime by adding below to persistent-coaster and > local-coaster files in /usr/local/bin/swift-trunk/etc/sites. > > 00:05:00 > > But, of no use. The entire program runs for around 14 minutes when it > throws this exception. > > > Here's my error : > > 2014-10-25 10:29:04,113+0000 INFO Execute TASK_STATUS_CHANGE > taskid=urn:R-3-1414232059111 status=5 Walltime exceeded > 2014-10-25 10:29:04,121+0000 INFO LateBindingScheduler jobs queued: 0 > 2014-10-25 10:29:04,122+0000 DEBUG swift APPLICATION_EXCEPTION > jobid=node-vvboibzl - Application exception: Walltime exceeded > > 2014-10-25 10:29:04,125+0000 INFO swift END_FAILURE thread=R-3 tr=node > 2014-10-25 10:29:04,127+0000 INFO Loader Swift finished with errors > > Please find complete log file attached. > > -- > Regards, > Gagan > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user -------------- next part -------------- An HTML attachment was scrubbed... 
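[Archive note] The maxWallTime value in the sample config above is an hh:mm:ss string. As an illustration only (this helper is not part of Swift), converting it to seconds makes it easy to check whether an observed run time fits a candidate limit before editing the config:

```python
def walltime_seconds(hms: str) -> int:
    """Convert an hh:mm:ss walltime string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

# The job in this thread ran for around 14 minutes, so a 5-minute
# limit kills it while a 15-minute limit does not.
run_seconds = 14 * 60
print(run_seconds <= walltime_seconds("00:05:00"))  # False
print(run_seconds <= walltime_seconds("00:15:00"))  # True
```

This also shows why the "00:05:00" value Gagan set could never work for a 14-minute run, independent of which config file it landed in.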
URL: From wilde at anl.gov Sat Oct 25 14:24:34 2014 From: wilde at anl.gov (Michael Wilde) Date: Sat, 25 Oct 2014 14:24:34 -0500 Subject: [Swift-user] Walltime Exceeded In-Reply-To: <544BE1A9.9000203@uchicago.edu> References: <544BE1A9.9000203@uchicago.edu> Message-ID: <544BF8F2.2000209@anl.gov> In addition to Yadu's good pointers, I would add: if your app( ) call was terminated by Swift for running for 15 minutes, and you were trying to set that time limit down to 5 minutes, then you also need to investigate why your app( ) call was running for so long. Is it perhaps in an infinite loop, or running much less efficiently than you expected? Based on how you are managing output files, you should look for a way to see if your app( ) run is executing productively and as expected. - Mike On 10/25/14 12:45 PM, Yadu Nand Babuji wrote: > Hi Gagan, > > You are using swift-trunk on the cloud nodes, so you should be using > the (swift.conf) config files for trunk. > The configs are specified using swift.conf files, and you should be > able to get a few examples from the > cloud-tutorials/swift-cloud-tutorial folders. > If you have the swift.conf file in your current directory you need not > specify the conf file on the swift commandline, say "swift p5.swift" > while in the > cloud-tutorials/swift-cloud-tutorials/part05 directory. 
> > Here's a sample config with the walltime for ALL apps set to 15 minutes : > > sites: local > > site.local { > filesystem { > type: "local" > URL: "localhost" > } > execution { > type: "local" > URL: "localhost" > } > workDirectory: /tmp/${env.USER}/swiftwork > maxParallelTasks: 32 > initialParallelTasks: 31 > app.ALL { > executable: "*" > maxWallTime: "00:15:00" > } > } > > The swift-trunk userguide has more detailed documentation on the > various configuration options that the swift.conf file takes: > http://swift-lang.org/guides/trunk/userguide/userguide.html > > Thanks, > Yadu > > > > On 10/25/2014 06:53 AM, Gagan Munisiddha Gowda wrote: >> Hi, >> >> I am running a swift program on 10gb data (split into 100mb each). >> >> I am using a Node.js script for processing. >> >> I did increase the Walltime by adding below to persistent-coaster and >> local-coaster files in /usr/local/bin/swift-trunk/etc/sites. >> >> 00:05:00 >> >> But, of no use. The entire program runs for around 14 minutes when it >> throws this exception. >> >> >> Here's my error : >> >> 2014-10-25 10:29:04,113+0000 INFO Execute TASK_STATUS_CHANGE >> taskid=urn:R-3-1414232059111 status=5 Walltime exceeded >> 2014-10-25 10:29:04,121+0000 INFO LateBindingScheduler jobs queued: 0 >> 2014-10-25 10:29:04,122+0000 DEBUG swift APPLICATION_EXCEPTION >> jobid=node-vvboibzl - Application exception: Walltime exceeded >> >> 2014-10-25 10:29:04,125+0000 INFO swift END_FAILURE thread=R-3 tr=node >> 2014-10-25 10:29:04,127+0000 INFO Loader Swift finished with errors >> >> Please find complete log file attached. 
>> >> -- >> Regards, >> Gagan >> >> >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user -- Michael Wilde Mathematics and Computer Science Computation Institute Argonne National Laboratory The University of Chicago -------------- next part -------------- An HTML attachment was scrubbed... URL: From aanthon2 at hawk.iit.edu Sat Oct 25 20:35:03 2014 From: aanthon2 at hawk.iit.edu (Ajay Anthony) Date: Sat, 25 Oct 2014 21:35:03 -0400 Subject: [Swift-user] Swift: ConnectException: Connection refused Issue Message-ID: Hi Swift users, Kindly help me with the issue below. I have been stuck on it almost the whole day. I am trying to run swift word.swift on the headnode in AWS and have 1 worker running. 2014-10-26 01:17:52,685+0000 INFO AbstractCoasterChannel TCPChannel[client, http://127.0.0.1:50010] setting name to http://127.0.0.1:50010 2014-10-26 01:17:52,709+0000 INFO Execute TASK_STATUS_CHANGE taskid=urn:R-0-1414286270539 status=1 2014-10-26 01:17:52,712+0000 INFO LateBindingScheduler jobs queued: 0 2014-10-26 01:17:52,712+0000 DEBUG swift APPLICATION_EXCEPTION jobid=bash-lupsjczl - Application exception: null *Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job Caused by: org.globus.cog.coaster.channels.ChannelException: Failed to create socket Caused by: java.net.ConnectException: Connection refused* Below are my configurations: swift.conf: sites: cloud-static site.cloud-static { execution { type:"coaster-persistent" URL: "http://127.0.0.1:50010" jobManager: "local:local" options { maxJobs: 10 tasksPerNode: 2 } } initialParallelTasks: 20 maxParallelTasks: 20 filesystem.type: local workDirectory: /tmp/swiftwork staging: local app.ALL
{executable: "*"} } Below is the Swift version I am using: ubuntu at ip-172-31-17-186:~/cloud-tutorials/swift-cloud-tutorial/word_count$ swift -version Swift trunk git-rev: 2d334140f2c288e5aeb3d354de0ecda35b4b3aac heads/master 6130 (modified locally) I am able to run with "sites: local" successfully , but failing with "sites: cloud-static". Please suggest on what can be the issue. Thanks and Regards, AJay ANthony. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ggowda at hawk.iit.edu Sat Oct 25 20:42:50 2014 From: ggowda at hawk.iit.edu (Gagan Munisiddha Gowda) Date: Sun, 26 Oct 2014 07:12:50 +0530 Subject: [Swift-user] Swift: ConnectException: Connection refused Issue In-Reply-To: References: Message-ID: Hi Ajay, Could you try restarting your entire cluster. (terminate and start ) This happens occasionally for some unknown reason. Regards, Gagan On 26/10/2014 7:05 am, "Ajay Anthony" wrote: > Hi Swift users, > > Kindly help me in below issue. Stuck on this issue almost whole day. > > I am trying to run swift word.swift on headnode in aws and have 1 > worker running. 
> > 2014-10-26 01:17:52,685+0000 INFO AbstractCoasterChannel > TCPChannel[client, http://127.0.0.1:50010] setting name to > http://127.0.0.1:50010 > 2014-10-26 01:17:52,709+0000 INFO Execute TASK_STATUS_CHANGE > taskid=urn:R-0-1414286270539 status=1 > 2014-10-26 01:17:52,712+0000 INFO LateBindingScheduler jobs queued: 0 > 2014-10-26 01:17:52,712+0000 DEBUG swift APPLICATION_EXCEPTION > jobid=bash-lupsjczl - Application exception: null > > > *Caused by: > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could > not submit jobCaused by: org.globus.cog.coaster.channels.ChannelException: > Failed to create socketCaused by: java.net.ConnectException: Connection > refused* > > > Below are my configuratioins: > swift.conf: > > sites: cloud-static > > site.cloud-static { > execution { > type:"coaster-persistent" > URL: "http://127.0.0.1:50010" > jobManager: "local:local" > options { > maxJobs: 10 > tasksPerNode: 2 > } > } > > initialParallelTasks: 20 > maxParallelTasks: 20 > filesystem.type: local > workDirectory: /tmp/swiftwork > staging: local > app.ALL {executable: "*"} > } > > Below is the Swift version I am using: > ubuntu at ip-172-31-17-186:~/cloud-tutorials/swift-cloud-tutorial/word_count$ > swift -version > Swift trunk git-rev: 2d334140f2c288e5aeb3d354de0ecda35b4b3aac heads/master > 6130 (modified locally) > > I am able to run with "sites: local" successfully , but failing with > "sites: cloud-static". > > Please suggest on what can be the issue. > > Thanks and Regards, > AJay ANthony. > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From aanthon2 at hawk.iit.edu Sat Oct 25 20:47:52 2014
From: aanthon2 at hawk.iit.edu (Ajay Anthony)
Date: Sat, 25 Oct 2014 21:47:52 -0400
Subject: [Swift-user] Swift: ConnectException: Connection refused Issue
In-Reply-To:
References:
Message-ID:

Hi Gagan,

Restarted for the 8th time; still getting the same issue. Do we need to make any other configuration changes besides the ones mentioned above?

Thanks and regards,
Ajay Anthony.

On Sat, Oct 25, 2014 at 9:42 PM, Gagan Munisiddha Gowda wrote:
> Hi Ajay,
>
> Could you try restarting your entire cluster (terminate and start)?
>
> This happens occasionally for some unknown reason.
>
> Regards,
> Gagan
>
> On 26/10/2014 7:05 am, "Ajay Anthony" wrote:
>
>> Hi Swift users,
>>
>> Kindly help me with the issue below. I have been stuck on it almost
>> the whole day.
>>
>> I am trying to run "swift word.swift" on the head node in AWS and
>> have one worker running.
>>
>> [quoted log and swift.conf trimmed]

From yadunand at uchicago.edu Sat Oct 25 20:59:23 2014
From: yadunand at uchicago.edu (Yadu Nand Babuji)
Date: Sat, 25 Oct 2014 20:59:23 -0500
Subject: [Swift-user] Swift: ConnectException: Connection refused Issue
In-Reply-To:
References:
Message-ID: <544C557B.6030402@uchicago.edu>

Hi Ajay,

Could you send me the following:

1. The runNNN directory for a failing run.
2. The ~/s3fs-fuse/cps*log file

If you could add my public key to the ~/.ssh/authorized_keys file, I can take a look as well.
You can get my public key here:
http://users.rcc.uchicago.edu/~yadunand/yadunand_id_rsa.pub

Thanks,
Yadu

On 10/25/2014 08:47 PM, Ajay Anthony wrote:
> [earlier messages quoted in full, trimmed]
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user

From aanthon2 at hawk.iit.edu Sat Oct 25 23:04:12 2014
From: aanthon2 at hawk.iit.edu (Ajay Anthony)
Date: Sun, 26 Oct 2014 00:04:12 -0400
Subject: [Swift-user] Swift: ConnectException: Connection refused Issue
In-Reply-To: <544C557B.6030402@uchicago.edu>
References: <544C557B.6030402@uchicago.edu>
Message-ID:

Hi Yadu/Gagan,

Thanks for the fast response. I was able to resolve the problem. The issue was with the way the restart was done. I had been restarting in three different ways:

1. Console
2. ec2-run-instances
3. LaunchPad

Some restarts were done with a mixture of the three. I performed a pure LaunchPad restart, and the run passed successfully.

Thanks once again.

Regards,
Ajay Anthony.

On Sat, Oct 25, 2014 at 9:59 PM, Yadu Nand Babuji wrote:
> [earlier messages quoted in full, trimmed]

From ketan at mcs.anl.gov Wed Oct 29 15:41:43 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Wed, 29 Oct 2014 15:41:43 -0500
Subject: [Swift-user] bring all files produced by app to output dir
Message-ID:

Hi,

Without using a wrapper, is there a way to tell Swift to bring *all* files produced by an app to the application output directory?

This is required for a materials app that produces about 600 output files. About 520 of them follow some pattern, but the remaining 80 are just odd files that also need to be brought into the output dir.

A shell wrapper will not work, as this is on non-forking BG machines.

Thanks,
Ketan

From wilde at anl.gov Wed Oct 29 17:32:45 2014
From: wilde at anl.gov (Michael Wilde)
Date: Wed, 29 Oct 2014 17:32:45 -0500
Subject: [Swift-user] bring all files produced by app to output dir
In-Reply-To:
References:
Message-ID: <54516B0D.1040004@anl.gov>

Ketan, the fix for enhancement request 1225 enabled an app to return multiple arrays of files, each of which matches some specific pattern:

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=1225

I'm not sure if this made it into the User Guide yet. Does that address your need?

- Mike

On 10/29/14 3:41 PM, Ketan Maheshwari wrote:
> [original message quoted in full, trimmed]
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user

--
Michael Wilde
Mathematics and Computer Science
Computation Institute
Argonne National Laboratory
The University of Chicago
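The feature Mike points to (an app returning multiple arrays of files, each matched by a pattern) might be used roughly as follows. This is a hypothetical Swift script sketch only — the app name `materials_run`, the file patterns, and the input name are invented for illustration, and the exact output-array syntax should be verified against bug 1225 and the User Guide:

```
type file;

// Hypothetical app returning two arrays of output files: one for the
// ~520 patterned outputs, one for the odd remaining files, so both
// sets are staged back to the output directory without a wrapper.
app (file patterned[], file odd[]) materials_run (file inp) {
    materials_app @inp;
}

file inp <"input.dat">;

// filesys_mapper collects files matching a glob pattern;
// the patterns here are placeholders for the app's real output names.
file patterned[] <filesys_mapper; pattern="out_*.dat">;
file odd[]       <filesys_mapper; pattern="*.log">;

(patterned, odd) = materials_run(inp);
```

Since each returned array carries its own mapper pattern, the odd files are covered by declaring a second (or third) array with a catch-all pattern rather than by post-processing in a shell wrapper.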