From iraicu at cs.iit.edu Sun Oct 5 08:38:58 2014
From: iraicu at cs.iit.edu (Ioan Raicu)
Date: Sun, 05 Oct 2014 08:38:58 -0500
Subject: [Swift-user] CFP: The 24th Int. ACM Symp. on High-Performance Parallel and Distributed Computing (HPDC) 2015
Message-ID: <543149F2.6080205@cs.iit.edu>

**** CALL FOR PAPERS ****

The 24th International ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC-2015)
Portland, Oregon, USA - June 15-19, 2015
http://www.hpdc.org/2015

The ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC) is the premier annual conference for presenting the latest research on the design, implementation, evaluation, and use of parallel and distributed systems for high-end computing. The 24th HPDC will take place in Portland, Oregon, the City of Roses, on June 15-19, 2015 (workshops on June 15-16, and the main conference on June 17-19).

**** IMPORTANT DATES ****

Abstracts (required) due: January 12, 2015
Full papers due: January 19, 2015 (no extensions)
Author rebuttal period: March 4-7, 2015
Author notifications: March 16, 2015
Final manuscripts: April 1, 2015

**** SCOPE AND TOPICS ****

Submissions are welcomed on high-performance parallel and distributed computing topics, including but not limited to: clouds, clusters, grids, big data, massively multicore, and global-scale computing systems. Submissions that focus on the architectures, systems, and networks of cloud infrastructures are particularly encouraged, as are experience reports of operational deployments that can provide insights for future research on HPDC applications and systems. All papers will be evaluated for their originality, technical depth and correctness, potential impact, relevance to the conference, and quality of presentation. Research papers must clearly demonstrate research contributions and novelty, while experience reports must clearly describe lessons learned and demonstrate impact.
In the context of high-performance parallel and distributed computing, the topics of interest include, but are not limited to:

- Systems, networks, and architectures
- Massively multicore systems
- Resource virtualization
- Programming languages and environments
- File and storage systems, I/O, and data management
- Resource management and scheduling, including energy-aware techniques
- Performance modeling and analysis
- Fault tolerance, reliability, and availability
- Data-intensive computing
- Applications and services that depend upon high-end computing

**** PAPER SUBMISSION GUIDELINES ****

Authors are invited to submit technical papers of at most 12 pages in PDF format, including figures and references. Papers should be formatted in the ACM Proceedings Style and submitted via the conference web site. No changes to the margins, spacing, or font sizes as specified by the style file are allowed. Accepted papers will appear in the conference proceedings, and will be incorporated into the ACM Digital Library. A limited number of papers will be accepted as posters.

Papers must be self-contained and provide the technical substance required for the program committee to evaluate their contributions. Submitted papers must be original work that has not appeared in and is not under consideration for another conference or a journal. See the ACM Prior Publication Policy for more details. Papers can be submitted at https://ssl.linklings.net/conferences/hpdc/.
**** HPDC'15 GENERAL CO-CHAIRS ****
Thilo Kielmann, VU University Amsterdam, The Netherlands

**** HPDC'15 PROGRAM CO-CHAIRS ****
Dean Hildebrand, IBM Research Almaden, USA
Michela Taufer, University of Delaware, USA

**** HPDC'15 WORKSHOP CHAIRS ****
Abhishek Chandra, University of Minnesota, Twin Cities, USA
Ioan Raicu, Illinois Institute of Technology and Argonne National Laboratory, USA

**** HPDC'15 POSTERS CHAIR ****
Ana-Maria Oprescu, VU University Amsterdam, The Netherlands

**** HPDC'15 PUBLICITY CHAIR ****
Ioan Raicu, Illinois Institute of Technology and Argonne National Laboratory, USA
Torsten Hoefler, ETH Zurich, Switzerland
Naoya Maruyama, RIKEN Advanced Institute for Computational Science, Japan

**** HPDC'15 PUBLICATIONS CHAIR ****
Antonino Tumeo, Pacific Northwest National Laboratory, USA

**** HPDC'15 TRAVEL AWARD CHAIR ****
Ming Zhao, Florida International University, USA

**** HPDC'15 WEBMASTER CHAIR ****
Kaveh Razavi, VU University Amsterdam, The Netherlands

**** HPDC'15 PROGRAM COMMITTEE ****
David Abramson, The University of Queensland, Australia
Dong Ahn, Lawrence Livermore National Laboratory, USA
Gabriel Antoniu, INRIA, France
Henri Bal, VU University Amsterdam, The Netherlands
Pavan Balaji, Argonne National Laboratory, USA
Michela Becchi, University of Missouri, USA
John Bent, EMC, USA
Greg Bronevetsky, Lawrence Livermore National Laboratory, USA
Ali Butt, Virginia Tech, USA
Franck Cappello, Argonne National Lab, USA
Abhishek Chandra, University of Minnesota, USA
Andrew A. Chien, University of Chicago, USA
Paolo Costa, Microsoft Research Cambridge, UK
Kei Davis, Los Alamos National Laboratory, USA
Peter Dinda, Northwestern University, USA
Dick Epema, Delft University of Technology, The Netherlands
Gilles Fedak, INRIA, France
Wuchun Feng, Virginia Tech, USA
Renato Figueiredo, University of Florida, USA
Clemens Grelck, University of Amsterdam, The Netherlands
Adriana Iamnitchi, University of South Florida, USA
Larry Kaplan, Cray Inc., USA
Kate Keahey, Argonne National Laboratory, USA
Dries Kimpe, Argonne National Laboratory, USA
Alice Koniges, Lawrence Berkeley National Laboratory, USA
Zhiling Lan, Illinois Institute of Technology, USA
John (Jack) Lange, University of Pittsburgh, USA
Gary Liu, Oak Ridge National Laboratory, USA
Jay Lofstead, Sandia National Laboratories, USA
Arthur Barney Maccabe, Oak Ridge National Laboratory, USA
Carlos Maltzahn, University of California, Santa Cruz, USA
Naoya Maruyama, RIKEN Advanced Institute for Comp. Science, Japan
Satoshi Matsuoka, Tokyo Inst. Technology, Japan
Timothy Mattson, Intel, USA
Kathryn Mohror, Lawrence Livermore National Laboratory, USA
Bogdan Nicolae, IBM Research, Ireland
Sangmi Pallickara, Colorado State University, USA
Manish Parashar, Rutgers University, USA
Lavanya Ramakrishnan, Lawrence Berkeley National Laboratory, USA
Raju Rangaswami, Florida International University, USA
Matei Ripeanu, University of British Columbia, Canada
Nagiza F. Samatova, North Carolina State University, USA
Prasenjit Sarkar, Independent Consultant, USA
Karsten Schwan, Georgia Institute of Technology, USA
Vasily Tarasov, IBM Research, USA
Kenjiro Taura, University of Tokyo, Japan
Douglas Thain, University of Notre Dame, USA
Ana Varbanescu, University of Amsterdam, The Netherlands
Richard Vuduc, Georgia Institute of Technology, USA
Jon Weissman, University of Minnesota, USA
Dongyan Xu, Purdue University, USA
Rui Zhang, IBM Research, USA

**** HPDC STEERING COMMITTEE ****
Franck Cappello, Argonne National Lab, USA and INRIA, France
Andrew A. Chien, University of Chicago, USA
Peter Dinda, Northwestern University, USA
Dick Epema, Delft University of Technology, The Netherlands
Renato Figueiredo, University of Florida, USA
Salim Hariri, University of Arizona, USA
Thilo Kielmann, VU University Amsterdam, The Netherlands
Arthur "Barney" Maccabe, Oak Ridge National Laboratory, USA
Manish Parashar, Rutgers University, USA
Matei Ripeanu, University of British Columbia, Canada
Karsten Schwan, Georgia Tech, USA
Doug Thain, University of Notre Dame, USA
Jon Weissman, University of Minnesota, USA (Chair)
Dongyan Xu, Purdue University, USA

--
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT)
Guest Research Faculty, Argonne National Laboratory (ANL)
=================================================================
Data-Intensive Distributed Systems Laboratory, CS/IIT
Distributed Systems Laboratory, MCS/ANL
=================================================================
Editor: IEEE TCC, Springer Cluster, Springer JoCCASA
Chair: IEEE/ACM MTAGS, ACM ScienceCloud
=================================================================
Cel: 1-847-722-0876
Office: 1-312-567-5704
Email: iraicu at cs.iit.edu
Web: http://www.cs.iit.edu/~iraicu/
Web: http://datasys.cs.iit.edu/
LinkedIn: http://www.linkedin.com/in/ioanraicu
Google: http://scholar.google.com/citations?user=jE73HYAAAAAJ
=================================================================
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ketan at mcs.anl.gov Fri Oct 10 15:55:08 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Fri, 10 Oct 2014 15:55:08 -0500
Subject: [Swift-user] set cleanup off
Message-ID:

I am running Cobalt jobs where I have a user quota of at most 1024 node runs per hour. When this quota is exceeded, the system will not allow any more job submissions. In this scenario, the cleanup operations fail after the run has completed, with the following error:

Final status: Fri, 10 Oct 2014 20:49:58+0000 Finished successfully:100
org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job (qsub reported an exit code of 1).
project: ExM
        at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:63)
        at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:45)
        at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:61)
        at org.globus.cog.abstraction.coaster.service.job.manager.LocalQueueProcessor.run(LocalQueueProcessor.java:40)
Caused by: org.globus.cog.abstraction.impl.scheduler.common.ProcessException: Could not submit job (qsub reported an exit code of 1).
project: ExM
        at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.start(AbstractExecutor.java:113)
        at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:53)
        ... 3 more
org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job (qsub reported an exit code of 1).
project: ExM

Is there any way to tell Swift/Coasters not to do cleanup? If so, is there any harm in doing so?

Thanks,
Ketan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hategan at mcs.anl.gov Fri Oct 10 19:41:28 2014
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 10 Oct 2014 17:41:28 -0700
Subject: [Swift-user] set cleanup off
In-Reply-To:
References:
Message-ID: <1412988088.23034.14.camel@echo>

Hi Ketan,

There is currently no way to disable the cleanup job, unless you run with provider staging, in which case there is no cleanup job. In some sense, the queue limitation below is itself a way of disabling the cleanup job, and, apart from the distasteful error messages, there should be no detrimental side effects. The harm in not doing cleanup is that you leave unneeded files on disk. You can, of course, clean up the work directory manually whenever you need or want to.
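[Editor's note: in Swift releases of this era, provider staging was typically switched on in swift.properties; the fragment below is a minimal sketch assuming the `use.provider.staging` and `provider.staging.pin.swiftfiles` properties documented in the Swift user guide, not configuration taken from this thread.]

```properties
# Hypothetical swift.properties fragment: with provider staging enabled,
# files move over the coaster channel itself and no cleanup job is submitted.
use.provider.staging=true
# Optional (assumption): whether staged Swift files are pinned/cached on workers.
provider.staging.pin.swiftfiles=false
```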
Mihael On Fri, 2014-10-10 at 15:55 -0500, Ketan Maheshwari wrote: > I am running cobalt jobs where I have a user quota of 1024 node runs max in > an hour. In cases where this exceeds, the system will not allow any more > job submission. > > In this scenario, the cleanup operations fail after the run has completed > with the following error: > > Final status:Fri, 10 Oct 2014 20:49:58+0000 Finished successfully:100 > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could > not submit job (qsub reported an exit code of 1). > project: ExM 'default' queue has been reached\n"> > > at > org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:63) > at > org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:45) > > at > org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:61) > at > org.globus.cog.abstraction.coaster.service.job.manager.LocalQueueProcessor.run(LocalQueueProcessor.java:40) > Caused by: > org.globus.cog.abstraction.impl.scheduler.common.ProcessException: Could > not submit job (qsub reported an exit code of 1). > project: ExM 'default' queue has been reached\n"> > > at > org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.start(AbstractExecutor.java:113) > at > org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:53) > ... 3 more > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could > not submit job (qsub reported an exit code of 1). > project: ExM 'default' queue has been reached\n"> > > > Is there any ways to tell Swift/Coasters to not do cleanup? If so, is there > any harm in doing so? 
>
> Thanks,
> Ketan
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user

From ggowda at hawk.iit.edu Mon Oct 13 11:02:46 2014
From: ggowda at hawk.iit.edu (Gagan Munisiddha Gowda)
Date: Mon, 13 Oct 2014 21:32:46 +0530
Subject: [Swift-user] NullPointerException while running Tutorial
Message-ID:

Hello,

I am facing issues while running through this tutorial: http://swiftlang.org/tutorials/cloud/swift-cloud-tutorial.tar.gz

I have set up the coaster conf to point to my workers and head node as mentioned in the docs.

I see that the following error is because of a bug (as mentioned here: https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=1321). It is mentioned that it has been fixed in 0.95 (which is not yet released). *I am using 0.95-RC6*

Looking forward to any help in resolving this issue. The error follows:

ubuntu at ip-XXX-XXX-XXX-XXX:~/swift-cloud-tutorial/part04$ swift p4.swift
*Swift 0.95 RC6 swift-r7900 cog-r3908*
RunID: run002
Warning: The @ syntax for function invocation is deprecated
Progress: Sun, 12 Oct 2014 16:52:13+0000
*Exception in thread "Scheduler" java.lang.NullPointerException*
        at org.globus.cog.abstraction.impl.common.task.TaskImpl.hashCode(TaskImpl.java:364)
        at java.util.HashMap.get(HashMap.java:317)
        at org.griphyn.vdl.karajan.VDSAdaptiveScheduler.failTask(VDSAdaptiveScheduler.java:400)
        at org.globus.cog.karajan.scheduler.LateBindingScheduler.run(LateBindingScheduler.java:266)
Progress: Sun, 12 Oct 2014 16:52:14+0000 Selecting site:10
No events in 1s.
Finding dependency loops...
Waiting threads:
Thread: R-5x2-3, waiting on simout (declared on line 24)
    assignment, p4, line 28
Thread: R-5-3x2, waiting on simout (declared on line 24)
    assignment, p4, line 28
Thread: R-5-0-3, waiting on simout (declared on line 24)
    assignment, p4, line 28
Thread: R-5-6-3, waiting on simout (declared on line 24)
    assignment, p4, line 28
Thread: R-5-8-3, waiting on simout (declared on line 24)
    assignment, p4, line 28
Thread: R-5-1-3, waiting on simout (declared on line 24)
    assignment, p4, line 28
Thread: R-5-7-3, waiting on simout (declared on line 24)
    assignment, p4, line 28
Thread: R-5-9-3, waiting on simout (declared on line 24)
    assignment, p4, line 28
Thread: R-6, waiting on sims (declared on line 21)
    swift:execute, p4, line 70
    analyze, p4, line 211
Thread: R-5-2-3, waiting on simout (declared on line 24)
    assignment, p4, line 28
Thread: R-5-4-3, waiting on simout (declared on line 24)
    assignment, p4, line 28
----
No dependency loops found.
The following threads are independently hung:
Thread: R-6, waiting on sims (declared on line 21)
    swift:execute, p4, line 70
    analyze, p4, line 211
----
Irrecoverable error found. Exiting.

--
Regards,
Gagan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From yadunand at uchicago.edu Mon Oct 13 11:35:06 2014
From: yadunand at uchicago.edu (Yadu Nand Babuji)
Date: Mon, 13 Oct 2014 11:35:06 -0500
Subject: [Swift-user] NullPointerException while running Tutorial
In-Reply-To:
References:
Message-ID: <543BFF3A.3060707@uchicago.edu>

Hi Gagan,

The tutorial that you are following is for the 0.95 versions of Swift. I would recommend that you use the tutorial listed here instead: https://github.com/yadudoc/cloud-tutorials

The null pointer bug has been fixed in Swift-0.95-RC7, which you may download from here: http://swift-lang.org/packages/swift-0.95-RC7.tar.gz

The tutorials for trunk are far more tested and stable, so I would strongly recommend using them if you are from Ioan's class at IIT.
Thanks, Yadu On 10/13/2014 11:02 AM, Gagan Munisiddha Gowda wrote: > Hello, > > I am facing issues while running through this tutorial : > http://swiftlang.org/tutorials/cloud/swift-cloud-tutorial.tar.gz > > > I have setup the coaster conf to point to my workers and head node as > mentioned in the docs. > > I see that the following error was because of a BUG (as mentioned here > : https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=1321) > > Its mentioned what it has been fixed in 0.95 (which is not yet > released). *I am using 0.95-RC6* > * > * > Looking forward for any help in resolving this issue. > > Following is the error: > > ubuntu at ip-XXX-XXX-XXX-XXX:~/swift-cloud-tutorial/part04$ swift p4.swift > *Swift 0.95 RC6 swift-r7900 cog-r3908* > RunID: run002 > Warning: The @ syntax for function invocation is deprecated > Progress: Sun, 12 Oct 2014 16:52:13+0000 > *Exception in thread "Scheduler" java.lang.NullPointerException* > at > org.globus.cog.abstraction.impl.common.task.TaskImpl.hashCode(TaskImpl.java:364) > at java.util.HashMap.get(HashMap.java:317) > at > org.griphyn.vdl.karajan.VDSAdaptiveScheduler.failTask(VDSAdaptiveScheduler.java:400) > at > org.globus.cog.karajan.scheduler.LateBindingScheduler.run(LateBindingScheduler.java:266) > Progress: Sun, 12 Oct 2014 16:52:14+0000 Selecting site:10 > No events in 1s. > Finding dependency loops... 
> > Waiting threads: > Thread: R-5x2-3, waiting on simout (declared on line 24) > assignment, p4, line 28 > > Thread: R-5-3x2, waiting on simout (declared on line 24) > assignment, p4, line 28 > > Thread: R-5-0-3, waiting on simout (declared on line 24) > assignment, p4, line 28 > > Thread: R-5-6-3, waiting on simout (declared on line 24) > assignment, p4, line 28 > > Thread: R-5-8-3, waiting on simout (declared on line 24) > assignment, p4, line 28 > > Thread: R-5-1-3, waiting on simout (declared on line 24) > assignment, p4, line 28 > > Thread: R-5-7-3, waiting on simout (declared on line 24) > assignment, p4, line 28 > > Thread: R-5-9-3, waiting on simout (declared on line 24) > assignment, p4, line 28 > > Thread: R-6, waiting on sims (declared on line 21) > swift:execute, p4, line 70 > analyze, p4, line 211 > > Thread: R-5-2-3, waiting on simout (declared on line 24) > assignment, p4, line 28 > > Thread: R-5-4-3, waiting on simout (declared on line 24) > assignment, p4, line 28 > > ---- > No dependency loops found. > > The following threads are independently hung: > Thread: R-6, waiting on sims (declared on line 21) > swift:execute, p4, line 70 > analyze, p4, line 211 > > ---- > > Irrecoverable error found. Exiting. > > > -- > Regards, > Gagan > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From yadudoc1729 at gmail.com Fri Oct 17 18:16:01 2014 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Fri, 17 Oct 2014 18:16:01 -0500 Subject: [Swift-user] Error in running p4.swift in cloud-tutorials In-Reply-To: References: Message-ID: Hi Raghav, I'm CC-ing the swift-user list, and encourage you to join the list. I just tried this from scratch, and I'm not able to reproduce the issue you are seeing. 
Could you send me a tar ball of the runNNN folder, and the cps*log from your /home/ubuntu/s3fs-fuse folder. Thanks, Yadu On Fri, Oct 17, 2014 at 5:30 PM, Raghav Kapoor wrote: > Hello Yadunand, > > My name is Raghav, I am a graduate student > at IIT. I have an assignment on swift. > > I am using the directions provided on your github page. > > https://github.com/yadudoc/cloud-tutorials > > What I have observed is, all the sample tutorials were running a day > before. > But from yesterday, p4.swift is not running on the cloud. > > I am getting the following error which I am pasting below: > > root at ip-172-31-20-129:/home/ubuntu/swift-cloud-tutorial/part04# swift > p4.swift > Swift trunk git-rev: 2d334140f2c288e5aeb3d354de0ecda35b4b3aac heads/master > 6130 (modified locally) > RunID: run001 > Progress: Fri, 17 Oct 2014 22:10:33+0000 > > Execution failed: > Exception in sort: > Arguments: [-n, unsorted.txt] > Host: cloud-static > Directory: p4-run001/jobs/0/sort-0abp6zyl > exception @ swift-int-staging.k, line: 167 > Caused by: null > Caused by: > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could > not submit job > Caused by: org.globus.cog.coaster.channels.ChannelException: Failed to > create socket > Caused by: java.net.ConnectException: Connection refused > k:assign @ swift.k, line: 171 > Caused by: Exception in sort: > Arguments: [-n, unsorted.txt] > Host: cloud-static > Directory: p4-run001/jobs/0/sort-0abp6zyl > exception @ swift-int-staging.k, line: 167 > Caused by: null > Caused by: > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could > not submit job > Caused by: org.globus.cog.coaster.channels.ChannelException: Failed to > create socket > Caused by: java.net.ConnectException: Connection refused > root at ip-172-31-20-129:/home/ubuntu/swift-cloud-tutorial/part04# > > > I think there is some problem with hostname and port numbers. it is not > specified correctly. 
> > I see that you have updated the repository with some changes that might be > the cause of this issue. > > I am referring to this commit specifically > > > https://github.com/yadudoc/cloud-tutorials/commit/d75ce87eb94fd8460b9b425c72445265e6074974 > > which was made a day ago. > > I am not sure what is the cause of this problem, > > Could you investigate and help me resolve the issue? > > Thanks a lot, > > Regards, > > Raghav > -- Yadu Nand B -------------- next part -------------- An HTML attachment was scrubbed... URL: From sdivanji at hawk.iit.edu Sat Oct 18 00:00:00 2014 From: sdivanji at hawk.iit.edu (Sughosh Divanji) Date: Sat, 18 Oct 2014 00:00:00 -0500 Subject: [Swift-user] Fwd: Swift not running after restarting Amazon EC2 instance In-Reply-To: References: Message-ID: ---------- Forwarded message ---------- From: Sughosh Divanji Date: Fri, Oct 17, 2014 at 11:50 PM Subject: Swift not running after restarting Amazon EC2 instance To: swift-user at ci.uchicago.edu Cc: Raghav Kapoor , Arjun Nanjundappa < ananjun1 at hawk.iit.edu> Hi all, My name is Sughosh and I am a graduate student in CS from IIT Chicago. I am using swift for a homework assignment and facing this issue after restarting my amazon EC2 instances. root at ip-172-31-19-231:/home/ubuntu/wordcount# time swift wordcount.swift Swift trunk git-rev: 2d334140f2c288e5aeb3d354de0ecda35b4b3aac heads/master 6130 (modified locally) RunID: run004 Progress: Sat, 18 Oct 2014 04:36:15+0000 Progress: Sat, 18 Oct 2014 04:36:16+0000 Submitting:15 Failed but can retry:1 Progress: Sat, 18 Oct 2014 04:36:46+0000 Submitting:15 Failed but can retry:1 ^C real 0m59.366s user 0m4.248s sys 0m0.373s The same code works fine without any issues before reboot. I have attached the run logs. Please let me know what could be the issue. Thanks, Sughosh -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: run004.zip
Type: application/zip
Size: 4930 bytes
Desc: not available
URL:

From yadunand at uchicago.edu Sat Oct 18 00:53:01 2014
From: yadunand at uchicago.edu (Yadu Nand Babuji)
Date: Sat, 18 Oct 2014 00:53:01 -0500
Subject: [Swift-user] Fwd: Swift not running after restarting Amazon EC2 instance
In-Reply-To:
References:
Message-ID: <5442003D.6050707@uchicago.edu>

Hi Sughosh,

Could you describe the steps you took to restart the Amazon instances? Did you restart the headnode instance and all the worker instances? I do not see anything in the logs that jumps out.

-Yadu

On 10/18/2014 12:00 AM, Sughosh Divanji wrote:
>
> ---------- Forwarded message ----------
> From: *Sughosh Divanji*
> Date: Fri, Oct 17, 2014 at 11:50 PM
> Subject: Swift not running after restarting Amazon EC2 instance
> To: swift-user at ci.uchicago.edu
> Cc: Raghav Kapoor, Arjun Nanjundappa <ananjun1 at hawk.iit.edu>
>
> Hi all,
>
> My name is Sughosh and I am a graduate student in CS from IIT Chicago.
> I am using swift for a homework assignment and facing this issue after
> restarting my amazon EC2 instances.
>
> root at ip-172-31-19-231:/home/ubuntu/wordcount# time swift wordcount.swift
> Swift trunk git-rev: 2d334140f2c288e5aeb3d354de0ecda35b4b3aac
> heads/master 6130 (modified locally)
> RunID: run004
> Progress: Sat, 18 Oct 2014 04:36:15+0000
> Progress: Sat, 18 Oct 2014 04:36:16+0000 Submitting:15 Failed but can retry:1
> Progress: Sat, 18 Oct 2014 04:36:46+0000 Submitting:15 Failed but can retry:1
> ^C
> real 0m59.366s
> user 0m4.248s
> sys 0m0.373s
>
> The same code works fine without any issues before reboot. I have
> attached the run logs. Please let me know what could be the issue.
>
> Thanks,
> Sughosh
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jtu3 at hawk.iit.edu Sat Oct 18 16:18:44 2014
From: jtu3 at hawk.iit.edu (Jiada Tu)
Date: Sat, 18 Oct 2014 16:18:44 -0500
Subject: [Swift-user] sort on large data
Message-ID:

I am doing an assignment with Swift to sort large data. The data contains one record (string) per line. We need to sort the records based on ASCII code. The data is too large to fit in memory.

The large data file is on the head node, and I run the Swift script directly on the head node.

Here's what I plan to do:

1) Split the big file into 64MB files.
2) Let each worker task sort one 64MB file. Say, each task will call a "sort.py" (written by me). sort.py will output a list of files, say: "sorted-worker1-001; sorted-worker1-002; ...". The first file contains the records starting with 'a', the second those starting with 'b', etc.
3) Now we will have all records starting with 'a' in (sorted-worker1-001; sorted-worker2-001; ...); 'b' in (sorted-worker1-002; sorted-worker2-002; ...); and so on. Then I send all the files containing records starting with 'a' to a "reduce" worker task and let it merge them into one single file. Same for 'b', 'c', etc.
4) Now we get 26 files (a-z), each sorted internally.

Basically, what I am doing is simulating MapReduce: step 2 is the map and step 3 is the reduce.

Here are some problems:

1) For step 2, sort.py needs to output a list of files. How can a Swift app function handle a list of outputs?

app (file[] outfiles) sort (file[] infiles) {
    sort.py // how to put out files here?
}

2) As far as I know (I may be wrong), Swift will stage all the output files back to the local disk (here, the head node, since I run the Swift script directly on the head node). So the output files in step 2 will be staged back to the head node first, then staged from the head node to the worker nodes for step 3, and then the 26 files in step 4 will be staged back to the head node. I don't want this, because the network will be a huge bottleneck. Is there any way to tell the "reduce" worker to get data directly from the "map" worker?
Maybe a shared file system will help, but is there any way that the user can control the data staging between workers without using a shared file system?

Since I am new to Swift, I may be totally wrong and misunderstand what Swift does. If so, please correct me.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From yadunand at uchicago.edu Sat Oct 18 18:13:04 2014
From: yadunand at uchicago.edu (Yadu Nand Babuji)
Date: Sat, 18 Oct 2014 18:13:04 -0500
Subject: [Swift-user] sort on large data
In-Reply-To:
References:
Message-ID: <5442F400.9020805@uchicago.edu>

Hi Jiada Tu,

1) Here's an example of returning an array of files:

type file;
app (file outs[]) make_outputs (file script)
{
    bash @script;
}

file outputs[] ;
file script <"make_outputs.sh">; # This script creates a few files with outputs as prefix
(outputs) = make_outputs(script);

2) The products of a successful task execution must be visible to the headnode (where Swift runs): either through a shared filesystem (NFS, S3 mounted over s3fs, etc.), or they must be brought back over the network. But we can reduce the overhead of moving the results to the headnode and then to the workers for the reduce stage.

I understand that this is part of your assignment, so I will try to answer without getting too specific; at the same time, concepts from Hadoop do not necessarily work directly in this context. So here are some things to consider to get the best performance possible:

- Assuming that the texts contain 10K unique words, your sort program will generate a file containing at most 10K lines (which would be definitely under a MB). Is there any advantage in splitting this into smaller files?

- Since the final merge involves tiny files, you could very well do the reduce stage on the headnode and be quite efficient (you can define the reduce app only for site:local)

sites : [local, cloud-static]
site.local {
    ....
    app.reduce {
        executable : ${env.PWD}/reduce.py
    }
}

site.cloud-static {
    ....
    app.python {
        executable : /usr/bin/python
    }
}

This assumes that you are going to define your sorting app like this:

app (file freqs) sort (file sorting_script, file input ) {
    python @sorting_script @input;
}

- The real cost is in having the original text reach the workers; this can be made faster by:
    - A better headnode with better network/disk IO (I've measured 140Mbit/s between m1.medium nodes; c3.8xlarge comes with 975Mbit/s)
    - Using S3 with s3fs and having swift-workers pull data from S3, which is pretty scalable and removes the IO load from the headnode.

- Identify the optimal size of data chunks for your specific problem. Each chunk of data in this case comes with the overhead of starting a new remote task, sending the data, and bringing results back. Note that the result of a wordcount on a file, whether it is 1MB or 10GB, is still at most 1MB (with the earlier assumptions).

- Ensure that the data stays within the same datacenter, for cost as well as performance. By limiting the cluster to US-Oregon we already do this.

If you would like to attempt this using S3FS, let me know, I'll be happy to explain that in detail.

Thanks,
Yadu

On 10/18/2014 04:18 PM, Jiada Tu wrote:
> I am doing an assignment with swift to sort large data. The data
> contains one record (string) each line. We need to sort the records
> base on ascii code. The data is too large to fit in the memory.
>
> The large data file is in head node, and I run the swift script
> directly on head node.
>
> Here's what I plan to do:
>
> 1) split the big file into 64MB files
> 2) let each worker task sort one 64MB files. Say, each task will call
> a "sort.py" (written by me). sort.py will output a list of files,
> say:"sorted-worker1-001; sorted-worker1-002; ......". The first file
> contains the records started with 'a', the second started with 'b', etc.
> 3) now we will have all records started with 'a' in > (sorted-worker1-001;sorted-worker2-001;...); 'b' in > (sorted-worker1-002;sorted-worker2-002; ......); ...... Then I send > all the files contains records 'a' to a "reduce" worker task and let > it merge these files into one single file. Same to 'b', 'c', etc. > 4) now we get 26 files (a-z) with each sorted inside. > > Basically what I am doing is simulate Map-reduce. step 2 is map and > step 3 is reduce > > Here comes some problems: > 1) for step 2, sort.py need to output a list of files. How can swift > app function handles list of outputs? > app (file[] outfiles) sort (file[] infiles) { > sort.py // how to put out files here? > } > > 2) As I know (may be wrong), swift will stage all the output file back > to the local disk (here is the head node since I run the swift script > directly on headnode). So the output files in step 2 will be staged > back to head node first, then stage from head node to the worker nodes > to do the step 3, then stage the 26 files in step 4 back to head node. > I don't want it because the network will be a huge bottleneck. Is > there any way to tell the "reduce" worker to get data directly from > "map" worker? Maybe a shared file system will help, but is there any > way that user can control the data staging between workers without > using the shared file system? > > Since I am new to the swift, I may be totally wrong and > misunderstanding what swift do. If so, please correct me. > > > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 
From ggowda at hawk.iit.edu  Sun Oct 19 00:08:16 2014
From: ggowda at hawk.iit.edu (Gagan Munisiddha Gowda)
Date: Sun, 19 Oct 2014 10:38:16 +0530
Subject: [Swift-user] sort on large data
In-Reply-To: <5442F400.9020805@uchicago.edu>
References: <5442F400.9020805@uchicago.edu>
Message-ID: 

Hi Yadu,

I am heading in the same direction, trying to use a shared file system (S3 bucket / S3FS).

I have set up: *WORKER_INIT_SCRIPT=/path/to/mounts3fs.sh in
cloud-tutorials/ec2/configs* *(as mentioned in the tutorials)*

Though I am able to set up the passwd-s3fs file in the desired location
(using the mounts3fs.sh script), I see that the S3 bucket is not getting
mounted.

I have verified the passwd-s3fs file and mount point, and everything seems
to be created as expected. But one observation was that these files were
owned by 'root', since they were created through setup.sh.

So, I added more commands to change the permissions and made 'ubuntu' the
owner of all related files.

Even after all these changes, I see that the S3 bucket is still not mounted.

*PS: If I connect to the workers and run the s3fs command manually, it
does mount!*

sudo s3fs -o allow_other,gid=1000,use_cache=/home/ubuntu/cache ;

(tried with and without sudo)

Thanks for your help.

On Sun, Oct 19, 2014 at 4:43 AM, Yadu Nand Babuji wrote:

> Hi Jiada Tu,
>
> 1) Here's an example for returning an array of files :
>
> type file;
> app (file outs[]) make_outputs (file script)
> {
>    bash @script;
> }
>
> file outputs[] ;
> file script <"make_outputs.sh">;  # This script creates a few files
> with outputs as prefix
> (outputs) = make_outputs(script);
>
> 2) The products of a successful task execution must be visible to the
> headnode (where swift runs) either through a
> - shared filesystem (NFS, S3 mounted over s3fs, etc.), or
> - they must be brought back over the network.
> But we can reduce the overhead of moving the results to the headnode and
> then to the workers for the reduce stage.
>
> I understand that this is part of your assignment, so I will try to answer
> without getting too specific; at the same time, concepts from hadoop do not
> necessarily work directly in this context. So here are some things to
> consider to get the best performance possible:
>
> - Assuming that the texts contain 10K unique words, your sort program will
> generate a file containing at most 10K lines
> (which would definitely be under a MB). Is there any advantage in
> splitting this into smaller files ?
>
> - Since the final merge involves tiny files, you could very well do the
> reduce stage on the headnode and be quite efficient
> (you can define the reduce app only for site:local)
>
> sites : [local, cloud-static]
> site.local {
>     ....
>     app.reduce {
>         executable : ${env.PWD}/reduce.py
>     }
> }
>
> site.cloud-static {
>     ....
>     app.python {
>         executable : /usr/bin/python
>     }
> }
>
> This assumes that you are going to define your sorting app like this :
>
> app (file freqs) sort (file sorting_script, file input ) {
>     python @sorting_script @input;
> }
>
> - The real cost is in getting the original text to the workers; this can
> be made faster by :
>    - A better headnode with better network/disk IO (I've measured
> 140 Mbit/s between m1.medium nodes, c3.8xlarge comes with 975 Mbit/s)
>    - Using S3 with s3fs and having swift-workers pull data from S3, which
> is quite scalable, removing the IO load from the headnode.
>
> - Identify the optimal size for data chunks for your specific problem.
> Each chunk of data in this case comes with the overhead of starting
> a new remote task, sending the data and bringing results back. Note that
> the result of a wordcount on a file, whether it is 1MB or 10GB,
> is still at most ~1MB (with the earlier assumptions)
>
> - Ensure that the data stays within the same datacenter, for cost as well
> as performance. By limiting the cluster to US-Oregon we already do this.
>
> If you would like to attempt this using S3FS, let me know, I'll be happy
> to explain that in detail.
>
> Thanks,
> Yadu
>
> On 10/18/2014 04:18 PM, Jiada Tu wrote:
> [...]
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user

-- 
Regards,
Gagan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From jtu3 at hawk.iit.edu  Sun Oct 19 04:00:42 2014
From: jtu3 at hawk.iit.edu (Jiada Tu)
Date: Sun, 19 Oct 2014 04:00:42 -0500
Subject: [Swift-user] sort on large data
In-Reply-To: <544341D2.9020104@uchicago.edu>
References: <5442F400.9020805@uchicago.edu> <544341D2.9020104@uchicago.edu>
Message-ID: 

Hi Yadu,

Thanks for your answer! That's really helpful. I forgot to forward my last
question and your answer to the swift-user group, so if anybody has the
same confusion or is interested, please check below:

On Sat, Oct 18, 2014 at 11:45 PM, Yadu Nand Babuji wrote:

> Hi Jiada,
>
> Please find replies inline :
>
> On 10/18/2014 09:30 PM, Jiada Tu wrote:
>
> Thanks for your answer, Yadu. I have some more questions:
> 1) If I didn't misunderstand, the (outputs) will be staged back to the
> head node, right?
>
> Yes, in the current modes that you are using.
>
> 2)
> ---------------------------
> type file;
> app (file outs[]) make_outputs (file script)
> {
>    bash @script;
> }
>
> file outputs[] ;
> file script <"make_outputs.sh">;  # This script creates a few files
> with outputs as prefix
> (outputs) = make_outputs(script);
> ---------------------------
>
> If I have some later app function that takes the "outputs" files as
> input, will that app function wait until all possible outputs are
> generated?
>
> Yes! Swift is implicitly parallel, and the order of execution is based on
> the availability of dependent data items.
>
> For example:
>
> app (file outs[]) final_outputs (file script, file input[])
> {
>    bash @script @filenames(input)
> }
> foreach i in [0:100]
> {
>    file outputs[] ;
>    # I know ""outputs-"+@tostring(i)+"-"" may not work; please treat this
>    # as pseudo-code
>    (outputs) = make_outputs(script);
> }
>
> file script2 <"final_output">;  # takes multiple inputs and merges them
> into a file
> file inputs[] 
> file finoutput 
> (finoutput) = final_outputs(script2, inputs)
>
> final_outputs needs to take some output files from "every" single loop
> iteration as its input (the first iteration may generate "outputs-0-000",
> the second may generate "outputs-1-000", etc.).
>
> So, will the final_outputs() task be "blocked" until all make_outputs()
> tasks finish processing?
>
> Yes, final_outputs will block till the array that it depends on is closed.
>
> 3) Actually, wordCount is our first program, and sort is our second
> program, which gives "extra credits". You gave a great answer to another
> question I was also confused about.
>
> I hope I did not give too much away :)
>
> The output of the sort program will be 10GB, so it will not fit in memory.
> That's why I want to split the intermediate files and send them to several
> merge tasks. Each merge task will generate, say, a 100MB file. So the
> result of my sort program will be 10GB/100MB = 100 files, with file1
> having the smallest words and file100 having the largest words.
>
> From your answer, I believe this can be dealt with by using s3fs? So:
> 4) Yes, I want some help using s3fs.
>
> Since this is something that would be of general interest, I will update
> the github readme page with directions for how to run swift
> over s3fs acting as a shared filesystem.
>
> But are there any other general ways to deal with big-file sorting in
> swift? Can you give me a hint about what they would be? Like, how would
> you generally deal with this sorting problem?
>
> The simplest strategy I can think of is to split each chunk into, say,
> 100 buckets, and have the corresponding buckets from every chunk
> merge-sorted.
>
> Thanks,
> Jiada Tu
>
> On Sat, Oct 18, 2014 at 6:13 PM, Yadu Nand Babuji
> wrote:
>
>> Hi Jiada Tu,
>> [...]
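[Editor's note] Yadu's answer that final_outputs "will block till the array that it depends on is closed" is ordinary dataflow: it is the same pattern as collecting futures in other languages. A minimal Python analogy (the task bodies and file names are hypothetical placeholders, not Swift API):

```python
from concurrent.futures import ThreadPoolExecutor

def make_outputs(i):
    """'Map' app stand-in: produce this iteration's output names."""
    return ["outputs-%d-%03d" % (i, j) for j in range(3)]

def final_outputs(all_outputs):
    """'Reduce' app stand-in: can only run once every result is available."""
    return sorted(name for outs in all_outputs for name in outs)

with ThreadPoolExecutor(max_workers=4) as pool:
    # All make_outputs tasks run in parallel, like the foreach loop...
    futures = [pool.submit(make_outputs, i) for i in range(5)]
    # ...and collecting their results blocks, like Swift waiting for the
    # outputs[] array to close before final_outputs can fire.
    merged = final_outputs(f.result() for f in futures)

print(len(merged))  # 5 iterations x 3 files each = 15
```

The key difference is that in Swift this synchronization is implicit: declaring that final_outputs consumes the array is enough, with no explicit futures or joins.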
>>
>> _______________________________________________
>> Swift-user mailing list
>> Swift-user at ci.uchicago.edu
>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From yadunand at uchicago.edu  Mon Oct 20 20:03:11 2014
From: yadunand at uchicago.edu (Yadu Nand Babuji)
Date: Mon, 20 Oct 2014 20:03:11 -0500
Subject: [Swift-user] sort on large data
In-Reply-To: 
References: <5442F400.9020805@uchicago.edu>
Message-ID: <5445B0CF.2000502@uchicago.edu>

Hi,

@Jiada, Dongfang,

I've updated the README on the https://github.com/yadudoc/cloud-tutorials
page with documentation on how to use s3fs as a shared filesystem. I've
added configs and links to external documentation. Please try it, and let
me know if any of it is unclear or buggy.

I would also appreciate help from anyone in testing this.

@Gagan,
That was most likely a bug in my scripts, where the user script is
executed ahead of the installation of s3fs on the worker nodes.
Please try again, and if you see the same behavior, please let me know.

Thanks,
Yadu

On 10/19/2014 12:08 AM, Gagan Munisiddha Gowda wrote:
> Hi Yadu,
>
> I am heading in the same direction, trying to use a shared file
> system (S3 bucket / S3FS).
>
> I have set up: /WORKER_INIT_SCRIPT=/path/to/mounts3fs.sh in
> cloud-tutorials/ec2/configs/ /(as mentioned in the tutorials)/
>
> Though I am able to set up the passwd-s3fs file in the desired location
> (using the mounts3fs.sh script), I see that the S3 bucket is not getting
> mounted.
>
> I have verified the passwd-s3fs file and mount point, and everything
> seems to be created as expected. But one observation was that these
> files were owned by 'root', since they were created through setup.sh.
>
> So, I added more commands to change the permissions and made 'ubuntu'
> the owner of all related files.
>
> Even after all these changes, I see that the S3 bucket is still not
> mounted.
>
> *PS: If I connect to the workers and run the s3fs command manually, it
> does mount!*
>
> sudo s3fs -o allow_other,gid=1000,use_cache=/home/ubuntu/cache
> ;
>
> (tried with and without sudo)
>
> Thanks for your help.
>
> On Sun, Oct 19, 2014 at 4:43 AM, Yadu Nand Babuji
> wrote:
> [...]
>
> -- 
> Regards,
> Gagan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From vpabani at hawk.iit.edu  Mon Oct 20 22:20:16 2014
From: vpabani at hawk.iit.edu (Vivek Pabani)
Date: Mon, 20 Oct 2014 22:20:16 -0500
Subject: [Swift-user] swift-conf error | Issue with specification on local/cloud-static
Message-ID: 

Hello,

For a word count program, I need to run one bash script locally - on the
headnode, and one python script on all the worker nodes. Could you please
tell me how to specify which part of the swift program should be run on the
headnode, and which part on the worker nodes?

As of now, I have made the following changes in the swift.conf file:

------------------------------------------------------------------
sites: [local,cloud-static]      // This seems to have a problem, because
I get an error that python_remote cannot be found on Host : local.

site.local {
    filesystem {

    app.bash_local {
        executable: "/bin/bash"
    }
}

site.cloud-static {
    execution {

    app.python_remote {
        executable: "/usr/bin/python"
    }
}
------------------------------------------------------------------

My swift program uses only two executables: bash_local and python_remote.
When I run this, I get the error:
Caused by: Cannot find executable python_remote on site system path.

Please let me know what the correct config changes should be.
Thanks,

Vivek

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From yadudoc1729 at gmail.com  Mon Oct 20 23:02:57 2014
From: yadudoc1729 at gmail.com (Yadu Nand)
Date: Mon, 20 Oct 2014 23:02:57 -0500
Subject: [Swift-user] swift-conf error | Issue with specification on local/cloud-static
In-Reply-To: 
References: 
Message-ID: 

Hi Vivek,

Could you send us a tarball of the runNNN folder from a failed run please ?

-Yadu

On Mon, Oct 20, 2014 at 10:20 PM, Vivek Pabani wrote:
> [...]

-- 
Yadu Nand B
-------------- next part --------------
An HTML attachment was scrubbed...
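[Editor's note] For anyone hitting Vivek's error: the working config quoted earlier in this thread declares apps directly under each site, not nested inside `filesystem { }` or `execution { }` blocks, and Vivek's snippet also never closes those blocks. A plausible sketch of the fix, modeled on the thread's own example (the exact keys and the elided `....` sections depend on your Swift release, so treat this as a guess rather than a verified config):

```
sites: [local, cloud-static]

site.local {
    ....
    app.bash_local {
        executable: "/bin/bash"
    }
}

site.cloud-static {
    ....
    app.python_remote {
        executable: "/usr/bin/python"
    }
}
```

With each app declared only under the site that can run it, Swift should no longer look for python_remote on the local host.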
URL: From ggowda at hawk.iit.edu Mon Oct 20 23:28:33 2014 From: ggowda at hawk.iit.edu (Gagan Munisiddha Gowda) Date: Tue, 21 Oct 2014 09:58:33 +0530 Subject: [Swift-user] sort on large data In-Reply-To: <5445B0CF.2000502@uchicago.edu> References: <5442F400.9020805@uchicago.edu> <5445B0CF.2000502@uchicago.edu> Message-ID: Great Yadu ! Thanks for your help ! Regards, Gagan On 21/10/2014 6:33 am, "Yadu Nand Babuji" wrote: > Hi, > > @Jiada, Dongfang, > > I've updated the README on the https://github.com/yadudoc/cloud-tutorials > page with documentation on how to use > s3fs as a shared filesystem. I've added configs and links to external > documentation. Please try it, and let me know > if any of it is unclear or buggy. > > I would also appreciate help from anyone in testing this. > > @Gagan, > That was most likely a bug in my scripts, where the user script is > executed ahead of the installation of s3fs on the worker nodes. > Please try again, and if you see the same behavior, please let me know. > > Thanks, > Yadu > > On 10/19/2014 12:08 AM, Gagan Munisiddha Gowda wrote: > > Hi Yadu, > > I am in the same direction where I am trying to use a shared file system > (S3 bucket / S3FS). > > I have setup : *WORKER_INIT_SCRIPT=/path/to/mounts3fs.sh in > cloud-tutorials/ec2/configs** (as mentioned in the tutorials)* > > Though i am able to setup the passwd-s3fs file in the desired location > (using mounts3fs.sh script), i see that the S3 bucket is not getting > mounted. > > I have verified the passwd-s3fs file and mount point and all seems to be > created as expected. But, one observation was the owner of these files were > 'root' user as it was getting created through the setup.sh. > > So, i added more commands to change the permissions and made 'ubuntu' as > the owner for all related files. > > Even after all these changes, i see that the S3 bucket is still not > mounted. 
> > *PS: If i connect to the workers and run the s3fs command manually, it > does mount !* > > sudo s3fs -o allow_other,gid=1000,use_cache=/home/ubuntu/cache > ; > > (tried with and without sudo) > > Thanks for your help. > > > On Sun, Oct 19, 2014 at 4:43 AM, Yadu Nand Babuji > wrote: > >> Hi Jiada Tu, >> >> 1) Here's an example for returning an array of files : >> >> type file; >> app (file outs[]) make_outputs (file script) >> { >> bash @script; >> } >> >> file outputs[] ; >> file script <"make_outputs.sh">; # This script creates a few files >> with outputs as prefix >> (outputs) = make_outputs(script); >> >> 2) The products of a successful task execution, must be visible to the >> headnode (where swift runs) either through a >> - shared filesystem (NFS, S3 mounted over s3fs etc) or >> - must be brought back over the network. >> But, we can reduce the overhead in moving the results to the headnode and >> then to the workers for the reduce stage. >> >> I understand that this is part of your assignment, so I will try to >> answer without getting too specific, at the same time, >> concepts from hadoop do not necessarily work directly in this context. So >> here are some things to consider to get >> the best performance possible: >> >> - Assuming that the texts contain 10K unique words, your sort program >> will generate a file containing atmost 10K lines >> (which would be definitely under an MB). Is there any advantage into >> splitting this into smaller files ? >> >> - Since the final merge involves tiny files, you could very well do the >> reduce stage on the headnode and be quite efficient >> (you can define the reduce app only for site:local) >> >> sites : [local, cloud-static] >> site.local { >> .... >> app.reduce { >> executable : ${env.PWD}/reduce.py >> } >> } >> >> site.cloud-static { >> .... 
>> app.python { >> executable : /usr/bin/python >> } >> >> } >> >> This assumes that you are going to define your sorting app like this : >> >> app (file freqs) sort (file sorting_script, file input ) { >> python @sorting_script @input; >> } >> >> >> - The real cost is in having the original text reach the workers, this >> can be made faster by : >> - A better headnode with better network/disk IO (I've measured >> 140Mbit/s between m1.medium nodes, c3.8xlarge comes with 975Mbits/s) >> - Use S3 with S3fs and have swift-workers pull data from S3 which is >> pretty scalable, and remove the IO load from the headnode. >> >> - Identify the optimal size for data chunks for your specific problem. >> Each chunk of data in this case comes with the overhead of starting >> a new remote task, sending the data and bringing results back. Note >> that the result of a wordcount on a file whether it is 1Mb or 10Gb >> is still the atmost 1Mb (with earlier assumptions) >> >> - Ensure that the data with the same datacenter, for cost as well as >> performance. By limiting the cluster to US-Oregon we already do this. >> >> If you would like to attempt this using S3FS, let me know, I'll be happy >> to explain that in detail. >> >> Thanks, >> Yadu >> >> >> >> On 10/18/2014 04:18 PM, Jiada Tu wrote: >> >> I am doing an assignment with swift to sort large data. The data >> contains one record (string) each line. We need to sort the records base on >> ascii code. The data is too large to fit in the memory. >> >> The large data file is in head node, and I run the swift script >> directly on head node. >> >> Here's what I plan to do: >> >> 1) split the big file into 64MB files >> 2) let each worker task sort one 64MB files. Say, each task will call a >> "sort.py" (written by me). sort.py will output a list of files, >> say:"sorted-worker1-001; sorted-worker1-002; ......". The first file >> contains the records started with 'a', the second started with 'b', etc. 
>> 3) now we will have all records starting with 'a' in >> (sorted-worker1-001;sorted-worker2-001;...); 'b' in >> (sorted-worker1-002;sorted-worker2-002; ......); ...... Then I send all >> the files containing 'a' records to a "reduce" worker task and let it merge >> these files into one single file. Same for 'b', 'c', etc. >> 4) now we get 26 files (a-z), each sorted internally. >> >> Basically what I am doing is simulating Map-reduce: step 2 is map and >> step 3 is reduce. >> >> Here come some problems: >> 1) for step 2, sort.py needs to output a list of files. How can a swift app >> function handle a list of outputs? >> >> app (file[] outfiles) sort (file[] infiles) { >> sort.py // how to put out files here? >> } >> >> 2) As far as I know (I may be wrong), swift will stage all the output files back >> to the local disk (here the head node, since I run the swift script >> directly on the head node). So the output files in step 2 will be staged back to >> the head node first, then staged from the head node to the worker nodes to do >> step 3, then the 26 files in step 4 are staged back to the head node. I don't want >> that because the network will be a huge bottleneck. Is there any way to tell >> the "reduce" worker to get data directly from the "map" worker? Maybe a shared >> file system will help, but is there any way that the user can control the data >> staging between workers without using a shared file system? >> >> Since I am new to swift, I may be totally wrong and >> misunderstanding what swift does. If so, please correct me.
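[Archive note] Jiada's plan above (split, per-chunk sort into per-letter buckets, then a per-letter merge) can be sanity-checked outside Swift with a short Python sketch. The function names and the tiny in-memory "chunks" below are purely illustrative stand-ins for the 64MB files:

```python
import heapq

def sort_chunk(lines):
    """Map step: sort one chunk and bucket the records by first letter."""
    buckets = {}
    for line in sorted(lines):
        buckets.setdefault(line[0], []).append(line)
    return buckets  # letter -> sorted records from this chunk

def merge_buckets(sorted_lists):
    """Reduce step: k-way merge of already-sorted per-chunk buckets."""
    return list(heapq.merge(*sorted_lists))

# Two "64MB" chunks, shrunk to a few records for illustration
chunk1 = ["banana", "apple", "cherry"]
chunk2 = ["avocado", "blueberry", "apricot"]

b1 = sort_chunk(chunk1)
b2 = sort_chunk(chunk2)

# Merge the 'a' buckets produced by every map task into one sorted file
merged_a = merge_buckets([b1.get("a", []), b2.get("a", [])])
print(merged_a)  # ['apple', 'apricot', 'avocado']
```

Because each bucket is already sorted, the reduce step is a cheap streaming merge, which is why (as Yadu notes above) it can comfortably run on the headnode.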
>> >> >> >> >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user >> >> >> >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user >> > > > > -- > Regards, > Gagan > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtu3 at hawk.iit.edu Wed Oct 22 02:11:15 2014 From: jtu3 at hawk.iit.edu (Jiada Tu) Date: Wed, 22 Oct 2014 02:11:15 -0500 Subject: [Swift-user] sort on large data In-Reply-To: <5445B0CF.2000502@uchicago.edu> References: <5442F400.9020805@uchicago.edu> <5445B0CF.2000502@uchicago.edu> Message-ID: Hi Yadu, I have tested the newly posted tutorial on s3fs and it works. I can run my wordCount on s3fs now. But there's a little problem. I put all scripts and input files in /s3/wordCount-s3fs, and set the work directory to /s3/wordCount-s3fs. Then I found that I can't use relative paths for files in my swift script; I have to use absolute paths for all files. *If I use absolute paths for all input and output files, everything works fine. If I do: file infile[] ; It throws an exception: ---------------------------------------- Execution failed: Exception in python: Arguments: [/s3/wordCount-s3fs/./wordCount.py, input/split-0006] Host: cloud-static Directory: wordCount-run001/jobs/g/python-g9k2d6zl exception @ swift-int-staging.k, line: 167 Caused by: Application /usr/bin/python failed with an exit code of 1 ------- Application STDERR -------- wordcount error: file name "./input/split-0006" not exist.
Traceback (most recent call last): File "/s3/wordCount-s3fs/./wordCount.py", line 12, in f=open(fileName, 'r') IOError: [Errno 2] No such file or directory: './input/split-0006' ----------------------------------- exception @ swift-int-staging.k, line: 163 Caused by: Block task failed: Connection to worker lost java.io.EOFException at org.globus.cog.coaster.channels.AbstractStreamCoasterChannel.readFromChannel(AbstractStreamCoasterChannel.java:253) at org.globus.cog.coaster.channels.AbstractStreamCoasterChannel.stepNIO(AbstractStreamCoasterChannel.java:186) at org.globus.cog.coaster.channels.NIOMultiplexer.loop(NIOMultiplexer.java:116) at org.globus.cog.coaster.channels.NIOMultiplexer.run(NIOMultiplexer.java:75) k:assign @ swift.k, line: 171 Caused by: Exception in python: Arguments: [/s3/wordCount-s3fs/./wordCount.py, input/split-0006] Host: cloud-static Directory: wordCount-run001/jobs/g/python-g9k2d6zl exception @ swift-int-staging.k, line: 167 Caused by: Application /usr/bin/python failed with an exit code of 1 ------- Application STDERR -------- wordcount error: file name "./input/split-0006" not exist. Traceback (most recent call last): File "/s3/wordCount-s3fs/./wordCount.py", line 12, in f=open(fileName, 'r') IOError: [Errno 2] No such file or directory: './input/split-0006' ----------------------------------- exception @ swift-int-staging.k, line: 163 Caused by: Block task failed: Connection to worker lost java.io.EOFException at org.globus.cog.coaster.channels.AbstractStreamCoasterChannel.readFromChannel(AbstractStreamCoasterChannel.java:253) at org.globus.cog.coaster.channels.AbstractStreamCoasterChannel.stepNIO(AbstractStreamCoasterChannel.java:186) at org.globus.cog.coaster.channels.NIOMultiplexer.loop(NIOMultiplexer.java:116) at org.globus.cog.coaster.channels.NIOMultiplexer.run(NIOMultiplexer.java:75) ------------------------------------------------ Which basically says ./input/split-000 not exist. 
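[Archive note] The IOError above is what one would expect if relative paths are resolved against the per-task job directory (wordCount-run001/jobs/g/python-g9k2d6zl) rather than against the /s3/wordCount-s3fs work directory. A small Python sketch, with hypothetical temp directories standing in for the s3fs mount and the job directory, reproduces the effect and shows why resolving inputs to absolute paths early avoids it:

```python
import os
import tempfile

# Hypothetical stand-ins: 'mount' plays the role of /s3/wordCount-s3fs,
# 'jobdir' the per-task directory the worker runs the app from.
mount = tempfile.mkdtemp()
os.makedirs(os.path.join(mount, "input"))
with open(os.path.join(mount, "input", "split-0006"), "w") as f:
    f.write("data\n")

rel = os.path.join("input", "split-0006")
abs_path = os.path.join(mount, rel)  # resolved while the mount root is known

jobdir = tempfile.mkdtemp()
os.chdir(jobdir)  # the task process runs from its own job directory

print(os.path.exists(rel))       # False: the relative name resolves under jobdir
print(os.path.exists(abs_path))  # True: the absolute name is unambiguous
```

This matches the observation in the message: absolute paths work because they do not depend on the process's current working directory.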
I'm sure /s3/wordCount-s3fs/input/split-000 does exist. And I don't want to enter absolute paths for all files if possible. Any idea about how to deal with it? Thanks, Jiada Tu -------------- next part -------------- An HTML attachment was scrubbed... URL: From aanthon2 at hawk.iit.edu Thu Oct 23 20:59:24 2014 From: aanthon2 at hawk.iit.edu (Ajay Anthony) Date: Thu, 23 Oct 2014 21:59:24 -0400 Subject: [Swift-user] CS553: Stuck with Swift "bash application not available" issue. Please suggest Message-ID: Hi Swift Users, Please help me with the issue below. I have been stuck on it for two days. I am trying to run the split.sh script through the word.swift file, which will split a file into 16 chunks. split.sh: #! /bin/bash split -a 2 -d -n 16 small-dataset /home/ajay/CS553/Ass_2/Swift/cloud-tutorials/swift-cloud-tutorial/cloud-cat/input/small-dataset word.swift: type file; app (file out) split (file script) { bash @script stdout=@filename(out); } file script_file <"*split.sh*">; file output; output = split(script_file); I am facing the issue below: RunID: 20141023-2145-ur5u9gpa Progress: time: Thu, 23 Oct 2014 21:45:55 -0400 Execution failed: The application "bash" is not available for any site/pool in your tc.data catalog split, word.swift, line 10 The swift version I am using is: ajay at ubuntu:~/CS553/Ass_2/Swift/cloud-tutorials/swift-cloud-tutorial/cloud-cat$ swift -version Swift 0.94.1 swift-r7114 cog-r3803 My swift.conf file has the configuration below: sites: local site.local { filesystem { type: "local" URL: "localhost" } execution { type: "local" URL: "localhost" } workDirectory: /tmp/${env.USER} maxParallelTasks: 32 initialParallelTasks: 31 app.ALL {executable: "*"} } Thanks in advance. Regards, Ajay Anthony -------------- next part -------------- An HTML attachment was scrubbed...
URL: From hategan at mcs.anl.gov Thu Oct 23 21:46:25 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 23 Oct 2014 19:46:25 -0700 Subject: [Swift-user] CS553: Stuck with Swift "bash application not available" issue . Please suggest In-Reply-To: References: Message-ID: <1414118785.31177.2.camel@echo> Hi, You have some version mismatch there. > swift -version > Swift 0.94.1 swift-r7114 cog-r3803 However, the swift configuration format that you are using (swift.conf), is only available in trunk, and that is two versions above 0.94. So the question is how you got that swift version, and why that particular version. Mihael On Thu, 2014-10-23 at 21:59 -0400, Ajay Anthony wrote: > Hi Swift Users, > > Please help me with below issue. I am stuck on this since 2 days. > > I am trying to run split.sh script through word.swift file, which will > split a file in 16 chunks. > > split.sh: > #! /bin/bash > > split -a 2 -d -n 16 small-dataset > /home/ajay/CS553/Ass_2/Swift/cloud-tutorials/swift-cloud-tutorial/cloud-cat/input/small-dataset > > > > word.swift: > type file; > > app (file out) split (file script) > { > bash @script stdout=@filename(out); > } > > file script_file <"*split.sh*">; > file output; > output = split(script_file); > > > I am facing the below issue: > > RunID: 20141023-2145-ur5u9gpa > Progress: time: Thu, 23 Oct 2014 21:45:55 -0400 > Execution failed: > The application "bash" is not available for any site/pool in your > tc.data catalog > split, word.swift, line 10 > > > The script version I am using is : > ajay at ubuntu:~/CS553/Ass_2/Swift/cloud-tutorials/swift-cloud-tutorial/cloud-cat$ > swift -version > Swift 0.94.1 swift-r7114 cog-r3803 > > > My swift.conf file has below configurations: > > sites: local > > site.local { > filesystem { > type: "local" > URL: "localhost" > } > execution { > type: "local" > URL: "localhost" > } > workDirectory: /tmp/${env.USER} > maxParallelTasks: 32 > initialParallelTasks: 31 > app.ALL {executable: "*"} > } > > 
Thanks in Advance. > > Regards, > Ajay Anthony, > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user From yadudoc1729 at gmail.com Thu Oct 23 21:50:46 2014 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Thu, 23 Oct 2014 21:50:46 -0500 Subject: [Swift-user] CS553: Stuck with Swift "bash application not available" issue . Please suggest In-Reply-To: <1414118785.31177.2.camel@echo> References: <1414118785.31177.2.camel@echo> Message-ID: Hi Ajay, As Mihael pointed out, you are using the wrong swift version for the tutorials you are running. Please try the following steps on the headnode instance: cd /home/ubuntu/cloud-tutorials/swift-cloud-tutorial/ source setup.sh swift -version The swift version should output something like the following : Swift trunk git-rev: 2d334140f2c288e5aeb3d354de0ecda35b4b3aac heads/master If that is what you see, please retry the tutorials. -Yadu On Thu, Oct 23, 2014 at 9:46 PM, Mihael Hategan wrote: > Hi, > > You have some version mismatch there. > > > swift -version > > Swift 0.94.1 swift-r7114 cog-r3803 > > However, the swift configuration format that you are using (swift.conf), > is only available in trunk, and that is two versions above 0.94. > > So the question is how you got that swift version, and why that > particular version. > > Mihael > > On Thu, 2014-10-23 at 21:59 -0400, Ajay Anthony wrote: > > Hi Swift Users, > > > > Please help me with below issue. I am stuck on this since 2 days. > > > > I am trying to run split.sh script through word.swift file, which will > > split a file in 16 chunks. > > > > split.sh: > > #! 
/bin/bash > > > > split -a 2 -d -n 16 small-dataset > > > /home/ajay/CS553/Ass_2/Swift/cloud-tutorials/swift-cloud-tutorial/cloud-cat/input/small-dataset > > > > > > > > word.swift: > > type file; > > > > app (file out) split (file script) > > { > > bash @script stdout=@filename(out); > > } > > > > file script_file <"*split.sh*">; > > file output; > > output = split(script_file); > > > > > > I am facing the below issue: > > > > RunID: 20141023-2145-ur5u9gpa > > Progress: time: Thu, 23 Oct 2014 21:45:55 -0400 > > Execution failed: > > The application "bash" is not available for any site/pool in your > > tc.data catalog > > split, word.swift, line 10 > > > > > > The script version I am using is : > > ajay at ubuntu > :~/CS553/Ass_2/Swift/cloud-tutorials/swift-cloud-tutorial/cloud-cat$ > > swift -version > > Swift 0.94.1 swift-r7114 cog-r3803 > > > > > > My swift.conf file has below configurations: > > > > sites: local > > > > site.local { > > filesystem { > > type: "local" > > URL: "localhost" > > } > > execution { > > type: "local" > > URL: "localhost" > > } > > workDirectory: /tmp/${env.USER} > > maxParallelTasks: 32 > > initialParallelTasks: 31 > > app.ALL {executable: "*"} > > } > > > > Thanks in Advance. > > > > Regards, > > Ajay Anthony, > > _______________________________________________ > > Swift-user mailing list > > Swift-user at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > -- Yadu Nand B -------------- next part -------------- An HTML attachment was scrubbed... 
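[Archive note] The two version banners quoted in this thread make the mismatch easy to detect programmatically. A hedged sketch follows; the "trunk" substring check is a heuristic based on the banners seen here, not an official Swift interface:

```python
def supports_swift_conf(version_banner: str) -> bool:
    """Heuristic: trunk builds report 'trunk' in the version banner;
    numbered releases like 0.94.x predate the swift.conf format."""
    return "trunk" in version_banner.lower()

# Both strings are taken verbatim from this thread
old = "Swift 0.94.1 swift-r7114 cog-r3803"
new = "Swift trunk git-rev: 2d334140f2c288e5aeb3d354de0ecda35b4b3aac heads/master"

print(supports_swift_conf(old))  # False: expects tc.data, not swift.conf
print(supports_swift_conf(new))  # True
```

A wrapper script for the tutorials could run `swift -version` and refuse to proceed when this check fails, instead of surfacing the confusing "tc.data catalog" error.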
URL: From iraicu at cs.iit.edu Fri Oct 24 18:55:32 2014 From: iraicu at cs.iit.edu (Ioan Raicu) Date: Fri, 24 Oct 2014 18:55:32 -0500 Subject: [Swift-user] CFP: IEEE Cluster 2015 -- Chicago IL September 8-11 2015 Message-ID: <544AE6F4.7080506@cs.iit.edu> IEEE International Conference on Cluster Computing September 8-11, 2015 Chicago, IL, USA https://press3.mcs.anl.gov/ieeecluster2015/ ---------------------------------------------- ...Follow us on Facebook at https://www.facebook.com/ieee.cluster ...Follow us on Twitter at https://twitter.com/IEEECluster ...Follow us on Linkedin at https://www.linkedin.com/groups/IEEE-International-Conference-on-Cluster-7428925 ...Follow us on RenRen at http://page.renren.com/601871401 ---------------------------------------------- CALL FOR PAPERS Following the successes of the series of Cluster conferences, for 2015 we solicit high-quality original papers presenting work that advances the state-of-the-art in clusters and closely related fields. All papers will be rigorously peer-reviewed for their originality, technical depth and correctness, potential impact, relevance to the conference, and quality of presentation. Research papers must clearly demonstrate research contributions and novelty, while papers reporting experience must clearly describe lessons learned and impact, along with the utility of the approach compared to past approaches. PAPER TRACKS * Applications, Algorithms, and Libraries * Architecture, Networks/Communication, and Management * Programming and Systems Software * Data, Storage, and Visualization PROCEEDINGS: Proceedings of the conference and workshops will be available online when the conference starts and will be submitted to IEEE Xplore and for EI indexing.
SPECIAL JOURNAL ISSUE: The best papers of Cluster 2015 will be included in a Special Issue on advances in topics related to cluster computing of the Elsevier International Journal of Parallel Computing (PARCO), edited by Pavan Balaji, Satoshi Matsuoka, and Michela Taufer. This special issue is dedicated for the papers accepted in the Cluster 2015 conference. The submission to this special issue is by invitation only. IMPORTANT DATES September 27, 2014 .... Submissions open for Workshops January 1, 2015 ........... Submissions open for Papers, Posters, and Tutorials February 27, 2015 ....... Papers Submission Deadline April 23, 2015 ............... Papers Acceptance Notification May 1, 2015 ................. Posters Submission Deadline May 1, 2015 ................. Submissions open for Student Mentoring Program June 1, 2015 ................ Student Mentoring Program Notification (Round 1) June 15, 2015 .............. Posters Acceptance Notification June 15, 2015 .............. Student Mentoring Program Notification (Round 2) June 29, 2015 .............. Student Mentoring Program Notification (Round 3) July 13, 2015 ............... Student Mentoring Program Notification (Round 4) July 13, 2015 ............... Student Mentoring Program NSF Grant Notification August 1, 2015 ............ Camera-ready Copy Deadline for Papers, Posters, and Workshops Workshop/Tutorial proposals are selected and notifications are sent on a first-come basis. SUBMISSION GUIDELINES Authors are invited to submit papers electronically in PDF format. Submitted manuscripts should be structured as technical papers and may not exceed 10 letter-size (8.5 x 11) pages including figures, tables and references using the IEEE format for conference proceedings. Submissions not conforming to these guidelines may be returned without review. Authors should make sure that their file will print on a printer that uses letter-size (8.5 x 11) paper. The official language of the conference is English. 
All manuscripts will be reviewed and will be judged on correctness, originality, technical strength, significance, quality of presentation, and interest and relevance to the conference attendees. Paper submissions are limited to 10 pages in 2-column IEEE format including all figures and references. Submitted manuscripts exceeding this limit will be returned without review. For the final camera-ready version, authors with accepted papers may purchase additional pages at the following rate: 200 USD for each of up to two additional pages. See formatting templates for details: * LaTeX Package http://datasys.cs.iit.edu/events/CCGrid2014/IEEECS_confs_LaTeX.zip * Word Template http://datasys.cs.iit.edu/events/CCGrid2014/instruct8.5x11x2.doc Submitted papers must represent original unpublished research that is not currently under review for any other conference or journal. Papers not following these guidelines will be rejected without review and further action may be taken, including (but not limited to) notifications sent to the heads of the institutions of the authors and sponsors of the conference. Submissions received after the due date, exceeding the page limit, or not appropriately structured may not be considered. Authors may contact the conference chairs for more information. The proceedings will be published through the IEEE Computer Society Conference Publishing Services. ORGANIZATION: - General Co-chairs: Pavan Balaji (Argonne National Laboratory, USA), Michela Taufer (University of Delaware, USA) - Program Chair: Satoshi Matsuoka (Tokyo Institute of Technology, Japan) - Posters Chair: Seetharami Seelam (IBM, USA) - Proceedings Chair: Antonino Tumeo (Pacific Northwest National Laboratory, USA) - Workshops and Tutorials Chair: Bronis de Supinski (Lawrence Livermore National Laboratory, USA) - Panels Chair: Alice Koniges (Lawrence Berkeley National Laboratory, USA) - Mentoring Program Chair: Luc Bougé
(École Normale Supérieure de Rennes, France) - Track Chairs: * Applications, Algorithms, and Libraries - Richard Vuduc (Georgia Tech, USA) * Architecture, Networks/Communication, and Management - Todd Gamblin (Lawrence Livermore National Laboratory, USA) * Programming and System Software - Naoya Maruyama (Riken AICS, Japan) * Data, Storage, and Visualization - Gabriel Antoniu (INRIA, France) MORE INFORMATION: For more information, contact Pavan Balaji (balaji at anl.gov) or Michela Taufer (taufer at udel.edu). -- ================================================================= Ioan Raicu, Ph.D. Assistant Professor, Illinois Institute of Technology (IIT) Guest Research Faculty, Argonne National Laboratory (ANL) ================================================================= Data-Intensive Distributed Systems Laboratory, CS/IIT Distributed Systems Laboratory, MCS/ANL ================================================================= Editor: IEEE TCC, Springer Cluster, Springer JoCCASA Chair: IEEE/ACM MTAGS, ACM ScienceCloud ================================================================= Cell: 1-847-722-0876 Office: 1-312-567-5704 Email: iraicu at cs.iit.edu Web: http://www.cs.iit.edu/~iraicu/ Web: http://datasys.cs.iit.edu/ LinkedIn: http://www.linkedin.com/in/ioanraicu Google: http://scholar.google.com/citations?user=jE73HYAAAAAJ ================================================================= ================================================================= From ggowda at hawk.iit.edu Sat Oct 25 06:53:44 2014 From: ggowda at hawk.iit.edu (Gagan Munisiddha Gowda) Date: Sat, 25 Oct 2014 17:23:44 +0530 Subject: [Swift-user] Walltime Exceeded Message-ID: Hi, I am running a swift program on 10GB of data (split into 100MB chunks). I am using a Node.js script for processing. I did increase the walltime by adding the value below to the persistent-coaster and local-coaster files in /usr/local/bin/swift-trunk/etc/sites. 00:05:00 But, of no use.
The entire program runs for around 14 minutes when it throws this exception. Here's my error : 2014-10-25 10:29:04,113+0000 INFO Execute TASK_STATUS_CHANGE taskid=urn:R-3-1414232059111 status=5 Walltime exceeded 2014-10-25 10:29:04,121+0000 INFO LateBindingScheduler jobs queued: 0 2014-10-25 10:29:04,122+0000 DEBUG swift APPLICATION_EXCEPTION jobid=node-vvboibzl - Application exception: Walltime exceeded 2014-10-25 10:29:04,125+0000 INFO swift END_FAILURE thread=R-3 tr=node 2014-10-25 10:29:04,127+0000 INFO Loader Swift finished with errors Please find complete log file attached. -- Regards, Gagan -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: swift.log Type: text/x-log Size: 987717 bytes Desc: not available URL: From yadunand at uchicago.edu Sat Oct 25 12:45:13 2014 From: yadunand at uchicago.edu (Yadu Nand Babuji) Date: Sat, 25 Oct 2014 12:45:13 -0500 Subject: [Swift-user] Walltime Exceeded In-Reply-To: References: Message-ID: <544BE1A9.9000203@uchicago.edu> Hi Gagan, You are using swift-trunk on the cloud nodes, so you should be using the (swift.conf) config files for trunk. The configs are specified using swift.conf files, and you should be able to get a few examples from the cloud-tutorials/swift-cloud-tutorial folders. If you have the swift.conf file in your current directory you need not specify the conf file on the swift commandline, say "swift p5.swift" while in the cloud-tutorials/swift-cloud-tutorials/part05 directory. 
Here's a sample config with the walltime for ALL apps set to 15 minutes : sites: local site.local { filesystem { type: "local" URL: "localhost" } execution { type: "local" URL: "localhost" } workDirectory: /tmp/${env.USER}/swiftwork maxParallelTasks: 32 initialParallelTasks: 31 app.ALL { executable: "*" maxWallTime: "00:15:00" } } The swift-trunk userguide has more detailed documentation on the various configuration options that the swift.conf file takes: http://swift-lang.org/guides/trunk/userguide/userguide.html Thanks, Yadu On 10/25/2014 06:53 AM, Gagan Munisiddha Gowda wrote: > Hi, > > I am running a swift program on 10gb data (split into 100mb each). > > I am using a Node.js script for processing. > > I did increase the Walltime by adding below to persistent-coaster and > local-coaster files in /usr/local/bin/swift-trunk/etc/sites. > > 00:05:00 > > But, of no use. The entire program runs for around 14 minutes when it > throws this exception. > > > Here's my error : > > 2014-10-25 10:29:04,113+0000 INFO Execute TASK_STATUS_CHANGE > taskid=urn:R-3-1414232059111 status=5 Walltime exceeded > 2014-10-25 10:29:04,121+0000 INFO LateBindingScheduler jobs queued: 0 > 2014-10-25 10:29:04,122+0000 DEBUG swift APPLICATION_EXCEPTION > jobid=node-vvboibzl - Application exception: Walltime exceeded > > 2014-10-25 10:29:04,125+0000 INFO swift END_FAILURE thread=R-3 tr=node > 2014-10-25 10:29:04,127+0000 INFO Loader Swift finished with errors > > Please find complete log file attached. > > -- > Regards, > Gagan > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user -------------- next part -------------- An HTML attachment was scrubbed... 
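[Archive note] The maxWallTime value in the sample config above is an hh:mm:ss string. As an illustration only (this helper is not part of Swift), converting it to seconds makes it easy to check whether an observed run time fits a candidate limit before editing the config:

```python
def walltime_seconds(hms: str) -> int:
    """Convert an hh:mm:ss walltime string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

# The job in this thread ran for around 14 minutes, so a 5-minute
# limit kills it while a 15-minute limit does not.
run_seconds = 14 * 60
print(run_seconds <= walltime_seconds("00:05:00"))  # False
print(run_seconds <= walltime_seconds("00:15:00"))  # True
```

This also shows why the "00:05:00" value Gagan set could never work for a 14-minute run, independent of which config file it landed in.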
URL: From wilde at anl.gov Sat Oct 25 14:24:34 2014 From: wilde at anl.gov (Michael Wilde) Date: Sat, 25 Oct 2014 14:24:34 -0500 Subject: [Swift-user] Walltime Exceeded In-Reply-To: <544BE1A9.9000203@uchicago.edu> References: <544BE1A9.9000203@uchicago.edu> Message-ID: <544BF8F2.2000209@anl.gov> In addition to Yadu's good pointers, I would add: if your app( ) call was terminated by Swift for running for 15 minutes, and you were trying to set that time limit down to 5 minutes, then you also need to investigate why your app( ) call was running for so long. Is it perhaps in an infinite loop, or running much less efficiently than you expected? Based on how you are managing output files, you should look for a way to see if your app( ) run is executing productively and as expected. - Mike On 10/25/14 12:45 PM, Yadu Nand Babuji wrote: > Hi Gagan, > > You are using swift-trunk on the cloud nodes, so you should be using > the (swift.conf) config files for trunk. > The configs are specified using swift.conf files, and you should be > able to get a few examples from the > cloud-tutorials/swift-cloud-tutorial folders. > If you have the swift.conf file in your current directory you need not > specify the conf file on the swift commandline, say "swift p5.swift" > while in the > cloud-tutorials/swift-cloud-tutorials/part05 directory. 
> > Here's a sample config with the walltime for ALL apps set to 15 minutes : > > sites: local > > site.local { > filesystem { > type: "local" > URL: "localhost" > } > execution { > type: "local" > URL: "localhost" > } > workDirectory: /tmp/${env.USER}/swiftwork > maxParallelTasks: 32 > initialParallelTasks: 31 > app.ALL { > executable: "*" > maxWallTime: "00:15:00" > } > } > > The swift-trunk userguide has more detailed documentation on the > various configuration options that the swift.conf file takes: > http://swift-lang.org/guides/trunk/userguide/userguide.html > > Thanks, > Yadu > > > > On 10/25/2014 06:53 AM, Gagan Munisiddha Gowda wrote: >> Hi, >> >> I am running a swift program on 10gb data (split into 100mb each). >> >> I am using a Node.js script for processing. >> >> I did increase the Walltime by adding below to persistent-coaster and >> local-coaster files in /usr/local/bin/swift-trunk/etc/sites. >> >> 00:05:00 >> >> But, of no use. The entire program runs for around 14 minutes when it >> throws this exception. >> >> >> Here's my error : >> >> 2014-10-25 10:29:04,113+0000 INFO Execute TASK_STATUS_CHANGE >> taskid=urn:R-3-1414232059111 status=5 Walltime exceeded >> 2014-10-25 10:29:04,121+0000 INFO LateBindingScheduler jobs queued: 0 >> 2014-10-25 10:29:04,122+0000 DEBUG swift APPLICATION_EXCEPTION >> jobid=node-vvboibzl - Application exception: Walltime exceeded >> >> 2014-10-25 10:29:04,125+0000 INFO swift END_FAILURE thread=R-3 tr=node >> 2014-10-25 10:29:04,127+0000 INFO Loader Swift finished with errors >> >> Please find complete log file attached. 
>> >> -- >> Regards, >> Gagan >> >> >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user -- Michael Wilde Mathematics and Computer Science Computation Institute Argonne National Laboratory The University of Chicago -------------- next part -------------- An HTML attachment was scrubbed... URL: From aanthon2 at hawk.iit.edu Sat Oct 25 20:35:03 2014 From: aanthon2 at hawk.iit.edu (Ajay Anthony) Date: Sat, 25 Oct 2014 21:35:03 -0400 Subject: [Swift-user] Swift: ConnectException: Connection refused Issue Message-ID: Hi Swift users, Kindly help me with the issue below. I have been stuck on it almost the whole day. I am trying to run swift word.swift on the headnode in AWS and have 1 worker running. 2014-10-26 01:17:52,685+0000 INFO AbstractCoasterChannel TCPChannel[client, http://127.0.0.1:50010] setting name to http://127.0.0.1:50010 2014-10-26 01:17:52,709+0000 INFO Execute TASK_STATUS_CHANGE taskid=urn:R-0-1414286270539 status=1 2014-10-26 01:17:52,712+0000 INFO LateBindingScheduler jobs queued: 0 2014-10-26 01:17:52,712+0000 DEBUG swift APPLICATION_EXCEPTION jobid=bash-lupsjczl - Application exception: null *Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job Caused by: org.globus.cog.coaster.channels.ChannelException: Failed to create socket Caused by: java.net.ConnectException: Connection refused* Below are my configurations: swift.conf: sites: cloud-static site.cloud-static { execution { type:"coaster-persistent" URL: "http://127.0.0.1:50010" jobManager: "local:local" options { maxJobs: 10 tasksPerNode: 2 } } initialParallelTasks: 20 maxParallelTasks: 20 filesystem.type: local workDirectory: /tmp/swiftwork staging: local app.ALL
{executable: "*"} } Below is the Swift version I am using: ubuntu at ip-172-31-17-186:~/cloud-tutorials/swift-cloud-tutorial/word_count$ swift -version Swift trunk git-rev: 2d334140f2c288e5aeb3d354de0ecda35b4b3aac heads/master 6130 (modified locally) I am able to run with "sites: local" successfully , but failing with "sites: cloud-static". Please suggest on what can be the issue. Thanks and Regards, AJay ANthony. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ggowda at hawk.iit.edu Sat Oct 25 20:42:50 2014 From: ggowda at hawk.iit.edu (Gagan Munisiddha Gowda) Date: Sun, 26 Oct 2014 07:12:50 +0530 Subject: [Swift-user] Swift: ConnectException: Connection refused Issue In-Reply-To: References: Message-ID: Hi Ajay, Could you try restarting your entire cluster. (terminate and start ) This happens occasionally for some unknown reason. Regards, Gagan On 26/10/2014 7:05 am, "Ajay Anthony" wrote: > Hi Swift users, > > Kindly help me in below issue. Stuck on this issue almost whole day. > > I am trying to run swift word.swift on headnode in aws and have 1 > worker running. 
> > 2014-10-26 01:17:52,685+0000 INFO AbstractCoasterChannel > TCPChannel[client, http://127.0.0.1:50010] setting name to > http://127.0.0.1:50010 > 2014-10-26 01:17:52,709+0000 INFO Execute TASK_STATUS_CHANGE > taskid=urn:R-0-1414286270539 status=1 > 2014-10-26 01:17:52,712+0000 INFO LateBindingScheduler jobs queued: 0 > 2014-10-26 01:17:52,712+0000 DEBUG swift APPLICATION_EXCEPTION > jobid=bash-lupsjczl - Application exception: null > > > *Caused by: > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could > not submit jobCaused by: org.globus.cog.coaster.channels.ChannelException: > Failed to create socketCaused by: java.net.ConnectException: Connection > refused* > > > Below are my configuratioins: > swift.conf: > > sites: cloud-static > > site.cloud-static { > execution { > type:"coaster-persistent" > URL: "http://127.0.0.1:50010" > jobManager: "local:local" > options { > maxJobs: 10 > tasksPerNode: 2 > } > } > > initialParallelTasks: 20 > maxParallelTasks: 20 > filesystem.type: local > workDirectory: /tmp/swiftwork > staging: local > app.ALL {executable: "*"} > } > > Below is the Swift version I am using: > ubuntu at ip-172-31-17-186:~/cloud-tutorials/swift-cloud-tutorial/word_count$ > swift -version > Swift trunk git-rev: 2d334140f2c288e5aeb3d354de0ecda35b4b3aac heads/master > 6130 (modified locally) > > I am able to run with "sites: local" successfully , but failing with > "sites: cloud-static". > > Please suggest on what can be the issue. > > Thanks and Regards, > AJay ANthony. > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From aanthon2 at hawk.iit.edu Sat Oct 25 20:47:52 2014
From: aanthon2 at hawk.iit.edu (Ajay Anthony)
Date: Sat, 25 Oct 2014 21:47:52 -0400
Subject: [Swift-user] Swift: ConnectException: Connection refused Issue
In-Reply-To:
References:
Message-ID:

Hi Gagan,

Restarted for the 8th time; still getting the same issue. Do we need to make any other configuration changes besides the ones mentioned above?

Thanks and regards,
Ajay Anthony.

On Sat, Oct 25, 2014 at 9:42 PM, Gagan Munisiddha Gowda wrote:
> Hi Ajay,
>
> Could you try restarting your entire cluster (terminate and start)?
>
> This happens occasionally for some unknown reason.
>
> Regards,
> Gagan
>
> On 26/10/2014 7:05 am, "Ajay Anthony" wrote:
>
>> Hi Swift users,
>>
>> Kindly help me with the issue below. I have been stuck on it almost
>> the whole day.
>>
>> I am trying to run "swift word.swift" on the head node in AWS and
>> have one worker running.
>>
>> [quoted log and swift.conf trimmed]

From yadunand at uchicago.edu Sat Oct 25 20:59:23 2014
From: yadunand at uchicago.edu (Yadu Nand Babuji)
Date: Sat, 25 Oct 2014 20:59:23 -0500
Subject: [Swift-user] Swift: ConnectException: Connection refused Issue
In-Reply-To:
References:
Message-ID: <544C557B.6030402@uchicago.edu>

Hi Ajay,

Could you send me the following:

1. The runNNN directory for a failing run.
2. The ~/s3fs-fuse/cps*log file

If you could add my public key to the ~/.ssh/authorized_keys file, I can take a look as well.
You can get my public key here:
http://users.rcc.uchicago.edu/~yadunand/yadunand_id_rsa.pub

Thanks,
Yadu

On 10/25/2014 08:47 PM, Ajay Anthony wrote:
> [earlier messages quoted in full, trimmed]
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user

From aanthon2 at hawk.iit.edu Sat Oct 25 23:04:12 2014
From: aanthon2 at hawk.iit.edu (Ajay Anthony)
Date: Sun, 26 Oct 2014 00:04:12 -0400
Subject: [Swift-user] Swift: ConnectException: Connection refused Issue
In-Reply-To: <544C557B.6030402@uchicago.edu>
References: <544C557B.6030402@uchicago.edu>
Message-ID:

Hi Yadu/Gagan,

Thanks for the fast response. I was able to resolve the problem. The issue was with the way the restart was done. I had been restarting in three different ways:

1. Console
2. ec2-run-instances
3. LaunchPad

Some restarts were done with a mixture of the three. I performed a pure LaunchPad restart, and the run passed successfully.

Thanks once again.

Regards,
Ajay Anthony.

On Sat, Oct 25, 2014 at 9:59 PM, Yadu Nand Babuji wrote:
> [earlier messages quoted in full, trimmed]

From ketan at mcs.anl.gov Wed Oct 29 15:41:43 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Wed, 29 Oct 2014 15:41:43 -0500
Subject: [Swift-user] bring all files produced by app to output dir
Message-ID:

Hi,

Without using a wrapper, is there a way to tell Swift to bring *all* files produced by an app to the application output directory?

This is required for a materials app that produces about 600 output files. About 520 of them follow some pattern, but the remaining 80 are just odd files that also need to be brought into the output dir.

A shell wrapper will not work, as this is on non-forking BG machines.

Thanks,
Ketan

From wilde at anl.gov Wed Oct 29 17:32:45 2014
From: wilde at anl.gov (Michael Wilde)
Date: Wed, 29 Oct 2014 17:32:45 -0500
Subject: [Swift-user] bring all files produced by app to output dir
In-Reply-To:
References:
Message-ID: <54516B0D.1040004@anl.gov>

Ketan, the fix for enhancement request 1225 enabled an app to return multiple arrays of files, each of which matches some specific pattern:

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=1225

I'm not sure if this made it into the User Guide yet. Does that address your need?

- Mike

On 10/29/14 3:41 PM, Ketan Maheshwari wrote:
> [original message quoted in full, trimmed]
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user

--
Michael Wilde
Mathematics and Computer Science
Computation Institute
Argonne National Laboratory
The University of Chicago
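The feature Mike points to (an app returning multiple arrays of files, each matched by a pattern) might be used roughly as follows. This is a hypothetical Swift script sketch only — the app name `materials_run`, the file patterns, and the input name are invented for illustration, and the exact output-array syntax should be verified against bug 1225 and the User Guide:

```
type file;

// Hypothetical app returning two arrays of output files: one for the
// ~520 patterned outputs, one for the odd remaining files, so both
// sets are staged back to the output directory without a wrapper.
app (file patterned[], file odd[]) materials_run (file inp) {
    materials_app @inp;
}

file inp <"input.dat">;

// filesys_mapper collects files matching a glob pattern;
// the patterns here are placeholders for the app's real output names.
file patterned[] <filesys_mapper; pattern="out_*.dat">;
file odd[]       <filesys_mapper; pattern="*.log">;

(patterned, odd) = materials_run(inp);
```

Since each returned array carries its own mapper pattern, the odd files are covered by declaring a second (or third) array with a catch-all pattern rather than by post-processing in a shell wrapper.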