From skenny at uchicago.edu Thu Oct 6 17:07:53 2011 From: skenny at uchicago.edu (Sarah Kenny) Date: Thu, 6 Oct 2011 15:07:53 -0700 Subject: [Swift-user] gram on ranger Message-ID: hey all, i'm trying to submit to gram on ranger using the latest swift (built from trunk). it failes like so: Cannot submit job Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Cannot submit job Caused by: org.globus.gram.GramException: Parameter not supported Cannot submit job the gram log was saying first that 'jobsPerNode' is not supported so i changed it to workersPerNode and then it was saying 'maxnodes' is not supported. here's my sites file: 10000 1 00:15:00 86400 1 256 16way 1 64 normal TG-DBS080004N /work/00043/tg457040 thoughts? ideas? -- Sarah Kenny Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III University of California Irvine, Dept. of Neurology ~ 773-818-8300 -------------- next part -------------- An HTML attachment was scrubbed... URL: From wozniak at mcs.anl.gov Fri Oct 7 10:16:10 2011 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Fri, 7 Oct 2011 10:16:10 -0500 (Central Daylight Time) Subject: [Swift-user] gram on ranger In-Reply-To: References: Message-ID: Can I take a look at the log? On Thu, 6 Oct 2011, Sarah Kenny wrote: > hey all, i'm trying to submit to gram on ranger using the latest swift > (built from trunk). it failes like so: > > Cannot submit job > Caused by: > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Cannot > submit job > Caused by: org.globus.gram.GramException: Parameter not supported > Cannot submit job > > the gram log was saying first that 'jobsPerNode' is not supported so i > changed it to workersPerNode and then it was saying 'maxnodes' is not > supported. here's my sites file: > > > > 10000 > 1 > 00:15:00 > 86400 > 1 > 256 > 16way > 1 > 64 > normal > TG-DBS080004N > > > > /work/00043/tg457040 > > > > thoughts? ideas? 
-- Justin M Wozniak From ketancmaheshwari at gmail.com Fri Oct 7 10:19:00 2011 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Fri, 7 Oct 2011 10:19:00 -0500 Subject: [Swift-user] [Swift-devel] gram on ranger In-Reply-To: References: Message-ID: Also, could you post the generated submit script. I tested this and seems the following line is not honored: 16way My script is showing "1way" irrespective of what pe I put. Regards, Ketan On Thu, Oct 6, 2011 at 5:07 PM, Sarah Kenny wrote: > hey all, i'm trying to submit to gram on ranger using the latest swift > (built from trunk). it failes like so: > > Cannot submit job > Caused by: > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Cannot > submit job > Caused by: org.globus.gram.GramException: Parameter not supported > Cannot submit job > > the gram log was saying first that 'jobsPerNode' is not supported so i > changed it to workersPerNode and then it was saying 'maxnodes' is not > supported. here's my sites file: > > > > 10000 > 1 > 00:15:00 > 86400 > 1 > 256 > 16way > 1 > 64 > normal > TG-DBS080004N > > > > /work/00043/tg457040 > > > > thoughts? ideas? > > -- > Sarah Kenny > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III > University of California Irvine, Dept. of Neurology ~ 773-818-8300 > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > -- Ketan -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidk at ci.uchicago.edu Fri Oct 7 10:25:37 2011 From: davidk at ci.uchicago.edu (David Kelly) Date: Fri, 7 Oct 2011 10:25:37 -0500 (CDT) Subject: [Swift-user] [Swift-devel] gram on ranger In-Reply-To: Message-ID: <446676502.136713.1318001137284.JavaMail.root@zimbra-mb2.anl.gov> I ran into the same issue with the 'pe' value not being passed correctly when I was doing the provider testing. 
I created this bug for it: https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=549. I started looking at the code trying to understand why this happens.. I'll try to write a fix for this soon. David ----- Original Message ----- > From: "Ketan Maheshwari" > To: "Sarah Kenny" > Cc: "Swift Devel" , "Swift User" > Sent: Friday, October 7, 2011 10:19:00 AM > Subject: Re: [Swift-devel] gram on ranger > Also, could you post the generated submit script. I tested this and > seems the following line is not honored: > > > 16way > > > My script is showing "1way" irrespective of what pe I put. > > Regards, > Ketan > > > On Thu, Oct 6, 2011 at 5:07 PM, Sarah Kenny < skenny at uchicago.edu > > wrote: > > > hey all, i'm trying to submit to gram on ranger using the latest swift > (built from trunk). it failes like so: > > Cannot submit job > Caused by: > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > Cannot submit job > Caused by: org.globus.gram.GramException: Parameter not supported > Cannot submit job > > the gram log was saying first that 'jobsPerNode' is not supported so i > changed it to workersPerNode and then it was saying 'maxnodes' is not > supported. here's my sites file: > > > > 10000 > 1 > 00:15:00 > 86400 > 1 > 256 > 16way > 1 > 64 > normal > TG-DBS080004N > > > > /work/00043/tg457040 > > > > thoughts? ideas? > > -- > Sarah Kenny > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III > University of California Irvine, Dept. 
of Neurology ~ 773-818-8300 > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > > > > -- > Ketan > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel From skenny at uchicago.edu Fri Oct 7 15:13:57 2011 From: skenny at uchicago.edu (Sarah Kenny) Date: Fri, 7 Oct 2011 13:13:57 -0700 Subject: [Swift-user] gram on ranger In-Reply-To: References: Message-ID: /home/skenny/swift_logs/dummy-20111005-0126-6575n7x5.log on ci On Fri, Oct 7, 2011 at 8:16 AM, Justin M Wozniak wrote: > > Can I take a look at the log? > > > On Thu, 6 Oct 2011, Sarah Kenny wrote: > > hey all, i'm trying to submit to gram on ranger using the latest swift >> (built from trunk). it failes like so: >> >> Cannot submit job >> Caused by: >> org.globus.cog.abstraction.**impl.common.task.**TaskSubmissionException: >> Cannot >> submit job >> Caused by: org.globus.gram.GramException: Parameter not supported >> Cannot submit job >> >> the gram log was saying first that 'jobsPerNode' is not supported so i >> changed it to workersPerNode and then it was saying 'maxnodes' is not >> supported. here's my sites file: >> >> >> >> 10000 >> 1 >> 00:15:00 >> 86400 >> 1 >> 256 >> 16way >> 1 >> 64 >> normal >> TG-DBS080004N >> >> >> >> /work/00043/**tg457040 >> >> >> >> thoughts? ideas? >> > > -- > Justin M Wozniak > -- Sarah Kenny Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III University of California Irvine, Dept. of Neurology ~ 773-818-8300 -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From tianyu491433909 at 163.com Mon Oct 10 08:07:06 2011
From: tianyu491433909 at 163.com (tianyu491433909)
Date: Mon, 10 Oct 2011 21:07:06 +0800 (CST)
Subject: [Swift-user] can't run SwiftMontage's m101 example
Message-ID: <71802692.1bca5.132edf19b4a.Coremail.tianyu491433909@163.com>

The SwiftMontage example can't run.

The m101 Montage demo can't run in the SwiftMontage example. Swift_Montage_Types.swift, Swift_Montage_Apps.swift and Swift_Montage_Batch.swift do not exist, and SwiftMontage_Types.swift, SwiftMontage_Apps.swift and SwiftMontage_Batch.swift are missing some functions. I want to run the demo; can you send me these files?

From tianyu491433909 at 163.com Mon Oct 10 08:09:49 2011
From: tianyu491433909 at 163.com (tianyu491433909)
Date: Mon, 10 Oct 2011 21:09:49 +0800 (CST)
Subject: [Swift-user] can't run SwiftMontage's m101 example
Message-ID: <27a70f06.1bd17.132edf4182b.Coremail.tianyu491433909@163.com>

The m101 Montage demo can't run in the SwiftMontage example. Swift_Montage_Types.swift, Swift_Montage_Apps.swift and Swift_Montage_Batch.swift do not exist, and SwiftMontage_Types.swift, SwiftMontage_Apps.swift and SwiftMontage_Batch.swift are missing some functions. I want to run the demo; can you send me these files?

From jonmon at mcs.anl.gov Mon Oct 10 08:37:12 2011
From: jonmon at mcs.anl.gov (Jonathan Monette)
Date: Mon, 10 Oct 2011 08:37:12 -0500
Subject: [Swift-user] can't run SwiftMontage's m101 example
Message-ID: <20111010133658.4A1C2121FD@zimbra.anl.gov>

I'll take a look at the example code. I have done updates and haven't checked the example in a while. Let me see what needs to be fixed.
----- Reply message ----- From: "tianyu491433909" Date: Mon, Oct 10, 2011 8:09 am Subject: [Swift-user] can't run SwiftMontage's m101 example To: -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidk at ci.uchicago.edu Mon Oct 10 14:28:55 2011 From: davidk at ci.uchicago.edu (David Kelly) Date: Mon, 10 Oct 2011 14:28:55 -0500 (CDT) Subject: [Swift-user] gram on ranger In-Reply-To: Message-ID: <1762711564.139799.1318274935596.JavaMail.root@zimbra-mb2.anl.gov> Sarah, Can you give this another try with the latest 0.93? I made some changes to the coaster and sge providers and was able to get it working with a simple catns script. Here is the configuration file I was using: 3600 00:00:03 1 16 16 development 0.9 TG-DBS080004N 16way /share/home/01503/davidkel/swiftwork Thanks, David ----- Original Message ----- > From: "Sarah Kenny" > To: "Justin M Wozniak" > Cc: "Swift Devel" , "Swift User" > Sent: Friday, October 7, 2011 3:13:57 PM > Subject: Re: [Swift-user] gram on ranger > /home/skenny/swift_logs/dummy-20111005-0126-6575n7x5.log > > on ci > > > On Fri, Oct 7, 2011 at 8:16 AM, Justin M Wozniak < wozniak at mcs.anl.gov > > wrote: > > > > Can I take a look at the log? > > > > > On Thu, 6 Oct 2011, Sarah Kenny wrote: > > > > hey all, i'm trying to submit to gram on ranger using the latest swift > (built from trunk). it failes like so: > > Cannot submit job > Caused by: > org.globus.cog.abstraction. impl.common.task. TaskSubmissionException: > Cannot > submit job > Caused by: org.globus.gram.GramException: Parameter not supported > Cannot submit job > > the gram log was saying first that 'jobsPerNode' is not supported so i > changed it to workersPerNode and then it was saying 'maxnodes' is not > supported. here's my sites file: > > > > 10000 > 1 > 00:15:00 > 86400 > 1 > 256 > 16way > 1 > 64 > normal > TG-DBS080004N > > > > /work/00043/ tg457040 > > > > thoughts? ideas? 
> > -- > Justin M Wozniak > > > > -- > Sarah Kenny > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III > University of California Irvine, Dept. of Neurology ~ 773-818-8300 > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user From skenny at uchicago.edu Mon Oct 10 15:59:34 2011 From: skenny at uchicago.edu (Sarah Kenny) Date: Mon, 10 Oct 2011 13:59:34 -0700 Subject: [Swift-user] gram on ranger In-Reply-To: <1762711564.139799.1318274935596.JavaMail.root@zimbra-mb2.anl.gov> References: <1762711564.139799.1318274935596.JavaMail.root@zimbra-mb2.anl.gov> Message-ID: where's the latest .93, can you send me the checkout link so i can be sure we're on the same page? On Mon, Oct 10, 2011 at 12:28 PM, David Kelly wrote: > Sarah, > > Can you give this another try with the latest 0.93? I made some changes to > the coaster and sge providers and was able to get it working with a simple > catns script. Here is the configuration file I was using: > > > > > > 3600 > 00:00:03 > 1 > 16 > 16 > development > 0.9 > TG-DBS080004N > 16way > /share/home/01503/davidkel/swiftwork > > > > Thanks, > David > > ----- Original Message ----- > > From: "Sarah Kenny" > > To: "Justin M Wozniak" > > Cc: "Swift Devel" , "Swift User" < > swift-user at ci.uchicago.edu> > > Sent: Friday, October 7, 2011 3:13:57 PM > > Subject: Re: [Swift-user] gram on ranger > > /home/skenny/swift_logs/dummy-20111005-0126-6575n7x5.log > > > > on ci > > > > > > On Fri, Oct 7, 2011 at 8:16 AM, Justin M Wozniak < wozniak at mcs.anl.gov > > > wrote: > > > > > > > > Can I take a look at the log? > > > > > > > > > > On Thu, 6 Oct 2011, Sarah Kenny wrote: > > > > > > > > hey all, i'm trying to submit to gram on ranger using the latest swift > > (built from trunk). it failes like so: > > > > Cannot submit job > > Caused by: > > org.globus.cog.abstraction. impl.common.task. 
TaskSubmissionException: > > Cannot > > submit job > > Caused by: org.globus.gram.GramException: Parameter not supported > > Cannot submit job > > > > the gram log was saying first that 'jobsPerNode' is not supported so i > > changed it to workersPerNode and then it was saying 'maxnodes' is not > > supported. here's my sites file: > > > > > > > > 10000 > > 1 > > 00:15:00 > > 86400 > > 1 > > 256 > > 16way > > 1 > > 64 > > normal > > TG-DBS080004N > > > > > > > > /work/00043/ tg457040 > > > > > > > > thoughts? ideas? > > > > -- > > Justin M Wozniak > > > > > > > > -- > > Sarah Kenny > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III > > University of California Irvine, Dept. of Neurology ~ 773-818-8300 > > > > > > _______________________________________________ > > Swift-user mailing list > > Swift-user at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > -- Sarah Kenny Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III University of California Irvine, Dept. of Neurology ~ 773-818-8300 -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidk at ci.uchicago.edu Mon Oct 10 16:05:31 2011 From: davidk at ci.uchicago.edu (David Kelly) Date: Mon, 10 Oct 2011 16:05:31 -0500 (CDT) Subject: [Swift-user] gram on ranger In-Reply-To: Message-ID: <101294858.140119.1318280731646.JavaMail.root@zimbra-mb2.anl.gov> Sure, it's at cog: https://cogkit.svn.sourceforge.net/svnroot/cogkit/branches/4.1.9/src/cog swift: https://svn.ci.uchicago.edu/svn/vdl2/branches/release-0.93 David ----- Original Message ----- > From: "Sarah Kenny" > To: "David Kelly" > Cc: "Swift Devel" , "Swift User" , "Justin M Wozniak" > > Sent: Monday, October 10, 2011 3:59:34 PM > Subject: Re: [Swift-user] gram on ranger > where's the latest .93, can you send me the checkout link so i can be > sure we're on the same page? 
> > > > On Mon, Oct 10, 2011 at 12:28 PM, David Kelly < davidk at ci.uchicago.edu > > wrote: > > > Sarah, > > Can you give this another try with the latest 0.93? I made some > changes to the coaster and sge providers and was able to get it > working with a simple catns script. Here is the configuration file I > was using: > > > > > > > 3600 > 00:00:03 > 1 > 16 > 16 > development > 0.9 > > TG-DBS080004N > > 16way > /share/home/01503/davidkel/swiftwork > > > > Thanks, > > David > > ----- Original Message ----- > > > From: "Sarah Kenny" < skenny at uchicago.edu > > > To: "Justin M Wozniak" < wozniak at mcs.anl.gov > > > Cc: "Swift Devel" < swift-devel at ci.uchicago.edu >, "Swift User" < > > swift-user at ci.uchicago.edu > > > > > > Sent: Friday, October 7, 2011 3:13:57 PM > > Subject: Re: [Swift-user] gram on ranger > > /home/skenny/swift_logs/dummy-20111005-0126-6575n7x5.log > > > > on ci > > > > > > On Fri, Oct 7, 2011 at 8:16 AM, Justin M Wozniak < > > wozniak at mcs.anl.gov > > > wrote: > > > > > > > > Can I take a look at the log? > > > > > > > > > > On Thu, 6 Oct 2011, Sarah Kenny wrote: > > > > > > > > hey all, i'm trying to submit to gram on ranger using the latest > > swift > > (built from trunk). it failes like so: > > > > Cannot submit job > > Caused by: > > org.globus.cog.abstraction. impl.common.task. > > TaskSubmissionException: > > Cannot > > submit job > > Caused by: org.globus.gram.GramException: Parameter not supported > > Cannot submit job > > > > the gram log was saying first that 'jobsPerNode' is not supported so > > i > > changed it to workersPerNode and then it was saying 'maxnodes' is > > not > > supported. here's my sites file: > > > > > > > > 10000 > > 1 > > 00:15:00 > > 86400 > > 1 > > 256 > > 16way > > 1 > > 64 > > normal > > TG-DBS080004N > > > > > > > > > > /work/00043/ tg457040 > > > > > > > > > thoughts? ideas? 
> > > > -- > > Justin M Wozniak > > > > > > > > -- > > Sarah Kenny > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III > > University of California Irvine, Dept. of Neurology ~ 773-818-8300 > > > > > > _______________________________________________ > > Swift-user mailing list > > Swift-user at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > > > -- > Sarah Kenny > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III > University of California Irvine, Dept. of Neurology ~ 773-818-8300 From skenny at uchicago.edu Mon Oct 10 17:43:04 2011 From: skenny at uchicago.edu (Sarah Kenny) Date: Mon, 10 Oct 2011 15:43:04 -0700 Subject: [Swift-user] gram on ranger In-Reply-To: <1762711564.139799.1318274935596.JavaMail.root@zimbra-mb2.anl.gov> References: <1762711564.139799.1318274935596.JavaMail.root@zimbra-mb2.anl.gov> Message-ID: ok, thanks, got in the queue now...also, realized my last run may have been using the old swift. apparently i had SWIFT_HOME set in my env and that overrides the newer swift i had set in my PATH. ~sk On Mon, Oct 10, 2011 at 12:28 PM, David Kelly wrote: > Sarah, > > Can you give this another try with the latest 0.93? I made some changes to > the coaster and sge providers and was able to get it working with a simple > catns script. Here is the configuration file I was using: > > > > > > 3600 > 00:00:03 > 1 > 16 > 16 > development > 0.9 > TG-DBS080004N > 16way > /share/home/01503/davidkel/swiftwork > > > > Thanks, > David > > ----- Original Message ----- > > From: "Sarah Kenny" > > To: "Justin M Wozniak" > > Cc: "Swift Devel" , "Swift User" < > swift-user at ci.uchicago.edu> > > Sent: Friday, October 7, 2011 3:13:57 PM > > Subject: Re: [Swift-user] gram on ranger > > /home/skenny/swift_logs/dummy-20111005-0126-6575n7x5.log > > > > on ci > > > > > > On Fri, Oct 7, 2011 at 8:16 AM, Justin M Wozniak < wozniak at mcs.anl.gov > > > wrote: > > > > > > > > Can I take a look at the log? 
> > > > > > > > > > On Thu, 6 Oct 2011, Sarah Kenny wrote: > > > > > > > > hey all, i'm trying to submit to gram on ranger using the latest swift > > (built from trunk). it failes like so: > > > > Cannot submit job > > Caused by: > > org.globus.cog.abstraction. impl.common.task. TaskSubmissionException: > > Cannot > > submit job > > Caused by: org.globus.gram.GramException: Parameter not supported > > Cannot submit job > > > > the gram log was saying first that 'jobsPerNode' is not supported so i > > changed it to workersPerNode and then it was saying 'maxnodes' is not > > supported. here's my sites file: > > > > > > > > 10000 > > 1 > > 00:15:00 > > 86400 > > 1 > > 256 > > 16way > > 1 > > 64 > > normal > > TG-DBS080004N > > > > > > > > /work/00043/ tg457040 > > > > > > > > thoughts? ideas? > > > > -- > > Justin M Wozniak > > > > > > > > -- > > Sarah Kenny > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III > > University of California Irvine, Dept. of Neurology ~ 773-818-8300 > > > > > > _______________________________________________ > > Swift-user mailing list > > Swift-user at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > -- Sarah Kenny Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III University of California Irvine, Dept. of Neurology ~ 773-818-8300 -------------- next part -------------- An HTML attachment was scrubbed... URL: From iraicu at cs.iit.edu Mon Oct 10 20:59:22 2011 From: iraicu at cs.iit.edu (Ioan Raicu) Date: Mon, 10 Oct 2011 20:59:22 -0500 Subject: [Swift-user] CFP: First Int. Workshop on Workflow Models, Systems, Services and Applications in the Cloud (CloudFlow) 2012 Message-ID: <4E93A2FA.4040900@cs.iit.edu> First International Workshop on Workflow Models, Systems, Services and Applications in the Cloud (CloudFlow) 2012 To be held in conjunction with the 26th IEEE International Parallel & Distributed Processing Symposium (IPDPS) 2012 Shanghai, China May 21-25, 2012. 
http://www.cloud-uestc.cn/cloudflow/home.html

Overview

Cloud computing is gaining tremendous momentum in both academia and industry, more and more people are migrating their data and applications into the Cloud. We have observed wide adoption of the MapReduce computing model and the open source Hadoop system for large scale distributed data processing, and a variety of ad hoc mashup techniques that weave together Web applications. However, these are just first steps towards managing complex task and data dependencies in the Cloud, as there are more challenging issues such as large parameter space exploration, data partitioning and distribution, scheduling and optimization, smart reruns, and provenance tracking associated with workflow execution. Cloud needs structured and mature workflow technologies to handle such issues, and vice versa, as Cloud offers unprecedented scalability to workflow systems, and could potentially change the way we perceive and conduct research and experiments. The scale and complexity of the science and data analytics problems that can be handled can be greatly increased on the Cloud, and the on-demand nature of resource allocation on the Cloud will also help improve resource utilization and user experience.

As Cloud computing provides a paradigm-shifting utility-oriented computing model in terms of the unprecedented size of datacenter-level resource pool and the on-demand resource provisioning mechanism, there are lots of challenges in bringing Cloud and workflows together. We need high level languages and computing models for large scale workflow specification; we need to adapt existing workflow architectures into the Cloud, and integrate workflow systems with Cloud infrastructure and resources; we also need to leverage Cloud data storage technologies to efficiently distribute data over a large number of nodes and explore data locality during computation etc.
We organize the CloudFlow workshop as a venue for the workflow and Cloud communities to define models and paradigms, present their state-of-the-art work, share their thoughts and experiences, and explore new directions in realizing workflows in the Cloud.

Topics:

We welcome the submission of original work related to the topics listed below, which include (in the context of Cloud):

- Models and Languages for Large Scale Workflow Specification
- Workflow Architecture and Framework
- Large Scale Workflow Systems
- Service Workflow
- Workflow Composition and Orchestration
- Workflow Migration into the Cloud
- Workflow Scheduling and Optimization
- Cloud Middleware in Support of Workflow
- Virtualized Environment
- Workflow Applications and Case Studies
- Performance and Scalability Analysis
- Peta-Scale Data Processing
- Event Processing and Messaging
- Real-Time Analytics
- Provenance

Paper Submission

Authors are invited to submit papers with unpublished, original work. The papers should not exceed 10 single-spaced double-column pages using 10-point size font on 8.5x11 inch pages (IEEE conference style), including figures, tables, and references. Paper submission should be done via the online CMT system, Microsoft's Academic Conference Management Service (https://cmt.research.microsoft.com/CF2012) by midnight January 9th, 2012 Pacific Time. The final format should be in PDF. Proceedings of the workshop will be published by the IEEE Digital Library and distributed at the conference. Selected excellent work may be eligible for additional post-conference publication as journal articles or book chapters. Submission implies the willingness of at least one of the authors to register and present the paper.

Important Dates

Paper submission: January 9th, 2012
Acceptance notification: February 8th, 2012
Final paper due: Feb 21st, 2012

Organization

Workshop Chairs:

Dr. Yong Zhao
University of Electronic Science and Technology of China, China
yongzh04 at gmail.com

Dr.
Cui Lin
California State University, Fresno, USA
clin at csufresno.edu

Dr. Shiyong Lu
Wayne State University, USA
shiyong at wayne.edu

Program Chairs:

Dr. Wenhong Tian
University of Electronic Science and Technology of China, China

Dr. Ruini Xue
Tsinghua University, China

Steering Committee

- Dan Katz, University of Chicago, U.S.A.
- Mike Wilde, University of Chicago, U.S.A.
- Ewa Deelman, University of Southern California, U.S.A.
- Tevfik Kosar, University at Buffalo, U.S.A.
- Ilkay Altintas, San Diego Supercomputer Center, U.S.A.
- Ioan Raicu, Illinois Institute of Technology, U.S.A.
- Yogesh Simmhan, University of Southern California, U.S.A.
- Ian Taylor, Cardiff University, U.K.
- Weimin Zheng, Tsinghua University, China
- Hai Jin, Huazhong University of Science and Technology, China
- Wanchun Dou, Nanjing University, China

Program Committee

- Shawn Bowers, Gonzaga University, U.S.A.
- Douglas Thain, University of Notre Dame, U.S.A.
- Ian Gorton, Pacific Northwest National Laboratory, U.S.A.
- Artem Chebotko, University of Texas at Pan American, U.S.A.
- Paolo Missier, Newcastle University, U.K.
- Paul Groth, University of Amsterdam, the Netherlands
- Zhiming Zhao, University of Amsterdam, the Netherlands
- Marta Mattoso, Federal University of Rio de Janeiro, Brazil
- Wei Tan, IBM T. J. Watson Research Center, U.S.A.
- Jianwu Wang, San Diego Supercomputer Center, U.S.A.
- Ping Yang, Binghamton University, U.S.A.
- Jian Guo, Harvard University, U.S.A.
- Liqiang Wang, University of Wyoming, U.S.A.
- Wenhong Tian, University of Electronic Science and Technology of China, China
- Ruini Xue, Tsinghua University, China
- Jian Cao, Shanghai Jiaotong University, China
- Weisong Shi, Tongji University, China
- Jianxun Liu, Hunan University of Science and Technology, China
- Song Zhang, Chinese Academy of Sciences, China

--
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT) Guest Research Faculty, Argonne National Laboratory (ANL) ================================================================= Data-Intensive Distributed Systems Laboratory, CS/IIT Distributed Systems Laboratory, MCS/ANL ================================================================= Cel: 1-847-722-0876 Office: 1-312-567-5704 Email: iraicu at cs.iit.edu Web: http://www.cs.iit.edu/~iraicu/ Web: http://datasys.cs.iit.edu/ ================================================================= ================================================================= From iraicu at cs.iit.edu Mon Oct 10 21:11:06 2011 From: iraicu at cs.iit.edu (Ioan Raicu) Date: Mon, 10 Oct 2011 21:11:06 -0500 Subject: [Swift-user] CFP: 12th IEEE/ACM International Symposium on Cluster, Grid and Cloud Computing (CCGrid 2012) Message-ID: <4E93A5BA.3060901@cs.iit.edu> 12th IEEE/ACM International Symposium on Cluster, Grid and Cloud Computing (CCGrid 2012) Ottawa, Canada May 13-16, 2012 http://www.cloudbus.org/ccgrid2012 CALL FOR PAPERS Rapid advances in processing, communication and systems/middleware technologies are leading to new paradigms and platforms for computing, ranging from computing Clusters to widely distributed Grid and emerging Clouds. CCGrid is a series of very successful conferences, sponsored by the IEEE Computer Society Technical Committee on Scalable Computing (TCSC) and ACM, with the overarching goal of bringing together international researchers, developers, and users and to provide an international forum to present leading research activities and results on a broad range of topics related to these platforms and paradigms and their applications. The conference features keynotes, technical presentations, posters and research demos, workshops, tutorials, as well as the SCALE challenges featuring live demonstrations. In 2012, CCGrid will come to Canada for the first time and will be held in Ottawa, the capital city. 
CCGrid 2012 will have a focus on important and immediate issues that are significantly influencing all aspects of cluster, cloud and grid computing. Topics of interest include, but are not limited to:

* Applications and Experiences: Applications to real and complex problems in science, engineering, business and society; User studies; Experiences with large-scale deployments, systems or applications.
* Architecture: System architectures; Design and deployment.
* Autonomic Computing and Cyberinfrastructure: Self-managed behavior, models and technologies; Autonomic paradigms and approaches (control-based, bio-inspired, emergent, etc.); Bio-inspired approaches to management; SLA definition and enforcement.
* Performance Modeling and Evaluation: Performance models; Monitoring and evaluation tools; Analysis of system/application performance; Benchmarks and testbeds.
* Programming Models, Systems, and Fault-Tolerant Computing: Programming models for cluster, cloud and grid computing; Fault-tolerant infrastructure and algorithms; Systems software to enable efficient computing.
* Multicore and Accelerator-based Computing: Software and application techniques to utilize multicore architectures and accelerators/heterogeneous computing systems.
* Scheduling and Resource Management: Techniques to schedule jobs and resources on clusters, clouds and grid computing platforms.
* Cloud Computing: Cloud architectures; Software tools and techniques for clouds.

PAPER SUBMISSION

Authors are invited to submit papers electronically. Submitted manuscripts should be structured as technical papers and may not exceed 8 letter size (8.5 x 11) pages including figures, tables and references using the IEEE format for conference proceedings (print area of 6-1/2 inches (16.51 cm) wide by 8-7/8 inches (22.51 cm) high, two-column format with columns 3-1/16 inches (7.85 cm) wide with a 3/8 inch (0.81 cm) space between them, single-spaced 10-point Times fully justified text). Submissions not conforming to these guidelines may be returned without review. Authors should submit the manuscript in PDF format and make sure that the file will print on a printer that uses letter size (8.5 x 11) paper. The official language of the meeting is English.

All manuscripts will be reviewed and will be judged on correctness, originality, technical strength, significance, quality of presentation, and interest and relevance to the conference attendees. Submitted papers must represent original unpublished research that is not currently under review for any other conference or journal. Papers not following these guidelines will be rejected without review and further action may be taken, including (but not limited to) notifications sent to the heads of the institutions of the authors and sponsors of the conference. Submissions received after the due date, exceeding the page limit, or not appropriately structured may not be considered. Authors may contact the conference chairs for more information.

The proceedings will be published through the IEEE Computer Society Press, USA and will be made available online through the IEEE Digital Library.

Submission Link: https://www.easychair.org/account/signin.cgi?conf=ccgrid2012

JOURNAL SPECIAL ISSUE

The six most highly rated papers from the CCGrid 2012 conference will be invited to submit extended versions for publication in a special issue of the journal "Future Generation Computer Systems (FGCS)", published by Elsevier.

CHAIRS

General Chair
* Shikharesh Majumdar, Carleton University, Canada

Honorary Chair
* Geoffrey Fox, Indiana University, USA

Program Committee Co-Chairs
* Rajkumar Buyya, University of Melbourne, Australia
* Pavan Balaji, Argonne National Laboratory, USA

Program Committee Vice-chairs
* Daniel S. Katz (Applications and Experiences)
* Dhabaleswar K. Panda (Architecture)
* Manish Parashar (Middleware, Autonomic Computing, and Cyberinfrastructure)
* Ahmad Afsahi (Performance Modeling and Analysis)
* Xian-He Sun (Performance Measurement and Evaluation)
* William Gropp (Programming Models, Systems, and Fault-Tolerant Computing)
* David Bader (Multicore and Accelerator-based Computing)
* Thomas Fahringer (Scheduling and Resource Management)
* Ignacio Martin Llorente and Madhusudhan Govindaraju (Cloud Computing)

Cyber Co-Chairs
* Anton Beloglazov, The University of Melbourne, Australia
* Suraj Pandey, CSIRO, Australia
* Trevor Gelowsky, Carleton University, Canada

Workshops Co-Chairs
* Marin Litoiu, York University, Canada
* Mukaddim Pathan, Telstra Corporation Limited, Australia

Publicity Chairs
* Helen Karatza, Aristotle University of Thessaloniki, Greece
* Ioan Raicu, Illinois Institute of Technology & Argonne National Labs, USA
* Bruno Schulze, National Laboratory for Scientific Computing, Brazil
* G Subrahmanya VRK Rao, Cognizant Technology Solutions, India

Tutorials Co-Chairs
* Sushil K. Prasad, Georgia State University, USA
* Rob Simmonds, Westgrid, Canada

Doctoral Symposium Co-Chairs
* Carlos Varela, Rensselaer Polytechnic Institute, USA
* Yogesh Simmhan, University of Southern California

Poster and Research Demo Co-Chairs
* Suraj Pandey, CSIRO, Australia

SCALE Challenge Coordinator
* Shantenu Jha, Rutgers and Louisiana State University

Steering Committee
* Henri Bal, Vrije University, The Netherlands
* Pavan Balaji, Argonne National Laboratory, USA
* Rajkumar Buyya, University of Melbourne, Australia (Chair)
* Franck Cappello, University of Paris-Sud, France
* Jack Dongarra, University of Tennessee & ORNL, USA
* Dick Epema, Technical University of Delft, The Netherlands
* Thomas Fahringer, University of Innsbruck, Austria
* Ian Foster, University of Chicago, USA
* Wolfgang Gentzsch, DEISA, Germany
* Hai Jin, Huazhong University of Science & Technology, China
* Craig Lee, The Aerospace Corporation, USA (Co-Chair)
* Laurent Lefevre, INRIA, France
* Geng Lin, Dell Inc., USA
* Manish Parashar, Rutgers, The State University of New Jersey, USA
* Shikharesh Majumdar, Carleton University, Canada
* Satoshi Matsuoka, Tokyo Institute of Technology, Japan
* Omer Rana, Cardiff University, UK
* Paul Roe, Queensland University of Technology, Australia
* Bruno Schulze, LNCC, Brazil
* Nalini Venkatasubramanian, University of California, USA
* Carlos Varela, Rensselaer Polytechnic Institute, USA

IMPORTANT DATES

Papers Due: 25 November 2011
Notification of Acceptance: 30 January 2012
Camera Ready Papers Due: 27 February 2012

Sponsors: IEEE Computer Society (TCSE) & ACM SIGARCH (approval pending)

--
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT)
Guest Research Faculty, Argonne National Laboratory (ANL)
=================================================================
Data-Intensive Distributed Systems Laboratory, CS/IIT
Distributed Systems Laboratory, MCS/ANL
=================================================================
Cel: 1-847-722-0876
Office: 1-312-567-5704
Email: iraicu at cs.iit.edu
Web: http://www.cs.iit.edu/~iraicu/
Web: http://datasys.cs.iit.edu/
=================================================================
=================================================================

From skenny at uchicago.edu Tue Oct 11 13:32:37 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Tue, 11 Oct 2011 11:32:37 -0700
Subject: [Swift-user] gram on ranger
In-Reply-To:
References: <1762711564.139799.1318274935596.JavaMail.root@zimbra-mb2.anl.gov>
Message-ID:

so, this workflow completes all the jobs but then just hangs indefinitely at the end...maybe a stray cleanup job?

log is here:

/home/skenny/swift_logs/corr-20111010-2104-fl5yngd9.log

just tweaked the sites file a bit from what david sent me:

28800
00:15:00
1
64
256
normal
1
TG-DBS080004N
16way
10000
/work/00043/tg457040/sidgrid_out/skenny

On Mon, Oct 10, 2011 at 3:43 PM, Sarah Kenny wrote:

> ok, thanks, got in the queue now...also, realized my last run may have been
> using the old swift. apparently i had SWIFT_HOME set in my env and that
> overrides the newer swift i had set in my PATH.
>
> ~sk
>
> On Mon, Oct 10, 2011 at 12:28 PM, David Kelly wrote:
>
>> Sarah,
>>
>> Can you give this another try with the latest 0.93? I made some changes to
>> the coaster and sge providers and was able to get it working with a simple
>> catns script. Here is the configuration file I was using:
>>
>> 3600
>> 00:00:03
>> 1
>> 16
>> 16
>> development
>> 0.9
>> TG-DBS080004N
>> 16way
>> /share/home/01503/davidkel/swiftwork
>>
>> Thanks,
>> David
>>
>> ----- Original Message -----
>> > From: "Sarah Kenny"
>> > To: "Justin M Wozniak"
>> > Cc: "Swift Devel" , "Swift User" <swift-user at ci.uchicago.edu>
>> > Sent: Friday, October 7, 2011 3:13:57 PM
>> > Subject: Re: [Swift-user] gram on ranger
>> > /home/skenny/swift_logs/dummy-20111005-0126-6575n7x5.log
>> >
>> > on ci

--
Sarah Kenny
Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III
University of California Irvine, Dept. of Neurology ~ 773-818-8300
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From davidk at ci.uchicago.edu Tue Oct 11 13:49:02 2011
From: davidk at ci.uchicago.edu (David Kelly)
Date: Tue, 11 Oct 2011 13:49:02 -0500 (CDT)
Subject: [Swift-user] gram on ranger
In-Reply-To:
Message-ID: <1908023276.141555.1318358942817.JavaMail.root@zimbra-mb2.anl.gov>

That could be it.. maybe a cleanup script is not getting the right parameters and failing. Do you happen to have a copy of the coaster log? Maybe there will be some clues in there.

----- Original Message -----
> From: "Sarah Kenny"
> To: "David Kelly"
> Cc: "Swift Devel" , "Swift User" , "Justin M Wozniak"
> Sent: Tuesday, October 11, 2011 1:32:37 PM
> Subject: Re: [Swift-user] gram on ranger
> so, this workflow completes all the jobs but then just hangs
> indefinitely at the end...maybe a stray cleanup job?

From skenny at uchicago.edu Tue Oct 11 14:05:34 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Tue, 11 Oct 2011 12:05:34 -0700
Subject: [Swift-user] gram on ranger
In-Reply-To: <1908023276.141555.1318358942817.JavaMail.root@zimbra-mb2.anl.gov>
References: <1908023276.141555.1318358942817.JavaMail.root@zimbra-mb2.anl.gov>
Message-ID:

On Tue, Oct 11, 2011 at 11:49 AM, David Kelly wrote:

> That could be it.. maybe a cleanup script is not getting the right
> parameters and failing. Do you happen to have a copy of the coaster log?

just put it in /home/skenny/swift_logs

> Maybe there will be some clues in there.

--
Sarah Kenny
Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III
University of California Irvine, Dept. of Neurology ~ 773-818-8300
-------------- next part --------------
An HTML attachment was scrubbed...
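The archive strips the XML tags from the sites files posted in this thread, leaving only bare values. The key names (maxtime, maxWallTime, jobsPerNode, nodeGranularity, maxNodes, queue, jobThrottle, project, pe, initialScore) and the gatekeeper URL do survive in quoted copies of Sarah's 11 Oct sites file elsewhere in the thread. The sketch below writes a possible reconstruction to a scratch file. It is only a guess at the layout: the pool handle, the execution provider and jobmanager values, and the namespace attributes are assumptions that do not appear in the thread, and the output file name is made up for illustration.

```shell
#!/bin/sh
# Sketch only: writes a guessed reconstruction of the 11 Oct sites file.
# Key names, values, the gatekeeper URL and the work directory come from
# the thread; everything marked ASSUMED does not.
cat > ranger-sites.sketch.xml <<'EOF'
<config>
  <!-- ASSUMED: pool handle (not visible in the thread) -->
  <pool handle="ranger">
    <!-- url is from the thread; provider and jobmanager are ASSUMED -->
    <execution provider="coaster" jobmanager="gt2:gt2:SGE"
               url="gatekeeper.ranger.tacc.teragrid.org"/>
    <!-- a filesystem element cannot be recovered from the thread; left out -->
    <!-- ASSUMED: the namespace attributes on every profile entry -->
    <profile namespace="globus" key="maxtime">28800</profile>
    <profile namespace="globus" key="maxWallTime">00:15:00</profile>
    <profile namespace="globus" key="jobsPerNode">1</profile>
    <profile namespace="globus" key="nodeGranularity">64</profile>
    <profile namespace="globus" key="maxNodes">256</profile>
    <profile namespace="globus" key="queue">normal</profile>
    <profile namespace="karajan" key="jobThrottle">1</profile>
    <profile namespace="globus" key="project">TG-DBS080004N</profile>
    <profile namespace="globus" key="pe">16way</profile>
    <profile namespace="karajan" key="initialScore">10000</profile>
    <workdirectory>/work/00043/tg457040/sidgrid_out/skenny</workdirectory>
  </pool>
</config>
EOF
echo "wrote ranger-sites.sketch.xml"
```

Anyone reusing this should check the assumed attributes against a known-good sites file for their own Swift release before submitting jobs with it.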
From wilde at mcs.anl.gov Tue Oct 11 15:28:48 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 11 Oct 2011 15:28:48 -0500
Subject: [Swift-user] gram on ranger
In-Reply-To:
References: <1908023276.141555.1318358942817.JavaMail.root@zimbra-mb2.anl.gov>
Message-ID:

We have another example of swift hanging at the end of a ParVis script. I think I reported that on the list. Mihael needs a jstack dump of this along with the swift log.

On 10/11/11, Sarah Kenny wrote:
> On Tue, Oct 11, 2011 at 11:49 AM, David Kelly wrote:
>
>> That could be it.. maybe a cleanup script is not getting the right
>> parameters and failing. Do you happen to have a copy of the coaster log?
>
> just put it in /home/skenny/swift_logs

--
Sent from my mobile device

From hategan at mcs.anl.gov Tue Oct 11 18:23:26 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 11 Oct 2011 16:23:26 -0700
Subject: [Swift-user] gram on ranger
In-Reply-To:
References: <1908023276.141555.1318358942817.JavaMail.root@zimbra-mb2.anl.gov>
Message-ID: <1318375406.2770.0.camel@blabla>

Is this with a persistent coaster service?

On Tue, 2011-10-11 at 12:05 -0700, Sarah Kenny wrote:
> On Tue, Oct 11, 2011 at 11:49 AM, David Kelly wrote:
>
> > That could be it.. maybe a cleanup script is not getting the
> > right parameters and failing. Do you happen to have a copy of
> > the coaster log?
>
> just put it in /home/skenny/swift_logs

From skenny at uchicago.edu Tue Oct 11 19:13:21 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Tue, 11 Oct 2011 17:13:21 -0700
Subject: [Swift-user] gram on ranger
In-Reply-To: <1318375406.2770.0.camel@blabla>
References: <1908023276.141555.1318358942817.JavaMail.root@zimbra-mb2.anl.gov> <1318375406.2770.0.camel@blabla>
Message-ID:

On Tue, Oct 11, 2011 at 4:23 PM, Mihael Hategan wrote:

> Is this with a persistent coaster service?

admittedly i have not used persistent coaster service...should i? i feel like it's documented *somewhere* (?)

for now i've tried setting 'sitedir.keep=true' in the config so maybe it won't try to run the cleanup job...we'll see (waiting in q)

--
Sarah Kenny
Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III
University of California Irvine, Dept. of Neurology ~ 773-818-8300
-------------- next part --------------
An HTML attachment was scrubbed...
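For the jstack dump requested in this thread, a small helper along the following lines could capture the thread dump of a hung run so it can be sent along with the swift log. This is a sketch under assumptions: the function name and default output file name are made up for illustration, and the process id of the hung Swift JVM has to be found separately (for example with the JDK's `jps -l`). Only the standard `jstack <pid>` invocation from the JDK is used.

```shell
#!/bin/sh
# dump_jvm_stack PID [OUTFILE]
# Hypothetical helper: saves a thread dump of the JVM with the given PID.
dump_jvm_stack() {
    pid=$1
    out=${2:-jstack-$pid.txt}

    # Refuse anything that is not a plain numeric process id.
    case $pid in
        ''|*[!0-9]*)
            echo "usage: dump_jvm_stack PID [OUTFILE]" >&2
            return 2
            ;;
    esac

    # jstack is part of the JDK; a JRE-only install may not have it.
    if ! command -v jstack >/dev/null 2>&1; then
        echo "jstack not found on PATH" >&2
        return 127
    fi

    # Write the dump next to the swift log so both can be shared together.
    jstack "$pid" > "$out" && echo "wrote $out"
}
```

Typical use would be to run `dump_jvm_stack 12345` (with the real pid in place of 12345) while the run is still hung, since the dump is only informative while the stuck threads exist.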
From hategan at mcs.anl.gov Wed Oct 12 14:13:44 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 12 Oct 2011 12:13:44 -0700
Subject: [Swift-user] gram on ranger
In-Reply-To:
References: <1908023276.141555.1318358942817.JavaMail.root@zimbra-mb2.anl.gov> <1318375406.2770.0.camel@blabla>
Message-ID: <1318446824.18036.0.camel@blabla>

On Tue, 2011-10-11 at 17:13 -0700, Sarah Kenny wrote:
> On Tue, Oct 11, 2011 at 4:23 PM, Mihael Hategan wrote:
> > Is this with a persistent coaster service?
>
> admittedly i have not used persistent coaster service...should i?

No. I was just trying to figure out whether it might be something
related to the persistent version.

> i feel like it's documented *somewhere* (?)
>
> for now i've tried setting 'sitedir.keep=true' in the config so maybe
> it won't try to run the cleanup job...we'll see (waiting in q)
>
> > On Tue, 2011-10-11 at 12:05 -0700, Sarah Kenny wrote:
> > > On Tue, Oct 11, 2011 at 11:49 AM, David Kelly wrote:
> > > > That could be it.. maybe a cleanup script is not getting the
> > > > right parameters and failing. Do you happen to have a copy of
> > > > the coaster log?
> > >
> > > just put it in /home/skenny/swift_logs

From ketancmaheshwari at gmail.com Wed Oct 19 20:30:24 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Wed, 19 Oct 2011 20:30:24 -0500
Subject: [Swift-user] app argument int passed as float
Message-ID:

Hi,

In my app definition, I have an argument of type int that is expected by
the executable. I define my app something like this:

(outfile o ) app myapp(string a, int b, datafile c, ...){
    cmd a b c stdout @o;
}

I call this as:

<..mappers..>
out = myapp("foo", 35, @afile, ...);

However, it seems that at the time of the actual call, the value 35,
which is an int, gets replaced by the value 35.0, a float. I could see
this from the arguments dumped by swift at the end of the exception
message:

" Exception in presgt: Arguments: [TEST, 35.0, ..... "

The underlying app binary doesn't like this, apparently because the Java
code is calling Integer.parseInt and getting a float-formatted string.

Any ideas?

Regards,
-- 
Ketan

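The failure described in the message above is easy to reproduce outside Swift. The sketch below uses Python rather than Java, so it is an analogy and not the app's actual code: Python's int() rejects the string "35.0" in much the same way that Java's Integer.parseInt does (the latter throws NumberFormatException). A tolerant parser that accepts whole-valued floats is one possible workaround on the application side, assuming the app's argument handling can be changed. The helper names parse_int_arg and strict_parse_fails are hypothetical.

```python
def strict_parse_fails(text):
    """Return True if a strict integer parse rejects the text."""
    try:
        int(text)
        return False
    except ValueError:
        return True


def parse_int_arg(text):
    """Parse an argument as an int, tolerating a float-formatted whole
    number such as "35.0". Hypothetical helper, not part of Swift."""
    try:
        return int(text)              # strict parse: fails on "35.0"
    except ValueError:
        value = float(text)           # raises ValueError for non-numeric input
        if not value.is_integer():
            raise ValueError("not a whole number: %r" % text)
        return int(value)
```

With this in place, parse_int_arg("35.0") and parse_int_arg("35") both yield the integer 35, while a value such as "35.5" is still rejected. This only papers over the symptom; it does not explain why the int is rendered as 35.0 on the Swift side.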
From skenny at uchicago.edu Thu Oct 20 06:07:09 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Thu, 20 Oct 2011 04:07:09 -0700
Subject: [Swift-user] gram on ranger
In-Reply-To: <1318446824.18036.0.camel@blabla>
References: <1908023276.141555.1318358942817.JavaMail.root@zimbra-mb2.anl.gov> <1318375406.2770.0.camel@blabla> <1318446824.18036.0.camel@blabla>
Message-ID:

hi all, one of our users, anjali (cc'd here) is trying to submit this
~400k job workflow to ranger...thought i'd see if you felt like having
a look :)

log is here:
/home/skenny/swift_logs/corr_multisubj-20111018-1321-ihf8hz5g.log

sites file:

7200
00:20:00
1
64
256
development
1.28
TG-DBS080004N
16way
10000
/work/00926/tg459516/swiftwork

-- 
Sarah Kenny
Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III
University of California Irvine, Dept. of Neurology ~ 773-818-8300

From wilde at mcs.anl.gov Thu Oct 20 07:50:39 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 20 Oct 2011 07:50:39 -0500 (CDT)
Subject: [Swift-user] [Swift-devel] gram on ranger
In-Reply-To:
Message-ID: <1630476647.112879.1319115039205.JavaMail.root@zimbra.anl.gov>

Hi Sarah, Anjali,

My initial theory on what's failing in this job is that the Ranger
development queue is limited to jobs of 16 nodes or fewer. (The Ranger
User Guide says maxprocs 256 for that queue, and qconf -sq development
says slots 16, which agrees.) So you need to either change to one of the
production queues (normal, long, etc.) or reduce the values of maxNodes
and nodeGranularity.

I would also suggest (unless you have already done this) that you test
first on a very small run (like a single RInvoke app call) and then
scale up to just a few voxels per dataset before trying such a large
run. Have you already tested that?

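The queue-limit arithmetic above can be sketched in a few lines. This is an illustration only, under stated assumptions: the figure of 16 cores per node is an assumption (it is consistent with the 16way setting and with 256 procs over 16 slots), the reading of the posted sites file values as nodeGranularity=64 and maxNodes=256 is inferred from the order of keys in the earlier sites files in this thread, and the names check_site, max_procs and QUEUE_MAX_NODES are made up for the example.

```python
CORES_PER_NODE = 16  # assumption: 16 cores per Ranger node, matching "16way"

# Node limit as quoted in the message above; only the development queue is given.
QUEUE_MAX_NODES = {"development": 16}


def max_procs(queue):
    """Largest processor count a single job may request on the queue."""
    return QUEUE_MAX_NODES[queue] * CORES_PER_NODE


def check_site(queue, node_granularity, max_nodes):
    """Return a list of problems for a coaster block request on a queue."""
    limit = QUEUE_MAX_NODES[queue]
    problems = []
    if node_granularity > limit:
        problems.append("nodeGranularity %d exceeds the %d-node limit"
                        % (node_granularity, limit))
    if max_nodes > limit:
        problems.append("maxNodes %d exceeds the %d-node limit"
                        % (max_nodes, limit))
    return problems
```

Under these assumptions, check_site("development", 64, 256) reports both settings as too large, while check_site("development", 16, 16) reports nothing, and max_procs("development") works out to the 256 quoted from the user guide.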
Lastly, when reporting problems like this, the swift standard output/err
is also very helpful to get a higher-level view of what went wrong.

Swift needs to clearly return errors from the local resource provider,
which it doesn't seem to be doing here. I've filed this as bug 593 and
assigned it to David.

Please let us know if changing the queue and/or slots resolves the
problem. As mentioned in the bug report, I think you can set debug=true
(or yes?) in the provider-sge.properties file and get swift to preserve
the output from SGE in ~/.globus/scripts. (In fact that may already be
preserved, I am not sure.) Please check there to see if the SGE error is
there.

Thanks,

- Mike

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From ketancmaheshwari at gmail.com Thu Oct 20 09:54:33 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Thu, 20 Oct 2011 09:54:33 -0500
Subject: [Swift-user] [Swift-devel] gram on ranger
In-Reply-To: <1630476647.112879.1319115039205.JavaMail.root@zimbra.anl.gov>
References: <1630476647.112879.1319115039205.JavaMail.root@zimbra.anl.gov>
Message-ID:

On Thu, Oct 20, 2011 at 7:50 AM, Michael Wilde wrote:
> Hi Sarah, Anjali,
>
> My initial theory on what's failing in this job is that the Ranger
> development queue is limited to jobs of 16 nodes or fewer. (The Ranger
> User Guide says maxprocs 256 for that queue, and qconf -sq development
> says slots 16, which agrees.) So you need to either change to one of
> the production queues (normal, long, etc.) or reduce the values of
> maxNodes and nodeGranularity.

I have a little confusion here: the desired line in the final pbs script
should be:

#$ -pe way 256;

in order to have 256 procs. However, putting maxnodes=16 in sites.xml
results in the following line in the pbs script:

#$ -pe way 16;

I understand this number (16/256) is for procs, since, when putting 256
with the development queue, ranger indeed allows the job to run in the
development queue.

> I would also suggest (unless you have already done this) that you test
> first on a very small run (like a single RInvoke app call) and then
> scale up to just a few voxels per dataset before trying such a large
> run. Have you already tested that?
>
> Lastly, when reporting problems like this, the swift standard
> output/err is also very helpful to get a higher-level view of what
> went wrong.
>
> Swift needs to clearly return errors from the local resource provider,
> which it doesn't seem to be doing here. I've filed this as bug 593 and
> assigned it to David.
>
> Please let us know if changing the queue and/or slots resolves the
> problem. As mentioned in the bug report, I think you can set
> debug=true (or yes?) in the provider-sge.properties file and get swift
> to preserve the output from SGE in ~/.globus/scripts. (In fact that
> may already be preserved, I am not sure.) Please check there to see if
> the SGE error is there.
>
> Thanks,
>
> - Mike
>
> > > > > > --
> > > > > > Sarah Kenny
> > > > > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III
> > > > > > University of California Irvine, Dept. 
of > > > Neurology ~ > > > > 773-818-8300 > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > Swift-user mailing list > > > > > > Swift-user at ci.uchicago.edu > > > > > > > > > > > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > Sarah Kenny > > > > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 > > > Bio Sci III > > > > > University of California Irvine, Dept. of > > > Neurology ~ > > > > 773-818-8300 > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > Sarah Kenny > > > > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 > > > Bio Sci III > > > > > University of California Irvine, Dept. of > > > Neurology ~ > > > > 773-818-8300 > > > > > > > > > > > > > > > > > > > > -- > > > > Sarah Kenny > > > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III > > > > University of California Irvine, Dept. of Neurology ~ > > > 773-818-8300 > > > > > > > > _______________________________________________ > > > > Swift-user mailing list > > > > Swift-user at ci.uchicago.edu > > > > > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > > > > > > > > > > > > > > > > > > > > -- > > > Sarah Kenny > > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III > > > University of California Irvine, Dept. of Neurology ~ 773-818-8300 > > > > > > > > > > > > > > > -- > > Sarah Kenny > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III > > University of California Irvine, Dept. 
of Neurology ~ 773-818-8300
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory

--
Ketan

From wilde at mcs.anl.gov Thu Oct 20 10:21:43 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 20 Oct 2011 10:21:43 -0500 (CDT)
Subject: [Swift-user] [Swift-devel] gram on ranger
In-Reply-To:
Message-ID: <988426934.113721.1319124103942.JavaMail.root@zimbra.anl.gov>

Thanks, Ketan.

If I understand you correctly, then I would consider this a Swift bug, in that maxnodes should always mean *nodes* for every type of resource provider, including SGE. Based on what you say, the SGE provider is in this case treating the requested maxnodes count as cores (assuming Anjali was running the same Swift revision as you were testing on here).

But then that might not explain the error in the log that Sarah posted.

It seems the next step is to try the run on a smaller job (we can test this ourselves) and see if we can replicate and diagnose the error, with SGE submit files and output/error logs.

David, can you do this, since you were working on SGE testing last week? You and Ketan should share what you know about the situation via swift-devel, as Ketan is also running on Ranger with persistent coasters, I think.
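To make the nodes-versus-cores distinction concrete, here is a minimal sketch in Python. It assumes 16 cores per Ranger node and that the second argument of the SGE "-pe" line counts cores (slots) rather than nodes; sge_pe_directive is a hypothetical helper for illustration only, not a function in Swift or its SGE provider.

```python
# Hypothetical helper, not part of Swift: it only illustrates the
# nodes-versus-cores arithmetic discussed in this thread.
CORES_PER_NODE = 16  # assumption: each Ranger compute node has 16 cores


def sge_pe_directive(max_nodes: int, pe: str = "16way",
                     cores_per_node: int = CORES_PER_NODE) -> str:
    """Build an SGE '-pe' line, treating max_nodes as a count of nodes."""
    if max_nodes < 1:
        raise ValueError("max_nodes must be at least 1")
    # The slot count on the '-pe' line is expressed in cores, so a node
    # count has to be multiplied by the cores available on each node.
    slots = max_nodes * cores_per_node
    return f"#$ -pe {pe} {slots}"


if __name__ == "__main__":
    # 16 nodes of 16 cores each
    print(sge_pe_directive(16))
    # a single node
    print(sge_pe_directive(1))
```

Under these assumptions, passing the node count through unchanged would request only one node's worth of cores when maxnodes is 16, which matches the behavior Ketan describes.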
Thanks,
Mike

----- Original Message -----
> From: "Ketan Maheshwari"
> To: "Michael Wilde"
> Cc: "Sarah Kenny", "Anjali Raja", "Swift Devel", "Swift User"
> Sent: Thursday, October 20, 2011 9:54:33 AM
> Subject: Re: [Swift-devel] [Swift-user] gram on ranger
>
> On Thu, Oct 20, 2011 at 7:50 AM, Michael Wilde < wilde at mcs.anl.gov > wrote:
>
> Hi Sarah, Anjali,
>
> My initial theory on what's failing in this job is that the Ranger
> development queue is limited to jobs of 16 nodes or less. (The Ranger
> User Guide says maxprocs 256 for that queue, and qconf -sq development
> says slots 16, which agrees). So you need to either change to one of
> the production queues (normal, long etc) or reduce the values of
> maxnode and nodegranularity.
>
> I have a little confusion here: the desired line in the final pbs
> script should be : #$ -pe way 256; in order to have 256 procs,
> however, putting maxnodes=16 on sites.xml results in the following
> line on pbs:
> #$ -pe way 16;
> I understand this number 16/256 is for procs since, when putting 256
> with development queue, ranger indeed allows the job to run in
> development queue.
>
> I would also suggest (unless you have already done this) that you test
> first on a very small run (like a single RInvoke app call) and then
> scale up to just a few voxels per dataset before trying such a large
> run. Have you already tested that?
>
> Lastly, when reporting problems like this, the swift standard
> output/err is also very helpful to get a higher-level view of what
> went wrong.
>
> Swift needs to clearly return errors from the local resource provider,
> which it doesn't seem to be doing here. I've filed this as bug 593 and
> assigned to David.
>
> Please let us know if changing the queue and/or slots resolves the
> problem. As mentioned in the bug report I think you can set debug=true
> (or yes?) in the provider-sge.properties file and get swift to
> preserve the output from SGE in ~/.globus/scripts. (In fact that may
> already be preserved, I am not sure). Please check there to see if the
> SGE error is there.
>
> Thanks,
>
> - Mike
>
> --
> Ketan

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From davidk at ci.uchicago.edu Thu Oct 20 10:37:59 2011
From: davidk at ci.uchicago.edu (David Kelly)
Date: Thu, 20 Oct 2011 10:37:59 -0500 (CDT)
Subject: [Swift-user] [Swift-devel] gram on ranger
In-Reply-To:
Message-ID: <459404377.154944.1319125079761.JavaMail.root@zimbra-mb2.anl.gov>

Ketan,

I think you're right - I believe that line should be:

#$ -pe

I'll add this and do a little more testing to see if I can reproduce Sarah's problem.

David

> I have a little confusion here: the desired line in the final pbs
> script should be : #$ -pe way 256; in order to have 256 procs,
> however, putting maxnodes=16 on sites.xml results in the following
> line on pbs:
> #$ -pe way 16;
> I understand this number 16/256 is for procs since, when putting 256
> with development queue, ranger indeed allows the job to run in
> development queue.
>
> I would also suggest (unless you have already done this) that you test
> first on a very small run (like a single RInvoke app call) and then
> scale up to just a few voxels per dataset before trying such a large
> run. Have you already tested that?
>
> Lastly, when reporting problems like this, the swift standard
> output/err is also very helpful to get a higher-level view of what
> went wrong.
>
> Swift needs to clearly return errors from the local resource provider,
> which it doesn't seem to be doing here. I've filed this as bug 593 and
> assigned to David.
>
> Please let us know if changing the queue and/or slots resolves the
> problem. As mentioned in the bug report I think you can set debug=true
> (or yes?) in the provider-sge.properties file and get swift to
> preserve the output from SGE in ~/.globus/scripts. (In fact that may
> already be preserved, I am not sure). Please check there to see if the
> SGE error is there.
>
> Thanks,
>
> - Mike

From ketancmaheshwari at gmail.com Thu Oct 20 10:44:11 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Thu, 20 Oct 2011 10:44:11 -0500
Subject: [Swift-user] [Swift-devel] gram on ranger
In-Reply-To: <459404377.154944.1319125079761.JavaMail.root@zimbra-mb2.anl.gov>
References: <459404377.154944.1319125079761.JavaMail.root@zimbra-mb2.anl.gov>
Message-ID:

David,

On Thu, Oct 20, 2011 at 10:37 AM, David Kelly wrote:

> Ketan,
>
> I think you're right - I believe that line should be:
>
> #$ -pe
>

I am not sure it should be maxnodes*nodegranularity, since nodegranularity means the nodes-to-be-packed per job. I think this should be maxnodes*corespernode, where corespernode could be a static constant (value 16) for Ranger. I could be wrong here.

Furthermore, it seems the parallel_environment tag of sites.xml is still not being honored. It always defaults to '1way', irrespective of the provided value. I am using 0.93RC3.

Maybe we can debug this together.

>
> I'll add this and do a little more testing to see if I can reproduce
> Sarah's problem.
> > David > > > I have a little confusion here: the desired line in the final pbs > > script should be : #$ -pe way 256; in order to have 256 procs, > > however, putting maxnodes=16 on sites.xml results in the following > > line on pbs: > > #$ -pe way 16; > > I understand this number 16/256 is for procs since, when putting 256 > > with development queue, ranger indeed allows the job to run in > > development queue. > > > > > > > > I would also suggest (unless you have already done this) that you test > > first on a very small run (like a single RInvoke app call) and then > > scale up to just a few voxels per dataset before trying such a large > > run. Have you already tested that? > > > > Lastly, when reporting problems like this, the swift standard > > output/err is also very helpful to get a higher-level view of what > > went wrong. > > > > Swift needs to clearly return errors from the local resource provider, > > which it doesnt seem to be doing here. Ive filed this as bug 593 and > > assigned to David. > > > > Please let us know if changing the queue and/or slots resolves the > > problem. As mentioned in the bug report I think you can set debug=true > > (or yes?) in the provider-sge.properties file and get swift to > > preserve the output from SGE in ~/.globus/scripts. (In fact that may > > already be preserved, I am not sure). Please check there to see if the > > SGE error is there. 
> > > > Thanks, > > > > - Mike > > > > > > > > ----- Original Message ----- > > > From: "Sarah Kenny" < skenny at uchicago.edu > > > > > > To: "Mihael Hategan" < hategan at mcs.anl.gov > > > > Cc: "Anjali Raja" < anjraja at gmail.com >, "Swift Devel" < > > > swift-devel at ci.uchicago.edu >, "Swift User" > > > < swift-user at ci.uchicago.edu > > > > Sent: Thursday, October 20, 2011 6:07:09 AM > > > Subject: Re: [Swift-devel] [Swift-user] gram on ranger > > > > > hi all, one of our users, anjali (cc'd here) is trying to submit > > > this > > > ~400k job workflow to ranger...thought i'd see if you felt like > > > having > > > a look :) > > > > > > log is here: > > > /home/skenny/swift_logs/corr_multisubj-20111018-1321-ihf8hz5g.log > > > > > > sites file: > > > > > > > > > > > > > > > > > > > > > > > > 7200 > > > 00:20:00 > > > 1 > > > 64 > > > 256 > > > development > > > 1.28 > > > TG-DBS080004N > > > 16way > > > 10000 > > > /work/00926/tg459516/swiftwork > > > > > > > > > > > > > > > On Wed, Oct 12, 2011 at 12:13 PM, Mihael Hategan < > > > hategan at mcs.anl.gov > > > > wrote: > > > > > > > > > > > > On Tue, 2011-10-11 at 17:13 -0700, Sarah Kenny wrote: > > > > > > > > > > > > On Tue, Oct 11, 2011 at 4:23 PM, Mihael Hategan < > > > > hategan at mcs.anl.gov > > > > > wrote: > > > > Is this with a persistent coaster service? > > > > > > > > admittedly i have not used persistent coaster service...should i? > > > > > > No. I was just trying to figure out whether it might be something > > > related to the persistent version. > > > > > > > > > > > > > > > > i feel like it's documented *somewhere* (?) 
> > > > > > > > for now i've tried setting 'sitedir.keep=true' in the config so > > > > maybe > > > > it won't try to run the cleanup job...we'll see (waiting in q) > > > > > > > > > > > > > > > > On Tue, 2011-10-11 at 12:05 -0700, Sarah Kenny wrote: > > > > > > > > > > > > > > > On Tue, Oct 11, 2011 at 11:49 AM, David Kelly > > > > < davidk at ci.uchicago.edu > > > > > > wrote: > > > > > > > > > > That could be it.. maybe a cleanup script is not > > > > getting the > > > > > right parameters and failing. Do you happen to have > > > > a copy of > > > > > the coaster log? > > > > > > > > > > just put it in /home/skenny/swift_logs > > > > > > > > > > > > > > > Maybe there will be some clues in there. > > > > > > > > > > ----- Original Message ----- > > > > > > From: "Sarah Kenny" < skenny at uchicago.edu > > > > > > > > > > > > To: "David Kelly" < davidk at ci.uchicago.edu > > > > > > > Cc: "Swift Devel" < swift-devel at ci.uchicago.edu >, > > > > "Swift > > > > > User" < swift-user at ci.uchicago.edu >, "Justin M > > > > Wozniak" > > > > > > < wozniak at mcs.anl.gov > > > > > > > > > > > > Sent: Tuesday, October 11, 2011 1:32:37 PM > > > > > > Subject: Re: [Swift-user] gram on ranger > > > > > > > > > > > so, this workflow completes all the jobs but then > > > > just hangs > > > > > > indefinitely at the end...maybe a stray cleanup > > > > job? 
> > > > > > > > > > > > log is here: > > > > > > > > > > > > > > > > /home/skenny/swift_logs/corr-20111010-2104-fl5yngd9.log > > > > > > > > > > > > just tweaked the sites file a bit from what david > > > > sent me: > > > > > > > > > > > > > > > > > > > > > > > > > > > url=" > > > > > > gatekeeper.ranger.tacc.teragrid.org "/> > > > > > > > > > > > > > > > > > > > > key="maxtime">28800 > > > > > > > > > > key="maxWallTime">00:15:00 > > > > > > > > > key="jobsPerNode">1 > > > > > > > > > > key="nodeGranularity">64 > > > > > > > > > key="maxNodes">256 > > > > > > > > > key="queue">normal > > > > > > > > > key="jobThrottle">1 > > > > > > > > > > key="project">TG-DBS080004N > > > > > > > > > key="pe">16way > > > > > > > > > > key="initialScore">10000 > > > > > > > > > > > > > > > > /work/00043/tg457040/sidgrid_out/skenny > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Oct 10, 2011 at 3:43 PM, Sarah Kenny < > > > > > skenny at uchicago.edu > > > > > > > wrote: > > > > > > > > > > > > > > > > > > ok, thanks, got in the queue now...also, realized > > > > my last > > > > > run may have > > > > > > been using the old swift. apparently i had > > > > SWIFT_HOME set in > > > > > my env > > > > > > and that overrides the newer swift i had set in my > > > > PATH. > > > > > > > > > > > > ~sk > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Oct 10, 2011 at 12:28 PM, David Kelly < > > > > > davidk at ci.uchicago.edu > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Sarah, > > > > > > > > > > > > Can you give this another try with the latest > > > > 0.93? I made > > > > > some > > > > > > changes to the coaster and sge providers and was > > > > able to get > > > > > it > > > > > > working with a simple catns script. 
Here is the > > > > > configuration file I > > > > > > was using: > > > > > > > > > > > > > > > > > > > > > > > > > > > url=" > > > > > > gatekeeper.ranger.tacc.teragrid.org "/> > > > > > > > > > > > > > > > > > > > > > > > > > > key="maxtime">3600 > > > > > > > > > > key="maxWallTime">00:00:03 > > > > > > > > > key="jobsPerNode">1 > > > > > > > > > > key="nodeGranularity">16 > > > > > > > > > key="maxNodes">16 > > > > > > > > > > key="queue">development > > > > > > > > > key="jobThrottle">0.9 > > > > > > > > > > > > > > > > key="project">TG-DBS080004N > > > > > > > > > > > > > > > key="pe">16way > > > > > > > > > > > > > > > /share/home/01503/davidkel/swiftwork > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > David > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > From: "Sarah Kenny" < skenny at uchicago.edu > > > > > > > > To: "Justin M Wozniak" < wozniak at mcs.anl.gov > > > > > > > > Cc: "Swift Devel" < swift-devel at ci.uchicago.edu > > > > >, "Swift > > > > > User" < > > > > > > > swift-user at ci.uchicago.edu > > > > > > > > > > > > > > > > > > > > > > > > > > Sent: Friday, October 7, 2011 3:13:57 PM > > > > > > > Subject: Re: [Swift-user] gram on ranger > > > > > > > > > > > /home/skenny/swift_logs/dummy-20111005-0126-6575n7x5.log > > > > > > > > > > > > > > on ci > > > > > > > > > > > > > > > > > > > > > On Fri, Oct 7, 2011 at 8:16 AM, Justin M Wozniak > > > > < > > > > > > > wozniak at mcs.anl.gov > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > Can I take a look at the log? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, 6 Oct 2011, Sarah Kenny wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > hey all, i'm trying to submit to gram on ranger > > > > using the > > > > > latest > > > > > > > swift > > > > > > > (built from trunk). 
it failes like so: > > > > > > > > > > > > > > Cannot submit job > > > > > > > Caused by: > > > > > > > org.globus.cog.abstraction. impl.common.task. > > > > > > > TaskSubmissionException: > > > > > > > Cannot > > > > > > > submit job > > > > > > > Caused by: org.globus.gram.GramException: > > > > Parameter not > > > > > supported > > > > > > > Cannot submit job > > > > > > > > > > > > > > the gram log was saying first that 'jobsPerNode' > > > > is not > > > > > supported so > > > > > > > i > > > > > > > changed it to workersPerNode and then it was > > > > saying > > > > > 'maxnodes' is > > > > > > > not > > > > > > > supported. here's my sites file: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > key="initialScore">10000 > > > > profile> > > > > > > > > > > key="jobThrottle">1 > > > > > > > > > > key="maxWallTime">00:15:00 > > > > profile> > > > > > > > > > > key="maxTime">86400 > > > > > > > > > > key="slots">1 > > > > > > > > > > key="maxNodes">256 > > > > > > > > > > key="pe">16way > > > > > > > > > > key="workersPerNode">1 > > > > profile> > > > > > > > > > > key="nodeGranularity">64 > > > > profile> > > > > > > > > > > key="queue">normal > > > > > > > > > > key="project">TG-DBS080004N > > > > profile> > > > > > > > > > > > > > > > > > > > > > > > jobManager="gt2:gt2:SGE" > > > > > url=" > > > > > > > gatekeeper.ranger.tacc. teragrid.org "/> > > > > > > > > > > > > > > > > > > > > /work/00043/ > > > > tg457040 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > thoughts? ideas? > > > > > > > > > > > > > > -- > > > > > > > Justin M Wozniak > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Sarah Kenny > > > > > > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 > > > > Bio Sci > > > > > III > > > > > > > University of California Irvine, Dept. 
of > > > > Neurology ~ > > > > > 773-818-8300 > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > Swift-user mailing list > > > > > > > Swift-user at ci.uchicago.edu > > > > > > > > > > > > > > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Sarah Kenny > > > > > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 > > > > Bio Sci III > > > > > > University of California Irvine, Dept. of > > > > Neurology ~ > > > > > 773-818-8300 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Sarah Kenny > > > > > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 > > > > Bio Sci III > > > > > > University of California Irvine, Dept. of > > > > Neurology ~ > > > > > 773-818-8300 > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > Sarah Kenny > > > > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III > > > > > University of California Irvine, Dept. of Neurology ~ > > > > 773-818-8300 > > > > > > > > > > _______________________________________________ > > > > > Swift-user mailing list > > > > > Swift-user at ci.uchicago.edu > > > > > > > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Sarah Kenny > > > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III > > > > University of California Irvine, Dept. of Neurology ~ 773-818-8300 > > > > > > > > > > > > > > > > > > > > > > -- > > > Sarah Kenny > > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III > > > University of California Irvine, Dept. 
of Neurology ~ 773-818-8300
> > >
> > > _______________________________________________
> > > Swift-devel mailing list
> > > Swift-devel at ci.uchicago.edu
> > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> >
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> >
> > --
> > Ketan
>

--
Ketan

From wilde at mcs.anl.gov Thu Oct 20 10:55:52 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 20 Oct 2011 10:55:52 -0500 (CDT)
Subject: [Swift-user] [Swift-devel] gram on ranger
In-Reply-To:
Message-ID: <525723894.113958.1319126152521.JavaMail.root@zimbra.anl.gov>

That second number on the #$ -pe directive should be the total # of cores that the provider wants in the job, and on Ranger with the 16way pe must be a multiple of 16.

Coasters will request up to "maxNodes" nodes in a given SGE job, and the number requested will always be a multiple of "nodeGranularity". But the number requested can be < maxNodes, depending on how many app() invocations the coaster provider/scheduler has decided to put into a coaster Block.

Similarly, the time requested for the block can be < maxTime (but the "overallocation" parameters can influence that and force all blocks to be maxTime seconds "wide").
- Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From hategan at mcs.anl.gov Thu Oct 20 12:35:34 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 20 Oct 2011 10:35:34 -0700
Subject: [Swift-user] [Swift-devel] gram on ranger
In-Reply-To: <525723894.113958.1319126152521.JavaMail.root@zimbra.anl.gov>
References: <525723894.113958.1319126152521.JavaMail.root@zimbra.anl.gov>
Message-ID: <1319132134.16281.0.camel@blabla>

On Thu, 2011-10-20 at 10:55 -0500, Michael Wilde wrote:
> That second number on the #$ -pe directive should be the total # of
> cores that the provider wants in the job, and on Ranger with the 16way
> pe must be a multiple of 16.

I think it always has to be a multiple of 16 on Ranger, regardless of pe.
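
The sizing rules described in the two messages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the coaster or SGE provider code: the function names are hypothetical, 16 cores per Ranger node is taken from the thread, and the `#$ -pe <pe> <total cores>` layout follows Michael's description of the second field.

```python
CORES_PER_NODE = 16  # assumption from the thread: Ranger nodes have 16 cores


def valid_node_counts(node_granularity, max_nodes):
    """Node counts a block may request: multiples of node_granularity, at most max_nodes."""
    if node_granularity < 1 or max_nodes < node_granularity:
        raise ValueError("need 1 <= node_granularity <= max_nodes")
    return list(range(node_granularity, max_nodes + 1, node_granularity))


def pe_directive(nodes, pe="16way", cores_per_node=CORES_PER_NODE):
    """Build the SGE directive; the second field is total cores = nodes * cores_per_node."""
    if nodes < 1:
        raise ValueError("nodes must be positive")
    return "#$ -pe {} {}".format(pe, nodes * cores_per_node)


if __name__ == "__main__":
    # Hypothetical settings matching one of the sites files in this thread:
    # nodeGranularity=64, maxNodes=256.
    for nodes in valid_node_counts(64, 256):
        print(pe_directive(nodes))
```

Because the core count is always nodes times 16 in this sketch, it is a multiple of 16 whatever pe string is passed, which matches the constraint noted above.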
From hategan at mcs.anl.gov Thu Oct 20 12:42:07 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 20 Oct 2011 10:42:07 -0700
Subject: [Swift-user] [Swift-devel] gram on ranger
In-Reply-To:
References: <459404377.154944.1319125079761.JavaMail.root@zimbra-mb2.anl.gov>
Message-ID: <1319132527.16281.4.camel@blabla>

On Thu, 2011-10-20 at 10:44 -0500, Ketan Maheshwari wrote:
> David,
>
>
> On Thu, Oct 20, 2011 at 10:37 AM, David Kelly
> wrote:
> Ketan,
>
> I think you're right - I believe that line should be:
>
> #$ -pe
>
>
> I am not sure, it should be maxnodes*nodegranularity, since
> nodegranularity means the nodes-to-be-packed per job. I think, this
> should be,
> maxnodes*corespernode, which could be a static constant (value 16) for
> ranger. I could be wrong here, not sure.

I agree that maxnodes * nodegranularity does not make much sense since that could cause a job to request more than maxnodes nodes.

I think this problem goes back to the idea that providers should interpret "count" as "the number of instances of the executable that I want started" and other parameters should dictate how exactly that count is spread over nodes and cores.

From davidk at ci.uchicago.edu Thu Oct 20 16:06:15 2011
From: davidk at ci.uchicago.edu (David Kelly)
Date: Thu, 20 Oct 2011 16:06:15 -0500 (CDT)
Subject: [Swift-user] [Swift-devel] gram on ranger
In-Reply-To: <1319132527.16281.4.camel@blabla>
Message-ID: <188408570.155505.1319144775904.JavaMail.root@zimbra-mb2.anl.gov>

> I think this problem goes back to the idea that providers should
> interpret "count" as "the number of instances of the executable that I
> want started" and other parameters should dictate how exactly that
> count is spread over nodes and cores.

Mihael,

In this setup I am using nodeGranularity=16, jobsPerNode=16, and maxNodes=16. The SGE submit file could request anywhere between 16 and 256 cores in multiples of 16.

When I run catsn with -n=2, count is 16.
When I run catsn with -n=20, two SGE submit scripts get created, each with count=16.

Should count=32 in the second case? Am I misunderstanding what 'count' is? Is there any way to get the exact number of applications?

Thanks,
David

From hategan at mcs.anl.gov Thu Oct 20 20:49:46 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 20 Oct 2011 18:49:46 -0700
Subject: [Swift-user] [Swift-devel] gram on ranger
In-Reply-To: <188408570.155505.1319144775904.JavaMail.root@zimbra-mb2.anl.gov>
References: <188408570.155505.1319144775904.JavaMail.root@zimbra-mb2.anl.gov>
Message-ID: <1319161786.17554.0.camel@blabla>

On Thu, 2011-10-20 at 16:06 -0500, David Kelly wrote:
>
> > I think this problem goes back to the idea that providers should
> > interpret "count" as "the number of instances of the executable that I
> > want started" and other parameters should dictate how exactly that
> > count is spread over nodes and cores.
>
> Mihael,
>
> In this setup I am using nodeGranularity=16, jobsPerNode=16, and
> maxNodes=16. The SGE submit file could request anywhere between 16 and
> 256 cores in multiples of 16.
>
> When I run catsn with -n=2, count is 16.
> When I run catsn with -n=20, two SGE submit scripts get created, each with count=16.
>
> Should count=32 in the second case? Am I misunderstanding what 'count' is? Is there any way to get the exact number of applications?

Coasters?
From davidk at ci.uchicago.edu Thu Oct 20 21:03:46 2011
From: davidk at ci.uchicago.edu (David Kelly)
Date: Thu, 20 Oct 2011 21:03:46 -0500 (CDT)
Subject: [Swift-user] [Swift-devel] gram on ranger
In-Reply-To: <1319161786.17554.0.camel@blabla>
Message-ID: <1584784930.155805.1319162626982.JavaMail.root@zimbra-mb2.anl.gov>

Yep, this is using coasters

----- Original Message -----
> From: "Mihael Hategan"
> To: "David Kelly"
> Cc: "Anjali Raja" , "Swift Devel" , "Swift User"
> , "Ketan Maheshwari"
> Sent: Thursday, October 20, 2011 8:49:46 PM
> Subject: Re: [Swift-devel] [Swift-user] gram on ranger
> On Thu, 2011-10-20 at 16:06 -0500, David Kelly wrote:
> >
> > > I think this problem goes back to the idea that providers should
> > > interpret "count" as "the number of instances of the executable
> > > that I
> > > want started" and other parameters should dictate how exactly that
> > > count is spread over nodes and cores.
> >
> > Mihael,
> >
> > In this setup I am using nodeGranularity=16, jobsPerNode=16, and
> > maxNodes=16. The SGE submit file could request anywhere between 16
> > and
> > 256 cores in multiples of 16.
> >
> > When I run catsn with -n=2, count is 16.
> > When I run catsn with -n=20, two SGE submit scripts get created,
> > each with count=16.
> >
> > Should count=32 in the second case? Am I misunderstanding what
> > 'count' is? Is there any way to get the exact number of
> > applications?
>
> Coasters?

From hategan at mcs.anl.gov Thu Oct 20 21:08:46 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 20 Oct 2011 19:08:46 -0700
Subject: [Swift-user] [Swift-devel] gram on ranger
In-Reply-To: <1584784930.155805.1319162626982.JavaMail.root@zimbra-mb2.anl.gov>
References: <1584784930.155805.1319162626982.JavaMail.root@zimbra-mb2.anl.gov>
Message-ID: <1319162926.21652.2.camel@blabla>

On Thu, 2011-10-20 at 21:03 -0500, David Kelly wrote:
> Yep, this is using coasters
>

Then no. Count is whatever the block allocation algorithm decides it should be.
> > >
> > > Should count=32 in the second case? Am I misunderstanding what
> > > 'count' is? Is there any way to get the exact number of
> > > applications?
> >
> > Coasters?

From wilde at mcs.anl.gov Fri Oct 21 14:33:48 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Fri, 21 Oct 2011 14:33:48 -0500 (CDT)
Subject: [Swift-user] app argument int passed as float
In-Reply-To:
Message-ID: <1624852911.120710.1319225628694.JavaMail.root@zimbra.anl.gov>

In at least some recent version of trunk, ints seem to get passed ok,
as below.

Now let's see where the difference is.

- Mike

bri$ cat args.swift
type file;

app (file o) echo(int i)
{
  echo i stdout=@o;
}

file eo<"echo.out">;

eo=echo(123);
bri$ which swift
~/swift/rev/trunk/bin/swift
bri$ swift -version
no sites file specified, setting to default: /home/wilde/swift/rev/trunk/etc/sites.xml
Swift svn swift-r5234 cog-r3296

bri$ swift args.swift
no sites file specified, setting to default: /home/wilde/swift/rev/trunk/etc/sites.xml
Swift svn swift-r5234 cog-r3296

RunID: 20111021-1431-8mzx0ya2
Progress: time: Fri, 21 Oct 2011 14:31:04 -0500
Final status: Fri, 21 Oct 2011 14:31:04 -0500 Finished successfully:1
bri$ cat echo.out
123
bri$

----- Original Message -----
> From: "Ketan Maheshwari"
> To: "Swift User"
> Sent: Wednesday, October 19, 2011 8:30:24 PM
> Subject: [Swift-user] app argument int passed as float
> Hi,
>
>
> In my app definition, I have an argument of type int that is expected
> by the executable.
>
>
>
> I define my app something like this:
>
>
> (outfile o ) app myapp(string a, int b, datafile c, ...){
> cmd a b c stdout @o;
> }
>
>
> I call this as:
> <..mappers..>
>
>
> out = myapp("foo", 35, @afile, ...);
>
>
> However, it seems that at the time of the actual call, the value 35
> which is an int get replaced by the value 35.0, a float.
>
>
> I could see this from the arguments dumped by swift at the end of
> exception message:
>
>
> "
>
> Exception in presgt:
> Arguments: [TEST, 35.0, .....
> "
>
>
> The underlying app binary doesn't like because apparently the java is
> calling Integer.parseInt and getting a float formatted string.
>
> Any ideas?
>
>
> Regards, --
> Ketan
>
>
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov Fri Oct 21 14:38:33 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Fri, 21 Oct 2011 14:38:33 -0500 (CDT)
Subject: [Swift-user] app argument int passed as float
In-Reply-To: <1624852911.120710.1319225628694.JavaMail.root@zimbra.anl.gov>
Message-ID: <1255542536.120750.1319225913780.JavaMail.root@zimbra.anl.gov>

I tried this with an example closer to what you wrote in the initial
message, Ketan.

It still seems to work OK under the trunk rev indicated:

bri$ cat km1.swift
type outfile;

app (outfile o ) myapp(string a, int b ){
  echo a b stdout=@o;
}

outfile out<"km1.out">;

out = myapp("foo", 35 );

bri$ swift km1.swift

no sites file specified, setting to default: /home/wilde/swift/rev/trunk/etc/sites.xml
Swift svn swift-r5234 cog-r3296

RunID: 20111021-1436-z7s2z5rb
Progress: time: Fri, 21 Oct 2011 14:37:01 -0500
Final status: Fri, 21 Oct 2011 14:37:01 -0500 Finished successfully:1
bri$ cat km1.out
foo 35
bri$

- Mike

----- Original Message -----
> From: "Michael Wilde"
> To: "Ketan Maheshwari"
> Cc: "Swift User"
> Sent: Friday, October 21, 2011 2:33:48 PM
> Subject: Re: [Swift-user] app argument int passed as float
> In at least some recent version of trunk, ints seem to get passed ok,
> as below.
>
> Now lets see where the difference is.
> > - Mike > > bri$ cat args.swift > type file; > > app (file o) echo(int i) > { > echo i stdout=@o; > } > > file eo<"echo.out">; > > eo=echo(123); > bri$ which swift > ~/swift/rev/trunk/bin/swift > bri$ swift -version > no sites file specified, setting to default: > /home/wilde/swift/rev/trunk/etc/sites.xml > Swift svn swift-r5234 cog-r3296 > > bri$ swift args.swift > no sites file specified, setting to default: > /home/wilde/swift/rev/trunk/etc/sites.xml > Swift svn swift-r5234 cog-r3296 > > RunID: 20111021-1431-8mzx0ya2 > Progress: time: Fri, 21 Oct 2011 14:31:04 -0500 > Final status: Fri, 21 Oct 2011 14:31:04 -0500 Finished successfully:1 > bri$ cat echo.out > 123 > bri$ > > > > ----- Original Message ----- > > From: "Ketan Maheshwari" > > To: "Swift User" > > Sent: Wednesday, October 19, 2011 8:30:24 PM > > Subject: [Swift-user] app argument int passed as float > > Hi, > > > > > > In my app definition, I have an argument of type int that is > > expected > > by the executable. > > > > > > > > I define my app something like this: > > > > > > (outfile o ) app myapp(string a, int b, datafile c, ...){ > > cmd a b c stdout @o; > > } > > > > > > I call this as: > > <..mappers..> > > > > > > out = myapp("foo", 35, @afile, ...); > > > > > > However, it seems that at the time of the actual call, the value 35 > > which is an int get replaced by the value 35.0, a float. > > > > > > I could see this from the arguments dumped by swift at the end of > > exception message: > > > > > > " > > > > Exception in presgt: > > Arguments: [TEST, 35.0, ..... > > " > > > > > > The underlying app binary doesn't like because apparently the java > > is > > calling Integer.parseInt and getting a float formatted string. > > > > Any ideas? 
> >
> >
> > Regards, --
> > Ketan
> >
> >
> >
> > _______________________________________________
> > Swift-user mailing list
> > Swift-user at ci.uchicago.edu
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From ketancmaheshwari at gmail.com Fri Oct 21 15:10:13 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Fri, 21 Oct 2011 15:10:13 -0500
Subject: [Swift-user] app argument int passed as float
In-Reply-To: <1255542536.120750.1319225913780.JavaMail.root@zimbra.anl.gov>
References: <1624852911.120710.1319225628694.JavaMail.root@zimbra.anl.gov>
 <1255542536.120750.1319225913780.JavaMail.root@zimbra.anl.gov>
Message-ID:

Mike,

Indeed the output seems to be as expected. I just tried this and worked for
me. However, if you see the log, at the point where the arguments are passed
to the worker it gets converted to float. See this line in the log of this
run:

2011-10-21 15:05:03,291-0500 DEBUG vdl:execute2 JOB_START jobid=echo-suhonmhk tr=echo arguments=[foo, 35.0] tmpdir=km-20111021-1504-f4bc0xg4/jobs/s/echo-suhonmhk host=localhost

35 became 35.0 there.

Regards,
Ketan

On Fri, Oct 21, 2011 at 2:38 PM, Michael Wilde wrote:
> I tried this with an example closer to what you wrote in the initial
> message, Ketan.
> > It still seems to work OK under the trunk rev indicated: > > bri$ cat km1.swift > type outfile; > > app (outfile o ) myapp(string a, int b ){ > echo a b stdout=@o; > } > > outfile out<"km1.out">; > > out = myapp("foo", 35 ); > > bri$ swift km1.swift > > no sites file specified, setting to default: > /home/wilde/swift/rev/trunk/etc/sites.xml > Swift svn swift-r5234 cog-r3296 > > RunID: 20111021-1436-z7s2z5rb > Progress: time: Fri, 21 Oct 2011 14:37:01 -0500 > Final status: Fri, 21 Oct 2011 14:37:01 -0500 Finished successfully:1 > bri$ cat km1.out > foo 35 > bri$ > > - Mike > > > ----- Original Message ----- > > From: "Michael Wilde" > > To: "Ketan Maheshwari" > > Cc: "Swift User" > > Sent: Friday, October 21, 2011 2:33:48 PM > > Subject: Re: [Swift-user] app argument int passed as float > > In at least some recent version of trunk, ints seem to get passed ok, > > as below. > > > > Now lets see where the difference is. > > > > - Mike > > > > bri$ cat args.swift > > type file; > > > > app (file o) echo(int i) > > { > > echo i stdout=@o; > > } > > > > file eo<"echo.out">; > > > > eo=echo(123); > > bri$ which swift > > ~/swift/rev/trunk/bin/swift > > bri$ swift -version > > no sites file specified, setting to default: > > /home/wilde/swift/rev/trunk/etc/sites.xml > > Swift svn swift-r5234 cog-r3296 > > > > bri$ swift args.swift > > no sites file specified, setting to default: > > /home/wilde/swift/rev/trunk/etc/sites.xml > > Swift svn swift-r5234 cog-r3296 > > > > RunID: 20111021-1431-8mzx0ya2 > > Progress: time: Fri, 21 Oct 2011 14:31:04 -0500 > > Final status: Fri, 21 Oct 2011 14:31:04 -0500 Finished successfully:1 > > bri$ cat echo.out > > 123 > > bri$ > > > > > > > > ----- Original Message ----- > > > From: "Ketan Maheshwari" > > > To: "Swift User" > > > Sent: Wednesday, October 19, 2011 8:30:24 PM > > > Subject: [Swift-user] app argument int passed as float > > > Hi, > > > > > > > > > In my app definition, I have an argument of type int that is > > > expected > 
> > by the executable.
> > >
> > >
> > >
> > > I define my app something like this:
> > >
> > >
> > > (outfile o ) app myapp(string a, int b, datafile c, ...){
> > > cmd a b c stdout @o;
> > > }
> > >
> > >
> > > I call this as:
> > > <..mappers..>
> > >
> > >
> > > out = myapp("foo", 35, @afile, ...);
> > >
> > >
> > > However, it seems that at the time of the actual call, the value 35
> > > which is an int get replaced by the value 35.0, a float.
> > >
> > >
> > > I could see this from the arguments dumped by swift at the end of
> > > exception message:
> > >
> > >
> > > "
> > >
> > > Exception in presgt:
> > > Arguments: [TEST, 35.0, .....
> > > "
> > >
> > >
> > > The underlying app binary doesn't like because apparently the java
> > > is
> > > calling Integer.parseInt and getting a float formatted string.
> > >
> > > Any ideas?
> > >
> > >
> > > Regards, --
> > > Ketan
> > >
> > >
> > >
> > > _______________________________________________
> > > Swift-user mailing list
> > > Swift-user at ci.uchicago.edu
> > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
> >
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
>

-- 
Ketan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ketancmaheshwari at gmail.com Fri Oct 21 16:51:35 2011
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Fri, 21 Oct 2011 16:51:35 -0500
Subject: [Swift-user] app argument int passed as float
In-Reply-To:
References: <1624852911.120710.1319225628694.JavaMail.root@zimbra.anl.gov>
 <1255542536.120750.1319225913780.JavaMail.root@zimbra.anl.gov>
Message-ID:

I checked with trunk and this doesn't seem to be happening there. Will
file a bug for 0.93.
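The symptom reported in this thread (an int argument showing up as 35.0 in the job's argument list, and a strict integer parser on the application side rejecting it) can be reproduced in miniature. The sketch below is a Python analog written for illustration only; it is not Swift's Java code, and the helper name is hypothetical.

```python
def render_argument(value):
    """Hypothetical helper: turn an argument value into its command-line string."""
    return str(value)


# An integer that is held internally as a floating-point number.
stored = float(35)

# The rendered form carries a trailing ".0", as in "arguments=[foo, 35.0]".
rendered = render_argument(stored)
print(rendered)

# A strict integer parse of that string fails, which is how an application
# expecting an int would see the problem.
try:
    int(rendered)
    parsed_ok = True
except ValueError:
    parsed_ok = False
print(parsed_ok)

# Keeping the value as an int end to end avoids the issue.
print(render_argument(35))
```

A tolerant application could parse the string as a float first and convert afterwards, but that only masks the representation problem on the sending side.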
On Fri, Oct 21, 2011 at 3:10 PM, Ketan Maheshwari < ketancmaheshwari at gmail.com> wrote: > Mike, > > Indeed the output seems to be as expected. I just tried this and worked for > me. However, if you see the log, at the point where the arguments are passed > to the worker it gets converted to float. See this line in the log of this > run: > > 2011-10-21 15:05:03,291-0500 DEBUG vdl:execute2 JOB_START > jobid=echo-suhonmhk tr=echo arguments=[foo, 35.0] > tmpdir=km-20111021-1504-f4bc0xg4/jobs/s/echo-suhonmhk host=localhost > > 35 became 35.0 there. > > Regards, > Ketan > > On Fri, Oct 21, 2011 at 2:38 PM, Michael Wilde wrote: > >> I tried this with an example closer to what you wrote in the initial >> message, Ketan. >> >> It still seems to work OK under the trunk rev indicated: >> >> bri$ cat km1.swift >> type outfile; >> >> app (outfile o ) myapp(string a, int b ){ >> echo a b stdout=@o; >> } >> >> outfile out<"km1.out">; >> >> out = myapp("foo", 35 ); >> >> bri$ swift km1.swift >> >> no sites file specified, setting to default: >> /home/wilde/swift/rev/trunk/etc/sites.xml >> Swift svn swift-r5234 cog-r3296 >> >> RunID: 20111021-1436-z7s2z5rb >> Progress: time: Fri, 21 Oct 2011 14:37:01 -0500 >> Final status: Fri, 21 Oct 2011 14:37:01 -0500 Finished successfully:1 >> bri$ cat km1.out >> foo 35 >> bri$ >> >> - Mike >> >> >> ----- Original Message ----- >> > From: "Michael Wilde" >> > To: "Ketan Maheshwari" >> > Cc: "Swift User" >> > Sent: Friday, October 21, 2011 2:33:48 PM >> > Subject: Re: [Swift-user] app argument int passed as float >> > In at least some recent version of trunk, ints seem to get passed ok, >> > as below. >> > >> > Now lets see where the difference is. 
>> > >> > - Mike >> > >> > bri$ cat args.swift >> > type file; >> > >> > app (file o) echo(int i) >> > { >> > echo i stdout=@o; >> > } >> > >> > file eo<"echo.out">; >> > >> > eo=echo(123); >> > bri$ which swift >> > ~/swift/rev/trunk/bin/swift >> > bri$ swift -version >> > no sites file specified, setting to default: >> > /home/wilde/swift/rev/trunk/etc/sites.xml >> > Swift svn swift-r5234 cog-r3296 >> > >> > bri$ swift args.swift >> > no sites file specified, setting to default: >> > /home/wilde/swift/rev/trunk/etc/sites.xml >> > Swift svn swift-r5234 cog-r3296 >> > >> > RunID: 20111021-1431-8mzx0ya2 >> > Progress: time: Fri, 21 Oct 2011 14:31:04 -0500 >> > Final status: Fri, 21 Oct 2011 14:31:04 -0500 Finished successfully:1 >> > bri$ cat echo.out >> > 123 >> > bri$ >> > >> > >> > >> > ----- Original Message ----- >> > > From: "Ketan Maheshwari" >> > > To: "Swift User" >> > > Sent: Wednesday, October 19, 2011 8:30:24 PM >> > > Subject: [Swift-user] app argument int passed as float >> > > Hi, >> > > >> > > >> > > In my app definition, I have an argument of type int that is >> > > expected >> > > by the executable. >> > > >> > > >> > > >> > > I define my app something like this: >> > > >> > > >> > > (outfile o ) app myapp(string a, int b, datafile c, ...){ >> > > cmd a b c stdout @o; >> > > } >> > > >> > > >> > > I call this as: >> > > <..mappers..> >> > > >> > > >> > > out = myapp("foo", 35, @afile, ...); >> > > >> > > >> > > However, it seems that at the time of the actual call, the value 35 >> > > which is an int get replaced by the value 35.0, a float. >> > > >> > > >> > > I could see this from the arguments dumped by swift at the end of >> > > exception message: >> > > >> > > >> > > " >> > > >> > > Exception in presgt: >> > > Arguments: [TEST, 35.0, ..... >> > > " >> > > >> > > >> > > The underlying app binary doesn't like because apparently the java >> > > is >> > > calling Integer.parseInt and getting a float formatted string. >> > > >> > > Any ideas? 
>> > >
>> > >
>> > > Regards, --
>> > > Ketan
>> > >
>> > >
>> > >
>> > > _______________________________________________
>> > > Swift-user mailing list
>> > > Swift-user at ci.uchicago.edu
>> > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
>> >
>> > --
>> > Michael Wilde
>> > Computation Institute, University of Chicago
>> > Mathematics and Computer Science Division
>> > Argonne National Laboratory
>>
>> --
>> Michael Wilde
>> Computation Institute, University of Chicago
>> Mathematics and Computer Science Division
>> Argonne National Laboratory
>>
>>
>
>
> --
> Ketan
>
>
>

-- 
Ketan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hategan at mcs.anl.gov Fri Oct 21 17:01:09 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 21 Oct 2011 15:01:09 -0700
Subject: [Swift-user] app argument int passed as float
In-Reply-To:
References: <1624852911.120710.1319225628694.JavaMail.root@zimbra.anl.gov>
 <1255542536.120750.1319225913780.JavaMail.root@zimbra.anl.gov>
Message-ID: <1319234469.17720.2.camel@blabla>

This has been traditionally an issue with swift. Ints were internally
represented as a java Double. I fixed that in trunk this summer, but it
was after 0.93 was branched.

I'm not sure whether this should be a bug or a known issue in 0.93.

On Fri, 2011-10-21 at 16:51 -0500, Ketan Maheshwari wrote:
> I checked with trunk and this doesn't seem to be happening there. Will
> file a bug for 0.93.
>
> On Fri, Oct 21, 2011 at 3:10 PM, Ketan Maheshwari
> wrote:
> Mike,
>
>
> Indeed the output seems to be as expected. I just tried this
> and worked for me. However, if you see the log, at the point
> where the arguments are passed to the worker it gets converted
> to float.
See this line in the log of this run: > > > 2011-10-21 15:05:03,291-0500 DEBUG vdl:execute2 JOB_START > jobid=echo-suhonmhk tr=echo arguments=[foo, 35.0] > tmpdir=km-20111021-1504-f4bc0xg4/jobs/s/echo-suhonmhk > host=localhost > > > 35 became 35.0 there. > > Regards, > Ketan > > > On Fri, Oct 21, 2011 at 2:38 PM, Michael Wilde > wrote: > I tried this with an example closer to what you wrote > in the initial message, Ketan. > > It still seems to work OK under the trunk rev > indicated: > > bri$ cat km1.swift > type outfile; > > app (outfile o ) myapp(string a, int b ){ > echo a b stdout=@o; > } > > outfile out<"km1.out">; > > out = myapp("foo", 35 ); > > bri$ swift km1.swift > > no sites file specified, setting to > default: /home/wilde/swift/rev/trunk/etc/sites.xml > Swift svn swift-r5234 cog-r3296 > > > RunID: 20111021-1436-z7s2z5rb > Progress: time: Fri, 21 Oct 2011 14:37:01 -0500 > Final status: Fri, 21 Oct 2011 14:37:01 -0500 > Finished successfully:1 > bri$ cat km1.out > foo 35 > bri$ > > - Mike > > > > ----- Original Message ----- > > From: "Michael Wilde" > > To: "Ketan Maheshwari" > > Cc: "Swift User" > > Sent: Friday, October 21, 2011 2:33:48 PM > > Subject: Re: [Swift-user] app argument int passed as > float > > In at least some recent version of trunk, ints seem > to get passed ok, > > as below. > > > > Now lets see where the difference is. 
> > > > - Mike > > > > bri$ cat args.swift > > type file; > > > > app (file o) echo(int i) > > { > > echo i stdout=@o; > > } > > > > file eo<"echo.out">; > > > > eo=echo(123); > > bri$ which swift > > ~/swift/rev/trunk/bin/swift > > bri$ swift -version > > no sites file specified, setting to default: > > /home/wilde/swift/rev/trunk/etc/sites.xml > > Swift svn swift-r5234 cog-r3296 > > > > bri$ swift args.swift > > no sites file specified, setting to default: > > /home/wilde/swift/rev/trunk/etc/sites.xml > > Swift svn swift-r5234 cog-r3296 > > > > RunID: 20111021-1431-8mzx0ya2 > > Progress: time: Fri, 21 Oct 2011 14:31:04 -0500 > > Final status: Fri, 21 Oct 2011 14:31:04 -0500 > Finished successfully:1 > > bri$ cat echo.out > > 123 > > bri$ > > > > > > > > ----- Original Message ----- > > > From: "Ketan Maheshwari" > > > > To: "Swift User" > > > Sent: Wednesday, October 19, 2011 8:30:24 PM > > > Subject: [Swift-user] app argument int passed as > float > > > Hi, > > > > > > > > > In my app definition, I have an argument of type > int that is > > > expected > > > by the executable. > > > > > > > > > > > > I define my app something like this: > > > > > > > > > (outfile o ) app myapp(string a, int b, datafile > c, ...){ > > > cmd a b c stdout @o; > > > } > > > > > > > > > I call this as: > > > <..mappers..> > > > > > > > > > out = myapp("foo", 35, @afile, ...); > > > > > > > > > However, it seems that at the time of the actual > call, the value 35 > > > which is an int get replaced by the value 35.0, a > float. > > > > > > > > > I could see this from the arguments dumped by > swift at the end of > > > exception message: > > > > > > > > > " > > > > > > Exception in presgt: > > > Arguments: [TEST, 35.0, ..... > > > " > > > > > > > > > The underlying app binary doesn't like because > apparently the java > > > is > > > calling Integer.parseInt and getting a float > formatted string. > > > > > > Any ideas? 
> > >
> > >
> > > Regards, --
> > > Ketan
> > >
> > >
> > >
> > > _______________________________________________
> > > Swift-user mailing list
> > > Swift-user at ci.uchicago.edu
> > >
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
> >
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
>
>
>
>
>
> --
> Ketan
>
>
>
>
>
>
>
> --
> Ketan
>
>
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user

From skenny at uchicago.edu Sat Oct 22 05:57:45 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Sat, 22 Oct 2011 03:57:45 -0700
Subject: [Swift-user] [Swift-devel] gram on ranger
In-Reply-To: <988426934.113721.1319124103942.JavaMail.root@zimbra.anl.gov>
References: <988426934.113721.1319124103942.JavaMail.root@zimbra.anl.gov>
Message-ID:

fyi, this works on a smaller workflow, we've run it several times on a
50k version.

On Thu, Oct 20, 2011 at 8:21 AM, Michael Wilde wrote:
> Thanks, Ketan. If I understand you correctly, then I would consider this a
> Swift bug, in that maxnodes should always mean *nodes*, for every type of
> resource provider including SGE. Based on what you say, the SGE provider is
> in this case treating the requested maxnode count as cores (Assuming Anjali
> was running the same Swift revision as you were testing on here).
>
> But then that might not explain the error in the log that Sarah posted.
>
> It seems the next step is to try the run on a smaller job (we can test this
> ourselves), and see if we can replicate and diagnose the error, with SGE
> subit files and output/error logs.
>
> David, can you do this, since you were working on SGE testing last week?
> You and Ketan should share what you know about the situation, via > swift-devel, as Ketan is also running on Ranger with persistent coasters I > think. > > Thanks, > > Mike > > > ----- Original Message ----- > > From: "Ketan Maheshwari" > > To: "Michael Wilde" > > Cc: "Sarah Kenny" , "Anjali Raja" < > anjraja at gmail.com>, "Swift Devel" > > , "Swift User" > > Sent: Thursday, October 20, 2011 9:54:33 AM > > Subject: Re: [Swift-devel] [Swift-user] gram on ranger > > On Thu, Oct 20, 2011 at 7:50 AM, Michael Wilde < wilde at mcs.anl.gov > > > wrote: > > > > > > Hi Sarah, Anjali, > > > > My initial theory on whats failing in this job is that the Ranger > > development queue is limited to jobs of 16 nodes or less. (The Ranger > > User Guide says maxprocs 256 for that queue, and qconf -sq development > > says slots 16, which agrees). So you need to either change to one of > > the production queues (normal, long etc) or reduce the values of > > maxnode and nodegranularity. > > > > > > > > I have a little confusion here: the desired line in the final pbs > > script should be : #$ -pe way 256; in order to have 256 procs, > > however, putting maxnodes=16 on sites.xml results in the following > > line on pbs: > > #$ -pe way 16; > > I understand this number 16/256 is for procs since, when putting 256 > > with development queue, ranger indeed allows the job to run in > > development queue. > > > > > > > > I would also suggest (unless you have already done this) that you test > > first on a very small run (like a single RInvoke app call) and then > > scale up to just a few voxels per dataset before trying such a large > > run. Have you already tested that? > > > > Lastly, when reporting problems like this, the swift standard > > output/err is also very helpful to get a higher-level view of what > > went wrong. > > > > Swift needs to clearly return errors from the local resource provider, > > which it doesnt seem to be doing here. 
Ive filed this as bug 593 and > > assigned to David. > > > > Please let us know if changing the queue and/or slots resolves the > > problem. As mentioned in the bug report I think you can set debug=true > > (or yes?) in the provider-sge.properties file and get swift to > > preserve the output from SGE in ~/.globus/scripts. (In fact that may > > already be preserved, I am not sure). Please check there to see if the > > SGE error is there. > > > > Thanks, > > > > - Mike > > > > > > > > ----- Original Message ----- > > > From: "Sarah Kenny" < skenny at uchicago.edu > > > > > > To: "Mihael Hategan" < hategan at mcs.anl.gov > > > > Cc: "Anjali Raja" < anjraja at gmail.com >, "Swift Devel" < > > > swift-devel at ci.uchicago.edu >, "Swift User" > > > < swift-user at ci.uchicago.edu > > > > Sent: Thursday, October 20, 2011 6:07:09 AM > > > Subject: Re: [Swift-devel] [Swift-user] gram on ranger > > > > > hi all, one of our users, anjali (cc'd here) is trying to submit > > > this > > > ~400k job workflow to ranger...thought i'd see if you felt like > > > having > > > a look :) > > > > > > log is here: > > > /home/skenny/swift_logs/corr_multisubj-20111018-1321-ihf8hz5g.log > > > > > > sites file: > > > > > > > > > > > > > > > > > > > > > > > > 7200 > > > 00:20:00 > > > 1 > > > 64 > > > 256 > > > development > > > 1.28 > > > TG-DBS080004N > > > 16way > > > 10000 > > > /work/00926/tg459516/swiftwork > > > > > > > > > > > > > > > On Wed, Oct 12, 2011 at 12:13 PM, Mihael Hategan < > > > hategan at mcs.anl.gov > > > > wrote: > > > > > > > > > > > > On Tue, 2011-10-11 at 17:13 -0700, Sarah Kenny wrote: > > > > > > > > > > > > On Tue, Oct 11, 2011 at 4:23 PM, Mihael Hategan < > > > > hategan at mcs.anl.gov > > > > > wrote: > > > > Is this with a persistent coaster service? > > > > > > > > admittedly i have not used persistent coaster service...should i? > > > > > > No. I was just trying to figure out whether it might be something > > > related to the persistent version. 
> > > > > > > > > > > > > > > > i feel like it's documented *somewhere* (?) > > > > > > > > for now i've tried setting 'sitedir.keep=true' in the config so > > > > maybe > > > > it won't try to run the cleanup job...we'll see (waiting in q) > > > > > > > > > > > > > > > > On Tue, 2011-10-11 at 12:05 -0700, Sarah Kenny wrote: > > > > > > > > > > > > > > > On Tue, Oct 11, 2011 at 11:49 AM, David Kelly > > > > < davidk at ci.uchicago.edu > > > > > > wrote: > > > > > > > > > > That could be it.. maybe a cleanup script is not > > > > getting the > > > > > right parameters and failing. Do you happen to have > > > > a copy of > > > > > the coaster log? > > > > > > > > > > just put it in /home/skenny/swift_logs > > > > > > > > > > > > > > > Maybe there will be some clues in there. > > > > > > > > > > ----- Original Message ----- > > > > > > From: "Sarah Kenny" < skenny at uchicago.edu > > > > > > > > > > > > To: "David Kelly" < davidk at ci.uchicago.edu > > > > > > > Cc: "Swift Devel" < swift-devel at ci.uchicago.edu >, > > > > "Swift > > > > > User" < swift-user at ci.uchicago.edu >, "Justin M > > > > Wozniak" > > > > > > < wozniak at mcs.anl.gov > > > > > > > > > > > > Sent: Tuesday, October 11, 2011 1:32:37 PM > > > > > > Subject: Re: [Swift-user] gram on ranger > > > > > > > > > > > so, this workflow completes all the jobs but then > > > > just hangs > > > > > > indefinitely at the end...maybe a stray cleanup > > > > job? 
> > > > > > > > > > > > log is here: > > > > > > > > > > > > > > > > /home/skenny/swift_logs/corr-20111010-2104-fl5yngd9.log > > > > > > > > > > > > just tweaked the sites file a bit from what david > > > > sent me: > > > > > > > > > > > > > > > > > > > > > > > > > > > url=" > > > > > > gatekeeper.ranger.tacc.teragrid.org "/> > > > > > > > > > > > > > > > > > > > > key="maxtime">28800 > > > > > > > > > > key="maxWallTime">00:15:00 > > > > > > > > > key="jobsPerNode">1 > > > > > > > > > > key="nodeGranularity">64 > > > > > > > > > key="maxNodes">256 > > > > > > > > > key="queue">normal > > > > > > > > > key="jobThrottle">1 > > > > > > > > > > key="project">TG-DBS080004N > > > > > > > > > key="pe">16way > > > > > > > > > > key="initialScore">10000 > > > > > > > > > > > > > > > > /work/00043/tg457040/sidgrid_out/skenny > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Oct 10, 2011 at 3:43 PM, Sarah Kenny < > > > > > skenny at uchicago.edu > > > > > > > wrote: > > > > > > > > > > > > > > > > > > ok, thanks, got in the queue now...also, realized > > > > my last > > > > > run may have > > > > > > been using the old swift. apparently i had > > > > SWIFT_HOME set in > > > > > my env > > > > > > and that overrides the newer swift i had set in my > > > > PATH. > > > > > > > > > > > > ~sk > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Oct 10, 2011 at 12:28 PM, David Kelly < > > > > > davidk at ci.uchicago.edu > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Sarah, > > > > > > > > > > > > Can you give this another try with the latest > > > > 0.93? I made > > > > > some > > > > > > changes to the coaster and sge providers and was > > > > able to get > > > > > it > > > > > > working with a simple catns script. 
Here is the > > > > > configuration file I > > > > > > was using: > > > > > > > > > > > > > > > > > > > > > > > > > > > url=" > > > > > > gatekeeper.ranger.tacc.teragrid.org "/> > > > > > > > > > > > > > > > > > > > > > > > > > > key="maxtime">3600 > > > > > > > > > > key="maxWallTime">00:00:03 > > > > > > > > > key="jobsPerNode">1 > > > > > > > > > > key="nodeGranularity">16 > > > > > > > > > key="maxNodes">16 > > > > > > > > > > key="queue">development > > > > > > > > > key="jobThrottle">0.9 > > > > > > > > > > > > > > > > key="project">TG-DBS080004N > > > > > > > > > > > > > > > key="pe">16way > > > > > > > > > > > > > > > /share/home/01503/davidkel/swiftwork > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > David > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > From: "Sarah Kenny" < skenny at uchicago.edu > > > > > > > > To: "Justin M Wozniak" < wozniak at mcs.anl.gov > > > > > > > > Cc: "Swift Devel" < swift-devel at ci.uchicago.edu > > > > >, "Swift > > > > > User" < > > > > > > > swift-user at ci.uchicago.edu > > > > > > > > > > > > > > > > > > > > > > > > > > Sent: Friday, October 7, 2011 3:13:57 PM > > > > > > > Subject: Re: [Swift-user] gram on ranger > > > > > > > > > > > /home/skenny/swift_logs/dummy-20111005-0126-6575n7x5.log > > > > > > > > > > > > > > on ci > > > > > > > > > > > > > > > > > > > > > On Fri, Oct 7, 2011 at 8:16 AM, Justin M Wozniak > > > > < > > > > > > > wozniak at mcs.anl.gov > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > Can I take a look at the log? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, 6 Oct 2011, Sarah Kenny wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > hey all, i'm trying to submit to gram on ranger > > > > using the > > > > > latest > > > > > > > swift > > > > > > > (built from trunk). 
it failes like so: > > > > > > > > > > > > > > Cannot submit job > > > > > > > Caused by: > > > > > > > org.globus.cog.abstraction. impl.common.task. > > > > > > > TaskSubmissionException: > > > > > > > Cannot > > > > > > > submit job > > > > > > > Caused by: org.globus.gram.GramException: > > > > Parameter not > > > > > supported > > > > > > > Cannot submit job > > > > > > > > > > > > > > the gram log was saying first that 'jobsPerNode' > > > > is not > > > > > supported so > > > > > > > i > > > > > > > changed it to workersPerNode and then it was > > > > saying > > > > > 'maxnodes' is > > > > > > > not > > > > > > > supported. here's my sites file: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > key="initialScore">10000 > > > > profile> > > > > > > > > > > key="jobThrottle">1 > > > > > > > > > > key="maxWallTime">00:15:00 > > > > profile> > > > > > > > > > > key="maxTime">86400 > > > > > > > > > > key="slots">1 > > > > > > > > > > key="maxNodes">256 > > > > > > > > > > key="pe">16way > > > > > > > > > > key="workersPerNode">1 > > > > profile> > > > > > > > > > > key="nodeGranularity">64 > > > > profile> > > > > > > > > > > key="queue">normal > > > > > > > > > > key="project">TG-DBS080004N > > > > profile> > > > > > > > > > > > > > > > > > > > > > > > jobManager="gt2:gt2:SGE" > > > > > url=" > > > > > > > gatekeeper.ranger.tacc. teragrid.org "/> > > > > > > > > > > > > > > > > > > > > /work/00043/ > > > > tg457040 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > thoughts? ideas? > > > > > > > > > > > > > > -- > > > > > > > Justin M Wozniak > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Sarah Kenny > > > > > > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 > > > > Bio Sci > > > > > III > > > > > > > University of California Irvine, Dept. 
of > > > > Neurology ~ > > > > > 773-818-8300 > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > Swift-user mailing list > > > > > > > Swift-user at ci.uchicago.edu > > > > > > > > > > > > > > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Sarah Kenny > > > > > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 > > > > Bio Sci III > > > > > > University of California Irvine, Dept. of > > > > Neurology ~ > > > > > 773-818-8300 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Sarah Kenny > > > > > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 > > > > Bio Sci III > > > > > > University of California Irvine, Dept. of > > > > Neurology ~ > > > > > 773-818-8300 > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > Sarah Kenny > > > > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III > > > > > University of California Irvine, Dept. of Neurology ~ > > > > 773-818-8300 > > > > > > > > > > _______________________________________________ > > > > > Swift-user mailing list > > > > > Swift-user at ci.uchicago.edu > > > > > > > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Sarah Kenny > > > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III > > > > University of California Irvine, Dept. of Neurology ~ 773-818-8300 > > > > > > > > > > > > > > > > > > > > > > -- > > > Sarah Kenny > > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III > > > University of California Irvine, Dept. 
of Neurology ~ 773-818-8300 > > > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > > -- > > Michael Wilde > > Computation Institute, University of Chicago > > Mathematics and Computer Science Division > > Argonne National Laboratory > > > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > > > > > > > > -- > > Ketan > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > -- Sarah Kenny Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III University of California Irvine, Dept. of Neurology ~ 773-818-8300 -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Sat Oct 22 09:41:17 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 22 Oct 2011 09:41:17 -0500 (CDT) Subject: [Swift-user] [Swift-devel] gram on ranger In-Reply-To: Message-ID: <1923360968.122163.1319294477023.JavaMail.root@zimbra.anl.gov> Sarah, was this 50K version run with the same sites file and Swift version? At any rate, David is correcting some known problems in the SGE provider and increasing the test coverage for it. Once thats done we can try again. In the meantime, if you want to push this forward in parallel, can you try to run again and capture the SGE submit and stdout/err files? 
I'm not 100% sure the following is correct, but I think you can set the SGE provider into debug mode by doing one or both of the following:

etc/provider-sge.properties: add line:

debug=true

(I think this works for the PBS provider and assume it does for SGE; we need to verify.)

Also, the sites/pbs page on the swiftdevel site has this, which *might* also give more debug info for SGE (again, needs to be checked):

# Special functionality: suppresses auto-deletion of PBS submit file
log4j.logger.org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor=DEBUG
log4j.logger.org.globus.cog.abstraction.impl.scheduler.pbs.PBSExecutor=DEBUG

- Mike

----- Original Message -----
> From: "Sarah Kenny"
> To: "Michael Wilde"
> Cc: "Ketan Maheshwari", "David Kelly", "Anjali Raja", "Swift Devel", "Swift User"
> Sent: Saturday, October 22, 2011 5:57:45 AM
> Subject: Re: [Swift-devel] [Swift-user] gram on ranger
> fyi, this works on a smaller workflow, we've run it several times on a
> 50k version.
>
> On Thu, Oct 20, 2011 at 8:21 AM, Michael Wilde < wilde at mcs.anl.gov >
> wrote:
>
> Thanks, Ketan. If I understand you correctly, then I would consider
> this a Swift bug, in that maxnodes should always mean *nodes*, for
> every type of resource provider including SGE. Based on what you say,
> the SGE provider is in this case treating the requested maxnode count
> as cores (assuming Anjali was running the same Swift revision as you
> were testing on here).
>
> But then that might not explain the error in the log that Sarah
> posted.
>
> It seems the next step is to try the run on a smaller job (we can test
> this ourselves), and see if we can replicate and diagnose the error,
> with SGE submit files and output/error logs.
>
> David, can you do this, since you were working on SGE testing last
> week?
> You and Ketan should share what you know about the situation, via
> swift-devel, as Ketan is also running on Ranger with persistent
> coasters I think.
> > Thanks, > > > Mike > > > ----- Original Message ----- > > > From: "Ketan Maheshwari" < ketancmaheshwari at gmail.com > > > To: "Michael Wilde" < wilde at mcs.anl.gov > > > Cc: "Sarah Kenny" < skenny at uchicago.edu >, "Anjali Raja" < > > anjraja at gmail.com >, "Swift Devel" > > < swift-devel at ci.uchicago.edu >, "Swift User" < > > swift-user at ci.uchicago.edu > > > Sent: Thursday, October 20, 2011 9:54:33 AM > > > > > Subject: Re: [Swift-devel] [Swift-user] gram on ranger > > On Thu, Oct 20, 2011 at 7:50 AM, Michael Wilde < wilde at mcs.anl.gov > > > wrote: > > > > > > Hi Sarah, Anjali, > > > > My initial theory on whats failing in this job is that the Ranger > > development queue is limited to jobs of 16 nodes or less. (The > > Ranger > > User Guide says maxprocs 256 for that queue, and qconf -sq > > development > > says slots 16, which agrees). So you need to either change to one of > > the production queues (normal, long etc) or reduce the values of > > maxnode and nodegranularity. > > > > > > > > I have a little confusion here: the desired line in the final pbs > > script should be : #$ -pe way 256; in order to have 256 procs, > > however, putting maxnodes=16 on sites.xml results in the following > > line on pbs: > > #$ -pe way 16; > > I understand this number 16/256 is for procs since, when putting 256 > > with development queue, ranger indeed allows the job to run in > > development queue. > > > > > > > > I would also suggest (unless you have already done this) that you > > test > > first on a very small run (like a single RInvoke app call) and then > > scale up to just a few voxels per dataset before trying such a large > > run. Have you already tested that? > > > > Lastly, when reporting problems like this, the swift standard > > output/err is also very helpful to get a higher-level view of what > > went wrong. > > > > Swift needs to clearly return errors from the local resource > > provider, > > which it doesnt seem to be doing here. 
Ive filed this as bug 593 and > > assigned to David. > > > > Please let us know if changing the queue and/or slots resolves the > > problem. As mentioned in the bug report I think you can set > > debug=true > > (or yes?) in the provider-sge.properties file and get swift to > > preserve the output from SGE in ~/.globus/scripts. (In fact that may > > already be preserved, I am not sure). Please check there to see if > > the > > SGE error is there. > > > > Thanks, > > > > - Mike > > > > > > > > ----- Original Message ----- > > > From: "Sarah Kenny" < skenny at uchicago.edu > > > > > > To: "Mihael Hategan" < hategan at mcs.anl.gov > > > > Cc: "Anjali Raja" < anjraja at gmail.com >, "Swift Devel" < > > > swift-devel at ci.uchicago.edu >, "Swift User" > > > < swift-user at ci.uchicago.edu > > > > Sent: Thursday, October 20, 2011 6:07:09 AM > > > Subject: Re: [Swift-devel] [Swift-user] gram on ranger > > > > > hi all, one of our users, anjali (cc'd here) is trying to submit > > > this > > > ~400k job workflow to ranger...thought i'd see if you felt like > > > having > > > a look :) > > > > > > log is here: > > > /home/skenny/swift_logs/corr_multisubj-20111018-1321-ihf8hz5g.log > > > > > > sites file: > > > > > > > > > > > > > > > > > > > > > > > > 7200 > > > 00:20:00 > > > 1 > > > 64 > > > 256 > > > development > > > 1.28 > > > TG-DBS080004N > > > 16way > > > 10000 > > > /work/00926/tg459516/swiftwork > > > > > > > > > > > > > > > On Wed, Oct 12, 2011 at 12:13 PM, Mihael Hategan < > > > hategan at mcs.anl.gov > > > > wrote: > > > > > > > > > > > > On Tue, 2011-10-11 at 17:13 -0700, Sarah Kenny wrote: > > > > > > > > > > > > On Tue, Oct 11, 2011 at 4:23 PM, Mihael Hategan < > > > > hategan at mcs.anl.gov > > > > > wrote: > > > > Is this with a persistent coaster service? > > > > > > > > admittedly i have not used persistent coaster service...should > > > > i? > > > > > > No. 
I was just trying to figure out whether it might be something > > > related to the persistent version. > > > > > > > > > > > > > > > > i feel like it's documented *somewhere* (?) > > > > > > > > for now i've tried setting 'sitedir.keep=true' in the config so > > > > maybe > > > > it won't try to run the cleanup job...we'll see (waiting in q) > > > > > > > > > > > > > > > > On Tue, 2011-10-11 at 12:05 -0700, Sarah Kenny wrote: > > > > > > > > > > > > > > > On Tue, Oct 11, 2011 at 11:49 AM, David Kelly > > > > < davidk at ci.uchicago.edu > > > > > > wrote: > > > > > > > > > > That could be it.. maybe a cleanup script is not > > > > getting the > > > > > right parameters and failing. Do you happen to have > > > > a copy of > > > > > the coaster log? > > > > > > > > > > just put it in /home/skenny/swift_logs > > > > > > > > > > > > > > > Maybe there will be some clues in there. > > > > > > > > > > ----- Original Message ----- > > > > > > From: "Sarah Kenny" < skenny at uchicago.edu > > > > > > > > > > > > To: "David Kelly" < davidk at ci.uchicago.edu > > > > > > > Cc: "Swift Devel" < swift-devel at ci.uchicago.edu >, > > > > "Swift > > > > > User" < swift-user at ci.uchicago.edu >, "Justin M > > > > Wozniak" > > > > > > < wozniak at mcs.anl.gov > > > > > > > > > > > > Sent: Tuesday, October 11, 2011 1:32:37 PM > > > > > > Subject: Re: [Swift-user] gram on ranger > > > > > > > > > > > so, this workflow completes all the jobs but then > > > > just hangs > > > > > > indefinitely at the end...maybe a stray cleanup > > > > job? 
> > > > > > > > > > > > log is here: > > > > > > > > > > > > > > > > /home/skenny/swift_logs/corr-20111010-2104-fl5yngd9.log > > > > > > > > > > > > just tweaked the sites file a bit from what david > > > > sent me: > > > > > > > > > > > > > > > > > > > > > > > > > > > url=" > > > > > > gatekeeper.ranger.tacc.teragrid.org "/> > > > > > > > > > > > > > > > > > > > > key="maxtime">28800 > > > > > > > > > > key="maxWallTime">00:15:00 > > > > > > > > > key="jobsPerNode">1 > > > > > > > > > > key="nodeGranularity">64 > > > > > > > > > key="maxNodes">256 > > > > > > > > > key="queue">normal > > > > > > > > > key="jobThrottle">1 > > > > > > > > > > key="project">TG-DBS080004N > > > > > > > > > key="pe">16way > > > > > > > > > > key="initialScore">10000 > > > > > > > > > > > > > > > /work/00043/tg457040/sidgrid_out/skenny > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Oct 10, 2011 at 3:43 PM, Sarah Kenny < > > > > > skenny at uchicago.edu > > > > > > > wrote: > > > > > > > > > > > > > > > > > > ok, thanks, got in the queue now...also, realized > > > > my last > > > > > run may have > > > > > > been using the old swift. apparently i had > > > > SWIFT_HOME set in > > > > > my env > > > > > > and that overrides the newer swift i had set in my > > > > PATH. > > > > > > > > > > > > ~sk > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Oct 10, 2011 at 12:28 PM, David Kelly < > > > > > davidk at ci.uchicago.edu > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Sarah, > > > > > > > > > > > > Can you give this another try with the latest > > > > 0.93? I made > > > > > some > > > > > > changes to the coaster and sge providers and was > > > > able to get > > > > > it > > > > > > working with a simple catns script. 
Here is the > > > > > configuration file I > > > > > > was using: > > > > > > > > > > > > > > > > > > > > > > > > > > > url=" > > > > > > gatekeeper.ranger.tacc.teragrid.org "/> > > > > > > > > > > > > > > > > > > > > > > > > > > key="maxtime">3600 > > > > > > > > > > key="maxWallTime">00:00:03 > > > > > > > > > key="jobsPerNode">1 > > > > > > > > > > key="nodeGranularity">16 > > > > > > > > > key="maxNodes">16 > > > > > > > > > > key="queue">development > > > > > > > > > key="jobThrottle">0.9 > > > > > > > > > > > > > > > > key="project">TG-DBS080004N > > > > > > > > > > > > > > > key="pe">16way > > > > > > > > > > > > > > > /share/home/01503/davidkel/swiftwork > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > David > > > > > > > > > > > > ----- Original Message ----- > > > > > > > > > > > > > From: "Sarah Kenny" < skenny at uchicago.edu > > > > > > > > To: "Justin M Wozniak" < wozniak at mcs.anl.gov > > > > > > > > Cc: "Swift Devel" < swift-devel at ci.uchicago.edu > > > > >, "Swift > > > > > User" < > > > > > > > swift-user at ci.uchicago.edu > > > > > > > > > > > > > > > > > > > > > > > > > > Sent: Friday, October 7, 2011 3:13:57 PM > > > > > > > Subject: Re: [Swift-user] gram on ranger > > > > > > > > > > > /home/skenny/swift_logs/dummy-20111005-0126-6575n7x5.log > > > > > > > > > > > > > > on ci > > > > > > > > > > > > > > > > > > > > > On Fri, Oct 7, 2011 at 8:16 AM, Justin M Wozniak > > > > < > > > > > > > wozniak at mcs.anl.gov > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > Can I take a look at the log? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, 6 Oct 2011, Sarah Kenny wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > hey all, i'm trying to submit to gram on ranger > > > > using the > > > > > latest > > > > > > > swift > > > > > > > (built from trunk). 
it failes like so: > > > > > > > > > > > > > > Cannot submit job > > > > > > > Caused by: > > > > > > > org.globus.cog.abstraction. impl.common.task. > > > > > > > TaskSubmissionException: > > > > > > > Cannot > > > > > > > submit job > > > > > > > Caused by: org.globus.gram.GramException: > > > > Parameter not > > > > > supported > > > > > > > Cannot submit job > > > > > > > > > > > > > > the gram log was saying first that 'jobsPerNode' > > > > is not > > > > > supported so > > > > > > > i > > > > > > > changed it to workersPerNode and then it was > > > > saying > > > > > 'maxnodes' is > > > > > > > not > > > > > > > supported. here's my sites file: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > key="initialScore">10000 > > > > profile> > > > > > > > > > > key="jobThrottle">1 > > > > > > > > > > key="maxWallTime">00:15:00 > > > > profile> > > > > > > > > > > key="maxTime">86400 > > > > > > > > > > key="slots">1 > > > > > > > > > > key="maxNodes">256 > > > > > > > > > > key="pe">16way > > > > > > > > > > key="workersPerNode">1 > > > > profile> > > > > > > > > > > key="nodeGranularity">64 > > > > profile> > > > > > > > > > > key="queue">normal > > > > > > > > > > key="project">TG-DBS080004N > > > > profile> > > > > > > > > > > > > > > > > > > > > > > > jobManager="gt2:gt2:SGE" > > > > > url=" > > > > > > > gatekeeper.ranger.tacc. teragrid.org "/> > > > > > > > > > > > > > > > > > > > > /work/00043/ > > > > tg457040 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > thoughts? ideas? > > > > > > > > > > > > > > -- > > > > > > > Justin M Wozniak > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Sarah Kenny > > > > > > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 > > > > Bio Sci > > > > > III > > > > > > > University of California Irvine, Dept. 
of > > > > Neurology ~ > > > > > 773-818-8300 > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > Swift-user mailing list > > > > > > > Swift-user at ci.uchicago.edu > > > > > > > > > > > > > > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Sarah Kenny > > > > > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 > > > > Bio Sci III > > > > > > University of California Irvine, Dept. of > > > > Neurology ~ > > > > > 773-818-8300 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Sarah Kenny > > > > > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 > > > > Bio Sci III > > > > > > University of California Irvine, Dept. of > > > > Neurology ~ > > > > > 773-818-8300 > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > Sarah Kenny > > > > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III > > > > > University of California Irvine, Dept. of Neurology ~ > > > > 773-818-8300 > > > > > > > > > > _______________________________________________ > > > > > Swift-user mailing list > > > > > Swift-user at ci.uchicago.edu > > > > > > > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Sarah Kenny > > > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III > > > > University of California Irvine, Dept. of Neurology ~ > > > > 773-818-8300 > > > > > > > > > > > > > > > > > > > > > > -- > > > Sarah Kenny > > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III > > > University of California Irvine, Dept. 
of Neurology ~ 773-818-8300 > > > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > > -- > > Michael Wilde > > Computation Institute, University of Chicago > > Mathematics and Computer Science Division > > Argonne National Laboratory > > > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > > > > > > > > -- > > Ketan > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > > > > -- > Sarah Kenny > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III > University of California Irvine, Dept. of Neurology ~ 773-818-8300 -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From svemalayan at yahoo.com Thu Oct 27 14:35:13 2011 From: svemalayan at yahoo.com (Emalayan Vairavanathan) Date: Thu, 27 Oct 2011 12:35:13 -0700 (PDT) Subject: [Swift-user] Swift-MosaStore integration problem In-Reply-To: <1319743849.21337.YahooMailNeo@web39502.mail.mud.yahoo.com> References: <1319743647.22987.YahooMailNeo@web39505.mail.mud.yahoo.com> <1319743849.21337.YahooMailNeo@web39502.mail.mud.yahoo.com> Message-ID: <1319744113.99611.YahooMailNeo@web39501.mail.mud.yahoo.com> Dear All, When I try to run the HelloWorld.swift with MosaStore (MosaStore is a distributed storage developed by us) it is terminating with an error. Here HelloWorld.swift script is copied into the MosaStore filesystem.? Also Swift has write permission in the file system. Because it can create some temporary files (swift.log and first-20111026-1848-hmo97vu7.log). 
But then it prints the error below and terminates (I have attached the log files; I am using Swift 0.93).

Max heap: 259522560
Recompilation suppressed.
Detailed exception:
java.io.FileNotFoundException: /tmp/mount/dir1/first.kml (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:137)
    at java.io.FileInputStream.<init>(FileInputStream.java:96)
    at java.io.FileReader.<init>(FileReader.java:58)
    at org.globus.cog.karajan.Loader.load(Loader.java:209)
    at org.griphyn.vdl.karajan.Loader.main(Loader.java:157)
Could not start execution.
    /tmp/mount/dir1/first.kml (No such file or directory)

It seems that MosaStore didn't receive any callback to create first.kml. I think that, for some reason, the JVM assumes the storage is not available and terminates the Swift program. So I am wondering whether you can provide any hint about this issue.

Also, I would like to build Swift in debug mode in order to run it with jdb (to get more information). Please let me know how to build Swift in debug mode.

Thank you
Emalayan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: first-20111027-1220-rfrj9ute.log
Type: application/octet-stream
Size: 965 bytes
Desc: not available
URL:

From wilde at mcs.anl.gov Thu Oct 27 14:57:39 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 27 Oct 2011 14:57:39 -0500 (CDT)
Subject: [Swift-user] Swift-MosaStore integration problem
In-Reply-To: <1319744113.99611.YahooMailNeo@web39501.mail.mud.yahoo.com>
Message-ID: <1168469887.141227.1319745459780.JavaMail.root@zimbra.anl.gov>

Emalayan, can you use strace on the Swift JVM invocation to trace Java's system calls, and debug it in that manner?

I suspect that Mosa is doing something that's not quite POSIX compliant, and confusing Java and hence Swift.
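
One way to probe the POSIX suspicion without involving Swift at all is to write a file into the mounted directory and immediately stat, list, and re-open it. The stack trace in this thread shows an open-for-read failing in a directory where log files had just been written, so a write-then-read probe exercises a similar pattern. The following is only a minimal sketch, assuming Python 3 is available on the machine; the function name and the probe file name are made up for illustration and are not part of Swift or MosaStore.

```python
import os
import tempfile


def check_create_then_read(directory, name="posix_probe.tmp"):
    """Write a file in `directory`, then stat, list, and re-read it.

    Returns a dict of step name -> True/False so a failing step is easy to spot.
    The file name is a hypothetical placeholder.
    """
    path = os.path.join(directory, name)
    payload = b"hello from the probe\n"
    results = {}

    # Step 1: create and write the file, forcing the data out to the file system.
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    results["write"] = True

    # Step 2: the new file should be visible to stat() and in the directory listing.
    results["stat"] = os.path.exists(path)
    results["listdir"] = name in os.listdir(directory)

    # Step 3: re-open for reading and compare the contents.
    try:
        with open(path, "rb") as f:
            results["read"] = f.read() == payload
    except FileNotFoundError:
        results["read"] = False

    # Clean up the probe file if it is still there.
    if os.path.exists(path):
        os.remove(path)
    return results


if __name__ == "__main__":
    # Swap in the mount point under test, e.g. "/tmp/mount/dir1".
    with tempfile.TemporaryDirectory() as d:
        print(check_create_then_read(d))
```

Running the same probe against a local directory and against the mounted directory gives a quick side-by-side comparison; any step that is True locally but False (or raises) on the mount is a candidate for closer inspection.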
- Mike ----- Original Message ----- > From: "Emalayan Vairavanathan" > To: swift-user at ci.uchicago.edu > Sent: Thursday, October 27, 2011 2:35:13 PM > Subject: [Swift-user] Swift-MosaStore integration problem > Dear All, > > > > > > > > When I try to run the HelloWorld.swift with MosaStore (MosaStore is a > distributed storage developed by us) it is terminating with an error. > > > > Here HelloWorld.swift script is copied into the MosaStore filesystem. > Also Swift has write permission in the file system. Because it can > create some temporary files (swift.log and > first-20111026-1848-hmo97vu7.log). But then it prints the error below > and terminates (I have attached the log files. I am using swift 0.93). > > > > Max heap: 259522560 > Recompilation suppressed. > Detailed exception: > java.io.FileNotFoundException: /tmp/mount/dir1/first.kml (No such file > or directory) > at java.io.FileInputStream.open(Native Method) > at java.io.FileInputStream.(FileInputStream.java:137) > at java.io.FileInputStream.(FileInputStream.java:96) > at java.io.FileReader.(FileReader.java:58) > at org.globus.cog.karajan.Loader.load(Loader.java:209) > at org.griphyn.vdl.karajan.Loader.main(Loader.java:157) > Could not start execution. > /tmp/mount/dir1/first.kml (No such file or directory) > > > It seems the Mosastore didnt receive any callback to create first.kml. > I think, due to some reasons JVM assumes the storage is not available > and terminate the swift program. So I am wondering whether you can > provide any hint about this issue. > > Also I would like to build the swift in debug mode in order to run it > with jdb (to get more information). Please let me know how to build > the swift in debug mode? 
>
> Thank you
> Emalayan
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From tim.g.armstrong at gmail.com Thu Oct 27 16:25:51 2011
From: tim.g.armstrong at gmail.com (Tim Armstrong)
Date: Thu, 27 Oct 2011 16:25:51 -0500
Subject: [Swift-user] Swift-MosaStore integration problem
In-Reply-To: <1168469887.141227.1319745459780.JavaMail.root@zimbra.anl.gov>
References: <1319744113.99611.YahooMailNeo@web39501.mail.mud.yahoo.com> <1168469887.141227.1319745459780.JavaMail.root@zimbra.anl.gov>
Message-ID:

Does the directory /tmp/mount/dir1 exist before Swift starts up? Is it created before the exception occurs?

On Thu, Oct 27, 2011 at 2:57 PM, Michael Wilde wrote:

> Emalayan, can you use strace on the Swift JVM invocation to trace Java's
> system calls, and debug it in that manner?
>
> I suspect that Mosa is doing something that's not quite POSIX compliant,
> and confusing Java and hence Swift.
>
> - Mike
>
> ----- Original Message -----
> > From: "Emalayan Vairavanathan"
> > To: swift-user at ci.uchicago.edu
> > Sent: Thursday, October 27, 2011 2:35:13 PM
> > Subject: [Swift-user] Swift-MosaStore integration problem
> > Dear All,
> >
> > When I try to run the HelloWorld.swift with MosaStore (MosaStore is a
> > distributed storage developed by us) it is terminating with an error.
> >
> > Here HelloWorld.swift script is copied into the MosaStore filesystem.
> > Also Swift has write permission in the file system. Because it can
> > create some temporary files (swift.log and
> > first-20111026-1848-hmo97vu7.log). But then it prints the error below
> > and terminates (I have attached the log files. I am using swift 0.93).
> > > > > > > > Max heap: 259522560 > > Recompilation suppressed. > > Detailed exception: > > java.io.FileNotFoundException: /tmp/mount/dir1/first.kml (No such file > > or directory) > > at java.io.FileInputStream.open(Native Method) > > at java.io.FileInputStream.(FileInputStream.java:137) > > at java.io.FileInputStream.(FileInputStream.java:96) > > at java.io.FileReader.(FileReader.java:58) > > at org.globus.cog.karajan.Loader.load(Loader.java:209) > > at org.griphyn.vdl.karajan.Loader.main(Loader.java:157) > > Could not start execution. > > /tmp/mount/dir1/first.kml (No such file or directory) > > > > > > It seems the Mosastore didnt receive any callback to create first.kml. > > I think, due to some reasons JVM assumes the storage is not available > > and terminate the swift program. So I am wondering whether you can > > provide any hint about this issue. > > > > Also I would like to build the swift in debug mode in order to run it > > with jdb (to get more information). Please let me know how to build > > the swift in debug mode? > > > > > > Thank you > > Emalayan > > > > > > > > > > > > _______________________________________________ > > Swift-user mailing list > > Swift-user at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From svemalayan at yahoo.com Thu Oct 27 16:34:11 2011
From: svemalayan at yahoo.com (Emalayan Vairavanathan)
Date: Thu, 27 Oct 2011 14:34:11 -0700 (PDT)
Subject: [Swift-user] Swift-MosaStore integration problem
In-Reply-To: <1168469887.141227.1319745459780.JavaMail.root@zimbra.anl.gov>
References: <1319744113.99611.YahooMailNeo@web39501.mail.mud.yahoo.com> <1168469887.141227.1319745459780.JavaMail.root@zimbra.anl.gov>
Message-ID: <1319751251.52811.YahooMailNeo@web39501.mail.mud.yahoo.com>

Thank you Mike. I was using strace, but it seems very complex to find the bug that way. So we are now using the profiler to mimic the file system and comparing the input and output of its FUSE callbacks with Mosa's.

Also, I agree with you that Mosa is doing something weird.

Regards
Emalayan

________________________________
From: Michael Wilde
To: Emalayan Vairavanathan
Cc: swift-user at ci.uchicago.edu
Sent: Thursday, 27 October 2011 12:57 PM
Subject: Re: [Swift-user] Swift-MosaStore integration problem

Emalayan, can you use strace on the Swift JVM invocation to trace Java's system calls, and debug it in that manner?

I suspect that Mosa is doing something that's not quite POSIX compliant, and confusing Java and hence Swift.

- Mike

----- Original Message -----
> From: "Emalayan Vairavanathan"
> To: swift-user at ci.uchicago.edu
> Sent: Thursday, October 27, 2011 2:35:13 PM
> Subject: [Swift-user] Swift-MosaStore integration problem
> Dear All,
>
> When I try to run the HelloWorld.swift with MosaStore (MosaStore is a
> distributed storage developed by us) it is terminating with an error.
>
> Here HelloWorld.swift script is copied into the MosaStore filesystem.
> Also Swift has write permission in the file system. Because it can
> create some temporary files (swift.log and
> first-20111026-1848-hmo97vu7.log). But then it prints the error below
> and terminates (I have attached the log files. I am using swift 0.93).
>
> Max heap: 259522560
> Recompilation suppressed.
> Detailed exception:
> java.io.FileNotFoundException: /tmp/mount/dir1/first.kml (No such file
> or directory)
>     at java.io.FileInputStream.open(Native Method)
>     at java.io.FileInputStream.<init>(FileInputStream.java:137)
>     at java.io.FileInputStream.<init>(FileInputStream.java:96)
>     at java.io.FileReader.<init>(FileReader.java:58)
>     at org.globus.cog.karajan.Loader.load(Loader.java:209)
>     at org.griphyn.vdl.karajan.Loader.main(Loader.java:157)
> Could not start execution.
>     /tmp/mount/dir1/first.kml (No such file or directory)
>
> It seems the Mosastore didn't receive any callback to create first.kml.
> I think, due to some reasons JVM assumes the storage is not available
> and terminate the swift program. So I am wondering whether you can
> provide any hint about this issue.
>
> Also I would like to build the swift in debug mode in order to run it
> with jdb (to get more information). Please let me know how to build
> the swift in debug mode?
>
> Thank you
> Emalayan
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From svemalayan at yahoo.com Thu Oct 27 16:38:45 2011
From: svemalayan at yahoo.com (Emalayan Vairavanathan)
Date: Thu, 27 Oct 2011 14:38:45 -0700 (PDT)
Subject: [Swift-user] Swift-MosaStore integration problem
In-Reply-To:
References: <1319744113.99611.YahooMailNeo@web39501.mail.mud.yahoo.com> <1168469887.141227.1319745459780.JavaMail.root@zimbra.anl.gov>
Message-ID: <1319751525.77386.YahooMailNeo@web39501.mail.mud.yahoo.com>

Hi Tim,

/tmp/mount/dir1 exists before Swift starts up.

Thank you.
Emalayan ________________________________ From: Tim Armstrong To: Michael Wilde Cc: Emalayan Vairavanathan ; swift-user at ci.uchicago.edu Sent: Thursday, 27 October 2011 2:25 PM Subject: Re: [Swift-user] Swift-MosaStore integration problem Does the directory /tmp/mount/dir1 exist before Swift starts up?? Is it created before the exception occurs? On Thu, Oct 27, 2011 at 2:57 PM, Michael Wilde wrote: Emalyan, can you use strace on the Swift jvm invocation to trave Java's systems calls, and debug it in that manner? > >I suspect that Mosa is doing something that's not quite POSIX compliant, and confusing Java and hence Swift. > >- Mike > > > >----- Original Message ----- >> From: "Emalayan Vairavanathan" >> To: swift-user at ci.uchicago.edu >> Sent: Thursday, October 27, 2011 2:35:13 PM >> Subject: [Swift-user] Swift-MosaStore integration problem >> Dear All, >> >> >> >> >> >> >> >> When I try to run the HelloWorld.swift with MosaStore (MosaStore is a >> distributed storage developed by us) it is terminating with an error. >> >> >> >> Here HelloWorld.swift script is copied into the MosaStore filesystem. >> Also Swift has write permission in the file system. Because it can >> create some temporary files (swift.log and >> first-20111026-1848-hmo97vu7.log). But then it prints the error below >> and terminates (I have attached the log files. I am using swift 0.93). >> >> >> >> Max heap: 259522560 >> Recompilation suppressed. >> Detailed exception: >> java.io.FileNotFoundException: /tmp/mount/dir1/first.kml (No such file >> or directory) >> at java.io.FileInputStream.open(Native Method) >> at java.io.FileInputStream.(FileInputStream.java:137) >> at java.io.FileInputStream.(FileInputStream.java:96) >> at java.io.FileReader.(FileReader.java:58) >> at org.globus.cog.karajan.Loader.load(Loader.java:209) >> at org.griphyn.vdl.karajan.Loader.main(Loader.java:157) >> Could not start execution. 
>> /tmp/mount/dir1/first.kml (No such file or directory) >> >> >> It seems the Mosastore didnt receive any callback to create first.kml. >> I think, due to some reasons JVM assumes the storage is not available >> and terminate the swift program. So I am wondering whether you can >> provide any hint about this issue. >> >> Also I would like to build the swift in debug mode in order to run it >> with jdb (to get more information). Please let me know how to build >> the swift in debug mode? >> >> >> Thank you >> Emalayan >> >> >> >> >> >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > >-- >Michael Wilde >Computation Institute, University of Chicago >Mathematics and Computer Science Division >Argonne National Laboratory > >_______________________________________________ >Swift-user mailing list >Swift-user at ci.uchicago.edu >https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Thu Oct 27 16:59:45 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 27 Oct 2011 16:59:45 -0500 (CDT) Subject: [Swift-user] Swift-MosaStore integration problem In-Reply-To: <1319751251.52811.YahooMailNeo@web39501.mail.mud.yahoo.com> Message-ID: <700664971.142027.1319752785624.JavaMail.root@zimbra.anl.gov> Emalyan, did you try filtering strace to zero in on open/close/creat etc? Also, to try compiling the smallest possible Swift script, like a single trace or assignment statement? But now that I think about it further, why are you running Swift on Mosa at all? Dont you just want to test that Swift can execute applications using Mosa as the workdirectory (in sites.xml)? 
Regards, - Mike ----- Original Message ----- > From: "Emalayan Vairavanathan" > To: "Michael Wilde" > Cc: swift-user at ci.uchicago.edu > Sent: Thursday, October 27, 2011 4:34:11 PM > Subject: Re: [Swift-user] Swift-MosaStore integration problem > Thank you Mike, I was using STrace but it seems very complex to find > the bug. So we are now using the profiler to mimic file system and > comparing the input and output of its fuse callbacks with Mosa. > > > Also I agree with you that Mosa is doing something wired. > > > Regards > Emalayan > > > > > > From: Michael Wilde > To: Emalayan Vairavanathan > Cc: swift-user at ci.uchicago.edu > Sent: Thursday, 27 October 2011 12:57 PM > Subject: Re: [Swift-user] Swift-MosaStore integration problem > > Emalyan, can you use strace on the Swift jvm invocation to trave > Java's systems calls, and debug it in that manner? > > I suspect that Mosa is doing something that's not quite POSIX > compliant, and confusing Java and hence Swift. > > - Mike > > > ----- Original Message ----- > > From: "Emalayan Vairavanathan" < svemalayan at yahoo.com > > > To: swift-user at ci.uchicago.edu > > Sent: Thursday, October 27, 2011 2:35:13 PM > > Subject: [Swift-user] Swift-MosaStore integration problem > > Dear All, > > > > > > > > > > > > > > > > When I try to run the HelloWorld.swift with MosaStore (MosaStore is > > a > > distributed storage developed by us) it is terminating with an > > error. > > > > > > > > Here HelloWorld.swift script is copied into the MosaStore > > filesystem. > > Also Swift has write permission in the file system. Because it can > > create some temporary files (swift.log and > > first-20111026-1848-hmo97vu7.log). But then it prints the error > > below > > and terminates (I have attached the log files. I am using swift > > 0.93). > > > > > > > > Max heap: 259522560 > > Recompilation suppressed. 
> > Detailed exception: > > java.io.FileNotFoundException: /tmp/mount/dir1/first.kml (No such > > file > > or directory) > > at java.io.FileInputStream.open(Native Method) > > at java.io.FileInputStream.(FileInputStream.java:137) > > at java.io.FileInputStream.(FileInputStream.java:96) > > at java.io.FileReader.(FileReader.java:58) > > at org.globus.cog.karajan.Loader.load(Loader.java:209) > > at org.griphyn.vdl.karajan.Loader.main(Loader.java:157) > > Could not start execution. > > /tmp/mount/dir1/first.kml (No such file or directory) > > > > > > It seems the Mosastore didnt receive any callback to create > > first.kml. > > I think, due to some reasons JVM assumes the storage is not > > available > > and terminate the swift program. So I am wondering whether you can > > provide any hint about this issue. > > > > Also I would like to build the swift in debug mode in order to run > > it > > with jdb (to get more information). Please let me know how to build > > the swift in debug mode? > > > > > > Thank you > > Emalayan > > > > > > > > > > > > _______________________________________________ > > Swift-user mailing list > > Swift-user at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From benc at hawaga.org.uk Thu Oct 27 17:01:50 2011 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 27 Oct 2011 17:01:50 -0500 Subject: [Swift-user] Swift-MosaStore integration problem In-Reply-To: <1319744113.99611.YahooMailNeo@web39501.mail.mud.yahoo.com> References: <1319743647.22987.YahooMailNeo@web39505.mail.mud.yahoo.com> <1319743849.21337.YahooMailNeo@web39502.mail.mud.yahoo.com> <1319744113.99611.YahooMailNeo@web39501.mail.mud.yahoo.com> Message-ID: On Oct 27, 2011, 
at 2:35 PM, Emalayan Vairavanathan wrote:

>
> Dear All,
>
> When I try to run the HelloWorld.swift with MosaStore (MosaStore is a distributed storage developed by us) it is terminating with an error.
>
> Here HelloWorld.swift script is copied into the MosaStore filesystem. Also Swift has write permission in the file system. Because it can create some temporary files (swift.log and first-20111026-1848-hmo97vu7.log). But then it prints the error below and terminates (I have attached the log files. I am using swift 0.93).
>
> Max heap: 259522560
> Recompilation suppressed.

It's saying "Recompilation suppressed", meaning that it thinks that some (though I don't remember which) intermediate files already exist, and so it is not (re)generating them.

Perhaps send ls -l of your directory to this list.

Ben

--

From tim.g.armstrong at gmail.com Thu Oct 27 17:12:29 2011
From: tim.g.armstrong at gmail.com (Tim Armstrong)
Date: Thu, 27 Oct 2011 17:12:29 -0500
Subject: [Swift-user] Swift-MosaStore integration problem
In-Reply-To:
References: <1319743647.22987.YahooMailNeo@web39505.mail.mud.yahoo.com> <1319743849.21337.YahooMailNeo@web39502.mail.mud.yahoo.com> <1319744113.99611.YahooMailNeo@web39501.mail.mud.yahoo.com>
Message-ID:

Nobody answered the part of the original question about building with debug symbols.

Looking at the ant build file, it seems like the options to javac are controlled by a target in the file mbuild.xml. You can run ant with the switch -Ddebug=on in order to build the Java classes with the default debug info. If you want to control which debug info you include, you might need to modify mbuild.xml.

- Tim

On Thu, Oct 27, 2011 at 5:01 PM, Ben Clifford wrote:

>
> On Oct 27, 2011, at 2:35 PM, Emalayan Vairavanathan wrote:
>
> >
> > Dear All,
> >
> > When I try to run the HelloWorld.swift with MosaStore (MosaStore is a
> distributed storage developed by us) it is terminating with an error.
> >
> > Here HelloWorld.swift script is copied into the MosaStore filesystem.
> Also Swift has write permission in the file system. Because it can create > some temporary files (swift.log and first-20111026-1848-hmo97vu7.log). But > then it prints the error below and terminates (I have attached the log > files. I am using swift 0.93). > > > > Max heap: 259522560 > > Recompilation suppressed. > > Its saying reompilation suppressed - meaning that it thinks that some > (though I don't remember which) intermediate files already exist, and so is > not (re)generating them. > > Perhaps send ls -l of your directory to this list. > > Ben > > -- > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Thu Oct 27 20:29:18 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 27 Oct 2011 18:29:18 -0700 Subject: [Swift-user] Swift-MosaStore integration problem In-Reply-To: <1319744113.99611.YahooMailNeo@web39501.mail.mud.yahoo.com> References: <1319743647.22987.YahooMailNeo@web39505.mail.mud.yahoo.com> <1319743849.21337.YahooMailNeo@web39502.mail.mud.yahoo.com> <1319744113.99611.YahooMailNeo@web39501.mail.mud.yahoo.com> Message-ID: <1319765358.22789.5.camel@blabla> On Thu, 2011-10-27 at 12:35 -0700, Emalayan Vairavanathan wrote: > Also I would like to build the swift in debug mode in order to run it > with jdb (to get more information). Please let me know how to build > the swift in debug mode? > Set debug=true in swift/project.properties. Though that is the default and so swift should have debugging info by default. 
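
A side note on Ben's "Recompilation suppressed" observation: that message suggests an intermediate file was believed to exist, yet the later open of first.kml fails. One way to look at this outside of Swift is a small standalone probe that asks the file system for metadata and then tries a real open, and reports whether the two answers agree. This is only a minimal sketch under an assumption nobody in the thread has confirmed, namely that the mount may report a path as present while refusing to open it. The class name StatOpenProbe and its check method are hypothetical and are not part of Swift or Karajan.

```java
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

/** Hypothetical probe, not part of Swift: compares file metadata with an actual open. */
public class StatOpenProbe {

    /** Returns a one-line report for the given path (intended for regular files). */
    public static String check(File f) {
        boolean exists = f.exists();       // metadata lookup only
        long mtime = f.lastModified();     // 0 when the file is missing
        boolean opened;
        // FileReader is the class that appears in the stack trace above.
        try (FileReader r = new FileReader(f)) {
            r.read();                      // force at least one read attempt
            opened = true;
        } catch (IOException e) {
            opened = false;
        }
        String verdict = (exists == opened) ? "consistent" : "INCONSISTENT";
        return f.getPath() + " exists=" + exists + " mtime=" + mtime
                + " opened=" + opened + " -> " + verdict;
    }

    public static void main(String[] args) {
        for (String path : args) {
            System.out.println(check(new File(path)));
        }
    }
}
```

It could be run as "java StatOpenProbe /tmp/mount/dir1/first.kml" on the MosaStore mount and again on a local disk for comparison. A report ending in INCONSISTENT would point at the file system rather than at Swift; a consistent report would not rule anything out.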
From svemalayan at yahoo.com Thu Oct 27 21:11:51 2011
From: svemalayan at yahoo.com (Emalayan Vairavanathan)
Date: Thu, 27 Oct 2011 19:11:51 -0700 (PDT)
Subject: [Swift-user] Swift-MosaStore integration problem
In-Reply-To: <700664971.142027.1319752785624.JavaMail.root@zimbra.anl.gov>
References: <1319751251.52811.YahooMailNeo@web39501.mail.mud.yahoo.com> <700664971.142027.1319752785624.JavaMail.root@zimbra.anl.gov>
Message-ID: <1319767911.15950.YahooMailNeo@web39506.mail.mud.yahoo.com>

> Emalyan, did you try filtering strace to zero in on open/close/creat etc?

Mike, could you please explain?

> Also, to try compiling the smallest possible Swift script, like a single trace or assignment statement?

Yes, I agree; this may make debugging simpler.

> But now that I think about it further, why are you running Swift on Mosa at all? Dont you just want to test that Swift can execute applications using Mosa as the workdirectory (in sites.xml)?

Yes, this makes sense and will help us make quick progress. I will try this and get back to you.

Thank you
Emalayan

________________________________
From: Michael Wilde
To: Emalayan Vairavanathan
Cc: swift-user at ci.uchicago.edu
Sent: Thursday, 27 October 2011 2:59 PM
Subject: Re: [Swift-user] Swift-MosaStore integration problem

Emalyan, did you try filtering strace to zero in on open/close/creat etc?

Also, to try compiling the smallest possible Swift script, like a single trace or assignment statement?

But now that I think about it further, why are you running Swift on Mosa at all? Dont you just want to test that Swift can execute applications using Mosa as the workdirectory (in sites.xml)?

Regards,

- Mike

----- Original Message -----
> From: "Emalayan Vairavanathan"
> To: "Michael Wilde"
> Cc: swift-user at ci.uchicago.edu
> Sent: Thursday, October 27, 2011 4:34:11 PM
> Subject: Re: [Swift-user] Swift-MosaStore integration problem
> Thank you Mike, I was using STrace but it seems very complex to find
> the bug.
So we are now using the profiler to mimic file system and > comparing the input and output of its fuse callbacks with Mosa. > > > Also I agree with you that Mosa is doing something wired. > > > Regards > Emalayan > > > > > > From: Michael Wilde > To: Emalayan Vairavanathan > Cc: swift-user at ci.uchicago.edu > Sent: Thursday, 27 October 2011 12:57 PM > Subject: Re: [Swift-user] Swift-MosaStore integration problem > > Emalyan, can you use strace on the Swift jvm invocation to trave > Java's systems calls, and debug it in that manner? > > I suspect that Mosa is doing something that's not quite POSIX > compliant, and confusing Java and hence Swift. > > - Mike > > > ----- Original Message ----- > > From: "Emalayan Vairavanathan" < svemalayan at yahoo.com > > > To: swift-user at ci.uchicago.edu > > Sent: Thursday, October 27, 2011 2:35:13 PM > > Subject: [Swift-user] Swift-MosaStore integration problem > > Dear All, > > > > > > > > > > > > > > > > When I try to run the HelloWorld.swift with MosaStore (MosaStore is > > a > > distributed storage developed by us) it is terminating with an > > error. > > > > > > > > Here HelloWorld.swift script is copied into the MosaStore > > filesystem. > > Also Swift has write permission in the file system. Because it can > > create some temporary files (swift.log and > > first-20111026-1848-hmo97vu7.log). But then it prints the error > > below > > and terminates (I have attached the log files. I am using swift > > 0.93). > > > > > > > > Max heap: 259522560 > > Recompilation suppressed. 
> > Detailed exception: > > java.io.FileNotFoundException: /tmp/mount/dir1/first.kml (No such > > file > > or directory) > > at java.io.FileInputStream.open(Native Method) > > at java.io.FileInputStream.(FileInputStream.java:137) > > at java.io.FileInputStream.(FileInputStream.java:96) > > at java.io.FileReader.(FileReader.java:58) > > at org.globus.cog.karajan.Loader.load(Loader.java:209) > > at org.griphyn.vdl.karajan.Loader.main(Loader.java:157) > > Could not start execution. > > /tmp/mount/dir1/first.kml (No such file or directory) > > > > > > It seems the Mosastore didnt receive any callback to create > > first.kml. > > I think, due to some reasons JVM assumes the storage is not > > available > > and terminate the swift program. So I am wondering whether you can > > provide any hint about this issue. > > > > Also I would like to build the swift in debug mode in order to run > > it > > with jdb (to get more information). Please let me know how to build > > the swift in debug mode? > > > > > > Thank you > > Emalayan > > > > > > > > > > > > _______________________________________________ > > Swift-user mailing list > > Swift-user at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From wilde at mcs.anl.gov Thu Oct 27 21:26:35 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 27 Oct 2011 21:26:35 -0500 (CDT)
Subject: [Swift-user] Swift-MosaStore integration problem
In-Reply-To: <1319767911.15950.YahooMailNeo@web39506.mail.mud.yahoo.com>
Message-ID: <1132949771.142466.1319768795482.JavaMail.root@zimbra.anl.gov>

----- Original Message -----
> From: "Emalayan Vairavanathan"
> To: "Michael Wilde"
> Cc: swift-user at ci.uchicago.edu
> Sent: Thursday, October 27, 2011 9:11:51 PM
> Subject: Re: [Swift-user] Swift-MosaStore integration problem
> Emalyan, did you try filtering strace to zero in on open/close/creat
> etc?
> Mike could you please explain ?

strace has many options to trace specific system calls, or *families* of system calls, to help reduce the volume of the trace and help you zero in on the things you care about. In particular, the -e trace=file option:

"-e trace=file
Trace all system calls which take a file name as an argument. You can think of this as an abbreviation for -e trace=open,stat,chmod,unlink,... which is useful to seeing what files the process is referencing. Furthermore, using the abbreviation will ensure that you don't accidentally forget to include a call like lstat in the list. Betchya woulda forgot that one."

- Mike

>
> Also, to try compiling the smallest possible Swift script, like a
> single trace or assignment statement?
> Yes I agree, this may make debugging simple.
>
> But now that I think about it further, why are you running Swift on
> Mosa at all? Dont you just want to test that Swift can execute
> applications using Mosa as the workdirectory (in sites.xml)?
> Yes this make sense and help us to make quick progress. I will try
> this and get back to you.
> > > Thank you > > Emalayan > > > > > > > > From: Michael Wilde > To: Emalayan Vairavanathan > Cc: swift-user at ci.uchicago.edu > Sent: Thursday, 27 October 2011 2:59 PM > Subject: Re: [Swift-user] Swift-MosaStore integration problem > > Emalyan, did you try filtering strace to zero in on open/close/creat > etc? > > Also, to try compiling the smallest possible Swift script, like a > single trace or assignment statement? > > But now that I think about it further, why are you running Swift on > Mosa at all? Dont you just want to test that Swift can execute > applications using Mosa as the workdirectory (in sites.xml)? > > Regards, > > - Mike > > ----- Original Message ----- > > From: "Emalayan Vairavanathan" < svemalayan at yahoo.com > > > To: "Michael Wilde" < wilde at mcs.anl.gov > > > Cc: swift-user at ci.uchicago.edu > > Sent: Thursday, October 27, 2011 4:34:11 PM > > Subject: Re: [Swift-user] Swift-MosaStore integration problem > > Thank you Mike, I was using STrace but it seems very complex to find > > the bug. So we are now using the profiler to mimic file system and > > comparing the input and output of its fuse callbacks with Mosa. > > > > > > Also I agree with you that Mosa is doing something wired. > > > > > > Regards > > Emalayan > > > > > > > > > > > > From: Michael Wilde < wilde at mcs.anl.gov > > > To: Emalayan Vairavanathan < svemalayan at yahoo.com > > > Cc: swift-user at ci.uchicago.edu > > Sent: Thursday, 27 October 2011 12:57 PM > > Subject: Re: [Swift-user] Swift-MosaStore integration problem > > > > Emalyan, can you use strace on the Swift jvm invocation to trave > > Java's systems calls, and debug it in that manner? > > > > I suspect that Mosa is doing something that's not quite POSIX > > compliant, and confusing Java and hence Swift. 
> > > > - Mike > > > > > > ----- Original Message ----- > > > From: "Emalayan Vairavanathan" < svemalayan at yahoo.com > > > > To: swift-user at ci.uchicago.edu > > > Sent: Thursday, October 27, 2011 2:35:13 PM > > > Subject: [Swift-user] Swift-MosaStore integration problem > > > Dear All, > > > > > > > > > > > > > > > > > > > > > > > > When I try to run the HelloWorld.swift with MosaStore (MosaStore > > > is > > > a > > > distributed storage developed by us) it is terminating with an > > > error. > > > > > > > > > > > > Here HelloWorld.swift script is copied into the MosaStore > > > filesystem. > > > Also Swift has write permission in the file system. Because it can > > > create some temporary files (swift.log and > > > first-20111026-1848-hmo97vu7.log). But then it prints the error > > > below > > > and terminates (I have attached the log files. I am using swift > > > 0.93). > > > > > > > > > > > > Max heap: 259522560 > > > Recompilation suppressed. > > > Detailed exception: > > > java.io.FileNotFoundException: /tmp/mount/dir1/first.kml (No such > > > file > > > or directory) > > > at java.io.FileInputStream.open(Native Method) > > > at java.io.FileInputStream.(FileInputStream.java:137) > > > at java.io.FileInputStream.(FileInputStream.java:96) > > > at java.io.FileReader.(FileReader.java:58) > > > at org.globus.cog.karajan.Loader.load(Loader.java:209) > > > at org.griphyn.vdl.karajan.Loader.main(Loader.java:157) > > > Could not start execution. > > > /tmp/mount/dir1/first.kml (No such file or directory) > > > > > > > > > It seems the Mosastore didnt receive any callback to create > > > first.kml. > > > I think, due to some reasons JVM assumes the storage is not > > > available > > > and terminate the swift program. So I am wondering whether you can > > > provide any hint about this issue. > > > > > > Also I would like to build the swift in debug mode in order to run > > > it > > > with jdb (to get more information). 
Please let me know how to > > > build > > > the swift in debug mode? > > > > > > > > > Thank you > > > Emalayan > > > > > > > > > > > > > > > > > > _______________________________________________ > > > Swift-user mailing list > > > Swift-user at ci.uchicago.edu > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > > > -- > > Michael Wilde > > Computation Institute, University of Chicago > > Mathematics and Computer Science Division > > Argonne National Laboratory > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory
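
Following up on the strace suggestion above: a full Swift run produces a very large trace, so it may help to trace a much smaller Java program that performs only a create, write, close, reopen, read and delete in the directory under test. The sketch below is one way to do that. The class name MountRoundTrip and the probe file name roundtrip-probe.txt are hypothetical, chosen only for illustration; nothing here comes from Swift itself.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

/** Hypothetical probe, not part of Swift: one write/read round trip in a directory. */
public class MountRoundTrip {

    /** Writes a small file in dir, reads it back, deletes it; true if all steps agree. */
    public static boolean roundTrip(File dir) throws IOException {
        File f = new File(dir, "roundtrip-probe.txt");
        String payload = "hello from probe";

        // Create and write, then close so the data is handed to the file system.
        try (FileWriter w = new FileWriter(f)) {
            w.write(payload);
        }

        // Reopen with FileReader, the class seen in the reported stack trace.
        String back;
        try (BufferedReader r = new BufferedReader(new FileReader(f))) {
            back = r.readLine();
        }

        boolean deleted = f.delete();
        return payload.equals(back) && deleted;
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(roundTrip(dir) ? "round trip ok" : "round trip FAILED");
    }
}
```

Running it as "strace -f -e trace=file java MountRoundTrip /tmp/mount/dir1" (with -f so that the JVM's threads are followed) should give a trace short enough to read by hand. Comparing that trace against one taken on a local directory would show which file-related call, if any, returns a different result on the MosaStore mount.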