From maltaweel at anl.gov Thu Oct 1 12:19:25 2009 From: maltaweel at anl.gov (Altaweel, Mark R.) Date: Thu, 1 Oct 2009 12:19:25 -0500 Subject: [Swift-user] Using swift for PCs Message-ID: <156080967112E24F88A6F6FF0E10DA6D7E552A@OZZY.anl.gov> Hi, I just started trying to use swift, and we are investigating to see if swift can be useful for our applications. Our users apply windows, mac, and linux/unix environments for their work, so we need to have swift work on all three platforms. I got it to run on the mac and linux side of things, but I am having a little trouble running it on windows. First, I am trying to run swift-0.9 through cygwin. I followed the install instructions for swift and when I tried to run any of the demos (e.g., first.swift) I get this error: SWIFT_HOME is not set, and all attempts at guessing it failed. ------------------------------------------------------------------------------------------- However, I did set SWIFT_HOME in the System. I am not sure what the problem is. Any hints would be appreciated. Thanks. Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Thu Oct 1 13:29:54 2009 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 01 Oct 2009 13:29:54 -0500 Subject: [Swift-user] Using swift for PCs In-Reply-To: <156080967112E24F88A6F6FF0E10DA6D7E552A@OZZY.anl.gov> References: <156080967112E24F88A6F6FF0E10DA6D7E552A@OZZY.anl.gov> Message-ID: <4AC4F522.3070003@mcs.anl.gov> Mark, the swift command should set $SWIFT_HOME by finding where it was executed from. That logic seems not to be working under cygwin, likely due to some shell or command differences. We'll need to investigate, but perhaps this gives you a clue to help you track down the problem. - Mike On 10/1/09 12:19 PM, Altaweel, Mark R. wrote: > Hi, > > I just started trying to use swift, and we are investigating to see if > swift can be useful for our applications.
Our users apply windows, mac, > and linux/unix environments for their work, so we need to have swift > work on all three platforms. I got it to run on the mac and linux side > of things, but I am having a little trouble running it on windows. > > First, I am trying to run swift-0.9 through cygwin. I followed the > install instructions for swift and when I tried to run any of the demos > (e.g., first.swift) I get this error: > > SWIFT_HOME is not set, and all attempts at guessing it failed. > ------------------------------------------------------------------------------------------- > > However, I did set SWIFT_HOME in the System. I am not sure what the > problem is. Any hints would be appreciated. > > Thanks. > > Mark > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user From foster at anl.gov Thu Oct 1 13:30:25 2009 From: foster at anl.gov (Ian Foster) Date: Thu, 1 Oct 2009 13:30:25 -0500 Subject: [Swift-user] Using swift for PCs In-Reply-To: <156080967112E24F88A6F6FF0E10DA6D7E552A@OZZY.anl.gov> References: <156080967112E24F88A6F6FF0E10DA6D7E552A@OZZY.anl.gov> Message-ID: One option is that we can set them up to log in to a Linux server and run from there? On Oct 1, 2009, at 12:19 PM, Altaweel, Mark R. wrote: > Hi, > > I just started trying to use swift, and we are investigating to see > if swift can be useful for our applications. Our users apply > windows, mac, and linux/unix environments for their work, so we need > to have swift work on all three platforms. I got it to run on the > mac and linux side of things, but I am having a little trouble > running it on windows. > > First, I am trying to run swift-0.9 through cygwin. 
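Mike's diagnosis earlier in the thread is that the swift launcher derives SWIFT_HOME from the location it was executed from, and that this guess fails under cygwin. A minimal workaround sketch, assuming the 0.9 launcher is a POSIX-shell wrapper and that a Windows-style value set via the System control panel defeats its guess; the install path below is hypothetical, not taken from the thread:

```shell
# Workaround sketch for "SWIFT_HOME is not set" under cygwin.
# Assumption: a Windows-style SWIFT_HOME (e.g. C:\swift-0.9) set in the
# System environment is not usable by the sh wrapper, so we export a
# POSIX-style path explicitly before invoking swift.
SWIFT_INSTALL="/opt/swift-0.9"        # hypothetical install location

export SWIFT_HOME="$SWIFT_INSTALL"    # cygwin path, not C:\...
export PATH="$SWIFT_HOME/bin:$PATH"

# If SWIFT_HOME already holds a Windows path, cygwin's cygpath converts it:
#   export SWIFT_HOME="$(cygpath -u "$SWIFT_HOME")"

echo "SWIFT_HOME=$SWIFT_HOME"
```

With the variables exported this way, `swift first.swift` should at least get past the SWIFT_HOME guess; whether the rest of swift-0.9 works under cygwin is a separate question, as the thread goes on to discuss.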
I followed the > install instructions for swift and when I tried to run any of the > demos (e.g., first.swift) I get this error: > > SWIFT_HOME is not set, and all attempts at guessing it failed. > ------------------------------------------------------------------------------------------- > > However, I did set SWIFT_HOME in the System. I am not sure what the > problem is. Any hints would be appreciated. > > Thanks. > > Mark > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Thu Oct 1 13:42:07 2009 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 01 Oct 2009 13:42:07 -0500 Subject: [Swift-user] Using swift for PCs In-Reply-To: References: <156080967112E24F88A6F6FF0E10DA6D7E552A@OZZY.anl.gov> Message-ID: <4AC4F7FF.3080300@mcs.anl.gov> Right, in our discussion yesterday we went through all the systems available, and Mark and Jonathan too have Linux systems and clusters. They are just sanity testing how Swift runs on Windows because they have a large Windows user community. - Mike On 10/1/09 1:30 PM, Ian Foster wrote: > One option is that we can set them up to log in to a Linux server and > run from there? > > > On Oct 1, 2009, at 12:19 PM, Altaweel, Mark R. wrote: > >> Hi, >> >> I just started trying to use swift, and we are investigating to see if >> swift can be useful for our applications. Our users apply windows, >> mac, and linux/unix environments for their work, so we need to have >> swift work on all three platforms. I got it to run on the mac and >> linux side of things, but I am having a little trouble running it on >> windows. >> >> First, I am trying to run swift-0.9 through cygwin.
I followed the >> install instructions for swift and when I tried to run any of the >> demos (e.g., first.swift) I get this error: >> >> SWIFT_HOME is not set, and all attempts at guessing it failed. >> ------------------------------------------------------------------------------------------- >> >> However, I did set SWIFT_HOME in the System. I am not sure what the >> problem is. Any hints would be appreciated. >> >> Thanks. >> >> Mark >> >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > ------------------------------------------------------------------------ > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user From maltaweel at anl.gov Thu Oct 1 13:43:47 2009 From: maltaweel at anl.gov (Altaweel, Mark R.) Date: Thu, 1 Oct 2009 13:43:47 -0500 Subject: [Swift-user] Using swift for PCs In-Reply-To: <4AC4F7FF.3080300@mcs.anl.gov> References: <156080967112E24F88A6F6FF0E10DA6D7E552A@OZZY.anl.gov> <4AC4F7FF.3080300@mcs.anl.gov> Message-ID: <156080967112E24F88A6F6FF0E10DA6D7E552E@OZZY.anl.gov> Yes, that's exactly the case. We need to make sure that a variety of users can run this, including those who simply want to run Repast on single workstations with multiple processors/cores. Mark -----Original Message----- From: Michael Wilde [mailto:wilde at mcs.anl.gov] Sent: Thursday, October 01, 2009 1:42 PM To: Ian Foster Cc: Altaweel, Mark R.; 'swift-user at ci.uchicago.edu' Subject: Re: [Swift-user] Using swift for PCs Right, in our discussion yesterday we went through all the systems available, and Mark and Jonathan too have Linus systems and clusters. They are just sanity testing how Swift runs on Windows because they have a large Windows user community. 
- Mike On 10/1/09 1:30 PM, Ian Foster wrote: > One option is that we can set them up to log in to a Linux server and > run from there? > > > On Oct 1, 2009, at 12:19 PM, Altaweel, Mark R. wrote: > >> Hi, >> >> I just started trying to use swift, and we are investigating to see >> if swift can be useful for our applications. Our users apply windows, >> mac, and linux/unix environments for their work, so we need to have >> swift work on all three platforms. I got it to run on the mac and >> linux side of things, but I am having a little trouble running it on >> windows. >> >> First, I am trying to run swift-0.9 through cygwin. I followed the >> install instructions for swift and when I tried to run any of the >> demos (e.g., first.swift) I get this error: >> >> SWIFT_HOME is not set, and all attempts at guessing it failed. >> --------------------------------------------------------------------- >> ---------------------- >> >> However, I did set SWIFT_HOME in the System. I am not sure what the >> problem is. Any hints would be appreciated. >> >> Thanks. 
>> >> Mark >> >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > ---------------------------------------------------------------------- > -- > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user From hockyg at uchicago.edu Thu Oct 1 13:46:55 2009 From: hockyg at uchicago.edu (Glen Hocky) Date: Thu, 1 Oct 2009 14:46:55 -0400 Subject: [Swift-user] Using swift for PCs In-Reply-To: <156080967112E24F88A6F6FF0E10DA6D7E552E@OZZY.anl.gov> References: <156080967112E24F88A6F6FF0E10DA6D7E552A@OZZY.anl.gov> <4AC4F7FF.3080300@mcs.anl.gov> <156080967112E24F88A6F6FF0E10DA6D7E552E@OZZY.anl.gov> Message-ID: I spent a few hours trying to get swift 0.9 to work in Cygwin w/ Alex Moore in David Biron's group. We were never successful, so it seems some especially advanced configuration is necessary. Glen On Thu, Oct 1, 2009 at 2:43 PM, Altaweel, Mark R. wrote: > Yes, that's exactly the case. We need to make sure that a variety of users > can run this, including those who simply want to run Repast on single > workstations with multiple processors/cores. > > Mark > > -----Original Message----- > From: Michael Wilde [mailto:wilde at mcs.anl.gov] > Sent: Thursday, October 01, 2009 1:42 PM > To: Ian Foster > Cc: Altaweel, Mark R.; 'swift-user at ci.uchicago.edu' > Subject: Re: [Swift-user] Using swift for PCs > > Right, in our discussion yesterday we went through all the systems > available, and Mark and Jonathan too have Linus systems and clusters. > > They are just sanity testing how Swift runs on Windows because they have a > large Windows user community. > > - Mike > > > > > On 10/1/09 1:30 PM, Ian Foster wrote: > > One option is that we can set them up to log in to a Linux server and > > run from there? 
> > > > > > On Oct 1, 2009, at 12:19 PM, Altaweel, Mark R. wrote: > > > >> Hi, > >> > >> I just started trying to use swift, and we are investigating to see > >> if swift can be useful for our applications. Our users apply windows, > >> mac, and linux/unix environments for their work, so we need to have > >> swift work on all three platforms. I got it to run on the mac and > >> linux side of things, but I am having a little trouble running it on > >> windows. > >> > >> First, I am trying to run swift-0.9 through cygwin. I followed the > >> install instructions for swift and when I tried to run any of the > >> demos (e.g., first.swift) I get this error: > >> > >> SWIFT_HOME is not set, and all attempts at guessing it failed. > >> --------------------------------------------------------------------- > >> ---------------------- > >> > >> However, I did set SWIFT_HOME in the System. I am not sure what the > >> problem is. Any hints would be appreciated. > >> > >> Thanks. > >> > >> Mark > >> > >> _______________________________________________ > >> Swift-user mailing list > >> Swift-user at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > > > > ---------------------------------------------------------------------- > > -- > > > > _______________________________________________ > > Swift-user mailing list > > Swift-user at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From iraicu at cs.uchicago.edu Thu Oct 1 14:20:37 2009 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Thu, 01 Oct 2009 14:20:37 -0500 Subject: [Swift-user] Using swift for PCs In-Reply-To: References: <156080967112E24F88A6F6FF0E10DA6D7E552A@OZZY.anl.gov> <4AC4F7FF.3080300@mcs.anl.gov> <156080967112E24F88A6F6FF0E10DA6D7E552E@OZZY.anl.gov> Message-ID: <4AC50105.2080609@cs.uchicago.edu> Hi, I just played with the new virtualization tool called VirtualBox, made by Sun. http://www.virtualbox.org/ It has similar functionality to VMware, but it's much lighter weight (a 40MB install), it's free, and it works across Windows, Mac, and Linux (running in user space, so there are no special requirements on the underlying OS or even hardware). If Cygwin doesn't work out, why not try setting up a minimal install of Linux with Swift pre-installed in VirtualBox? I also think VirtualBox has some nice file sharing features, so the Linux install can access data from Windows. Just another idea to try out. Ioan Glen Hocky wrote: > I spent a few hours trying to get swift 0.9 to work in Cygwin w/ Alex > Moore in David Biron's group. We were never successful, so it seems > some especially advanced configuration is necessary. > > Glen > > On Thu, Oct 1, 2009 at 2:43 PM, Altaweel, Mark R. > wrote: > > Yes, that's exactly the case. We need to make sure that a variety > of users can run this, including those who simply want to run > Repast on single workstations with multiple processors/cores. > > Mark > > -----Original Message----- > From: Michael Wilde [mailto:wilde at mcs.anl.gov > ] > Sent: Thursday, October 01, 2009 1:42 PM > To: Ian Foster > Cc: Altaweel, Mark R.; 'swift-user at ci.uchicago.edu > ' > Subject: Re: [Swift-user] Using swift for PCs > > Right, in our discussion yesterday we went through all the systems > available, and Mark and Jonathan too have Linux systems and clusters.
> > They are just sanity testing how Swift runs on Windows because > they have a large Windows user community. > > - Mike > > > > > On 10/1/09 1:30 PM, Ian Foster wrote: > > One option is that we can set them up to log in to a Linux > server and > > run from there? > > > > > > On Oct 1, 2009, at 12:19 PM, Altaweel, Mark R. wrote: > > > >> Hi, > >> > >> I just started trying to use swift, and we are investigating to see > >> if swift can be useful for our applications. Our users apply > windows, > >> mac, and linux/unix environments for their work, so we need to have > >> swift work on all three platforms. I got it to run on the mac and > >> linux side of things, but I am having a little trouble running > it on > >> windows. > >> > >> First, I am trying to run swift-0.9 through cygwin. I followed the > >> install instructions for swift and when I tried to run any of the > >> demos (e.g., first.swift) I get this error: > >> > >> SWIFT_HOME is not set, and all attempts at guessing it failed. > >> > --------------------------------------------------------------------- > >> ---------------------- > >> > >> However, I did set SWIFT_HOME in the System. I am not sure what the > >> problem is. Any hints would be appreciated. > >> > >> Thanks. 
> >> > >> Mark > >> > >> _______________________________________________ > >> Swift-user mailing list > >> Swift-user at ci.uchicago.edu > > > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > > > > > ---------------------------------------------------------------------- > > -- > > > > _______________________________________________ > > Swift-user mailing list > > Swift-user at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > ------------------------------------------------------------------------ > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user -- ================================================================= Ioan Raicu, Ph.D. NSF/CRA Computing Innovation Fellow ================================================================= Center for Ultra-scale Computing and Information Security (CUCIS) Department of Electrical Engineering and Computer Science Northwestern University 2145 Sheridan Rd, Tech M384 Evanston, IL 60208-3118 ================================================================= Cel: 1-847-722-0876 Tel: 1-847-491-8163 Email: iraicu at eecs.northwestern.edu Web: http://www.eecs.northwestern.edu/~iraicu/ http://cucis.ece.northwestern.edu/ ================================================================= ================================================================= -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hockyg at uchicago.edu Thu Oct 1 14:33:25 2009 From: hockyg at uchicago.edu (Glen Hocky) Date: Thu, 1 Oct 2009 15:33:25 -0400 Subject: [Swift-user] Using swift for PCs In-Reply-To: <4AC50105.2080609@cs.uchicago.edu> References: <156080967112E24F88A6F6FF0E10DA6D7E552A@OZZY.anl.gov> <4AC4F7FF.3080300@mcs.anl.gov> <156080967112E24F88A6F6FF0E10DA6D7E552E@OZZY.anl.gov> <4AC50105.2080609@cs.uchicago.edu> Message-ID: I'm also a big fan of VirtualBox, and one advantage is that it runs great on all 3 platforms (I've tested this), so you could have everyone running the same virtual image; configuration on all hosts would then be equivalent. Glen On Thu, Oct 1, 2009 at 3:20 PM, Ioan Raicu wrote: > Hi, > I just played with the new virtualization tool called VirtualBox, made by > Sun. > http://www.virtualbox.org/ > > Its has similar functionality as VMWare, but its much lighter weight (40MB > install), its free, and works across Windows, Macs, and Linux (running in > user space, so there is no special requirements from the underlying OS or > even hardware). If Cygwin doesn't work out, why not try setting up a minimal > install of Linux with Swift pre-installed in Virtual Box? I also think > Virtual Box has some nice file sharing features, so the Linux install can > access data from Windows. > > Just another idea to try out. > > Ioan > > Glen Hocky wrote: > > I spent a few hours trying to get swift 0.9 to work in Cygwin w/ Alex Moore > in David Biron's group. We were never successful, so it seems some > especially advanced configuration is necessary. > > Glen > > On Thu, Oct 1, 2009 at 2:43 PM, Altaweel, Mark R. wrote: > >> Yes, that's exactly the case. We need to make sure that a variety of users >> can run this, including those who simply want to run Repast on single >> workstations with multiple processors/cores.
>> >> Mark >> >> -----Original Message----- >> From: Michael Wilde [mailto:wilde at mcs.anl.gov] >> Sent: Thursday, October 01, 2009 1:42 PM >> To: Ian Foster >> Cc: Altaweel, Mark R.; 'swift-user at ci.uchicago.edu' >> Subject: Re: [Swift-user] Using swift for PCs >> >> Right, in our discussion yesterday we went through all the systems >> available, and Mark and Jonathan too have Linus systems and clusters. >> >> They are just sanity testing how Swift runs on Windows because they have a >> large Windows user community. >> >> - Mike >> >> >> >> >> On 10/1/09 1:30 PM, Ian Foster wrote: >> > One option is that we can set them up to log in to a Linux server and >> > run from there? >> > >> > >> > On Oct 1, 2009, at 12:19 PM, Altaweel, Mark R. wrote: >> > >> >> Hi, >> >> >> >> I just started trying to use swift, and we are investigating to see >> >> if swift can be useful for our applications. Our users apply windows, >> >> mac, and linux/unix environments for their work, so we need to have >> >> swift work on all three platforms. I got it to run on the mac and >> >> linux side of things, but I am having a little trouble running it on >> >> windows. >> >> >> >> First, I am trying to run swift-0.9 through cygwin. I followed the >> >> install instructions for swift and when I tried to run any of the >> >> demos (e.g., first.swift) I get this error: >> >> >> >> SWIFT_HOME is not set, and all attempts at guessing it failed. >> >> --------------------------------------------------------------------- >> >> ---------------------- >> >> >> >> However, I did set SWIFT_HOME in the System. I am not sure what the >> >> problem is. Any hints would be appreciated. >> >> >> >> Thanks. 
>> >> >> >> Mark >> >> >> >> _______________________________________________ >> >> Swift-user mailing list >> >> Swift-user at ci.uchicago.edu >> >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user >> > >> > >> > ---------------------------------------------------------------------- >> > -- >> > >> > _______________________________________________ >> > Swift-user mailing list >> > Swift-user at ci.uchicago.edu >> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user >> > > ------------------------------ > > _______________________________________________ > Swift-user mailing listSwift-user at ci.uchicago.eduhttp://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > -- > ================================================================= > Ioan Raicu, Ph.D. > NSF/CRA Computing Innovation Fellow > ================================================================= > Center for Ultra-scale Computing and Information Security (CUCIS) > Department of Electrical Engineering and Computer Science > Northwestern University > 2145 Sheridan Rd, Tech M384 > Evanston, IL 60208-3118 > ================================================================= > Cel: 1-847-722-0876 > Tel: 1-847-491-8163 > Email: iraicu at eecs.northwestern.edu > Web: http://www.eecs.northwestern.edu/~iraicu/ > http://cucis.ece.northwestern.edu/ > ================================================================= > ================================================================= > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From iraicu at cs.uchicago.edu Tue Oct 13 17:15:19 2009 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Tue, 13 Oct 2009 17:15:19 -0500 Subject: [Swift-user] CFP: ACM Int. 
Symposium High Performance Distributed Computing (HPDC) 2010 Message-ID: <4AD4FBF7.4070601@cs.uchicago.edu> ACM HPDC 2010 Call For Papers 19th ACM International Symposium on High Performance Distributed Computing Chicago, Illinois June 21-25, 2010 http://hpdc2010.eecs.northwestern.edu The ACM International Symposium on High Performance Distributed Computing (HPDC) is the premier venue for presenting the latest research on the design, implementation, evaluation, and use of parallel and distributed systems for high performance and high end computing. The 19th installment of HPDC will take place in the heart of Chicago, Illinois, the third largest city in the United States and a major technological and cultural capital. The conference will be held on June 23-25 (Wednesday through Friday) with affiliated workshops occurring on June 21-22 (Monday and Tuesday). Submissions are welcomed on all forms of high performance distributed computing, including grids, clouds, clusters, service-oriented computing, utility computing, peer-to-peer systems, and global computing ensembles. New scholarly research showing empirical and reproducible results in architectures, systems, and networks is strongly encouraged, as are experience reports of applications and deployments that can provide insights for future high performance distributed computing research. All papers will be rigorously reviewed by a distinguished program committee, with a strong focus on the combination of rigorous scientific results and likely high impact within high performance distributed computing. Research papers must clearly demonstrate research contributions and novelty while experience reports must clearly describe lessons learned and demonstrate impact.
Topics of interest include (but are not limited to) the following, in the context of high performance distributed computing and high end computing: * Systems * Architectures * Algorithms * Networking * Programming languages and environments * Data management * I/O and file systems * Virtualization * Resource management, scheduling, and load-balancing * Performance modeling, simulation, and prediction * Fault tolerance, reliability and availability * Security, configuration, policy, and management issues * Multicore issues and opportunities * Models and use cases for utility, grid, and cloud computing Both full papers and short papers (for poster presentation and/or demonstrations) may be submitted. IMPORTANT DATES Paper Abstract submissions: January 15, 2010 Paper submissions: January 22, 2010 Author notification: March 30, 2010 Final manuscripts: April 23, 2010 SUBMISSIONS Authors are invited to submit full papers of at most 12 pages or short papers of at most 4 pages. The page limits include all figures and references. Papers should be formatted in the ACM proceedings style (e.g., http://www.acm.org/sigs/publications/proceedings-templates). Reviewing is single-blind. Papers must be self-contained and provide the technical substance required for the program committee to evaluate the paper's contribution, including how it differs from prior work. All papers will be reviewed and judged on correctness, originality, technical strength, significance, quality of presentation, and interest and relevance to the conference. Submitted papers must be original work that has not appeared in and is not under consideration for another conference or a journal. There will be NO DEADLINE EXTENSIONS. PUBLICATION Accepted full and short papers will appear in the conference proceedings. WORKSHOPS A separate call for workshops is available at http://hpdc2010.eecs.northwestern.edu/hpdc2010-cfw.txt. The deadline for workshop proposals is November 2, 2009. 
GENERAL CO-CHAIRS Kate Keahey, Argonne National Labs Salim Hariri, University of Arizona STEERING COMMITTEE Salim Hariri, Univ. of Arizona (Chair) Andrew A. Chien, Intel / UCSD Henri Bal, Vrije University Franck Cappello, INRIA Jack Dongarra, Univ. of Tennessee Ian Foster, ANL& Univ. of Chicago Andrew Grimshaw, Univ. of Virginia Carl Kesselman, USC/ISI Dieter Kranzlmueller, Ludwig-Maximilians-Univ. Muenchen Miron Livny, Univ. of Wisconsin Manish Parashar, Rutgers University Karsten Schwan, Georgia Tech David Walker, Univ. of Cardiff Rich Wolski, UCSB PROGRAM CHAIR Peter Dinda, Northwestern University PROGRAM COMMITTEE Ron Brightwell, Sandia National Labs Fabian Bustamante, Northwestern University Henri Bal, Vrije Universiteit Frank Cappello, INRIA Claris Castillo, IBM Research Henri Casanova, University of Hawaii Abhishek Chandra, University of Minnesota Chris Colohan, Google Brian Cooper, Yahoo Research Wu-chun Feng, Virginia Tech Jose Fortes, University of Florida Ian Foster, University of Chicago / Argonne Geoffrey Fox, Indiana University Michael Gerndt, TU-Munich Andrew Grimshaw, University of Virginia Thilo Kielmann, Vrije Universiteit Arthur Maccabe, Oak Ridge National Labs Satoshi Matsuoka, Toyota Institute of Technology Jose Moreira, IBM Research Klara Nahrstedt, UIUC Dushyanth Narayanan, Microsoft Research Manish Parashar, Rutgers University Joel Saltz, Emory University Karsten Schwan, Georgia Tech Thomas Stricker, Google Jaspal Subhlok, University of Houston Michela Taufer, University of Delaware Valerie Taylor, TAMU Douglas Thain, University of Notre Dame Jon Weissman, University of Minnesota Rich Wolski, UCSB and Eucalyptus Systems Dongyan Xu, Purdue University Ken Yocum, UCSD WORKSHOP CHAIR Douglas Thain, University of Notre Dame PUBLICITY CO-CHAIRS Martin Swany, U. 
Delaware Morris Riedel, Juelich Supercomputing Centre Renato Ferreira, Universidade Federal de Minas Gerais Kento Aida, NII and Tokyo Institute of Technology LOCAL ARRANGEMENTS CHAIR Zhiling Lan, IIT STUDENT ACTIVITIES CO-CHAIRS John Lange, Northwestern University Ioan Raicu, Northwestern University -- ================================================================= Ioan Raicu, Ph.D. NSF/CRA Computing Innovation Fellow ================================================================= Center for Ultra-scale Computing and Information Security (CUCIS) Department of Electrical Engineering and Computer Science Northwestern University 2145 Sheridan Rd, Tech M384 Evanston, IL 60208-3118 ================================================================= Cel: 1-847-722-0876 Tel: 1-847-491-8163 Email: iraicu at eecs.northwestern.edu Web: http://www.eecs.northwestern.edu/~iraicu/ http://cucis.ece.northwestern.edu/ ================================================================= ================================================================= From skenny at uchicago.edu Wed Oct 14 15:39:15 2009 From: skenny at uchicago.edu (skenny at uchicago.edu) Date: Wed, 14 Oct 2009 15:39:15 -0500 (CDT) Subject: [Swift-user] Re: [Swift-devel] burnin' up ranger w/the latest coasters In-Reply-To: <20091013111417.CDU59058@m4500-02.uchicago.edu> References: <20091013111417.CDU59058@m4500-02.uchicago.edu> Message-ID: <20091014153915.CDW69329@m4500-02.uchicago.edu> for those interested, here are the config files used for this run: swift.properties: sites.file=config/coaster_ranger.xml tc.file=/ci/projects/cnari/config/tc.data lazy.errors=false caching.algorithm=LRU pgraph=false pgraph.graph.options=splines="compound", rankdir="TB" pgraph.node.options=color="seagreen", style="filled" clustering.enabled=false clustering.queue.delay=4 clustering.min.time=60 kickstart.enabled=maybe kickstart.always.transfer=false wrapperlog.always.transfer=false throttle.submit=3 throttle.host.submit=8 
throttle.score.job.factor=64 throttle.transfers=16 throttle.file.operations=16 sitedir.keep=false execution.retries=3 replication.enabled=false replication.min.queue.time=60 replication.limit=3 foreach.max.threads=16384 coaster_ranger.xml: 1000.0 normal 32 1 16 8192 72000 TG-DBS080004N /work/00926/tg459516/sidgrid_out/{username} ---- Original message ---- >Date: Tue, 13 Oct 2009 11:14:17 -0500 (CDT) >From: >Subject: [Swift-devel] burnin' up ranger w/the latest coasters >To: swift-devel at ci.uchicago.edu > >Final status: Finished successfully:131072 > >re-running some of the workflows from our recent SEM >paper with the latest swift...sadly, queue time on ranger has >only gone up since those initial runs...but luckily coasters >has speeded things up, so it ends up evening out for time to >solution :) > >not sure i fully understand the plot: > >http://www.ci.uchicago.edu/~skenny/workflows/sem_131k/ > >log is here: > >/ci/projects/cnari/logs/skenny/4reg_2cond-20091012-1607-ugidm2s2.log >_______________________________________________ >Swift-devel mailing list >Swift-devel at ci.uchicago.edu >http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From iraicu at cs.uchicago.edu Wed Oct 14 16:41:18 2009 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Wed, 14 Oct 2009 16:41:18 -0500 Subject: [Swift-user] Call for Workshops: ACM Int. Symposium High Performance Distributed Computing (HPDC) 2010 Message-ID: <4AD6457E.8040004@cs.uchicago.edu> HPDC 2010 - Call for Workshops http://hpdc2010.eecs.northwestern.edu We invite proposals for workshops to be held with the ACM Symposium on High Performance Distributed Computing to be held in Chicago, Illinois in June 2010. Workshops will be held June 21-22, preceding the main conference sessions June 23-25. Workshops provide forums for discussion among researchers and practitioners on focused topics or emerging research areas. 
Workshops may be organized in whatever way is appropriate to the topic, possibly including invited talks, panel discussions, presentation of work in progress, or full peer-reviewed papers. Each workshop will be a full-day event hosting 20-40 participants. A workshop must be proposed in writing and sent to dthain at nd.edu. A workshop proposal should consist of: * Name of the workshop. * A few paragraphs describing the theme of the workshop and how it relates to the overall conference. * Data about previous offerings of the workshop, including attendance, number of papers or presentations submitted and accepted. * Names and affiliations of the workshop organizers, and if applicable, a significant portion of the program committee. * Plan for attracting submissions and attendees. * Timeline for milestones such as call for papers, submission deadline, and so forth. Workshop Proposal Deadline: November 2, 2009 Workshop Notification: November 9, 2009 Workshop Calls Online: November 23, 2009 Workshop Proceedings Due: April 23, 2010 -- ================================================================= Ioan Raicu, Ph.D.
NSF/CRA Computing Innovation Fellow
=================================================================
Center for Ultra-scale Computing and Information Security (CUCIS)
Department of Electrical Engineering and Computer Science
Northwestern University
2145 Sheridan Rd, Tech M384
Evanston, IL 60208-3118
=================================================================
Cell: 1-847-722-0876
Tel: 1-847-491-8163
Email: iraicu at eecs.northwestern.edu
Web: http://www.eecs.northwestern.edu/~iraicu/
https://wiki.cucis.eecs.northwestern.edu/
=================================================================
=================================================================

From fedorov at bwh.harvard.edu Mon Oct 19 22:35:21 2009
From: fedorov at bwh.harvard.edu (Andriy Fedorov)
Date: Mon, 19 Oct 2009 23:35:21 -0400
Subject: [Swift-user] Tuning parameters of coaster execution
Message-ID: <82f536810910192035o1eaf761chfff2e006e31fb51a@mail.gmail.com>

Hi,

I am trying to understand how to correctly set the coaster-related parameters to optimize execution of my workflow. A single task I have takes around 1-2 minutes. I set maxWalltime to 2 minutes, and there are 40 of these tasks in my toy workflow. Coasters are configured as gt2:gt2:pbs. When I run it with the default parameters, the workflow completes (this is great!).

Now I am trying to understand what's going on and how to improve the performance. Looking at the scheduler queue, I see that two jobs are submitted in the beginning of the execution for 18 min each, one with 1 node, and one with 2 nodes. All of the execution is happening in these two jobs (the number of jobs submitted is just two, for 40 tasks, so it looks like things work). First question: why does it happen this way? (two jobs, 18 minutes each, specific node allocation) I assume only one of them (2-node) is executing worker tasks, but in this case why is the allocation time 18 minutes, not 20 (each worker walltime is 2 min)?
Second question: how do I make coasters request more nodes? I tried to increase nodeGranularity to 10. This resulted in only one (!) job with 10 nodes and 20 min walltime showing up on the scheduler. But it looks like the jobs are still executed 2 at a time!

Progress: Selecting site:38 Active:2

According to documentation, default workersPerNode=1, so I would expect at least 10 to be active. Again, I don't understand what's going on under the hood....

Can coaster experts give me some guidance on what is going on, and how to intelligently set the parameters?

Thanks!

--
Andriy Fedorov, Ph.D.

Research Fellow
Brigham and Women's Hospital
Harvard Medical School
75 Francis Street
Boston, MA 02115 USA
fedorov at bwh.harvard.edu

From wilde at mcs.anl.gov Mon Oct 19 23:26:02 2009
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 19 Oct 2009 23:26:02 -0500
Subject: [Swift-user] Tuning parameters of coaster execution
In-Reply-To: <82f536810910192035o1eaf761chfff2e006e31fb51a@mail.gmail.com>
References: <82f536810910192035o1eaf761chfff2e006e31fb51a@mail.gmail.com>
Message-ID: <4ADD3BDA.6050809@mcs.anl.gov>

Hi Andriy,

We'll need to wait for Mihael to advise you on this, but there are a few messages and threads in swift-devel that may be useful:

Ranger block scheduling:

http://mail.ci.uchicago.edu/pipermail/swift-devel/2009-September/005985.html
http://mail.ci.uchicago.edu/pipermail/swift-devel/2009-September/005986.html

Using Ranger with the latest coasters:

http://mail.ci.uchicago.edu/pipermail/swift-devel/2009-October/005994.html

Also, the following may be helpful to force a specific number of coasters to start and/or jobs to run on them, but I don't know how these settings interact with the coaster "block" settings:

---
To adjust the throttle, you can use this in your sites.xml element:

2.55
10000

The #jobs per site is then throttled to (jobThrottle * 100) + 1 = 256 when initialScore is large enough (and 10000 is).
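For reference, written out with markup, the two profile values above would sit inside the site's pool entry in sites.xml roughly like this (a sketch: the pool handle, execution element, and work directory are illustrative placeholders; only the 2.55 and 10000 values and the gt2:gt2:pbs job manager come from this thread):

```xml
<pool handle="somesite">
  <!-- execution element and workdirectory are placeholders -->
  <execution provider="coaster" url="gatekeeper.example.org" jobManager="gt2:gt2:pbs"/>
  <profile namespace="karajan" key="jobThrottle">2.55</profile>
  <profile namespace="karajan" key="initialScore">10000</profile>
  <workdirectory>/path/to/swiftwork</workdirectory>
</pool>
```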
E.g., if you have 50 cores, set jobThrottle to 0.49.
For 200 cores, use 1.99, etc.

If you know how many cores you have available, always set initialScore to 10000, which bypasses the Swift "slow start".
---

Mihael, can you create a few examples of consistent parameter settings that work well together for a few illustrative configurations?

- Mike

On 10/19/09 10:35 PM, Andriy Fedorov wrote:
> Hi,
>
> I am trying to understand how to set correctly the coaster-related
> parameters to optimize execution of my workflow. A single task I have
> takes around 1-2 minutes. I set maxWalltime to 2 minutes, and there 40
> of these tasks in my toy workflow. Coasters are configured as
> gt2:gt2:pbs. When I run it with the default parameters, the workflow
> completes (this is great!).
>
> Now I am trying to understand what's going on and how to improve the
> performance. Looking at the scheduler queue, I see that two jobs are
> submitted in the beginning of the execution for 18 min each, one with
> 1 node, and one with 2 nodes. All of the execution is happening in
> these two jobs (the number of jobs submitted is just two, for 40 taks,
> so looks like things work). First question: why does it happen this
> way? (two jobs, 18 minutes each, specific node allocation) I assume
> only one of them (2-node) is executing worker tasks, but in this case
> why allocation time is 18 minutes, not 20 (each worker walltime is 2
> min)?
>
> Second question: how do I make coaster to request more nodes? I tried
> to increase nodeGranularity to 10. This resulted in only one (!) job
> with 10 nodes and 20 min walltime showing up on the scheduler. But it
> looks like the jobs are still executed 2 at a time!
>
> Progress: Selecting site:38 Active:2
>
> According to documentation, default workersPerNode=1, so I would
> expect at least 10 to be active. Again, I don't understand what's
> going on uder the hood....
> > Can coaster experts give me some guidance what is going on, and how to > intelligently set the parameters? > > Thanks! > > -- > Andriy Fedorov, Ph.D. > > Research Fellow > Brigham and Women's Hospital > Harvard Medical School > 75 Francis Street > Boston, MA 02115 USA > fedorov at bwh.harvard.edu > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user From fedorov at bwh.harvard.edu Tue Oct 20 08:51:24 2009 From: fedorov at bwh.harvard.edu (Andriy Fedorov) Date: Tue, 20 Oct 2009 09:51:24 -0400 Subject: [Swift-user] Tuning parameters of coaster execution In-Reply-To: <4ADD3BDA.6050809@mcs.anl.gov> References: <82f536810910192035o1eaf761chfff2e006e31fb51a@mail.gmail.com> <4ADD3BDA.6050809@mcs.anl.gov> Message-ID: <82f536810910200651o21a755e1m57e7c5ef8dc6d3e4@mail.gmail.com> Mike, Very helpful pointers. I did search, but on swift-users, not swift-devel. Yes, it would be great if there were some typical configuration examples. For me, as an application developer, it is not obvious to figure these out, even though I have some experience with TeraGrid and Globus... Let's see what's Mihael's opinion. I will work from the examples included in the posts you suggested meanwhile. Thanks! -- Andriy Fedorov, Ph.D. 
Research Fellow Brigham and Women's Hospital Harvard Medical School 75 Francis Street Boston, MA 02115 USA fedorov at bwh.harvard.edu On Tue, Oct 20, 2009 at 00:26, Michael Wilde wrote: > Hi Andriy, > > We'll need to wait for Mihael to advise you on this, but there's a few > messages and threads in swift-devel that may be useful: > > Ranger block scheduling: > > http://mail.ci.uchicago.edu/pipermail/swift-devel/2009-September/005985.html > http://mail.ci.uchicago.edu/pipermail/swift-devel/2009-September/005986.html > > Using Ranger with the latest coasters: > > http://mail.ci.uchicago.edu/pipermail/swift-devel/2009-October/005994.html > > Also, the following maybe helpful to force a specific number of coasters to > start and/or jobs to run on them, but I dont know how these settings > interact with the coaster "block" settings: > > --- > To adjust the throttle, you can use this in your sites.xml element: > > ? ? ?2.55 > ? ? ?10000 > > The #jobs per site is then throttled to (jobThrottle * 100) + 1 = 256 > when initialScore is large enough (and 10000 is). > > Eg, if you had have cores, set jobThrottle to 0.49 > For 200 cores, use 1.99 > etc. > > If you know how many cores you have available, always set initialScore > to 10000 which bypasses the Swift "slow start". > --- > > Mihael, can you create a few examples of consistent parameter settings that > work well together for a few illustrative configurations? > > - Mike > > > On 10/19/09 10:35 PM, Andriy Fedorov wrote: >> >> Hi, >> >> I am trying to understand how to set correctly the coaster-related >> parameters to optimize execution of my workflow. A single task I have >> takes around 1-2 minutes. I set maxWalltime to 2 minutes, and there 40 >> of these tasks in my toy workflow. Coasters are configured as >> gt2:gt2:pbs. When I run it with the default parameters, the workflow >> completes (this is great!). >> >> Now I am trying to understand what's going on and how to improve the >> performance. 
Looking at the scheduler queue, I see that two jobs are >> submitted in the beginning of the execution for 18 min each, one with >> 1 node, and one with 2 nodes. All of the execution is happening in >> these two jobs (the number of jobs submitted is just two, for 40 taks, >> so looks like things work). First question: why does it happen this >> way? (two jobs, 18 minutes each, specific node allocation) I assume >> only one of them (2-node) is executing worker tasks, but in this case >> why allocation time is 18 minutes, not 20 (each worker walltime is 2 >> min)? >> >> Second question: how do I make coaster to request more nodes? I tried >> to increase nodeGranularity to 10. This resulted in only one (!) job >> with 10 nodes and 20 min walltime showing up on the scheduler. But it >> looks like the jobs are still executed 2 at a time! >> >> Progress: ?Selecting site:38 ?Active:2 >> >> According to documentation, default workersPerNode=1, so I would >> expect at least 10 to be active. Again, I don't understand what's >> going on uder the hood.... >> >> Can coaster experts give me some guidance what is going on, and how to >> intelligently set the parameters? >> >> Thanks! >> >> -- >> Andriy Fedorov, Ph.D. 
>> >> Research Fellow >> Brigham and Women's Hospital >> Harvard Medical School >> 75 Francis Street >> Boston, MA 02115 USA >> fedorov at bwh.harvard.edu >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > From hategan at mcs.anl.gov Tue Oct 20 10:55:47 2009 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 20 Oct 2009 10:55:47 -0500 Subject: [Swift-user] Tuning parameters of coaster execution In-Reply-To: <82f536810910192035o1eaf761chfff2e006e31fb51a@mail.gmail.com> References: <82f536810910192035o1eaf761chfff2e006e31fb51a@mail.gmail.com> Message-ID: <1256054147.22279.18.camel@localhost> On Mon, 2009-10-19 at 23:35 -0400, Andriy Fedorov wrote: > Hi, > > I am trying to understand how to set correctly the coaster-related > parameters to optimize execution of my workflow. A single task I have > takes around 1-2 minutes. I set maxWalltime to 2 minutes, and there 40 > of these tasks in my toy workflow. Coasters are configured as > gt2:gt2:pbs. When I run it with the default parameters, the workflow > completes (this is great!). > > Now I am trying to understand what's going on and how to improve the > performance. Looking at the scheduler queue, I see that two jobs are > submitted in the beginning of the execution for 18 min each, one with > 1 node, and one with 2 nodes. All of the execution is happening in > these two jobs (the number of jobs submitted is just two, for 40 taks, > so looks like things work). First question: why does it happen this > way? (two jobs, 18 minutes each, specific node allocation) I assume > only one of them (2-node) is executing worker tasks, but in this case > why allocation time is 18 minutes, not 20 (each worker walltime is 2 > min)? > > Second question: how do I make coaster to request more nodes? I tried > to increase nodeGranularity to 10. This resulted in only one (!) 
job
> with 10 nodes and 20 min walltime showing up on the scheduler. But it
> looks like the jobs are still executed 2 at a time!

You need a more recent version of the code.

A few weeks ago the "parallelism" option was added. By default it's set to try to allocate as many nodes as there are jobs (parallelism=0.0), whereas the behavior you see would have parallelism=1.0. I should change the way the numbers are specified. It's not exactly intuitive unless you look at how it works.

Anyway, it boils down to the notion of job size and block size. The block size is defined as workers*bwalltime^parallelism, while the job size is jwalltime^parallelism. At any given time you can fit roughly workers*bwalltime^parallelism/jwalltime^parallelism jobs in a block.

You can see that with parallelism=0, that reduces to workers/count(jobs).

Conversely, with parallelism=1 the job size is jwalltime and if your block had bwalltime you could fit workers*bwalltime/jwalltime jobs in it.

At the same time, bwalltime is controlled by the overallocation factors. Once the block walltime is decided, the width (number of workers) is picked based on the job sizes that need to be fit (according to the above scheme).

Anyway, to sum it up, use a more recent version.

From fedorov at bwh.harvard.edu Tue Oct 20 11:04:46 2009
From: fedorov at bwh.harvard.edu (Andriy Fedorov)
Date: Tue, 20 Oct 2009 12:04:46 -0400
Subject: [Swift-user] Tuning parameters of coaster execution
In-Reply-To: <1256054147.22279.18.camel@localhost>
References: <82f536810910192035o1eaf761chfff2e006e31fb51a@mail.gmail.com> <1256054147.22279.18.camel@localhost>
Message-ID: <82f536810910200904x584d8ca3m2da7fab8dc660b1d@mail.gmail.com>

On Tue, Oct 20, 2009 at 11:55, Mihael Hategan wrote:
> You need a more recent version of the code.
>

Mihael, I actually updated svn for both cog and swift yesterday prior to running the tests.
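As a rough sketch of where these knobs live: the coaster settings named in this thread (workersPerNode, nodeGranularity, parallelism, and the overallocation factors) are set as globus-namespace profile entries in the site's pool element in sites.xml. All values below are invented for illustration:

```xml
<!-- Illustrative values only; the keys are the coaster settings
     named in this thread, set per-site in sites.xml. -->
<profile namespace="globus" key="workersPerNode">8</profile>
<profile namespace="globus" key="nodeGranularity">10</profile>
<profile namespace="globus" key="parallelism">0.0</profile>
<profile namespace="globus" key="lowOverallocation">10</profile>
<profile namespace="globus" key="highOverallocation">1</profile>
```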
Here's what swift reports I have right now: Swift svn swift-r3170 cog-r2529 > A few weeks ago the "parallelism" option was added. By default it's set > to try to allocate as many nodes as there are jobs (parallelism=0.0), > whereas the behavior you see would have parallelism=1.0. I should change > the way the numbers are specified. It's not exactly intuitive unless you > look at how it works. > > Anyway, it boils down to the notion of job size and block size. The > block size is defined as workers*bwalltime^parallelism, while the job > size is jwalltime^parallelism. At any given time you can fit roughly > workers*bwalltime^parallelism/jwalltime^parallelism jobs in a block. > > You can see that with parallelism=0, that reduces to > workers/count(jobs). > > Conversely, with parallelism=1 the jobs size is jwalltime and if your > block had bwalltime you could fit workers*bwalltime/jwalltime jobs in > it. > > At the same time, bwalltime is controlled by the overallocation factors. > Once the block walltime is decided, the width (number of workers) is > picked based on the job sizes that need to be fit (according to the > above scheme). > > Anyway, to sum it up, use a more recent version. > > From hategan at mcs.anl.gov Tue Oct 20 11:23:22 2009 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 20 Oct 2009 11:23:22 -0500 Subject: [Swift-user] Tuning parameters of coaster execution In-Reply-To: <82f536810910200904x584d8ca3m2da7fab8dc660b1d@mail.gmail.com> References: <82f536810910192035o1eaf761chfff2e006e31fb51a@mail.gmail.com> <1256054147.22279.18.camel@localhost> <82f536810910200904x584d8ca3m2da7fab8dc660b1d@mail.gmail.com> Message-ID: <1256055802.24685.13.camel@localhost> On Tue, 2009-10-20 at 12:04 -0400, Andriy Fedorov wrote: > On Tue, Oct 20, 2009 at 11:55, Mihael Hategan wrote: > > You need a more recent version of the code. > > > > Mihael, I actually updated svn for both cog and swift yesterday prior > to running the tests. 
Here's what swift reports I have right now: > > Swift svn swift-r3170 cog-r2529 Given that even when you have granularity=10 you still see 2 jobs, I suspect you are using swift site throttling parameters that force that. I would set the jobThrottle higher and possibly the initial score higher. For troubleshooting, what you could do is, on the remote side, say cat ~/.globus/coasters/coasters.log|grep "BlockQueueProcessor">bqp.log and post that. Also, you could set the remoteMonitorEnabled profile to "true" to get visual feedback of what's happening. The allocation time is 18 minutes because the new stuff doesn't overallocate using a fixed multiplier (though you can force it to do so). For small jobs (walltime = 1s) the multiplier is set by lowOverallocation (10.0 by default) while for large jobs (walltime -> +inf) the multiplier is 1, with an exponential decay in-between. If you want to always have blocks being 10 times the job walltime, you can set highOverallocation to 10. From HodgessE at uhd.edu Tue Oct 20 21:41:30 2009 From: HodgessE at uhd.edu (Hodgess, Erin) Date: Tue, 20 Oct 2009 21:41:30 -0500 Subject: [Swift-user] using swift on a cluster Message-ID: <70A5AC06FDB5E54482D19E1C04CDFCF307C377B3@BALI.uhd.campus> Hi Swift Users: I'm on a cluster and would like to use swift on the different sites on the cluster. How would I do that, please? Thanks, Erin Erin M. Hodgess, PhD Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: hodgesse at uhd.edu -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From wilde at mcs.anl.gov Tue Oct 20 22:49:45 2009
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 20 Oct 2009 22:49:45 -0500
Subject: [Swift-user] using swift on a cluster
In-Reply-To: <70A5AC06FDB5E54482D19E1C04CDFCF307C377B3@BALI.uhd.campus>
References: <70A5AC06FDB5E54482D19E1C04CDFCF307C377B3@BALI.uhd.campus>
Message-ID: <4ADE84D9.6020508@mcs.anl.gov>

Hi Erin,

I'm assuming you meant "use Swift to run jobs on the compute nodes of the cluster"?

If so, you first need to find out what scheduler (also called "batch system" or "local resource manager") the cluster is running. That's typically one of these: PBS, Condor, or SGE.

Either ask your system administrator, or see if the "man" command or similar probes give you a clue:

Condor: condor_q -version

condor_q -version
$CondorVersion: 7.2.4 Jun 16 2009 BuildID: 159529 $
$CondorPlatform: I386-LINUX_RHEL5 $

PBS: man qstat:

qstat(1B) PBS

SGE: man qstat:

QSTAT(1) Sun Grid Engine User Commands

If it's PBS or Condor, then the Swift user guide gives the sites.xml entries to use.

Tell us what you find, then try following the instructions in the user guide, and follow up with questions as needed.

- Mike

On 10/20/09 9:41 PM, Hodgess, Erin wrote:
> Hi Swift Users:
>
> I'm on a cluster and would like to use swift on the different sites on
> the cluster.
>
> How would I do that, please?
>
> Thanks,
> Erin
>
>
> Erin M.
Hodgess, PhD > Associate Professor > Department of Computer and Mathematical Sciences > University of Houston - Downtown > mailto: hodgesse at uhd.edu > > > ------------------------------------------------------------------------ > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user From HodgessE at uhd.edu Wed Oct 21 03:07:22 2009 From: HodgessE at uhd.edu (Hodgess, Erin) Date: Wed, 21 Oct 2009 03:07:22 -0500 Subject: [Swift-user] using swift on a cluster References: <70A5AC06FDB5E54482D19E1C04CDFCF307C377B3@BALI.uhd.campus> <4ADE84D9.6020508@mcs.anl.gov> Message-ID: <70A5AC06FDB5E54482D19E1C04CDFCF307C377B5@BALI.uhd.campus> Hello! We are indeed using condor. I wanted to try a small test run, but am running into trouble: [hodgess at grid bin]$ cat myjob.submit executable=/usr/bin/id output=results.output error=results.error log=results.log queue [hodgess at grid bin]$ condor_submit myjob.submit Submitting job(s). Logging submit event(s). 1 job(s) submitted to cluster 15. [hodgess at grid bin]$ ls results* results.error results.log results.output You have new mail in /var/spool/mail/hodgess [hodgess at grid bin]$ cat results.log 000 (015.000.000) 10/21 03:06:03 Job submitted from host: <192.168.1.11:46274> ... 001 (015.000.000) 10/21 03:06:05 Job executing on host: <10.1.255.244:44508> ... 002 (015.000.000) 10/21 03:06:05 (1) Job not properly linked for Condor. ... 009 (015.000.000) 10/21 03:06:05 Job was aborted by the user. ... [hodgess at grid bin]$ I'm not sure why the job is not linked. Any suggestions would be much appreciated. Thanks, Erin Erin M. 
Hodgess, PhD Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: hodgesse at uhd.edu -----Original Message----- From: Michael Wilde [mailto:wilde at mcs.anl.gov] Sent: Tue 10/20/2009 10:49 PM To: Hodgess, Erin Cc: swift-user at ci.uchicago.edu Subject: Re: [Swift-user] using swift on a cluster Hi Erin, I'm assuming you meant "use Swift to run jobs on the compute nodes of the cluster"? If so, you first need to find out what scheduler (also called "batch system" or "local resource manager") the cluster is running. Thats typical one of these: PBS, Condor, or SGE. Either ask your system administrator, or see if the "man" command or similar probes give you a clue: Condor: condor_q -version condor_q -version $CondorVersion: 7.2.4 Jun 16 2009 BuildID: 159529 $ $CondorPlatform: I386-LINUX_RHEL5 $ PBS: man qstat: qstat(1B) PBS SGE: man qstat: QSTAT(1) Sun Grid Engine User Commands If its PBS or Condor, then the Swift user guide gives the sites.xml entries to use. Tell us what you find, then try following the instructions in the user guide, and follow up with questions as needed. - Mike On 10/20/09 9:41 PM, Hodgess, Erin wrote: > Hi Swift Users: > > I'm on a cluster and would like to use swift on the different sites on > the cluster. > > How would I do that, please? > > Thanks, > Erin > > > Erin M. Hodgess, PhD > Associate Professor > Department of Computer and Mathematical Sciences > University of Houston - Downtown > mailto: hodgesse at uhd.edu > > > ------------------------------------------------------------------------ > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From HodgessE at uhd.edu Wed Oct 21 03:17:00 2009 From: HodgessE at uhd.edu (Hodgess, Erin) Date: Wed, 21 Oct 2009 03:17:00 -0500 Subject: [Swift-user] using swift on a cluster References: <70A5AC06FDB5E54482D19E1C04CDFCF307C377B3@BALI.uhd.campus><4ADE84D9.6020508@mcs.anl.gov> <70A5AC06FDB5E54482D19E1C04CDFCF307C377B5@BALI.uhd.campus> Message-ID: <70A5AC06FDB5E54482D19E1C04CDFCF307C377B6@BALI.uhd.campus> Aha! I needed the universe=vanilla line. Erin M. Hodgess, PhD Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: hodgesse at uhd.edu -----Original Message----- From: swift-user-bounces at ci.uchicago.edu on behalf of Hodgess, Erin Sent: Wed 10/21/2009 3:07 AM To: Michael Wilde Cc: swift-user at ci.uchicago.edu Subject: RE: [Swift-user] using swift on a cluster Hello! We are indeed using condor. I wanted to try a small test run, but am running into trouble: [hodgess at grid bin]$ cat myjob.submit executable=/usr/bin/id output=results.output error=results.error log=results.log queue [hodgess at grid bin]$ condor_submit myjob.submit Submitting job(s). Logging submit event(s). 1 job(s) submitted to cluster 15. [hodgess at grid bin]$ ls results* results.error results.log results.output You have new mail in /var/spool/mail/hodgess [hodgess at grid bin]$ cat results.log 000 (015.000.000) 10/21 03:06:03 Job submitted from host: <192.168.1.11:46274> ... 001 (015.000.000) 10/21 03:06:05 Job executing on host: <10.1.255.244:44508> ... 002 (015.000.000) 10/21 03:06:05 (1) Job not properly linked for Condor. ... 009 (015.000.000) 10/21 03:06:05 Job was aborted by the user. ... [hodgess at grid bin]$ I'm not sure why the job is not linked. Any suggestions would be much appreciated. Thanks, Erin Erin M. 
Hodgess, PhD Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: hodgesse at uhd.edu -----Original Message----- From: Michael Wilde [mailto:wilde at mcs.anl.gov] Sent: Tue 10/20/2009 10:49 PM To: Hodgess, Erin Cc: swift-user at ci.uchicago.edu Subject: Re: [Swift-user] using swift on a cluster Hi Erin, I'm assuming you meant "use Swift to run jobs on the compute nodes of the cluster"? If so, you first need to find out what scheduler (also called "batch system" or "local resource manager") the cluster is running. Thats typical one of these: PBS, Condor, or SGE. Either ask your system administrator, or see if the "man" command or similar probes give you a clue: Condor: condor_q -version condor_q -version $CondorVersion: 7.2.4 Jun 16 2009 BuildID: 159529 $ $CondorPlatform: I386-LINUX_RHEL5 $ PBS: man qstat: qstat(1B) PBS SGE: man qstat: QSTAT(1) Sun Grid Engine User Commands If its PBS or Condor, then the Swift user guide gives the sites.xml entries to use. Tell us what you find, then try following the instructions in the user guide, and follow up with questions as needed. - Mike On 10/20/09 9:41 PM, Hodgess, Erin wrote: > Hi Swift Users: > > I'm on a cluster and would like to use swift on the different sites on > the cluster. > > How would I do that, please? > > Thanks, > Erin > > > Erin M. Hodgess, PhD > Associate Professor > Department of Computer and Mathematical Sciences > University of Houston - Downtown > mailto: hodgesse at uhd.edu > > > ------------------------------------------------------------------------ > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wilde at mcs.anl.gov Wed Oct 21 07:02:12 2009 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 21 Oct 2009 07:02:12 -0500 Subject: [Swift-user] using swift on a cluster In-Reply-To: <70A5AC06FDB5E54482D19E1C04CDFCF307C377B6@BALI.uhd.campus> References: <70A5AC06FDB5E54482D19E1C04CDFCF307C377B3@BALI.uhd.campus><4ADE84D9.6020508@mcs.anl.gov> <70A5AC06FDB5E54482D19E1C04CDFCF307C377B5@BALI.uhd.campus> <70A5AC06FDB5E54482D19E1C04CDFCF307C377B6@BALI.uhd.campus> Message-ID: <4ADEF844.5020202@mcs.anl.gov> For running Swift locally on a Condor cluster, use a sites.xml based on this example: /home/erin/swiftwork .03 10000 /home/erin/swiftwork .19 10000 The jobThrottle values above will enable Swift to run up to 4 jobs at a time on localhost and 20 jobs at a time on the Condor cluster. Use tc.data to catalog applications on pool or the other. Set jobThrottle as desired to control execution parallelism. #jobs run in parallel is (jobThrottle * 100)+1 initialScore=10000 overrides Swift's "start slow" approach to sensing the site's responsiveness. - Mike On 10/21/09 3:17 AM, Hodgess, Erin wrote: > Aha! > > I needed the universe=vanilla line. > > > > Erin M. Hodgess, PhD > Associate Professor > Department of Computer and Mathematical Sciences > University of Houston - Downtown > mailto: hodgesse at uhd.edu > > > > -----Original Message----- > From: swift-user-bounces at ci.uchicago.edu on behalf of Hodgess, Erin > Sent: Wed 10/21/2009 3:07 AM > To: Michael Wilde > Cc: swift-user at ci.uchicago.edu > Subject: RE: [Swift-user] using swift on a cluster > > Hello! > > We are indeed using condor. > > I wanted to try a small test run, but am running into trouble: > > [hodgess at grid bin]$ cat myjob.submit > executable=/usr/bin/id > output=results.output > error=results.error > log=results.log > queue > [hodgess at grid bin]$ condor_submit myjob.submit > Submitting job(s). > Logging submit event(s). > 1 job(s) submitted to cluster 15. 
> [hodgess at grid bin]$ ls results* > results.error results.log results.output > You have new mail in /var/spool/mail/hodgess > [hodgess at grid bin]$ cat results.log > 000 (015.000.000) 10/21 03:06:03 Job submitted from host: > <192.168.1.11:46274> > ... > 001 (015.000.000) 10/21 03:06:05 Job executing on host: <10.1.255.244:44508> > ... > 002 (015.000.000) 10/21 03:06:05 (1) Job not properly linked for Condor. > ... > 009 (015.000.000) 10/21 03:06:05 Job was aborted by the user. > ... > [hodgess at grid bin]$ > > I'm not sure why the job is not linked. > > Any suggestions would be much appreciated. > > Thanks, > Erin > > > Erin M. Hodgess, PhD > Associate Professor > Department of Computer and Mathematical Sciences > University of Houston - Downtown > mailto: hodgesse at uhd.edu > > > > -----Original Message----- > From: Michael Wilde [mailto:wilde at mcs.anl.gov] > Sent: Tue 10/20/2009 10:49 PM > To: Hodgess, Erin > Cc: swift-user at ci.uchicago.edu > Subject: Re: [Swift-user] using swift on a cluster > > Hi Erin, > > I'm assuming you meant "use Swift to run jobs on the compute nodes of > the cluster"? > > If so, you first need to find out what scheduler (also called "batch > system" or "local resource manager") the cluster is running. > > Thats typical one of these: PBS, Condor, or SGE. > > Either ask your system administrator, or see if the "man" command or > similar probes give you a clue: > > Condor: condor_q -version > > condor_q -version > $CondorVersion: 7.2.4 Jun 16 2009 BuildID: 159529 $ > $CondorPlatform: I386-LINUX_RHEL5 $ > > PBS: man qstat: > > qstat(1B) PBS > > SGE: man qstat: > > QSTAT(1) Sun Grid Engine User Commands > > > If its PBS or Condor, then the Swift user guide gives the sites.xml > entries to use. > > Tell us what you find, then try following the instructions in the user > guide, and follow up with questions as needed. 
> > - Mike > > > On 10/20/09 9:41 PM, Hodgess, Erin wrote: > > Hi Swift Users: > > > > I'm on a cluster and would like to use swift on the different sites on > > the cluster. > > > > How would I do that, please? > > > > Thanks, > > Erin > > > > > > Erin M. Hodgess, PhD > > Associate Professor > > Department of Computer and Mathematical Sciences > > University of Houston - Downtown > > mailto: hodgesse at uhd.edu > > > > > > ------------------------------------------------------------------------ > > > > _______________________________________________ > > Swift-user mailing list > > Swift-user at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > From HodgessE at uhd.edu Wed Oct 21 09:03:24 2009 From: HodgessE at uhd.edu (Hodgess, Erin) Date: Wed, 21 Oct 2009 09:03:24 -0500 Subject: [Swift-user] using swift on a cluster References: <70A5AC06FDB5E54482D19E1C04CDFCF307C377B3@BALI.uhd.campus><4ADE84D9.6020508@mcs.anl.gov> <70A5AC06FDB5E54482D19E1C04CDFCF307C377B5@BALI.uhd.campus> <70A5AC06FDB5E54482D19E1C04CDFCF307C377B6@BALI.uhd.campus> <4ADEF844.5020202@mcs.anl.gov> Message-ID: <70A5AC06FDB5E54482D19E1C04CDFCF307C377B7@BALI.uhd.campus> Here is the output: [hodgess at grid bin]$ swift -tc.file tc.data -sites.file sites.xml firstR.swift Swift 0.9 swift-r2860 cog-r2388 RunID: 20091021-0901-aku7y862 Progress: Execution failed: No service contacts available [hodgess at grid bin]$ Erin M. 
Hodgess, PhD Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: hodgesse at uhd.edu -----Original Message----- From: Michael Wilde [mailto:wilde at mcs.anl.gov] Sent: Wed 10/21/2009 7:02 AM To: Hodgess, Erin Cc: swift-user at ci.uchicago.edu Subject: Re: [Swift-user] using swift on a cluster For running Swift locally on a Condor cluster, use a sites.xml based on this example: /home/erin/swiftwork .03 10000 /home/erin/swiftwork .19 10000 The jobThrottle values above will enable Swift to run up to 4 jobs at a time on localhost and 20 jobs at a time on the Condor cluster. Use tc.data to catalog applications on pool or the other. Set jobThrottle as desired to control execution parallelism. #jobs run in parallel is (jobThrottle * 100)+1 initialScore=10000 overrides Swift's "start slow" approach to sensing the site's responsiveness. - Mike On 10/21/09 3:17 AM, Hodgess, Erin wrote: > Aha! > > I needed the universe=vanilla line. > > > > Erin M. Hodgess, PhD > Associate Professor > Department of Computer and Mathematical Sciences > University of Houston - Downtown > mailto: hodgesse at uhd.edu > > > > -----Original Message----- > From: swift-user-bounces at ci.uchicago.edu on behalf of Hodgess, Erin > Sent: Wed 10/21/2009 3:07 AM > To: Michael Wilde > Cc: swift-user at ci.uchicago.edu > Subject: RE: [Swift-user] using swift on a cluster > > Hello! > > We are indeed using condor. > > I wanted to try a small test run, but am running into trouble: > > [hodgess at grid bin]$ cat myjob.submit > executable=/usr/bin/id > output=results.output > error=results.error > log=results.log > queue > [hodgess at grid bin]$ condor_submit myjob.submit > Submitting job(s). > Logging submit event(s). > 1 job(s) submitted to cluster 15. 
> [hodgess at grid bin]$ ls results* > results.error results.log results.output > You have new mail in /var/spool/mail/hodgess > [hodgess at grid bin]$ cat results.log > 000 (015.000.000) 10/21 03:06:03 Job submitted from host: > <192.168.1.11:46274> > ... > 001 (015.000.000) 10/21 03:06:05 Job executing on host: <10.1.255.244:44508> > ... > 002 (015.000.000) 10/21 03:06:05 (1) Job not properly linked for Condor. > ... > 009 (015.000.000) 10/21 03:06:05 Job was aborted by the user. > ... > [hodgess at grid bin]$ > > I'm not sure why the job is not linked. > > Any suggestions would be much appreciated. > > Thanks, > Erin > > > Erin M. Hodgess, PhD > Associate Professor > Department of Computer and Mathematical Sciences > University of Houston - Downtown > mailto: hodgesse at uhd.edu > > > > -----Original Message----- > From: Michael Wilde [mailto:wilde at mcs.anl.gov] > Sent: Tue 10/20/2009 10:49 PM > To: Hodgess, Erin > Cc: swift-user at ci.uchicago.edu > Subject: Re: [Swift-user] using swift on a cluster > > Hi Erin, > > I'm assuming you meant "use Swift to run jobs on the compute nodes of > the cluster"? > > If so, you first need to find out what scheduler (also called "batch > system" or "local resource manager") the cluster is running. > > That's typically one of these: PBS, Condor, or SGE. > > Either ask your system administrator, or see if the "man" command or > similar probes give you a clue: > > Condor: condor_q -version > > condor_q -version > $CondorVersion: 7.2.4 Jun 16 2009 BuildID: 159529 $ > $CondorPlatform: I386-LINUX_RHEL5 $ > > PBS: man qstat: > > qstat(1B) PBS > > SGE: man qstat: > > QSTAT(1) Sun Grid Engine User Commands > > > If it's PBS or Condor, then the Swift user guide gives the sites.xml > entries to use. > > Tell us what you find, then try following the instructions in the user > guide, and follow up with questions as needed. 
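[Editorial note: the probes Mike describes above can be wrapped in a small script. The sketch below is not part of the original thread; it assumes only that `condor_q` is unique to Condor and that both PBS and SGE ship a `qstat`, with SGE's usage text mentioning "Grid Engine".]

```python
import shutil
import subprocess

def detect_scheduler():
    """Guess the local resource manager using the probes described above."""
    if shutil.which("condor_q"):
        return "condor"
    if shutil.which("qstat"):
        # Both PBS and SGE provide qstat; SGE's help text says "Grid Engine".
        try:
            out = subprocess.run(["qstat", "-help"], capture_output=True,
                                 text=True, timeout=10)
            if "Grid Engine" in (out.stdout + out.stderr):
                return "sge"
        except (OSError, subprocess.TimeoutExpired):
            pass
        return "pbs"
    return "unknown"

print(detect_scheduler())
```

On a machine with none of these commands installed this prints "unknown"; asking the system administrator, as suggested above, remains the reliable route.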
> > - Mike > > > On 10/20/09 9:41 PM, Hodgess, Erin wrote: > > Hi Swift Users: > > > > I'm on a cluster and would like to use swift on the different sites on > > the cluster. > > > > How would I do that, please? > > > > Thanks, > > Erin > > > > > > Erin M. Hodgess, PhD > > Associate Professor > > Department of Computer and Mathematical Sciences > > University of Houston - Downtown > > mailto: hodgesse at uhd.edu > > > > > > ------------------------------------------------------------------------ > > > > _______________________________________________ > > Swift-user mailing list > > Swift-user at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > From wilde at mcs.anl.gov Wed Oct 21 09:22:35 2009 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 21 Oct 2009 09:22:35 -0500 Subject: [Swift-user] using swift on a cluster In-Reply-To: <70A5AC06FDB5E54482D19E1C04CDFCF307C377B7@BALI.uhd.campus> References: <70A5AC06FDB5E54482D19E1C04CDFCF307C377B3@BALI.uhd.campus><4ADE84D9.6020508@mcs.anl.gov> <70A5AC06FDB5E54482D19E1C04CDFCF307C377B5@BALI.uhd.campus> <70A5AC06FDB5E54482D19E1C04CDFCF307C377B6@BALI.uhd.campus> <4ADEF844.5020202@mcs.anl.gov> <70A5AC06FDB5E54482D19E1C04CDFCF307C377B7@BALI.uhd.campus> Message-ID: <4ADF192B.8020804@mcs.anl.gov> Erin, we need to look into this further. Please make sure that you are running either Swift 0.9 or the latest source from svn. And tell us what revision you are running. Also please post your tc.data and sites.xml (and log file if it's small enough); see if there are any messages in the .log file that would clarify the error. Make sure that your app is cataloged in tc.data as being on pool "condor". But I think if it were not, you'd see a different error. 
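[Editorial note: the sites.xml example Mike posted earlier in this thread lost its markup when the list archiver scrubbed the HTML mail; only the values (/home/erin/swiftwork, .03, 10000, .19, and so on) survived. The fragment below is a speculative reconstruction for Swift 0.9; the element and attribute names are my assumption from the surviving values and later fragments in the thread, not a verbatim recovery of Mike's file.]

```xml
<config>
  <pool handle="localhost">
    <execution provider="local"/>
    <filesystem provider="local"/>
    <workdirectory>/home/erin/swiftwork</workdirectory>
    <profile namespace="karajan" key="jobThrottle">.03</profile>
    <profile namespace="karajan" key="initialScore">10000</profile>
  </pool>
  <pool handle="condor">
    <execution provider="condor"/>
    <filesystem provider="local"/>
    <workdirectory>/home/erin/swiftwork</workdirectory>
    <profile namespace="karajan" key="jobThrottle">.19</profile>
    <profile namespace="karajan" key="initialScore">10000</profile>
  </pool>
</config>
```

With the stated formula (#jobs = jobThrottle * 100 + 1), .03 gives the 4 concurrent local jobs and .19 the 20 Condor jobs mentioned above.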
It almost looks to me like Swift is looking for the GRAM service contact string, as if it thinks you are asking for Condor-G instead of local Condor, eg: grid gt2 belhaven-1.renci.org/jobmanager-fork Just as a test, try changing provider="condor" to "pbs" in sites.xml. If the error changes to something like "PBS not installed" or "qsub not found" then I would suspect this is the case. It's possible you can add just the jobType element with the value set to vanilla instead of grid, but I am purely *guessing*; we'll look deeper as soon as you send the info above and we have time. - Mike On 10/21/09 9:03 AM, Hodgess, Erin wrote: > Here is the output: > > > [hodgess at grid bin]$ swift -tc.file tc.data -sites.file sites.xml > firstR.swift > Swift 0.9 swift-r2860 cog-r2388 > > RunID: 20091021-0901-aku7y862 > Progress: > Execution failed: > No service contacts available > [hodgess at grid bin]$ > > > > Erin M. Hodgess, PhD > Associate Professor > Department of Computer and Mathematical Sciences > University of Houston - Downtown > mailto: hodgesse at uhd.edu > > > > -----Original Message----- > From: Michael Wilde [mailto:wilde at mcs.anl.gov] > Sent: Wed 10/21/2009 7:02 AM > To: Hodgess, Erin > Cc: swift-user at ci.uchicago.edu > Subject: Re: [Swift-user] using swift on a cluster > > [...] > > - Mike > > On 10/21/09 3:17 AM, Hodgess, Erin wrote: > > Aha! 
> > I needed the universe=vanilla line. > > [...] > > > Erin M. 
Hodgess, PhD > > > Associate Professor > > > Department of Computer and Mathematical Sciences > > > University of Houston - Downtown > > > mailto: hodgesse at uhd.edu > > > > > > > > > > ------------------------------------------------------------------------ > > > > > > _______________________________________________ > > > Swift-user mailing list > > > Swift-user at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > > > From HodgessE at uhd.edu Wed Oct 21 12:10:25 2009 From: HodgessE at uhd.edu (Hodgess, Erin) Date: Wed, 21 Oct 2009 12:10:25 -0500 Subject: [Swift-user] using swift on a cluster References: <70A5AC06FDB5E54482D19E1C04CDFCF307C377B3@BALI.uhd.campus><4ADE84D9.6020508@mcs.anl.gov> <70A5AC06FDB5E54482D19E1C04CDFCF307C377B5@BALI.uhd.campus> <70A5AC06FDB5E54482D19E1C04CDFCF307C377B6@BALI.uhd.campus> <4ADEF844.5020202@mcs.anl.gov> <70A5AC06FDB5E54482D19E1C04CDFCF307C377B7@BALI.uhd.campus> <4ADF192B.8020804@mcs.anl.gov> Message-ID: <70A5AC06FDB5E54482D19E1C04CDFCF307C377C0@BALI.uhd.campus> Hi again! Here are the sites.xml and tc.data files. Thanks, Erin [hodgess at grid bin]$ cat sites.xml /home/hodgess/swiftwork .03 10000 /home/hodgess/swiftwork .19 10000 [hodgess at grid bin]$ cat tc.data localhost convert /usr/bin/convert INSTALLED INTEL32::LINUX null localhost RInvoke /home/hodgess/R-2.9.2/bin/RInvoke.sh INSTALLED INTEL32::LINUX null condor RInvoke /home/hodgess/R-2.9.2/bin/RInvoke.sh INSTALLED INTEL32::LINUX null [hodgess at grid bin]$ cat firstR.R cat: firstR.R: No such file or directory [hodgess at grid bin]$ cat firstR.swift type file{} app (file output) firstone (file scriptFile) { RInvoke @filename(scriptFile) @filename(output); } file scriptFile <"a1.in" >; file output <"a1.out" >; output=firstone(scriptFile); [hodgess at grid bin]$ Erin M. 
Hodgess, PhD Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: hodgesse at uhd.edu -----Original Message----- From: Michael Wilde [mailto:wilde at mcs.anl.gov] Sent: Wed 10/21/2009 9:22 AM To: Hodgess, Erin Cc: swift-user at ci.uchicago.edu Subject: Re: [Swift-user] using swift on a cluster [...] 
> > > > Tell us what you find, then try following the instructions in the user > > guide, and follow up with questions as needed. > > > > - Mike > > > > > > On 10/20/09 9:41 PM, Hodgess, Erin wrote: > > > Hi Swift Users: > > > > > > I'm on a cluster and would like to use swift on the different sites on > > > the cluster. > > > > > > How would I do that, please? > > > > > > Thanks, > > > Erin > > > > > > > > > Erin M. Hodgess, PhD > > > Associate Professor > > > Department of Computer and Mathematical Sciences > > > University of Houston - Downtown > > > mailto: hodgesse at uhd.edu > > > > > > > > > > ------------------------------------------------------------------------ > > > > > > _______________________________________________ > > > Swift-user mailing list > > > Swift-user at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Wed Oct 21 12:36:43 2009 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 21 Oct 2009 12:36:43 -0500 Subject: [Swift-user] using swift on a cluster In-Reply-To: <70A5AC06FDB5E54482D19E1C04CDFCF307C377C0@BALI.uhd.campus> References: <70A5AC06FDB5E54482D19E1C04CDFCF307C377B3@BALI.uhd.campus><4ADE84D9.6020508@mcs.anl.gov> <70A5AC06FDB5E54482D19E1C04CDFCF307C377B5@BALI.uhd.campus> <70A5AC06FDB5E54482D19E1C04CDFCF307C377B6@BALI.uhd.campus> <4ADEF844.5020202@mcs.anl.gov> <70A5AC06FDB5E54482D19E1C04CDFCF307C377B7@BALI.uhd.campus> <4ADF192B.8020804@mcs.anl.gov> <70A5AC06FDB5E54482D19E1C04CDFCF307C377C0@BALI.uhd.campus> Message-ID: <4ADF46AB.8030603@mcs.anl.gov> Erin, The first line of your sites.xml file seems to be left there in error: > [hodgess at grid bin]$ cat sites.xml > Can you remove that and try again? Im not sure how that got parsed. - Mike On 10/21/09 12:10 PM, Hodgess, Erin wrote: > Hi again! > > Here are the sites.xml and tc.data files. 
> > Thanks, > Erin > > [...] > > > > Erin M. 
Hodgess, PhD > > > > Associate Professor > > > > Department of Computer and Mathematical Sciences > > > > University of Houston - Downtown > > > > mailto: hodgesse at uhd.edu > > > > > > > > > > > > > > ------------------------------------------------------------------------ > > > > > > > > _______________________________________________ > > > > Swift-user mailing list > > > > Swift-user at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > > > > > > > From skenny at uchicago.edu Fri Oct 23 10:45:04 2009 From: skenny at uchicago.edu (skenny at uchicago.edu) Date: Fri, 23 Oct 2009 10:45:04 -0500 (CDT) Subject: [Swift-user] Re: [Swift-devel] burnin' up ranger w/the latest coasters In-Reply-To: <20091014153915.CDW69329@m4500-02.uchicago.edu> References: <20091013111417.CDU59058@m4500-02.uchicago.edu> <20091014153915.CDW69329@m4500-02.uchicago.edu> Message-ID: <20091023104504.CEI95549@m4500-02.uchicago.edu> however...when i use the configs here and i try to run a workflow with 196,608 jobs it seems that coasters starts to ramp up nicely, but maybe a little too well :) as it begins requesting more cores than i'm allowed in the normal queue on ranger. that is, the limit is 4096. i tried changing maxNodes to 4096 which did not work. i'm wondering if workers per node should actually be 16 instead (?) but i know you've gotten it to work well with the setting at 32 so i'm not sure... anyway, it ramped up nicely (and was only like 8 jobs away from finishing the whole thing) i just need to know how to cap it off so it won't ask for more than 4096 cores. 
thanks ~sk ---- Original message ---- >Date: Wed, 14 Oct 2009 15:39:15 -0500 (CDT) >From: >Subject: Re: [Swift-devel] burnin' up ranger w/the latest coasters >To: swift-user at ci.uchicago.edu, swift-devel at ci.uchicago.edu > >for those interested, here are the config files used for this run: > >swift.properties: > >sites.file=config/coaster_ranger.xml >tc.file=/ci/projects/cnari/config/tc.data >lazy.errors=false >caching.algorithm=LRU >pgraph=false >pgraph.graph.options=splines="compound", rankdir="TB" >pgraph.node.options=color="seagreen", style="filled" >clustering.enabled=false >clustering.queue.delay=4 >clustering.min.time=60 >kickstart.enabled=maybe >kickstart.always.transfer=false >wrapperlog.always.transfer=false >throttle.submit=3 >throttle.host.submit=8 >throttle.score.job.factor=64 >throttle.transfers=16 >throttle.file.operations=16 >sitedir.keep=false >execution.retries=3 >replication.enabled=false >replication.min.queue.time=60 >replication.limit=3 >foreach.max.threads=16384 > >coaster_ranger.xml: > > > > > > > > > key="jobThrottle">1000.0 > url="gt2://gatekeeper.ranger.tacc.teragrid.org"/> > normal > 32 > 1 > 16 > 8192 > 72000 > key="project">TG-DBS080004N > url="gatekeeper.ranger.tacc.teragrid.org" >jobManager="gt2:gt2:SGE"/> > > >/work/00926/tg459516/sidgrid_out/{username} > > > > > >---- Original message ---- >>Date: Tue, 13 Oct 2009 11:14:17 -0500 (CDT) >>From: >>Subject: [Swift-devel] burnin' up ranger w/the latest coasters >>To: swift-devel at ci.uchicago.edu >> >>Final status: Finished successfully:131072 >> >>re-running some of the workflows from our recent SEM >>paper with the latest swift...sadly, queue time on ranger has >>only gone up since those initial runs...but luckily coasters >>has speeded things up, so it ends up evening out for time to >>solution :) >> >>not sure i fully understand the plot: >> >>http://www.ci.uchicago.edu/~skenny/workflows/sem_131k/ >> >>log is here: >> 
>>/ci/projects/cnari/logs/skenny/4reg_2cond-20091012-1607-ugidm2s2.log >>_______________________________________________ >>Swift-devel mailing list >>Swift-devel at ci.uchicago.edu >>http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >_______________________________________________ >Swift-devel mailing list >Swift-devel at ci.uchicago.edu >http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Fri Oct 23 11:37:10 2009 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 23 Oct 2009 11:37:10 -0500 Subject: [Swift-user] Re: [Swift-devel] burnin' up ranger w/the latest coasters In-Reply-To: <20091023104504.CEI95549@m4500-02.uchicago.edu> References: <20091013111417.CDU59058@m4500-02.uchicago.edu> <20091014153915.CDW69329@m4500-02.uchicago.edu> <20091023104504.CEI95549@m4500-02.uchicago.edu> Message-ID: <1256315830.10810.5.camel@localhost> On Fri, 2009-10-23 at 10:45 -0500, skenny at uchicago.edu wrote: > however...when i use the configs here and i try to run a > workflow with 196,608 jobs it seems that coasters starts to > ramp up nicely, but maybe a little too well :) as it begins > requesting more cores than i'm allowed in the normal queue on > ranger. that is, the limit is 4096. i tried changing maxNodes > to 4096 which did not work. Shouldn't that be 4096/workersPerNode? > i'm wondering if workers per node > should actually be 16 instead (?) but i know you've gotten it > to work well with the setting at 32 so i'm not sure... You could set it to 16. My reasoning for doubling it was that if the processes you run are slightly I/O bound, then you'd get slightly better performance by running two processes per core. > > anyway, it ramped up nicely (and was only like 8 jobs away > from finishing the whole thing) i just need to know how to cap > it off so it won't ask for more than 4096 cores. 
> > thanks > ~sk > > ---- Original message ---- > >Date: Wed, 14 Oct 2009 15:39:15 -0500 (CDT) > >From: > >Subject: Re: [Swift-devel] burnin' up ranger w/the latest coasters > > [...] > >>not 
sure i fully understand the plot: > >> > >>http://www.ci.uchicago.edu/~skenny/workflows/sem_131k/ > >> > >>log is here: > >> > >>/ci/projects/cnari/logs/skenny/4reg_2cond-20091012-1607-ugidm2s2.log > >>_______________________________________________ > >>Swift-devel mailing list > >>Swift-devel at ci.uchicago.edu > >>http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >_______________________________________________ > >Swift-devel mailing list > >Swift-devel at ci.uchicago.edu > >http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From skenny at uchicago.edu Fri Oct 23 13:02:00 2009 From: skenny at uchicago.edu (skenny at uchicago.edu) Date: Fri, 23 Oct 2009 13:02:00 -0500 (CDT) Subject: [Swift-user] Re: [Swift-devel] burnin' up ranger w/the latest coasters In-Reply-To: <1256315830.10810.5.camel@localhost> References: <20091013111417.CDU59058@m4500-02.uchicago.edu> <20091014153915.CDW69329@m4500-02.uchicago.edu> <20091023104504.CEI95549@m4500-02.uchicago.edu> <1256315830.10810.5.camel@localhost> Message-ID: <20091023130200.CEJ16713@m4500-02.uchicago.edu> >> however...when i use the configs here and i try to run a >> workflow with 196,608 jobs it seems that coasters starts to >> ramp up nicely, but maybe a little too well :) as it begins >> requesting more cores than i'm allowed in the normal queue on >> ranger. that is, the limit is 4096. i tried changing maxNodes >> to 4096 which did not work. > >Shouldn't that be 4096/workersPerNode? don't think i'm understanding you here...the workersPerNode you originally suggested was 32. why would i increase that to 4096 when what i'm trying to do is request fewer total cores? > >> i'm wondering if workers per node >> should actually be 16 instead (?) but i know you've gotten it >> to work well with the setting at 32 so i'm not sure... 
> >You could set it to 16. My reasoning for doubling it was that if the >processes you run are slightly I/O bound, then you'd get slightly better >performance by running two processes per core. > >> >> anyway, it ramped up nicely (and was only like 8 jobs away >> from finishing the whole thing) i just need to know how to cap >> it off so it won't ask for more than 4096 cores. >> >> thanks >> ~sk >> >> ---- Original message ---- >> >Date: Wed, 14 Oct 2009 15:39:15 -0500 (CDT) >> >From: >> >Subject: Re: [Swift-devel] burnin' up ranger w/the latest >> coasters >> >To: swift-user at ci.uchicago.edu, swift-devel at ci.uchicago.edu >> > >> >for those interested, here are the config files used for this >> run: >> > >> >swift.properties: >> > >> >sites.file=config/coaster_ranger.xml >> >tc.file=/ci/projects/cnari/config/tc.data >> >lazy.errors=false >> >caching.algorithm=LRU >> >pgraph=false >> >pgraph.graph.options=splines="compound", rankdir="TB" >> >pgraph.node.options=color="seagreen", style="filled" >> >clustering.enabled=false >> >clustering.queue.delay=4 >> >clustering.min.time=60 >> >kickstart.enabled=maybe >> >kickstart.always.transfer=false >> >wrapperlog.always.transfer=false >> >throttle.submit=3 >> >throttle.host.submit=8 >> >throttle.score.job.factor=64 >> >throttle.transfers=16 >> >throttle.file.operations=16 >> >sitedir.keep=false >> >execution.retries=3 >> >replication.enabled=false >> >replication.min.queue.time=60 >> >replication.limit=3 >> >foreach.max.threads=16384 >> > >> >coaster_ranger.xml: >> > >> > >> > >> > >> > >> > >> > >> > >> > > >key="jobThrottle">1000.0 >> > > >url="gt2://gatekeeper.ranger.tacc.teragrid.org"/> >> > normal >> > 32 >> > 1 >> > 16 >> > 8192 >> > 72000 >> > > >key="project">TG-DBS080004N >> > > >url="gatekeeper.ranger.tacc.teragrid.org" >> >jobManager="gt2:gt2:SGE"/> >> > >> > >> >/work/00926/tg459516/sidgrid_out/{username} >> > >> > >> > >> > >> > >> >---- Original message ---- >> >>Date: Tue, 13 Oct 2009 11:14:17 -0500 (CDT) 
>> >>From: >> >>Subject: [Swift-devel] burnin' up ranger w/the latest coasters >> >>To: swift-devel at ci.uchicago.edu >> >> >> >>Final status: Finished successfully:131072 >> >> >> >>re-running some of the workflows from our recent SEM >> >>paper with the latest swift...sadly, queue time on ranger has >> >>only gone up since those initial runs...but luckily coasters >> >>has speeded things up, so it ends up evening out for time to >> >>solution :) >> >> >> >>not sure i fully understand the plot: >> >> >> >>http://www.ci.uchicago.edu/~skenny/workflows/sem_131k/ >> >> >> >>log is here: >> >> >> >>/ci/projects/cnari/logs/skenny/4reg_2cond-20091012-1607-ugidm2s2.log >> >>_______________________________________________ >> >>Swift-devel mailing list >> >>Swift-devel at ci.uchicago.edu >> >>http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> >_______________________________________________ >> >Swift-devel mailing list >> >Swift-devel at ci.uchicago.edu >> >http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Fri Oct 23 13:18:24 2009 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 23 Oct 2009 13:18:24 -0500 Subject: [Swift-user] Re: [Swift-devel] burnin' up ranger w/the latest coasters In-Reply-To: <20091023130200.CEJ16713@m4500-02.uchicago.edu> References: <20091013111417.CDU59058@m4500-02.uchicago.edu> <20091014153915.CDW69329@m4500-02.uchicago.edu> <20091023104504.CEI95549@m4500-02.uchicago.edu> <1256315830.10810.5.camel@localhost> <20091023130200.CEJ16713@m4500-02.uchicago.edu> Message-ID: <1256321904.15412.3.camel@localhost> On Fri, 2009-10-23 at 13:02 -0500, skenny at uchicago.edu wrote: > >> however...when i use the configs here and i try to run a > >> workflow with 196,608 jobs it seems that coasters starts to > >> ramp up nicely, but maybe a 
little too well :) as it begins > >> requesting more cores than i'm allowed in the normal queue on > >> ranger. that is, the limit is 4096. i tried changing maxNodes > >> to 4096 which did not work. > > > >Shouldn't that be 4096/workersPerNode? > > don't think i'm understanding you here...the workersPerNode > you originally suggested was 32. why would i increase that to > 4096 when what i'm trying to do is request fewer total cores? No. If you have C cores per node and the maximum number of CORES you can request is 4096, then the maximum number of NODES you can request is 4096/C, not 4096. You are setting maxNodes to 4096. That means it will request 4096*16 cores. From wtan at mcs.anl.gov Mon Oct 26 16:56:03 2009 From: wtan at mcs.anl.gov (Wei Tan) Date: Mon, 26 Oct 2009 16:56:03 -0500 Subject: [Swift-user] Chesnoknov workflow Message-ID: <4AE61AF3.3040208@mcs.anl.gov> Hi Mike and others, A recap of the story: We have a chesnokov workflow which contains a forEach, and the input file array has 1171 files. Running it on a desktop and on a 32-core server gives different execution times; see http://spreadsheets.google.com/ccc?key=0AriiWNEG__VUdEM0ampSaWRqcGROTW1TNE00X29GVHc&hl=en --------------------the data Mike wants to see--------------------------------------------- In the same 32-core machine (crush), using a local file system instead of the shared file system reduces the execution time from *2min20sec~2min50sec* to *1min40sec~1min50sec*. --------------------end of the data Mike wants to see--------------------------------------------- Best regards, Wei -- Wei Tan, Ph.D. 
Computation Institute the University of Chicago|Argonne National Laboratory http://www.mcs.anl.gov/~wtan From foster at anl.gov Mon Oct 26 17:09:00 2009 From: foster at anl.gov (Ian Foster) Date: Mon, 26 Oct 2009 17:09:00 -0500 Subject: [Swift-user] Chesnoknov workflow In-Reply-To: <4AE61AF3.3040208@mcs.anl.gov> References: <4AE61AF3.3040208@mcs.anl.gov> Message-ID: <00D43302-AE07-449E-AE6B-BAB5F41316FB@anl.gov> that's great. Do you have the Swift log plots? On Oct 26, 2009, at 4:56 PM, Wei Tan wrote: > Hi Mike and others, > > A recapture of the story: > We have a chesnokov workflow which contains a forEach and the > input file array has 1171 files. > Running it at a desktop and in a 32-core server have different > performance indexes, in terms of execution time > See http://spreadsheets.google.com/ccc?key=0AriiWNEG__VUdEM0ampSaWRqcGROTW1TNE00X29GVHc&hl=en > > --------------------the data Mike wants to > see--------------------------------------------- > In the same 32-core machine (crush), using a local file system > instead of the share file system, will reduce the execution time, > from *2min20sec~2min50sec*, to *1min40sec~1min50sec*. > --------------------end of the the data Mike wants to > see--------------------------------------------- > > Best regards, > > Wei > > -- > Wei Tan, Ph.D. 
> Computation Institute > the University of Chicago|Argonne National Laboratory > http://www.mcs.anl.gov/~wtan > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user From wtan at mcs.anl.gov Mon Oct 26 17:18:46 2009 From: wtan at mcs.anl.gov (Wei Tan) Date: Mon, 26 Oct 2009 17:18:46 -0500 Subject: [Swift-user] Chesnoknov workflow In-Reply-To: <00D43302-AE07-449E-AE6B-BAB5F41316FB@anl.gov> References: <4AE61AF3.3040208@mcs.anl.gov> <00D43302-AE07-449E-AE6B-BAB5F41316FB@anl.gov> Message-ID: <4AE62046.1060003@mcs.anl.gov> Hi Ian, I am not sure which log plots you are talking about. But my working directory is crush.mcs.anl.gov/tmp/wtan, I guess you can find all the logs you want to see there? To be more specific: /workingdir: the directory from which I issue the command line swift ...:there are some error logs since there are 23/1141 failure jobs. /ecg3... directory generated when running swift workflow /swift-workflows the directory containing the workflow /app: the directory containing the executable and the input files /swift-0.9 the swift installation directory Best regards, Wei Ian Foster wrote: > that's great. Do you have the Swift log plots? > > On Oct 26, 2009, at 4:56 PM, Wei Tan wrote: > >> Hi Mike and others, >> >> A recapture of the story: >> We have a chesnokov workflow which contains a forEach and the input >> file array has 1171 files. >> Running it at a desktop and in a 32-core server have different >> performance indexes, in terms of execution time >> See >> http://spreadsheets.google.com/ccc?key=0AriiWNEG__VUdEM0ampSaWRqcGROTW1TNE00X29GVHc&hl=en >> >> >> --------------------the data Mike wants to >> see--------------------------------------------- >> In the same 32-core machine (crush), using a local file system >> instead of the share file system, will reduce the execution time, >> from *2min20sec~2min50sec*, to *1min40sec~1min50sec*. 
>> --------------------end of the the data Mike wants to >> see--------------------------------------------- >> >> Best regards, >> >> Wei >> >> -- >> Wei Tan, Ph.D. >> Computation Institute >> the University of Chicago|Argonne National Laboratory >> http://www.mcs.anl.gov/~wtan >> >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > -- Wei Tan, Ph.D. Computation Institute the University of Chicago|Argonne National Laboratory http://www.mcs.anl.gov/~wtan From foster at anl.gov Mon Oct 26 17:24:08 2009 From: foster at anl.gov (Ian Foster) Date: Mon, 26 Oct 2009 17:24:08 -0500 Subject: [Swift-user] Chesnoknov workflow In-Reply-To: <4AE62046.1060003@mcs.anl.gov> References: <4AE61AF3.3040208@mcs.anl.gov> <00D43302-AE07-449E-AE6B-BAB5F41316FB@anl.gov> <4AE62046.1060003@mcs.anl.gov> Message-ID: <2F0ABDE0-FCD3-44C8-B33B-1302DBB485C2@anl.gov> Hi Wei: I can't recall the details, but there are nice tools for generating a Web page with plots. Ian/ On Oct 26, 2009, at 5:18 PM, Wei Tan wrote: > Hi Ian, > > I am not sure which log plots you are talking about. But my > working directory is > crush.mcs.anl.gov/tmp/wtan, I guess you can find all the logs you > want to see there? > > To be more specific: > /workingdir: the directory from which I issue the command line > swift ...:there are some error logs since there are 23/1141 failure > jobs. > /ecg3... directory generated when running swift workflow > /swift-workflows the directory containing the workflow > /app: the directory containing the executable and the input files > /swift-0.9 the swift installation directory > > > Best regards, > > Wei > > > Ian Foster wrote: >> that's great. Do you have the Swift log plots? 
>> >> On Oct 26, 2009, at 4:56 PM, Wei Tan wrote: >> >>> Hi Mike and others, >>> >>> A recapture of the story: >>> We have a chesnokov workflow which contains a forEach and the >>> input file array has 1171 files. >>> Running it at a desktop and in a 32-core server have different >>> performance indexes, in terms of execution time >>> See http://spreadsheets.google.com/ccc?key=0AriiWNEG__VUdEM0ampSaWRqcGROTW1TNE00X29GVHc&hl=en >>> >>> --------------------the data Mike wants to >>> see--------------------------------------------- >>> In the same 32-core machine (crush), using a local file system >>> instead of the share file system, will reduce the execution time, >>> from *2min20sec~2min50sec*, to *1min40sec~1min50sec*. >>> --------------------end of the the data Mike wants to >>> see--------------------------------------------- >>> >>> Best regards, >>> >>> Wei >>> >>> -- >>> Wei Tan, Ph.D. >>> Computation Institute >>> the University of Chicago|Argonne National Laboratory >>> http://www.mcs.anl.gov/~wtan >>> >>> _______________________________________________ >>> Swift-user mailing list >>> Swift-user at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user >> > > -- > Wei Tan, Ph.D. 
> Computation Institute > the University of Chicago|Argonne National Laboratory > http://www.mcs.anl.gov/~wtan > From wilde at mcs.anl.gov Mon Oct 26 17:28:03 2009 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 26 Oct 2009 17:28:03 -0500 Subject: [Swift-user] Chesnoknov workflow In-Reply-To: <2F0ABDE0-FCD3-44C8-B33B-1302DBB485C2@anl.gov> References: <4AE61AF3.3040208@mcs.anl.gov> <00D43302-AE07-449E-AE6B-BAB5F41316FB@anl.gov> <4AE62046.1060003@mcs.anl.gov> <2F0ABDE0-FCD3-44C8-B33B-1302DBB485C2@anl.gov> Message-ID: <4AE62273.2020500@mcs.anl.gov> Wei, its the swift-plot-log command, in the Swift user guide at: http://www.ci.uchicago.edu/swift/guides/userguide.php#id2711073 - Mike On 10/26/09 5:24 PM, Ian Foster wrote: > Hi Wei: > > I can't recall the details, but there are nice tools for generating a > Web page with plots. > > Ian/ > > > On Oct 26, 2009, at 5:18 PM, Wei Tan wrote: > >> Hi Ian, >> >> I am not sure which log plots you are talking about. But my >> working directory is >> crush.mcs.anl.gov/tmp/wtan, I guess you can find all the logs you >> want to see there? >> >> To be more specific: >> /workingdir: the directory from which I issue the command line >> swift ...:there are some error logs since there are 23/1141 failure >> jobs. >> /ecg3... directory generated when running swift workflow >> /swift-workflows the directory containing the workflow >> /app: the directory containing the executable and the input files >> /swift-0.9 the swift installation directory >> >> >> Best regards, >> >> Wei >> >> >> Ian Foster wrote: >>> that's great. Do you have the Swift log plots? >>> >>> On Oct 26, 2009, at 4:56 PM, Wei Tan wrote: >>> >>>> Hi Mike and others, >>>> >>>> A recapture of the story: >>>> We have a chesnokov workflow which contains a forEach and the >>>> input file array has 1171 files. 
>>>> Running it at a desktop and in a 32-core server have different >>>> performance indexes, in terms of execution time >>>> See http://spreadsheets.google.com/ccc?key=0AriiWNEG__VUdEM0ampSaWRqcGROTW1TNE00X29GVHc&hl=en >>>> >>>> --------------------the data Mike wants to >>>> see--------------------------------------------- >>>> In the same 32-core machine (crush), using a local file system >>>> instead of the share file system, will reduce the execution time, >>>> from *2min20sec~2min50sec*, to *1min40sec~1min50sec*. >>>> --------------------end of the the data Mike wants to >>>> see--------------------------------------------- >>>> >>>> Best regards, >>>> >>>> Wei >>>> >>>> -- >>>> Wei Tan, Ph.D. >>>> Computation Institute >>>> the University of Chicago|Argonne National Laboratory >>>> http://www.mcs.anl.gov/~wtan >>>> >>>> _______________________________________________ >>>> Swift-user mailing list >>>> Swift-user at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user >> -- >> Wei Tan, Ph.D. 
>> Computation Institute >> the University of Chicago|Argonne National Laboratory >> http://www.mcs.anl.gov/~wtan >> > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user From wilde at mcs.anl.gov Mon Oct 26 23:29:09 2009 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 26 Oct 2009 23:29:09 -0500 Subject: [Swift-user] Chesnoknov workflow In-Reply-To: <4AE62273.2020500@mcs.anl.gov> References: <4AE61AF3.3040208@mcs.anl.gov> <00D43302-AE07-449E-AE6B-BAB5F41316FB@anl.gov> <4AE62046.1060003@mcs.anl.gov> <2F0ABDE0-FCD3-44C8-B33B-1302DBB485C2@anl.gov> <4AE62273.2020500@mcs.anl.gov> Message-ID: <4AE67715.1090909@mcs.anl.gov> More documentation on the log processing tools is at: http://www.ci.uchicago.edu/swift/guides/log-processing.php - Mike On 10/26/09 5:28 PM, Michael Wilde wrote: > Wei, its the swift-plot-log command, in the Swift user guide at: > > http://www.ci.uchicago.edu/swift/guides/userguide.php#id2711073 > > - Mike > > On 10/26/09 5:24 PM, Ian Foster wrote: >> Hi Wei: >> >> I can't recall the details, but there are nice tools for generating a >> Web page with plots. >> >> Ian/ >> >> >> On Oct 26, 2009, at 5:18 PM, Wei Tan wrote: >> >>> Hi Ian, >>> >>> I am not sure which log plots you are talking about. But my >>> working directory is >>> crush.mcs.anl.gov/tmp/wtan, I guess you can find all the logs you >>> want to see there? >>> >>> To be more specific: >>> /workingdir: the directory from which I issue the command line >>> swift ...:there are some error logs since there are 23/1141 failure >>> jobs. >>> /ecg3... directory generated when running swift workflow >>> /swift-workflows the directory containing the workflow >>> /app: the directory containing the executable and the input files >>> /swift-0.9 the swift installation directory >>> >>> >>> Best regards, >>> >>> Wei >>> >>> >>> Ian Foster wrote: >>>> that's great. 
Do you have the Swift log plots? >>>> >>>> On Oct 26, 2009, at 4:56 PM, Wei Tan wrote: >>>> >>>>> Hi Mike and others, >>>>> >>>>> A recapture of the story: >>>>> We have a chesnokov workflow which contains a forEach and the >>>>> input file array has 1171 files. >>>>> Running it at a desktop and in a 32-core server have different >>>>> performance indexes, in terms of execution time >>>>> See http://spreadsheets.google.com/ccc?key=0AriiWNEG__VUdEM0ampSaWRqcGROTW1TNE00X29GVHc&hl=en >>>>> >>>>> --------------------the data Mike wants to >>>>> see--------------------------------------------- >>>>> In the same 32-core machine (crush), using a local file system >>>>> instead of the share file system, will reduce the execution time, >>>>> from *2min20sec~2min50sec*, to *1min40sec~1min50sec*. >>>>> --------------------end of the the data Mike wants to >>>>> see--------------------------------------------- >>>>> >>>>> Best regards, >>>>> >>>>> Wei >>>>> >>>>> -- >>>>> Wei Tan, Ph.D. >>>>> Computation Institute >>>>> the University of Chicago|Argonne National Laboratory >>>>> http://www.mcs.anl.gov/~wtan >>>>> >>>>> _______________________________________________ >>>>> Swift-user mailing list >>>>> Swift-user at ci.uchicago.edu >>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user >>> -- >>> Wei Tan, Ph.D. 
>>> Computation Institute >>> the University of Chicago|Argonne National Laboratory >>> http://www.mcs.anl.gov/~wtan >>> >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > From wtan at mcs.anl.gov Tue Oct 27 11:49:54 2009 From: wtan at mcs.anl.gov (Wei Tan) Date: Tue, 27 Oct 2009 11:49:54 -0500 Subject: [Swift-user] Chesnoknov workflow In-Reply-To: <4AE67715.1090909@mcs.anl.gov> References: <4AE61AF3.3040208@mcs.anl.gov> <00D43302-AE07-449E-AE6B-BAB5F41316FB@anl.gov> <4AE62046.1060003@mcs.anl.gov> <2F0ABDE0-FCD3-44C8-B33B-1302DBB485C2@anl.gov> <4AE62273.2020500@mcs.anl.gov> <4AE67715.1090909@mcs.anl.gov> Message-ID: <4AE724B2.2050507@mcs.anl.gov> I got this result when running swift-plot-log Log file path is /tmp/wtan/workingdir/ecg3-20091026-1332-fkizh09c.log Log is in directory /tmp/wtan/workingdir Log basename is ecg3-20091026-1332-fkizh09c Now in directory /tmp/swift-plot-log-btGjiRNxbiel5334 make: ../swift-0.9/bin/../libexec/log-processing//makefile: No such file or directory make: *** No rule to make target `../swift-0.9/bin/../libexec/log-processing//makefile'. Stop. From the webpage, it seems that I need to install gnuplot 4.0, gnu m4, gnu textutils, perl first? I am working on it and will post the result here. Thanks, Wei Michael Wilde wrote: > More documentation on the log processing tools is at: > > http://www.ci.uchicago.edu/swift/guides/log-processing.php > > - Mike > > On 10/26/09 5:28 PM, Michael Wilde wrote: >> Wei, its the swift-plot-log command, in the Swift user guide at: >> >> http://www.ci.uchicago.edu/swift/guides/userguide.php#id2711073 >> >> - Mike >> >> On 10/26/09 5:24 PM, Ian Foster wrote: >>> Hi Wei: >>> >>> I can't recall the details, but there are nice tools for generating >>> a Web page with plots. 
>>> >>> Ian/ >>> >>> >>> On Oct 26, 2009, at 5:18 PM, Wei Tan wrote: >>> >>>> Hi Ian, >>>> >>>> I am not sure which log plots you are talking about. But my >>>> working directory is >>>> crush.mcs.anl.gov/tmp/wtan, I guess you can find all the logs you >>>> want to see there? >>>> >>>> To be more specific: >>>> /workingdir: the directory from which I issue the command line >>>> swift ...:there are some error logs since there are 23/1141 >>>> failure jobs. >>>> /ecg3... directory generated when running swift workflow >>>> /swift-workflows the directory containing the workflow >>>> /app: the directory containing the executable and the input files >>>> /swift-0.9 the swift installation directory >>>> >>>> >>>> Best regards, >>>> >>>> Wei >>>> >>>> >>>> Ian Foster wrote: >>>>> that's great. Do you have the Swift log plots? >>>>> >>>>> On Oct 26, 2009, at 4:56 PM, Wei Tan wrote: >>>>> >>>>>> Hi Mike and others, >>>>>> >>>>>> A recapture of the story: >>>>>> We have a chesnokov workflow which contains a forEach and the >>>>>> input file array has 1171 files. >>>>>> Running it at a desktop and in a 32-core server have different >>>>>> performance indexes, in terms of execution time >>>>>> See >>>>>> http://spreadsheets.google.com/ccc?key=0AriiWNEG__VUdEM0ampSaWRqcGROTW1TNE00X29GVHc&hl=en >>>>>> >>>>>> >>>>>> --------------------the data Mike wants to >>>>>> see--------------------------------------------- >>>>>> In the same 32-core machine (crush), using a local file system >>>>>> instead of the share file system, will reduce the execution >>>>>> time, from *2min20sec~2min50sec*, to *1min40sec~1min50sec*. >>>>>> --------------------end of the the data Mike wants to >>>>>> see--------------------------------------------- >>>>>> >>>>>> Best regards, >>>>>> >>>>>> Wei >>>>>> >>>>>> -- >>>>>> Wei Tan, Ph.D. 
>>>>>> Computation Institute >>>>>> the University of Chicago|Argonne National Laboratory >>>>>> http://www.mcs.anl.gov/~wtan >>>>>> >>>>>> _______________________________________________ >>>>>> Swift-user mailing list >>>>>> Swift-user at ci.uchicago.edu >>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user >>>> -- >>>> Wei Tan, Ph.D. >>>> Computation Institute >>>> the University of Chicago|Argonne National Laboratory >>>> http://www.mcs.anl.gov/~wtan >>>> >>> _______________________________________________ >>> Swift-user mailing list >>> Swift-user at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user >> -- Wei Tan, Ph.D. Computation Institute the University of Chicago|Argonne National Laboratory http://www.mcs.anl.gov/~wtan From hategan at mcs.anl.gov Tue Oct 27 11:58:27 2009 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 27 Oct 2009 11:58:27 -0500 Subject: [Swift-user] Chesnoknov workflow In-Reply-To: <4AE724B2.2050507@mcs.anl.gov> References: <4AE61AF3.3040208@mcs.anl.gov> <00D43302-AE07-449E-AE6B-BAB5F41316FB@anl.gov> <4AE62046.1060003@mcs.anl.gov> <2F0ABDE0-FCD3-44C8-B33B-1302DBB485C2@anl.gov> <4AE62273.2020500@mcs.anl.gov> <4AE67715.1090909@mcs.anl.gov> <4AE724B2.2050507@mcs.anl.gov> Message-ID: <1256662707.20353.8.camel@localhost> On Tue, 2009-10-27 at 11:49 -0500, Wei Tan wrote: > I got this result when running swift-plot-log > > Log file path is /tmp/wtan/workingdir/ecg3-20091026-1332-fkizh09c.log > Log is in directory /tmp/wtan/workingdir > Log basename is ecg3-20091026-1332-fkizh09c > Now in directory /tmp/swift-plot-log-btGjiRNxbiel5334 > make: ../swift-0.9/bin/../libexec/log-processing//makefile: No such file > or directory > make: *** No rule to make target > `../swift-0.9/bin/../libexec/log-processing//makefile'. Stop. > > From the webpage, it seems that I need to install gnuplot 4.0, gnu m4, > gnu textutils, perl first? > I am working on it and will post the result here. 
May I suggest doing this on a CI machine instead of cygwin? From wtan at mcs.anl.gov Tue Oct 27 11:59:47 2009 From: wtan at mcs.anl.gov (Wei Tan) Date: Tue, 27 Oct 2009 11:59:47 -0500 Subject: [Swift-user] Chesnoknov workflow In-Reply-To: <1256662707.20353.8.camel@localhost> References: <4AE61AF3.3040208@mcs.anl.gov> <00D43302-AE07-449E-AE6B-BAB5F41316FB@anl.gov> <4AE62046.1060003@mcs.anl.gov> <2F0ABDE0-FCD3-44C8-B33B-1302DBB485C2@anl.gov> <4AE62273.2020500@mcs.anl.gov> <4AE67715.1090909@mcs.anl.gov> <4AE724B2.2050507@mcs.anl.gov> <1256662707.20353.8.camel@localhost> Message-ID: <4AE72703.2020100@mcs.anl.gov> > > May I suggest doing this on a CI machine instead of cygwin? > > Sure, but I am using an MCS machine, not cygwin. 
:-) > Thanks, > > Wei > From hategan at mcs.anl.gov Tue Oct 27 12:16:20 2009 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 27 Oct 2009 12:16:20 -0500 Subject: [Swift-user] Chesnoknov workflow In-Reply-To: <1256663019.20883.0.camel@localhost> References: <4AE61AF3.3040208@mcs.anl.gov> <00D43302-AE07-449E-AE6B-BAB5F41316FB@anl.gov> <4AE62046.1060003@mcs.anl.gov> <2F0ABDE0-FCD3-44C8-B33B-1302DBB485C2@anl.gov> <4AE62273.2020500@mcs.anl.gov> <4AE67715.1090909@mcs.anl.gov> <4AE724B2.2050507@mcs.anl.gov> <1256662707.20353.8.camel@localhost> <4AE72703.2020100@mcs.anl.gov> <1256663019.20883.0.camel@localhost> Message-ID: <1256663780.22694.5.camel@localhost> On Tue, 2009-10-27 at 12:03 -0500, Mihael Hategan wrote: > What exact command did you type to launch swift-plot-log? > ../swift-0.9/bin/swift-plot-log ecg3-20091026-1332-fkizh09c.log For reference: The solution is: $PWD/../swift-0.9/bin/swift-plot-log ecg3-20091026-1332-fkizh09c.log The problem is that when swift-plot-log is invoked with a relative directory in the path name, it fails to find its tools after doing a cd. From wtan at mcs.anl.gov Tue Oct 27 15:50:07 2009 From: wtan at mcs.anl.gov (Wei Tan) Date: Tue, 27 Oct 2009 15:50:07 -0500 Subject: [Swift-user] Chesnoknov workflow In-Reply-To: <1256663780.22694.5.camel@localhost> References: <4AE61AF3.3040208@mcs.anl.gov> <00D43302-AE07-449E-AE6B-BAB5F41316FB@anl.gov> <4AE62046.1060003@mcs.anl.gov> <2F0ABDE0-FCD3-44C8-B33B-1302DBB485C2@anl.gov> <4AE62273.2020500@mcs.anl.gov> <4AE67715.1090909@mcs.anl.gov> <4AE724B2.2050507@mcs.anl.gov> <1256662707.20353.8.camel@localhost> <4AE72703.2020100@mcs.anl.gov> <1256663019.20883.0.camel@localhost> <1256663780.22694.5.camel@localhost> Message-ID: <4AE75CFF.10201@mcs.anl.gov> Hi Mihael, Thanks. The command generated the msg attached and no result file can be found. 
Best regards, Wei Mihael Hategan wrote: > On Tue, 2009-10-27 at 12:03 -0500, Mihael Hategan wrote: > >> What exact command did you type to launch swift-plot-log? >> > > >> ../swift-0.9/bin/swift-plot-log ecg3-20091026-1332-fkizh09c.log >> > > For reference: > > The solution is: > $PWD/../swift-0.9/bin/swift-plot-log ecg3-20091026-1332-fkizh09c.log > > The problem is that when swift-plot-log is invoked with a relative > directory in the path name, it fails to find its tools after doing a cd. > > > > -- Wei Tan, Ph.D. Computation Institute the University of Chicago|Argonne National Laboratory http://www.mcs.anl.gov/~wtan -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: error.txt URL: From hategan at mcs.anl.gov Tue Oct 27 15:56:43 2009 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 27 Oct 2009 15:56:43 -0500 Subject: [Swift-user] Chesnoknov workflow In-Reply-To: <4AE75CFF.10201@mcs.anl.gov> References: <4AE61AF3.3040208@mcs.anl.gov> <00D43302-AE07-449E-AE6B-BAB5F41316FB@anl.gov> <4AE62046.1060003@mcs.anl.gov> <2F0ABDE0-FCD3-44C8-B33B-1302DBB485C2@anl.gov> <4AE62273.2020500@mcs.anl.gov> <4AE67715.1090909@mcs.anl.gov> <4AE724B2.2050507@mcs.anl.gov> <1256662707.20353.8.camel@localhost> <4AE72703.2020100@mcs.anl.gov> <1256663019.20883.0.camel@localhost> <1256663780.22694.5.camel@localhost> <4AE75CFF.10201@mcs.anl.gov> Message-ID: <1256677003.27643.1.camel@localhost> On Tue, 2009-10-27 at 15:50 -0500, Wei Tan wrote: > Hi Mihael, > > Thanks. The command generated the msg attached and no result file > can be found. 
There should be a result directory called /tmp/wtan/workingdir/report-ecg3-20091026-1332-fkizh09c From wtan at mcs.anl.gov Tue Oct 27 16:05:47 2009 From: wtan at mcs.anl.gov (Wei Tan) Date: Tue, 27 Oct 2009 16:05:47 -0500 Subject: [Swift-user] Chesnoknov workflow In-Reply-To: <1256677003.27643.1.camel@localhost> References: <4AE61AF3.3040208@mcs.anl.gov> <00D43302-AE07-449E-AE6B-BAB5F41316FB@anl.gov> <4AE62046.1060003@mcs.anl.gov> <2F0ABDE0-FCD3-44C8-B33B-1302DBB485C2@anl.gov> <4AE62273.2020500@mcs.anl.gov> <4AE67715.1090909@mcs.anl.gov> <4AE724B2.2050507@mcs.anl.gov> <1256662707.20353.8.camel@localhost> <4AE72703.2020100@mcs.anl.gov> <1256663019.20883.0.camel@localhost> <1256663780.22694.5.camel@localhost> <4AE75CFF.10201@mcs.anl.gov> <1256677003.27643.1.camel@localhost> Message-ID: <4AE760AB.9040809@mcs.anl.gov> Yes but that is the directory when I execute the workflow? I issued this command: /tmp/wtan/swift-0.9/bin/swift-plot-log /tmp/wtan/workingdir/ecg3-20091026-1332-fkizh09c.log Is that the correct log file? Thanks, Wei Mihael Hategan wrote: > On Tue, 2009-10-27 at 15:50 -0500, Wei Tan wrote: > >> Hi Mihael, >> >> Thanks. The command generated the msg attached and no result file >> can be found. >> > > There should be a result directory > called /tmp/wtan/workingdir/report-ecg3-20091026-1332-fkizh09c > > -- Wei Tan, Ph.D. 
Computation Institute
the University of Chicago|Argonne National Laboratory
http://www.mcs.anl.gov/~wtan

From hategan at mcs.anl.gov Tue Oct 27 16:09:41 2009
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 27 Oct 2009 16:09:41 -0500
Subject: [Swift-user] Chesnoknov workflow
In-Reply-To: <4AE760AB.9040809@mcs.anl.gov>
References: <4AE61AF3.3040208@mcs.anl.gov> <00D43302-AE07-449E-AE6B-BAB5F41316FB@anl.gov> <4AE62046.1060003@mcs.anl.gov> <2F0ABDE0-FCD3-44C8-B33B-1302DBB485C2@anl.gov> <4AE62273.2020500@mcs.anl.gov> <4AE67715.1090909@mcs.anl.gov> <4AE724B2.2050507@mcs.anl.gov> <1256662707.20353.8.camel@localhost> <4AE72703.2020100@mcs.anl.gov> <1256663019.20883.0.camel@localhost> <1256663780.22694.5.camel@localhost> <4AE75CFF.10201@mcs.anl.gov> <1256677003.27643.1.camel@localhost> <4AE760AB.9040809@mcs.anl.gov>
Message-ID: <1256677781.27934.1.camel@localhost>

On Tue, 2009-10-27 at 16:05 -0500, Wei Tan wrote:
> Yes but that is the directory when I execute the workflow?

No. Note the fact that what I'm pointing at begins with "report-".

> I issued this command:
> /tmp/wtan/swift-0.9/bin/swift-plot-log
> /tmp/wtan/workingdir/ecg3-20091026-1332-fkizh09c.log

I'm not sure, but I think the report directory is in the same place as
the log.

It may also be in the current directory, whatever that is.
> > There should be a result directory
> > called /tmp/wtan/workingdir/report-ecg3-20091026-1332-fkizh09c
> >
>

From wtan at mcs.anl.gov Tue Oct 27 16:23:06 2009
From: wtan at mcs.anl.gov (Wei Tan)
Date: Tue, 27 Oct 2009 16:23:06 -0500
Subject: [Swift-user] Chesnoknov workflow
In-Reply-To: <1256677781.27934.1.camel@localhost>
References: <4AE61AF3.3040208@mcs.anl.gov> <00D43302-AE07-449E-AE6B-BAB5F41316FB@anl.gov> <4AE62046.1060003@mcs.anl.gov> <2F0ABDE0-FCD3-44C8-B33B-1302DBB485C2@anl.gov> <4AE62273.2020500@mcs.anl.gov> <4AE67715.1090909@mcs.anl.gov> <4AE724B2.2050507@mcs.anl.gov> <1256662707.20353.8.camel@localhost> <4AE72703.2020100@mcs.anl.gov> <1256663019.20883.0.camel@localhost> <1256663780.22694.5.camel@localhost> <4AE75CFF.10201@mcs.anl.gov> <1256677003.27643.1.camel@localhost> <4AE760AB.9040809@mcs.anl.gov> <1256677781.27934.1.camel@localhost>
Message-ID: <4AE764BA.70101@mcs.anl.gov>

There is no report-* directory generated. I think the stdout contains
some error msg so I posted it in my previous email.

Thanks,
Wei

Mihael Hategan wrote:
> On Tue, 2009-10-27 at 16:05 -0500, Wei Tan wrote:
>
>> Yes but that is the directory when I execute the workflow?
>>
>
> No. Note the fact that what I'm pointing at begins with "report-".
>
>> I issued this command:
>> /tmp/wtan/swift-0.9/bin/swift-plot-log
>> /tmp/wtan/workingdir/ecg3-20091026-1332-fkizh09c.log
>>
>
> I'm not sure, but I think the report directory is in the same place as
> the log.
>
> It may also be in the current directory, whatever that is.
>
>>> There should be a result directory
>>> called /tmp/wtan/workingdir/report-ecg3-20091026-1332-fkizh09c
>>>
>

-- 
Wei Tan, Ph.D.
Computation Institute
the University of Chicago|Argonne National Laboratory
http://www.mcs.anl.gov/~wtan

From hategan at mcs.anl.gov Tue Oct 27 16:28:28 2009
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 27 Oct 2009 16:28:28 -0500
Subject: [Swift-user] Chesnoknov workflow
In-Reply-To: <4AE764BA.70101@mcs.anl.gov>
References: <4AE61AF3.3040208@mcs.anl.gov> <00D43302-AE07-449E-AE6B-BAB5F41316FB@anl.gov> <4AE62046.1060003@mcs.anl.gov> <2F0ABDE0-FCD3-44C8-B33B-1302DBB485C2@anl.gov> <4AE62273.2020500@mcs.anl.gov> <4AE67715.1090909@mcs.anl.gov> <4AE724B2.2050507@mcs.anl.gov> <1256662707.20353.8.camel@localhost> <4AE72703.2020100@mcs.anl.gov> <1256663019.20883.0.camel@localhost> <1256663780.22694.5.camel@localhost> <4AE75CFF.10201@mcs.anl.gov> <1256677003.27643.1.camel@localhost> <4AE760AB.9040809@mcs.anl.gov> <1256677781.27934.1.camel@localhost> <4AE764BA.70101@mcs.anl.gov>
Message-ID: <1256678908.28392.1.camel@localhost>

On Tue, 2009-10-27 at 16:23 -0500, Wei Tan wrote:
> There is no report-* directory generated. I think the stdout contains
> some error msg so I posted it in my previous email.

I don't see anything. Maybe the error was on stderr or a sub-process?
From wtan at mcs.anl.gov Tue Oct 27 16:47:25 2009
From: wtan at mcs.anl.gov (Wei Tan)
Date: Tue, 27 Oct 2009 16:47:25 -0500
Subject: [Swift-user] Chesnoknov workflow
In-Reply-To: <1256678908.28392.1.camel@localhost>
References: <4AE61AF3.3040208@mcs.anl.gov> <00D43302-AE07-449E-AE6B-BAB5F41316FB@anl.gov> <4AE62046.1060003@mcs.anl.gov> <2F0ABDE0-FCD3-44C8-B33B-1302DBB485C2@anl.gov> <4AE62273.2020500@mcs.anl.gov> <4AE67715.1090909@mcs.anl.gov> <4AE724B2.2050507@mcs.anl.gov> <1256662707.20353.8.camel@localhost> <4AE72703.2020100@mcs.anl.gov> <1256663019.20883.0.camel@localhost> <1256663780.22694.5.camel@localhost> <4AE75CFF.10201@mcs.anl.gov> <1256677003.27643.1.camel@localhost> <4AE760AB.9040809@mcs.anl.gov> <1256677781.27934.1.camel@localhost> <4AE764BA.70101@mcs.anl.gov> <1256678908.28392.1.camel@localhost>
Message-ID: <4AE76A6D.1000808@mcs.anl.gov>

I attached the stdout and stderr.

Mihael Hategan wrote:
> On Tue, 2009-10-27 at 16:23 -0500, Wei Tan wrote:
>
>> There is no report-* directory generated. I think the stdout contains
>> some error msg so I posted it in my previous email.
>>
>
> I don't see anything. Maybe the error was on stderr or a sub-process?
>

-- 
Wei Tan, Ph.D.
Computation Institute
the University of Chicago|Argonne National Laboratory
http://www.mcs.anl.gov/~wtan
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: error2.txt
URL: 

From benc at hawaga.org.uk Wed Oct 28 22:47:53 2009
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 29 Oct 2009 03:47:53 +0000 (GMT)
Subject: [Swift-user] Chesnoknov workflow
In-Reply-To: <1256663780.22694.5.camel@localhost>
References: <4AE61AF3.3040208@mcs.anl.gov> <00D43302-AE07-449E-AE6B-BAB5F41316FB@anl.gov> <4AE62046.1060003@mcs.anl.gov> <2F0ABDE0-FCD3-44C8-B33B-1302DBB485C2@anl.gov> <4AE62273.2020500@mcs.anl.gov> <4AE67715.1090909@mcs.anl.gov> <4AE724B2.2050507@mcs.anl.gov> <1256662707.20353.8.camel@localhost> <4AE72703.2020100@mcs.anl.gov> <1256663019.20883.0.camel@localhost> <1256663780.22694.5.camel@localhost>
Message-ID:

> For reference:
>
> The solution is:
> $PWD/../swift-0.9/bin/swift-plot-log ecg3-20091026-1332-fkizh09c.log

or put swift on your path?

-- 

From benc at hawaga.org.uk Wed Oct 28 22:53:37 2009
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 29 Oct 2009 03:53:37 +0000 (GMT)
Subject: [Swift-user] Chesnoknov workflow
In-Reply-To: <4AE76A6D.1000808@mcs.anl.gov>
References: <4AE61AF3.3040208@mcs.anl.gov> <00D43302-AE07-449E-AE6B-BAB5F41316FB@anl.gov> <4AE62046.1060003@mcs.anl.gov> <2F0ABDE0-FCD3-44C8-B33B-1302DBB485C2@anl.gov> <4AE62273.2020500@mcs.anl.gov> <4AE67715.1090909@mcs.anl.gov> <4AE724B2.2050507@mcs.anl.gov> <1256662707.20353.8.camel@localhost> <4AE72703.2020100@mcs.anl.gov> <1256663019.20883.0.camel@localhost> <1256663780.22694.5.camel@localhost> <4AE75CFF.10201@mcs.anl.gov> <1256677003.27643.1.camel@localhost> <4AE760AB.9040809@mcs.anl.gov> <1256677781.27934.1.camel@localhost> <4AE764BA.70101@mcs.anl.gov> <1256678908.28392.1.camel@localhost> <4AE76A6D.1000808@mcs.anl.gov>
Message-ID:

The error is with missing kickstart directory - I thought r2794 fixed
that, but perhaps not...
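[Editor's note: the failure mode discussed in this thread - a script invoked via a relative path does a cd and then can't find its sibling tools - can be sketched in shell. This is a hypothetical illustration, not the actual swift-plot-log source: capturing an absolute script directory up front avoids the problem, and Ben's suggestion of putting the Swift bin directory on PATH sidesteps it entirely. The $HOME/swift-0.9 install location below is illustrative.]

```shell
# Hypothetical sketch (not the actual swift-plot-log code): capture the
# script's own directory as an absolute path *before* any cd, so sibling
# tools remain reachable even when the script was started via ./bin/tool.
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
cd /tmp                       # later directory changes no longer break lookups
echo "tools live in: $SCRIPT_DIR"

# Ben's alternative: put the Swift bin directory on PATH once
# ($HOME/swift-0.9 is an assumed install location):
export PATH="$HOME/swift-0.9/bin:$PATH"
# then, from any working directory:
#   swift-plot-log ecg3-20091026-1332-fkizh09c.log
```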
From wtan at mcs.anl.gov Wed Oct 28 23:01:29 2009
From: wtan at mcs.anl.gov (Wei Tan)
Date: Wed, 28 Oct 2009 23:01:29 -0500
Subject: [Swift-user] Chesnoknov workflow
In-Reply-To:
References: <4AE61AF3.3040208@mcs.anl.gov> <00D43302-AE07-449E-AE6B-BAB5F41316FB@anl.gov> <4AE62046.1060003@mcs.anl.gov> <2F0ABDE0-FCD3-44C8-B33B-1302DBB485C2@anl.gov> <4AE62273.2020500@mcs.anl.gov> <4AE67715.1090909@mcs.anl.gov> <4AE724B2.2050507@mcs.anl.gov> <1256662707.20353.8.camel@localhost> <4AE72703.2020100@mcs.anl.gov> <1256663019.20883.0.camel@localhost> <1256663780.22694.5.camel@localhost> <4AE75CFF.10201@mcs.anl.gov> <1256677003.27643.1.camel@localhost> <4AE760AB.9040809@mcs.anl.gov> <1256677781.27934.1.camel@localhost> <4AE764BA.70101@mcs.anl.gov> <1256678908.28392.1.camel@localhost> <4AE76A6D.1000808@mcs.anl.gov>
Message-ID: <4AE91399.3080604@mcs.anl.gov>

Hi Ben,

I used the absolute directory and it works fine now. Thanks for your
reply anyway :-)

Best regards,
Wei

Ben Clifford wrote:
> The error is with missing kickstart directory - I thought r2794 fixed
> that, but perhaps not...
>

-- 
Wei Tan, Ph.D.
Computation Institute
the University of Chicago|Argonne National Laboratory
http://www.mcs.anl.gov/~wtan
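[Editor's note: per Mihael's messages earlier in the thread, swift-plot-log is expected to write its output into a directory named "report-" plus the run id, next to the log file (though he notes it may also land in the current working directory). A hedged sketch of deriving that name, using the log path from this thread:]

```shell
# Hedged sketch: the expected report directory, per this thread, is
# "report-" plus the log's base name, alongside the log file itself.
log=/tmp/wtan/workingdir/ecg3-20091026-1332-fkizh09c.log
report_dir="$(dirname "$log")/report-$(basename "$log" .log)"
echo "$report_dir"
# -> /tmp/wtan/workingdir/report-ecg3-20091026-1332-fkizh09c
```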