From svemalayan at yahoo.com Tue Jan 10 20:50:21 2012
From: svemalayan at yahoo.com (Emalayan Vairavanathan)
Date: Tue, 10 Jan 2012 18:50:21 -0800 (PST)
Subject: [Swift-user] Running swift application on BG/P
Message-ID: <1326250221.99895.YahooMailNeo@web39503.mail.mud.yahoo.com>

Dear All,

I am trying to run a simple Swift program (which prints "Hello World") on BG/P (Surveyor). I modified the site catalog, transformation catalog and config files, and launched the Swift application from the head node using the command below.

swift -config cf -tc.file tc -sites.file sites.xml first.swift

I can see that Swift allocates some nodes and tries to run the application, but no result is generated. I also couldn't see any errors in the log files or stdout. I suspect the workers cannot connect to the coaster service. Or could it be a problem with the catalog files?

Please find my sites.xml, tc.data, cf and application files below. Please let me know if I am making any mistakes.

Thank you
Emalayan

first.swift:

type messagefile;

app (messagefile t) greeting() {
    echo "Hello, world!" stdout=@filename(t);
}

messagefile outfile <"hello.txt">;
outfile = greeting();

sites.xml:

    MTCScienceApps
    default
    zeptoos
    true
    21
    10000
    1
    DEBUG
    1
    900
    64
    64
    /home/emalayan/app/forEmalayan_ccGrdid/swift.workdir

tc:

surveyor  echo       /bin/echo      INSTALLED  INTEL32::LINUX
surveyor  cat        /bin/cat       INSTALLED  INTEL32::LINUX
surveyor  ls         /bin/ls        INSTALLED  INTEL32::LINUX
surveyor  grep       /bin/grep      INSTALLED  INTEL32::LINUX
surveyor  sort       /bin/sort      INSTALLED  INTEL32::LINUX
surveyor  paste      /bin/paste     INSTALLED  INTEL32::LINUX
surveyor  wc         /usr/bin/wc    INSTALLED  INTEL32::LINUX
surveyor  perl       /usr/bin/perl  INSTALLED  INTEL32::LINUX
#surveyor do_merge /home/emalayan/app/forEmalayan_ccGrdid/app/modmerge null null null
#surveyor score /home/emalayan/app/forEmalayan_ccGrdid/app/Scoring/scoredat.exe null null null
surveyor  modftdock  /home/emalayan/app/forEmalayan_ccGrdid/app/modftdock.sh null null null

cf:

wrapperlog.always.transfer=true
sitedir.keep=true
execution.retries=1
lazy.errors=true
status.mode=provider
use.provider.staging=true
provider.staging.pin.swiftfiles=false
foreach.max.threads=100
provenance.log=false

From iraicu at cs.iit.edu Sat Jan 14 08:10:30 2012
From: iraicu at cs.iit.edu (Ioan Raicu)
Date: Sat, 14 Jan 2012 08:10:30 -0600
Subject: [Swift-user] CFP: ACM HPDC 2012, abstracts due January 16th, 2012
Message-ID: <4F118CD6.9090905@cs.iit.edu>

**** CALL FOR PAPERS ****

The 21st International ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC'12)
Delft University of Technology, Delft, the Netherlands
June 18-22, 2012
http://www.hpdc.org/2012

The ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC) is the premier annual conference on the design, implementation, evaluation, and use of parallel and distributed systems for high-end computing. HPDC'12 will take place in Delft, the Netherlands, a historic, picturesque city less than one hour from Amsterdam-Schiphol airport. The conference will be held on June 20-22 (Wednesday to Friday), with affiliated workshops taking place on June 18-19 (Monday and Tuesday).

**** SUBMISSION DEADLINES ****

Abstracts: 16 January 2012
Papers: 23 January 2012 (No extensions!)

**** HPDC'12 GENERAL CHAIR ****

Dick Epema, Delft University of Technology, Delft, the Netherlands

**** HPDC'12 PROGRAM CO-CHAIRS ****

Thilo Kielmann, Vrije Universiteit, Amsterdam, the Netherlands
Matei Ripeanu, The University of British Columbia, Vancouver, Canada

**** HPDC'12 WORKSHOPS CHAIR ****

Alexandru Iosup, Delft University of Technology, Delft, the Netherlands

**** SCOPE AND TOPICS ****

Submissions are welcomed on all forms of high-performance parallel and distributed computing, including but not limited to clusters, clouds, grids, utility computing, data-intensive computing, and massively multicore systems. Submissions that explore solutions to estimate and reduce the energy footprint of such systems are particularly encouraged. All papers will be evaluated for their originality, potential impact, correctness, quality of presentation, appropriate presentation of related work, and relevance to the conference, with a strong preference for rigorous results obtained in operational parallel and distributed systems.

The topics of interest of the conference include, but are not limited to, the following, in the context of high-performance parallel and distributed computing:

- Systems, networks, and architectures for high-end computing
- Massively multicore systems
- Virtualization of machines, networks, and storage
- Programming languages and environments
- I/O, storage systems, and data management
- Resource management, energy and cost minimizations
- Performance modeling and analysis
- Fault tolerance, reliability, and availability
- Data-intensive computing
- Applications of parallel and distributed computing

**** PAPER SUBMISSION GUIDELINES ****

Authors are invited to submit technical papers of at most 12 pages in PDF format, including figures and references. Papers should be formatted in the ACM Proceedings Style and submitted via the conference web site. No changes to the margins, spacing, or font sizes as specified by the style file are allowed.
Accepted papers will appear in the conference proceedings, and will be incorporated into the ACM Digital Library. A limited number of papers will be accepted as posters.

Papers must be self-contained and provide the technical substance required for the program committee to evaluate their contributions. Submitted papers must be original work that has not appeared in and is not under consideration for another conference or a journal. See the ACM Prior Publication Policy for more details.

**** IMPORTANT DATES ****

Abstracts Due:                16 January 2012
Papers Due:                   23 January 2012 (No extensions!)
Reviews Released to Authors:   8 March 2012
Author Rebuttals Due:         12 March 2012
Author Notifications:         19 March 2012
Final Papers Due:             16 April 2012
Conference Dates:             18-22 June 2012

-- 
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT)
Guest Research Faculty, Argonne National Laboratory (ANL)
=================================================================
Data-Intensive Distributed Systems Laboratory, CS/IIT
Distributed Systems Laboratory, MCS/ANL
=================================================================
Cel: 1-847-722-0876
Office: 1-312-567-5704
Email: iraicu at cs.iit.edu
Web: http://www.cs.iit.edu/~iraicu/
Web: http://datasys.cs.iit.edu/
=================================================================
=================================================================

From iraicu at cs.iit.edu Sat Jan 14 12:00:43 2012
From: iraicu at cs.iit.edu (Ioan Raicu)
Date: Sat, 14 Jan 2012 12:00:43 -0600
Subject: [Swift-user] CFP: IEEE eScience 2012 in Chicago IL USA
Message-ID: <4F11C2CB.1070809@cs.iit.edu>

Call for Papers
8th IEEE International Conference on eScience
October 8-12, 2012
Chicago, IL, USA

Researchers in all disciplines are increasingly adopting digital tools, techniques and practices, often in communities and projects that span disciplines, laboratories, organizations, and national boundaries. The eScience 2012 conference is designed to bring together leading international and interdisciplinary research communities, developers, and users of eScience applications and enabling IT technologies. The conference serves as a forum to present the results of the latest applications research and product/tool developments and to highlight related activities from around the world. Also, we are now entering the second decade of eScience and the 2012 conference gives an opportunity to take stock of what has been achieved so far and look forward to the challenges and opportunities the next decade will bring.

A special emphasis of the 2012 conference is on advances in the application of technology in a particular discipline. Accordingly, significant advances in applications science and technology will be considered as important as the development of new technologies themselves. Further, we welcome contributions in educational activities under any of these disciplines.

As a result, the conference will be structured around two e-Science tracks:

* *eScience Algorithms and Applications*
  o eScience application areas, including:
    + Physical sciences
    + Biomedical sciences
    + Social sciences and humanities
  o Data-oriented approaches and applications
  o Compute-oriented approaches and applications
  o Extreme scale approaches and applications
* *Cyberinfrastructure to support eScience*
  o Novel hardware
  o Novel uses of production infrastructure
  o Software and services
  o Tools

The conference proceedings will be published by the IEEE Computer Society Press, USA and will be made available online through the IEEE Digital Library. Selected papers will be invited to submit extended versions to a special issue of the Future Generation Computer Systems (FGCS) journal.

SUBMISSION PROCESS

Authors are invited to submit papers with unpublished, original work of no more than 8 pages of double-column text, single-spaced, in 10-point type on 8.5 x 11 inch pages, as per the IEEE 8.5 x 11 manuscript guidelines. (Up to 2 additional pages may be purchased for US$150/page.) Templates are available from http://www.ieee.org/conferences_events/conferences/publishing/templates.html.

Authors should submit a PDF file that will print on a PostScript printer to https://www.easychair.org/conferences/?conf=escience2012. (Note that paper submitters must also submit an abstract in advance of the paper deadline. This should be done through the same site where papers are submitted.)

It is a requirement that at least one author of each accepted paper attend the conference.

ORGANIZATION

General Chair
* *Ian Foster*, University of Chicago & Argonne National Laboratory, USA

Program Co-Chairs
* *Daniel S. Katz*, University of Chicago & Argonne National Laboratory, USA
* *Heinz Stockinger*, SIB Swiss Institute of Bioinformatics, Switzerland

Program Vice Co-Chairs
* eScience Algorithms and Applications Track
  o *David Abramson*, Monash University, Australia
  o *Gabrielle Allen*, Louisiana State University, USA
* Cyberinfrastructure to support eScience Track
  o *Rosa M. Badia*, Barcelona Supercomputing Center / CSIC, Spain
  o *Geoffrey Fox*, Indiana University, USA

Sponsorship Chair
* *Charlie Catlett*, Argonne National Laboratory, USA

Conference Manager and Finance Chair
* *Julie Wulf-Knoerzer*, University of Chicago & Argonne National Laboratory, USA

Publicity Chairs
* *Kento Aida*, National Institute of Informatics, Japan
* *Ioan Raicu*, Illinois Institute of Technology, USA
* *David Wallom*, Oxford e-Research Centre, UK

Local Organizing Committee
* *Ninfa Mayorga*, University of Chicago, USA
* *Evelyn Rayburn*, University of Chicago, USA
* *Lynn Valentini*, Argonne National Laboratory, USA

Program Committee
* eScience Algorithms and Applications Track
  o *Srinivas Aluru*, Iowa State University, USA
  o *Ashiq Anjum*, University of Derby, UK
  o *David A. Bader*, Georgia Institute of Technology, USA
  o *Jon Blower*, University of Reading, UK
  o *Paul Bonnington*, Monash University, Australia
  o *Simon Cox*, University of Southampton, UK
  o *David De Roure*, Oxford e-Research Centre, UK
  o *George Djorgovski*, California Institute of Technology, USA
  o *Anshu Dubey*, University of Chicago & Argonne National Laboratory, USA
  o *Yuri Estrin*, Monash University, Australia
  o *Dan Fay*, Microsoft, USA
  o *Jeremy Frey*, University of Southampton, UK
  o *Wolfgang Gentzsch*, HPC Consultant, Germany
  o *Lutz Gross*, The University of Queensland, Australia
  o *Sverker Holmgren*, Uppsala University, Sweden
  o *Bill Howe*, University of Washington, USA
  o *Marina Jirotka*, University of Oxford, UK
  o *Timoleon Kipouros*, University of Cambridge, UK
  o *Kerstin Kleese van Dam*, Pacific Northwest National Laboratory, USA
  o *Arun S. Konagurthu*, Monash University, Australia
  o *Peter Kunszt*, SystemsX.ch, Switzerland
  o *Alexey Lastovetsky*, University College Dublin, Ireland
  o *Andrew Lewis*, Griffith University, Australia
  o *Sergio Maffioletti*, University of Zurich, Switzerland
  o *Amitava Majumdar*, San Diego Supercomputer Center, University of California at San Diego, USA
  o *Rui Mao*, Shenzhen University, China
  o *Madhav V. Marathe*, Virginia Tech, USA
  o *Maryann Martone*, University of California at San Diego, USA
  o *Louis Moresi*, Monash University, Australia
  o *Riccardo Murri*, University of Zurich, Switzerland
  o *Silvia D. Olabarriaga*, Academic Medical Center of the University of Amsterdam, Netherlands
  o *Enrique S. Quintana-Ortí*, Universidad Jaume I, Spain
  o *Abani Patra*, University at Buffalo, USA
  o *Rob Pennington*, NSF, USA
  o *Andrew Perry*, Monash University, Australia
  o *Beth Plale*, Indiana University, USA
  o *Michael Resch*, University of Stuttgart, Germany
  o *Adrian Sandu*, Virginia Tech, USA
  o *Mark Savill*, Cranfield University, UK
  o *Erik Schnetter*, Perimeter Institute for Theoretical Physics, Canada
  o *Edward Seidel*, Louisiana State University, USA
  o *Suzanne M. Shontz*, The Pennsylvania State University, USA
  o *David Skinner*, Lawrence Berkeley National Laboratory, USA
  o *Alan Sussman*, University of Maryland, USA
  o *Alex Szalay*, Johns Hopkins University, USA
  o *Domenico Talia*, ICAR-CNR & University of Calabria, Italy
  o *Jian Tao*, Louisiana State University, USA
  o *David Wallom*, Oxford e-Research Centre, UK
  o *Shaowen Wang*, University of Illinois at Urbana-Champaign, USA
  o *Michael Wilde*, Argonne National Laboratory & University of Chicago, USA
  o *Nancy Wilkins-Diehr*, San Diego Supercomputer Center, University of California at San Diego, USA
  o *Wu Zhang*, Shanghai University, China
  o *Yunquan Zhang*, Chinese Academy of Sciences, China
* Cyberinfrastructure to support eScience Track
  o *Deb Agarwal*, Lawrence Berkeley National Laboratory, USA
  o *Ilkay Altintas*, San Diego Supercomputer Center, University of California at San Diego, USA
  o *Henri Bal*, Vrije Universiteit, Netherlands
  o *Roger Barga*, Microsoft, USA
  o *Martin Berzins*, University of Utah, USA
  o *John Brooke*, University of Manchester, UK
  o *Thomas Fahringer*, University of Innsbruck, Austria
  o *Gilles Fedak*, INRIA, France
  o *José A. B. Fortes*, University of Florida, USA
  o *Yolanda Gil*, ISI/USC, USA
  o *Madhusudhan Govindaraju*, SUNY Binghamton, USA
  o *Thomas Hacker*, Purdue University, USA
  o *Ken Hawick*, Massey University, New Zealand
  o *Marty Humphrey*, University of Virginia, USA
  o *Hai Jin*, Huazhong University of Science and Technology, China
  o *Thilo Kielmann*, Vrije Universiteit, Netherlands
  o *Scott Klasky*, Oak Ridge National Laboratory, USA
  o *Isao Kojima*, AIST, Japan
  o *Tevfik Kosar*, University at Buffalo, USA
  o *Dieter Kranzlmueller*, LMU & LRZ Munich, Germany
  o *Erwin Laure*, KTH, Sweden
  o *Jysoo Lee*, KISTI, Korea
  o *Li Xiaoming*, Peking University, China
  o *Bertram Ludäscher*, University of California, Davis, USA
  o *Andrew Lumsdaine*, Indiana University, USA
  o *Tanu Malik*, University of Chicago, USA
  o *Satoshi Matsuoka*, Tokyo Institute of Technology, Japan
  o *Reagan Moore*, University of North Carolina at Chapel Hill, USA
  o *Shirley Moore*, University of Kentucky, USA
  o *Steven Newhouse*, EGI, Netherlands
  o *Dhabaleswar K. (DK) Panda*, The Ohio State University, USA
  o *Manish Parashar*, Rutgers University, USA
  o *Ron Perrott*, University of Oxford, UK
  o *Depei Qian*, Beihang University, China
  o *Judy Qiu*, Indiana University, USA
  o *Ioan Raicu*, Illinois Institute of Technology, USA
  o *Lavanya Ramakrishnan*, Lawrence Berkeley National Laboratory, USA
  o *Omer Rana*, Cardiff University, UK
  o *Paul Roe*, Queensland University of Technology, Australia
  o *Bruno Schulze*, LNCC, Brazil
  o *Marc Snir*, Argonne National Laboratory & University of Illinois at Urbana-Champaign, USA
  o *Xian-He Sun*, Illinois Institute of Technology, USA
  o *Yoshio Tanaka*, AIST, Japan
  o *Michela Taufer*, University of Delaware, USA
  o *Kerry Taylor*, CSIRO, Australia
  o *Douglas Thain*, University of Notre Dame, USA
  o *Paul Watson*, Newcastle University, UK
  o *Jun Zhao*, University of Oxford, UK

From iraicu at cs.iit.edu Sat Jan 14 21:58:00 2012
From: iraicu at cs.iit.edu (Ioan Raicu)
Date: Sat, 14 Jan 2012 21:58:00 -0600
Subject: [Swift-user] Call for Workshops at IEEE eScience, due January 23, 2012
Message-ID: <4F124EC8.2050308@cs.iit.edu>

Call for Workshops
8th IEEE International Conference on eScience
October 8-12, 2012
Chicago, IL, USA

The 8th IEEE eScience conference (e-Science 2012), sponsored by the IEEE Computer Society's Technical Committee for Scalable Computing (TCSC), will be held in Chicago, Illinois, from 8 to 12 October 2012. The eScience 2012 conference is designed to bring together leading international and interdisciplinary research communities, developers, and users of eScience applications and enabling IT technologies.

Multiple e-Science 2012 Workshops will be held on Monday and Tuesday, 8th and 9th October, co-located with the main conference. Workshops are an important part of the conference, giving researchers an opportunity to present their work in a more focused way than the main conference allows and to discuss particular topics of interest to the community. We cordially invite you to submit workshop proposals on any eScience-related topic to the Workshop Chair.

So that prospective participants understand each workshop's purpose and scope, workshop proposals should include:

* A description of the workshop, its focus, goals, and outcome
* A draft call for papers
* Names and affiliations of the organizers and tentative composition of the committees
* Expected numbers of submissions and accepted papers
* Prior history of this workshop, if any. Please include: number of submissions, number of accepted papers, and attendee count.

Workshop organizers are responsible for establishing a program committee, collecting and evaluating submissions, notifying authors of acceptance or rejection in due time, ensuring a transparent and fair selection process, organizing selected papers into sessions, and assigning session chairs.

Proposals will be selected that show a clear focus and objectives in emerging or developing areas that will generate significant interest in the community. Once accepted, the workshop should establish its own paper submission system. For each paper selected for publication, an author must be registered for eScience 2012. Each paper must be presented in person by at least one of the authors. It is expected that the proceedings of the eScience 2012 workshops will be published by the IEEE Computer Society Press, USA, and will be made available online through the IEEE Digital Library.

SUBMISSION PROCESS

Workshop proposals should be emailed to escience2012-workshops at fnal.gov

ORGANIZATION

General Chair
* *Ian Foster*, University of Chicago & Argonne National Laboratory, USA

Program Co-Chairs
* *Daniel S. Katz*, University of Chicago & Argonne National Laboratory, USA
* *Heinz Stockinger*, SIB Swiss Institute of Bioinformatics, Switzerland

Workshops Chair
* *Ruth Pordes*, FNAL, USA

Sponsorship Chair
* *Charlie Catlett*, Argonne National Laboratory, USA

Conference Manager and Finance Chair
* *Julie Wulf-Knoerzer*, University of Chicago & Argonne National Laboratory, USA

Publicity Chairs
* *Kento Aida*, National Institute of Informatics, Japan
* *Ioan Raicu*, Illinois Institute of Technology, USA
* *David Wallom*, Oxford e-Research Centre, UK

Local Organizing Committee
* *Ninfa Mayorga*, University of Chicago, USA
* *Evelyn Rayburn*, University of Chicago, USA
* *Lynn Valentini*, Argonne National Laboratory, USA

From svemalayan at yahoo.com Mon Jan 16 04:08:25 2012
From: svemalayan at yahoo.com (Emalayan Vairavanathan)
Date: Mon, 16 Jan 2012 02:08:25 -0800 (PST)
Subject: [Swift-user] Running Swift+ModFTDock+MosaStore on BG/P
Message-ID: <1326708505.28876.YahooMailNeo@web39507.mail.mud.yahoo.com>

Dear All,

I am trying to run Swift+ModFTDock with MosaStore, but I have some problems deploying MosaStore with Swift. MosaStore must be deployed before the application starts.
But currently the Swift script is launched from the head node, and the rest is taken care of by the Swift runtime.

Is there a way to deploy MosaStore first and then launch Swift+ModFTDock on the nodes? Or should I incorporate the MosaStore deployment into the Swift scripts?

Thank you very much
Emalayan

From wozniak at mcs.anl.gov Mon Jan 16 14:36:50 2012
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Mon, 16 Jan 2012 14:36:50 -0600 (Central Standard Time)
Subject: Re: [Swift-user] Running Swift+ModFTDock+MosaStore on BG/P
In-Reply-To: <1326708505.28876.YahooMailNeo@web39507.mail.mud.yahoo.com>
References: <1326708505.28876.YahooMailNeo@web39507.mail.mud.yahoo.com>
Message-ID:

On Mon, 16 Jan 2012, Emalayan Vairavanathan wrote:

> I am trying to run swift+ModFTDock with MosaStore. But I have some
> problems in deploying MosaStore with Swift. MosaStore should be deployed
> before application starts. But currently swift-script is launched from
> the head-node and the rest is taken care by the swift run-time.
>
> Is there a way to deploy MosaStore firstly and then launch swift+modftdock on the nodes ?

We do not currently have a technique to launch Coasters workers and MosaStore simultaneously.

Justin

-- 
Justin M Wozniak

From turam at mcs.anl.gov Wed Jan 18 12:33:31 2012
From: turam at mcs.anl.gov (Thomas Uram)
Date: Wed, 18 Jan 2012 12:33:31 -0600
Subject: [Swift-user] Question re: reliance on proxy cert
Message-ID: <83743945-35DD-4AB2-A934-E97C5744916B@mcs.anl.gov>

I'm using coasters with ssh:pbs and have the proper bits in ~/.ssh/auth.defaults to support authentication, but when I run the script it fails due to a missing or expired proxy cert:

Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job
Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not start coaster service
Caused by: org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file (/tmp/x509up_u1154) not found.
Caused by: org.globus.gsi.GlobusCredentialException: [JGLOBUS-5] Proxy file (/tmp/x509up_u1154) not found.

Why does it fail when an alternative authentication mechanism is available that would succeed? Is there an option to control this?

The complete log, which includes the Swift script and the sites file, is here:

http://www.ci.uchicago.edu/~turam/hostname-20120118-1128-z3xo7eg9.log

Thanks,
Tom

From jonmon at mcs.anl.gov Wed Jan 18 12:42:09 2012
From: jonmon at mcs.anl.gov (Jonathan Monette)
Date: Wed, 18 Jan 2012 12:42:09 -0600
Subject: Re: [Swift-user] Question re: reliance on proxy cert
In-Reply-To: <83743945-35DD-4AB2-A934-E97C5744916B@mcs.anl.gov>
References: <83743945-35DD-4AB2-A934-E97C5744916B@mcs.anl.gov>
Message-ID: <5AF5DF5B-A50D-44CC-AFA1-368DE67BEC6D@mcs.anl.gov>

With jobmanager="ssh:pbs", ssh is only used to reach the remote machine and start the coaster service there. Coasters still needs a proxy to authenticate with.
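
Tom's error and Jonathan's reply both turn on whether a proxy file exists where the GSI libraries look for one. Below is a minimal Python sketch of that lookup. It assumes the usual Globus convention: the `X509_USER_PROXY` environment variable, when set, overrides the default `/tmp/x509up_u<uid>` location, which is the path in the log above. `default_proxy_path` and `check_proxy` are hypothetical helpers, not part of Swift or the CoG kit. The check only tests that the file exists; it does not test whether the proxy has expired.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def default_proxy_path(uid: Optional[int] = None,
                       env: Optional[Mapping[str, str]] = None) -> Path:
    """Return where GSI tools conventionally look for the user's proxy.

    Assumption: X509_USER_PROXY (if set and non-empty) overrides the
    default /tmp/x509up_u<uid> location.
    """
    env = os.environ if env is None else env
    override = env.get("X509_USER_PROXY")
    if override:
        return Path(override)
    if uid is None:
        uid = os.getuid()  # POSIX only
    return Path(f"/tmp/x509up_u{uid}")


def check_proxy(path: Path) -> str:
    """Report whether a proxy file is present (existence only, no expiry check)."""
    if not path.is_file():
        return f"missing: {path} (create one with grid-proxy-init or myproxy-logon)"
    return f"found: {path}"


if __name__ == "__main__":
    print(check_proxy(default_proxy_path()))
```

Running the same check on the client and on each site where a coaster service will start is one way to confirm the two-sided setup that Mike describes later in the thread.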

From hategan at mcs.anl.gov Wed Jan 18 12:43:40 2012
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 18 Jan 2012 10:43:40 -0800
Subject: Re: [Swift-user] Question re: reliance on proxy cert
In-Reply-To: <83743945-35DD-4AB2-A934-E97C5744916B@mcs.anl.gov>
References: <83743945-35DD-4AB2-A934-E97C5744916B@mcs.anl.gov>
Message-ID: <1326912220.12093.7.camel@blabla>

On Wed, 2012-01-18 at 12:33 -0600, Thomas Uram wrote:
> I'm using coasters with ssh:pbs and have the proper bits in
> ~/.ssh/auth.defaults to support authentication, but when I run the
> script it fails due to a missing or expired proxy cert:

[...]

> Why does it fail when an alternative authentication mechanism is
> available that would succeed? Is there an option to control this?

It fails because, while ssh is used to start the coaster service executable, the connection between client and service is secured by GSI.

This model was just peachy in the Globus scenario, where you would need a proxy anyway to start the service, and delegation could be used to supply credentials to the coaster service.

Not so much with ssh. I've been thinking about a way to deal with this, and I think I'm leaning towards some shared secret that the service could use as a one-time authentication token. The problem is making sure the secret stays secret while whatever provider is used communicates it to the service (i.e. passing it on any command line is out of the question). But that seems to require the use of an encrypted file transfer provider, which breaks the abstraction we have a bit, so it might require more changes than I want to see.

So suggestions are welcome.

Mihael

From wilde at mcs.anl.gov Wed Jan 18 12:49:56 2012
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 18 Jan 2012 12:49:56 -0600 (CST)
Subject: Re: [Swift-user] Question re: reliance on proxy cert
In-Reply-To: <1326912220.12093.7.camel@blabla>
Message-ID: <1645351678.154107.1326912596272.JavaMail.root@zimbra.anl.gov>

Tom, for now this means that when using automatic coasters over ssh, you need to create an X.509 proxy on both sides manually, out of band. Either securely copy one, or run grid-proxy-init both on the client side (where you are running Swift) and on each site on which Swift will start a coaster service.

One can bypass the need for proxies when setting up manual coaster configurations with the -nosec argument to the coaster-service command.
- Mike ----- Original Message ----- > From: "Mihael Hategan" > To: "Thomas Uram" > Cc: "swift user" > Sent: Wednesday, January 18, 2012 12:43:40 PM > Subject: Re: [Swift-user] Question re: reliance on proxy cert > On Wed, 2012-01-18 at 12:33 -0600, Thomas Uram wrote: > > I'm using coasters with ssh:pbs and have the proper bits in > > ~/.ssh/auth.defaults to support authentication, but when I run the > > script it fails due to a missing or expired proxy cert: > > [...] > > > Why does it fail when an alternative authentication mechanism is > > available that would succeed? Is there an option to control this? > > It fails because while ssh is used to start the coaster service > executable, the connection between client and service is secured by > GSI. > > This model was just peachy in the Globus scenario, where you would > need > a proxy anyway to start the service and delegation could be used to > supply credentials to the coaster service. > > Not so much with ssh. I've been thinking about a way to deal with > this, > and I think I'm leaning towards some shared secret that could be used > as > a one-time authentication token by the service. The problem is making > sure that whatever provider is used to communicate said secret to the > service remains a secret (i.e. passing it on any command line is out > of > the question). But that seems to require the use of an encrypted file > transfer provider, which breaks the abstraction we have a bit, so it > might require more changes than I want to see. > > So suggestions are welcome. 
> > Mihael > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From jonmon at mcs.anl.gov Wed Jan 18 12:55:31 2012 From: jonmon at mcs.anl.gov (jonmon at mcs.anl.gov) Date: Wed, 18 Jan 2012 18:55:31 +0000 Subject: [Swift-user] Question re: reliance on proxy cert In-Reply-To: <1645351678.154107.1326912596272.JavaMail.root@zimbra.anl.gov> References: <1326912220.12093.7.camel@blabla> <1645351678.154107.1326912596272.JavaMail.root@zimbra.anl.gov> Message-ID: <2145510777-1326912933-cardhu_decombobulator_blackberry.rim.net-495983841-@b15.c3.bise6.blackberry> Is the proxy on the execution site necessary? I thought it was just the client side. I don't think I create a proxy on the execution site...I only request one using my-proxy on the client side. -----Original Message----- From: Michael Wilde Sender: swift-user-bounces at ci.uchicago.edu Date: Wed, 18 Jan 2012 12:49:56 To: Mihael Hategan; Thomas Uram Cc: swift user Subject: Re: [Swift-user] Question re: reliance on proxy cert Tom, for now, this means that when using automatic coasters over ssh, you need to (manually, out of band) create an x509 proxy on both sides. Either securely copy one, or run grid-proxy-init on both the client side (where you are running Swift) and on each site on which Swift will start a coaster service. One can bypass the need for proxies when setting up manual coaster configurations with the -nosec argument to the coaster-service command. 
- Mike ----- Original Message ----- > From: "Mihael Hategan" > To: "Thomas Uram" > Cc: "swift user" > Sent: Wednesday, January 18, 2012 12:43:40 PM > Subject: Re: [Swift-user] Question re: reliance on proxy cert > On Wed, 2012-01-18 at 12:33 -0600, Thomas Uram wrote: > > I'm using coasters with ssh:pbs and have the proper bits in > > ~/.ssh/auth.defaults to support authentication, but when I run the > > script it fails due to a missing or expired proxy cert: > > [...] > > > Why does it fail when an alternative authentication mechanism is > > available that would succeed? Is there an option to control this? > > It fails because while ssh is used to start the coaster service > executable, the connection between client and service is secured by > GSI. > > This model was just peachy in the Globus scenario, where you would > need > a proxy anyway to start the service and delegation could be used to > supply credentials to the coaster service. > > Not so much with ssh. I've been thinking about a way to deal with > this, > and I think I'm leaning towards some shared secret that could be used > as > a one-time authentication token by the service. The problem is making > sure that whatever provider is used to communicate said secret to the > service remains a secret (i.e. passing it on any command line is out > of > the question). But that seems to require the use of an encrypted file > transfer provider, which breaks the abstraction we have a bit, so it > might require more changes than I want to see. > > So suggestions are welcome. 
> > Mihael > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory _______________________________________________ Swift-user mailing list Swift-user at ci.uchicago.edu https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user From hategan at mcs.anl.gov Wed Jan 18 13:37:23 2012 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 18 Jan 2012 11:37:23 -0800 Subject: [Swift-user] Question re: reliance on proxy cert In-Reply-To: <2145510777-1326912933-cardhu_decombobulator_blackberry.rim.net-495983841-@b15.c3.bise6.blackberry> References: <1326912220.12093.7.camel@blabla> <1645351678.154107.1326912596272.JavaMail.root@zimbra.anl.gov> <2145510777-1326912933-cardhu_decombobulator_blackberry.rim.net-495983841-@b15.c3.bise6.blackberry> Message-ID: <1326915443.13122.1.camel@blabla> On Wed, 2012-01-18 at 18:55 +0000, jonmon at mcs.anl.gov wrote: > Is the proxy on the execution site necessary? I thought it was just > the client side. I don't think I create a proxy on the execution > site...I only request one using my-proxy on the client side. It's not if you use Globus to start the coaster service (or if you run locally). That's because Globus gets a proxy on the remote site for you. It is with any other providers because the client needs to be able to tell that whatever coaster service is trying to connect to it is legit. 
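For readers following the shared-secret idea, here is a minimal Python sketch of a one-time authentication token. It assumes the client generates the token and delivers it to the coaster service over a private channel, such as the ssh session's standard input and never a command line. The client then accepts the token back exactly once, when the service connects. `OneTimeToken` is a hypothetical helper and is not part of Swift, CoG or the coaster code. It only illustrates the "one-time" property described above.

```python
import hmac
import secrets
from typing import Optional


class OneTimeToken:
    """Hypothetical single-use token (illustration only, not coaster code)."""

    def __init__(self) -> None:
        # 32 random bytes from the OS CSPRNG, hex-encoded (64 characters).
        self._token: Optional[str] = secrets.token_hex(32)

    def issue(self) -> str:
        """Return the token so it can be handed to the service out of band."""
        if self._token is None:
            raise RuntimeError("token already redeemed")
        return self._token

    def redeem(self, presented: str) -> bool:
        """Accept the presented token at most once; later attempts fail."""
        if self._token is None:
            return False
        # Constant-time comparison so response timing does not leak the token.
        ok = hmac.compare_digest(self._token.encode(), presented.encode())
        if ok:
            self._token = None  # burn it: a replayed token is rejected
        return ok


tok = OneTimeToken()
t = tok.issue()        # sent to the service, e.g. over the ssh stdin
print(tok.redeem(t))   # True: the service presents it when connecting
print(tok.redeem(t))   # False: a second (replayed) attempt
```

In this sketch, the token is burned only when it is redeemed successfully. With 256 random bits, guessing is not a practical concern, but a real design would probably also expire unused tokens. The token authenticates the service once, but it does not encrypt the client/service connection.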
From turam at mcs.anl.gov Thu Jan 19 14:20:27 2012 From: turam at mcs.anl.gov (Thomas Uram) Date: Thu, 19 Jan 2012 14:20:27 -0600 Subject: [Swift-user] GSISSH in Swift [was Fwd: [Swift-devel] Documentation of sites.xml] In-Reply-To: <1326483206.31692.2.camel@blabla> References: <903303148.134563.1326323490348.JavaMail.root@zimbra.anl.gov> <1326483206.31692.2.camel@blabla> Message-ID: <980E744D-AA8E-4014-90B1-701A7D03F421@mcs.anl.gov> Here's a sites.xml file that specifies gsissh as the executable to use for ssh. This did not work. In other words, the gsissh executable was not used in place of ssh. I've modified Swift to use the gsissh executable, until we can resolve how this should be done within the sites.xml file. gsissh /home/turam/tmp Tom On Jan 13, 2012, at 1:33 PM, Mihael Hategan wrote: > On Fri, 2012-01-13 at 13:28 -0600, Thomas Uram wrote: >> Hey Mihael: >> >> >> I wouldn't prod you to respond after only two days, except that you >> usually respond within minutes! > > Minutes it is. >> >> GSISSH provider availability will be important to me quite soon, so >> I'm interested in your answer.... > > If your gsissh happens to be named "ssh" and is in the path, it should > just work. Otherwise you need to pass the "ssh" attribute to the > provider with the name of the executable. In swift that would be > gsissh. > > That's in theory at least. Let me know if it works in practice. > > Mihael > > From svemalayan at yahoo.com Thu Jan 19 17:51:15 2012 From: svemalayan at yahoo.com (Emalayan Vairavanathan) Date: Thu, 19 Jan 2012 15:51:15 -0800 (PST) Subject: [Swift-user] Montage+Swift+Coasters Message-ID: <1327017075.46246.YahooMailNeo@web39501.mail.mud.yahoo.com> Dear All, I have a problem in running Montage with Coasters (in our local cluster - no batch schedulers). After few stages the swift run-time continuously prints the warnings below. Any ideas ? Should I increase the heartbeat count ? 
Everything works fine when I try to run the same montage-scripts with swift on a single machine. Thank you Emalayan 2012-01-19 15:38:09,207-0800 WARN Command Command(119, HEARTBEAT): handling reply timeout; sendReqTime=120119-153609.206, sendTime=120119-153609.206, now=120119-153809.207 2012-01-19 15:38:09,207-0800 INFO Command Command(119, HEARTBEAT): re-sending 2012-01-19 15:38:09,209-0800 WARN Command Command(119, HEARTBEAT)fault was: Reply timeout org.globus.cog.karajan.workflow.service.ReplyTimeoutException at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:288) at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:293) at java.util.TimerThread.mainLoop(Timer.java:534) at java.util.TimerThread.run(Timer.java:484) From ketancmaheshwari at gmail.com Thu Jan 19 18:49:45 2012 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Thu, 19 Jan 2012 18:49:45 -0600 Subject: [Swift-user] Montage+Swift+Coasters In-Reply-To: <1327017075.46246.YahooMailNeo@web39501.mail.mud.yahoo.com> References: <1327017075.46246.YahooMailNeo@web39501.mail.mud.yahoo.com> Message-ID: Emalayan, From your symptoms, it seems you are facing the same issue as I've been. Could you tell more about the amount of data that needs to be staged to run the Montage stages during which these warnings turn up? How much time elapses since the start of your workflow after which you see these messages? Also, what version of Swift is this? Regards, Ketan On Thu, Jan 19, 2012 at 5:51 PM, Emalayan Vairavanathan < svemalayan at yahoo.com> wrote: > Dear All, > > I have a problem in running Montage with Coasters (in our local cluster > - no batch schedulers). After few stages the swift run-time continuously > prints the warnings below. Any ideas ? Should I increase the heartbeat > count ?
> > Everything works fine when I try to run the same montage-scripts with > swift on a single machine. > > Thank you > Emalayan > > > 2012-01-19 15:38:09,207-0800 WARN Command Command(119, HEARTBEAT): > handling reply timeout; sendReqTime=120119-153609.206, > sendTime=120119-153609.206, now=120119-153809.207 > 2012-01-19 15:38:09,207-0800 INFO Command Command(119, HEARTBEAT): > re-sending > 2012-01-19 15:38:09,209-0800 WARN Command Command(119, HEARTBEAT)fault > was: Reply timeout > org.globus.cog.karajan.workflow.service.ReplyTimeoutException > at > org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:288) > at > org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:293) > at java.util.TimerThread.mainLoop(Timer.java:534) > at java.util.TimerThread.run(Timer.java:484) > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > -- Ketan From svemalayan at yahoo.com Thu Jan 19 19:07:24 2012 From: svemalayan at yahoo.com (Emalayan Vairavanathan) Date: Thu, 19 Jan 2012 17:07:24 -0800 (PST) Subject: [Swift-user] Swift 0.93 + Coasters - Configuration issues ? Message-ID: <1327021644.63695.YahooMailNeo@web39503.mail.mud.yahoo.com> Dear All, I tried to run a simple helloworld.swift application with coasters (with the setup below). Two machines: Machine-1 (for coaster-service) and Machine-2 (for workers) respectively. I started the coaster-service in Machine-1 and also started the helloworld.swift from Machine-1. I observed the following with two different swift versions. With swift-0.92.1 - helloworld.swift was waiting until the worker on Machine-2 is started and then it returned the result. With swift-0.93
- helloworld.swift immediately completed and provided the correct results even before starting the worker. I suspect there might be some configuration issues / bug with swift-0.93 (may be in site catalog). Any ideas / suggestions ? Please kindly let me know if you have questions. Regards Emalayan ============================ Please find my site.catalog below ================================= passive 4 10000 100 100 100 1 10 25.00 10000 proxy /home/emalayan/App/forEmalayan/swift.workdir ==================================================================================== From svemalayan at yahoo.com Thu Jan 19 19:09:22 2012 From: svemalayan at yahoo.com (Emalayan Vairavanathan) Date: Thu, 19 Jan 2012 17:09:22 -0800 (PST) Subject: [Swift-user] Swift 0.93 + Coasters - Configuration issues ? In-Reply-To: <1327021644.63695.YahooMailNeo@web39503.mail.mud.yahoo.com> References: <1327021644.63695.YahooMailNeo@web39503.mail.mud.yahoo.com> Message-ID: <1327021762.47671.YahooMailNeo@web39504.mail.mud.yahoo.com> I used the same site and tc catalogs with both swift versions. Thank you Emalayan ________________________________ From: Emalayan Vairavanathan To: swift user Sent: Thursday, 19 January 2012 5:07 PM Subject: [Swift-user] Swift 0.93 + Coasters - Configuration issues ? Dear All, I tried to run a simple helloworld.swift application with coasters (with the setup below). Two machines: Machine-1 (for coaster-service) and Machine-2 (for workers) respectively. I started the coaster-service in Machine-1 and also started the helloworld.swift from Machine-1. I observed the following with two different swift versions. With swift-0.92.1 - helloworld.swift was waiting until the worker on Machine-2 is started and then it returned the result. With swift-0.93
- helloworld.swift immediately completed and provided the correct results even before starting the worker. I suspect there might be some configuration issues / bug with swift-0.93 (may be in site catalog). Any ideas / suggestions ? Please kindly let me know if you have questions. Regards Emalayan ============================ Please find my site.catalog below ================================= passive 4 10000 100 100 100 1 10 25.00 10000 proxy /home/emalayan/App/forEmalayan/swift.workdir ==================================================================================== From ketancmaheshwari at gmail.com Thu Jan 19 19:10:56 2012 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Thu, 19 Jan 2012 19:10:56 -0600 Subject: [Swift-user] Swift 0.93 + Coasters - Configuration issues ? In-Reply-To: <1327021644.63695.YahooMailNeo@web39503.mail.mud.yahoo.com> References: <1327021644.63695.YahooMailNeo@web39503.mail.mud.yahoo.com> Message-ID: On Thu, Jan 19, 2012 at 7:07 PM, Emalayan Vairavanathan < svemalayan at yahoo.com> wrote: > Dear All, > > I tried to run a simple helloworld.swift application with coasters (with > the setup below). > > Two machines: Machine-1 (for coaster-service) and Machine-2 (for workers) > respectively. > > I started the coaster-service in Machine-1 and also started the > helloworld.swift from Machine-1. I observed the following with two > different swift versions. > > With swift-0.92.1 - helloworld.swift was waiting until the worker on > Machine-2 is started and then it returned the result. > With swift-0.93 - helloworld.swift immediately completed and provided > the correct results even before starting the worker.
> This is because the coaster-service in 0.93 onwards is configured to launch workers automatically, while this is not the case with 0.92.1, in which workers need to be started manually. > > I suspect there might be some configuration issues / bug with swift-0.93 > (may be in site catalog). > > Any ideas / suggestions ? > > Please kindly let me know if you have questions. > > Regards > Emalayan > > ============================ Please find my site.catalog below > ================================= > > > jobmanager="local:local"/> > passive > > 4 > 10000 > 100 > 100 > 100 > 1 > 10 > 25.00 > 10000 > proxy > > > /home/emalayan/App/forEmalayan/swift.workdir > > > > > ==================================================================================== > > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > -- Ketan From svemalayan at yahoo.com Thu Jan 19 19:14:42 2012 From: svemalayan at yahoo.com (Emalayan Vairavanathan) Date: Thu, 19 Jan 2012 17:14:42 -0800 (PST) Subject: [Swift-user] Swift 0.93 + Coasters - Configuration issues ? In-Reply-To: References: <1327021644.63695.YahooMailNeo@web39503.mail.mud.yahoo.com> Message-ID: <1327022082.94703.YahooMailNeo@web39506.mail.mud.yahoo.com> Thank you Ketan. By the way, how does the coaster-service find the IP address of the worker nodes? ________________________________ From: Ketan Maheshwari To: Emalayan Vairavanathan Cc: swift user Sent: Thursday, 19 January 2012 5:10 PM Subject: Re: [Swift-user] Swift 0.93 + Coasters - Configuration issues ? On Thu, Jan 19, 2012 at 7:07 PM, Emalayan Vairavanathan wrote: Dear All, > > >I tried to run a simple helloworld.swift application with coasters (with the setup below). > > >Two machines: Machine-1 (for coaster-service) and Machine-2 (for workers) respectively.
> > > >I started the coaster-service in Machine-1 and also started the helloworld.swift from Machine-1. I observed the following with two different swift versions. > > >With swift-0.92.1 - helloworld.swift was waiting until the worker on Machine-2 is started and then it returned the result. >With swift-0.93 - helloworld.swift immediately completed and provided the correct results even before starting the worker. This is because the coaster-service in 0.93 onwards is configured to launch workers automatically, while this is not the case with 0.92.1, in which workers need to be started manually. > >I suspect there might be some configuration issues / bug with swift-0.93 (may be in site catalog). > > >Any ideas / suggestions ? > > >Please kindly let me know if you have questions. > > > >Regards >Emalayan > > >============================ Please find my site.catalog below ================================= > > > >passive > >4 >10000 >100 >100 >100 >1 >10 >25.00 >10000 >proxy > >/home/emalayan/App/forEmalayan/swift.workdir > > > > > > > > >==================================================================================== > > > >_______________________________________________ >Swift-user mailing list >Swift-user at ci.uchicago.edu >https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > -- Ketan From ketancmaheshwari at gmail.com Thu Jan 19 19:16:51 2012 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Thu, 19 Jan 2012 19:16:51 -0600 Subject: [Swift-user] Swift 0.93 + Coasters - Configuration issues ? In-Reply-To: <1327022082.94703.YahooMailNeo@web39506.mail.mud.yahoo.com> References: <1327021644.63695.YahooMailNeo@web39503.mail.mud.yahoo.com> <1327022082.94703.YahooMailNeo@web39506.mail.mud.yahoo.com> Message-ID: Workers start only on the localhost.
On Thu, Jan 19, 2012 at 7:14 PM, Emalayan Vairavanathan < svemalayan at yahoo.com> wrote: > Thank you Ketan. By the way, how does the coaster-service find the IP address > of the worker nodes? > > ------------------------------ > *From:* Ketan Maheshwari > *To:* Emalayan Vairavanathan > *Cc:* swift user > *Sent:* Thursday, 19 January 2012 5:10 PM > *Subject:* Re: [Swift-user] Swift 0.93 + Coasters - Configuration issues ? > > > > On Thu, Jan 19, 2012 at 7:07 PM, Emalayan Vairavanathan < > svemalayan at yahoo.com> wrote: > > Dear All, > > I tried to run a simple helloworld.swift application with coasters (with > the setup below). > > Two machines: Machine-1 (for coaster-service) and Machine-2 (for workers) > respectively. > > I started the coaster-service in Machine-1 and also started the > helloworld.swift from Machine-1. I observed the following with two > different swift versions. > > With swift-0.92.1 - helloworld.swift was waiting until the worker on > Machine-2 is started and then it returned the result. > With swift-0.93 - helloworld.swift immediately completed and provided > the correct results even before starting the worker. > > > This is because the coaster-service in 0.93 onwards is configured to launch > workers automatically, while this is not the case with 0.92.1, in which > workers need to be started manually. > > > > I suspect there might be some configuration issues / bug with swift-0.93 > (may be in site catalog). > > Any ideas / suggestions ? > > Please kindly let me know if you have questions.
> > Regards > Emalayan > > ============================ Please find my site.catalog below > ================================= > > > jobmanager="local:local"/> > passive > > 4 > 10000 > 100 > 100 > 100 > 1 > 10 > 25.00 > 10000 > proxy > > > /home/emalayan/App/forEmalayan/swift.workdir > > > > > ==================================================================================== > > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > > > > -- > Ketan > > > > > -- Ketan From svemalayan at yahoo.com Thu Jan 19 19:55:20 2012 From: svemalayan at yahoo.com (Emalayan Vairavanathan) Date: Thu, 19 Jan 2012 17:55:20 -0800 (PST) Subject: [Swift-user] Montage+Swift+Coasters In-Reply-To: References: <1327017075.46246.YahooMailNeo@web39501.mail.mud.yahoo.com> Message-ID: <1327024520.62801.YahooMailNeo@web39505.mail.mud.yahoo.com> Hi Ketan, This was with swift-0.92.1. Now I have downloaded the latest swift 0.93 and am getting totally different error messages with swift 0.93. I can ask Jon about these messages. (These scripts were working well with only Swift) Please let me know if you have any idea. Regards Emalayan =============================================================================================== Swift 0.93 swift-r5501 cog-r3350 RunID: 20120119-1749-rjshh1r9 (input): found 10 files Progress: time: Thu, 19 Jan 2012 17:49:20 -0800 Find: http://localhost:1984 Find: keepalive(120), reconnect - http://localhost:1984 Progress: time: Thu, 19 Jan 2012 17:49:22 -0800 Stage in:1 Submitted:9 Progress: time: Thu, 19 Jan 2012 17:49:25 -0800 Active:9 Stage out:1 Progress: time: Thu, 19 Jan 2012 17:49:26 -0800 Stage out:3 Finished successfully:7 Progress: time: Thu, 19 Jan 2012 17:49:28 -0800 Active:1 Finished successfully:10 Progress:
time: Thu, 19 Jan 2012 17:49:29 -0800 Stage in:1 Submitting:11 Submitted:6 Finished successfully:12 Progress: time: Thu, 19 Jan 2012 17:49:30 -0800 Stage in:4 Submitted:1 Active:6 Stage out:2 Finished successfully:17 Progress: time: Thu, 19 Jan 2012 17:49:31 -0800 Active:1 Finished successfully:30 Exception in mConcatFit: Arguments: [_concurrent/status_tbl-7a8340c2-045d-4039-a77c-00429b78d9c9-5, fits.tbl, stat_dir] Host: localhost Directory: SwiftMontage-20120119-1749-rjshh1r9/jobs/b/mConcatFit-b1sa4vlk - - - Caused by: null Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 520 Execution failed: back_list:Table = org.griphyn.vdl.mapping.DataDependentException - Closed not derived due to errors in data dependencies ________________________________ From: Ketan Maheshwari To: Emalayan Vairavanathan Cc: swift user Sent: Thursday, 19 January 2012 4:49 PM Subject: Re: [Swift-user] Montage+Swift+Coasters Emalayan, From your symptoms, it seems you are facing the same issue as I've been. Could you tell more about the amount of data that needs to be staged to run the Montage stages during which these warnings turn up? How much time elapses since the start of your workflow after which you see these messages? Also, what version of Swift is this? Regards, Ketan On Thu, Jan 19, 2012 at 5:51 PM, Emalayan Vairavanathan wrote: Dear All, > > >I have a problem in running Montage with Coasters (in our local cluster - no batch schedulers). After few stages the swift run-time continuously prints the warnings below. Any ideas ? Should I increase the heartbeat count ? > > > >Everything works fine when I try to run the same montage-scripts with swift on a single machine. > > > >Thank you >Emalayan > > > > > >2012-01-19 15:38:09,207-0800 WARN Command Command(119, HEARTBEAT): handling reply timeout; sendReqTime=120119-153609.206, sendTime=120119-153609.206, now=120119-153809.207 >2012-01-19 15:38:09,207-0800 INFO
Command Command(119, HEARTBEAT): re-sending >2012-01-19 15:38:09,209-0800 WARN Command Command(119, HEARTBEAT)fault was: Reply timeout >org.globus.cog.karajan.workflow.service.ReplyTimeoutException > at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:288) > at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:293) > at java.util.TimerThread.mainLoop(Timer.java:534) > at java.util.TimerThread.run(Timer.java:484) >_______________________________________________ >Swift-user mailing list >Swift-user at ci.uchicago.edu >https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > -- Ketan From ketancmaheshwari at gmail.com Fri Jan 20 09:39:34 2012 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Fri, 20 Jan 2012 09:39:34 -0600 Subject: [Swift-user] Montage+Swift+Coasters In-Reply-To: <1327024520.62801.YahooMailNeo@web39505.mail.mud.yahoo.com> References: <1327017075.46246.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327024520.62801.YahooMailNeo@web39505.mail.mud.yahoo.com> Message-ID: Emalayan, I would check all the mappers and the resulting paths in the Swift source. Also try running the failed job manually, something like this: cd /SwiftMontage-20120119-1749-rjshh1r9/jobs/b/mConcatFit-b1sa4vlk ; mConcatFit _concurrent/status_tbl-7a8340c2-045d-4039-a77c-00429b78d9c9-5 fits.tbl stat_dir Error 520 indicates workers are not able to reach the data. Also check if swift.workdir is writable on the site by the worker nodes. On Thu, Jan 19, 2012 at 7:55 PM, Emalayan Vairavanathan < svemalayan at yahoo.com> wrote: > Hi Ketan, > > This was with swift-0.92.1. Now I have downloaded the latest swift 0.93 > and getting totally different error messages with swift 0.93. I can ask > Jon about these messages.
(These scripts was working well with only Swift) > > Please let me know if you have any idea. > > Regards > Emalayan > > > =============================================================================================== > Swift 0.93 swift-r5501 cog-r3350 > > RunID: 20120119-1749-rjshh1r9 > (input): found 10 files > Progress: time: Thu, 19 Jan 2012 17:49:20 -0800 > Find: http://localhost:1984 > Find: keepalive(120), reconnect - http://localhost:1984 > Progress: time: Thu, 19 Jan 2012 17:49:22 -0800 Stage in:1 Submitted:9 > Progress: time: Thu, 19 Jan 2012 17:49:25 -0800 Active:9 Stage out:1 > Progress: time: Thu, 19 Jan 2012 17:49:26 -0800 Stage out:3 Finished > successfully:7 > Progress: time: Thu, 19 Jan 2012 17:49:28 -0800 Active:1 Finished > successfully:10 > Progress: time: Thu, 19 Jan 2012 17:49:29 -0800 Stage in:1 > Submitting:11 Submitted:6 Finished successfully:12 > Progress: time: Thu, 19 Jan 2012 17:49:30 -0800 Stage in:4 Submitted:1 > Active:6 Stage out:2 Finished successfully:17 > Progress: time: Thu, 19 Jan 2012 17:49:31 -0800 Active:1 Finished > successfully:30 > Exception in mConcatFit: > Arguments: [_concurrent/status_tbl-7a8340c2-045d-4039-a77c-00429b78d9c9-5, > fits.tbl, stat_dir] > Host: localhost > Directory: SwiftMontage-20120119-1749-rjshh1r9/jobs/b/mConcatFit-b1sa4vlk > - - - > > Caused by: null > Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: > Job failed with an exit code of 520 > Execution failed: > back_list:Table = org.griphyn.vdl.mapping.DataDependentException - > Closed not derived due to errors in data dependencies > > ------------------------------ > *From:* Ketan Maheshwari > *To:* Emalayan Vairavanathan > *Cc:* swift user > *Sent:* Thursday, 19 January 2012 4:49 PM > *Subject:* Re: [Swift-user] Montage+Swift+Coasters > > Emalayan, > > From your symptoms, it seems you are facing the same issue as I've been. 
> Could you tell more about the amount of data that needs to be staged to run > the Montage stages during which these warnings turn up? How much time > elapses since the start of your workflow after which you see these messages? > > Also, what version of Swift is this? > > Regards, > Ketan > > On Thu, Jan 19, 2012 at 5:51 PM, Emalayan Vairavanathan < > svemalayan at yahoo.com> wrote: > > Dear All, > > I have a problem in running Montage with Coasters (in our local cluster > - no batch schedulers). After few stages the swift run-time continuously > prints the warnings below. Any ideas ? Should I increase the heartbeat > count ? > > Everything works fine when I try to run the same montage-scripts with > swift on a single machine. > > Thank you > Emalayan > > > 2012-01-19 15:38:09,207-0800 WARN Command Command(119, HEARTBEAT): > handling reply timeout; sendReqTime=120119-153609.206, > sendTime=120119-153609.206, now=120119-153809.207 > 2012-01-19 15:38:09,207-0800 INFO Command Command(119, HEARTBEAT): > re-sending > 2012-01-19 15:38:09,209-0800 WARN Command Command(119, HEARTBEAT)fault > was: Reply timeout > org.globus.cog.karajan.workflow.service.ReplyTimeoutException > at > org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:288) > at > org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:293) > at java.util.TimerThread.mainLoop(Timer.java:534) > at java.util.TimerThread.run(Timer.java:484) > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > > > > -- > Ketan > > > > > -- Ketan
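Ketan's last suggestion, that swift.workdir should be writable by the worker nodes, can be checked directly on a worker node. Below is a minimal Python sketch that assumes it is run as the same user the workers run as. It tries to create and then delete a throwaway file in the directory. `workdir_is_writable` is a hypothetical helper. It only tells you whether the directory accepts writes from that node and does not confirm any particular meaning of exit code 520.

```python
import os
import tempfile


def workdir_is_writable(workdir: str) -> bool:
    """Return True if this process can create and remove a file in `workdir`.

    A real file is created instead of calling os.access(), because the Python
    docs warn that access() may report success on network filesystems where
    the actual I/O would still fail.
    """
    try:
        fd, path = tempfile.mkstemp(prefix=".write-test-", dir=workdir)
    except OSError:
        # Missing directory, no permission, read-only mount, and so on.
        return False
    os.close(fd)
    os.remove(path)  # leave the directory as we found it
    return True
```

For example, `workdir_is_writable("/home/emalayan/App/forEmalayan/swift.workdir")` tests the workdir from the site catalog posted earlier in this thread. Substitute the workdir from your own sites file.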
From benc at hawaga.org.uk Fri Jan 20 15:52:56 2012 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 20 Jan 2012 22:52:56 +0100 Subject: [Swift-user] Question re: reliance on proxy cert In-Reply-To: <1326912220.12093.7.camel@blabla> References: <83743945-35DD-4AB2-A934-E97C5744916B@mcs.anl.gov> <1326912220.12093.7.camel@blabla> Message-ID: <3E8936F5-C4B5-4FBF-9AAA-F5529232E331@hawaga.org.uk> in the ssh case, you should have a secure standard in/standard out over which you can send securely and so do either something like a gsi delegation or a shared secret transmission or whatever. that doesn't apply to arbitrary cog providers though, I think. so maybe it's yet another growth of the configuration option space...? On Jan 18, 2012, at 7:43 PM, Mihael Hategan wrote: > On Wed, 2012-01-18 at 12:33 -0600, Thomas Uram wrote: >> I'm using coasters with ssh:pbs and have the proper bits in >> ~/.ssh/auth.defaults to support authentication, but when I run the >> script it fails due to a missing or expired proxy cert: > > [...] > >> Why does it fail when an alternative authentication mechanism is >> available that would succeed? Is there an option to control this? > > It fails because while ssh is used to start the coaster service > executable, the connection between client and service is secured by > GSI. > > This model was just peachy in the Globus scenario, where you would need > a proxy anyway to start the service and delegation could be used to > supply credentials to the coaster service. > > Not so much with ssh. I've been thinking about a way to deal with this, > and I think I'm leaning towards some shared secret that could be used as > a one-time authentication token by the service. The problem is making > sure that whatever provider is used to communicate said secret to the > service remains a secret (i.e. passing it on any command line is out of > the question).
But that seems to require the use of an encrypted file > transfer provider, which breaks the abstraction we have a bit, so it > might require more changes than I want to see. > > So suggestions are welcome. > > Mihael > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > From hategan at mcs.anl.gov Fri Jan 20 16:48:25 2012 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 20 Jan 2012 14:48:25 -0800 Subject: [Swift-user] Question re: reliance on proxy cert In-Reply-To: <3E8936F5-C4B5-4FBF-9AAA-F5529232E331@hawaga.org.uk> References: <83743945-35DD-4AB2-A934-E97C5744916B@mcs.anl.gov> <1326912220.12093.7.camel@blabla> <3E8936F5-C4B5-4FBF-9AAA-F5529232E331@hawaga.org.uk> Message-ID: <1327099705.7785.6.camel@blabla> On Fri, 2012-01-20 at 22:52 +0100, Ben Clifford wrote: > in the ssh case, you should have a secure standard in/standard out > over which you can send securely and so do either something like a gsi > delegation or a shared secret transmission or whatever. Right. Though there's some care to be taken there. echo "secret" > secretfile is something that can be seen in ps. Can you think of anything that could go wrong with cat > secretfile? > > that doesn't apply to arbitrary cog providers though, I think. Right. And in the shared secret case, there would have to be an additional security mechanism (e.g. some key exchange + symmetric encryption without host certificate checks). > > so maybe it's yet another growth of the configuration option space...? Right. That's another reason that gives me a bit of pause here. But too much pause isn't good either.
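Mihael's question about writing the secret to a file can be answered with a small sketch. Below is a minimal Python version that assumes the secret has already been read from standard input, so it never appears in `ps`. The file is created with owner-only permissions in the same `open()` call. Doing `touch` and then `chmod` as two steps leaves a short window in which another user could open the still-empty file and keep that descriptor after the permissions change. `O_EXCL` also refuses to reuse a file that someone else created first. `write_secret` is a hypothetical helper and is not part of Swift or CoG.

```python
import os


def write_secret(path: str, secret: bytes) -> None:
    """Store `secret` in a new file that only the owner can read or write.

    O_CREAT | O_EXCL fails with FileExistsError if `path` already exists, so a
    file pre-created (and possibly held open) by someone else is never reused.
    Mode 0o600 is applied at creation time (the umask can only remove bits),
    so the file is never group- or world-readable at any point.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        view = memoryview(secret)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
    finally:
        os.close(fd)
```

On the remote side, this could be called as `write_secret(path, sys.stdin.buffer.read())`, with the secret piped through the ssh session rather than passed as an argument.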
From benc at hawaga.org.uk Sat Jan 21 03:56:10 2012 From: benc at hawaga.org.uk (Ben Clifford) Date: Sat, 21 Jan 2012 10:56:10 +0100 Subject: [Swift-user] Question re: reliance on proxy cert In-Reply-To: <1327099705.7785.6.camel@blabla> References: <83743945-35DD-4AB2-A934-E97C5744916B@mcs.anl.gov> <1326912220.12093.7.camel@blabla> <3E8936F5-C4B5-4FBF-9AAA-F5529232E331@hawaga.org.uk> <1327099705.7785.6.camel@blabla> Message-ID: On Jan 20, 2012, at 11:48 PM, Mihael Hategan wrote: > On Fri, 2012-01-20 at 22:52 +0100, Ben Clifford wrote: >> in the ssh case, you should have a secure standard in/standard out >> over which you can send securely and so do either something like a gsi >> delegation or a shared secret transmission or whatever. > > Right. Though there's some care to be taken there. echo "secret" > secretfile is something that can be seen in ps. Can you think of > anything that could go wrong with cat > secretfile? secretfile exists on the remote filesystem in a way that is possibly publicly visible; touch secretfile ; chmod go-rwx secretfile ; cat >> secretfile might be better. or you could feed it into the program that wants the secret directly and forget the filesystem entirely. > Right. And in the shared secret case, there would have to be an > additional security mechanism (e.g. some key exchange + symmetric > encryption without host certificate checks). You get a bunch of that from ssh already. There's probably more elaborate stuff that can be done - eg rather than having a shared secret at all, use the ssh channel to exchange two public keys, one from each end; or make sure that the wire protocol never sends the whole secret over the out-of-ssh channel, just some proof that it knows it. Depends how crazy you want to go on security...
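Ben's last option is to never send the secret itself, only proof that the sender knows it. This is usually done as a challenge-response exchange. Below is a minimal Python sketch using HMAC-SHA256 from the standard library. It assumes both ends already share `secret`, for instance delivered over the ssh channel. The client sends a fresh random challenge, the service answers with an HMAC of that challenge, and the client recomputes the HMAC and compares. The function names are hypothetical. This authenticates the service to the client, but it does not by itself encrypt the connection, which is the separate key-exchange concern Mihael raises.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Client: a fresh random nonce per connection, so old answers can't be replayed."""
    return secrets.token_bytes(32)


def answer(secret: bytes, challenge: bytes) -> bytes:
    """Service: prove knowledge of `secret` without ever sending it."""
    return hmac.new(secret, challenge, hashlib.sha256).digest()


def check(secret: bytes, challenge: bytes, response: bytes) -> bool:
    """Client: recompute the expected answer and compare in constant time."""
    expected = hmac.new(secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


shared = secrets.token_bytes(32)  # delivered to the service over ssh
c = make_challenge()              # client -> service, may travel in the clear
r = answer(shared, c)             # service -> client, may travel in the clear
print(check(shared, c, r))        # True
print(check(b"wrong", c, r))      # False
```

Only the challenge and the 32-byte HMAC cross the out-of-ssh channel. For mutual authentication, the same exchange would also be run in the opposite direction.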
-- From hategan at mcs.anl.gov Sat Jan 21 20:04:47 2012 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 21 Jan 2012 18:04:47 -0800 Subject: [Swift-user] GSISSH in Swift [was Fwd: [Swift-devel] Documentation of sites.xml] In-Reply-To: <980E744D-AA8E-4014-90B1-701A7D03F421@mcs.anl.gov> References: <903303148.134563.1326323490348.JavaMail.root@zimbra.anl.gov> <1326483206.31692.2.camel@blabla> <980E744D-AA8E-4014-90B1-701A7D03F421@mcs.anl.gov> Message-ID: <1327197887.7405.2.camel@blabla> My bad. Those profile entries don't get propagated to the task that starts the coaster service. Change of plans. The latest trunk adds an etc/provider-sshcl.properties file, the contents of which are self-explanatory. Mihael On Thu, 2012-01-19 at 14:20 -0600, Thomas Uram wrote: > Here's a sites.xml file that specifies gsissh as the executable to use for ssh. This did not work. In other words, the gsissh executable was not used in place of ssh. I've modified Swift to use the gsissh executable, until we can resolve how this should be done within the sites.xml file. > > > > > gsissh > > /home/turam/tmp > > > > > Tom > > > > On Jan 13, 2012, at 1:33 PM, Mihael Hategan wrote: > > > On Fri, 2012-01-13 at 13:28 -0600, Thomas Uram wrote: > >> Hey Mihael: > >> > >> > >> I wouldn't prod you to respond after only two days, except that you > >> usually respond within minutes! > > > > Minutes it is. > >> > >> GSISSH provider availability will be important to me quite soon, so > >> I'm interested in your answer.... > > > > If your gsissh happens to be named "ssh" and is in the path, it should > > just work. Otherwise you need to pass the "ssh" attribute to the > > provider with the name of the executable. In swift that would be > > gsissh. > > > > That's in theory at least. Let me know if it works in practice.
> > > > Mihael > > > > > From jonmon at mcs.anl.gov Sat Jan 21 20:20:26 2012 From: jonmon at mcs.anl.gov (Jonathan Monette) Date: Sat, 21 Jan 2012 20:20:26 -0600 Subject: [Swift-user] GSISSH in Swift [was Fwd: [Swift-devel] Documentation of sites.xml] In-Reply-To: <1327197887.7405.2.camel@blabla> References: <903303148.134563.1326323490348.JavaMail.root@zimbra.anl.gov> <1326483206.31692.2.camel@blabla> <980E744D-AA8E-4014-90B1-701A7D03F421@mcs.anl.gov> <1327197887.7405.2.camel@blabla> Message-ID: <33E27556-DD8E-407A-9610-94F8DA1AAD70@mcs.anl.gov> So we use the etc/provider-sshcl.properties file instead of an .ssh/config file? I ask this because it looks like the code that allows you to specify a username and private key file in the sites file is still there. I know from past testing that at least specifying a username in the sites file was not being honored. On Jan 21, 2012, at 8:04 PM, Mihael Hategan wrote: > My bad. Those profile entries don't get propagated to the task that > starts the coaster service. > > Change of plans. The latest trunk adds an etc/provider-sshcl.properties > file, the contents of which are self-explanatory. > > Mihael From jonmon at mcs.anl.gov Mon Jan 23 13:08:29 2012 From: jonmon at mcs.anl.gov (Jonathan Monette) Date: Mon, 23 Jan 2012 13:08:29 -0600 Subject: [Swift-user] Montage+Swift+Coasters In-Reply-To: References: <1327017075.46246.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327024520.62801.YahooMailNeo@web39505.mail.mud.yahoo.com> Message-ID: <4D008EF1-8072-4A18-8A5A-18D74A2ACDA2@mcs.anl.gov> Emalayan, So I have run the scripts with some of my own test cases and do not see them failing. Could you provide your config files? Please provide the tc, sites, and config file (if you use a config file). On Jan 20, 2012, at 9:39 AM, Ketan Maheshwari wrote: > Emalayan, > > I would check all the mappers and the resulting paths in the Swift source. > > Also try running the failed job something like this: > > cd /SwiftMontage-20120119-1749-rjshh1r9/jobs/b/mConcatFit-b1sa4vlk > > mConcatFit _concurrent/status_tbl-7a8340c2-045d-4039-a77c-00429b78d9c9-5 fits.tbl stat_dir > > error 520 indicates workers are not able to reach the data. > > Also check if swift.workdir is writable on the site by the worker nodes. > > On Thu, Jan 19, 2012 at 7:55 PM, Emalayan Vairavanathan wrote: > Hi Ketan, > > This was with swift-0.92.1. Now I have downloaded the latest swift 0.93 and am getting totally different error messages with swift 0.93. I can ask Jon about these messages. (These scripts were working well with Swift alone.) > > Please let me know if you have any idea.
> > Regards > Emalayan > > =============================================================================================== > Swift 0.93 swift-r5501 cog-r3350 > > RunID: 20120119-1749-rjshh1r9 > (input): found 10 files > Progress: time: Thu, 19 Jan 2012 17:49:20 -0800 > Find: http://localhost:1984 > Find: keepalive(120), reconnect - http://localhost:1984 > Progress: time: Thu, 19 Jan 2012 17:49:22 -0800 Stage in:1 Submitted:9 > Progress: time: Thu, 19 Jan 2012 17:49:25 -0800 Active:9 Stage out:1 > Progress: time: Thu, 19 Jan 2012 17:49:26 -0800 Stage out:3 Finished successfully:7 > Progress: time: Thu, 19 Jan 2012 17:49:28 -0800 Active:1 Finished successfully:10 > Progress: time: Thu, 19 Jan 2012 17:49:29 -0800 Stage in:1 Submitting:11 Submitted:6 Finished successfully:12 > Progress: time: Thu, 19 Jan 2012 17:49:30 -0800 Stage in:4 Submitted:1 Active:6 Stage out:2 Finished successfully:17 > Progress: time: Thu, 19 Jan 2012 17:49:31 -0800 Active:1 Finished successfully:30 > Exception in mConcatFit: > Arguments: [_concurrent/status_tbl-7a8340c2-045d-4039-a77c-00429b78d9c9-5, fits.tbl, stat_dir] > Host: localhost > Directory: SwiftMontage-20120119-1749-rjshh1r9/jobs/b/mConcatFit-b1sa4vlk > - - - > > Caused by: null > Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 520 > Execution failed: > back_list:Table = org.griphyn.vdl.mapping.DataDependentException - Closed not derived due to errors in data dependencies > > From: Ketan Maheshwari > To: Emalayan Vairavanathan > Cc: swift user > Sent: Thursday, 19 January 2012 4:49 PM > Subject: Re: [Swift-user] Montage+Swift+Coasters > > Emalayan, > > From your symptoms, it seems you are facing the same issue as I've been. Could you tell more about the amount of data that needs to be staged to run the Montage stages during which these warnings turn up? How much time elapses since the start of your workflow after which you see these messages? 
> > Also, what version of Swift is this? > > Regards, > Ketan > > On Thu, Jan 19, 2012 at 5:51 PM, Emalayan Vairavanathan wrote: > Dear All, > > I have a problem running Montage with Coasters (in our local cluster - no batch schedulers). After a few stages the Swift run-time continuously prints the warnings below. Any ideas? Should I increase the heartbeat count? > > Everything works fine when I try to run the same montage-scripts with swift on a single machine. > > Thank you > Emalayan > > > 2012-01-19 15:38:09,207-0800 WARN Command Command(119, HEARTBEAT): handling reply timeout; sendReqTime=120119-153609.206, sendTime=120119-153609.206, now=120119-153809.207 > 2012-01-19 15:38:09,207-0800 INFO Command Command(119, HEARTBEAT): re-sending > 2012-01-19 15:38:09,209-0800 WARN Command Command(119, HEARTBEAT)fault was: Reply timeout > org.globus.cog.karajan.workflow.service.ReplyTimeoutException > at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:288) > at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:293) > at java.util.TimerThread.mainLoop(Timer.java:534) > at java.util.TimerThread.run(Timer.java:484) > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > > > -- > Ketan > > > > > > > > -- > Ketan > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user -------------- next part -------------- An HTML attachment was scrubbed...
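The heartbeat warning quoted above contains enough information to see how long the coaster service waited for a reply before timing out. The Python sketch below assumes a `yyMMdd-HHmmss.SSS` layout for the `sendTime` and `now` stamps, inferred from the sample line (`120119-153609.206` read as 2012-01-19 15:36:09.206) rather than taken from CoG documentation; the helper names are hypothetical. For the quoted line the gap works out to about 120 seconds. That this matches the `keepalive(120)` value printed in the later 0.93 run is suggestive, but treating the two as the same setting is an assumption.

```python
import re
from datetime import datetime

# Assumed stamp layout (inferred from the sample warning): yyMMdd-HHmmss.SSS
STAMP_FORMAT = "%y%m%d-%H%M%S.%f"


def parse_stamp(stamp: str) -> datetime:
    return datetime.strptime(stamp, STAMP_FORMAT)


def reply_wait_seconds(log_line: str) -> float:
    """Seconds between sendTime= and now= on a 'handling reply timeout' line."""
    # Collect key=value pairs such as sendTime=120119-153609.206
    fields = dict(re.findall(r"(\w+)=([\d.-]+)", log_line))
    sent = parse_stamp(fields["sendTime"])
    now = parse_stamp(fields["now"])
    return (now - sent).total_seconds()


SAMPLE = ("2012-01-19 15:38:09,207-0800 WARN Command Command(119, HEARTBEAT): "
          "handling reply timeout; sendReqTime=120119-153609.206, "
          "sendTime=120119-153609.206, now=120119-153809.207")
```

Running `reply_wait_seconds` over every such warning in a log would show whether the wait is always the same fixed interval, which would point to a timeout setting rather than a gradually slowing service.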
URL: From svemalayan at yahoo.com Mon Jan 23 13:25:56 2012 From: svemalayan at yahoo.com (Emalayan Vairavanathan) Date: Mon, 23 Jan 2012 11:25:56 -0800 (PST) Subject: [Swift-user] Montage+Swift+Coasters In-Reply-To: <4D008EF1-8072-4A18-8A5A-18D74A2ACDA2@mcs.anl.gov> References: <1327017075.46246.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327024520.62801.YahooMailNeo@web39505.mail.mud.yahoo.com> <4D008EF1-8072-4A18-8A5A-18D74A2ACDA2@mcs.anl.gov> Message-ID: <1327346756.49322.YahooMailNeo@web39506.mail.mud.yahoo.com> Jon, Please find the detail below and let me know if you have any questions about my setup. Thank you Emalayan ========================================================== site.xml passive 4 100000 100 100 100 1 10 25.00 10000 proxy /tmp/swift.workdir ======================================================= tc localhost sh /bin/sh null null null localhost cat /bin/cat null null null localhost echo /bin/echo null null null localhost do_merge /home/emalayan/App/forEmalayan/app/modmerge null null null localhost mProjExec /home/emalayan/App/Montage_v3.3/bin/mProjExec null null null localhost mImgtbl /home/emalayan/App/Montage_v3.3/bin/mImgtbl null null null localhost mAdd /home/emalayan/App/Montage_v3.3/bin/mAdd null null null localhost mOverlaps /home/emalayan/App/Montage_v3.3/bin/mOverlaps null null null localhost mJPEG /home/emalayan/App/Montage_v3.3/bin/mJPEG null null null localhost mDiffExec_wrap /home/emalayan/App/Montage_v3.3/bin/mDiffExec null null null localhost mFitExec /home/emalayan/App/Montage_v3.3/bin/mFitExec null null null localhost mBgModel /home/emalayan/App/Montage_v3.3/bin/mBgModel null null null localhost mBgExec /home/emalayan/App/Montage_v3.3/bin/mBgExec null null null localhost mConcatFit /home/emalayan/App/Montage_v3.3/bin/mConcatFit null null nul localhost Background_list /home/emalayan/App/montage-swift/SwiftMontage/apps/Background_list.py null null null localhost
create_status_table /home/emalayan/App/montage-swift/SwiftMontage/apps/create_status_table.py null null null localhost mProjectPP_wrap /home/emalayan/App/montage-swift/SwiftMontage/apps/mProjectPP_wrap.py null null null localhost mProject_wrap /home/emalayan/App/montage-swift/SwiftMontage/apps/mProject_wrap.py null null null localhost mBackground_wrap /home/emalayan/App/montage-swift/SwiftMontage/apps/mBackground_wrap.py null null null localhost mDiffFit_wrap /home/emalayan/App/montage-swift/SwiftMontage/apps/mDiffFit_wrap.py null null null ================================================================= cf wrapperlog.always.transfer=true sitedir.keep=true execution.retries=1 lazy.errors=true status.mode=provider use.provider.staging=true provider.staging.pin.swiftfiles=false foreach.max.threads=100 provenance.log=false =================================================================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From ketancmaheshwari at gmail.com Mon Jan 23 13:36:08 2012 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Mon, 23 Jan 2012 13:36:08 -0600 Subject: [Swift-user] Montage+Swift+Coasters In-Reply-To: <1327346756.49322.YahooMailNeo@web39506.mail.mud.yahoo.com> References: <1327017075.46246.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327024520.62801.YahooMailNeo@web39505.mail.mud.yahoo.com> <4D008EF1-8072-4A18-8A5A-18D74A2ACDA2@mcs.anl.gov> <1327346756.49322.YahooMailNeo@web39506.mail.mud.yahoo.com> Message-ID: Emalayan, Likely, /tmp is not readable/writable across the machines. Could you try changing workdir to your /home On Mon, Jan 23, 2012 at 1:25 PM, Emalayan Vairavanathan < svemalayan at yahoo.com> wrote: > Jon, > > Please find the detail below and let me know if you have any questions > about my setup.
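Ketan's hypothesis can be checked directly. `/tmp` is usually local to each machine, so a file written there on the head node is typically not visible on the worker nodes. A hypothetical Python sketch (not part of Swift): run `write_probe` against the candidate work directory on the machine where Swift runs, then run `probe_visible` and `can_write` with the same directory and probe name on each worker node. How the functions are launched on each node (ssh, the cluster's own tools) is left open and is an assumption of this sketch.

```python
import os
import socket
import tempfile
import uuid


def write_probe(workdir: str) -> str:
    """Create a uniquely named probe file in workdir and return its name."""
    name = "probe-%s-%s" % (socket.gethostname(), uuid.uuid4().hex)
    with open(os.path.join(workdir, name), "w") as f:
        f.write(name)
    return name


def probe_visible(workdir: str, name: str) -> bool:
    """True if a probe written (possibly on another host) is readable here.

    False suggests workdir is node-local or not readable by this user.
    """
    try:
        with open(os.path.join(workdir, name)) as f:
            return f.read() == name
    except OSError:
        return False


def can_write(workdir: str) -> bool:
    """True if this host can create (and remove) a file in workdir."""
    try:
        # NamedTemporaryFile deletes the file when the block exits.
        with tempfile.NamedTemporaryFile(dir=workdir):
            pass
        return True
    except OSError:
        return False
```

If `probe_visible` fails on a worker for `/tmp/swift.workdir` but succeeds for a directory under `/home`, that supports moving the workdir as Ketan suggests.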
-- Ketan -------------- next part -------------- An HTML attachment was scrubbed...
URL: From svemalayan at yahoo.com Mon Jan 23 13:50:46 2012 From: svemalayan at yahoo.com (Emalayan Vairavanathan) Date: Mon, 23 Jan 2012 11:50:46 -0800 (PST) Subject: [Swift-user] Montage+Swift+Coasters In-Reply-To: References: <1327017075.46246.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327024520.62801.YahooMailNeo@web39505.mail.mud.yahoo.com> <4D008EF1-8072-4A18-8A5A-18D74A2ACDA2@mcs.anl.gov> <1327346756.49322.YahooMailNeo@web39506.mail.mud.yahoo.com> Message-ID: <1327348246.89718.YahooMailNeo@web39507.mail.mud.yahoo.com> Thanks Ketan and Jon. I tried but it is still giving error. I have attached the log file. Thank you Emalayan -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed...
Name: SwiftMontage-20120123-1147-9ik4yhc0.log.gz
Type: application/x-gzip
Size: 5050 bytes
Desc: not available
URL: 

From ketancmaheshwari at gmail.com  Mon Jan 23 13:55:45 2012
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Mon, 23 Jan 2012 13:55:45 -0600
Subject: [Swift-user] Montage+Swift+Coasters
In-Reply-To: <1327348246.89718.YahooMailNeo@web39507.mail.mud.yahoo.com>
References: <1327017075.46246.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327024520.62801.YahooMailNeo@web39505.mail.mud.yahoo.com> <4D008EF1-8072-4A18-8A5A-18D74A2ACDA2@mcs.anl.gov> <1327346756.49322.YahooMailNeo@web39506.mail.mud.yahoo.com> <1327348246.89718.YahooMailNeo@web39507.mail.mud.yahoo.com>
Message-ID: 

How are you starting the service? Are you starting the workers manually? If yes, could you paste the command lines for both?

--
Ketan
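Earlier in this thread, exit code 520 was attributed to workers being unable to reach the data, and the /tmp work directory was suspected of not being readable or writable across machines. Both points can be checked quickly by hand. The sketch below is only an illustration under stated assumptions: `leave_marker` and `list_markers` are hypothetical helpers (not part of Swift or Coasters), and it assumes a POSIX shell and the same directory path on every node.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Swift or Coasters: check whether a
# candidate work directory is writable, and whether it is shared between nodes.

# leave_marker DIR: create DIR if needed and write a marker file named after
# this host. Returns non-zero if DIR cannot be created or written.
leave_marker() {
    dir=$1
    host=$(uname -n)
    if ! mkdir -p "$dir" 2>/dev/null; then
        echo "cannot create $dir on $host" >&2
        return 1
    fi
    if ! echo "written by $host" > "$dir/.marker-$host"; then
        echo "cannot write to $dir on $host" >&2
        return 1
    fi
}

# list_markers DIR: print the host names whose markers are visible from here.
# On a shared filesystem every node that ran leave_marker shows up; on a
# node-local path such as /tmp only the local host's marker does.
list_markers() {
    for f in "$1"/.marker-*; do
        [ -e "$f" ] || continue   # the glob matched nothing
        echo "${f##*/.marker-}"
    done
}
```

Run `leave_marker` with the candidate work directory on the submit host and on each node that runs workers, then run `list_markers` on any one of them. A host missing from the list means that directory is not shared with it, which would fit the "workers cannot reach the data" explanation.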
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From svemalayan at yahoo.com  Mon Jan 23 14:25:50 2012
From: svemalayan at yahoo.com (Emalayan Vairavanathan)
Date: Mon, 23 Jan 2012 12:25:50 -0800 (PST)
Subject: [Swift-user] Montage+Swift+Coasters
In-Reply-To: 
References: <1327017075.46246.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327024520.62801.YahooMailNeo@web39505.mail.mud.yahoo.com> <4D008EF1-8072-4A18-8A5A-18D74A2ACDA2@mcs.anl.gov> <1327346756.49322.YahooMailNeo@web39506.mail.mud.yahoo.com> <1327348246.89718.YahooMailNeo@web39507.mail.mud.yahoo.com>
Message-ID: <1327350350.29451.YahooMailNeo@web39501.mail.mud.yahoo.com>

I am using swift-0.93. I started only the coaster-service manually, using the following command (the workers were started automatically):

coaster-service -port 1984 -localport 35753 -nosec

The application then prints the following output and terminates. (I have attached the log file to this mail. Please discard the previous log file, because the system was not configured properly.)

Please let me know if you need more information.

Thank you
Emalayan

====================================================================================
Swift 0.93 swift-r5501 (swift modified locally) cog-r3350

RunID: 20120123-1219-zj95uaye
 (input): found 10 files
Progress:  time: Mon, 23 Jan 2012 12:19:39 -0800
Find: http://localhost:1984
Find:  keepalive(120), reconnect - http://localhost:1984
Progress:  time: Mon, 23 Jan 2012 12:19:41 -0800  Stage in:1  Submitted:9
Progress:  time: Mon, 23 Jan 2012 12:19:45 -0800  Active:9  Stage out:1
Progress:  time: Mon, 23 Jan 2012 12:19:46 -0800  Active:6  Stage out:2  Finished successfully:2
Progress:  time: Mon, 23 Jan 2012 12:19:47 -0800  Submitted:1  Finished successfully:10
Progress:  time: Mon, 23 Jan 2012 12:19:49 -0800  Active:1  Finished successfully:10
Progress:  time: Mon, 23 Jan 2012 12:19:50 -0800  Submitted:1  Finished successfully:12
Progress:  time: Mon, 23 Jan 2012 12:19:51 -0800  Stage in:12  Submitted:5  Finished successfully:13
Progress:  time: Mon, 23 Jan 2012 12:19:52 -0800  Stage in:1  Submitted:5  Active:9  Stage out:2  Finished successfully:13
Progress:  time: Mon, 23 Jan 2012 12:19:53 -0800  Active:5  Finished successfully:25
Exception in mConcatFit:
Arguments: [_concurrent/status_tbl-bf92dd4d-ecf0-490e-ab93-cf7863688950-5, fits.tbl, stat_dir]
Host: localhost
Directory: SwiftMontage-20120123-1219-zj95uaye/jobs/4/mConcatFit-4o2fb2mk
- - -

Caused by: null
Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 520
Execution failed:
    back_list:Table = org.griphyn.vdl.mapping.DataDependentException - Closed not derived due to errors in data dependencies
[emalayan at node090 scripts]$
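The exception block in this transcript names the failing application, the host, and the job directory. When a run produces several such blocks, a small filter can collect them in one place. This is a hypothetical awk-based helper (`failed_jobs` is not part of Swift) and it assumes the Swift 0.93 console format seen here; other versions may print these lines differently.

```shell
#!/bin/sh
# Hypothetical helper, not part of Swift: summarize failed jobs from a saved
# Swift console transcript. For each "Exception in APP:" block it prints
# APP, host and job directory separated by tabs. The patterns assume the
# Swift 0.93 output format; other versions may differ.
failed_jobs() {
    awk '
        /^Exception in / { app = $3; sub(/:$/, "", app) }
        /^Host: /        { host = $2 }
        /^Directory: /   { print app "\t" host "\t" $2 }
    ' "$@"
}
```

To use it, save the console output (for example by piping the swift command through `tee run.out`) and run `failed_jobs run.out`; with no file argument it reads standard input.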
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SwiftMontage-20120123-1219-zj95uaye.log.gz
Type: application/x-gzip
Size: 17804 bytes
Desc: not available
URL: 

From ketancmaheshwari at gmail.com  Mon Jan 23 14:57:45 2012
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Mon, 23 Jan 2012 14:57:45 -0600
Subject: [Swift-user] Montage+Swift+Coasters
In-Reply-To: <1327350350.29451.YahooMailNeo@web39501.mail.mud.yahoo.com>
References: <1327017075.46246.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327024520.62801.YahooMailNeo@web39505.mail.mud.yahoo.com> <4D008EF1-8072-4A18-8A5A-18D74A2ACDA2@mcs.anl.gov> <1327346756.49322.YahooMailNeo@web39506.mail.mud.yahoo.com> <1327348246.89718.YahooMailNeo@web39507.mail.mud.yahoo.com> <1327350350.29451.YahooMailNeo@web39501.mail.mud.yahoo.com>
Message-ID: 

Emalayan,

Could you also send your Swift source? Have you tried running mConcatFit from within the SwiftMontage-20120123-1219-zj95uaye/jobs/4/mConcatFit-4o2fb2mk directory?

--
Ketan
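The suggestion to re-run mConcatFit by hand could be scripted roughly as follows. This is a sketch under assumptions, not a Swift tool: `rerun_in` is a hypothetical helper, and it assumes the job directory was kept on the site (the configuration posted earlier sets `sitedir.keep=true`) and that the directory printed by Swift is relative to the site's work directory.

```shell
#!/bin/sh
# Hypothetical helper, not part of Swift: re-run one application by hand
# inside a kept job directory, listing which arguments exist there first.
# Output files (e.g. fits.tbl for mConcatFit) are expected to be missing
# before the run; a missing *input* points at a staging problem.
rerun_in() (
    # The body runs in a subshell, so the cd does not affect the caller.
    dir=$1
    shift
    cd "$dir" || exit 1
    cmd=$1
    shift
    for arg in "$@"; do
        if [ -e "$arg" ]; then
            echo "present: $arg"
        else
            echo "missing: $arg"
        fi
    done
    # Capture the exit status without tripping `set -e` in the caller.
    status=0
    "$cmd" "$@" || status=$?
    echo "exit status: $status"
    exit "$status"
)
```

For the failure above this would be invoked as `rerun_in <workdir>/SwiftMontage-20120123-1219-zj95uaye/jobs/4/mConcatFit-4o2fb2mk /home/emalayan/App/Montage_v3.3/bin/mConcatFit _concurrent/status_tbl-bf92dd4d-ecf0-490e-ab93-cf7863688950-5 fits.tbl stat_dir`, where `<workdir>` stands for the work directory configured in sites.xml.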
> > Thank you > Emalayan > > > ==================================================================================== > Swift 0.93 swift-r5501 (swift modified locally) cog-r3350 > > RunID: 20120123-1219-zj95uaye > (input): found 10 files > Progress: time: Mon, 23 Jan 2012 12:19:39 -0800 > > Find: http://localhost:1984 > Find: keepalive(120), reconnect - http://localhost:1984 > Progress: time: Mon, 23 Jan 2012 12:19:41 -0800 Stage in:1 Submitted:9 > Progress: time: Mon, 23 Jan 2012 12:19:45 -0800 Active:9 Stage out:1 > Progress: time: Mon, 23 Jan 2012 12:19:46 -0800 Active:6 Stage out:2 > Finished successfully:2 > Progress: time: Mon, 23 Jan 2012 12:19:47 -0800 Submitted:1 Finished > successfully:10 > Progress: time: Mon, 23 Jan 2012 12:19:49 -0800 Active:1 Finished > successfully:10 > Progress: time: Mon, 23 Jan 2012 12:19:50 -0800 Submitted:1 Finished > successfully:12 > Progress: time: Mon, 23 Jan 2012 12:19:51 -0800 Stage in:12 > Submitted:5 Finished successfully:13 > Progress: time: Mon, 23 Jan 2012 12:19:52 -0800 Stage in:1 Submitted:5 > Active:9 Stage out:2 Finished successfully:13 > Progress: time: Mon, 23 Jan 2012 12:19:53 -0800 Active:5 Finished > successfully:25 > Exception in mConcatFit: > Arguments: [_concurrent/status_tbl-bf92dd4d-ecf0-490e-ab93-cf7863688950-5, > fits.tbl, stat_dir] > Host: localhost > Directory: SwiftMontage-20120123-1219-zj95uaye/jobs/4/mConcatFit-4o2fb2mk > > - - - > > Caused by: null > Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: > Job failed with an exit code of 520 > Execution failed: > back_list:Table = org.griphyn.vdl.mapping.DataDependentException - > Closed not derived due to errors in data dependencies > [emalayan at node090 scripts]$ > > > ------------------------------ > *From:* Ketan Maheshwari > *To:* Emalayan Vairavanathan > *Cc:* Jonathan Monette ; swift user < > swift-user at ci.uchicago.edu> > *Sent:* Monday, 23 January 2012 11:55 AM > *Subject:* Re: [Swift-user] Montage+Swift+Coasters > > 
How are you starting the service? Are you starting workers manually? if > yes, could you paste commandlines for both? > > On Mon, Jan 23, 2012 at 1:50 PM, Emalayan Vairavanathan < > svemalayan at yahoo.com> wrote: > > Thanks Ketan and Jon. I tried but it is still giving error. I have > attached the log file. > > Thank you > Emalayan > > ------------------------------ > *From:* Ketan Maheshwari > *To:* Emalayan Vairavanathan > *Cc:* Jonathan Monette ; swift user < > swift-user at ci.uchicago.edu> > *Sent:* Monday, 23 January 2012 11:36 AM > *Subject:* Re: [Swift-user] Montage+Swift+Coasters > > Emalayan, > > Likely, /tmp is not readable/writable across the machines. Could you try > changing workdir to your /home > > On Mon, Jan 23, 2012 at 1:25 PM, Emalayan Vairavanathan < > svemalayan at yahoo.com> wrote: > > Jon, > > Please find the detail below and let me know if you have any questions > about my setup. > > Thank you > Emalayan > > ========================================================== > site.xml > > > > jobmanager="local:local"/> > passive > > 4 > 100000 > 100 > 100 > 100 > 1 > 10 > 25.00 > 10000 > proxy > > /tmp/swift.workdir > > > > ======================================================= > > tc > > localhost sh /bin/sh null null null > localhost cat /bin/cat null null null > localhost echo /bin/echo null null null > localhost do_merge /home/emalayan/App/forEmalayan/app/modmerge null null > null > localhost mProjExec /home/emalayan/App/Montage_v3.3/bin/mProjExec null > null null > localhost mImgtbl /home/emalayan/App/Montage_v3.3/bin/mImgtbl null null > null > localhost mAdd /home/emalayan/App/Montage_v3.3/bin/mAdd null null null > localhost mOverlaps /home/emalayan/App/Montage_v3.3/bin/mOverlaps null > null null > localhost mJPEG /home/emalayan/App/Montage_v3.3/bin/mJPEG null null null > localhost mDiffExec_wrap /home/emalayan/App/Montage_v3.3/bin/mDiffExec > null null null > localhost mFitExec /home/emalayan/App/Montage_v3.3/bin/mFitExec null null > null 
> localhost mBgModel /home/emalayan/App/Montage_v3.3/bin/mBgModel null null > null > localhost mBgExec /home/emalayan/App/Montage_v3.3/bin/mBgExec null null > null > localhost mConcatFit /home/emalayan/App/Montage_v3.3/bin/mConcatFit null > null nul > > localhost Background_list > /home/emalayan/App/montage-swift/SwiftMontage/apps/Background_list.py null > null null > localhost create_status_table > /home/emalayan/App/montage-swift/SwiftMontage/apps/create_status_table.py > null null null > localhost mProjectPP_wrap > /home/emalayan/App/montage-swift/SwiftMontage/apps/mProjectPP_wrap.py null > null null > localhost mProject_wrap > /home/emalayan/App/montage-swift/SwiftMontage/apps/mProject_wrap.py null > null null > localhost mBackground_wrap > /home/emalayan/App/montage-swift/SwiftMontage/apps/mBackground_wrap.py null > null null > localhost mDiffFit_wrap > /home/emalayan/App/montage-swift/SwiftMontage/apps/mDiffFit_wrap.py null > null null > > ================================================================= > > cf > > wrapperlog.always.transfer=true > sitedir.keep=true > execution.retries=1 > lazy.errors=true > status.mode=provider > use.provider.staging=true > provider.staging.pin.swiftfiles=false > foreach.max.threads=100 > provenance.log=false > > =================================================================== > > ------------------------------ > *From:* Jonathan Monette > *To:* Ketan Maheshwari > *Cc:* Emalayan Vairavanathan ; swift user < > swift-user at ci.uchicago.edu> > *Sent:* Monday, 23 January 2012 11:08 AM > *Subject:* Re: [Swift-user] Montage+Swift+Coasters > > Emalayan, > So I have ran the scripts with some of my own test cases and do not see > it failing. Could you provide your config files? Please provide the tc, > sites, and config file(if you use a config file). > > On Jan 20, 2012, at 9:39 AM, Ketan Maheshwari wrote: > > Emalayan, > > I would check all the mappers and the resulting paths in the Swift source. 
> > Also try running the failed job something like this: > > cd /SwiftMontage-20120119-1749-rjshh1r9/jobs/b/mConcatFit-b1sa4vlk > > mConcatFit _concurrent/status_tbl-7a8340c2-045d-4039-a77c-00429b78d9c9-5 fits.tbl stat_dir > > Error 520 indicates workers are not able to reach the data. > > Also check if swift.workdir is writable on the site by the worker nodes. > > On Thu, Jan 19, 2012 at 7:55 PM, Emalayan Vairavanathan < > svemalayan at yahoo.com> wrote: > > Hi Ketan, > > This was with swift-0.92.1. Now I have downloaded the latest swift 0.93 > and am getting totally different error messages with swift 0.93. I can ask > Jon about these messages. (These scripts were working well with Swift alone.) > > Please let me know if you have any idea. > > Regards > Emalayan > > > =============================================================================================== > Swift 0.93 swift-r5501 cog-r3350 > > RunID: 20120119-1749-rjshh1r9 > (input): found 10 files > Progress: time: Thu, 19 Jan 2012 17:49:20 -0800 > Find: http://localhost:1984 > Find: keepalive(120), reconnect - http://localhost:1984 > Progress: time: Thu, 19 Jan 2012 17:49:22 -0800 Stage in:1 Submitted:9 > Progress: time: Thu, 19 Jan 2012 17:49:25 -0800 Active:9 Stage out:1 > Progress: time: Thu, 19 Jan 2012 17:49:26 -0800 Stage out:3 Finished > successfully:7 > Progress: time: Thu, 19 Jan 2012 17:49:28 -0800 Active:1 Finished > successfully:10 > Progress: time: Thu, 19 Jan 2012 17:49:29 -0800 Stage in:1 > Submitting:11 Submitted:6 Finished successfully:12 > Progress: time: Thu, 19 Jan 2012 17:49:30 -0800 Stage in:4 Submitted:1 > Active:6 Stage out:2 Finished successfully:17 > Progress: time: Thu, 19 Jan 2012 17:49:31 -0800 Active:1 Finished > successfully:30 > Exception in mConcatFit: > Arguments: [_concurrent/status_tbl-7a8340c2-045d-4039-a77c-00429b78d9c9-5, > fits.tbl, stat_dir] > Host: localhost > Directory: SwiftMontage-20120119-1749-rjshh1r9/jobs/b/mConcatFit-b1sa4vlk > - - - > > Caused by: null > Caused by: 
org.globus.cog.abstraction.impl.common.execution.JobException: > Job failed with an exit code of 520 > Execution failed: > back_list:Table = org.griphyn.vdl.mapping.DataDependentException - > Closed not derived due to errors in data dependencies > > ------------------------------ > *From:* Ketan Maheshwari > *To:* Emalayan Vairavanathan > *Cc:* swift user > *Sent:* Thursday, 19 January 2012 4:49 PM > *Subject:* Re: [Swift-user] Montage+Swift+Coasters > > Emalayan, > > From your symptoms, it seems you are facing the same issue as I have been. > Could you tell me more about the amount of data that needs to be staged to run > the Montage stages during which these warnings turn up? How much time > elapses after the start of your workflow before you see these messages? > > Also, what version of Swift is this? > > Regards, > Ketan > > On Thu, Jan 19, 2012 at 5:51 PM, Emalayan Vairavanathan < > svemalayan at yahoo.com> wrote: > > Dear All, > > I have a problem running Montage with Coasters (in our local cluster > - no batch schedulers). After a few stages, the Swift runtime continuously > prints the warnings below. Any ideas? Should I increase the heartbeat > count? > > Everything works fine when I try to run the same Montage scripts with > Swift on a single machine.
> > Thank you > Emalayan > > > 2012-01-19 15:38:09,207-0800 WARN Command Command(119, HEARTBEAT): > handling reply timeout; sendReqTime=120119-153609.206, > sendTime=120119-153609.206, now=120119-153809.207 > 2012-01-19 15:38:09,207-0800 INFO Command Command(119, HEARTBEAT): > re-sending > 2012-01-19 15:38:09,209-0800 WARN Command Command(119, HEARTBEAT)fault > was: Reply timeout > org.globus.cog.karajan.workflow.service.ReplyTimeoutException > at > org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:288) > at > org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:293) > at java.util.TimerThread.mainLoop(Timer.java:534) > at java.util.TimerThread.run(Timer.java:484) > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > -- > Ketan > > -- > Ketan > > -- > Ketan > > -- > Ketan > > -- Ketan
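Ketan's suggestion, quoted above, to check that swift.workdir is writable by the worker nodes can be turned into a quick test. The following is a minimal POSIX shell sketch, not part of Swift: the function names `check_workdir`, `mark_workdir` and `sees_marker`, and the probe and marker file names, are hypothetical. `check_workdir` is meant to be run on each worker node (for example over ssh). Because a node-local directory such as /tmp passes that test on every node, the marker pair separately checks whether a file written on the node running Swift is visible from a worker, i.e. whether the directory is actually shared.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Swift) for checking a work directory
# such as the swift.workdir configured in sites.xml.

# check_workdir DIR: succeed if DIR exists on this node and a file can be
# created in it and removed again.
check_workdir() {
  dir=$1
  node=$(uname -n)
  if [ ! -d "$dir" ]; then
    echo "missing: $dir on $node" >&2
    return 1
  fi
  probe="$dir/.workdir-probe.$$"
  # touch is an external command, so a failed write just returns non-zero.
  if touch "$probe" 2>/dev/null && rm -f "$probe"; then
    echo "writable: $dir on $node"
  else
    echo "not writable: $dir on $node" >&2
    return 1
  fi
}

# mark_workdir DIR: run on the node running Swift; leaves a small marker file.
mark_workdir() {
  uname -n > "$1/.workdir-marker"
}

# sees_marker DIR: run on a worker node; succeeds only if the marker
# written elsewhere is visible here (non-empty file), i.e. DIR is shared.
sees_marker() {
  [ -s "$1/.workdir-marker" ]
}
```

For instance, if `check_workdir /tmp/swift.workdir` succeeds on two nodes but `sees_marker /tmp/swift.workdir` fails on a worker after `mark_workdir /tmp/swift.workdir` ran on the submit node, that would be consistent with the per-node /tmp that is suspected later in the thread.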
From hategan at mcs.anl.gov Mon Jan 23 15:13:58 2012 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 23 Jan 2012 13:13:58 -0800 Subject: [Swift-user] GSISSH in Swift [was Fwd: [Swift-devel] Documentation of sites.xml] In-Reply-To: <33E27556-DD8E-407A-9610-94F8DA1AAD70@mcs.anl.gov> References: <903303148.134563.1326323490348.JavaMail.root@zimbra.anl.gov> <1326483206.31692.2.camel@blabla> <980E744D-AA8E-4014-90B1-701A7D03F421@mcs.anl.gov> <1327197887.7405.2.camel@blabla> <33E27556-DD8E-407A-9610-94F8DA1AAD70@mcs.anl.gov> Message-ID: <1327353238.24069.3.camel@blabla> On Sat, 2012-01-21 at 20:20 -0600, Jonathan Monette wrote: > So we use the etc./provider-sscl.properties file instead of > an .ssh/config file? No. That's strictly for the ssh executable. > I ask this because it looks like the code that allows you to specify > a username and private key file in the sites file is still there. I > know from past testing that at least specifying a username in the > sites file was not being honored. > Right. Not when starting coasters. Jobs run through the ssh-cl provider alone should honor those. From jonmon at mcs.anl.gov Mon Jan 23 15:48:22 2012 From: jonmon at mcs.anl.gov (Jonathan Monette) Date: Mon, 23 Jan 2012 15:48:22 -0600 Subject: [Swift-user] GSISSH in Swift [was Fwd: [Swift-devel] Documentation of sites.xml] In-Reply-To: <1327353238.24069.3.camel@blabla> References: <903303148.134563.1326323490348.JavaMail.root@zimbra.anl.gov> <1326483206.31692.2.camel@blabla> <980E744D-AA8E-4014-90B1-701A7D03F421@mcs.anl.gov> <1327197887.7405.2.camel@blabla> <33E27556-DD8E-407A-9610-94F8DA1AAD70@mcs.anl.gov> <1327353238.24069.3.camel@blabla> Message-ID: On Jan 23, 2012, at 3:13 PM, Mihael Hategan wrote: > On Sat, 2012-01-21 at 20:20 -0600, Jonathan Monette wrote: >> So we use the etc./provider-sscl.properties file instead of >> an .ssh/config file? > > No. That's strictly for the ssh executable.
> >> I ask this because it looks like the code that allows you to specify >> a username and private key file in the sites file is still there. I >> know from past testing that at least specifying a username in the >> sites file was not being honored. >> > > Right. Not when starting coasters. Jobs run through the ssh-cl provider > alone should honor those. > So even if ssh-cl is going to start coasters, wouldn't you still want to honor those variables? What if the machine I am using sshcl to start coasters on requires a different username than the one it defaults to? I know you can use a .ssh/config file for that, but then why even have the option to specify it in Swift? > From ketancmaheshwari at gmail.com Mon Jan 23 16:28:46 2012 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Mon, 23 Jan 2012 16:28:46 -0600 Subject: [Swift-user] Montage+Swift+Coasters In-Reply-To: <1327350350.29451.YahooMailNeo@web39501.mail.mud.yahoo.com> References: <1327017075.46246.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327024520.62801.YahooMailNeo@web39505.mail.mud.yahoo.com> <4D008EF1-8072-4A18-8A5A-18D74A2ACDA2@mcs.anl.gov> <1327346756.49322.YahooMailNeo@web39506.mail.mud.yahoo.com> <1327348246.89718.YahooMailNeo@web39507.mail.mud.yahoo.com> <1327350350.29451.YahooMailNeo@web39501.mail.mud.yahoo.com> Message-ID: On Mon, Jan 23, 2012 at 2:25 PM, Emalayan Vairavanathan < svemalayan at yahoo.com> wrote: > I am using swift-0.93. I started only the coaster-service manually using > the following command (workers were started automatically). > Are you aware that workers will start automatically *only* on the localhost where the service is running, and not on the remote nodes? > > coaster-service -port 1984 -localport 35753 -nosec > > Then the application prints the following output and terminates. (I have attached > the log file with this mail. Please discard the previous log file because the > system was not configured properly.) > > Please let me know if you need more information.
> > Thank you > Emalayan > > > ==================================================================================== > Swift 0.93 swift-r5501 (swift modified locally) cog-r3350 > > RunID: 20120123-1219-zj95uaye > (input): found 10 files > Progress: time: Mon, 23 Jan 2012 12:19:39 -0800 > > Find: http://localhost:1984 > Find: keepalive(120), reconnect - http://localhost:1984 > Progress: time: Mon, 23 Jan 2012 12:19:41 -0800 Stage in:1 Submitted:9 > Progress: time: Mon, 23 Jan 2012 12:19:45 -0800 Active:9 Stage out:1 > Progress: time: Mon, 23 Jan 2012 12:19:46 -0800 Active:6 Stage out:2 > Finished successfully:2 > Progress: time: Mon, 23 Jan 2012 12:19:47 -0800 Submitted:1 Finished > successfully:10 > Progress: time: Mon, 23 Jan 2012 12:19:49 -0800 Active:1 Finished > successfully:10 > Progress: time: Mon, 23 Jan 2012 12:19:50 -0800 Submitted:1 Finished > successfully:12 > Progress: time: Mon, 23 Jan 2012 12:19:51 -0800 Stage in:12 > Submitted:5 Finished successfully:13 > Progress: time: Mon, 23 Jan 2012 12:19:52 -0800 Stage in:1 Submitted:5 > Active:9 Stage out:2 Finished successfully:13 > Progress: time: Mon, 23 Jan 2012 12:19:53 -0800 Active:5 Finished > successfully:25 > Exception in mConcatFit: > Arguments: [_concurrent/status_tbl-bf92dd4d-ecf0-490e-ab93-cf7863688950-5, > fits.tbl, stat_dir] > Host: localhost > Directory: SwiftMontage-20120123-1219-zj95uaye/jobs/4/mConcatFit-4o2fb2mk > > - - - > > Caused by: null > Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: > Job failed with an exit code of 520 > Execution failed: > back_list:Table = org.griphyn.vdl.mapping.DataDependentException - > Closed not derived due to errors in data dependencies > [emalayan at node090 scripts]$ > > > ------------------------------ > *From:* Ketan Maheshwari > *To:* Emalayan Vairavanathan > *Cc:* Jonathan Monette ; swift user < > swift-user at ci.uchicago.edu> > *Sent:* Monday, 23 January 2012 11:55 AM > *Subject:* Re: [Swift-user] Montage+Swift+Coasters > > 
How are you starting the service? Are you starting workers manually? If > yes, could you paste the command lines for both? > > On Mon, Jan 23, 2012 at 1:50 PM, Emalayan Vairavanathan < > svemalayan at yahoo.com> wrote: > > Thanks Ketan and Jon. I tried, but it is still giving an error. I have > attached the log file. > > Thank you > Emalayan > > ------------------------------ > *From:* Ketan Maheshwari > *To:* Emalayan Vairavanathan > *Cc:* Jonathan Monette ; swift user < > swift-user at ci.uchicago.edu> > *Sent:* Monday, 23 January 2012 11:36 AM > *Subject:* Re: [Swift-user] Montage+Swift+Coasters > > Emalayan, > > Likely, /tmp is not readable/writable across the machines. Could you try > changing workdir to your /home? > > On Mon, Jan 23, 2012 at 1:25 PM, Emalayan Vairavanathan < > svemalayan at yahoo.com> wrote: > > Jon, > > Please find the details below and let me know if you have any questions > about my setup. > > Thank you > Emalayan > > ========================================================== > site.xml > > > > jobmanager="local:local"/> > passive > > 4 > 100000 > 100 > 100 > 100 > 1 > 10 > 25.00 > 10000 > proxy > > /tmp/swift.workdir > > > > ======================================================= > > tc > > localhost sh /bin/sh null null null > localhost cat /bin/cat null null null > localhost echo /bin/echo null null null > localhost do_merge /home/emalayan/App/forEmalayan/app/modmerge null null > null > localhost mProjExec /home/emalayan/App/Montage_v3.3/bin/mProjExec null > null null > localhost mImgtbl /home/emalayan/App/Montage_v3.3/bin/mImgtbl null null > null > localhost mAdd /home/emalayan/App/Montage_v3.3/bin/mAdd null null null > localhost mOverlaps /home/emalayan/App/Montage_v3.3/bin/mOverlaps null > null null > localhost mJPEG /home/emalayan/App/Montage_v3.3/bin/mJPEG null null null > localhost mDiffExec_wrap /home/emalayan/App/Montage_v3.3/bin/mDiffExec > null null null > localhost mFitExec /home/emalayan/App/Montage_v3.3/bin/mFitExec null null > null
-- Ketan From svemalayan at yahoo.com Mon Jan 23 16:52:38 2012 From: svemalayan at yahoo.com (Emalayan Vairavanathan) Date: Mon, 23 Jan 2012 14:52:38 -0800 (PST) Subject: [Swift-user] Montage+Swift+Coasters In-Reply-To: References: <1327017075.46246.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327024520.62801.YahooMailNeo@web39505.mail.mud.yahoo.com> <4D008EF1-8072-4A18-8A5A-18D74A2ACDA2@mcs.anl.gov> <1327346756.49322.YahooMailNeo@web39506.mail.mud.yahoo.com> <1327348246.89718.YahooMailNeo@web39507.mail.mud.yahoo.com> <1327350350.29451.YahooMailNeo@web39501.mail.mud.yahoo.com> Message-ID: <1327359158.12794.YahooMailNeo@web39505.mail.mud.yahoo.com> Hi Ketan, Please find my answers below.
[Ketan] Emalayan, Could you also send your swift source. [Emalayan] Did you ask for the Montage Swift scripts or the swift-0.93 source code? [Ketan] Have you tried running mConcatFit from within the SwiftMontage-20120123-1219-zj95uaye/jobs/4/mConcatFit-4o2fb2mk directory? [Emalayan] No such directory was created. [Ketan] Are you aware that workers will start automatically *only* on the localhost where the service is running, and not on the remote nodes? [Emalayan] Yes, I am aware of this. I ran both the coaster-service and the application scripts on the same node, but I would like to know how to set up workers on other nodes too. Thank you Emalayan ________________________________ From: Ketan Maheshwari To: Emalayan Vairavanathan Cc: Jonathan Monette ; swift user Sent: Monday, 23 January 2012 12:57 PM Subject: Re: [Swift-user] Montage+Swift+Coasters Emalayan, Could you also send your swift source. Have you tried running mConcatFit from within the SwiftMontage-20120123-1219-zj95uaye/jobs/4/mConcatFit-4o2fb2mk directory? -- Ketan
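Emalayan's last question, how to start workers on nodes other than the one running the coaster service, is answered in Ketan's reply below with a `worker.pl` loop. Here is a minimal POSIX shell sketch of such a loop, meant to be run on each compute node (for example over ssh). The function name `start_workers`, the `worker-N` labels, and the optional service-URL argument are assumptions: the archived reply shows only `worker.pl label /path/to/log`, so check the usage of the worker.pl shipped with your Swift release before relying on this argument order.

```shell
#!/bin/sh
# Hypothetical helper: start several coaster workers on the current node.
# Usage: start_workers WORKER COUNT LOGDIR [SERVICE_URL]
#   WORKER       path to worker.pl
#   COUNT        number of worker processes to start on this node
#   LOGDIR       directory for worker logs (created if missing)
#   SERVICE_URL  optional; if given, it is passed as the first argument
#                (argument order is an assumption; check your worker.pl)
start_workers() {
  worker=$1 count=$2 logdir=$3 url=${4:-}
  mkdir -p "$logdir" || return 1
  i=1
  while [ "$i" -le "$count" ]; do
    # Give each worker its own label: worker-1, worker-2, ...
    if [ -n "$url" ]; then
      "$worker" "$url" "worker-$i" "$logdir" &
    else
      "$worker" "worker-$i" "$logdir" &
    fi
    i=$((i + 1))
  done
  # Stay in the foreground until all background jobs (the workers) exit.
  wait
}
```

For example, on a compute node: `start_workers "$HOME/swift-0.93/bin/worker.pl" 8 "$HOME/workerlogs" http://node090:35753`. The install path is a placeholder, and the URL assumes that workers reach the service on the `-localport` given to coaster-service on node090. Running the workers in the background and then calling `wait` keeps one launcher process per node alive for as long as its workers are.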
From ketancmaheshwari at gmail.com Mon Jan 23 20:38:38 2012 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Mon, 23 Jan 2012 20:38:38 -0600 Subject: [Swift-user] Montage+Swift+Coasters In-Reply-To: <1327359158.12794.YahooMailNeo@web39505.mail.mud.yahoo.com> References: <1327017075.46246.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327024520.62801.YahooMailNeo@web39505.mail.mud.yahoo.com> <4D008EF1-8072-4A18-8A5A-18D74A2ACDA2@mcs.anl.gov> <1327346756.49322.YahooMailNeo@web39506.mail.mud.yahoo.com> <1327348246.89718.YahooMailNeo@web39507.mail.mud.yahoo.com> <1327350350.29451.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327359158.12794.YahooMailNeo@web39505.mail.mud.yahoo.com> Message-ID: On Mon, Jan 23, 2012 at 4:52 PM, Emalayan Vairavanathan < svemalayan at yahoo.com> wrote: > Hi Ketan, > > Please find my answers below. > > [Ketan] Emalayan, Could you also send your swift source. > [Emalayan] Did you ask for the Montage Swift scripts or the swift-0.93 > source code? > The Montage script. > > > [Ketan] Have you tried running mConcatFit from within the > SwiftMontage-20120123-1219-zj95uaye/jobs/4/mConcatFit-4o2fb2mk directory? > [Emalayan] No such directory was created. > It should be in your workdir. > > > [Ketan] Are you aware that workers will start automatically *only* on the > localhost where the service is running, and not on the remote nodes? > [Emalayan] Yes, I am aware of this. I ran both the coaster-service and the > application scripts on the same node, but I would like to know how to set up > workers on other nodes too. > You may run worker.pl manually, or better, put it in a for loop in a simple shell script to run multiple workers.
commandline is something like: worker.pl label /path/to/log > > Thank you > Emalayan > > ------------------------------ > *From:* Ketan Maheshwari > *To:* Emalayan Vairavanathan > *Cc:* Jonathan Monette ; swift user < > swift-user at ci.uchicago.edu> > *Sent:* Monday, 23 January 2012 12:57 PM > *Subject:* Re: [Swift-user] Montage+Swift+Coasters > > Emalayan, Could you also send your swift source. > > Have you tried running mConcatFit from within the > SwiftMontage-20120123-1219-zj95uaye/jobs/4/mConcatFit-4o2fb2mk directory? > > On Mon, Jan 23, 2012 at 2:25 PM, Emalayan Vairavanathan < > svemalayan at yahoo.com> wrote: > > I am using swift-0.93. I started only the coaster-service manually using > following command (workers were started automatically). > > coaster-service -port 1984 -localport 35753 -nosec > > Then application prints following output and terminates. (I have attached > the log file with this mail. Please discard the previous log file because > system was not configured properly) > > Please let me know if you need more information. 
> > Thank you > Emalayan > > > ==================================================================================== > Swift 0.93 swift-r5501 (swift modified locally) cog-r3350 > > RunID: 20120123-1219-zj95uaye > (input): found 10 files > Progress: time: Mon, 23 Jan 2012 12:19:39 -0800 > > Find: http://localhost:1984 > Find: keepalive(120), reconnect - http://localhost:1984 > Progress: time: Mon, 23 Jan 2012 12:19:41 -0800 Stage in:1 Submitted:9 > Progress: time: Mon, 23 Jan 2012 12:19:45 -0800 Active:9 Stage out:1 > Progress: time: Mon, 23 Jan 2012 12:19:46 -0800 Active:6 Stage out:2 > Finished successfully:2 > Progress: time: Mon, 23 Jan 2012 12:19:47 -0800 Submitted:1 Finished > successfully:10 > Progress: time: Mon, 23 Jan 2012 12:19:49 -0800 Active:1 Finished > successfully:10 > Progress: time: Mon, 23 Jan 2012 12:19:50 -0800 Submitted:1 Finished > successfully:12 > Progress: time: Mon, 23 Jan 2012 12:19:51 -0800 Stage in:12 > Submitted:5 Finished successfully:13 > Progress: time: Mon, 23 Jan 2012 12:19:52 -0800 Stage in:1 Submitted:5 > Active:9 Stage out:2 Finished successfully:13 > Progress: time: Mon, 23 Jan 2012 12:19:53 -0800 Active:5 Finished > successfully:25 > Exception in mConcatFit: > Arguments: [_concurrent/status_tbl-bf92dd4d-ecf0-490e-ab93-cf7863688950-5, > fits.tbl, stat_dir] > Host: localhost > Directory: SwiftMontage-20120123-1219-zj95uaye/jobs/4/mConcatFit-4o2fb2mk > > - - - > > Caused by: null > Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: > Job failed with an exit code of 520 > Execution failed: > back_list:Table = org.griphyn.vdl.mapping.DataDependentException - > Closed not derived due to errors in data dependencies > [emalayan at node090 scripts]$ > > > ------------------------------ > *From:* Ketan Maheshwari > *To:* Emalayan Vairavanathan > *Cc:* Jonathan Monette ; swift user < > swift-user at ci.uchicago.edu> > *Sent:* Monday, 23 January 2012 11:55 AM > *Subject:* Re: [Swift-user] Montage+Swift+Coasters > > 
How are you starting the service? Are you starting workers manually? if > yes, could you paste commandlines for both? > > On Mon, Jan 23, 2012 at 1:50 PM, Emalayan Vairavanathan < > svemalayan at yahoo.com> wrote: > > Thanks Ketan and Jon. I tried but it is still giving error. I have > attached the log file. > > Thank you > Emalayan > > ------------------------------ > *From:* Ketan Maheshwari > *To:* Emalayan Vairavanathan > *Cc:* Jonathan Monette ; swift user < > swift-user at ci.uchicago.edu> > *Sent:* Monday, 23 January 2012 11:36 AM > *Subject:* Re: [Swift-user] Montage+Swift+Coasters > > Emalayan, > > Likely, /tmp is not readable/writable across the machines. Could you try > changing workdir to your /home > > On Mon, Jan 23, 2012 at 1:25 PM, Emalayan Vairavanathan < > svemalayan at yahoo.com> wrote: > > Jon, > > Please find the detail below and let me know if you have any questions > about my setup. > > Thank you > Emalayan > > ========================================================== > site.xml > > > > jobmanager="local:local"/> > passive > > 4 > 100000 > 100 > 100 > 100 > 1 > 10 > 25.00 > 10000 > proxy > > /tmp/swift.workdir > > > > ======================================================= > > tc > > localhost sh /bin/sh null null null > localhost cat /bin/cat null null null > localhost echo /bin/echo null null null > localhost do_merge /home/emalayan/App/forEmalayan/app/modmerge null null > null > localhost mProjExec /home/emalayan/App/Montage_v3.3/bin/mProjExec null > null null > localhost mImgtbl /home/emalayan/App/Montage_v3.3/bin/mImgtbl null null > null > localhost mAdd /home/emalayan/App/Montage_v3.3/bin/mAdd null null null > localhost mOverlaps /home/emalayan/App/Montage_v3.3/bin/mOverlaps null > null null > localhost mJPEG /home/emalayan/App/Montage_v3.3/bin/mJPEG null null null > localhost mDiffExec_wrap /home/emalayan/App/Montage_v3.3/bin/mDiffExec > null null null > localhost mFitExec /home/emalayan/App/Montage_v3.3/bin/mFitExec null null > null 
> localhost mBgModel /home/emalayan/App/Montage_v3.3/bin/mBgModel null null > null > localhost mBgExec /home/emalayan/App/Montage_v3.3/bin/mBgExec null null > null > localhost mConcatFit /home/emalayan/App/Montage_v3.3/bin/mConcatFit null > null nul > > localhost Background_list > /home/emalayan/App/montage-swift/SwiftMontage/apps/Background_list.py null > null null > localhost create_status_table > /home/emalayan/App/montage-swift/SwiftMontage/apps/create_status_table.py > null null null > localhost mProjectPP_wrap > /home/emalayan/App/montage-swift/SwiftMontage/apps/mProjectPP_wrap.py null > null null > localhost mProject_wrap > /home/emalayan/App/montage-swift/SwiftMontage/apps/mProject_wrap.py null > null null > localhost mBackground_wrap > /home/emalayan/App/montage-swift/SwiftMontage/apps/mBackground_wrap.py null > null null > localhost mDiffFit_wrap > /home/emalayan/App/montage-swift/SwiftMontage/apps/mDiffFit_wrap.py null > null null > > ================================================================= > > cf > > wrapperlog.always.transfer=true > sitedir.keep=true > execution.retries=1 > lazy.errors=true > status.mode=provider > use.provider.staging=true > provider.staging.pin.swiftfiles=false > foreach.max.threads=100 > provenance.log=false > > =================================================================== > > ------------------------------ > *From:* Jonathan Monette > *To:* Ketan Maheshwari > *Cc:* Emalayan Vairavanathan ; swift user < > swift-user at ci.uchicago.edu> > *Sent:* Monday, 23 January 2012 11:08 AM > *Subject:* Re: [Swift-user] Montage+Swift+Coasters > > Emalayan, > So I have ran the scripts with some of my own test cases and do not see > it failing. Could you provide your config files? Please provide the tc, > sites, and config file(if you use a config file). > > On Jan 20, 2012, at 9:39 AM, Ketan Maheshwari wrote: > > Emalayan, > > I would check all the mappers and the resulting paths in the Swift source. 
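In the tc listing quoted above, the mConcatFit entry ends in `null null nul` rather than `null null null`, and mConcatFit is the job that fails. This thread does not establish whether the typo causes the failure, but a small checker makes such slips easy to spot. The Python sketch below assumes only what the listings show: whitespace-separated columns, with site, transformation name and executable path first. The helper name lint_tc is hypothetical. It checks only for missing columns and truncated spellings of "null", and it does not validate paths.

```python
from typing import List

MIN_COLUMNS = 3  # site, transformation name, executable path


def lint_tc(text: str) -> List[str]:
    """Return warnings for suspicious entries in a tc file (hypothetical helper)."""
    warnings = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and commented-out entries
        cols = line.split()
        if len(cols) < MIN_COLUMNS:
            warnings.append(
                f"line {lineno}: expected at least {MIN_COLUMNS} columns, got {len(cols)}")
            continue
        for col in cols[MIN_COLUMNS:]:
            # Flag truncated spellings of "null" such as "nul" or "nu".
            if col != "null" and "null".startswith(col):
                warnings.append(f"line {lineno}: '{col}' looks like a misspelled 'null'")
    return warnings


if __name__ == "__main__":
    tc = ("localhost cat /bin/cat null null null\n"
          "localhost mConcatFit /home/emalayan/App/Montage_v3.3/bin/mConcatFit null null nul\n")
    for warning in lint_tc(tc):
        print(warning)
```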
> > Also try running the failed job something like this: > > cd /SwiftMontage-20120119-1749-rjshh1r9/jobs/b/mConcatFit- > b1sa4vlk > * > * > mConcatFit _concurrent/status_tbl-7a8340c2-045d-4039-a77c-00429b78d9c9-5 > fits.tbl stat_dir > > error 520 indicates workers are not able to reach the data. > > Also check if swift.workdir is writable on the site by the worker nodes. > > On Thu, Jan 19, 2012 at 7:55 PM, Emalayan Vairavanathan < > svemalayan at yahoo.com> wrote: > > Hi Ketan, > > This was with swift-0.92.1. Now I have downloaded the latest swift 0.93 > and getting totally different error messages with swift 0.93. I can ask > Jon about these messages. (These scripts was working well with only Swift) > > Please let me know if you have any idea. > > Regards > Emalayan > > > =============================================================================================== > Swift 0.93 swift-r5501 cog-r3350 > > RunID: 20120119-1749-rjshh1r9 > (input): found 10 files > Progress: time: Thu, 19 Jan 2012 17:49:20 -0800 > Find: http://localhost:1984 > Find: keepalive(120), reconnect - http://localhost:1984 > Progress: time: Thu, 19 Jan 2012 17:49:22 -0800 Stage in:1 Submitted:9 > Progress: time: Thu, 19 Jan 2012 17:49:25 -0800 Active:9 Stage out:1 > Progress: time: Thu, 19 Jan 2012 17:49:26 -0800 Stage out:3 Finished > successfully:7 > Progress: time: Thu, 19 Jan 2012 17:49:28 -0800 Active:1 Finished > successfully:10 > Progress: time: Thu, 19 Jan 2012 17:49:29 -0800 Stage in:1 > Submitting:11 Submitted:6 Finished successfully:12 > Progress: time: Thu, 19 Jan 2012 17:49:30 -0800 Stage in:4 Submitted:1 > Active:6 Stage out:2 Finished successfully:17 > Progress: time: Thu, 19 Jan 2012 17:49:31 -0800 Active:1 Finished > successfully:30 > Exception in mConcatFit: > Arguments: [_concurrent/status_tbl-7a8340c2-045d-4039-a77c-00429b78d9c9-5, > fits.tbl, stat_dir] > Host: localhost > Directory: SwiftMontage-20120119-1749-rjshh1r9/jobs/b/mConcatFit-b1sa4vlk > - - - > > Caused by: 
null > Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: > Job failed with an exit code of 520 > Execution failed: > back_list:Table = org.griphyn.vdl.mapping.DataDependentException - > Closed not derived due to errors in data dependencies > > ------------------------------ > *From:* Ketan Maheshwari > *To:* Emalayan Vairavanathan > *Cc:* swift user > *Sent:* Thursday, 19 January 2012 4:49 PM > *Subject:* Re: [Swift-user] Montage+Swift+Coasters > > Emalayan, > > From your symptoms, it seems you are facing the same issue as I've been. > Could you tell more about the amount of data that needs to be staged to run > the Montage stages during which these warnings turn up? How much time > elapses since the start of your workflow after which you see these messages? > > Also, what version of Swift is this? > > Regards, > Ketan > > On Thu, Jan 19, 2012 at 5:51 PM, Emalayan Vairavanathan < > svemalayan at yahoo.com> wrote: > > Dear All, > > I have a problem in running Montage with Coasters (in our local cluster > - no batch schedulers). After few stages the swift run-time continuously > prints the warnings below. Any ideas ? Should I increase the heartbeat > count ? > > Everything works fine when I try to run the same montage-scripts with > swift on a single machine. 
> > Thank you > Emalayan > > > 2012-01-19 15:38:09,207-0800 WARN Command Command(119, HEARTBEAT): > handling reply timeout; sendReqTime=120119-153609.206, > sendTime=120119-153609.206, now=120119-153809.207 > 2012-01-19 15:38:09,207-0800 INFO Command Command(119, HEARTBEAT): > re-sending > 2012-01-19 15:38:09,209-0800 WARN Command Command(119, HEARTBEAT)fault > was: Reply timeout > org.globus.cog.karajan.workflow.service.ReplyTimeoutException > at > org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:288) > at > org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:293) > at java.util.TimerThread.mainLoop(Timer.java:534) > at java.util.TimerThread.run(Timer.java:484) > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > > > > -- > Ketan > > > > > > > > -- > Ketan > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > > > > > > > -- > Ketan > > > > > > > > -- > Ketan > > > > > > > > -- > Ketan > > > > > -- Ketan -------------- next part -------------- An HTML attachment was scrubbed... 
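At the top of this message, Ketan suggests starting workers by hand, several at a time, using a command of the shape `worker.pl label /path/to/log`. The Python sketch below builds and optionally launches such commands. It assumes password-less ssh to each remote node and that worker.pl is on the PATH there. The argument list follows Ketan's "something like" description, so check it against worker.pl's own usage message for your Swift release, in particular how the worker is told where the coaster service listens. The node names, labels and helper names are hypothetical.

```python
import shlex
import subprocess
from typing import List, Sequence


def worker_commands(nodes: Sequence[str], per_node: int, log_dir: str,
                    worker: str = "worker.pl") -> List[List[str]]:
    """Build one argv per worker in the 'worker.pl label /path/to/log' shape.

    Hypothetical helper: verify the real worker.pl arguments before use.
    """
    cmds = []
    for node in nodes:
        for i in range(per_node):
            label = f"{node}-{i}"  # hypothetical labelling scheme
            if node == "localhost":
                cmds.append([worker, label, log_dir])
            else:
                # Assumes password-less ssh; the remote shell runs this string.
                remote = " ".join(shlex.quote(a) for a in (worker, label, log_dir))
                cmds.append(["ssh", node, remote])
    return cmds


def launch(cmds: Sequence[List[str]], dry_run: bool = True) -> List[subprocess.Popen]:
    """Print the commands (dry run) or start each one without waiting."""
    procs = []
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            procs.append(subprocess.Popen(cmd))
    return procs


if __name__ == "__main__":
    # Hypothetical node names and log path; adjust before a real run.
    launch(worker_commands(["node090", "node091"], per_node=2, log_dir="/path/to/log"))
```

Running with the default dry_run=True only prints the commands. This lets you compare them with worker.pl's usage text before calling launch(cmds, dry_run=False).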
URL: From svemalayan at yahoo.com Mon Jan 23 21:21:02 2012 From: svemalayan at yahoo.com (Emalayan Vairavanathan) Date: Mon, 23 Jan 2012 19:21:02 -0800 (PST) Subject: [Swift-user] Montage+Swift+Coasters In-Reply-To: References: <1327017075.46246.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327024520.62801.YahooMailNeo@web39505.mail.mud.yahoo.com> <4D008EF1-8072-4A18-8A5A-18D74A2ACDA2@mcs.anl.gov> <1327346756.49322.YahooMailNeo@web39506.mail.mud.yahoo.com> <1327348246.89718.YahooMailNeo@web39507.mail.mud.yahoo.com> <1327350350.29451.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327359158.12794.YahooMailNeo@web39505.mail.mud.yahoo.com> Message-ID: <1327375262.49286.YahooMailNeo@web39503.mail.mud.yahoo.com> Hi Ketan, Please find the attached source code. Also I couldn't find SwiftMontage-20120123-1219-zj95uaye/jobs/4/mConcatFit-4o2fb2mk directory inside workdir. Please let me know if you need more information Thank you Emalayan -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: SwiftMontage_new.tar.gz Type: application/x-gzip Size: 4920 bytes Desc: not available URL: From hategan at mcs.anl.gov Mon Jan 23 21:25:51 2012 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 23 Jan 2012 19:25:51 -0800 Subject: [Swift-user] GSISSH in Swift [was Fwd: [Swift-devel] Documentation of sites.xml] In-Reply-To: References: <903303148.134563.1326323490348.JavaMail.root@zimbra.anl.gov> <1326483206.31692.2.camel@blabla> <980E744D-AA8E-4014-90B1-701A7D03F421@mcs.anl.gov> <1327197887.7405.2.camel@blabla> <33E27556-DD8E-407A-9610-94F8DA1AAD70@mcs.anl.gov> <1327353238.24069.3.camel@blabla> Message-ID: <1327375551.28494.3.camel@blabla> On Mon, 2012-01-23 at 15:48 -0600, Jonathan Monette wrote: > > Right. Not when starting coasters. Jobs run through the ssh-cl provider > > alone should honor those. > > > So even if ssh-cl is going to start coasters, wouldn't you still want > to honor those variables?
Technically those variables apply to the provider that runs the jobs (coasters in this case), not the provider that starts the service (ssh-cl in this case). There is currently no way for the coaster provider to know what variables to forward to what provider. > What if the machine I am using sshcl to start coasters on requires a > different username than the one that it is defaulted to? .ssh/config, the same way .ssh/auth.defaults worked. > I know you can use a .ssh/config file for that but then why even > have the option to specify it in Swift? Good point. I will remove them, since even in the plain ssh-cl case, .ssh/config should suffice. From ketancmaheshwari at gmail.com Mon Jan 23 21:41:07 2012 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Mon, 23 Jan 2012 21:41:07 -0600 Subject: [Swift-user] Montage+Swift+Coasters In-Reply-To: <1327375262.49286.YahooMailNeo@web39503.mail.mud.yahoo.com> References: <1327017075.46246.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327024520.62801.YahooMailNeo@web39505.mail.mud.yahoo.com> <4D008EF1-8072-4A18-8A5A-18D74A2ACDA2@mcs.anl.gov> <1327346756.49322.YahooMailNeo@web39506.mail.mud.yahoo.com> <1327348246.89718.YahooMailNeo@web39507.mail.mud.yahoo.com> <1327350350.29451.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327359158.12794.YahooMailNeo@web39505.mail.mud.yahoo.com> <1327375262.49286.YahooMailNeo@web39503.mail.mud.yahoo.com> Message-ID: On Mon, Jan 23, 2012 at 9:21 PM, Emalayan Vairavanathan < svemalayan at yahoo.com> wrote: > Hi Ketan, > > Please find the attached source code. Also I couldn't find > SwiftMontage-20120123-1219-zj95uaye/jobs/4/mConcatFit-4o2fb2mk directory inside > workdir.
> try again setting this to false in your config: wrapperlog.always.transfer=true > > Please let me know if you need more information > > Thank you > Emalayan > > ------------------------------ > *From:* Ketan Maheshwari > *To:* Emalayan Vairavanathan > *Cc:* Jonathan Monette ; swift user < > swift-user at ci.uchicago.edu> > *Sent:* Monday, 23 January 2012 6:38 PM > *Subject:* Re: [Swift-user] Montage+Swift+Coasters > > > > On Mon, Jan 23, 2012 at 4:52 PM, Emalayan Vairavanathan < > svemalayan at yahoo.com> wrote: > > Hi Ketan, > > Please find my answers below. > > [Ketan] Emalayan, Could you also send your swift source. > [Emalayan] did you ask for the Montage swift scripts ? / swift-0.93 > source code ? > > > Montage script > > > > > [Ketan] Have you tried running mConcatFit from within the > SwiftMontage-20120123-1219-zj95uaye/jobs/4/mConcatFit-4o2fb2mk directory? > [Emalayan] There were not such directory created. > > > should be in your workdir. > > > > > [Ketan] Are you aware that workers will start automatically *only* on the > localhost where the service is running and not on the remote nodes. > [Emalayan] Yes, I am aware about this. I ran both coaster-service and > application scripts on the same node. But would like to know about > setting up workers on other nodes too. > > > you may run worker.pl manually. or better put in a for loop in a simple > shell script to run multiple workers. commandline is something like: > worker.pl label /path/to/log > > > > Thank you > Emalayan > > ------------------------------ > *From:* Ketan Maheshwari > *To:* Emalayan Vairavanathan > *Cc:* Jonathan Monette ; swift user < > swift-user at ci.uchicago.edu> > *Sent:* Monday, 23 January 2012 12:57 PM > *Subject:* Re: [Swift-user] Montage+Swift+Coasters > > Emalayan, Could you also send your swift source. > > Have you tried running mConcatFit from within the > SwiftMontage-20120123-1219-zj95uaye/jobs/4/mConcatFit-4o2fb2mk directory? 
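Ketan's reply above asks for a one-line change to the cf file: set wrapperlog.always.transfer to false. If you keep several cf variants, a small helper can apply the change without hand-editing. The Python sketch below treats the file as plain key=value lines, as in the cf listings in this thread, and set_property is a hypothetical name.

```python
def set_property(cf_text: str, key: str, value: str) -> str:
    """Return cf_text with `key` set to `value`, appending the key if absent.

    Treats the file as simple key=value lines; all other lines, including
    comments, are copied through unchanged.
    """
    out, found = [], False
    for line in cf_text.splitlines():
        name, sep, _ = line.partition("=")
        if sep and name.strip() == key:
            out.append(f"{key}={value}")
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    cf = "wrapperlog.always.transfer=true\nsitedir.keep=true\n"
    print(set_property(cf, "wrapperlog.always.transfer", "false"), end="")
```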
> > On Mon, Jan 23, 2012 at 2:25 PM, Emalayan Vairavanathan < > svemalayan at yahoo.com> wrote: > > I am using swift-0.93. I started only the coaster-service manually using > following command (workers were started automatically). > > coaster-service -port 1984 -localport 35753 -nosec > > Then application prints following output and terminates. (I have attached > the log file with this mail. Please discard the previous log file because > system was not configured properly) > > Please let me know if you need more information. > > Thank you > Emalayan > > > ==================================================================================== > Swift 0.93 swift-r5501 (swift modified locally) cog-r3350 > > RunID: 20120123-1219-zj95uaye > (input): found 10 files > Progress: time: Mon, 23 Jan 2012 12:19:39 -0800 > > Find: http://localhost:1984 > Find: keepalive(120), reconnect - http://localhost:1984 > Progress: time: Mon, 23 Jan 2012 12:19:41 -0800 Stage in:1 Submitted:9 > Progress: time: Mon, 23 Jan 2012 12:19:45 -0800 Active:9 Stage out:1 > Progress: time: Mon, 23 Jan 2012 12:19:46 -0800 Active:6 Stage out:2 > Finished successfully:2 > Progress: time: Mon, 23 Jan 2012 12:19:47 -0800 Submitted:1 Finished > successfully:10 > Progress: time: Mon, 23 Jan 2012 12:19:49 -0800 Active:1 Finished > successfully:10 > Progress: time: Mon, 23 Jan 2012 12:19:50 -0800 Submitted:1 Finished > successfully:12 > Progress: time: Mon, 23 Jan 2012 12:19:51 -0800 Stage in:12 > Submitted:5 Finished successfully:13 > Progress: time: Mon, 23 Jan 2012 12:19:52 -0800 Stage in:1 Submitted:5 > Active:9 Stage out:2 Finished successfully:13 > Progress: time: Mon, 23 Jan 2012 12:19:53 -0800 Active:5 Finished > successfully:25 > Exception in mConcatFit: > Arguments: [_concurrent/status_tbl-bf92dd4d-ecf0-490e-ab93-cf7863688950-5, > fits.tbl, stat_dir] > Host: localhost > Directory: SwiftMontage-20120123-1219-zj95uaye/jobs/4/mConcatFit-4o2fb2mk > > - - - > > Caused by: null > Caused by: 
org.globus.cog.abstraction.impl.common.execution.JobException: > Job failed with an exit code of 520 > Execution failed: > back_list:Table = org.griphyn.vdl.mapping.DataDependentException - > Closed not derived due to errors in data dependencies > [emalayan at node090 scripts]$ > > > ------------------------------ > *From:* Ketan Maheshwari > *To:* Emalayan Vairavanathan > *Cc:* Jonathan Monette ; swift user < > swift-user at ci.uchicago.edu> > *Sent:* Monday, 23 January 2012 11:55 AM > *Subject:* Re: [Swift-user] Montage+Swift+Coasters > > How are you starting the service? Are you starting workers manually? if > yes, could you paste commandlines for both? > > On Mon, Jan 23, 2012 at 1:50 PM, Emalayan Vairavanathan < > svemalayan at yahoo.com> wrote: > > Thanks Ketan and Jon. I tried but it is still giving error. I have > attached the log file. > > Thank you > Emalayan > > ------------------------------ > *From:* Ketan Maheshwari > *To:* Emalayan Vairavanathan > *Cc:* Jonathan Monette ; swift user < > swift-user at ci.uchicago.edu> > *Sent:* Monday, 23 January 2012 11:36 AM > *Subject:* Re: [Swift-user] Montage+Swift+Coasters > > Emalayan, > > Likely, /tmp is not readable/writable across the machines. Could you try > changing workdir to your /home > > On Mon, Jan 23, 2012 at 1:25 PM, Emalayan Vairavanathan < > svemalayan at yahoo.com> wrote: > > Jon, > > Please find the detail below and let me know if you have any questions > about my setup. 
> > Thank you > Emalayan > > ========================================================== > site.xml > > > > jobmanager="local:local"/> > passive > > 4 > 100000 > 100 > 100 > 100 > 1 > 10 > 25.00 > 10000 > proxy > > /tmp/swift.workdir > > > > ======================================================= > > tc > > localhost sh /bin/sh null null null > localhost cat /bin/cat null null null > localhost echo /bin/echo null null null > localhost do_merge /home/emalayan/App/forEmalayan/app/modmerge null null > null > localhost mProjExec /home/emalayan/App/Montage_v3.3/bin/mProjExec null > null null > localhost mImgtbl /home/emalayan/App/Montage_v3.3/bin/mImgtbl null null > null > localhost mAdd /home/emalayan/App/Montage_v3.3/bin/mAdd null null null > localhost mOverlaps /home/emalayan/App/Montage_v3.3/bin/mOverlaps null > null null > localhost mJPEG /home/emalayan/App/Montage_v3.3/bin/mJPEG null null null > localhost mDiffExec_wrap /home/emalayan/App/Montage_v3.3/bin/mDiffExec > null null null > localhost mFitExec /home/emalayan/App/Montage_v3.3/bin/mFitExec null null > null > localhost mBgModel /home/emalayan/App/Montage_v3.3/bin/mBgModel null null > null > localhost mBgExec /home/emalayan/App/Montage_v3.3/bin/mBgExec null null > null > localhost mConcatFit /home/emalayan/App/Montage_v3.3/bin/mConcatFit null > null nul > > localhost Background_list > /home/emalayan/App/montage-swift/SwiftMontage/apps/Background_list.py null > null null > localhost create_status_table > /home/emalayan/App/montage-swift/SwiftMontage/apps/create_status_table.py > null null null > localhost mProjectPP_wrap > /home/emalayan/App/montage-swift/SwiftMontage/apps/mProjectPP_wrap.py null > null null > localhost mProject_wrap > /home/emalayan/App/montage-swift/SwiftMontage/apps/mProject_wrap.py null > null null > localhost mBackground_wrap > /home/emalayan/App/montage-swift/SwiftMontage/apps/mBackground_wrap.py null > null null > localhost mDiffFit_wrap > 
/home/emalayan/App/montage-swift/SwiftMontage/apps/mDiffFit_wrap.py null > null null > > ================================================================= > > cf > > wrapperlog.always.transfer=true > sitedir.keep=true > execution.retries=1 > lazy.errors=true > status.mode=provider > use.provider.staging=true > provider.staging.pin.swiftfiles=false > foreach.max.threads=100 > provenance.log=false > > =================================================================== > > ------------------------------ > *From:* Jonathan Monette > *To:* Ketan Maheshwari > *Cc:* Emalayan Vairavanathan ; swift user < > swift-user at ci.uchicago.edu> > *Sent:* Monday, 23 January 2012 11:08 AM > *Subject:* Re: [Swift-user] Montage+Swift+Coasters > > Emalayan, > So I have run the scripts with some of my own test cases and do not see > it failing. Could you provide your config files? Please provide the tc, > sites, and config file (if you use a config file). > > On Jan 20, 2012, at 9:39 AM, Ketan Maheshwari wrote: > > Emalayan, > > I would check all the mappers and the resulting paths in the Swift source. > > Also try running the failed job something like this: > > cd /SwiftMontage-20120119-1749-rjshh1r9/jobs/b/mConcatFit-b1sa4vlk > > mConcatFit _concurrent/status_tbl-7a8340c2-045d-4039-a77c-00429b78d9c9-5 > fits.tbl stat_dir > > error 520 indicates workers are not able to reach the data. > > Also check if swift.workdir is writable on the site by the worker nodes. > > On Thu, Jan 19, 2012 at 7:55 PM, Emalayan Vairavanathan < > svemalayan at yahoo.com> wrote: > > Hi Ketan, > > This was with swift-0.92.1. Now I have downloaded the latest swift 0.93 > and am getting totally different error messages with swift 0.93. I can ask > Jon about these messages. (These scripts were working well with Swift alone.) > > Please let me know if you have any idea.
> > Regards > Emalayan > > > =============================================================================================== > Swift 0.93 swift-r5501 cog-r3350 > > RunID: 20120119-1749-rjshh1r9 > (input): found 10 files > Progress: time: Thu, 19 Jan 2012 17:49:20 -0800 > Find: http://localhost:1984 > Find: keepalive(120), reconnect - http://localhost:1984 > Progress: time: Thu, 19 Jan 2012 17:49:22 -0800 Stage in:1 Submitted:9 > Progress: time: Thu, 19 Jan 2012 17:49:25 -0800 Active:9 Stage out:1 > Progress: time: Thu, 19 Jan 2012 17:49:26 -0800 Stage out:3 Finished > successfully:7 > Progress: time: Thu, 19 Jan 2012 17:49:28 -0800 Active:1 Finished > successfully:10 > Progress: time: Thu, 19 Jan 2012 17:49:29 -0800 Stage in:1 > Submitting:11 Submitted:6 Finished successfully:12 > Progress: time: Thu, 19 Jan 2012 17:49:30 -0800 Stage in:4 Submitted:1 > Active:6 Stage out:2 Finished successfully:17 > Progress: time: Thu, 19 Jan 2012 17:49:31 -0800 Active:1 Finished > successfully:30 > Exception in mConcatFit: > Arguments: [_concurrent/status_tbl-7a8340c2-045d-4039-a77c-00429b78d9c9-5, > fits.tbl, stat_dir] > Host: localhost > Directory: SwiftMontage-20120119-1749-rjshh1r9/jobs/b/mConcatFit-b1sa4vlk > - - - > > Caused by: null > Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: > Job failed with an exit code of 520 > Execution failed: > back_list:Table = org.griphyn.vdl.mapping.DataDependentException - > Closed not derived due to errors in data dependencies > > ------------------------------ > *From:* Ketan Maheshwari > *To:* Emalayan Vairavanathan > *Cc:* swift user > *Sent:* Thursday, 19 January 2012 4:49 PM > *Subject:* Re: [Swift-user] Montage+Swift+Coasters > > Emalayan, > > From your symptoms, it seems you are facing the same issue as I've been. > Could you tell more about the amount of data that needs to be staged to run > the Montage stages during which these warnings turn up? 
How much time > elapses since the start of your workflow after which you see these messages? > > Also, what version of Swift is this? > > Regards, > Ketan > > On Thu, Jan 19, 2012 at 5:51 PM, Emalayan Vairavanathan < > svemalayan at yahoo.com> wrote: > > Dear All, > > I have a problem in running Montage with Coasters (in our local cluster > - no batch schedulers). After few stages the swift run-time continuously > prints the warnings below. Any ideas ? Should I increase the heartbeat > count ? > > Everything works fine when I try to run the same montage-scripts with > swift on a single machine. > > Thank you > Emalayan > > > 2012-01-19 15:38:09,207-0800 WARN Command Command(119, HEARTBEAT): > handling reply timeout; sendReqTime=120119-153609.206, > sendTime=120119-153609.206, now=120119-153809.207 > 2012-01-19 15:38:09,207-0800 INFO Command Command(119, HEARTBEAT): > re-sending > 2012-01-19 15:38:09,209-0800 WARN Command Command(119, HEARTBEAT)fault > was: Reply timeout > org.globus.cog.karajan.workflow.service.ReplyTimeoutException > at > org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:288) > at > org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:293) > at java.util.TimerThread.mainLoop(Timer.java:534) > at java.util.TimerThread.run(Timer.java:484) > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > > > > -- > Ketan > > > > > > > > -- > Ketan > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > > > > > > > -- > Ketan > > > > > > > > -- > Ketan > > > > > > > > -- > Ketan > > > > > > > > -- > Ketan > > > > > -- Ketan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From svemalayan at yahoo.com Mon Jan 23 21:55:49 2012 From: svemalayan at yahoo.com (Emalayan Vairavanathan) Date: Mon, 23 Jan 2012 19:55:49 -0800 (PST) Subject: [Swift-user] Montage+Swift+Coasters In-Reply-To: References: <1327017075.46246.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327024520.62801.YahooMailNeo@web39505.mail.mud.yahoo.com> <4D008EF1-8072-4A18-8A5A-18D74A2ACDA2@mcs.anl.gov> <1327346756.49322.YahooMailNeo@web39506.mail.mud.yahoo.com> <1327348246.89718.YahooMailNeo@web39507.mail.mud.yahoo.com> <1327350350.29451.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327359158.12794.YahooMailNeo@web39505.mail.mud.yahoo.com> <1327375262.49286.YahooMailNeo@web39503.mail.mud.yahoo.com> Message-ID: <1327377349.65265.YahooMailNeo@web39505.mail.mud.yahoo.com> Hi Ketan, I tried but am getting the same error. ________________________________ From: Ketan Maheshwari To: Emalayan Vairavanathan Cc: Jonathan Monette ; swift user Sent: Monday, 23 January 2012 7:41 PM Subject: Re: [Swift-user] Montage+Swift+Coasters On Mon, Jan 23, 2012 at 9:21 PM, Emalayan Vairavanathan wrote: > Hi Ketan, > > Please find the attached source code. Also I couldn't find the SwiftMontage-20120123-1219-zj95uaye/jobs/4/mConcatFit-4o2fb2mk directory inside the workdir. try again setting this to false in your config: wrapperlog.always.transfer=true -------------- next part -------------- An HTML attachment was scrubbed...
URL: From jonmon at mcs.anl.gov Tue Jan 24 10:58:14 2012 From: jonmon at mcs.anl.gov (Jonathan Monette) Date: Tue, 24 Jan 2012 10:58:14 -0600 Subject: [Swift-user] Montage+Swift+Coasters In-Reply-To: <1327377349.65265.YahooMailNeo@web39505.mail.mud.yahoo.com> References: <1327017075.46246.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327024520.62801.YahooMailNeo@web39505.mail.mud.yahoo.com> <4D008EF1-8072-4A18-8A5A-18D74A2ACDA2@mcs.anl.gov> <1327346756.49322.YahooMailNeo@web39506.mail.mud.yahoo.com> <1327348246.89718.YahooMailNeo@web39507.mail.mud.yahoo.com> <1327350350.29451.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327359158.12794.YahooMailNeo@web39505.mail.mud.yahoo.com> <1327375262.49286.YahooMailNeo@web39503.mail.mud.yahoo.com> <1327377349.65265.YahooMailNeo@web39505.mail.mud.yahoo.com> Message-ID: Ketan has given a lot of the tips I would have given as well. Two things: first, can you set lazy.errors=false in the cf file? This may give us a different error, since the script will fail immediately instead of trying to continue. Second, I have not tried these scripts with provider staging turned on (use.provider.staging=true), and this may be what is causing the data problem. Try the lazy.errors change first to see whether we get a different error, or at least better information. On Jan 23, 2012, at 9:55 PM, Emalayan Vairavanathan wrote: > Hi Ketan, I tried but am getting the same error. > > From: Ketan Maheshwari > To: Emalayan Vairavanathan > Cc: Jonathan Monette ; swift user > Sent: Monday, 23 January 2012 7:41 PM > Subject: Re: [Swift-user] Montage+Swift+Coasters > > > > On Mon, Jan 23, 2012 at 9:21 PM, Emalayan Vairavanathan wrote: > Hi Ketan, > > Please find the attached source code. Also I couldn't find the SwiftMontage-20120123-1219-zj95uaye/jobs/4/mConcatFit-4o2fb2mk directory inside the workdir.
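Jon's first suggestion and Ketan's earlier one both come down to flipping a property in the cf file. Jon wants lazy.errors=false, so the run stops at the first failure, and Ketan wants wrapperlog.always.transfer=false. Below is a minimal sketch that writes a separate debugging copy and leaves the original cf untouched. The function name and the cf.debug file name are hypothetical. Only properties already present in the file are rewritten; nothing is appended.

```shell
#!/bin/sh
# Sketch: write a debugging copy of a Swift properties file (cf) with
#   lazy.errors=false                 (stop at the first failing app)
#   wrapperlog.always.transfer=false  (Ketan's suggestion in this thread)
# Properties missing from the input are not added; the input is unchanged.
make_debug_cf() {
  src=$1
  dst=$2
  [ -r "$src" ] || { echo "cannot read $src" >&2; return 1; }
  sed -e 's/^lazy\.errors=.*/lazy.errors=false/' \
      -e 's/^wrapperlog\.always\.transfer=.*/wrapperlog.always.transfer=false/' \
      "$src" > "$dst"
}
```

For example, run `make_debug_cf cf cf.debug`, then call swift with `-config cf.debug` instead of `-config cf`.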
> > try again setting this to false in your config: > wrapperlog.always.transfer=true > > > Please let me know if you need more information > > Thank you > Emalayan > > From: Ketan Maheshwari > To: Emalayan Vairavanathan > Cc: Jonathan Monette ; swift user > Sent: Monday, 23 January 2012 6:38 PM > Subject: Re: [Swift-user] Montage+Swift+Coasters > > > > On Mon, Jan 23, 2012 at 4:52 PM, Emalayan Vairavanathan wrote: > Hi Ketan, > > Please find my answers below. > > [Ketan] Emalayan, Could you also send your swift source. > [Emalayan] Did you ask for the Montage Swift scripts or the swift-0.93 source code? > > The Montage script. > > > > [Ketan] Have you tried running mConcatFit from within the SwiftMontage-20120123-1219-zj95uaye/jobs/4/mConcatFit-4o2fb2mk directory? > [Emalayan] No such directory was created. > > It should be in your workdir. > > > > [Ketan] Are you aware that workers will start automatically *only* on the localhost where the service is running and not on the remote nodes? > [Emalayan] Yes, I am aware of this. I ran both coaster-service and the application scripts on the same node, but I would like to know how to set up workers on other nodes too. > > You may run worker.pl manually, or better, put it in a for loop in a simple shell script to run multiple workers. The command line is something like: worker.pl label /path/to/log > > > Thank you > Emalayan > > From: Ketan Maheshwari > To: Emalayan Vairavanathan > Cc: Jonathan Monette ; swift user > Sent: Monday, 23 January 2012 12:57 PM > Subject: Re: [Swift-user] Montage+Swift+Coasters > > Emalayan, Could you also send your swift source. > > Have you tried running mConcatFit from within the SwiftMontage-20120123-1219-zj95uaye/jobs/4/mConcatFit-4o2fb2mk directory? > > On Mon, Jan 23, 2012 at 2:25 PM, Emalayan Vairavanathan wrote: > I am using swift-0.93. I started only the coaster-service manually using the following command (workers were started automatically).
> > coaster-service -port 1984 -localport 35753 -nosec > > Then application prints following output and terminates. (I have attached the log file with this mail. Please discard the previous log file because system was not configured properly) > > Please let me know if you need more information. > > Thank you > Emalayan > > ==================================================================================== > Swift 0.93 swift-r5501 (swift modified locally) cog-r3350 > > RunID: 20120123-1219-zj95uaye > (input): found 10 files > Progress: time: Mon, 23 Jan 2012 12:19:39 -0800 > > Find: http://localhost:1984 > Find: keepalive(120), reconnect - http://localhost:1984 > Progress: time: Mon, 23 Jan 2012 12:19:41 -0800 Stage in:1 Submitted:9 > Progress: time: Mon, 23 Jan 2012 12:19:45 -0800 Active:9 Stage out:1 > Progress: time: Mon, 23 Jan 2012 12:19:46 -0800 Active:6 Stage out:2 Finished successfully:2 > Progress: time: Mon, 23 Jan 2012 12:19:47 -0800 Submitted:1 Finished successfully:10 > Progress: time: Mon, 23 Jan 2012 12:19:49 -0800 Active:1 Finished successfully:10 > Progress: time: Mon, 23 Jan 2012 12:19:50 -0800 Submitted:1 Finished successfully:12 > Progress: time: Mon, 23 Jan 2012 12:19:51 -0800 Stage in:12 Submitted:5 Finished successfully:13 > Progress: time: Mon, 23 Jan 2012 12:19:52 -0800 Stage in:1 Submitted:5 Active:9 Stage out:2 Finished successfully:13 > Progress: time: Mon, 23 Jan 2012 12:19:53 -0800 Active:5 Finished successfully:25 > Exception in mConcatFit: > Arguments: [_concurrent/status_tbl-bf92dd4d-ecf0-490e-ab93-cf7863688950-5, fits.tbl, stat_dir] > Host: localhost > Directory: SwiftMontage-20120123-1219-zj95uaye/jobs/4/mConcatFit-4o2fb2mk > > - - - > > Caused by: null > Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 520 > Execution failed: > back_list:Table = org.griphyn.vdl.mapping.DataDependentException - Closed not derived due to errors in data dependencies > [emalayan at node090 
scripts]$ > > > From: Ketan Maheshwari > To: Emalayan Vairavanathan > Cc: Jonathan Monette ; swift user > Sent: Monday, 23 January 2012 11:55 AM > Subject: Re: [Swift-user] Montage+Swift+Coasters > > How are you starting the service? Are you starting workers manually? if yes, could you paste commandlines for both? > > On Mon, Jan 23, 2012 at 1:50 PM, Emalayan Vairavanathan wrote: > Thanks Ketan and Jon. I tried but it is still giving error. I have attached the log file. > > Thank you > Emalayan > > From: Ketan Maheshwari > To: Emalayan Vairavanathan > Cc: Jonathan Monette ; swift user > Sent: Monday, 23 January 2012 11:36 AM > Subject: Re: [Swift-user] Montage+Swift+Coasters > > Emalayan, > > Likely, /tmp is not readable/writable across the machines. Could you try changing workdir to your /home > > On Mon, Jan 23, 2012 at 1:25 PM, Emalayan Vairavanathan wrote: > Jon, > > Please find the detail below and let me know if you have any questions about my setup. > > Thank you > Emalayan > > ========================================================== > site.xml > > > > > passive > > 4 > 100000 > 100 > 100 > 100 > 1 > 10 > 25.00 > 10000 > proxy > > /tmp/swift.workdir > > > > ======================================================= > > tc > > localhost sh /bin/sh null null null > localhost cat /bin/cat null null null > localhost echo /bin/echo null null null > localhost do_merge /home/emalayan/App/forEmalayan/app/modmerge null null null > localhost mProjExec /home/emalayan/App/Montage_v3.3/bin/mProjExec null null null > localhost mImgtbl /home/emalayan/App/Montage_v3.3/bin/mImgtbl null null null > localhost mAdd /home/emalayan/App/Montage_v3.3/bin/mAdd null null null > localhost mOverlaps /home/emalayan/App/Montage_v3.3/bin/mOverlaps null null null > localhost mJPEG /home/emalayan/App/Montage_v3.3/bin/mJPEG null null null > localhost mDiffExec_wrap /home/emalayan/App/Montage_v3.3/bin/mDiffExec null null null > localhost mFitExec /home/emalayan/App/Montage_v3.3/bin/mFitExec 
null null null > localhost mBgModel /home/emalayan/App/Montage_v3.3/bin/mBgModel null null null > localhost mBgExec /home/emalayan/App/Montage_v3.3/bin/mBgExec null null null > localhost mConcatFit /home/emalayan/App/Montage_v3.3/bin/mConcatFit null null nul > > localhost Background_list /home/emalayan/App/montage-swift/SwiftMontage/apps/Background_list.py null null null > localhost create_status_table /home/emalayan/App/montage-swift/SwiftMontage/apps/create_status_table.py null null null > localhost mProjectPP_wrap /home/emalayan/App/montage-swift/SwiftMontage/apps/mProjectPP_wrap.py null null null > localhost mProject_wrap /home/emalayan/App/montage-swift/SwiftMontage/apps/mProject_wrap.py null null null > localhost mBackground_wrap /home/emalayan/App/montage-swift/SwiftMontage/apps/mBackground_wrap.py null null null > localhost mDiffFit_wrap /home/emalayan/App/montage-swift/SwiftMontage/apps/mDiffFit_wrap.py null null null > > ================================================================= > > cf > > wrapperlog.always.transfer=true > sitedir.keep=true > execution.retries=1 > lazy.errors=true > status.mode=provider > use.provider.staging=true > provider.staging.pin.swiftfiles=false > foreach.max.threads=100 > provenance.log=false > > =================================================================== > > From: Jonathan Monette > To: Ketan Maheshwari > Cc: Emalayan Vairavanathan ; swift user > Sent: Monday, 23 January 2012 11:08 AM > Subject: Re: [Swift-user] Montage+Swift+Coasters > > Emalayan, > So I have run the scripts with some of my own test cases and do not see them failing. Could you provide your config files? Please provide the tc, sites, and config file (if you use a config file). > > On Jan 20, 2012, at 9:39 AM, Ketan Maheshwari wrote: > >> Emalayan, >> >> I would check all the mappers and the resulting paths in the Swift source.
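Ketan's two suspicions in this exchange, that /tmp is not shared across the machines and that swift.workdir may not be writable by the worker nodes, can be checked on each node before rerunning the workflow. A minimal Python sketch under that assumption; `check_workdir` is a hypothetical helper (not part of Swift), and the candidate paths are the /tmp workdir from the quoted site.xml and a home-based directory along the lines of Ketan's suggestion.

```python
import os
import tempfile


def check_workdir(path):
    """Return (ok, message) for whether `path` is usable as a work directory here.

    Hypothetical helper, not part of Swift. It only tests the node it runs on,
    so run it on the submit node and on every worker node and compare.
    """
    if not os.path.isdir(path):
        return False, f"{path}: does not exist or is not a directory"
    try:
        # Creating (and automatically deleting) a scratch file proves write access.
        with tempfile.NamedTemporaryFile(dir=path, prefix="swift-check-"):
            pass
    except OSError as exc:
        return False, f"{path}: not writable ({exc})"
    return True, f"{path}: writable"


if __name__ == "__main__":
    # Candidates from the thread: the /tmp workdir in site.xml and a home-based one.
    for candidate in ["/tmp/swift.workdir", os.path.expanduser("~/swift.workdir")]:
        ok, message = check_workdir(candidate)
        print(("OK    " if ok else "FAIL  ") + message)
```

Passing on every node only shows that each node can write locally; it does not prove the directory is shared. /tmp is normally node-local, so a file written there by one node is not visible to the others.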
>> >> Also try running the failed job manually, something like this: >> >> cd /SwiftMontage-20120119-1749-rjshh1r9/jobs/b/mConcatFit-b1sa4vlk >> >> mConcatFit _concurrent/status_tbl-7a8340c2-045d-4039-a77c-00429b78d9c9-5 fits.tbl stat_dir >> >> Error 520 indicates that the workers are not able to reach the data. >> >> Also check whether swift.workdir on the site is writable by the worker nodes. >> >> On Thu, Jan 19, 2012 at 7:55 PM, Emalayan Vairavanathan wrote: >> Hi Ketan, >> >> This was with swift-0.92.1. Now I have downloaded the latest swift 0.93 and am getting totally different error messages with swift 0.93. I can ask Jon about these messages. (These scripts were working well with Swift alone.) >> >> Please let me know if you have any ideas. >> >> Regards >> Emalayan >> >> =============================================================================================== >> Swift 0.93 swift-r5501 cog-r3350 >> >> RunID: 20120119-1749-rjshh1r9 >> (input): found 10 files >> Progress: time: Thu, 19 Jan 2012 17:49:20 -0800 >> Find: http://localhost:1984 >> Find: keepalive(120), reconnect - http://localhost:1984 >> Progress: time: Thu, 19 Jan 2012 17:49:22 -0800 Stage in:1 Submitted:9 >> Progress: time: Thu, 19 Jan 2012 17:49:25 -0800 Active:9 Stage out:1 >> Progress: time: Thu, 19 Jan 2012 17:49:26 -0800 Stage out:3 Finished successfully:7 >> Progress: time: Thu, 19 Jan 2012 17:49:28 -0800 Active:1 Finished successfully:10 >> Progress: time: Thu, 19 Jan 2012 17:49:29 -0800 Stage in:1 Submitting:11 Submitted:6 Finished successfully:12 >> Progress: time: Thu, 19 Jan 2012 17:49:30 -0800 Stage in:4 Submitted:1 Active:6 Stage out:2 Finished successfully:17 >> Progress: time: Thu, 19 Jan 2012 17:49:31 -0800 Active:1 Finished successfully:30 >> Exception in mConcatFit: >> Arguments: [_concurrent/status_tbl-7a8340c2-045d-4039-a77c-00429b78d9c9-5, fits.tbl, stat_dir] >> Host: localhost >> Directory: SwiftMontage-20120119-1749-rjshh1r9/jobs/b/mConcatFit-b1sa4vlk >> - - - >> >> Caused by: null >> Caused by:
org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 520 >> Execution failed: >> back_list:Table = org.griphyn.vdl.mapping.DataDependentException - Closed not derived due to errors in data dependencies >> >> From: Ketan Maheshwari >> To: Emalayan Vairavanathan >> Cc: swift user >> Sent: Thursday, 19 January 2012 4:49 PM >> Subject: Re: [Swift-user] Montage+Swift+Coasters >> >> Emalayan, >> >> From your symptoms, it seems you are facing the same issue I have been facing. Could you tell me more about the amount of data that needs to be staged to run the Montage stages during which these warnings turn up? How much time elapses between the start of your workflow and the appearance of these messages? >> >> Also, what version of Swift is this? >> >> Regards, >> Ketan >> >> On Thu, Jan 19, 2012 at 5:51 PM, Emalayan Vairavanathan wrote: >> Dear All, >> >> I have a problem running Montage with Coasters (in our local cluster - no batch schedulers). After a few stages the Swift run-time continuously prints the warnings below. Any ideas? Should I increase the heartbeat count? >> >> Everything works fine when I try to run the same montage-scripts with swift on a single machine.
>> >> Thank you >> Emalayan >> >> 2012-01-19 15:38:09,207-0800 WARN Command Command(119, HEARTBEAT): handling reply timeout; sendReqTime=120119-153609.206, sendTime=120119-153609.206, now=120119-153809.207 >> 2012-01-19 15:38:09,207-0800 INFO Command Command(119, HEARTBEAT): re-sending >> 2012-01-19 15:38:09,209-0800 WARN Command Command(119, HEARTBEAT)fault was: Reply timeout >> org.globus.cog.karajan.workflow.service.ReplyTimeoutException >> at org.globus.cog.karajan.workflow.service.commands.Command.handleReplyTimeout(Command.java:288) >> at org.globus.cog.karajan.workflow.service.commands.Command$Timeout.run(Command.java:293) >> at java.util.TimerThread.mainLoop(Timer.java:534) >> at java.util.TimerThread.run(Timer.java:484) >> >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user >> >> -- >> Ketan >> >> -- >> Ketan > > -- > Ketan > > -- > Ketan > > -- > Ketan > > -- > Ketan > > -- > Ketan From turam at mcs.anl.gov Tue Jan 24 11:42:16 2012 From: turam at mcs.anl.gov (Thomas Uram) Date: Tue, 24 Jan 2012 11:42:16 -0600 Subject: [Swift-user] Files staged out with 0.93rc2, but not with trunk Message-ID: With the same script, 0.93rc2 stages files out, but trunk does not.
Log files are here (which conveniently include the script and config files): http://www.mcs.anl.gov/~turam/20120124-1135/hostname-20120124-1131-uk1bexk9.093.log http://www.mcs.anl.gov/~turam/20120124-1135/hostname-20120124-1128-cf8mo3pe.trunk.log In the case of the trunk, the _concurrent directory is created, but files do not exist there. I haven't dug deeply into this. I'm using trunk for the ssh-cl provider, but clearly also want the files to be staged out properly. Thanks, Tom From svemalayan at yahoo.com Tue Jan 24 13:37:35 2012 From: svemalayan at yahoo.com (Emalayan Vairavanathan) Date: Tue, 24 Jan 2012 11:37:35 -0800 (PST) Subject: [Swift-user] Montage+Swift+Coasters In-Reply-To: References: <1327017075.46246.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327024520.62801.YahooMailNeo@web39505.mail.mud.yahoo.com> <4D008EF1-8072-4A18-8A5A-18D74A2ACDA2@mcs.anl.gov> <1327346756.49322.YahooMailNeo@web39506.mail.mud.yahoo.com> <1327348246.89718.YahooMailNeo@web39507.mail.mud.yahoo.com> <1327350350.29451.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327359158.12794.YahooMailNeo@web39505.mail.mud.yahoo.com> <1327375262.49286.YahooMailNeo@web39503.mail.mud.yahoo.com> <1327377349.65265.YahooMailNeo@web39505.mail.mud.yahoo.com> Message-ID: <1327433855.75566.YahooMailNeo@web39505.mail.mud.yahoo.com> Now the scripts are giving a different error, shown below. Any ideas? I have attached the log file too. Please let me know if you need more information. ==============================Error message====================================================== Swift 0.93 swift-r5501 (swift modified locally) cog-r3350 RunID: 20120124-1127-9a0xujy4 (input): found 10 files Progress: time: Tue, 24 Jan 2012 11:27:18 -0800 Find: http://localhost:1984 Find: keepalive(120), reconnect - http://localhost:1984 Progress: time: Tue, 24 Jan 2012 11:27:20 -0800 Submitted:9 Active:1 Progress: time: Tue, 24 Jan 2012 11:27:28 -0800 Active:9 Checking status:1 Progress:
time: Tue, 24 Jan 2012 11:27:29 -0800 Stage out:1 Finished successfully:9 Progress: time: Tue, 24 Jan 2012 11:27:31 -0800 Checking status:1 Finished successfully:10 Progress: time: Tue, 24 Jan 2012 11:27:32 -0800 Submitted:6 Active:11 Checking status:1 Finished successfully:12 Progress: time: Tue, 24 Jan 2012 11:27:37 -0800 Active:16 Checking status:1 Finished successfully:13 Progress: time: Tue, 24 Jan 2012 11:27:38 -0800 Active:15 Stage out:1 Finished successfully:14 Progress: time: Tue, 24 Jan 2012 11:27:39 -0800 Active:11 Stage out:2 Finished successfully:17 Progress: time: Tue, 24 Jan 2012 11:27:40 -0800 Stage out:1 Finished successfully:29 Progress: time: Tue, 24 Jan 2012 11:27:41 -0800 Checking status:1 Finished successfully:31 Progress: time: Tue, 24 Jan 2012 11:27:46 -0800 Active:9 Finished successfully:33 Failed but can retry:1 Progress: time: Tue, 24 Jan 2012 11:27:48 -0800 Active:10 Finished successfully:33 Progress: time: Tue, 24 Jan 2012 11:27:50 -0800 Active:9 Finished successfully:33 Failed but can retry:1 Execution failed: Application /home/emalayan/App/montage-swift/SwiftMontage/apps/mBackground_wrap.py failed with an exit code of 127 Progress: time: Tue, 24 Jan 2012 11:27:52 -0800 Failed:9 Finished successfully:33 Failed but can retry:1 ============================================================================================= ________________________________ From: Jonathan Monette To: Emalayan Vairavanathan Cc: Ketan Maheshwari ; swift user Sent: Tuesday, 24 January 2012 8:58 AM Subject: Re: [Swift-user] Montage+Swift+Coasters Ketan has given a lot of the tips I would have given as well. Two things. First, can you set lazy.errors=false in the cf file? This may give us a different error, since the script will fail immediately instead of trying to continue. Second, I have not tried these scripts with provider staging turned on. This may be what is causing the data problem.
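Jonathan's first suggestion is a one-line change to the cf file. If you would rather keep the original cf and make a separate debugging copy, a tiny Python helper can rewrite individual keys while leaving the rest untouched. This is a sketch: it assumes the plain key=value layout of the quoted cf, `override_properties` is a hypothetical helper rather than a Swift tool, and flipping use.provider.staging to false is only a way to test Jonathan's second hunch, not a confirmed fix.

```python
def override_properties(text, overrides):
    """Return `text` (key=value lines) with the given keys replaced or appended.

    Comments, blank lines and key order are preserved; keys that are not
    present yet are appended at the end.
    """
    remaining = dict(overrides)
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in remaining:
                out.append(f"{key}={remaining.pop(key)}")
                continue
        out.append(line)
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # The cf quoted in this thread.
    cf = (
        "wrapperlog.always.transfer=true\n"
        "sitedir.keep=true\n"
        "execution.retries=1\n"
        "lazy.errors=true\n"
        "status.mode=provider\n"
        "use.provider.staging=true\n"
        "provider.staging.pin.swiftfiles=false\n"
        "foreach.max.threads=100\n"
        "provenance.log=false\n"
    )
    # Fail fast instead of continuing past errors; optionally disable provider staging.
    print(override_properties(cf, {"lazy.errors": "false", "use.provider.staging": "false"}))
```

The result could be saved as, for example, cf.debug (a hypothetical file name) and passed to swift with -config cf.debug.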
First try the above to see if we get a different error or at least better information. On Jan 23, 2012, at 9:55 PM, Emalayan Vairavanathan wrote: Hi Ketan, I tried, but I am still getting the same error. > >________________________________ > From: Ketan Maheshwari >To: Emalayan Vairavanathan >Cc: Jonathan Monette ; swift user >Sent: Monday, 23 January 2012 7:41 PM >Subject: Re: [Swift-user] Montage+Swift+Coasters > >On Mon, Jan 23, 2012 at 9:21 PM, Emalayan Vairavanathan wrote: > >Hi Ketan, >> >>Please find the attached source code. Also, I couldn't find the SwiftMontage-20120123-1219-zj95uaye/jobs/4/mConcatFit-4o2fb2mk directory inside the workdir. > >Try again with this set to false in your config: >wrapperlog.always.transfer=true > >> >>Please let me know if you need more information. >> >>Thank you >>Emalayan >> >>________________________________ >> From: Ketan Maheshwari >>To: Emalayan Vairavanathan >>Cc: Jonathan Monette ; swift user >>Sent: Monday, 23 January 2012 6:38 PM >>Subject: Re: [Swift-user] Montage+Swift+Coasters >> >>On Mon, Jan 23, 2012 at 4:52 PM, Emalayan Vairavanathan wrote: >> >>Hi Ketan, >>> >>>Please find my answers below. >>> >>>[Ketan] Emalayan, could you also send your Swift source? >>> >>>[Emalayan] Did you ask for the Montage Swift scripts or the swift-0.93 source code? >> >>Montage script >> >>> >>>[Ketan] Have you tried running mConcatFit from within the SwiftMontage-20120123-1219-zj95uaye/jobs/4/mConcatFit-4o2fb2mk directory? >>>[Emalayan] No such directory was created. >> >>It should be in your workdir. >> >>> >>>[Ketan] Are you aware that workers will start automatically *only* on the localhost where the service is running and not on the remote nodes? >>>[Emalayan] Yes, I am aware of this. I ran both the coaster-service and the application scripts on the same node, but I would like to know how to set up workers on other nodes too.
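Ketan's answer to this question (his inline reply that follows) is to start several workers from a loop. In Python that might look like the sketch below, under stated assumptions: the worker argument list mirrors his rough "worker.pl label /path/to/log" and has not been checked against the worker.pl shipped with Swift 0.93; node091 and node092 are hypothetical host names; remote nodes are reached with plain `ssh host command`, which assumes passwordless ssh and that worker.pl is on the remote PATH. `build_worker_commands` and `launch` are hypothetical helpers, and the default is a dry run that only prints the commands.

```python
import shlex
import subprocess


def build_worker_commands(hosts, per_host, worker="worker.pl", logdir="/tmp/worker-logs"):
    """Return one argv list per worker to start.

    ASSUMPTION: the arguments (a label, then a log path) follow the rough form
    quoted in the thread, "worker.pl label /path/to/log"; check the usage of
    your own worker.pl before relying on it. Non-local hosts are reached via ssh.
    """
    commands = []
    for host in hosts:
        for i in range(per_host):
            label = f"{host}-{i}"
            worker_cmd = [worker, label, f"{logdir}/{label}.log"]
            if host == "localhost":
                commands.append(worker_cmd)
            else:
                # ssh hands the joined string to the remote user's shell.
                commands.append(["ssh", host, shlex.join(worker_cmd)])
    return commands


def launch(commands, dry_run=True):
    """Print the commands (dry run) or start each one in the background."""
    procs = []
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            procs.append(subprocess.Popen(cmd))
    return procs


if __name__ == "__main__":
    # Hypothetical node names; substitute the hosts of your cluster.
    launch(build_worker_commands(["localhost", "node091", "node092"], per_host=2))
```

Keep the dry run until the printed commands match what your worker.pl expects; the log directory must also exist on each node.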
>> >> >>You may run worker.pl manually, or better, put it in a for loop in a simple shell script to run multiple workers. The command line is something like: worker.pl label /path/to/log >> >>> >>>Thank you >>>Emalayan >>> >>>________________________________ >>> From: Ketan Maheshwari >>>To: Emalayan Vairavanathan >>>Cc: Jonathan Monette ; swift user >>>Sent: Monday, 23 January 2012 12:57 PM >>>Subject: Re: [Swift-user] Montage+Swift+Coasters >>> >>>Emalayan, could you also send your Swift source? >>> >>>Have you tried running mConcatFit from within the SwiftMontage-20120123-1219-zj95uaye/jobs/4/mConcatFit-4o2fb2mk directory? >>> >>>On Mon, Jan 23, 2012 at 2:25 PM, Emalayan Vairavanathan wrote: >>> >>>I am using swift-0.93. I started only the coaster-service manually, using the following command (workers were started automatically). >>>> >>>>coaster-service -port 1984 -localport 35753 -nosec -------------- next part -------------- A non-text attachment was scrubbed...
Name: SwiftMontage-20120124-1127-9a0xujy4.log.gz Type: application/x-gzip Size: 21155 bytes Desc: not available URL: From jonmon at mcs.anl.gov Tue Jan 24 14:41:41 2012 From: jonmon at mcs.anl.gov (Jonathan Monette) Date: Tue, 24 Jan 2012 14:41:41 -0600 Subject: [Swift-user] Montage+Swift+Coasters In-Reply-To: <1327433855.75566.YahooMailNeo@web39505.mail.mud.yahoo.com> References: <1327017075.46246.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327024520.62801.YahooMailNeo@web39505.mail.mud.yahoo.com> <4D008EF1-8072-4A18-8A5A-18D74A2ACDA2@mcs.anl.gov> <1327346756.49322.YahooMailNeo@web39506.mail.mud.yahoo.com> <1327348246.89718.YahooMailNeo@web39507.mail.mud.yahoo.com> <1327350350.29451.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327359158.12794.YahooMailNeo@web39505.mail.mud.yahoo.com> <1327375262.49286.YahooMailNeo@web39503.mail.mud.yahoo.com> <1327377349.65265.YahooMailNeo@web39505.mail.mud.yahoo.com> <1327433855.75566.YahooMailNeo@web39505.mail.mud.yahoo.com> Message-ID: <96C93FE9-5535-45B5-9D70-1356268B73D5@mcs.anl.gov> Error code 127 means that the command wasn't found. This line was in the log: stderr.txt: /bin/sh: mBackground: command not found So... do you have all the Montage binaries in your path? What happens when you say mBackground in the terminal?
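Jonathan's diagnosis can be reproduced locally: a POSIX shell exits with status 127 when it cannot find a command. The sketch below also shows a likely trap in this thread: a program that `which` finds in an interactive terminal can still be missing from the PATH of the process that actually runs the job. Since the error comes from /bin/sh inside mBackground_wrap.py, the PATH that matters is presumably the one the wrapper inherits, not the one in your terminal. `run_in_sh`, `make_fake_tool` and the name fake_mbackground are hypothetical stand-ins, and the restricted PATH only imitates a worker environment.

```python
import os
import subprocess
import tempfile


def run_in_sh(command, path=None):
    """Run `command` with /bin/sh -c and return the shell's exit status.

    If `path` is given, the child process sees only that PATH, mimicking a job
    environment that differs from your interactive terminal.
    """
    env = None if path is None else {"PATH": path}
    result = subprocess.run(
        ["/bin/sh", "-c", command],
        env=env,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
    )
    if result.returncode == 127:
        # POSIX shells use 127 for "command not found".
        print("exit 127:", result.stderr.strip())
    return result.returncode


def make_fake_tool(name):
    """Create an executable stub that exits 0 in a fresh temp dir; return the dir."""
    bindir = tempfile.mkdtemp()
    tool = os.path.join(bindir, name)
    with open(tool, "w") as f:
        f.write("#!/bin/sh\nexit 0\n")
    os.chmod(tool, 0o755)
    return bindir


if __name__ == "__main__":
    # Stand-in for a Montage binary that lives outside the default PATH.
    bindir = make_fake_tool("fake_mbackground")
    print(run_in_sh("fake_mbackground", path=bindir))          # found on PATH: 0
    print(run_in_sh("fake_mbackground", path="/usr/bin:/bin"))  # not on PATH: 127
```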
On Jan 24, 2012, at 1:37 PM, Emalayan Vairavanathan wrote: > From svemalayan at yahoo.com Tue Jan 24 14:51:29 2012 From: svemalayan at yahoo.com (Emalayan Vairavanathan) Date: Tue, 24 Jan 2012 12:51:29 -0800 (PST) Subject: [Swift-user] Montage+Swift+Coasters In-Reply-To: <96C93FE9-5535-45B5-9D70-1356268B73D5@mcs.anl.gov> References: <1327017075.46246.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327024520.62801.YahooMailNeo@web39505.mail.mud.yahoo.com> <4D008EF1-8072-4A18-8A5A-18D74A2ACDA2@mcs.anl.gov> <1327346756.49322.YahooMailNeo@web39506.mail.mud.yahoo.com> <1327348246.89718.YahooMailNeo@web39507.mail.mud.yahoo.com> <1327350350.29451.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327359158.12794.YahooMailNeo@web39505.mail.mud.yahoo.com> <1327375262.49286.YahooMailNeo@web39503.mail.mud.yahoo.com> <1327377349.65265.YahooMailNeo@web39505.mail.mud.yahoo.com> <1327433855.75566.YahooMailNeo@web39505.mail.mud.yahoo.com> <96C93FE9-5535-45B5-9D70-1356268B73D5@mcs.anl.gov> Message-ID: <1327438289.71241.YahooMailNeo@web39502.mail.mud.yahoo.com> mBackground is in my path (see below), but it is not in my tc file. Should I add it to my tc file as well? $ which mBackground ~/App/Montage_v3.3/bin/mBackground Thank you Emalayan ________________________________ From: Jonathan Monette To: Emalayan Vairavanathan Cc: Ketan Maheshwari ; swift user Sent: Tuesday, 24 January 2012 12:41 PM Subject: Re: [Swift-user] Montage+Swift+Coasters Error code 127 means that the command wasn't found. This line was in the log: stderr.txt: /bin/sh: mBackground: command not found So... do you have all the Montage binaries in your path? What happens when you say mBackground in the terminal? On Jan 24, 2012, at 1:37 PM, Emalayan Vairavanathan wrote: > From jonmon at mcs.anl.gov Tue Jan 24 15:11:15 2012 From: jonmon at mcs.anl.gov (Jonathan Monette) Date: Tue, 24 Jan 2012 15:11:15 -0600 Subject: [Swift-user] Montage+Swift+Coasters In-Reply-To: <96C93FE9-5535-45B5-9D70-1356268B73D5@mcs.anl.gov> References: <1327017075.46246.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327024520.62801.YahooMailNeo@web39505.mail.mud.yahoo.com> <4D008EF1-8072-4A18-8A5A-18D74A2ACDA2@mcs.anl.gov> <1327346756.49322.YahooMailNeo@web39506.mail.mud.yahoo.com> <1327348246.89718.YahooMailNeo@web39507.mail.mud.yahoo.com> <1327350350.29451.YahooMailNeo@web39501.mail.mud.yahoo.com> <1327359158.12794.YahooMailNeo@web39505.mail.mud.yahoo.com> <1327375262.49286.YahooMailNeo@web39503.mail.mud.yahoo.com> <1327377349.65265.YahooMailNeo@web39505.mail.mud.yahoo.com> <1327433855.75566.YahooMailNeo@web39505.mail.mud.yahoo.com> <96C93FE9-5535-45B5-9D70-1356268B73D5@mcs.anl.gov> Message-ID: Ok. Try adding it to your tc. You are only using the local machine, correct? You aren't using some remote cluster, are you? On Jan 24, 2012, at 2:41 PM, Jonathan Monette wrote: > Error code 127 means that the command wasn't found. This line was in the log: > > stderr.txt: /bin/sh: mBackground: command not found > > So... do you have all the Montage binaries in your path? What happens when you say mBackground in the terminal?
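Whether mBackground also belongs in the tc file comes down to what the tc file declares and whether each listed path is executable on the node that runs the job; `which mBackground` in a terminal only reports on that terminal's PATH. A minimal Python sketch, assuming the whitespace-separated layout of the quoted tc files (site, transformation name, path, then optional fields); `check_tc` and its `required` argument are hypothetical, and mBackground appears only because it is the name this thread is chasing.

```python
import os


def check_tc(text, required=()):
    """Check tc lines of the form: site  name  path  [more fields...].

    Returns a list of problem descriptions; an empty list means none were found.
    Blank lines and lines starting with '#' are ignored.
    """
    problems = []
    declared = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        if len(fields) < 3:
            problems.append(f"line {lineno}: expected at least site, name and path")
            continue
        _site, name, path = fields[:3]
        declared.add(name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            problems.append(f"line {lineno}: {name} -> {path} is not an executable file on this node")
    for name in required:
        if name not in declared:
            problems.append(f"{name}: not declared in tc")
    return problems


if __name__ == "__main__":
    tc = (
        "localhost sh /bin/sh null null null\n"
        "localhost mBackground_wrap /home/emalayan/App/montage-swift/SwiftMontage/apps/mBackground_wrap.py null null null\n"
    )
    for problem in check_tc(tc, required=["mBackground_wrap", "mBackground"]):
        print(problem)
```

An empty result only means nothing obvious is wrong on the node where the check ran; with coasters, the view from the worker node is what counts.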
> > On Jan 24, 2012, at 1:37 PM, Emalayan Vairavanathan wrote: > >> > > From turam at mcs.anl.gov Tue Jan 24 17:03:31 2012 From: turam at mcs.anl.gov (Thomas Uram) Date: Tue, 24 Jan 2012 17:03:31 -0600 Subject: [Swift-user] Files staged out with 0.93rc2, but not with trunk In-Reply-To: References: Message-ID: Update: I'm not sure of the intention of Swift rev 4773, "clean up temporary variables (including file removal for the concurrent mapper)" (https://trac.ci.uchicago.edu/swift/changeset/4773), but it's removing the destination files after they've been staged out. I've disabled this for my purposes. If I'm supposed to avoid this functionality in some other way, let me know. Thanks, Tom On Jan 24, 2012, at 11:42 AM, Thomas Uram wrote: > > With the same script, 0.93rc2 stages files out, but trunk does not. Log files are here (which conveniently include the script and config files): > > http://www.mcs.anl.gov/~turam/20120124-1135/hostname-20120124-1131-uk1bexk9.093.log > http://www.mcs.anl.gov/~turam/20120124-1135/hostname-20120124-1128-cf8mo3pe.trunk.log > > In the case of the trunk, the _concurrent directory is created, but files do not exist there. > > I haven't dug deeply into this. I'm using trunk for the ssh-cl provider, but clearly also want the files to be staged out properly. > > Thanks, > Tom > From jonmon at mcs.anl.gov Tue Jan 24 17:32:26 2012 From: jonmon at mcs.anl.gov (Jonathan Monette) Date: Tue, 24 Jan 2012 17:32:26 -0600 Subject: [Swift-user] surveyor Message-ID: Hello, I am trying to run on surveyor from communicado. Below are the config files. What other options are there for auth.defaults? I do not have an ssh key set up on surveyor, so I can't say surveyor.alcf.anl.gov.type=key.
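The only auth.defaults form confirmed in this thread is the one Mike gives in his reply: `<host>.type=password` plus `<host>.username=<user>`, with the password left out so that Swift is expected to prompt for it. A small, hypothetical Python parser for that key=value layout makes it easy to see which fields each host has; it splits each key at its last dot because host names themselves contain dots, and it knows nothing about any other auth.defaults options.

```python
def parse_auth_defaults(text):
    """Group `<host>.<field>=<value>` lines by host name.

    Host names contain dots, so each key is split at its *last* dot.
    Blank lines, '#' comments and keys without a dot are skipped.
    """
    hosts = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        host, _, field = key.strip().rpartition(".")
        if not host:
            continue
        hosts.setdefault(host, {})[field] = value.strip()
    return hosts


if __name__ == "__main__":
    # Mike's example entry from this thread.
    example = (
        "login1.pads.ci.uchicago.edu.type=password\n"
        "login1.pads.ci.uchicago.edu.username=wilde\n"
    )
    for host, fields in parse_auth_defaults(example).items():
        # No "password" field: per Mike's suggestion, Swift is expected to prompt.
        note = "no stored password" if "password" not in fields else "password stored"
        print(host, fields, note)
```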
The only other way to access surveyor is with OTP. Since I am using surveyor as a stepping stone to intrepid (which only has OTP), how do I tell Swift to use OTP in auth.defaults? I was trying to use the new ssh-cl provider, but that is only for execution. Swift complains about not being able to create the shared directory, so I assumed I couldn't say so I have been trying to change it to ssh.

sites.xml
=======
url="surveyor.alcf.anl.gov" />
MTCScienceApps
short
zeptoos
true
21
10000
1
DEBUG
1
900
64
64
/home/jonmon/swift.workdir

tc
=======
surveyor echo /bin/echo INSTALLED INTEL32::LINUX
surveyor cat /bin/cat INSTALLED INTEL32::LINUX
surveyor ls /bin/ls INSTALLED INTEL32::LINUX
surveyor grep /bin/grep INSTALLED INTEL32::LINUX
surveyor sort /bin/sort INSTALLED INTEL32::LINUX
surveyor paste /bin/paste INSTALLED INTEL32::LINUX
surveyor wc /usr/bin/wc INSTALLED INTEL32::LINUX

swiftscript
========
type file;

app (file o) echo()
{
    echo "Hello World!!!" stdout=@o;
}

file output<"hello.txt">;

output = echo();

From wilde at mcs.anl.gov Tue Jan 24 18:04:35 2012
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 24 Jan 2012 18:04:35 -0600 (CST)
Subject: [Swift-user] surveyor
In-Reply-To: 
Message-ID: <400622107.175325.1327449875179.JavaMail.root@zimbra.anl.gov>

Jon, try configuring auth.defaults as you would for a password login, but don't specify a password. Swift should then prompt you for one (with a text prompt, assuming you are running Swift in a non-X setting). E.g.:

login1.pads.ci.uchicago.edu.type=password
login1.pads.ci.uchicago.edu.username=wilde

I don't know if the ssh provider will maintain a connection so that you only need to provide the OTP once. That might be an issue, if not.

- Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From jonmon at mcs.anl.gov Tue Jan 24 18:17:12 2012
From: jonmon at mcs.anl.gov (Jonathan Monette)
Date: Tue, 24 Jan 2012 18:17:12 -0600
Subject: [Swift-user] surveyor
In-Reply-To: <400622107.175325.1327449875179.JavaMail.root@zimbra.anl.gov>
References: <400622107.175325.1327449875179.JavaMail.root@zimbra.anl.gov>
Message-ID: 

That does not seem to work. It never asks me for a password. Below is the output from the terminal.
After the "[null]", the characters are just cycling really fast. If I type my OTP there, it just starts over; you can see the repeated attempts before the exception.

[jonmon at communicado: ~/Workspace/Swift/surveyor]$ swift -tc.file tc.data -sites.file sites.xml echo.swift
Swift trunk swift-r5520 (swift modified locally) cog-r3354

RunID: 20120124-1813-0xh284fa
Progress: time: Tue, 24 Jan 2012 18:13:48 -0600 Initializing:1
[jonmon]
[null]
Progress: time: Tue, 24 Jan 2012 18:13:52 -0600 Initializing site shared directory:1
[jonmon]
[null]
Progress: time: Tue, 24 Jan 2012 18:14:18 -0600 Initializing site shared directory:1
Progress: time: Tue, 24 Jan 2012 18:14:20 -0600 Initializing site shared directory:1
[jonmon]
[null]
EXCEPTION Could not initialize shared directory on surveyor
Caused by: null
Caused by: org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: org.globus.cog.abstraction.impl.file.FileResourceException: Error while communicating with the SSH server on surveyor.alcf.anl.gov:22
Caused by: org.globus.cog.abstraction.impl.file.FileResourceException: Error while communicating with the SSH server on surveyor.alcf.anl.gov:22
Caused by: org.globus.cog.abstraction.impl.common.task.InvalidSecurityContextException: Password Authentication failed
Progress: time: Tue, 24 Jan 2012 18:14:35 -0600 Failed:1
Execution failed:
Password Authentication failed

From jonmon at mcs.anl.gov Tue Jan 24 19:07:23 2012
From: jonmon at mcs.anl.gov (Jonathan Monette)
Date: Tue, 24 Jan 2012 19:07:23 -0600
Subject: [Swift-user] Files staged out with 0.93rc2, but not with trunk
In-Reply-To: 
References: 
Message-ID: <2D9BB293-0D3F-46C7-80D8-EBDCCC82A08E@mcs.anl.gov>

I think that was when Mihael added some garbage collection. I think the thought was that if a file was mapped with the concurrent mapper and was no longer needed, it could be removed. If you want to keep the mapped files around longer, then maybe use the single_file_mapper or the simple_mapper.

From hategan at mcs.anl.gov Wed Jan 25 00:04:47 2012
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 24 Jan 2012 22:04:47 -0800
Subject: [Swift-user] surveyor
In-Reply-To: 
References: <400622107.175325.1327449875179.JavaMail.root@zimbra.anl.gov>
Message-ID: <1327471487.6175.0.camel@blabla>

Use "interactive" instead of "password".

Mihael
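Following Mihael's suggestion, an auth.defaults entry for an OTP-only host would use the same `host.key=value` layout as Mike's example, with `interactive` as the type and no password stored. A minimal sketch, assuming the host name and username that appear in this thread (surveyor.alcf.anl.gov, jonmon); substitute your own:

```properties
# Hypothetical auth.defaults entry for an OTP-only host (host/user taken from this thread).
# "interactive" asks for the secret on the console at connect time;
# no .password line is given because a one-time password changes on every login.
surveyor.alcf.anl.gov.type=interactive
surveyor.alcf.anl.gov.username=jonmon
```

The thread does not settle Mike's question of whether one OTP covers a whole run or whether each new SSH connection prompts again.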
From jonmon at mcs.anl.gov Wed Jan 25 10:22:53 2012
From: jonmon at mcs.anl.gov (Jonathan Monette)
Date: Wed, 25 Jan 2012 10:22:53 -0600
Subject: [Swift-user] surveyor
In-Reply-To: <1327471487.6175.0.camel@blabla>
References: <400622107.175325.1327449875179.JavaMail.root@zimbra.anl.gov>
 <1327471487.6175.0.camel@blabla>
Message-ID: 

interactive did not seem to work either. This time this is what I got before I killed it.

Swift trunk swift-r5520 (swift modified locally) cog-r3354

RunID: 20120125-1013-3iuuhmu7
Progress: time: Wed, 25 Jan 2012 10:13:18 -0600
[null] )
[null] /
[null] -
[null] )
[null] ,/1579>
Progress: time: Wed, 25 Jan 2012 10:13:26 -0600 Initializing site shared directory:1
[null] 2
[null]
[null] !
[null]

From hategan at mcs.anl.gov Wed Jan 25 13:14:28 2012
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 25 Jan 2012 11:14:28 -0800
Subject: [Swift-user] surveyor
In-Reply-To: 
References: <400622107.175325.1327449875179.JavaMail.root@zimbra.anl.gov>
 <1327471487.6175.0.camel@blabla>
Message-ID: <1327518868.3840.1.camel@blabla>

On Wed, 2012-01-25 at 10:22 -0600, Jonathan Monette wrote:
> interactive did not seem to work either. This time this is what I got before I killed it.

Seems to be working to some extent :)

I gather you are not running this on a local machine (something that has the keyboard you are typing on attached to it)?

From jonmon at mcs.anl.gov Wed Jan 25 13:19:05 2012
From: jonmon at mcs.anl.gov (Jonathan Monette)
Date: Wed, 25 Jan 2012 13:19:05 -0600
Subject: [Swift-user] surveyor
In-Reply-To: <1327518868.3840.1.camel@blabla>
References: <400622107.175325.1327449875179.JavaMail.root@zimbra.anl.gov>
 <1327471487.6175.0.camel@blabla> <1327518868.3840.1.camel@blabla>
Message-ID: 

Correct.
I am logged into communicado and trying to have swift execute using surveyor.

From hategan at mcs.anl.gov Wed Jan 25 13:22:39 2012
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 25 Jan 2012 11:22:39 -0800
Subject: [Swift-user] surveyor
In-Reply-To: 
References: <400622107.175325.1327449875179.JavaMail.root@zimbra.anl.gov>
 <1327471487.6175.0.camel@blabla> <1327518868.3840.1.camel@blabla>
Message-ID: <1327519359.4031.2.camel@blabla>

On Wed, 2012-01-25 at 13:19 -0600, Jonathan Monette wrote:
> Correct. I am logged into communicado and trying to have swift execute using surveyor.

Btw, those weird characters are there because Java (at least 1.4 and lower) did not have a console password-masking mechanism. You should just type the password, unless there's something else wrong.

In the meantime I'll check to see if newer versions of Java have a more reasonable solution to the problem.
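On the masking question: since Java 6, `java.io.Console.readPassword` reads a line with terminal echo turned off, and `System.console()` returns null when no console device is available. A minimal sketch of that pattern with a plain-reader fallback; the class and method names are hypothetical, and this is not how Swift or the CoG SSH provider actually prompts.

```java
import java.io.BufferedReader;
import java.io.Console;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;

public class OtpPrompt {

    /**
     * Reads a secret such as a one-time password.
     * With a console (Java 6+), terminal echo is switched off while typing.
     * Without one (console == null), one line is read from the fallback
     * reader, and whatever the user types WILL be echoed.
     * Returns null at end of input.
     */
    static char[] readSecret(String prompt, Console console, BufferedReader fallback)
            throws IOException {
        if (console != null) {
            return console.readPassword("%s", prompt);
        }
        System.out.print(prompt);
        System.out.flush();
        String line = fallback.readLine();
        return line == null ? null : line.toCharArray();
    }

    public static void main(String[] args) throws IOException {
        BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in));
        // System.console() may be null, e.g. when stdin/stdout are redirected.
        char[] otp = readSecret("Password: ", System.console(), stdin);
        if (otp == null) {
            System.err.println("no input");
            return;
        }
        System.out.println("read " + otp.length + " characters");
        Arrays.fill(otp, '\0'); // overwrite the secret once it is no longer needed
    }
}
```

Returning `char[]` rather than `String` lets the caller wipe the secret after use, which is the convention `readPassword` itself follows.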
From jonmon at mcs.anl.gov Wed Jan 25 13:24:39 2012
From: jonmon at mcs.anl.gov (Jonathan Monette)
Date: Wed, 25 Jan 2012 13:24:39 -0600
Subject: [Swift-user] surveyor
In-Reply-To: <1327519359.4031.2.camel@blabla>
References: <400622107.175325.1327449875179.JavaMail.root@zimbra.anl.gov>
 <1327471487.6175.0.camel@blabla> <1327518868.3840.1.camel@blabla>
 <1327519359.4031.2.camel@blabla>
Message-ID: <2E759923-AE96-475B-8D7A-F2A5FCE61078@mcs.anl.gov>

I tried typing in my password. The prompt just showed back up. That was a second run that was copied. I will try again, though.

From iraicu at cs.iit.edu Wed Jan 25 21:04:23 2012
From: iraicu at cs.iit.edu (Ioan Raicu)
Date: Wed, 25 Jan 2012 21:04:23 -0600
Subject: [Swift-user] Fwd: Reminder of CFP for SIGMOD workshop SWEET'12
References: 
Message-ID: <489F81AB-1789-4D56-8E5A-B8DB815A1DDA@cs.iit.edu>

Hi all,

I think this workshop is relevant to the Swift community.

Cheers,
Ioan

--
================================================
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT)
Guest Research Faculty, Argonne National Laboratory (ANL)
================================================
Data-Intensive Distributed Systems Laboratory, CS/IIT
Distributed Systems Laboratory, MCS/ANL
================================================
Office: 1-312-567-5704
Email: iraicu at cs.iit.edu
Web: http://www.cs.iit.edu/~iraicu/
Web: http://datasys.cs.iit.edu/
================================================

Begin forwarded message:

> From: Jan Hidders
> Date: January 25, 2012 4:02:18 PM CST
> To: iraicu at cs.iit.edu
> Subject: Reminder of CFP for SIGMOD workshop SWEET'12
>
> Dear Ioan Raicu,
>
> Given your expertise in the relevant area, we would like to remind you of the SIGMOD workshop SWEET'12 on scalable workflow enactment engines and technologies. Enclosed you will find the final call for papers.
> Please note that the submission deadline, 19 February, is rapidly approaching. We hope you will have the opportunity to submit a high-quality paper.
>
> On behalf of the organizers,
>
> Jan Hidders, TU Delft, The Netherlands
> Paolo Missier, Newcastle University, UK
> Jacek Sroka, University of Warsaw, Poland
>
> *************************
> * Final Call for Papers *
> *************************
>
> SWEET'12
> 1st International Workshop on Scalable Workflow Enactment Engines and Technologies
> http://sites.google.com/site/sweetworkshop2012
> inquiries: sweet2012 at easychair.org
>
> Held in conjunction with SIGMOD 2012
> Scottsdale, Arizona, USA, May 20, 2012
> http://www.sigmod.org/2012/
>
> ----------------
> IMPORTANT DATES:
> ----------------
> Paper submission deadline: February 19th, 2012
> Author notification: April 8th
> Deadline for camera-ready copy: May 13th
> Workshop: May 20
>
> -----
> FOCUS
> -----
> The goal of the workshop is to bring together researchers and practitioners to explore the potential of cloud-based computing in facilitating the convergence between workflows and large-scale data processing. Concretely, the workshop is expected to provide insight into:
>
> - performance issues: efficient data processing using cloud-based workflows,
> - modelling issues: best practices in data-intensive workflow modelling and enactment,
> - support technology issues: how the potential synergy between large-scale data processing and workflow technology can be exploited in a principled way.
>
> The workshop aims to address issues of (i) Architecture, (ii) Models and Languages, (iii) Applications of cloud-based workflows. Specific topics include (but, as usual, are not limited to):
>
> Architectures:
> + cloud-based, scalable workflow enactment architectures,
> + efficient data storage for data-intensive workflows,
> + optimizing execution of data-intensive workflows,
> + workflow scheduling in cloud computing.
>
> Models, Languages:
> + languages for data-intensive workflows, data processing pipelines and data-mashups,
> + verification and validation of data-intensive workflows,
> + programming models for cloud computing,
> + access control and authorisation models, privacy, security, risk and trust issues,
> + workflow patterns for data-intensive workflows.
>
> Applications of cloud-based workflow:
> + bioinformatics,
> + data mashups,
> + semantic web data management,
> + big data analytics.
>
> ----------------
> SUBMISSION GUIDELINES
> ----------------
> We invite full research or experience papers (up to 12 pages), or short papers (up to 6 pages) describing research in progress,
> formatted using the ACM proceedings style (http://www.acm.org/sigs/publications/proceedings-templates)
>
> ----------------
> PUBLICATION
> ----------------
> The workshop proceedings will be part published by CEUR and will be included in the ACM DL.
>
> In addition, we have an agreement with the Fundamenta Informaticae journal to fast-track a few selected papers for further publication.
>
> ---------------------------
> KEYNOTE
> ---------------------------
> Dr. Pawel Garbacki from Google Inc.: "Data Processing at Scale"
>
> ---------------------------
> CHAIRS
> ---------------------------
> Jan Hidders, TU Delft, The Netherlands
> Jacek Sroka, University of Warsaw, Poland
> Paolo Missier, Newcastle University, UK
>
> ---------------------------
> Program Committee
> ---------------------------
>
> Sarah Cohen-Boulakia, LRI, Universite Paris-Sud, France
> Juliana Freire, NYU Poly, USA
> Khalid Belhajjame, University of Manchester, UK
> Vasa Curcin, Imperial college, London, UK
> Paul Groth, VU University Amsterdam, NL
> Paul Watson, Newcastle University, UK
> Hugo Hiden, Newcastle University, UK
> Matthew Jones, University of California Santa Barbara, USA
> Bertram Ludaescher, UC Davis, USA
> Marta Mattoso, COPPE - Federal Univ. Rio de Janeiro, Brasil
> Norman Paton, University of Manchester, UK
> Jelena Pjesivac-Grbovic, Google, USA
> Benjamin Reed, Yahoo! Research
> Yogesh Simmhan, University of Southern California, USA
> Krzysztof Stencel, University of Warsaw, Poland
> Wei Tan, J.T. Watson IBM Research, USA
> Giovanni Tummarello, DERI, National University of Ireland Galway, Ireland
> Jerzy Tyszkiewicz, Institute of Informatics, Warsaw University, PL
> Jan Van Den Bussche, Hasselt University & Transnational University of Limburg, Belgium
> Aad Van Moorsel, Newcastle University, UK, USA
> Simon Woodman, Newcastle University, UK
> Suraj Pandey, University of Melbourne, Australia
> Jianwu Wang, University of California, San Diego, USA

From iraicu at cs.iit.edu Thu Jan 26 15:59:41 2012
From: iraicu at cs.iit.edu (Ioan Raicu)
Date: Thu, 26 Jan 2012 15:59:41 -0600
Subject: [Swift-user] CFP: The 9th Int. Conf. on Autonomic Computing (ICAC 2012) -- San Jose CA
Message-ID: <4F21CCCD.6070106@cs.iit.edu>

CALL FOR PAPERS and WORKSHOP PROPOSALS

The 9th International Conference on Autonomic Computing (ICAC 2012)
September 17-21, 2012. San Jose, CA, USA
http://icac2012.cs.fiu.edu/

-----------------------------------------------------------------
IMPORTANT DATES

Paper and Poster Submission: March 9, 2012, 11:59pm PST
Notification: May 18, 2012
Camera-ready Due: June 8, 2012
Workshop Proposal Submission: February 10, 2012
-----------------------------------------------------------------

OVERVIEW

ICAC is the leading conference on autonomic computing techniques, foundations, and applications. Autonomic computing refers to methods and means for automated management of performance, fault, security, and configuration with little involvement of users or administrators.
Systems introducing new autonomic features are becoming increasingly prevalent, motivating research that spans a variety of areas, from computer systems, networking, software engineering, and data management to machine learning, control theory, and bio-inspired computing. ICAC brings together researchers and practitioners across these disciplines to address multiple facets of adaptation and self-management in computing systems and applications from different perspectives.

Autonomic computing solutions are sought for clouds, grids, data centers, enterprise software, internet services, data services, smart phones, embedded systems, and sensor networks. In these environments, resources and applications must be managed to maximize performance and minimize cost, while maintaining predictable and reliable behavior in the face of varying workloads, failures, and malicious threats.

Papers are solicited from all areas of autonomic computing, including (but not limited to):

* End-to-end techniques for management of resources, workloads, performance, faults, power/cooling, security, and others.
* Self-managing components, such as server, storage, network protocols, or specific application elements, and embedded and mobile end systems such as smart phones.
* Decision and analysis techniques and their use, such as machine learning, control theory, predictive methods, probability and stochastic processes, queuing theory methodologies, emergent behavior, rule-based systems, and bio-inspired techniques.
* Monitoring systems for autonomic computing.
* Hypervisor, operating systems, hardware, or application support for autonomic computing.
* Novel human interfaces for monitoring and controlling autonomic systems.
* Management topics, such as specification and modeling of service-level agreements, behavior enforcement and tie-in with IT governance.
* Toolkits, frameworks, principles and architectures, from software engineering practices and experimental methodologies to agent-based techniques and virtualization.
* Fundamental science and theory of self-managing systems: understanding, controlling or exploiting system behaviors to enforce autonomic properties.
* Applications of autonomic computing and experiences with prototyped or deployed systems solving real-world problems in science, engineering, business and society.

Papers will be judged on originality, significance, interest, correctness, clarity and relevance to the broader community. Papers should report on experiences, measurements, user studies, or other evaluations, as appropriate. Evaluations of a prototype or large-scale deployment of systems and applications are expected.

PAPER AND POSTER SUBMISSIONS

Full papers (a maximum of 10 pages in the two-column ACM proceedings format) and posters (2 pages) are invited on a wide variety of topics relating to autonomic computing. Submitted papers must be original work, and may not be under consideration for another conference or journal. Complete formatting and submission instructions can be found on the conference web site. Accepted papers and posters will appear in proceedings distributed at the conference and available electronically. Relevant top ICAC'12 papers will be invited for "fast-track" submissions to the ACM Transactions on Autonomous and Adaptive Systems (TAAS).

WORKSHOPS, DEMONSTRATIONS AND EXHIBITION

ICAC'12 welcomes proposals for co-located workshops on topics of interest to the autonomic computing community. Workshop proposals should be submitted to the Workshop Chair, Fred Douglis (f.douglis at computer.org) by February 10, 2012. Workshops are expected to publish proceedings, and should cover areas that complement the main program.
ICAC'12 will also feature a demonstration and exhibition session consisting of prototypes and technology artifacts, such as those demonstrating autonomic software or autonomic computing principles. Entries will be judged by a separate committee led by the demo/exhibit chair.

INDUSTRY SESSION

One of ICAC's important roles is to bring together researchers and practitioners from academia and industry. In its industry session, ICAC helps fulfill this role by presenting an industry viewpoint on technologies, products, and market needs. The industry session also addresses current challenges and opportunities for academic and corporate research collaborations. We encourage industry leaders, including entrepreneurs, product developers, architects, managers, marketers and end users, to submit their papers and posters reflecting such industry perspectives as part of the regular submission process.

------------------------------------------------------------------
ORGANIZERS

GENERAL CHAIR
Dejan Milojicic, HP Labs

PROGRAM CHAIRS
Dongyan Xu, Purdue University
Vanish Talwar, HP Labs

INDUSTRY CHAIR
Xiaoyun Zhu, VMware

WORKSHOPS CHAIR
Fred Douglis, EMC

POSTERS/DEMO/EXHIBITS CHAIR
Eno Thereska, Microsoft Research

FINANCE CHAIR
Michael Kozuch, Intel

LOCAL ARRANGEMENT CHAIR
Jessica Blaine

PUBLICITY CHAIRS
Daniel Batista, University of São Paulo
Vartan Padaryan, ISP/Russian Academy of Sci.
Ioan Raicu, Illinois Inst. of Technology
Jianfeng Zhan, ICT/Chinese Academy of Sci.
Ming Zhao, Florida Intl. University

PROGRAM COMMITTEE
Tarek Abdelzaher, UIUC
Umesh Bellur, IIT, Bombay
Ken Birman, Cornell University
Rajkumar Buyya, Univ. of Melbourne
Rocky Chang, Hong Kong Polytechnic University
Yuan Chen, HP Labs
Alva Couch, Tufts University
Peter Dinda, Northwestern University
Fred Douglis, EMC
Renato Figueiredo, University of Florida
Mohamed Hefeeda, Qatar Computing Research Institute
Joe Hellerstein, Google
Geoff Jiang, NEC Labs
Jeff Kephart, IBM Research
Emre Kiciman, Microsoft Research
Fabio Kon, University of São Paulo
Michael Kozuch, Intel
Dejan Milojicic, HP Labs
Klara Nahrstedt, UIUC
Priya Narasimhan, CMU
Manish Parashar, Rutgers University
Ioan Raicu, Illinois Inst. of Technology
Omer Rana, Cardiff University
Masoud Sadjadi, Florida Intl. University
Rick Schlichting, AT&T Labs
Hartmut Schmeck, KIT
Karsten Schwan, Georgia Tech
Onn Shehory, IBM Research
Eno Thereska, Microsoft Research
Xiaoyun Zhu, VMware

-- 
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT)
Guest Research Faculty, Argonne National Laboratory (ANL)
=================================================================
Data-Intensive Distributed Systems Laboratory, CS/IIT
Distributed Systems Laboratory, MCS/ANL
=================================================================
Cel: 1-847-722-0876
Office: 1-312-567-5704
Email: iraicu at cs.iit.edu
Web: http://www.cs.iit.edu/~iraicu/
Web: http://datasys.cs.iit.edu/
=================================================================
=================================================================

From iraicu at cs.iit.edu Thu Jan 26 16:35:52 2012
From: iraicu at cs.iit.edu (Ioan Raicu)
Date: Thu, 26 Jan 2012 16:35:52 -0600
Subject: [Swift-user] CFP: 3rd Cloud Futures Workshop 2012: Hot Topics in Research and Education -- Berkeley CA
Message-ID: <4F21D548.6040606@cs.iit.edu>

*3rd Cloud Futures Workshop 2012*: Hot Topics in Research and Education
May 7--8, 2012 | Berkeley, California, United States
*Website:* http://research.microsoft.com/en-US/events/cloudfutures2012/

Cloud computing is an
exciting platform for research and education. It has the potential to advance scientific and technological progress by making data and computing resources readily available at unprecedented economy of scale and nearly infinite scalability. To realize the full promise of cloud computing for research and education, however, we must think about the cloud as a holistic platform for creating new services, new experiences, and new methods to pursue research, teaching, and scholarly communication. This goal presents a broad range of interesting questions.

The Cloud Futures Workshop series brings together thought leaders from academia, industry, and government to discuss the role of cloud computing across a variety of research and educational areas. Presentations, posters and discussions will investigate how new programming techniques, software platforms, software engineering and methodology, and methods of research and teaching in the cloud may solve distinct challenges arising in diverse areas of society.

*General Co-Chairs*
Michael J. Franklin, University of California, Berkeley
Tony Hey, Microsoft Research

*Keynote speakers*
Joseph L. Hellerstein, Manager, Big Science, Google Inc.
Yousef Khalidi, Distinguished Engineer, Microsoft Corporation

*Call for Abstracts and Participation*
This year, we are looking for extended abstracts on hot topics in cloud computing to be presented at the workshop, either as talks or posters. Abstracts should highlight how new techniques and methods of research in the cloud may solve distinct challenges arising in diverse areas, including computer science, engineering, earth sciences, healthcare, humanities, interactive games, life sciences, and social sciences. We encourage abstracts that describe practical experiences, experimental results, and vision papers. All papers will be peer-reviewed.

*Submission Instructions*
- Submit abstracts of five pages, including references.
- Your submission should include a bio (150 words maximum).
- Submit your abstracts through the online form.

*Important Dates*
- Abstracts due: February 29, 2012
- Results available: March 23, 2012
- Workshop: May 7--8, 2012

*About the Workshop*
The Cloud Futures 2012 workshop is a joint venture between the Microsoft Research Connections, Azure Research Engagement, and Developer & Platform Evangelism Academic groups, and is in association with and co-supported by the University of California, Berkeley.

-- 
=================================================================
Ioan Raicu, Ph.D.
Assistant Professor, Illinois Institute of Technology (IIT)
Guest Research Faculty, Argonne National Laboratory (ANL)
=================================================================
Data-Intensive Distributed Systems Laboratory, CS/IIT
Distributed Systems Laboratory, MCS/ANL
=================================================================
Cel: 1-847-722-0876
Office: 1-312-567-5704
Email: iraicu at cs.iit.edu
Web: http://www.cs.iit.edu/~iraicu/
Web: http://datasys.cs.iit.edu/
=================================================================
=================================================================
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jonmon at mcs.anl.gov Thu Jan 26 23:55:50 2012
From: jonmon at mcs.anl.gov (Jonathan Monette)
Date: Thu, 26 Jan 2012 23:55:50 -0600
Subject: [Swift-user] provider staging config file
Message-ID: <8E56CC45-33A4-4D0D-BFA8-10019F037080@mcs.anl.gov>

Does anyone have a swift config file that uses provider staging that they could provide?

From svemalayan at yahoo.com Fri Jan 27 00:24:21 2012
From: svemalayan at yahoo.com (Emalayan Vairavanathan)
Date: Thu, 26 Jan 2012 22:24:21 -0800 (PST)
Subject: [Swift-user] Swift temporary files / logfiles
Message-ID: <1327645461.7657.YahooMailNeo@web39501.mail.mud.yahoo.com>

Dear All,

When I run an application with Swift, I can see some temporary files (log files and other intermediate files) being created inside the working directory. Is there any way to switch off or reduce these I/O operations during performance runs?

Thank you
Emalayan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ketancmaheshwari at gmail.com Fri Jan 27 08:05:15 2012
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Fri, 27 Jan 2012 08:05:15 -0600
Subject: [Swift-user] provider staging config file
In-Reply-To: <8E56CC45-33A4-4D0D-BFA8-10019F037080@mcs.anl.gov>
References: <8E56CC45-33A4-4D0D-BFA8-10019F037080@mcs.anl.gov>
Message-ID:

Here is the one I use:

wrapperlog.always.transfer=false
sitedir.keep=true
execution.retries=2
lazy.errors=true
status.mode=provider
use.provider.staging=true
provider.staging.pin.swiftfiles=false
foreach.max.threads=300
======


On Thu, Jan 26, 2012 at 11:55 PM, Jonathan Monette wrote:

> Does anyone have a swift config file that uses provider staging that they
> could provide?
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
>



-- 
Ketan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jonmon at mcs.anl.gov Fri Jan 27 08:15:31 2012
From: jonmon at mcs.anl.gov (Jonathan Monette)
Date: Fri, 27 Jan 2012 08:15:31 -0600
Subject: [Swift-user] provider staging config file
In-Reply-To:
References: <8E56CC45-33A4-4D0D-BFA8-10019F037080@mcs.anl.gov>
Message-ID: <1A4E483A-1D9C-4154-B75A-AA1E8C4A78AD@mcs.anl.gov>

Thanks Ketan!

On Jan 27, 2012, at 8:05 AM, Ketan Maheshwari wrote:

> Here is the one I use:
>
> wrapperlog.always.transfer=false
> sitedir.keep=true
> execution.retries=2
> lazy.errors=true
> status.mode=provider
> use.provider.staging=true
> provider.staging.pin.swiftfiles=false
> foreach.max.threads=300
> ======
>
>
> On Thu, Jan 26, 2012 at 11:55 PM, Jonathan Monette wrote:
> Does anyone have a swift config file that uses provider staging that they could provide?
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
>
>
>
> --
> Ketan
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ketancmaheshwari at gmail.com Sun Jan 29 09:57:01 2012
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Sun, 29 Jan 2012 09:57:01 -0600
Subject: [Swift-user] Swift temporary files / logfiles
In-Reply-To: <1327645461.7657.YahooMailNeo@web39501.mail.mud.yahoo.com>
References: <1327645461.7657.YahooMailNeo@web39501.mail.mud.yahoo.com>
Message-ID:

Emalayan,

Swift uses log4j for logging. You should be able to tune the log4j.properties file to decrease the amount of logging produced. Alternatively, you could disable log4j completely in the sources.

Regards,
Ketan

On Fri, Jan 27, 2012 at 12:24 AM, Emalayan Vairavanathan <
svemalayan at yahoo.com> wrote:

> Dear All,
>
> When I run a application with Swift, I can see some temporary files (log
> files and other intermediate files ) are being created inside working
> directory. Is there any way to switch off / reduce these IO operations
> during performance runs ?
>
> Thank you
> Emalayan
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
>



-- 
Ketan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From svemalayan at yahoo.com Mon Jan 30 13:50:39 2012
From: svemalayan at yahoo.com (Emalayan Vairavanathan)
Date: Mon, 30 Jan 2012 11:50:39 -0800 (PST)
Subject: [Swift-user] Swift temporary files / logfiles
In-Reply-To:
References: <1327645461.7657.YahooMailNeo@web39501.mail.mud.yahoo.com>
Message-ID: <1327953039.61961.YahooMailNeo@web39508.mail.mud.yahoo.com>

Hi Ketan,

Thank you very much. By the way, how can I tune log4j.properties or disable log4j? (Should I disable it via the config file?)

Thank you
Emalayan

________________________________
From: Ketan Maheshwari
To: Emalayan Vairavanathan
Cc: swift user
Sent: Sunday, 29 January 2012 7:57 AM
Subject: Re: [Swift-user] Swift temporary files / logfiles

Emalayan,

Swift uses log4j for logging. You should be able to tune the log4j.properties to decrease the amount of logs produced. Alternatively, you could completely disable log4j from sources.

Regards,
Ketan

On Fri, Jan 27, 2012 at 12:24 AM, Emalayan Vairavanathan wrote:

Dear All,
>
>
>When I run a application with Swift, I can see some temporary files (log files and other intermediate files ) are being created inside working directory. Is there any way to switch off / reduce these IO operations during performance runs ?
>
>
>Thank you
>Emalayan
>
>_______________________________________________
>Swift-user mailing list
>Swift-user at ci.uchicago.edu
>https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
>

-- 
Ketan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From svemalayan at yahoo.com Mon Jan 30 13:53:33 2012
From: svemalayan at yahoo.com (Emalayan Vairavanathan)
Date: Mon, 30 Jan 2012 11:53:33 -0800 (PST)
Subject: [Swift-user] Montage+Swift+Coasters
References: <1327017075.46246.YahooMailNeo@web39501.mail.mud.yahoo.com>
 <1327024520.62801.YahooMailNeo@web39505.mail.mud.yahoo.com>
 <4D008EF1-8072-4A18-8A5A-18D74A2ACDA2@mcs.anl.gov>
 <1327346756.49322.YahooMailNeo@web39506.mail.mud.yahoo.com>
 <1327348246.89718.YahooMailNeo@web39507.mail.mud.yahoo.com>
 <1327350350.29451.YahooMailNeo@web39501.mail.mud.yahoo.com>
 <1327359158.12794.YahooMailNeo@web39505.mail.mud.yahoo.com>
 <1327375262.49286.YahooMailNeo@web39503.mail.mud.yahoo.com>
 <1327377349.65265.YahooMailNeo@web39505.mail.mud.yahoo.com>
 <1327433855.75566.YahooMailNeo@web39505.mail.mud.yahoo.com>
 <96C93FE9-5535-45B5-9D70-1356268B73D5@mcs.anl.gov>
Message-ID: <1327953213.34872.YahooMailNeo@web39502.mail.mud.yahoo.com>

Hi Jon,

I added mBackground to the tc file, but I am still getting the same error.

To answer your questions: yes, I am using only the local machine for now (no workers are started on remote nodes).

By the way, how can we debug the problem? Do you think a Skype session would be a good idea?

Thank you
Emalayan

________________________________
From: Jonathan Monette
To: Emalayan Vairavanathan
Cc: swift user
Sent: Tuesday, 24 January 2012 1:11 PM
Subject: Re: [Swift-user] Montage+Swift+Coasters

Ok. Try adding it to your tc. You are only using the local machine, correct? You aren't using some remote cluster, are you?

On Jan 24, 2012, at 2:41 PM, Jonathan Monette wrote:

> Error code 127 means that the command wasn't found. This line was in the log
>
> stderr.txt: /bin/sh: mBackground: command not found
>
> So... do you have all the Montage binaries in your path? What happens when you say mBackground in the terminal?
>
> On Jan 24, 2012, at 1:37 PM, Emalayan Vairavanathan wrote:
>
>>
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ketancmaheshwari at gmail.com Mon Jan 30 14:06:48 2012
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Mon, 30 Jan 2012 14:06:48 -0600
Subject: Re: [Swift-user] Swift temporary files / logfiles
In-Reply-To: <1327953039.61961.YahooMailNeo@web39508.mail.mud.yahoo.com>
References: <1327645461.7657.YahooMailNeo@web39501.mail.mud.yahoo.com>
 <1327953039.61961.YahooMailNeo@web39508.mail.mud.yahoo.com>
Message-ID:

I haven't tested this, but you should be able to disable logging by setting the log level to OFF in the log4j.properties file.

On Mon, Jan 30, 2012 at 1:50 PM, Emalayan Vairavanathan <
svemalayan at yahoo.com> wrote:

> Hi Ketan,
>
> Thank you very much. By the way how I can tune log4j.properties /
> disabling log4j ? (Should I disable it via config file ?)
>
> Thank you
> Emalayan
>
> ------------------------------
> *From:* Ketan Maheshwari
> *To:* Emalayan Vairavanathan
> *Cc:* swift user
> *Sent:* Sunday, 29 January 2012 7:57 AM
> *Subject:* Re: [Swift-user] Swift temporary files / logfiles
>
> Emalayan,
>
> Swift uses log4j for logging. You should be able to tune the
> log4j.properties to decrease the amount of logs produced. Alternatively,
> you could completely disable log4j from sources.
>
> Regards,
> Ketan
>
> On Fri, Jan 27, 2012 at 12:24 AM, Emalayan Vairavanathan <
> svemalayan at yahoo.com> wrote:
>
> Dear All,
>
> When I run a application with Swift, I can see some temporary files (log
> files and other intermediate files ) are being created inside working
> directory. Is there any way to switch off / reduce these IO operations
> during performance runs ?
>
> Thank you
> Emalayan
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
>
>
>
>
> --
> Ketan
>
>
>
>
>
-- 
Ketan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From turam at mcs.anl.gov Tue Jan 31 17:22:52 2012
From: turam at mcs.anl.gov (Thomas Uram)
Date: Tue, 31 Jan 2012 17:22:52 -0600
Subject: [Swift-user] ProxyPathValidatorException: No relevant signing policy for CA
Message-ID:

I'm encountering the following error when running on PADS via coaster/ssh:pbs, with Swift running on various CI machines, including login1.pads.ci.uchicago.edu itself. As another data point, gsissh to login1.pads.ci.uchicago.edu works with this proxy certificate; I would guess that gsissh validates the signing policy, too.

Authentication failed. Caused by Defective credential detected. Caused by org.globus.gsi.proxy.ProxyPathValidatorException: No relevant signing policy for CA "/DC=edu/DC=uchicago/DC=ci/OU=myproxy/CN=grid.ci.uchicago.edu/E=support at ci.uchicago.edu" in file "/etc/grid-security/certificates/de4bc9f5.signing_policy"
	at org.globus.gsi.proxy.ProxyPathValidator.checkSigningPolicy(ProxyPathValidator.java:978)
	at org.globus.gsi.proxy.ProxyPathValidator.validate(ProxyPathValidator.java:555)
	at org.globus.gsi.proxy.ProxyPathValidator.validate(ProxyPathValidator.java:354)
	at org.globus.gsi.gssapi.GlobusGSSContextImpl$GSSProxyPathValidator.validate(GlobusGSSContextImpl.java:695)
	at org.globus.gsi.gssapi.GlobusGSSContextImpl.verifyChain(GlobusGSSContextImpl.java:731)
	at org.globus.gsi.gssapi.GlobusGSSContextImpl.acceptSecContext(GlobusGSSContextImpl.java:325)
	at org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:129)
	at org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147)
	at org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177)
	at org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:41)
	at org.globus.cog.karajan.workflow.service.channels.GSSChannel.<init>(GSSChannel.java:46)
	at org.globus.cog.karajan.workflow.service.ConnectionHandler.<init>(ConnectionHandler.java:44)
	at org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:71)
	at org.globus.net.BaseServer.run(BaseServer.java:247)
	at java.lang.Thread.run(Thread.java:662)


*** signing policy file

cat /etc/grid-security/certificates/de4bc9f5.signing_policy
# Computation Institute MyProxy Certificate Authority Signing Policy
# generated by gx-ca-update (gx-map 0.5.3.3)
# See also

access_id_CA X509 '/DC=edu/DC=uchicago/DC=ci/OU=myproxy/CN=grid.ci.uchicago.edu/emailAddress=support at ci.uchicago.edu'
pos_rights globus CA:sign
cond_subjects globus '/DC=edu/DC=uchicago/DC=ci/*'

*** sites.xml

2
300
1
1
1
1
fast
5.99
10000
/home/turam/tmp

From wilde at mcs.anl.gov Tue Jan 31 17:29:34 2012
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 31 Jan 2012 17:29:34 -0600 (CST)
Subject: [Swift-user] ProxyPathValidatorException: No relevant signing policy for CA
In-Reply-To:
Message-ID: <1012074326.198750.1328052574660.JavaMail.root@zimbra.anl.gov>

Tom, just a quick thing to try (as I can't look into this more thoughtfully at the moment):

In various CI settings I've had to use more recent CA signing policy files. Can you try doing this setup to get those into your environment?

source /opt/osg/setup.sh

This package is installed, I think, on bridled and communicado. Can you try from one of those hosts to PADS after sourcing that setup.sh?

Also make sure you are not manually setting X509_CERT_DIR or X509_CADIR to point to some out-of-date CA files.

- Mike

----- Original Message -----
> From: "Thomas Uram"
> To: "swift user"
> Sent: Tuesday, January 31, 2012 5:22:52 PM
> Subject: [Swift-user] ProxyPathValidatorException: No relevant signing policy for CA
> I'm encountering the following running on PADS via coaster/ssh:pbs ,
> running on various CI machines, including login1.pads.ci.uchicago.edu
> itself. As another datapoint, gsissh works to
> login1.pads.ci.uchicago.edu using this proxy certificate; I would
> guess gsissh would be validating the signing policy, too.
>
> Authentication failed. Caused by Defective credential detected. Caused
> by org.globus.gsi.proxy.ProxyPathValidatorException: No relevant
> signing policy for CA
> "/DC=edu/DC=uchicago/DC=ci/OU=myproxy/CN=grid.ci.uchicago.edu/E=support at ci.uchicago.edu"
> in file "/etc/grid-security/certificates/de4bc9f5.signing_policy"
> at
> org.globus.gsi.proxy.ProxyPathValidator.checkSigningPolicy(ProxyPathValidator.java:978)
> at
> org.globus.gsi.proxy.ProxyPathValidator.validate(ProxyPathValidator.java:555)
> at
> org.globus.gsi.proxy.ProxyPathValidator.validate(ProxyPathValidator.java:354)
> at
> org.globus.gsi.gssapi.GlobusGSSContextImpl$GSSProxyPathValidator.validate(GlobusGSSContextImpl.java:695)
> at
> org.globus.gsi.gssapi.GlobusGSSContextImpl.verifyChain(GlobusGSSContextImpl.java:731)
> at
> org.globus.gsi.gssapi.GlobusGSSContextImpl.acceptSecContext(GlobusGSSContextImpl.java:325)
> at
> org.globus.gsi.gssapi.net.GssSocket.authenticateServer(GssSocket.java:129)
> at
> org.globus.gsi.gssapi.net.GssSocket.startHandshake(GssSocket.java:147)
> at
> org.globus.gsi.gssapi.net.GssSocket.getInputStream(GssSocket.java:177)
> at
> org.globus.cog.karajan.workflow.service.channels.AbstractTCPChannel.setSocket(AbstractTCPChannel.java:41)
> at
> org.globus.cog.karajan.workflow.service.channels.GSSChannel.<init>(GSSChannel.java:46)
> at
> org.globus.cog.karajan.workflow.service.ConnectionHandler.<init>(ConnectionHandler.java:44)
> at
> org.globus.cog.abstraction.coaster.service.local.LocalService.handleConnection(LocalService.java:71)
> at org.globus.net.BaseServer.run(BaseServer.java:247)
> at java.lang.Thread.run(Thread.java:662)
>
>
> *** signing policy file
>
> cat /etc/grid-security/certificates/de4bc9f5.signing_policy
> # Computation Institute MyProxy Certificate Authority Signing Policy
> # generated by gx-ca-update (gx-map 0.5.3.3)
> # See also
>
> access_id_CA X509
> '/DC=edu/DC=uchicago/DC=ci/OU=myproxy/CN=grid.ci.uchicago.edu/emailAddress=support at ci.uchicago.edu'
> pos_rights globus CA:sign
> cond_subjects globus '/DC=edu/DC=uchicago/DC=ci/*'
>
> *** sites.xml
>
>
>
> url="login1.pads.ci.uchicago.edu"/>
>
> 2
> 300
> 1
> 1
> 1
> 1
> fast
> 5.99
> 10000
> /home/turam/tmp
>
>
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory
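
A note on the signing-policy error in this thread: the CA subject reported in the exception ends in `/E=support at ci.uchicago.edu`, while the `access_id_CA` entry in `de4bc9f5.signing_policy` spells the same attribute as `/emailAddress=`. One possible reading, which is an assumption and was not confirmed on the list, is that the Java GSI code compares the two DNs as literal strings, so the differing spellings never match. The Python sketch that follows only illustrates that hypothesis: it parses `access_id_CA` lines from a signing-policy text and checks the CA subject both literally and after rewriting the `E` alias to `emailAddress`. The helper names (`normalize_dn`, `policy_ca_subjects`, `policy_covers`) and the one-entry alias table are hypothetical, and the archive's ` at ` has been written back as `@`.

```python
import shlex

# Hypothetical alias table: only the alias visible in this thread
# ("E" vs "emailAddress") is included.
RDN_ALIASES = {"E": "emailAddress"}


def normalize_dn(dn: str) -> str:
    """Rewrite aliased attribute names in a slash-separated DN.

    Assumes no attribute value contains a '/' character.
    """
    parts = []
    for rdn in dn.split("/"):
        key, sep, value = rdn.partition("=")
        if sep:
            key = RDN_ALIASES.get(key, key)
        parts.append(key + sep + value)
    return "/".join(parts)


def policy_ca_subjects(policy_text: str) -> list:
    """Collect the CA subjects named on 'access_id_CA X509' lines."""
    subjects = []
    for line in policy_text.splitlines():
        # shlex removes the single quotes around the DN and drops '#' comments.
        tokens = shlex.split(line, comments=True)
        if len(tokens) >= 3 and tokens[:2] == ["access_id_CA", "X509"]:
            subjects.append(tokens[2])
    return subjects


def policy_covers(policy_text: str, ca_dn: str, normalize: bool = False) -> bool:
    """Return True if ca_dn equals one of the policy's access_id_CA subjects."""
    subjects = policy_ca_subjects(policy_text)
    if normalize:
        ca_dn = normalize_dn(ca_dn)
        subjects = [normalize_dn(s) for s in subjects]
    return ca_dn in subjects


# Signing policy and CA subject as quoted in the thread (with '@' restored).
POLICY = """\
# Computation Institute MyProxy Certificate Authority Signing Policy
access_id_CA X509 '/DC=edu/DC=uchicago/DC=ci/OU=myproxy/CN=grid.ci.uchicago.edu/emailAddress=support@ci.uchicago.edu'
pos_rights globus CA:sign
cond_subjects globus '/DC=edu/DC=uchicago/DC=ci/*'
"""

CA_FROM_ERROR = (
    "/DC=edu/DC=uchicago/DC=ci/OU=myproxy/CN=grid.ci.uchicago.edu"
    "/E=support@ci.uchicago.edu"
)

if __name__ == "__main__":
    print("literal match:", policy_covers(POLICY, CA_FROM_ERROR))
    print("normalized match:", policy_covers(POLICY, CA_FROM_ERROR, normalize=True))
```

Run as a script, this reports no literal match and a match after normalization. It demonstrates only the string comparison; it does not show which form the Java library actually uses, so trying the newer CA signing-policy files, as Mike suggests, remains the practical test.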