From benc at hawaga.org.uk Mon Dec 1 12:59:20 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 1 Dec 2008 18:59:20 +0000 (GMT)
Subject: [Swift-devel] User perspective on how an app procedure call maps into an application executable call
Message-ID:

I wrote the below in an attempt to document what I perceive to be the key user-perceived behaviour of how an app procedure call in SwiftScript is executed on a worker.

This is distinct from how the present implementation causes the below to actually happen - I plan to write a related note for that.

In part, I hope this (and hopefully the related implementation note) will be useful in discussing different ways in which Swift can manage data. From a SwiftScript perspective, the below note should document what I think needs to be provided by any modified implementation.

Discussion welcome.

How an app procedure call maps into an application executable call, from a Swift user perspective, attempting to avoid the mechanics inside Swift.
===========

In some of these notes, there is reference to this example Swift program:

type file;

app (file o) count(file i) {
  wc @i stdout=@o;
}

file q <"input.txt">;
file r <"output.txt">;

The executable for wc will be looked up in tc.data.

This unix executable will then be executed in some //application procedure workspace//. This means:

Each application procedure workspace will have an application workspace directory. (TODO: can collapse terms //application procedure workspace// and //application workspace directory// ?)

This application workspace directory will not be shared with any other //application procedure execution attempt//; all application procedure execution attempts will run with distinct application procedure workspaces.
(For the avoidance of doubt: if a //SwiftScript procedure invocation// is subject to multiple application procedure execution attempts (due to Swift-level restarts, retries or replication) then each of those application procedure execution attempts will be made in a different application procedure workspace.)

The application workspace directory will be a directory on a POSIX filesystem accessible throughout the application execution by the application executable.

Before the //application executable// is executed:

* The application workspace directory will exist.

* The //input files// will exist inside the application workspace directory (but not necessarily as direct children; there may be subdirectories within the application workspace directory).

* The //input files// will be those files //mapped// to //input parameters// of the application procedure invocation. (In the example, this means that the file input.txt will exist in the application workspace directory)

* For each input file dataset, it will be the case that @filename or @filenames invoked with that dataset as a parameter will return the path relative to the application workspace directory for the file(s) that are associated with that dataset. (In the example, that means that @i will evaluate to the path "input.txt")

* For each //file-bound// parameter of the Swift procedure invocation, the associated files (determined by data type?) will always exist.

* The input files must be treated as read only files. This may or may not be enforced by unix file system permissions.
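The preconditions above can be illustrated with a small sketch. This is not how Swift or wrapper.sh does it; the function names `prepare_workspace` and `preconditions_hold` are hypothetical, and the only assumptions taken from the note are that each execution attempt gets its own directory, that inputs appear at workspace-relative paths (possibly in subdirectories), and that inputs are to be treated as read only.

```python
import stat
import tempfile
from pathlib import Path


def prepare_workspace(base_dir, input_files):
    """Create a fresh workspace for one execution attempt and stage inputs.

    input_files maps a workspace-relative path (what @filename would
    return in the note's terms) to the bytes that file should contain.
    Hypothetical helper for illustration only.
    """
    # mkdtemp creates a new, uniquely named directory, so two attempts
    # never share a workspace.
    workspace = Path(tempfile.mkdtemp(prefix="attempt-", dir=base_dir))
    for rel_path, content in input_files.items():
        target = workspace / rel_path
        # Inputs need not be direct children of the workspace.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        # Read-only is a convention; permissions are one optional way
        # of enforcing it.
        target.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return workspace


def preconditions_hold(workspace, input_files):
    """Check that the workspace exists and every input file is present."""
    return workspace.is_dir() and all(
        (workspace / rel_path).is_file() for rel_path in input_files
    )
```

Calling `prepare_workspace` twice with the same inputs yields two different directories, which mirrors the rule that retries and replicas never share a workspace.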
During/after the //application executable execution//, the following must be true:

* If the application executable execution was successful (in the opinion of the application executable), then the application executable should exit with //unix return code// 0; if the application executable execution was unsuccessful (in the opinion of the application executable), then the application executable should exit with //unix return code// not equal to 0.

* Each file mapped from an output parameter of the SwiftScript procedure call must exist. Files will be mapped in the same way as for input files.

(? Is it defined that output subdirectories will be precreated before execution or should app executables expect to make them? That's probably determined by the present behaviour of wrapper.sh)

Things to not assume:

* anything about the path of the application workspace directory

* that either the application workspace directory will be deleted or will continue to exist or will remain unmodified after execution has finished

* that files can be passed(?def) between application procedure invocations through any mechanism except through files known to Swift through the mapping mechanism (there is some exception here for extern datasets - there is a separate set of assertions that hold for extern datasets)

--

From benc at hawaga.org.uk Mon Dec 1 16:00:16 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 1 Dec 2008 22:00:16 +0000 (GMT)
Subject: [Swift-devel] notes on how swift implements file input and output
In-Reply-To:
References:
Message-ID:

read this in conjunction with previous note, "Subject: User perspective on how an app procedure call maps into an application executable call"

This note details the implementation of Swift file input and output in application blocks; it is intended to be read in conjunction with a previous note 'How an app procedure call maps into an application call, from a Swift user perspective, attempting to avoid the mechanics inside Swift.'
Swift executes application procedures on one or more //sites//.

Each site consists of:

* worker nodes. There is some //execution mechanism// through which the Swift client side executable can execute its //wrapper script// on those worker nodes. This is commonly GRAM or Falkon or coasters.

* a site-shared file system. This site shared filesystem is accessible through some //file transfer mechanism// from the Swift client side executable. This is commonly GridFTP or coasters. This site shared filesystem is also accessible through the posix file system on all worker nodes, mounted at the same location as seen through the file transfer mechanism. Swift is configured with the location of some //site working directory// on that site-shared file system.

There is no assumption that the site shared file system for one site is accessible from another site.

For each workflow run, on each site that is used by that run, a //run directory// is created in the site working directory, by the Swift client side.

In that run directory are placed several subdirectories:

* shared/ - site shared files cache
* kickstart/ - when kickstart is used, kickstart record files for each job that has generated a kickstart
* info/ - wrapper script log files
* status/ - job status files
* jobs/ - //application workspace directories// (optionally placed here - see below)

Application execution looks like this:

For each application procedure call:

The Swift client side selects a site; copies the input files for that procedure call to the site shared file cache if they are not already in the cache, using the file transfer mechanism; and then invokes the wrapper script on that site using the execution mechanism.
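The run directory layout and the "copy to the cache only if not already there" step described above can be sketched as follows. This is an illustration under stated assumptions, not Swift's implementation: `create_run_directory` and `stage_in` are hypothetical names, and a local file copy stands in for the file transfer mechanism (GridFTP or coasters in the note).

```python
import shutil
from pathlib import Path

# Subdirectory names as listed in the note.
RUN_SUBDIRS = ("shared", "kickstart", "info", "status", "jobs")


def create_run_directory(site_working_dir, run_id):
    """Create <site working dir>/<run id>/ with the subdirectories above."""
    run_dir = Path(site_working_dir) / run_id
    for name in RUN_SUBDIRS:
        (run_dir / name).mkdir(parents=True, exist_ok=True)
    return run_dir


def stage_in(run_dir, local_file, relative_name):
    """Copy one input into shared/ unless it is already cached.

    Returns True if a copy was made, False on a cache hit.
    """
    cached = run_dir / "shared" / relative_name
    if cached.exists():
        return False
    cached.parent.mkdir(parents=True, exist_ok=True)
    # Local copy used here as a stand-in for the real transfer mechanism.
    shutil.copyfile(local_file, cached)
    return True
```

The second call to `stage_in` for the same file is a no-op, which is the behaviour the cache is there to provide when several procedure calls on one site share an input.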
The wrapper script creates the application workspace directory; places the input files for that job into the application workspace directory using either cp or ln -s (depending on a configuration option); executes the application unix executable; copies output files from the application workspace directory to the site shared directory using cp; creates a status file under the status/ directory; and exits, returning control to the Swift client side. Logs created during the execution of the wrapper script are stored under the info/ directory.

The Swift client side then checks for the presence of, and deletes, a status file indicating success; it then copies files from the site shared directory to the appropriate client side location.

The job directory is created (in the default mode) under the jobs/ directory. However, it can be created under an arbitrary other path, which allows it to be created on a different file system (such as a worker node local file system in the case that the worker node has a local file system).

--

From zhaozhang at uchicago.edu Mon Dec 1 16:15:54 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Mon, 01 Dec 2008 16:15:54 -0600
Subject: [Swift-devel] several alternatives to design the data management system for Swift on SuperComputers
Message-ID: <4934621A.5090109@uchicago.edu>

Hi, All

The following alternatives are a summary from a talk between Mike and Zhao. We are trying to optimize the data IO performance for swift on supercomputers, including BGP, Ranger, and possibly Jaguar. We are trying to eliminate all unnecessary data IO during stages of computation.

Scenario 1: Say a computation has 2 stages; the 2nd stage would take the output from the 1st stage as the input data.

Data Flow in current swift system: 1st stage will write the output data to GPFS, where swift knows this output data is the input for the 2nd stage. The task is then sent to a worker on a CN.
Desired Data Flow: 1st stage of computation knows the output data will be used as the input for the next stage, thus the data is not copied back to GPFS; then the 2nd stage task arrives and consumes this data.

Key Issue: the 2nd stage task has no idea of where the 1st stage output data is.

Design Alternatives:

1. Data aware task scheduling:
   Both swift and falkon need to be data aware. Swift should know where the output of 1st stage is, which means, which pset, or say which falkon service. And the falkon service should know which CN has the data for the 2nd stage computation.

2. Swift patch jobs vertically
   Before sending out any jobs, swift knows those 2 stage jobs have a data dependency, thus sends out 1 batched job as 1 to each worker.

3. Collective IO
   Build a shared file system which could be accessed by all CN; instead of writing output data to GPFS, workers copy intermediate output data to this shared ram-disk, and retrieve the data from IFS.

   Several Concerns:
   a) reliability of torus network --- we need to test more about this.
   b) performance of torus network --- could this really perform better than GPFS? If not, at what scale could torus perform better than GPFS?

4. Half-Collective IO
   All workers write data to IFS, and the data will be periodically copied back to GPFS. In this case, we only optimize the output phase and leave the input phase as is.

Any other ideas? Thanks so much.

best wishes
zhangzhao

From hategan at mcs.anl.gov Mon Dec 1 16:35:19 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 01 Dec 2008 16:35:19 -0600
Subject: [Swift-devel] several alternatives to design the data management system for Swift on SuperComputers
In-Reply-To: <4934621A.5090109@uchicago.edu>
References: <4934621A.5090109@uchicago.edu>
Message-ID: <1228170919.3817.11.camel@localhost>

On Mon, 2008-12-01 at 16:15 -0600, Zhao Zhang wrote:

> Desired Data Flow: 1st stage of computation knows the output data will be used as the input for the next stage, thus the data is not copied back to GPFS, then the 2nd stage task arrived and consumed this data.

This assumes a sequential workflow (t1 -> t2 ->... -> tn). For anything more complex, this becomes a nasty scheduling problem. For example:

(t1, t2) -> t3

The outputs of which of t1 or t2 should not be copied back?

> Key Issue: the 2nd stage task has no idea of where the 1st stage output data is.

I beg to disagree. Swift provides the mechanism to record where data is. The key issue is that queuing systems don't allow control over the exact nodes that tasks go to.

Another key issue is that you may not even want to do so, because that node may be better used running a different task (scheduling problem again).

> Design Alternatives:
> 1. Data aware task scheduling:
> Both swift and falkon need to be data aware. Swift should know where the output of 1st stage is, which means, which pset, or say which falkon service.
> And the falkon service should know which CN has the data for the 2nd stage computation.
>
> 2. Swift patch jobs vertically
> Before sending out any jobs, swift knows those 2 stage jobs has data dependency, thus send out 1 batched job as 1 to each worker.
>
> 3. Collective IO
> Build a shared file system which could be accessed by all CN, instead of writing output data to GPFS, workers copy intermediate output data to this shared ram-disk. And retrieve the data from IFS.

That seems awfully close to implementing a distributed filesystem, which I think is a fairly bad idea. If you're trying to avoid GPFS contention, then avoid it by carefully sticking your data in different directories. And do keep in mind that most operating systems cache filesystem data in memory, so a read after write of a reasonably small file will be very fast with any filesystem.

From iraicu at cs.uchicago.edu Mon Dec 1 16:33:33 2008
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Mon, 01 Dec 2008 16:33:33 -0600
Subject: [Swift-devel] several alternatives to design the data management system for Swift on SuperComputers
In-Reply-To: <4934621A.5090109@uchicago.edu>
References: <4934621A.5090109@uchicago.edu>
Message-ID: <4934663D.3070406@cs.uchicago.edu>

Hi,

I can see option (1) working as long as there is 1 Swift client and 1 Falkon service. For example, our current deployment on the BG/P would not work, as we have 1 Swift to *many* Falkon services. Now, even the 1-1 Swift-Falkon ratio won't work today, as the Falkon provider is not data-aware yet... but could be updated, maybe a few days of coding and testing; the harder part (IMO) will be making sure that the Swift data management doesn't interfere with the Falkon data management, and vice versa.

Options (3) and (4), we have discussed before. The trick with these is making things general and transparent enough that it works, and works well. Getting a Torus aggregate throughput to exceed 8GB/s shouldn't be that hard, with probably a fraction of the machine (several racks). Any word on the latest numbers of the improved GPFS, which is supposed to upgrade the number of servers from 8 or 16 up to 100+? With linear scalability, that would mean 80GB/s, the peak of the SAN throughputs I saw a while back in some slides from an ALCF talk. For us to get 80GB/s using CIO, we'd need 2MB/s per node. I bet we can easily achieve that, but it would probably be at the larger scales of 10s of racks.
I recall getting 100MB/s+ per node, right? This would give us a theoretical upper bound of 4000GB/s, so in theory, there is plenty of room between 80GB/s and 4000GB/s. I bet in practice, we'd only get a small fraction of that 4000GB/s, but it would be interesting to see how much we can really get without thinking of the network topology, and also how far we can get if we do take the network topology into consideration.

Option (2), I haven't thought of before, but it only works if an output file is only needed as 1 input file. What do you do if you have 1 output file needed for N input files? Do you replicate the first job N times, just so you can get the output file in N locations? Or do you group the jobs in 1+N jobs, where the N jobs execute in serial order on 1 processor/node? This might be worth investigating, but I think you'll be restricting the natural parallelism, or repeating work just to avoid data management.

Ioan

--
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: iraicu at cs.uchicago.edu
Web: http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================
===================================================
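
As a quick check of the throughput arithmetic in the message above: the figures 80GB/s, 2MB/s per node, 100MB/s per node and 4000GB/s are from the message; the node count is not stated there and is only what those figures imply, and decimal units (1GB = 1000MB) are an assumption.

```python
MB_PER_GB = 1000  # assumption: decimal units, matching the round figures


def nodes_needed(target_gb_per_s, per_node_mb_per_s):
    """Number of nodes required to reach an aggregate throughput."""
    return target_gb_per_s * MB_PER_GB / per_node_mb_per_s


def aggregate_gb_per_s(nodes, per_node_mb_per_s):
    """Aggregate throughput if every node sustains the per-node rate."""
    return nodes * per_node_mb_per_s / MB_PER_GB


# 80GB/s at 2MB/s per node implies this many nodes.
nodes = nodes_needed(80, 2)
# The same node count at 100MB/s per node gives the quoted upper bound.
upper = aggregate_gb_per_s(nodes, 100)
print(nodes, upper)  # prints: 40000.0 4000.0
```

So the 2MB/s and 4000GB/s figures are consistent with each other for a machine of about 40,000 nodes.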

From benc at hawaga.org.uk Mon Dec 1 16:45:55 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 1 Dec 2008 22:45:55 +0000 (GMT)
Subject: [Swift-devel] several alternatives to design the data management system for Swift on SuperComputers
In-Reply-To: <4934621A.5090109@uchicago.edu>
References: <4934621A.5090109@uchicago.edu>
Message-ID:

On Mon, 1 Dec 2008, Zhao Zhang wrote:

> Scenario 1: Say a computation has 2 stages, the 2nd stage would take the output from the 1st stage as the input data.
>
> Data Flow in current swift system: 1st stage will write the output data to GPFS, where swift knows this output data is the input for the 2nd stage. Then send the task to on worker on CN.
>
> Desired Data Flow: 1st stage of computation knows the output data will be used as the input for the next stage, thus the data is not copied back to GPFS, then the 2nd stage task arrived and consumed this data.
>
> Key Issue: the 2nd stage task has no idea of where the 1st stage output data is.
>
> Design Alternatives:
> 1. Data aware task scheduling:
> Both swift and falkon need to be data aware. Swift should know where the output of 1st stage is, which means, which pset, or say which falkon service.
> And the falkon service should know which CN has the data for the 2nd stage computation.

Swift *is* data aware. However it models things at the site level, not at a worker node level.

This is true at the moment:

> Swift should know where the output of 1st stage is, which means, which pset, or say which falkon service.

There was talk before of having some data-affinity in the swift scheduler, which would mean that jobs would prefer (but perhaps not be guaranteed) to run on a site which already had their input data. I don't know if anyone did any coding towards this - I haven't seen an implementation.
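
The data-affinity idea described above can be sketched as a scoring function. This is not Swift's scheduler and not the unreleased code mentioned elsewhere in this thread; `choose_site`, its parameters and the linear score are all hypothetical, chosen only to show "prefer, but do not guarantee" behaviour.

```python
def choose_site(job_inputs, site_files, site_load, affinity_weight=1.0):
    """Pick a site, preferring sites that already hold the job's inputs.

    job_inputs: iterable of input file names for the job.
    site_files: dict mapping site name -> set of file names cached there.
    site_load:  dict mapping site name -> current load (higher is busier).
    A heavily loaded site can still lose to an idle one, so affinity is
    a preference rather than a guarantee.
    """
    wanted = set(job_inputs)

    def score(site):
        cached = len(wanted & site_files.get(site, set()))
        return affinity_weight * cached - site_load.get(site, 0)

    # sorted() makes tie-breaking deterministic (alphabetical order).
    return max(sorted(site_load), key=score)
```

With equal load the job goes to the site that already has its input; if that site is busy enough, the job goes elsewhere and the input has to be staged again.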

In the pset = site case, which is how BG/P is being used at the moment, this would at least tend to keep execution on the same site as

At the moment, Falkon doesn't know about input and output files for Swift jobs, so can't act on that information to influence its scheduling.

> 2. Swift patch jobs vertically
> Before sending out any jobs, swift knows those 2 stage jobs has data dependency, thus send out 1 batched job as 1 to each worker.

VDS had some clustering capability like this. It seems quite interesting to think about.

In the multilevel scheduling case, where Swift is scheduling jobs between sites, and Falkon is scheduling jobs within a site, then having falkon able to do some kind of data-affinity scheduling within the site would also be perhaps interesting.

Clustering jobs ahead of time is something that can perhaps reduce performance (according to the claims that running through a resource provisioner is better than clustering ahead of time) and doing it dynamically might be interesting. The difference between 1 and 2 above seems similar to the clustering vs. provisioning distinction.

> 3. Collective IO
> Build a shared file system which could be accessed by all CN, instead of writing output data to GPFS, workers copy intermediate output data to this shared ram-disk. And retrieve the data from IFS.
>
> Several Concerns:
> a) reliability of torus network --- we need to test more about this.
> b) performance of torus network --- could this be really performing better than GPFS? If not, at what scale could torus perform better than GPFS?

As phrased above, this seems a little strange:

"rather than use a shared file system, let's build a shared file system and use it."

Do you mean building some general purpose posix shared file system? If so, this seems quite hard, and seems directly in competition with PVFS and GPFS, a competition which you are pretty much guaranteed to lose.

It may be that you mean something completely different - your concerns about the torus network seem unrelated to writing a posix fs, so I think that may be the case (or maybe you are overspecialising your concerns).

--

From iraicu at cs.uchicago.edu Mon Dec 1 16:52:54 2008
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Mon, 01 Dec 2008 16:52:54 -0600
Subject: [Swift-devel] several alternatives to design the data management system for Swift on SuperComputers
In-Reply-To: <1228170919.3817.11.camel@localhost>
References: <4934621A.5090109@uchicago.edu> <1228170919.3817.11.camel@localhost>
Message-ID: <49346AC6.6000408@cs.uchicago.edu>

Mihael Hategan wrote:
> On Mon, 2008-12-01 at 16:15 -0600, Zhao Zhang wrote:
>
>> Desired Data Flow: 1st stage of computation knows the output data will be used as the input for the next stage, thus the data is not copied back to GPFS, then the 2nd stage task arrived and consumed this data.
>
> This assumes a sequential workflow (t1 -> t2 ->... -> tn). For anything more complex, this becomes a nasty scheduling problem. For example:
>
> (t1, t2) -> t3
>
> The outputs of which of t1 or t2 should not be copied back?
>
>> Key Issue: the 2nd stage task has no idea of where the 1st stage output data is.
>
> I beg to disagree. Swift provides the mechanism to record where data is. The key issue is that queuing systems don't allow control over the exact nodes that tasks go to.

Well, Falkon with data diffusion gives you that level of control :)

> Another key issue is that you may not even want to do so, because that node may be better used running a different task (scheduling problem again).
>
>> Design Alternatives:
>> 1. Data aware task scheduling:
>> Both swift and falkon need to be data aware. Swift should know where the output of 1st stage is, which means, which pset, or say which falkon service.
>> And the falkon service should know which CN has the data for the 2nd stage computation.
>>
>> 2. Swift patch jobs vertically
>> Before sending out any jobs, swift knows those 2 stage jobs has data dependency, thus send out 1 batched job as 1 to each worker.
>>
>> 3. Collective IO
>> Build a shared file system which could be accessed by all CN, instead of writing output data to GPFS, workers copy intermediate output data to this shared ram-disk. And retrieve the data from IFS.
>
> That seems awfully close to implementing a distributed filesystem, which I think is a fairly bad idea. If you're trying to avoid GPFS contention, then avoid it by carefully sticking your data in different directories. And do keep in mind that most operating systems cache filesystem data in memory, so a read after write of a reasonably small file will be very fast with any filesystem.

I don't think you realize how expensive GPFS access is when doing so at 100K CPU scale. Simple operations that should take milliseconds take tens of seconds to complete, maybe more. For example, the GPFS locking of writes to a single directory can take 1000s of seconds at only 16K CPU scale... the idea of creating these islands of shared file systems, that are localized to a small portion of the total number of workers, seems like a viable solution to allow more data intensive applications to scale. The problem is how to express CIO in such a way that it works well, reliably, and transparently. We also have to do more measurements to see how much we gain performance-wise, for the efforts we are throwing at the problem.

Ioan

From iraicu at cs.uchicago.edu Mon Dec 1 16:57:25 2008
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Mon, 01 Dec 2008 16:57:25 -0600
Subject: [Swift-devel] several alternatives to design the data management system for Swift on SuperComputers
In-Reply-To:
References: <4934621A.5090109@uchicago.edu>
Message-ID: <49346BD5.7040300@cs.uchicago.edu>

Ben Clifford wrote:
> On Mon, 1 Dec 2008, Zhao Zhang wrote:
> ...
>> 3. Collective IO
>> Build a shared file system which could be accessed by all CN, instead of writing output data to GPFS, workers copy intermediate output data to this shared ram-disk. And retrieve the data from IFS.
>>
>> Several Concerns:
>> a) reliability of torus network --- we need to test more about this.
>> b) performance of torus network --- could this be really performing better than GPFS? If not, at what scale could torus perform better than GPFS?
>
> As phrased above, this seems a little strange:
>
> "rather than use a shared file system, let's build a shared file system and use it."

Zhao meant that instead of using 1 large global shared file system, let's build many smaller shared file systems that can keep traffic localized within the machine (i.e. per pset), to allow relatively linear scalability with each addition of processing power and network bandwidth. Depending on what level we are working at, we might also be able to relax some of the semantics of the POSIX file system, in order to improve performance and scalability, at the expense of other things.

Ioan

> Do you mean building some general purpose posix shared file system? If so, this seems quite hard, and seems directly in competition with PVFS and GPFS, a competition which you are pretty much guaranteed to lose.
>
> It may be that you mean something completely different - your concerns about the torus network seem unrelated to writing a posix fs, so I think that may be the case (or maybe you are overspecialising your concerns).

From hategan at mcs.anl.gov Mon Dec 1 17:00:05 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 01 Dec 2008 17:00:05 -0600
Subject: [Swift-devel] several alternatives to design the data management system for Swift on SuperComputers
In-Reply-To:
References: <4934621A.5090109@uchicago.edu>
Message-ID: <1228172405.4433.2.camel@localhost>

On Mon, 2008-12-01 at 22:45 +0000, Ben Clifford wrote:

> > Design Alternatives:
> > 1. Data aware task scheduling:
> > Both swift and falkon need to be data aware. Swift should know where the output of 1st stage is, which means, which pset, or say which falkon service.
> > And the falkon service should know which CN has the data for the 2nd stage computation.
>
> Swift *is* data aware. However it models things at the site level, not at a worker node level.

There's nothing stopping us from giving a certain element of the url path a "node" semantic meaning.

> This is true at the moment:
>
> > Swift should know where the output of 1st stage is, which means, which pset, or say which falkon service.
>
> There was talk before of having some data-affinity in the swift scheduler, which would mean that jobs would prefer (but perhaps not be guaranteed) to run on a site which already had their input data. I don't know if anyone did any coding towards this - I haven't seen an implementation.

I have some code from Ragib which I have yet to commit to SVN.

> In the pset = site case, which is how BG/P is being used at the moment, this would at least tend to keep execution on the same site as
>
> At the moment, Falkon doesn't know about input and output files for Swift jobs, so can't act on that information to influence its scheduling.
>
> > 2. Swift patch jobs vertically
> > Before sending out any jobs, swift knows those 2 stage jobs has data dependency, thus send out 1 batched job as 1 to each worker.
>
> VDS had some clustering capability like this. It seems quite interesting to think about.

VDS did full graph scheduling, unlike Swift.

From hategan at mcs.anl.gov Mon Dec 1 17:04:07 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 01 Dec 2008 17:04:07 -0600
Subject: [Swift-devel] several alternatives to design the data management system for Swift on SuperComputers
In-Reply-To: <49346AC6.6000408@cs.uchicago.edu>
References: <4934621A.5090109@uchicago.edu> <1228170919.3817.11.camel@localhost> <49346AC6.6000408@cs.uchicago.edu>
Message-ID: <1228172647.4433.7.camel@localhost>

On Mon, 2008-12-01 at 16:52 -0600, Ioan Raicu wrote:

> > I beg to disagree. Swift provides the mechanism to record where data is. The key issue is that queuing systems don't allow control over the exact nodes that tasks go to.
>
> Well, Falkon with data diffusion gives you that level of control :)

And if the Swift team decides to drop anything else besides Falkon, then this is even a viable alternative.

> I don't think you realize how expensive GPFS access is when doing so at 100K CPU scale.

I don't think I understand what you mean by "access". As I said, things that generate contention are going to be slow.

If the problem requires that contention to happen, then it doesn't matter what the solution is. If it does not, then I suspect that there is a way to avoid contention in GPFS, too (sticking things in different directories).

From iraicu at cs.uchicago.edu Mon Dec 1 17:10:03 2008
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Mon, 01 Dec 2008 17:10:03 -0600
Subject: [Swift-devel] several alternatives to design the data management system for Swift on SuperComputers
In-Reply-To: <1228172647.4433.7.camel@localhost>
References: <4934621A.5090109@uchicago.edu> <1228170919.3817.11.camel@localhost> <49346AC6.6000408@cs.uchicago.edu> <1228172647.4433.7.camel@localhost>
Message-ID: <49346ECB.8040706@cs.uchicago.edu>

Mihael Hategan wrote:
> On Mon, 2008-12-01 at 16:52 -0600, Ioan Raicu wrote:
> ...
>> I don't think you realize how expensive GPFS access is when doing so at 100K CPU scale.
>
> I don't think I understand what you mean by "access". As I said, things that generate contention are going to be slow.
>
> If the problem requires that contention to happen, then it doesn't matter what the solution is. If it does not, then I suspect that there is a way to avoid contention in GPFS, too (sticking things in different directories).

The basic idea is that many smaller shared file systems will scale better than 1 large file system, as the contention is localized. The problem is that having 1 global namespace is simple and straightforward, but having N local namespaces is not, and requires extra management.
If we try to beat GPFS by implementing a global shared/parallel file system, we are likely to fail, as Ben or you already mentioned... but by changing the landscape a bit by splitting the single large global space into many smaller spaces, we should gain scalability and performance at larger scales. Ioan > > > -- =================================================== Ioan Raicu Ph.D. Candidate =================================================== Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 =================================================== Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dev.globus.org/wiki/Incubator/Falkon http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page =================================================== =================================================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Mon Dec 1 17:43:36 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 01 Dec 2008 17:43:36 -0600 Subject: [Swift-devel] several alternatives to design the data management system for Swift on SuperComputers In-Reply-To: <49346ECB.8040706@cs.uchicago.edu> References: <4934621A.5090109@uchicago.edu> <1228170919.3817.11.camel@localhost> <49346AC6.6000408@cs.uchicago.edu> <1228172647.4433.7.camel@localhost> <49346ECB.8040706@cs.uchicago.edu> Message-ID: <1228175016.5031.6.camel@localhost> On Mon, 2008-12-01 at 17:10 -0600, Ioan Raicu wrote: > > > Mihael Hategan wrote: > > On Mon, 2008-12-01 at 16:52 -0600, Ioan Raicu wrote: > > > > > > ... > > > > > > > I don't think you realize how expensive GPFS access is when doing so > > > at 100K CPU scale. > > > > > > > I don't think I understand what you mean by "access". As I said, things > > that generate contention are going to be slow.
> > > > If the problem requires that contention to happen, then it doesn't > > matter what the solution is. If it does not, then I suspect that there > > is a way to avoid contention in GPFS, too (sticking things in different > > directories). > > > The basic idea is that many smaller shared file systems will scale > better than 1 large file system, as the contention is localized. Which is the same behaviour you get if you have a hierarchy of directories. This is what Ben implemented in Swift. > The problem is that having 1 global namespace is simple and straight > forward, but having N local namespaces is not, and requires extra > management. Right. That's why most filesystems I know of treat directories as independent files containing file metadata (aka. "local namespaces"). From foster at anl.gov Mon Dec 1 21:31:58 2008 From: foster at anl.gov (Ian Foster) Date: Mon, 1 Dec 2008 21:31:58 -0600 Subject: [Swift-devel] Functional building blocks as concurrency patterns In-Reply-To: References: Message-ID: <60F4A4E9-D92F-4215-BE01-C7CBFF758A05@anl.gov> Ben: I've often wondered about developing a mapper that can deal with infinite/unbounded streams. Your note reminds me of how I keep wondering whether we should be doing one of two things: a) Adopting/adapting some existing functional scripting system to leverage their investment in compilers, etc.--an investment that is for us hard to sustain. b) Moving away from a language to a library, which will be clunkier and more error prone, but also avoids us having to maintain compilers, etc. Regarding nondeterminism--do we have use cases for this? Ian. 
On Nov 29, 2008, at 4:45 PM, Ben Clifford wrote: > > This just appeared on lambda-the-ultimate: > > http://lambda-the-ultimate.org/node/3108 > >> While teaching INGI1131, my concurrent programming course, I have >> become >> even more impressed by a concurrent paradigm, namely functional >> programming extended with threads and ports, which I call multi-agent >> dataflow programming. > > Some of the concepts overlap or are closely related to SwiftScript > and so > some of the discussion on that post is interesting to skim. > > Some differences are that: > > * we don't have infinite/unbounded streams of data because of the > way that > our mappers work at the moment (though it conceptually fits in with > SwiftScript); and > > * we don't have ports (or some other way of introducing > non-determinism > in the language itself) - we do have something similar at the > execution > layer, though, in the form of job replication, where we allow two job > submissions to race to start execution and only keep the one that > starts > executing (some day it might be desirable to extend this to the one > that > finishes first or some other more complex expression, but I don't > see need > at the moment) > > -- > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From iraicu at cs.uchicago.edu Mon Dec 1 21:32:24 2008 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 01 Dec 2008 21:32:24 -0600 Subject: [Swift-devel] several alternatives to design the data management system for Swift on SuperComputers In-Reply-To: <1228175016.5031.6.camel@localhost> References: <4934621A.5090109@uchicago.edu> <1228170919.3817.11.camel@localhost> <49346AC6.6000408@cs.uchicago.edu> <1228172647.4433.7.camel@localhost> <49346ECB.8040706@cs.uchicago.edu> <1228175016.5031.6.camel@localhost> Message-ID: <4934AC48.5080909@cs.uchicago.edu> But it's not just about directories and GPFS
locking.... it's about 8 or 16 large servers with 10Gb/s network connectivity (as is the case for GPFS) compared to potentially 40K servers, each with 1Gb/s connectivity (as would be the case in our example). The potential raw throughput of the latter case, when we use all 40K nodes as servers to the file system, is orders of magnitude larger than a static configuration with 8 or 16 servers. It's not yet clear we can actually achieve anything close to the upper bound of performance at full scale, but it should be obvious that the performance characteristics will be quite different between GPFS and CIO. Ioan Mihael Hategan wrote: > On Mon, 2008-12-01 at 17:10 -0600, Ioan Raicu wrote: > >> Mihael Hategan wrote: >> >>> On Mon, 2008-12-01 at 16:52 -0600, Ioan Raicu wrote: >>> >>> >>> ... >>> >>> >>> >>>> I don't think you realize how expensive GPFS access is when doing so >>>> at 100K CPU scale. >>>> >>>> >>> I don't think I understand what you mean by "access". As I said, things >>> that generate contention are going to be slow. >>> >>> If the problem requires that contention to happen, then it doesn't >>> matter what the solution is. If it does not, then I suspect that there >>> is a way to avoid contention in GPFS, too (sticking things in different >>> directories). >>> >>> >> The basic idea is that many smaller shared file systems will scale >> better than 1 large file system, as the contention is localized. >> > > Which is the same behaviour you get if you have a hierarchy of > directories. This is what Ben implemented in Swift. > > >> The problem is that having 1 global namespace is simple and straight >> forward, but having N local namespaces is not, and requires extra >> management. >> > > Right. That's why most filesystems I know of treat directories as > independent files containing file metadata (aka. "local namespaces"). > > > -- =================================================== Ioan Raicu Ph.D.
Candidate =================================================== Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 =================================================== Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dev.globus.org/wiki/Incubator/Falkon http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page =================================================== =================================================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Mon Dec 1 21:44:46 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 01 Dec 2008 21:44:46 -0600 Subject: [Swift-devel] several alternatives to design the data management system for Swift on SuperComputers In-Reply-To: <4934AC48.5080909@cs.uchicago.edu> References: <4934621A.5090109@uchicago.edu> <1228170919.3817.11.camel@localhost> <49346AC6.6000408@cs.uchicago.edu> <1228172647.4433.7.camel@localhost> <49346ECB.8040706@cs.uchicago.edu> <1228175016.5031.6.camel@localhost> <4934AC48.5080909@cs.uchicago.edu> Message-ID: <1228189486.8633.5.camel@localhost> On Mon, 2008-12-01 at 21:32 -0600, Ioan Raicu wrote: > But its not just about directories and GPFS locking.... its about 8 or > 16 large servers with 10Gb/s network connectivity (as is the case for > GPFS) compared to potentially 40K servers, each with 1Gb/s > connectivity (as would be the case in our example). PVFS? 
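For a rough sense of scale in the comparison quoted above, the aggregate link bandwidth of each configuration can be worked out directly. The following is a minimal sketch, assuming the figures given in the message (8 or 16 servers at 10Gb/s versus 40K nodes at 1Gb/s) and reading "40K" as 40,000. It is a raw upper bound on network capacity only; it says nothing about what either file system achieves in practice, and the function and constant names are illustrative.

```python
def aggregate_gbps(servers: int, link_gbps: float) -> float:
    """Upper bound on aggregate bandwidth: every server saturates its link."""
    return servers * link_gbps


# Figures taken from the message; "40K" is read as 40,000 (an assumption).
CENTRAL_8 = aggregate_gbps(8, 10)        # 8 large servers at 10Gb/s each
CENTRAL_16 = aggregate_gbps(16, 10)      # 16 large servers at 10Gb/s each
PER_NODE = aggregate_gbps(40_000, 1)     # 40K nodes at 1Gb/s each

# Ratio of the per-node configuration to each centralized configuration.
RATIO_VS_8 = PER_NODE / CENTRAL_8
RATIO_VS_16 = PER_NODE / CENTRAL_16

if __name__ == "__main__":
    print(f"8 servers:   {CENTRAL_8:.0f} Gb/s")
    print(f"16 servers:  {CENTRAL_16:.0f} Gb/s")
    print(f"40K nodes:   {PER_NODE:.0f} Gb/s")
    print(f"ratio vs 8:  {RATIO_VS_8:.0f}x")
    print(f"ratio vs 16: {RATIO_VS_16:.0f}x")
```

Under these assumptions the raw ratio is a few hundred to one, which is the sense in which the message speaks of "orders of magnitude".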
From foster at anl.gov Mon Dec 1 21:43:28 2008 From: foster at anl.gov (Ian Foster) Date: Mon, 1 Dec 2008 21:43:28 -0600 Subject: [Swift-devel] several alternatives to design the data management system for Swift on SuperComputers In-Reply-To: <4934AC48.5080909@cs.uchicago.edu> References: <4934621A.5090109@uchicago.edu> <1228170919.3817.11.camel@localhost> <49346AC6.6000408@cs.uchicago.edu> <1228172647.4433.7.camel@localhost> <49346ECB.8040706@cs.uchicago.edu> <1228175016.5031.6.camel@localhost> <4934AC48.5080909@cs.uchicago.edu> Message-ID: Dear All: I am finding it hard to sort through this chain of emails, but I wanted to make a couple of points. Zhao, Allan, Ioan, et al., have demonstrated considerable benefits from applying two methods to Swift-like workloads on the BG/P: a) "Storage hierarchy": the use of federated per-node storage (RAM on BG/P, could be local disk on other systems) as an "intermediate file system" layer in the storage hierarchy between the ultra-fast but low- capacity local storage and the high-capacity but slower GPFS. b) "Collective I/O": improving performance between intermediate file system and GPFS by aggregating many small operations into fewer large operations. These are both well-known, extensively studied, and proven methods. Furthermore, we have some nice performance data that allows us to quantify their benefits in our specific situation. Perhaps it would be worth looking at the methods from that perspective. Ian. On Dec 1, 2008, at 9:32 PM, Ioan Raicu wrote: > But its not just about directories and GPFS locking.... its about 8 > or 16 large servers with 10Gb/s network connectivity (as is the case > for GPFS) compared to potentially 40K servers, each with 1Gb/s > connectivity (as would be the case in our example). The potential > raw throughput of the later case, when we use all 40K nodes as > servers to the file system, is orders of magnitude larger than a > static configuration with 8 or 16 servers. 
Its not yet clear we can > actually achieve anything close to the upper bound of performance at > full scale, but it should be obvious that the performance > characteristics will be quite different between GPFS and CIO. > > Ioan > > Mihael Hategan wrote: >> >> On Mon, 2008-12-01 at 17:10 -0600, Ioan Raicu wrote: >> >>> Mihael Hategan wrote: >>> >>>> On Mon, 2008-12-01 at 16:52 -0600, Ioan Raicu wrote: >>>> >>>> >>>> ... >>>> >>>> >>>> >>>>> I don't think you realize how expensive GPFS access is when >>>>> doing so >>>>> at 100K CPU scale. >>>>> >>>>> >>>> I don't think I understand what you mean by "access". As I said, >>>> things >>>> that generate contention are going to be slow. >>>> >>>> If the problem requires that contention to happen, then it doesn't >>>> matter what the solution is. If it does not, then I suspect that >>>> there >>>> is a way to avoid contention in GPFS, too (sticking things in >>>> different >>>> directories). >>>> >>>> >>> The basic idea is that many smaller shared file systems will scale >>> better than 1 large file system, as the contention is localized. >>> >> Which is the same behaviour you get if you have a hierarchy of >> directories. This is what Ben implemented in Swift. >> >> >>> The problem is that having 1 global namespace is simple and >>> straight >>> forward, but having N local namespaces is not, and requires extra >>> management. >>> >> Right. That's why most filesystems I know of treat directories as >> independent files containing file metadata (aka. "local namespaces"). >> >> >> > > -- > =================================================== > Ioan Raicu > Ph.D. Candidate > =================================================== > Distributed Systems Laboratory > Computer Science Department > University of Chicago > 1100 E. 
58th Street, Ryerson Hall > Chicago, IL 60637 > =================================================== > Email: iraicu at cs.uchicago.edu > Web: http://www.cs.uchicago.edu/~iraicu > http://dev.globus.org/wiki/Incubator/Falkon > http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page > =================================================== > =================================================== > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Mon Dec 1 23:03:43 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 01 Dec 2008 23:03:43 -0600 Subject: [Swift-devel] several alternatives to design the data management system for Swift on SuperComputers In-Reply-To: References: <4934621A.5090109@uchicago.edu> <1228170919.3817.11.camel@localhost> <49346AC6.6000408@cs.uchicago.edu> <1228172647.4433.7.camel@localhost> <49346ECB.8040706@cs.uchicago.edu> <1228175016.5031.6.camel@localhost> <4934AC48.5080909@cs.uchicago.edu> Message-ID: <1228194223.9335.29.camel@localhost> On Mon, 2008-12-01 at 21:43 -0600, Ian Foster wrote: > Dear All: > > b) "Collective I/O": improving performance between intermediate file > system and GPFS by aggregating many small operations into fewer large > operations. > This is a part that I'm having trouble understanding. The paper mentions distributing data to different directories (in 6.2.), but not whether the experiment was done with that or not. Are the measurements taken with applications writing data to the same directory or a different directory for each application/node or was the whole thing done with Swift? 
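The aggregation idea in (b) can be sketched as follows: instead of creating many small files on the shared file system, a task packs its small outputs into one archive, which can then be written as a single larger operation. This is a minimal sketch using Python's standard tarfile module with hypothetical file names and helper names (pack_outputs, unpack_outputs). It works in memory to stay self-contained and is not the mechanism used in the experiments discussed in this thread.

```python
import io
import tarfile
from typing import Dict


def pack_outputs(outputs: Dict[str, bytes]) -> bytes:
    """Pack several small named outputs into a single uncompressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in sorted(outputs.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # size must be set before adding the member
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def unpack_outputs(archive: bytes) -> Dict[str, bytes]:
    """Recover the individual outputs from an archive made by pack_outputs."""
    result = {}
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile():
                result[member.name] = tar.extractfile(member).read()
    return result


if __name__ == "__main__":
    # Five hypothetical 1KB outputs from one task.
    outputs = {f"out{i}.dat": bytes([i]) * 1024 for i in range(5)}
    archive = pack_outputs(outputs)
    print(f"{len(outputs)} files packed into one archive of {len(archive)} bytes")
```

In a real setting the archive bytes would be written to the shared file system in one operation per task (or per group of tasks), trading some extra packing and unpacking work for far fewer file creations.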
From zhaozhang at uchicago.edu Mon Dec 1 23:24:36 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Mon, 01 Dec 2008 23:24:36 -0600 Subject: [Swift-devel] several alternatives to design the data management system for Swift on SuperComputers In-Reply-To: <1228194223.9335.29.camel@localhost> References: <4934621A.5090109@uchicago.edu> <1228170919.3817.11.camel@localhost> <49346AC6.6000408@cs.uchicago.edu> <1228172647.4433.7.camel@localhost> <49346ECB.8040706@cs.uchicago.edu> <1228175016.5031.6.camel@localhost> <4934AC48.5080909@cs.uchicago.edu> <1228194223.9335.29.camel@localhost> Message-ID: <4934C694.9030908@uchicago.edu> Hi, Mihael I think the attached graph could answer your question. All the tests were run on 2 racks (8K cores) with 8192 jobs. Each file created by the test is 1KB. * 1_DIR_5_FILE: all 8192 cores write 5 files each to 1 directory on GPFS; in this test, only 31 jobs returned successfully within 300 seconds. * 32_DIR_5_FILE: all 8192 cores write 5 files each to the directory unique to their IO node on GPFS; 8192 jobs took 91.026 seconds. * 1000_DIR_5_FILE: all 8192 cores write 5 files each to 1000 hierarchical directories on GPFS; 8192 jobs took 81.555 seconds. * 32_DIR_1_FILE: by batching the 5 output files, each core writes one tarball to the directory unique to its IO node on GPFS; 8192 jobs took 23.616 seconds. * CIO_5_FILE: with CIO, each core writes 5 files to IFS; 8192 jobs took 12.007 seconds. From this we can tell that 32_DIR_5_FILE does not slow performance down much compared with 1000_DIR_5_FILE. Also, in this test case each task writes 5 files, whereas in the real CIO case each IO node writes one tarball at a time, so the performance of the two should be even closer. So, in CIO we use a unique directory for each IO node (keep in mind, each IO node has 256 workers). For the GPFS test case in the paper, we use a fixed number of 10x1000 hierarchical directories for output. Does the above answer the question?
best wishes zhangzhao Mihael Hategan wrote: > On Mon, 2008-12-01 at 21:43 -0600, Ian Foster wrote: > >> Dear All: >> >> b) "Collective I/O": improving performance between intermediate file >> system and GPFS by aggregating many small operations into fewer large >> operations. >> >> > > This is a part that I'm having trouble understanding. > > The paper mentions distributing data to different directories (in 6.2.), > but not whether the experiment was done with that or not. > Are the measurements taken with applications writing data to the same > directory or a different directory for each application/node or was the > whole thing done with Swift? > > > > -------------- next part -------------- A non-text attachment was scrubbed... Name: dir.JPG Type: image/jpeg Size: 46601 bytes Desc: not available URL: From benc at hawaga.org.uk Tue Dec 2 07:45:15 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 2 Dec 2008 13:45:15 +0000 (GMT) Subject: [Swift-devel] Functional building blocks as concurrency patterns In-Reply-To: <60F4A4E9-D92F-4215-BE01-C7CBFF758A05@anl.gov> References: <60F4A4E9-D92F-4215-BE01-C7CBFF758A05@anl.gov> Message-ID: On Mon, 1 Dec 2008, Ian Foster wrote: > Regarding nondeterminism--do we have use cases for this? In the past someone (Tibi?) has considered running a large number of some process, but rather than waiting for them all to complete, instead when some number less than the total have been completed, passing those on to the next step for processing. I don't know what the actual application for that was, though. 
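The pattern described above, starting many instances of a process and handing results to the next step once some number fewer than the total have finished, can be sketched as below. This is a minimal illustration in Python using the standard concurrent.futures module, not SwiftScript; the name first_k_completed is hypothetical. Which results are returned depends on completion order, which is where the nondeterminism comes from.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def first_k_completed(func, inputs, k):
    """Run func on every input concurrently; return the first k results to finish.

    Results are returned in completion order, so the outcome can differ
    between runs when task durations vary.
    """
    inputs = list(inputs)
    if not 1 <= k <= len(inputs):
        raise ValueError("k must be between 1 and the number of inputs")

    # One thread per input so that every task starts immediately.
    pool = ThreadPoolExecutor(max_workers=len(inputs))
    futures = [pool.submit(func, x) for x in inputs]
    results = []
    try:
        for fut in as_completed(futures):
            results.append(fut.result())
            if len(results) == k:
                break
    finally:
        # Tasks that have not started are cancelled. Tasks already running
        # are not interrupted; they finish in the background and their
        # results are discarded.
        for fut in futures:
            fut.cancel()
        pool.shutdown(wait=False)
    return results


if __name__ == "__main__":
    def slow_identity(x):
        time.sleep(x * 0.1)  # stand-in for a job whose run time varies
        return x

    print(first_k_completed(slow_identity, [4, 0, 2, 1, 3], k=3))
```

A downstream step would then consume the returned list without waiting for the remaining tasks, which is the behaviour the message describes.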
-- From hategan at mcs.anl.gov Tue Dec 2 10:24:39 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 02 Dec 2008 10:24:39 -0600 Subject: [Swift-devel] several alternatives to design the data management system for Swift on SuperComputers In-Reply-To: <4934C694.9030908@uchicago.edu> References: <4934621A.5090109@uchicago.edu> <1228170919.3817.11.camel@localhost> <49346AC6.6000408@cs.uchicago.edu> <1228172647.4433.7.camel@localhost> <49346ECB.8040706@cs.uchicago.edu> <1228175016.5031.6.camel@localhost> <4934AC48.5080909@cs.uchicago.edu> <1228194223.9335.29.camel@localhost> <4934C694.9030908@uchicago.edu> Message-ID: <1228235079.10786.2.camel@localhost> On Mon, 2008-12-01 at 23:24 -0600, Zhao Zhang wrote: > Hi, Mihael > > I think the attached graph could answer your question. Not really. Is there a test with 8192 pre-created directories? > > All the tests were run 2 racks, 8k cores, with 8192 jobs. Each file > created by the test is 1KB. > > 1_DIR_5_FILE means, all 8192 cores are writing 5 files to 1 dir on > GPFS, in this test, within 300 seconds only 31 jobs returned successful. > 32_DIR_5_FILE , all 8192 cores are writing 5 files to the unique > directory for IO node on GPFS. 8192 jobs took 91.026 seconds > 1000_DIR_5_FILE , all 8192 cores are writing 5 files to 1000 > hierarchical directories on GPFS. 8192 jobs took 81.555 seconds > 32_DIR_1_FILE , by batching the 5 output files, each core is writing one > tarball to the directory unique for each IO node on GPFS, 8192 jobs took > 23.616 seconds > CIO_5_FILE , with CIO, each core write 5 files to IFS, 8192 jobs took > 12.007 seconds. > > > Then we could tell 32_DIR_5_FILE doesn't slow down the performance much > comparing with > 1000_DIR_5_FILE. And in this test case, each task is writing 5 files, > and in the real case for CIO > each IO node will write one tar ball at a time. So the performances of > the two should be more closer.
> > So, in CIO we use a unique directory for one IO node(keep in mind, each > IO node has 256 workers). > For the GPFS test case in the paper, we use the fixed number of 10x1000 > hierarchical directories for output. > > Does the above thing make the question clear? > > best wishes > zhangzhao > > Mihael Hategan wrote: > > On Mon, 2008-12-01 at 21:43 -0600, Ian Foster wrote: > > > >> Dear All: > >> > >> b) "Collective I/O": improving performance between intermediate file > >> system and GPFS by aggregating many small operations into fewer large > >> operations. > >> > >> > > > > This is a part that I'm having trouble understanding. > > > > The paper mentions distributing data to different directories (in 6.2.), > > but not whether the experiment was done with that or not. > > Are the measurements taken with applications writing data to the same > > directory or a different directory for each application/node or was the > > whole thing done with Swift? > > > > > > > > From zhaozhang at uchicago.edu Tue Dec 2 10:26:31 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Tue, 02 Dec 2008 10:26:31 -0600 Subject: [Swift-devel] several alternatives to design the data management system for Swift on SuperComputers In-Reply-To: <1228235079.10786.2.camel@localhost> References: <4934621A.5090109@uchicago.edu> <1228170919.3817.11.camel@localhost> <49346AC6.6000408@cs.uchicago.edu> <1228172647.4433.7.camel@localhost> <49346ECB.8040706@cs.uchicago.edu> <1228175016.5031.6.camel@localhost> <4934AC48.5080909@cs.uchicago.edu> <1228194223.9335.29.camel@localhost> <4934C694.9030908@uchicago.edu> <1228235079.10786.2.camel@localhost> Message-ID: <493561B7.8030307@uchicago.edu> Mihael Hategan wrote: > On Mon, 2008-12-01 at 23:24 -0600, Zhao Zhang wrote: > >> Hi, Mihael >> >> I think the attached graph could answer your question. >> > > Not really. Is there a test with 8192 pre-created directories? > Nope. Why do you think there would be 8192 pre-created directories for the 2-rack test?
The case is not one unique dir for each worker, but each dir for one IO node for the CIO test. zhao > >> All the tests were run 2 racks, 8k cores, with 8192 jobs. Each file >> created by the test is 1KB. >> >> 1_DIR_5_FILE means, all 8192 cores are writing 5 files to 1 dir on >> GPFS, in this test, within 300 seconds only 31 jobs returned successful. >> 32_DIR_5_FILE , all 8192 cores are writing 5 files to the unique >> directory for IO node on GPFS. 8192 jobs took 91.026 seconds >> 1000_DIR_5_FILE , all 8192 cores are writing 5 files to 1000 >> hierarchical directories on GPFS. 8192 jobs took 81.555 seconds >> 32_DIR_1_FILE , by batching the 5 output files, each core is wring one >> tarball to the directory unique for each IO node on GPFS, 8192 jobs took >> 23.616 seconds >> CIO_5_FILE , with CIO, each core write 5 files to IFS, 8192 jobs took >> 12.007 seconds. >> >> >> Then we could tell 32_DIR_5_FILE doesn't slow down the performance much >> comparing with >> 1000_DIR_5_FILE. And in this test case, each task is writing 5 files, >> and in the real case for CIO >> each IO node will write one tar ball at a time. So the performances of >> the two should be more closer. >> >> So, in CIO we use a unique directory for one IO node(keep in mind, each >> IO node has 256 workers). >> For the GPFS test case in the paper, we use the fixed number of 10x1000 >> hierarchical directories for output. >> >> Does the above thing make the question clear? >> >> best wishes >> zhangzhao >> >> Mihael Hategan wrote: >> >>> On Mon, 2008-12-01 at 21:43 -0600, Ian Foster wrote: >>> >>> >>>> Dear All: >>>> >>>> b) "Collective I/O": improving performance between intermediate file >>>> system and GPFS by aggregating many small operations into fewer large >>>> operations. >>>> >>>> >>>> >>> This is a part that I'm having trouble understanding. >>> >>> The paper mentions distributing data to different directories (in 6.2.), >>> but not whether the experiment was done with that or not. 
>>> Are the measurements taken with applications writing data to the same >>> directory or a different directory for each application/node or was the >>> whole thing done with Swift? >>> >>> >>> >>> >>> > > > From hategan at mcs.anl.gov Tue Dec 2 10:39:32 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 02 Dec 2008 10:39:32 -0600 Subject: [Swift-devel] several alternatives to design the data management system for Swift on SuperComputers In-Reply-To: <493561B7.8030307@uchicago.edu> References: <4934621A.5090109@uchicago.edu> <1228170919.3817.11.camel@localhost> <49346AC6.6000408@cs.uchicago.edu> <1228172647.4433.7.camel@localhost> <49346ECB.8040706@cs.uchicago.edu> <1228175016.5031.6.camel@localhost> <4934AC48.5080909@cs.uchicago.edu> <1228194223.9335.29.camel@localhost> <4934C694.9030908@uchicago.edu> <1228235079.10786.2.camel@localhost> <493561B7.8030307@uchicago.edu> Message-ID: <1228235972.10786.6.camel@localhost> On Tue, 2008-12-02 at 10:26 -0600, Zhao Zhang wrote: > > Mihael Hategan wrote: > > On Mon, 2008-12-01 at 23:24 -0600, Zhao Zhang wrote: > > > >> Hi, Mihael > >> > >> I think the attached graph could answer your question. > >> > > > > Not really. ?Is there a test with 8192 pre-created directories? > > > nope, why do you think there is 8192 pre-created directories for 2 rack > test? I don't. It's something you would do. > The case is not one unique dir for each worker, but each dir for > one IO node > for the CIO test. > > zhao > > > >> All the tests were run 2 racks, 8k cores, with 8192 jobs. Each file > >> created by the test is 1KB. > >> > >> 1_DIR_5_FILE means, all 8192 cores are writing 5 files to 1 dir on > >> GPFS, in this test, within 300 seconds only 31 jobs returned successful. > >> 32_DIR_5_FILE , all 8192 cores are writing 5 files to the unique > >> directory for IO node on GPFS. 8192 jobs took 91.026 seconds > >> 1000_DIR_5_FILE , all 8192 cores are writing 5 files to 1000 > >> hierarchical directories on GPFS. 
8192 jobs took 81.555 seconds > >> 32_DIR_1_FILE , by batching the 5 output files, each core is wring one > >> tarball to the directory unique for each IO node on GPFS, 8192 jobs took > >> 23.616 seconds > >> CIO_5_FILE , with CIO, each core write 5 files to IFS, 8192 jobs took > >> 12.007 seconds. > >> > >> > >> Then we could tell 32_DIR_5_FILE doesn't slow down the performance much > >> comparing with > >> 1000_DIR_5_FILE. And in this test case, each task is writing 5 files, > >> and in the real case for CIO > >> each IO node will write one tar ball at a time. So the performances of > >> the two should be more closer. > >> > >> So, in CIO we use a unique directory for one IO node(keep in mind, each > >> IO node has 256 workers). > >> For the GPFS test case in the paper, we use the fixed number of 10x1000 > >> hierarchical directories for output. > >> > >> Does the above thing make the question clear? > >> > >> best wishes > >> zhangzhao > >> > >> Mihael Hategan wrote: > >> > >>> On Mon, 2008-12-01 at 21:43 -0600, Ian Foster wrote: > >>> > >>> > >>>> Dear All: > >>>> > >>>> b) "Collective I/O": improving performance between intermediate file > >>>> system and GPFS by aggregating many small operations into fewer large > >>>> operations. > >>>> > >>>> > >>>> > >>> This is a part that I'm having trouble understanding. > >>> > >>> The paper mentions distributing data to different directories (in 6.2.), > >>> but not whether the experiment was done with that or not. > >>> Are the measurements taken with applications writing data to the same > >>> directory or a different directory for each application/node or was the > >>> whole thing done with Swift? 
> >>> > >>> > >>> > >>> > >>> > > > > > > From hategan at mcs.anl.gov Tue Dec 2 12:01:53 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 02 Dec 2008 12:01:53 -0600 Subject: [Swift-devel] Functional building blocks as concurrency patterns In-Reply-To: <60F4A4E9-D92F-4215-BE01-C7CBFF758A05@anl.gov> References: <60F4A4E9-D92F-4215-BE01-C7CBFF758A05@anl.gov> Message-ID: <1228240913.10786.17.camel@localhost> On Mon, 2008-12-01 at 21:31 -0600, Ian Foster wrote: > Ben: > > I've often wondered about developing a mapper that can deal with > infinite/unbounded streams. I think we talked about this two years ago, but it wasn't a priority then and it got forgotten. From benc at hawaga.org.uk Tue Dec 2 13:55:01 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 2 Dec 2008 19:55:01 +0000 (GMT) Subject: [Swift-devel] a couple of file staging management ideas in the next 3 months Message-ID: Here are a couple of ideas that might be useful to implement over the next three months. This is intended to be read in the context of two previous notes documenting what I think is the abstract model that SwiftScript provides, and the present Swift implementation of that. These notes refer to changes that could reasonably be made to the production Swift in the next ~3 months, to both facilitate the experimental side of things (i.e. Falkon) and to broaden the range of sites which Swift could be used on in production (i.e. gLite). Specifically, they are not intended to discuss approaches to architecting Falkon, other than to discuss how approaches in Falkon might be coherently interfaced to Swift. Two distinct approaches (which are complementary and could both be implemented) seem feasible to implement in the next two months and interesting at the Swift layer (inasmuch as they facilitate ongoing work): * change wrapper.sh to access the shared filesystem in various selectable/configurable/pluggable ways; keep everything else the same.
So, this changes the requirement that wrapper.sh have POSIX access to the site cache into a requirement that there is command-line scriptable access to the site cache. This would facilitate gLite's model of no POSIX access to the site storage system without radically changing architecture. Some OSG users have suggested that this is desirable on OSG. * Fiddle with the abstraction at the execute2 layer execute2 is a routine in the Swift library that selects a site, and then causes stagein to the site shared directory, execution on the remote site, and stageout from that site to occur. This layer could be adjusted so the sequence: stagein/execute/stageout is pluggable. By default the existing implementation would be used; but it would allow other code to be plugged in that could replace both the existing client site stagein/execute/stageout code and the remote wrapper.sh code. This would facilitate running in environments where the same underlying layer wants to do both data management and execution management (such as Falkon). In the Falkon case, what is now provider-deef would plug in at this level and offload all staging responsibility from core Swift. -- From hategan at mcs.anl.gov Tue Dec 2 14:48:19 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 02 Dec 2008 14:48:19 -0600 Subject: [Swift-devel] a couple of file staging management ideas in the next 3 months In-Reply-To: References: Message-ID: <1228250899.18726.9.camel@localhost> I'm a bit fuzzy on the kind of generic changes that would facilitate the move to Falkon. I'd rather like to see the concrete things that are needed. On Tue, 2008-12-02 at 19:55 +0000, Ben Clifford wrote: > Here are a couple of ideas that might be useful to implement over the next > three months. > > This is intended to be read in context of two previous notes documenting > what I think is the abstract model that SwiftScript provides, and the > present Swift implementation of that.
> > These notes refer to changes that could reasonably be made to the > production Swift in the next ~3 months, to both facilitate the > experimental side of things (i.e. Falkon) and to broaden the range of > sites which Swift could be used on in production (i.e. gLite). > > Specifically, they are not intended discuss approaches to architecting > Falkon, other than to discuss how approaches in Falkon might be coherently > interfaced to Swift. > > Two distinct approaches (which are complimentary and could both > implemented) seem feasible to implement in the next two months and > interesting at the Swift layer (in as much as they facilitate ongoing > work): > > * change wrapper.sh to access the shared filesystem in various > selectable/configurable/pluggable ways; keep everything else the same. > > So, this changes the requirement that wrapper.sh have posix access to > the site cache into a requirement that there is command-line scriptable > access to the site cache. > > This would facilitate gLite's model of no posix access to the site > storage system without radically changing architecture. Some OSG users > have suggested that this is desirable on OSG. > > * Fiddle with the abstraction at the execute2 layer > > execute2 is a routine in the Swift library that selects a site, > and then causes stagein to site shared directory, execution on the > remote site, and stageout from that site to occur. > > This layer could be adjusted so the sequence: stagein/execute/stageout > is pluggable. By default the exist implementation would be used; but it > would allow other code to be plugged in that could replace both the > existing client site stagein/execute/stageout code and the remote > wrapper.sh code. 
This would facilitate running in environments where > the same underlying layer wants to do both data management and > execution management (such as falkon) > > In the falkon case, what is now provider-deef would plug in at this > level and offload all staging responsibility from core Swift. > From iraicu at cs.uchicago.edu Tue Dec 2 14:53:37 2008 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Tue, 02 Dec 2008 14:53:37 -0600 Subject: [Swift-devel] a couple of file staging management ideas in the next 3 months In-Reply-To: <1228250899.18726.9.camel@localhost> References: <1228250899.18726.9.camel@localhost> Message-ID: <4935A051.6020707@cs.uchicago.edu> Ben and I are meeting on Thursday to hopefully flesh out some of these concrete things. We'll send out a follow-up message then. Ioan Mihael Hategan wrote: > I'm a bit fuzzy on the kind of generic changes that would facilitate the > move to Falkon. I'd rather like to see the concrete things that are > needed. > > On Tue, 2008-12-02 at 19:55 +0000, Ben Clifford wrote: > >> Here are a couple of ideas that might be useful to implement over the next >> three months. >> >> This is intended to be read in context of two previous notes documenting >> what I think is the abstract model that SwiftScript provides, and the >> present Swift implementation of that. >> >> These notes refer to changes that could reasonably be made to the >> production Swift in the next ~3 months, to both facilitate the >> experimental side of things (i.e. Falkon) and to broaden the range of >> sites which Swift could be used on in production (i.e. gLite). >> >> Specifically, they are not intended discuss approaches to architecting >> Falkon, other than to discuss how approaches in Falkon might be coherently >> interfaced to Swift. 
>> >> Two distinct approaches (which are complimentary and could both >> implemented) seem feasible to implement in the next two months and >> interesting at the Swift layer (in as much as they facilitate ongoing >> work): >> >> * change wrapper.sh to access the shared filesystem in various >> selectable/configurable/pluggable ways; keep everything else the same. >> >> So, this changes the requirement that wrapper.sh have posix access to >> the site cache into a requirement that there is command-line scriptable >> access to the site cache. >> >> This would facilitate gLite's model of no posix access to the site >> storage system without radically changing architecture. Some OSG users >> have suggested that this is desirable on OSG. >> >> * Fiddle with the abstraction at the execute2 layer >> >> execute2 is a routine in the Swift library that selects a site, >> and then causes stagein to site shared directory, execution on the >> remote site, and stageout from that site to occur. >> >> This layer could be adjusted so the sequence: stagein/execute/stageout >> is pluggable. By default the exist implementation would be used; but it >> would allow other code to be plugged in that could replace both the >> existing client site stagein/execute/stageout code and the remote >> wrapper.sh code. This would facilitate running in environments where >> the same underlying layer wants to do both data management and >> execution management (such as falkon) >> >> In the falkon case, what is now provider-deef would plug in at this >> level and offload all staging responsibility from core Swift. >> >> > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- =================================================== Ioan Raicu Ph.D. 
Candidate =================================================== Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 =================================================== Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dev.globus.org/wiki/Incubator/Falkon http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page =================================================== =================================================== -------------- next part -------------- An HTML attachment was scrubbed... URL: From foster at cs.uchicago.edu Tue Dec 2 14:48:30 2008 From: foster at cs.uchicago.edu (Ian Foster) Date: Tue, 2 Dec 2008 14:48:30 -0600 Subject: [Swift-devel] Fwd: [Faculty] 12/5 at 3:30: Workshop on Language, Cognition, and Computation: STEVEN SMALL References: <493577FE.5080500@cs.uchicago.edu> Message-ID: <3DB19D96-F289-421B-A6A3-ED11FD990508@cs.uchicago.edu> Begin forwarded message: > From: Morgan Sonderegger > Date: December 2, 2008 12:01:34 PM CST > To: uclinguist at listhost.uchicago.edu, ling- > grads at listhost.uchicago.edu, phd-students at mailman.cs.uchicago.edu, faculty at mailman.cs.uchicago.edu > , psych-faculty at listhost.uchicago.edu, psych-grad at listhost.uchicago.edu > , psych-friends at listhost.uchicago.edu, psych-postdoc at listhost.uchicago.edu > Subject: [Faculty] 12/5 at 3:30: Workshop on Language, Cognition, > and Computation: STEVEN SMALL > > Workshop on Language, Cognition, and Computation > Sponsored by the Council on Advanced Studies > > We are pleased to announce a talk by: > > STEVEN SMALL > (Neuroscience, University of Chicago) > > Friday, December 5, at 3:30pm, in the Karen Landahl Center (basement > of > Social Science) > > This is the fourth meeting of a year-long workshop series on Language, > Cognition, and Computation, with a particular focus on language > learning > and language change. Meetings are held on Fridays 3-4 times monthly. 
> Each meeting features a presentation, followed by discussion. > > We have a number of exciting speakers lined up for this series, both > local and from abroad. The current list and meeting schedule, together > with further information about the workshop, can be found at: > http://cas.uchicago.edu/workshops/language/ > > If you are interested in presenting recent research as part of this > series, please email one of us (listed below). Graduate students of > any > department are especially invited! > > Any persons with a disability who believe they might need assistance > in > attending the workshops are asked to contact Morgan Sonderegger in > advance at morgan at cs.uchicago.edu. > > Abstract: > ------ > > "The Biology of Face-to-Face Communication: Action Understanding and > Language" > Steven L. Small, The University of Chicago > > An important source of information for language comprehension comes > from > the perception of action, including the movements of the mouth and > hands. The neural interactions involved in processing this information > involve the premotor cortex, the inferior parietal lobule, and the > superior temporal gyrus. These regions and the neural connections > among > them comprise a human system for observation-execution matching that > appears to have a phylogenetic basis in the "mirror neuron" system of > the macaque. It appears that this system operates in part by covert > simulation of perceived action. In this talk, we present data from > several studies of audiovisual language comprehension that support > this > thesis. First we discuss the role of action understanding in speech > perception, and show how it aids phonological disambiguation across > environmental and contextual variation, and that the motor cortex > plays > a fundamental role in the process. We also show evidence for the > existence of abstract neural codes for speech percepts that are > independent of their auditory or visual components. 
In the second part > of the talk, we discuss the role of action understanding in higher > order > language comprehension. We conclude that the process of understanding > language involves multimodal sensory processing, motor simulation, and > processing of derived abstract representations, which collectively > form > a distributed circuit encoding comprehension. > > > > > Sincerely, > -- > Morgan Sonderegger, Student Coordinator, Computer Science > Jason Riggle, Faculty Co-Sponsor, Linguistics > Alan Yu, Faculty Co-Sponsor, Linguistics > Partha Niyogi, Faculty Co-Sponsor, Computer Science > _______________________________________________ > faculty mailing list - faculty at mailman.cs.uchicago.edu > https://mailman.cs.uchicago.edu/mailman/listinfo/faculty > -------------- next part -------------- An HTML attachment was scrubbed... URL: From benc at hawaga.org.uk Tue Dec 2 14:58:13 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 2 Dec 2008 20:58:13 +0000 (GMT) Subject: [Swift-devel] a couple of file staging management ideas in the next 3 months In-Reply-To: <4935A051.6020707@cs.uchicago.edu> References: <1228250899.18726.9.camel@localhost> <4935A051.6020707@cs.uchicago.edu> Message-ID: On Tue, 2 Dec 2008, Ioan Raicu wrote: > Ben and I are meeting on Thursday to hopefully flesh out some of these > concrete things. We'll send out a follow-up message then. I think probably in the IO generalization portion of the swift meeting we can talk about API layer issues (*not* in-depth falkon implementations and the relative validity thereof). 
-- From hategan at mcs.anl.gov Tue Dec 2 15:28:04 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 02 Dec 2008 15:28:04 -0600 Subject: [Swift-devel] a couple of file staging management ideas in the next 3 months In-Reply-To: References: <1228250899.18726.9.camel@localhost> <4935A051.6020707@cs.uchicago.edu> Message-ID: <1228253284.19658.0.camel@localhost> On Tue, 2008-12-02 at 20:58 +0000, Ben Clifford wrote: > On Tue, 2 Dec 2008, Ioan Raicu wrote: > > > Ben and I are meeting on Thursday to hopefully flesh out some of these > > concrete things. We'll send out a follow-up message then. > > I think probably in the IO generalization portion of the swift meeting we > can talk about API layer issues (*not* in-depth falkon implementations and > the relative validity thereof). > That is a good point I should keep in mind in general. From hategan at mcs.anl.gov Tue Dec 2 17:29:28 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 02 Dec 2008 17:29:28 -0600 Subject: [Swift-devel] several alternatives to design the data management system for Swift on SuperComputers In-Reply-To: References: <4934621A.5090109@uchicago.edu> Message-ID: <1228260568.19725.4.camel@localhost> On Mon, 2008-12-01 at 22:45 +0000, Ben Clifford wrote: > Do you mean building some general purpose posix shared file system? If so, > this seems quite hard, and seems directly in competition with PVFS and > GPFS, a competition which you are pretty much guaranteed to lose. I thought about that a bit. Perhaps it's not such a bad idea in principle. On such large yet tightly coupled systems it may be interesting to explore a p2p filesystem. 
From aespinosa at cs.uchicago.edu Tue Dec 2 17:34:31 2008 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Tue, 2 Dec 2008 17:34:31 -0600 Subject: [Swift-devel] several alternatives to design the data management system for Swift on SuperComputers In-Reply-To: <1228260568.19725.4.camel@localhost> References: <4934621A.5090109@uchicago.edu> <1228260568.19725.4.camel@localhost> Message-ID: <20081202233430.GA4799@shanti.cs.uchicago.edu> Unless the filesystem is easy to deploy (like in the userspace level). Parrot [1] supports mounting remote gridftp, http and chirp (their own fs). I tested this in the early part of deploying the CIO system. [1] http://www.cse.nd.edu/~ccl/software/parrot/ On Tue, Dec 02, 2008 at 05:29:28PM -0600, Mihael Hategan wrote: > On Mon, 2008-12-01 at 22:45 +0000, Ben Clifford wrote: > > > Do you mean building some general purpose posix shared file system? If so, > > this seems quite hard, and seems directly in competition with PVFS and > > GPFS, a competition which you are pretty much guaranteed to lose. > > I thought about that a bit. Perhaps it's not such a bad idea in > principle. On such large yet tightly coupled systems it may be > interesting to explore a p2p filesystem. From foster at anl.gov Tue Dec 2 19:07:27 2008 From: foster at anl.gov (Ian Foster) Date: Tue, 2 Dec 2008 19:07:27 -0600 Subject: [Swift-devel] several alternatives to design the data management system for Swift on SuperComputers In-Reply-To: <1228260568.19725.4.camel@localhost> References: <4934621A.5090109@uchicago.edu> <1228260568.19725.4.camel@localhost> Message-ID: <2D65DC17-AF8D-4DA4-9492-086295164534@anl.gov> We should definitely be talking with Rob Ross and the PVFS people about this. Also with the other groups that we are already in close contact with, e.g., Doug Thain and Matei Ripeanu. 
On Dec 2, 2008, at 5:29 PM, Mihael Hategan wrote:

> On Mon, 2008-12-01 at 22:45 +0000, Ben Clifford wrote:
>
>> Do you mean building some general purpose posix shared file system? If so,
>> this seems quite hard, and seems directly in competition with PVFS and
>> GPFS, a competition which you are pretty much guaranteed to lose.
>
> I thought about that a bit. Perhaps it's not such a bad idea in
> principle. On such large yet tightly coupled systems it may be
> interesting to explore a p2p filesystem.
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

From foster at anl.gov Thu Dec 4 20:30:22 2008
From: foster at anl.gov (Ian Foster)
Date: Thu, 4 Dec 2008 20:30:22 -0600
Subject: [Swift-devel] User perspective on how an app procedure call maps into an application executable call
In-Reply-To:
References:
Message-ID:

Ben:

I like the description of the semantics. One question:

"The input files must be treated as read only files."

This made me wonder what happens if I violate the condition. Do we test for this condition and make the application fail if it is violated? Or are we really saying "if you make any changes they will be lost as we don't copy the file back"?

Ian

From foster at anl.gov Thu Dec 4 20:39:21 2008
From: foster at anl.gov (Ian Foster)
Date: Thu, 4 Dec 2008 20:39:21 -0600
Subject: [Swift-devel] notes on how swift implements file input and output
In-Reply-To:
References:
Message-ID:

Ben:

I don't know if these questions can easily be answered via email; maybe we need to talk on the phone (are you in Australia next week?) or they might be suitable for a next rev of these notes.

a) Am I correct in assuming that Swift currently will not run on a site that does not support a shared file system?
b) Can we build on this document to introduce means by which we could make use of methods such as bulk transfer of many input files, collective I/O as on BG/P, etc.? c) What are the pros and cons of copying all input and output files twice, once to the site, and once to the node. Is this ever a source of overhead? Ian. On Dec 1, 2008, at 4:00 PM, Ben Clifford wrote: > > read this in conjunction with previous note, "Subject: User > perspective on > how an app procedure call maps into an application executable call" > > > This note details the implementation of Swift file input and output in > application blocks; it is intended to be read in conjunction with a > previous note 'How an app procedure call maps into an application > call, > from a Swift user perspective, attempting to avoid the mechanics > inside > Swift.' > > > Swift executes application procedures on one or more //sites//. > > Each site consists of: > > * worker nodes. There is some //execution mechanism// through which > the > Swift client side executable can execute its //wrapper script// on > those > worker nodes. This is commonly GRAM or Falkon or coasters. > > * a site-shared file system. This site shared filesystem is accessible > through some //file transfer mechanism// from the Swift client side > executable. This is commonly GridFTP or coasters. This site shared > filesystem is also accessible through the posix file system on all > worker > nodes, mounted at the same location as seen through the file transfer > mechanism. Swift is configured with the location of some //site > working > directory// on that site-shared file system. > > There is no assumption that the site shared file system for one site > is > accessible from another site. > > For each workflow run, on each site that is used by that run, a //run > directory// is created in the site working directory, by the Swift > client > side. 
> > In that run directory are placed several subdirectories: > > * shared/ - site shared files cache > > * kickstart/ - when kickstart is used, kickstart record files > for each job that has generated a kickstart > > * info/ - wrapper script log files > > * status/ - job status files > > * jobs/ //application workspace directories// (optionally placed > here - > see below) > > Application execution looks like this: > > For each application procedure call: > > The Swift client side selects a site; copies the input files for that > procedure call to the site shared file cache if they are not already > in > the cache, using the file transfer mechanism; and then invokes the > wrapper > script on that site using the execution mechanism. > > The wrapper script creates the application workspace directory; > places the > input files for that job into the application workspace directory > using > either cp or ln -s (depending on a configuration option); executes the > application unix executable; copies output files from the application > workspace directory to the site shared directory using cp; creates a > status file under the status/ directory; and exits, returning > control to > the Swift client side. Logs created during the execution of the > wrapper > script are stored under the info/ directory. > > The Swift client side then checks for the presence of and deletes a > status > file indicating success; copies files from the site shared directory > to > the appropriate client side location. > > The job directory is created (in the default mode) under the jobs/ > directory. However, it can be created under an arbitrary other path, > which > allows it to be created on a different file system (such as a worker > node > local file system in the case that the worker node has a local file > system). 
> > -- > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Fri Dec 5 00:19:16 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 05 Dec 2008 00:19:16 -0600 Subject: [Swift-devel] User perspective on how an app procedure call maps into an application executable call In-Reply-To: References: Message-ID: <1228457956.11289.5.camel@localhost> On Thu, 2008-12-04 at 20:30 -0600, Ian Foster wrote: > Ben: > > I like the description of the semantics. One question: > > "The input files must be treated as read only files." > > This made me wonder what happens if I violate the condition. Do we > test for this condition and make the application fail if it is > violated? Or are we really saying "if you make any chances they will > be lost as we don't copy the file back." I'm not ben, but... we don't currently check for modification of input files, and modifications are probably lost unless the input file is also an output file, which will very likely violate the swift rule that says that two different swift variables cannot be mapped to the same file. From benc at hawaga.org.uk Fri Dec 5 11:02:15 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 5 Dec 2008 17:02:15 +0000 (GMT) Subject: [Swift-devel] User perspective on how an app procedure call maps into an application executable call In-Reply-To: <1228457956.11289.5.camel@localhost> References: <1228457956.11289.5.camel@localhost> Message-ID: > > "The input files must be treated as read only files." > > > > This made me wonder what happens if I violate the condition. Do we > > test for this condition and make the application fail if it is > > violated? Or are we really saying "if you make any chances they will > > be lost as we don't copy the file back." [...] > modifications are probably lost [...] 
It's likely that modifications are lost post-workflow (though not guaranteed - for example, I made a cog provider that made local filesystem stageins happen using hard links. In that case, there is potential for a misbehaving application to modify the input file at its original master location).

Within a workflow run, writing to an input file will give similar non-determinism as happens in other replica-management situations where the replicas are not actually replicas. Subsequently executed jobs that make use of the same input file may be fed the modified input file (if on the same site and still in the cache) or may be fed the original input file (if on a different site).

The consequence of that is that writing to an input file may make the outputs of all other procedures that use that input non-deterministically wrong.

The architecture at the moment would fairly straightforwardly support a more-assertions/more-overhead mode to prevent or detect such changes, if it was desirable; however, I've not seen it happen in practice and our present trend is towards less overhead, not more.

--

From benc at hawaga.org.uk Fri Dec 5 12:46:31 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 5 Dec 2008 18:46:31 +0000 (GMT)
Subject: [Swift-devel] notes on how swift implements file input and output
In-Reply-To:
References:
Message-ID:

On Thu, 4 Dec 2008, Ian Foster wrote:

> a) Am I correct in assuming that Swift currently will not run on a site that
> does not support a shared file system?

yes.

> b) Can we build on this document to introduce means by which we could make use
> of methods such as bulk transfer of many input files, collective I/O as on
> BG/P, etc.?

I wrote this to motivate discussion for the meeting we had yesterday involving myself, mike wilde, hategan, ioan, zhao and allan espinosa which was pretty much centered on that topic.
We looked at three specific cases of swift + something else:

* swift + gLite
* swift + falkon data diffusion
* swift + mike/zhao/allan's collective IO work

In all three cases, the modifications necessary to core swift seem fairly simple.

In the gLite and falkon data diffusion cases, it seems straightforward to change the abstractions a bit so that, while there is still a concept of a site shared filesystem, there is no requirement that this be posix accessible; instead glite or falkon data diffusion specific mechanisms can be used to move data from the site shared filesystem to the appropriate worker node.

In the collective IO work, the new filesystem there exposes itself through posix anyway; getting that working with swift seems mostly to be integration work rather than new coding.

That being said, the discussion above was mostly about the mechanics of plugging the pieces together. The more interesting and harder part of that is likely to be performance characterisation and improvement; for collective IO and data diffusion, it is improvement over traditional file systems rather than whether it works or not that seems to be the goal.

> c) What are the pros and cons of copying all input and output files twice,
> once to the site, and once to the node. Is this ever a source of overhead?

They're not always copied to the node. In the present case, it is an option whether to copy input files entirely to a worker node or to access them directly off the site shared filesystem; it's been seen through experiment that it can be faster in some cases to copy a file using some specialised posix data transfer tool like /bin/cp and then have local access to it; conversely though, if the input file is large and only small parts of it are accessed randomly, then keeping it on the shared file system may be a better approach.
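The copy-versus-direct-access choice described above can be sketched in a few lines. This is a minimal illustration only, not Swift's wrapper.sh: the function name `stage_input` and the mode names "copy" and "link" are hypothetical, standing in for the cp and ln -s behaviours mentioned in the implementation notes.

```python
import os
import shutil
import tempfile


def stage_input(shared_path, workspace_dir, mode="copy"):
    """Place one input file from a site shared cache into a job workspace.

    mode="copy" mimics `cp`; mode="link" mimics `ln -s`. Both mode names are
    illustrative assumptions, not Swift configuration values.
    """
    os.makedirs(workspace_dir, exist_ok=True)
    target = os.path.join(workspace_dir, os.path.basename(shared_path))
    if mode == "copy":
        # Pay the transfer cost once, then read from the local copy.
        shutil.copyfile(shared_path, target)
    elif mode == "link":
        # No data is moved; every read goes back to the shared filesystem.
        os.symlink(os.path.abspath(shared_path), target)
    else:
        raise ValueError("unknown staging mode: %r" % (mode,))
    return target


def demo():
    """Stage the same file both ways; report symlink status and linked content."""
    with tempfile.TemporaryDirectory() as root:
        shared = os.path.join(root, "shared")
        os.makedirs(shared)
        src = os.path.join(shared, "input.txt")
        with open(src, "w") as f:
            f.write("hello\n")
        copied = stage_input(src, os.path.join(root, "jobs", "a"), mode="copy")
        linked = stage_input(src, os.path.join(root, "jobs", "b"), mode="link")
        with open(linked) as f:
            linked_text = f.read()
        return os.path.islink(copied), os.path.islink(linked), linked_text
```

Copying suits inputs that are read in full; linking suits large inputs of which only small parts are read, since nothing is transferred up front.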
Having a site-shared filesystem as part of the abstraction gives a fairly straightforward way to handle site-side data caching so that input files do not have to be staged in multiple times for multiple jobs; it also gives a pretty portable way to get data to a worker node from its stagein location that is closely aligned with how traditional grid sites are configured. There are pages more that could be written comparing different approaches to doing this... -- From benc at hawaga.org.uk Sat Dec 6 18:36:43 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Sun, 7 Dec 2008 00:36:43 +0000 (GMT) Subject: [Swift-devel] readdata reading from datasets Message-ID: readdata presently takes a string as a parameter listing the file to read. this means that it can't be used to read from computed files. it might be nice to be able to do that. -- From bugzilla-daemon at mcs.anl.gov Sat Dec 6 18:59:02 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Sat, 6 Dec 2008 18:59:02 -0600 (CST) Subject: [Swift-devel] [Bug 159] New: readdata typechecking too loose Message-ID: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=159 Summary: readdata typechecking too loose Product: Swift Version: unspecified Platform: All OS/Version: All Status: NEW Severity: minor Priority: P2 Component: SwiftScript language AssignedTo: benc at hawaga.org.uk ReportedBy: benc at hawaga.org.uk readdata takes only a string. but passing non-strings does not cause a type exception in swift 0.7 -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at mcs.anl.gov Sat Dec 6 19:16:26 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Sat, 6 Dec 2008 19:16:26 -0600 (CST) Subject: [Swift-devel] [Bug 157] Identify Swift as generator of gridftp client info In-Reply-To: Message-ID: <20081207011626.3D328164B3@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=157 benc at hawaga.org.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |DUPLICATE ------- Comment #1 from benc at hawaga.org.uk 2008-12-06 19:16 ------- *** This bug has been marked as a duplicate of 156 *** -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at mcs.anl.gov Sat Dec 6 19:16:26 2008 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Sat, 6 Dec 2008 19:16:26 -0600 (CST) Subject: [Swift-devel] [Bug 156] Identify Swift as generator of gridftp client info In-Reply-To: Message-ID: <20081207011626.5C7E8164B9@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=156 ------- Comment #1 from benc at hawaga.org.uk 2008-12-06 19:16 ------- *** Bug 157 has been marked as a duplicate of this bug. *** -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at mcs.anl.gov Mon Dec 8 14:46:50 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon, 8 Dec 2008 14:46:50 -0600 (CST)
Subject: [Swift-devel] [Bug 160] New: some mappers fail for complex data
Message-ID:

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=160

Summary: some mappers fail for complex data
Product: Swift
Version: unspecified
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: SwiftScript language
AssignedTo: benc at hawaga.org.uk
ReportedBy: hategan at mcs.anl.gov

when using structures with arrays of arrays, both the filesys and ext mappers cause consistency check failures. Here's a test case:

-------------------
type file{}

type FVec {
    file columns[];
}

type FMat {
    FVec rows[];
}

app (file r) dummy() {
    echo "x" stdout=@r;
}

FMat m ;
m.rows[0].columns[0] = dummy();
-------------------

This should at least complete successfully. Instead, I see the following error:

Execution failed:
    Mapper failed to map org.griphyn.vdl.mapping.DataNode identifier tag:benc at ci.uchicago.edu,2008:swift:dataset:20081208-1441-2nuvzaa8:720000000008 with no value at dataset=m path=.rows[0].columns[0] (not closed)

--
Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From benc at hawaga.org.uk Mon Dec 8 18:52:16 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 9 Dec 2008 00:52:16 +0000 (GMT)
Subject: [Swift-devel] default output
Message-ID:

It was commented the other day that the default output of swift is overwhelmed by " starting" and " completed" messages, masking the other standard output information of jobs-in-each-state.

One thing to do there is to remove the started|completed messages from the default output and instead have only the periodic ticker message.
--

From hategan at mcs.anl.gov Mon Dec 8 18:59:00 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 08 Dec 2008 18:59:00 -0600
Subject: [Swift-devel] default output
In-Reply-To:
References:
Message-ID: <1228784340.27303.0.camel@localhost>

On Tue, 2008-12-09 at 00:52 +0000, Ben Clifford wrote:
> It was commented the other day that the default output of swift is
> overwhelmed by " starting" and " completed" messages, masking
> the other standard output information of jobs-in-each-state.
>
> One thing to do there is to remove the started|completed messages
> from the default output and instead have only the periodic ticker message.

Yes. App started|completed could be moved to -v.

From bugzilla-daemon at mcs.anl.gov Tue Dec 9 14:49:02 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Tue, 9 Dec 2008 14:49:02 -0600 (CST)
Subject: [Swift-devel] [Bug 161] New: poor error when swiftscript source file specification has no extension
Message-ID:

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=161

Summary: poor error when swiftscript source file specification has no extension
Product: Swift
Version: unspecified
Platform: Macintosh
OS/Version: Mac OS
Status: NEW
Severity: minor
Priority: P2
Component: General
AssignedTo: benc at hawaga.org.uk
ReportedBy: benc at hawaga.org.uk
CC: benc at hawaga.org.uk

For any of:

$ swift /home
$ swift /etc/group

I get this:

Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.substring(String.java:1938)
    at org.griphyn.vdl.karajan.Loader.projectName(Loader.java:369)
    at org.griphyn.vdl.karajan.Loader.main(Loader.java:97)

--
Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
You reported the bug, or are watching the reporter.
You are the assignee for the bug, or are watching the assignee.
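The stack trace in Bug 161 is consistent with an extension lookup that returned -1 being passed straight to substring, though that is an inference from the trace rather than something confirmed against the Loader source. As a hedged illustration of the kind of guard that avoids this class of failure, here is a small Python sketch; `project_name` is a hypothetical helper, not Swift's Loader.projectName.

```python
import os


def project_name(script_path):
    """Derive a project name from a script path, tolerating a missing extension.

    Hypothetical helper for illustration; it is not Swift's Loader code.
    """
    base = os.path.basename(script_path.rstrip("/"))
    if not base:
        raise ValueError("no file name in path: %r" % (script_path,))
    dot = base.rfind(".")
    # rfind returns -1 when there is no dot. A leading dot (".hidden") is
    # treated as part of the name rather than as an extension separator.
    if dot <= 0:
        return base
    return base[:dot]
```

With this guard, the two paths from the bug report yield "home" and "group" instead of an exception, and a path with no file name at all fails with a message that names the offending input.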
From bugzilla-daemon at mcs.anl.gov Tue Dec 9 14:54:49 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Tue, 9 Dec 2008 14:54:49 -0600 (CST)
Subject: [Swift-devel] [Bug 162] New: error message syntax makes distinct values look like a bizarre path
Message-ID:

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=162

Summary: error message syntax makes distinct values look like a bizarre path
Product: Swift
Version: unspecified
Platform: Macintosh
OS/Version: Mac OS
Status: NEW
Severity: normal
Priority: P2
Component: General
AssignedTo: benc at hawaga.org.uk
ReportedBy: benc at hawaga.org.uk

The syntax of this error message:

Failed to transfer wrapper log from 066-many-20081209-1448-hoqecv01/info/8/Clemson-ciTeam

has multiple times made me think that Swift was trying to work with a .../Clemson-ciTeam (or whatever site name) directory, which it is not.

Syntax should be changed to separate the path and host components with something other than a /

--
Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You reported the bug, or are watching the reporter.
You are the assignee for the bug, or are watching the assignee.

From benc at hawaga.org.uk Tue Dec 9 19:57:43 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 10 Dec 2008 01:57:43 +0000 (GMT)
Subject: [Swift-devel] fun with osg
Message-ID:

I played some with Mats Rynge at RENCI today. He modified some code he already had to output sites.xml based on the OSG ReSS information system.

This made it straightforward to submit jobs to sites that are in OSGEDU (the only OSG VO that I am a member of) running executables that are already on the system $PATH.

However, of the 13 sites advertising that they support OSG EDU, only two are actually able to run Swift jobs this afternoon.

I just got set up to submit jobs to the OSG Engagement VO; this changes the range of sites available, but also opens up some more opportunity for narrowing down the range of available sites, as the Engagement VO has a richer set of site availability information that can be used to construct a more-likely-to-work sites.xml file.

Of the two working-and-published OSGEDU sites, I also tried running with coasters; that failed dismally on both. In the case of one site, the fork job manager was the ever-more-common ManagedFork jobmanager, which appears unable to execute the head jobs. The other site ran the head job but worker nodes could not communicate properly with it. so frrrr.

There also remains the problem of application location and deployment; in my playing today I used the default $PATH method of finding my test executable (touch) - this is still hard, though I have some other notes to write about that elsewhere.

Replication seems to do a good job with failing sites, although not a completely perfect job. One site takes jobs to the karajan Active state and then stays there forever. Replication, as presently implemented, doesn't cope with that.

So nothing particularly new - OSG as usual...

--

From tiberius at ci.uchicago.edu Tue Dec 9 22:28:44 2008
From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun)
Date: Tue, 9 Dec 2008 22:28:44 -0600
Subject: [Swift-devel] log-processing
Message-ID:

Hi

I was wondering if the log processing tool can create some graph where I can see useful information on a per-atomic-application basis. I'm interested in average, min, max time spent in each atomic application.

Thank you
Tibi

--
Tiberiu (Tibi) Stef-Praun, PhD
Computational Sciences Researcher
Computation Institute
5640 S.
Ellis Ave, #405
University of Chicago
http://www-unix.mcs.anl.gov/~tiberius/

From benc at hawaga.org.uk Wed Dec 10 07:28:29 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 10 Dec 2008 13:28:29 +0000 (GMT)
Subject: [Swift-devel] log-processing
In-Reply-To:
References:
Message-ID:

On Tue, 9 Dec 2008, Tiberiu Stef-Praun wrote:

> I was wondering if the log processing tool can create some graph where
> I can see useful information on a per-atomic-application basis. I'm
> interested in average, min, max time spent in each atomic application.

As an already implemented graph, not at the moment; but most of the stuff is in place to do so.

By 'time spent in each', do you mean time spent with a compute node running your app code, or with overhead such as stagein and queue time?

Information for both of those options is available (but not filtered by app) at the moment.

--

From wilde at mcs.anl.gov Wed Dec 10 09:35:37 2008
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 10 Dec 2008 09:35:37 -0600
Subject: [Swift-devel] log-processing
In-Reply-To:
References:
Message-ID: <493FE1C9.6050007@mcs.anl.gov>

I did something like this a while back with a perl script that summarized kickstart data.

I agree - it's very useful to the user. Also for multi-site runs, to correlate performance with site architecture (and in other studies, with arguments/file sizes, but that's another step).

So I think this would be a good time to make a decision on kickstart data vs wrapper.sh data. I have no preference. I tend to favor data that's either tabular or can readily be converted to tabular.

Jens's two versions of kickstart both have a lot of features we'll likely find useful. wrapper.sh is nice, simple, and lightweight.

Ben and/or Mihael, I think you should work out an efficient and effective architecture for this important stats capture mechanism that will be useful for users and get it into general use.

- Mike

On 12/10/08 7:28 AM, Ben Clifford wrote:
>
> On Tue, 9 Dec 2008, Tiberiu Stef-Praun wrote:
>
>> I was wondering if the log processing tool can create some graph where
>> I can see useful information on a per-atomic-application basis. I'm
>> interested in average, min, max time spent in each atomic application.
>
> As an already implemented graph, not at the moment; but most of the stuff
> is in place to do so.
>
> By 'time spent in each', do you mean time spent with a compute node
> running your app code, or with overhead such as stagein and queue time?
>
> Information for both of those options is available (but not filtered by
> app) at the moment.
>

From benc at hawaga.org.uk Wed Dec 10 09:41:36 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 10 Dec 2008 15:41:36 +0000 (GMT)
Subject: [Swift-devel] log-processing
In-Reply-To: <493FE1C9.6050007@mcs.anl.gov>
References: <493FE1C9.6050007@mcs.anl.gov>
Message-ID:

On Wed, 10 Dec 2008, Michael Wilde wrote:

> So I think this would be a good time to make a decision on kickstart
> data vs wrapper.sh data. I have no preference. I tend to favor data
> that's either tabular or can readily be converted to tabular.

The user community decided long ago to not use kickstart, I think.

The information from both wrapper logs and kickstart is available from the log processing code in a job-per-line tabular form.

> Ben and/or Mihael, I think you should work out an efficient and
> effective architecture for this important stats capture mechanism that
> will be useful for users and get it into general use.

An architecture exists. It's the log-processing code. People even use it! It's likely to be in the Swift 0.8 release. There is plenty of scope for improvement there, but it is the existing base.
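
The job-per-line tabular form mentioned above lends itself to simple per-application summaries of the kind asked for at the start of this thread. What follows is only a sketch in Python, assuming the five space-separated fields that this thread gives for info.event (start time, time of wrapper script execution, job id, final status, application name) and treating the second field as a duration in seconds. The function name per_app_stats and all sample rows except the first are made up for illustration; none of this is part of Swift's log-processing tools.

```python
from collections import defaultdict


def per_app_stats(lines):
    """Return {app: (count, min, mean, max)} of wrapper durations in seconds.

    Assumes five space-separated fields per line:
    start_time duration job_id status app_name
    """
    durations = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        _start, duration, _job_id, _status, app = fields
        durations[app].append(float(duration))
    return {
        app: (len(d), min(d), sum(d) / len(d), max(d))
        for app, d in durations.items()
    }


# Hypothetical sample rows; only the first is modelled on the example
# line shown for info.event in this thread.
sample = [
    "1228919959.621686561 0.288390398025513 touch-0cej8i3j END touch",
    "1228919960.100000000 0.5 touch-aaaaaaaa END touch",
    "1228919961.200000000 2.0 wc-bbbbbbbb END wc",
]

if __name__ == "__main__":
    for app, (n, lo, mean, hi) in sorted(per_app_stats(sample).items()):
        print(f"{app}: n={n} min={lo:.3f} mean={mean:.3f} max={hi:.3f}")
```

To run it against a real file, the lines of info.event could be passed in place of the sample list, for example by opening the file and handing the file object to per_app_stats.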

--

From wilde at mcs.anl.gov Wed Dec 10 10:12:00 2008
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 10 Dec 2008 10:12:00 -0600
Subject: [Swift-devel] log-processing
In-Reply-To:
References: <493FE1C9.6050007@mcs.anl.gov>
Message-ID: <493FEA50.7010301@mcs.anl.gov>

On 12/10/08 9:41 AM, Ben Clifford wrote:
> On Wed, 10 Dec 2008, Michael Wilde wrote:
>
>> So I think this would be a good time to make a decision on kickstart
>> data vs wrapper.sh data. I have no preference. I tend to favor data
>> that's either tabular or can readily be converted to tabular.
>
> The user community decided long ago to not use kickstart, I think.

ok. Should we remove the documentation for it and eventually phase it out of the code, then? Low prio, but it determines where we put our effort in enhancing stats.

> The information from both wrapper logs and kickstart is available from the
> log processing code in a job-per-line tabular form.

ok. Then next step seems to be documenting it further. When I read http://www.ci.uchicago.edu/swift/guides/log-processing.php what Tibi is asking for doesn't jump out at me. Are you referring to:

"webpage.info - details of execute-site wrapper logs. For versions of Swift prior to r1700, you will need to stage the *-info logs back to the same place as the Swift log manually. After r1700, they are staged back automatically under the control of the wrapperlog.always.transfer property. The IDIR variable must be set to point to the directory containing the logs:

make LOG=/path/fmri-20080304-0901-h8h78lnf.log \
  IDIR=/path/130-fmri-20080304-0901-h8h78lnf.d/ clean webpage.info webpage
" ?

Sounds like some polishing of this may be the next step then.

So sounds like Tibi, you should try what's there, and report back if it meets your needs or not.

>> Ben and/or Mihael, I think you should work out an efficient and
>> effective architecture for this important stats capture mechanism that
>> will be useful for users and get it into general use.
>
> An architecture exists. It's the log-processing code. People even use it!
> It's likely to be in the Swift 0.8 release. There is plenty of scope for
> improvement there, but it is the existing base.

OK. Sounds good. I (or better you) can readily guess a few of the main things a user will want to ask about a run, and provide examples of the inputs and outputs in the user guide. Eg:

"For each app run, create a list of: appname sitename starttime runtime cputime memusage (a few vars) etc"

"For each app, summarize these by site, by app, etc."

- Mike

From benc at hawaga.org.uk Wed Dec 10 10:22:16 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 10 Dec 2008 16:22:16 +0000 (GMT)
Subject: [Swift-devel] log-processing
In-Reply-To: <493FEA50.7010301@mcs.anl.gov>
References: <493FE1C9.6050007@mcs.anl.gov> <493FEA50.7010301@mcs.anl.gov>
Message-ID:

On Wed, 10 Dec 2008, Michael Wilde wrote:

> So sounds like Tibi, you should try whats there, and report back if it
> meets your needs or not.

He already did that... the first message on this thread is him asking for a specific set of information from log-processing that it doesn't provide (but easily could).

Pay attention, 007!

--

From benc at hawaga.org.uk Wed Dec 10 10:37:47 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 10 Dec 2008 16:37:47 +0000 (GMT)
Subject: [Swift-devel] log-processing
In-Reply-To:
References:
Message-ID:

On Tue, 9 Dec 2008, Tiberiu Stef-Praun wrote:

> I was wondering if the log processing tool can create some graph where
> I can see useful information on a per-atomic-application basis. I'm
> interested in average, min, max time spent in each atomic application.

If you're interested in doing some analysis yourself, I just modified the log-processing code to give you something easier to work with.

Using log-processing >= r2365 will let you say:

swift-plot-log 066-many-20081210-0834-xk5h3us8.log info.event

(i.e.
add info.event onto the end of the command line)

This will give you a file info.event (in a directory under /tmp indicated at the end of the command output) with one line per info file, that looks like this:

1228919959.621686561 0.288390398025513 touch-0cej8i3j END touch

The fields are space separated and are, in order:

1. start time in seconds since epoch
2. time of wrapper script execution on worker node
3. job id
4. final wrapper script status
5. application name

You can use this information now if you want to do your own further analysis, or you can wait for me to implement per-app stats (that will likely look like the per-site stats that appear at the end of execute2.html in a log report).

--

From abejan at ci.uchicago.edu Wed Dec 10 11:31:39 2008
From: abejan at ci.uchicago.edu (Alina Bejan)
Date: Wed, 10 Dec 2008 11:31:39 -0600
Subject: [Swift-devel] Re: fun with osg [Swift-devel Digest, Vol 23, Issue 12]
In-Reply-To: <20081210162237.0AFE82C0044@mail.ci.uchicago.edu>
References: <20081210162237.0AFE82C0044@mail.ci.uchicago.edu>
Message-ID: <493FFCFB.6060600@ci.uchicago.edu>

Hi Ben,

For OSGEDU: the ReSS information should provide only 2 sites available, since this is the reality. It hasn't been 13 sites for this VO for more than 6 months. But the information seems to still be stale.

Situation might be slightly different if you run under OSG VO, since more sites would be available. If you wish to pursue this option as well, let me know. We'll be able to add you to OSG VO if you plan on running more experiments that way.

Alina

From benc at hawaga.org.uk Wed Dec 10 12:59:47 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 10 Dec 2008 18:59:47 +0000 (GMT)
Subject: [Swift-devel] Re: fun with osg [Swift-devel Digest, Vol 23, Issue 12]
In-Reply-To: <493FFCFB.6060600@ci.uchicago.edu>
References: <20081210162237.0AFE82C0044@mail.ci.uchicago.edu> <493FFCFB.6060600@ci.uchicago.edu>
Message-ID:

On Wed, 10 Dec 2008, Alina Bejan wrote:

> For OSGEDU: the ReSS information should provide only 2 sites
> available, since this is the reality. It hasn't been 13 sites for this
> VO for more than 6 months. But the information seems to still be stale.
>
> Situation might be slightly different if you run under OSG VO, since
> more sites would be available. If you wish to pursue this option as
> well, let me know. We'll be able to add you to OSG VO if you plan on
> running more experiments that way.

I'm in the Engage VO now as well, as there is a lot of work already done (and ongoing) there to keep that VO's information system fairly fresh - that seems most attractive to me from a Swift development perspective.

--

From zhaozhang at uchicago.edu Thu Dec 11 16:14:31 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Thu, 11 Dec 2008 16:14:31 -0600
Subject: [Swift-devel] ssh data provider
Message-ID: <494190C7.3030903@uchicago.edu>

Hi, All

I am trying to make the ssh data provider work between BG login node and IO node. In this context, we could ssh from login nodes to IO nodes without a passphrase.

I made a test, it failed:

zzhang at login6.surveyor:~/swift/test> swift -sites.file ./sites.xml -tc.file ./tc.data first.swift
Swift svn swift-r2334 (Swift modified locally) cog-r2216

RunID: 20081211-1604-6otxvqtb
Progress:
echo started
Sorted: [bgp000:1,000.000(98.545):0/789 overload: 0]
Sorted: [bgp000:999.590(98.544):0/789 overload: 0]
Sorted: [bgp000:999.180(98.544):0/789 overload: 0]
echo failed
Execution failed:
Could not initialize shared directory on bgp000
Caused by:
org.globus.cog.abstraction.impl.file.FileResourceException: Error while communicating with the SSH server on 172.16.3.6:22
Caused by:
Public Key Authentication failed

My sites.xml is like below:

zzhang at login6.surveyor:~/swift/test> cat sites.xml
/tmp
8
1000

My auth.defaults is this:

zzhang at login6.surveyor:~/swift/test> cat ~/.ssh/auth.defaults
172.16.3.6.type=key
172.16.3.6.username=zzhang
172.16.3.6.key=/home/zzhang/.ssh/id_rsa
172.16.3.6.passphrase=""

The log file could be reached at http://www.ci.uchicago.edu/~zzhang/first-20081211-1604-6otxvqtb.log

Thanks.
best wishes zhangzhao From hategan at mcs.anl.gov Thu Dec 11 16:27:21 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 11 Dec 2008 16:27:21 -0600 Subject: [Swift-devel] ssh data provider In-Reply-To: <494190C7.3030903@uchicago.edu> References: <494190C7.3030903@uchicago.edu> Message-ID: <1229034441.10267.0.camel@localhost> Can you remove the quotes after .passphrase= and try again? On Thu, 2008-12-11 at 16:14 -0600, Zhao Zhang wrote: > Hi, All > > I am trying to make the ssh data provider working between BG login node > and IO node. > In this context, we could ssh from login nodes to IO nodes without a > passphrase. > > I made a test, it failed: > zzhang at login6.surveyor:~/swift/test> swift -sites.file ./sites.xml > -tc.file ./tc.data first.swift > Swift svn swift-r2334 (Swift modified locally) cog-r2216 > > RunID: 20081211-1604-6otxvqtb > Progress: > echo started > Sorted: [bgp000:1,000.000(98.545):0/789 overload: 0] > Sorted: [bgp000:999.590(98.544):0/789 overload: 0] > Sorted: [bgp000:999.180(98.544):0/789 overload: 0] > echo failed > Execution failed: > Could not initialize shared directory on bgp000 > Caused by: > org.globus.cog.abstraction.impl.file.FileResourceException: > Error while communicating with the SSH server on 172.16.3.6:22 > Caused by: > Public Key Authentication failed > > > My sites.xml is like belwo: > zzhang at login6.surveyor:~/swift/test> cat sites.xml > > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.5"> > > > url="http://172.16.3.6:50001/wsrf/services/GenericPortal/core/WS/GPFactoryService"/> > /tmp > 8 > 1000 > > > > > My auth.defaults is this: > zzhang at login6.surveyor:~/swift/test> cat ~/.ssh/auth.defaults > 172.16.3.6.type=key > 172.16.3.6.username=zzhang > 172.16.3.6.key=/home/zzhang/.ssh/id_rsa > 172.16.3.6.passphrase="" > > The log file could be reached at > http://www.ci.uchicago.edu/~zzhang/first-20081211-1604-6otxvqtb.log > > Thanks. 
> > best wishes > zhangzhao > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From benc at hawaga.org.uk Thu Dec 11 16:25:15 2008 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 11 Dec 2008 22:25:15 +0000 (GMT) Subject: [Swift-devel] ssh data provider In-Reply-To: <494190C7.3030903@uchicago.edu> References: <494190C7.3030903@uchicago.edu> Message-ID: On Thu, 11 Dec 2008, Zhao Zhang wrote: > I am trying to make the ssh data provider working between BG login node and IO > node. > In this context, we could ssh from login nodes to IO nodes without a > passphrase. Have you had the ssh provider work in any other situation? It might be useful to check that you are using it correctly between two standard linux boxes first. Can you ssh from the command-line ssh utility from that login node to that IO node using those credentials? If so, please paste the entire output of sshing using the -v parameter. -- From zhaozhang at uchicago.edu Thu Dec 11 18:21:49 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Thu, 11 Dec 2008 18:21:49 -0600 Subject: [Swift-devel] ssh data provider In-Reply-To: <4941927D.6060904@mcs.anl.gov> References: <494190C7.3030903@uchicago.edu> <4941927D.6060904@mcs.anl.gov> Message-ID: <4941AE9D.20808@uchicago.edu> Hi, Mike On BGP, we could ssh to IO nodes even if there is no private key. So I was assuming there should not be any passphrase either. I created one pair of keys, and tried with the private key, even though the public key is not deployed on IO nodes, it still worked. Ben, this is also what you ask for. zzhang at login6.surveyor:~> ssh -v ion-1 OpenSSH_4.2p1, OpenSSL 0.9.8a 11 Oct 2005 debug1: Reading configuration data /etc/ssh/ssh_config debug1: Applying options for * debug1: /etc/ssh/ssh_config line 25: Deprecated option "RhostsAuthentication" debug1: Connecting to ion-1 [172.16.3.1] port 22. 
debug1: Connection established. debug1: identity file /home/zzhang/.ssh/identity type -1 debug1: identity file /home/zzhang/.ssh/id_rsa type -1 debug1: identity file /home/zzhang/.ssh/id_dsa type -1 debug1: Remote protocol version 2.0, remote software version OpenSSH_4.2 debug1: match: OpenSSH_4.2 pat OpenSSH* debug1: Enabling compatibility mode for protocol 2.0 debug1: Local version string SSH-2.0-OpenSSH_4.2 debug1: SSH2_MSG_KEXINIT sent debug1: SSH2_MSG_KEXINIT received debug1: kex: server->client aes128-cbc hmac-md5 none debug1: kex: client->server aes128-cbc hmac-md5 none debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP debug1: SSH2_MSG_KEX_DH_GEX_INIT sent debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY debug1: Host 'ion-1' is known and matches the RSA host key. debug1: Found key in /home/zzhang/.ssh/known_hosts:1 debug1: ssh_rsa_verify: signature correct debug1: SSH2_MSG_NEWKEYS sent debug1: expecting SSH2_MSG_NEWKEYS debug1: SSH2_MSG_NEWKEYS received debug1: SSH2_MSG_SERVICE_REQUEST sent debug1: SSH2_MSG_SERVICE_ACCEPT received debug1: Authentications that can continue: publickey,keyboard-interactive,hostbased debug1: Next authentication method: hostbased debug1: Remote: Accepted for login6-data.surveyor.alcf.anl.gov [172.17.3.16] by /etc/ssh/shosts.equiv. debug1: Authentications that can continue: publickey,keyboard-interactive,hostbased debug1: Remote: Accepted for login6-data.surveyor.alcf.anl.gov [172.17.3.16] by /etc/ssh/shosts.equiv. debug1: Authentication succeeded (hostbased). debug1: channel 0: new [client-session] debug1: Entering interactive session. Last login: Thu Dec 11 18:15:46 2008 from login6-data.surveyor.alcf.anl.gov BusyBox v1.4.2 (2008-05-07 02:58:20 UTC) Built-in shell (ash) Enter 'help' for a list of built-in commands. /gpfs/home/zzhang $ zhao Michael Wilde wrote: > Zhao: be *very* careful with creating and placing an ssh key without a > passphrase. 
> > Anyone that can obtain your private key can then get in anywhere you > have this key placed. > > So dont put the key anywhere outside of intrepid (which is well > protected by its cryptocard-only access) > > - Mike > > > On 12/11/08 4:14 PM, Zhao Zhang wrote: >> Hi, All >> >> I am trying to make the ssh data provider working between BG login >> node and IO node. >> In this context, we could ssh from login nodes to IO nodes without a >> passphrase. >> >> I made a test, it failed: >> zzhang at login6.surveyor:~/swift/test> swift -sites.file ./sites.xml >> -tc.file ./tc.data first.swift >> Swift svn swift-r2334 (Swift modified locally) cog-r2216 >> >> RunID: 20081211-1604-6otxvqtb >> Progress: >> echo started >> Sorted: [bgp000:1,000.000(98.545):0/789 overload: 0] >> Sorted: [bgp000:999.590(98.544):0/789 overload: 0] >> Sorted: [bgp000:999.180(98.544):0/789 overload: 0] >> echo failed >> Execution failed: >> Could not initialize shared directory on bgp000 >> Caused by: >> org.globus.cog.abstraction.impl.file.FileResourceException: >> Error while communicating with the SSH server on 172.16.3.6:22 >> Caused by: >> Public Key Authentication failed >> >> >> My sites.xml is like belwo: >> zzhang at login6.surveyor:~/swift/test> cat sites.xml >> >> > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.5"> >> >> >> > url="http://172.16.3.6:50001/wsrf/services/GenericPortal/core/WS/GPFactoryService"/> >> >> /tmp >> 8 >> 1000 >> >> >> >> >> My auth.defaults is this: >> zzhang at login6.surveyor:~/swift/test> cat ~/.ssh/auth.defaults >> 172.16.3.6.type=key >> 172.16.3.6.username=zzhang >> 172.16.3.6.key=/home/zzhang/.ssh/id_rsa >> 172.16.3.6.passphrase="" >> >> The log file could be reached at >> http://www.ci.uchicago.edu/~zzhang/first-20081211-1604-6otxvqtb.log >> >> Thanks. 
>> >> best wishes >> zhangzhao >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From zhaozhang at uchicago.edu Thu Dec 11 18:27:00 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Thu, 11 Dec 2008 18:27:00 -0600 Subject: [Swift-devel] ssh data provider In-Reply-To: References: <494190C7.3030903@uchicago.edu> Message-ID: <4941AFD4.8060206@uchicago.edu> Oops, I think the one I pasted in the previous email is not what you want, this is zzhang at login6.surveyor:~> ssh -l zzhang -i /home/zzhang/.ssh/ir_rsa -v ion-1 Warning: Identity file /home/zzhang/.ssh/ir_rsa not accessible: No such file or directory. OpenSSH_4.2p1, OpenSSL 0.9.8a 11 Oct 2005 debug1: Reading configuration data /etc/ssh/ssh_config debug1: Applying options for * debug1: /etc/ssh/ssh_config line 25: Deprecated option "RhostsAuthentication" debug1: Connecting to ion-1 [172.16.3.1] port 22. debug1: Connection established. debug1: identity file /home/zzhang/.ssh/identity type -1 debug1: identity file /home/zzhang/.ssh/id_rsa type 1 debug1: identity file /home/zzhang/.ssh/id_dsa type -1 debug1: Remote protocol version 2.0, remote software version OpenSSH_4.2 debug1: match: OpenSSH_4.2 pat OpenSSH* debug1: Enabling compatibility mode for protocol 2.0 debug1: Local version string SSH-2.0-OpenSSH_4.2 debug1: SSH2_MSG_KEXINIT sent debug1: SSH2_MSG_KEXINIT received debug1: kex: server->client aes128-cbc hmac-md5 none debug1: kex: client->server aes128-cbc hmac-md5 none debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP debug1: SSH2_MSG_KEX_DH_GEX_INIT sent debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY debug1: Host 'ion-1' is known and matches the RSA host key. 
debug1: Found key in /home/zzhang/.ssh/known_hosts:40 Warning: the RSA host key for 'ion-1' differs from the key for the IP address '172.16.3.1' Offending key for IP in /home/zzhang/.ssh/known_hosts:3 Matching host key in /home/zzhang/.ssh/known_hosts:40 debug1: ssh_rsa_verify: signature correct debug1: SSH2_MSG_NEWKEYS sent debug1: expecting SSH2_MSG_NEWKEYS debug1: SSH2_MSG_NEWKEYS received debug1: SSH2_MSG_SERVICE_REQUEST sent debug1: SSH2_MSG_SERVICE_ACCEPT received debug1: Authentications that can continue: publickey,keyboard-interactive,hostbased debug1: Next authentication method: hostbased debug1: Remote: Accepted for login6-data.surveyor.alcf.anl.gov [172.17.3.16] by /etc/ssh/shosts.equiv. debug1: Authentications that can continue: publickey,keyboard-interactive,hostbased debug1: Remote: Accepted for login6-data.surveyor.alcf.anl.gov [172.17.3.16] by /etc/ssh/shosts.equiv. debug1: Authentication succeeded (hostbased). debug1: channel 0: new [client-session] debug1: Entering interactive session. Last login: Thu Dec 11 18:22:04 2008 from login6-data.surveyor.alcf.anl.gov BusyBox v1.4.2 (2008-05-07 02:58:20 UTC) Built-in shell (ash) Enter 'help' for a list of built-in commands. /gpfs/home/zzhang $ Ben Clifford wrote: > On Thu, 11 Dec 2008, Zhao Zhang wrote: > > >> I am trying to make the ssh data provider working between BG login node and IO >> node. >> In this context, we could ssh from login nodes to IO nodes without a >> passphrase. >> > > Have you had the ssh provider work in any other situation? It might be > useful to check that you are using it correctly between two standard linux > boxes first. > > Can you ssh from the command-line ssh utility from that login node to that > IO node using those credentials? If so, please paste the entire output of > sshing using the -v parameter. 
> > From zhaozhang at uchicago.edu Thu Dec 11 18:30:16 2008 From: zhaozhang at uchicago.edu (Zhao Zhang) Date: Thu, 11 Dec 2008 18:30:16 -0600 Subject: [Swift-devel] ssh data provider In-Reply-To: <1229034441.10267.0.camel@localhost> References: <494190C7.3030903@uchicago.edu> <1229034441.10267.0.camel@localhost> Message-ID: <4941B098.1080105@uchicago.edu> Hi, Mihael If I put .passphrase= there, I got this: zzhang at login6.surveyor:~/swift/test> swift -sites.file ./sites.xml -tc.file ./tc.data first.swift Swift svn swift-r2334 (Swift modified locally) cog-r2216 RunID: 20081211-1603-qsfmaeif Progress: echo started Sorted: [bgp000:1,000.000(98.545):0/789 overload: 0] Sorted: [bgp000:999.590(98.544):0/789 overload: 0] Sorted: [bgp000:999.180(98.544):0/789 overload: 0] echo failed Execution failed: Could not initialize shared directory on bgp000 Caused by: org.globus.cog.abstraction.impl.file.FileResourceException: Error while communicating with the SSH server on 172.16.3.6:22 Caused by: java.lang.NullPointerException at org.globus.cog.abstraction.impl.ssh.SSHChannelManager.loadDefaultCredentials(SSHChannelManager.java:160) at org.globus.cog.abstraction.impl.ssh.SSHChannelManager.getDefaultCredentials(SSHChannelManager.java:120) at org.globus.cog.abstraction.impl.ssh.SSHChannelManager.getCredentials(SSHChannelManager.java:79) at org.globus.cog.abstraction.impl.ssh.SSHChannelManager.getChannel(SSHChannelManager.java:62) at org.globus.cog.abstraction.impl.ssh.file.FileResourceImpl.start(FileResourceImpl.java:81) at org.globus.cog.abstraction.impl.file.FileResourceCache.getResource(FileResourceCache.java:98) at org.globus.cog.abstraction.impl.file.CachingDelegatedFileOperationHandler.getResource(CachingDelegatedFileOperationHandler.java:75) at org.globus.cog.abstraction.impl.file.CachingDelegatedFileOperationHandler.submit(CachingDelegatedFileOperationHandler.java:40) at 
org.globus.cog.abstraction.impl.common.task.CachingFileOperationTaskHandler.submit(CachingFileOperationTaskHandler.java:28) at org.globus.cog.karajan.scheduler.submitQueue.NonBlockingSubmit.run(NonBlockingSubmit.java:86) at edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:431) at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:166) at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:643) at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:668) at java.lang.Thread.run(Thread.java:571) zhao Mihael Hategan wrote: > Can you remove the quotes after .passphrase= and try again? > > On Thu, 2008-12-11 at 16:14 -0600, Zhao Zhang wrote: > >> Hi, All >> >> I am trying to make the ssh data provider working between BG login node >> and IO node. >> In this context, we could ssh from login nodes to IO nodes without a >> passphrase. 
>> >> I made a test, it failed: >> zzhang at login6.surveyor:~/swift/test> swift -sites.file ./sites.xml >> -tc.file ./tc.data first.swift >> Swift svn swift-r2334 (Swift modified locally) cog-r2216 >> >> RunID: 20081211-1604-6otxvqtb >> Progress: >> echo started >> Sorted: [bgp000:1,000.000(98.545):0/789 overload: 0] >> Sorted: [bgp000:999.590(98.544):0/789 overload: 0] >> Sorted: [bgp000:999.180(98.544):0/789 overload: 0] >> echo failed >> Execution failed: >> Could not initialize shared directory on bgp000 >> Caused by: >> org.globus.cog.abstraction.impl.file.FileResourceException: >> Error while communicating with the SSH server on 172.16.3.6:22 >> Caused by: >> Public Key Authentication failed >> >> >> My sites.xml is like belwo: >> zzhang at login6.surveyor:~/swift/test> cat sites.xml >> >> > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.5"> >> >> >> > url="http://172.16.3.6:50001/wsrf/services/GenericPortal/core/WS/GPFactoryService"/> >> /tmp >> 8 >> 1000 >> >> >> >> >> My auth.defaults is this: >> zzhang at login6.surveyor:~/swift/test> cat ~/.ssh/auth.defaults >> 172.16.3.6.type=key >> 172.16.3.6.username=zzhang >> 172.16.3.6.key=/home/zzhang/.ssh/id_rsa >> 172.16.3.6.passphrase="" >> >> The log file could be reached at >> http://www.ci.uchicago.edu/~zzhang/first-20081211-1604-6otxvqtb.log >> >> Thanks. 
>>
>> best wishes
>> zhangzhao

From hategan at mcs.anl.gov Thu Dec 11 18:39:28 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 11 Dec 2008 18:39:28 -0600
Subject: [Swift-devel] ssh data provider
In-Reply-To: <4941B098.1080105@uchicago.edu>
References: <494190C7.3030903@uchicago.edu> <1229034441.10267.0.camel@localhost> <4941B098.1080105@uchicago.edu>
Message-ID: <1229042368.12545.1.camel@localhost>

On Thu, 2008-12-11 at 18:30 -0600, Zhao Zhang wrote:
> Hi, Mihael
>
> If I put .passphrase= there, I got this:

With the IP address before .passphrase, of course. I.e.
172.16.3.6.passphrase=

From zhaozhang at uchicago.edu Thu Dec 11 18:38:13 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Thu, 11 Dec 2008 18:38:13 -0600
Subject: [Swift-devel] ssh data provider
In-Reply-To: <1229042368.12545.1.camel@localhost>
References: <494190C7.3030903@uchicago.edu> <1229034441.10267.0.camel@localhost> <4941B098.1080105@uchicago.edu> <1229042368.12545.1.camel@localhost>
Message-ID: <4941B275.4060904@uchicago.edu>

sure, it is 172.16.3.6.passphrase=

From hategan at mcs.anl.gov Thu Dec 11 18:41:48 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 11 Dec 2008 18:41:48 -0600
Subject: [Swift-devel] ssh data provider
In-Reply-To: <4941AE9D.20808@uchicago.edu>
References: <494190C7.3030903@uchicago.edu> <4941927D.6060904@mcs.anl.gov> <4941AE9D.20808@uchicago.edu>
Message-ID: <1229042508.12545.3.camel@localhost>

On Thu, 2008-12-11 at 18:21 -0600, Zhao Zhang wrote:
> Hi, Mike
>
> On BGP, we could ssh to IO nodes even if there is no private key. So I
> was assuming there should not be any passphrase either.
> I created one pair of keys, and tried with the private key, even though
> the public key is not deployed on IO nodes, it still worked.

If you cannot directly log into the IO nodes from outside, then this is
no worse than the hostbased authentication that is in place.

From hategan at mcs.anl.gov Thu Dec 11 18:46:23 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 11 Dec 2008 18:46:23 -0600
Subject: [Swift-devel] ssh data provider
In-Reply-To: <4941B275.4060904@uchicago.edu>
References: <494190C7.3030903@uchicago.edu> <1229034441.10267.0.camel@localhost> <4941B098.1080105@uchicago.edu> <1229042368.12545.1.camel@localhost> <4941B275.4060904@uchicago.edu>
Message-ID: <1229042783.12545.6.camel@localhost>

On Thu, 2008-12-11 at 18:38 -0600, Zhao Zhang wrote:
> sure, it is 172.16.3.6.passphrase=

I don't believe you. Can you paste the file?

From zhaozhang at uchicago.edu Thu Dec 11 18:45:30 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Thu, 11 Dec 2008 18:45:30 -0600
Subject: [Swift-devel] ssh data provider
In-Reply-To: <1229042783.12545.6.camel@localhost>
References: <494190C7.3030903@uchicago.edu> <1229034441.10267.0.camel@localhost> <4941B098.1080105@uchicago.edu> <1229042368.12545.1.camel@localhost> <4941B275.4060904@uchicago.edu> <1229042783.12545.6.camel@localhost>
Message-ID: <4941B42A.8090206@uchicago.edu>

ha, here it is

172.16.3.6.type=key
172.16.3.6.username=zzhang
172.16.3.6.key=/home/zzhang/.ssh/id_rsa
172.16.3.6.passphrase=
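The auth.defaults entries pasted above follow a host.property=value layout. A minimal sketch of a sanity check on such a file, assuming that layout holds for every line; auth_get and check_host_key are hypothetical helper names, not part of Swift or CoG:

```shell
#!/bin/sh
# Print the value of HOST.PROP from an auth.defaults-style file.
# Assumes lines of the form "172.16.3.6.key=/path"; the last match wins.
auth_get() {
    file=$1 host=$2 prop=$3
    # Match the literal prefix "HOST.PROP=" and print everything after it.
    awk -v prefix="$host.$prop=" '
        index($0, prefix) == 1 { value = substr($0, length(prefix) + 1); found = 1 }
        END { if (found) print value; else exit 1 }
    ' "$file"
}

# Report whether HOST has a key entry that points at a readable file.
check_host_key() {
    file=$1 host=$2
    key=$(auth_get "$file" "$host" key) || {
        echo "no $host.key entry in $file"
        return 1
    }
    [ -r "$key" ] || {
        echo "key file $key for $host is not readable"
        return 1
    }
    echo "$host uses $key"
}
```

For example, check_host_key ~/.ssh/auth.defaults 172.16.3.6 would flag a missing entry for that address or a mistyped key path before Swift is started.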

From zhaozhang at uchicago.edu Thu Dec 11 18:51:28 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Thu, 11 Dec 2008 18:51:28 -0600
Subject: [Swift-devel] ssh data provider
In-Reply-To: <1229042783.12545.6.camel@localhost>
References: <494190C7.3030903@uchicago.edu> <1229034441.10267.0.camel@localhost> <4941B098.1080105@uchicago.edu> <1229042368.12545.1.camel@localhost> <4941B275.4060904@uchicago.edu> <1229042783.12545.6.camel@localhost>
Message-ID: <4941B590.7000903@uchicago.edu>

Ha, you are right, I put the wrong log here.

I reran it; it failed with the following message.

zzhang at login6.surveyor:~/swift/test> swift -sites.file ./sites.xml -tc.file ./tc.data first.swift
Swift svn swift-r2334 (Swift modified locally) cog-r2216

RunID: 20081211-1850-rcrr2fk0
Progress:
echo started
Sorted: [bgp000:1,000.000(98.545):0/789 overload: 0]
Sorted: [bgp000:999.590(98.544):0/789 overload: 0]
Sorted: [bgp000:999.180(98.544):0/789 overload: 0]
echo failed
Execution failed:
Could not initialize shared directory on bgp000
Caused by:
org.globus.cog.abstraction.impl.file.FileResourceException:
Error while communicating with the SSH server on 172.16.3.7:22
Caused by:
Public Key Authentication failed

zhao

From hategan at mcs.anl.gov Thu Dec 11 19:02:27 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 11 Dec 2008 19:02:27 -0600
Subject: [Swift-devel] ssh data provider
In-Reply-To: <4941B590.7000903@uchicago.edu>
References: <494190C7.3030903@uchicago.edu> <1229034441.10267.0.camel@localhost> <4941B098.1080105@uchicago.edu> <1229042368.12545.1.camel@localhost> <4941B275.4060904@uchicago.edu> <1229042783.12545.6.camel@localhost> <4941B590.7000903@uchicago.edu>
Message-ID: <1229043747.12948.5.camel@localhost>

I looked at the ssh logs, and it seems like you're logging in using
hostbased authentication.

Try ssh -v -o HostbasedAuthentication=no -l zzhang
-i /home/zzhang/.ssh/id_rsa ion-1

Also, note that you misspelled "id_rsa": Warning: Identity
file /home/zzhang/.ssh/ir_rsa not accessible: No such
file or directory.

From zhaozhang at uchicago.edu Thu Dec 11 19:10:32 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Thu, 11 Dec 2008 19:10:32 -0600
Subject: [Swift-devel] ssh data provider
In-Reply-To: <1229043747.12948.5.camel@localhost>
References: <494190C7.3030903@uchicago.edu> <1229034441.10267.0.camel@localhost> <4941B098.1080105@uchicago.edu> <1229042368.12545.1.camel@localhost> <4941B275.4060904@uchicago.edu> <1229042783.12545.6.camel@localhost> <4941B590.7000903@uchicago.edu> <1229043747.12948.5.camel@localhost>
Message-ID: <4941BA08.8010406@uchicago.edu>

Then it failed

zzhang at login6.surveyor:~/swift/test> ssh -v -o HostbasedAuthentication=no -l zzhang -i /home/zzhang/.ssh/id_rsa ion-7
OpenSSH_4.2p1, OpenSSL 0.9.8a 11 Oct 2005
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug1: /etc/ssh/ssh_config line 25: Deprecated option "RhostsAuthentication"
debug1: Connecting to ion-7 [172.16.3.7] port 22.
debug1: Connection established.
debug1: identity file /home/zzhang/.ssh/id_rsa type 1
debug1: Remote protocol version 2.0, remote software version OpenSSH_4.2
debug1: match: OpenSSH_4.2 pat OpenSSH*
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_4.2
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-cbc hmac-md5 none
debug1: kex: client->server aes128-cbc hmac-md5 none
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
debug1: Host 'ion-7' is known and matches the RSA host key.
debug1: Found key in /home/zzhang/.ssh/known_hosts:43
debug1: ssh_rsa_verify: signature correct
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,keyboard-interactive,hostbased
debug1: Next authentication method: publickey
debug1: Offering public key: /home/zzhang/.ssh/id_rsa
debug1: Authentications that can continue: publickey,keyboard-interactive,hostbased
debug1: Next authentication method: keyboard-interactive
debug1: Authentications that can continue: publickey,keyboard-interactive,hostbased
debug1: No more authentication methods to try.
Permission denied (publickey,keyboard-interactive,hostbased).
zzhang at login6.surveyor:~/swift/test>

From hategan at mcs.anl.gov Thu Dec 11 19:46:41 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 11 Dec 2008 19:46:41 -0600
Subject: [Swift-devel] ssh data provider
In-Reply-To: <4941BA08.8010406@uchicago.edu>
References: <494190C7.3030903@uchicago.edu> <1229034441.10267.0.camel@localhost> <4941B098.1080105@uchicago.edu> <1229042368.12545.1.camel@localhost> <4941B275.4060904@uchicago.edu> <1229042783.12545.6.camel@localhost> <4941B590.7000903@uchicago.edu> <1229043747.12948.5.camel@localhost> <4941BA08.8010406@uchicago.edu>
Message-ID: <1229046401.13926.0.camel@localhost>

Have you installed the public key on ion-1?
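
With OpenSSH, installing a public key generally means appending the public half of the key pair to ~/.ssh/authorized_keys of the target account. A minimal sketch, assuming that account's home directory is reachable as an ordinary filesystem path (for instance through a shared filesystem, which is an assumption here); install_pubkey is a hypothetical helper name:

```shell
#!/bin/sh
# Append a public key to an authorized_keys file unless it is already listed.
# Usage: install_pubkey PUBKEY_FILE AUTHORIZED_KEYS_FILE
install_pubkey() {
    pubkey_file=$1
    auth_file=$2
    auth_dir=$(dirname "$auth_file")

    key=$(cat "$pubkey_file") || return 1

    # sshd usually refuses keys kept in group- or world-writable locations,
    # so create the directory and file with restrictive modes.
    mkdir -p "$auth_dir" || return 1
    chmod 700 "$auth_dir" || return 1
    touch "$auth_file" || return 1
    chmod 600 "$auth_file" || return 1

    # -x: whole-line match, -F: fixed string, -q: no output
    if ! grep -qxF -e "$key" "$auth_file"; then
        printf '%s\n' "$key" >> "$auth_file"
    fi
}
```

For example, install_pubkey ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys; running it twice leaves a single copy of the key in the file.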

From zhaozhang at uchicago.edu Thu Dec 11 20:00:10 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Thu, 11 Dec 2008 20:00:10 -0600
Subject: [Swift-devel] ssh data provider
In-Reply-To: <1229046401.13926.0.camel@localhost>
References: <494190C7.3030903@uchicago.edu> <1229034441.10267.0.camel@localhost> <4941B098.1080105@uchicago.edu> <1229042368.12545.1.camel@localhost> <4941B275.4060904@uchicago.edu> <1229042783.12545.6.camel@localhost> <4941B590.7000903@uchicago.edu> <1229043747.12948.5.camel@localhost> <4941BA08.8010406@uchicago.edu> <1229046401.13926.0.camel@localhost>
Message-ID: <4941C5AA.5000604@uchicago.edu>

nope, we don't need to since ssh works for us. Besides, I have no idea
where the ssh on IO nodes saves the public key.

zhao

Mihael Hategan wrote:
> Have you installed the public key on ion-1?

From hategan at mcs.anl.gov Thu Dec 11 20:10:04 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 11 Dec 2008 20:10:04 -0600
Subject: [Swift-devel] ssh data provider
In-Reply-To: <4941C5AA.5000604@uchicago.edu>
References: <494190C7.3030903@uchicago.edu> <1229034441.10267.0.camel@localhost> <4941B098.1080105@uchicago.edu> <1229042368.12545.1.camel@localhost> <4941B275.4060904@uchicago.edu> <1229042783.12545.6.camel@localhost> <4941B590.7000903@uchicago.edu> <1229043747.12948.5.camel@localhost> <4941BA08.8010406@uchicago.edu> <1229046401.13926.0.camel@localhost> <4941C5AA.5000604@uchicago.edu>
Message-ID: <1229047804.14275.4.camel@localhost>

On Thu, 2008-12-11 at 20:00 -0600, Zhao Zhang wrote:
> nope, we don't need to since ssh works for us.

Mmm, obviously not. May I suggest typing "man ssh" and reading the
section on authentication?

> Besides, I have no idea
> where the ssh on IO nodes saves the public key.

For public key authentication you need to put the public
key (~/.ssh/id_rsa.pub) in ~/.ssh/authorized_keys on the remote
machine. This is the public key that corresponds to your private key.

From zhaozhang at uchicago.edu Thu Dec 11 20:16:24 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Thu, 11 Dec 2008 20:16:24 -0600
Subject: [Swift-devel] ssh data provider
In-Reply-To: <1229047804.14275.4.camel@localhost>
References: <494190C7.3030903@uchicago.edu> <1229034441.10267.0.camel@localhost> <4941B098.1080105@uchicago.edu> <1229042368.12545.1.camel@localhost> <4941B275.4060904@uchicago.edu> <1229042783.12545.6.camel@localhost> <4941B590.7000903@uchicago.edu> <1229043747.12948.5.camel@localhost> <4941BA08.8010406@uchicago.edu> <1229046401.13926.0.camel@localhost> <4941C5AA.5000604@uchicago.edu> <1229047804.14275.4.camel@localhost>
Message-ID: <4941C978.6020102@uchicago.edu>

Mihael Hategan wrote:
> On Thu, 2008-12-11 at 20:00 -0600, Zhao Zhang wrote:
>
>> nope, we don't need to since ssh works for us.
>>
>
> Mmm, obviously not. May I suggest typing "man ssh" and reading the
> section on authentication?
>
By "it works" I mean it works for our ordinary use, we could login IO
nodes with that host based authentication.
>
>> Besides, I have no idea
>> where the ssh on IO nodes saves the public key.
>>
>
> For public key authentication you need to put the public
> key (~/.ssh/id_rsa.pub) in ~/.ssh/authorized_keys on the remote
> machine. This is the public key that corresponds to your private key.
>
Ha, it works now.

zzhang at login6.surveyor:~/swift/test> ssh -v -o HostbasedAuthentication=no -l zzhang -i /home/zzhang/.ssh/id_rsa ion-1
OpenSSH_4.2p1, OpenSSL 0.9.8a 11 Oct 2005
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug1: /etc/ssh/ssh_config line 25: Deprecated option "RhostsAuthentication"
debug1: Connecting to ion-1 [172.16.3.1] port 22.
debug1: Connection established.
debug1: identity file /home/zzhang/.ssh/id_rsa type 1
debug1: Remote protocol version 2.0, remote software version OpenSSH_4.2
debug1: match: OpenSSH_4.2 pat OpenSSH*
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_4.2
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-cbc hmac-md5 none
debug1: kex: client->server aes128-cbc hmac-md5 none
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
debug1: Host 'ion-1' is known and matches the RSA host key.
debug1: Found key in /home/zzhang/.ssh/known_hosts:40
Warning: the RSA host key for 'ion-1' differs from the key for the IP address '172.16.3.1'
Offending key for IP in /home/zzhang/.ssh/known_hosts:3
Matching host key in /home/zzhang/.ssh/known_hosts:40
debug1: ssh_rsa_verify: signature correct
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,keyboard-interactive,hostbased
debug1: Next authentication method: publickey
debug1: Offering public key: /home/zzhang/.ssh/id_rsa
debug1: Server accepts key: pkalg ssh-rsa blen 277
debug1: read PEM private key done: type RSA
debug1: Authentication succeeded (publickey).
debug1: channel 0: new [client-session]
debug1: Entering interactive session.
Last login: Thu Dec 11 20:15:10 2008 from login6-data.surveyor.alcf.anl.gov

BusyBox v1.4.2 (2008-05-07 02:58:20 UTC) Built-in shell (ash)
Enter 'help' for a list of built-in commands.

/gpfs/home/zzhang $
>>>>>>>>> 172.16.3.6.passphrase=

From hategan at mcs.anl.gov Thu Dec 11 20:41:21 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 11 Dec 2008 20:41:21 -0600
Subject: [Swift-devel] ssh data provider
In-Reply-To: <4941C978.6020102@uchicago.edu>
References: <494190C7.3030903@uchicago.edu>
 <1229034441.10267.0.camel@localhost>
 <4941B098.1080105@uchicago.edu>
 <1229042368.12545.1.camel@localhost>
 <4941B275.4060904@uchicago.edu>
 <1229042783.12545.6.camel@localhost>
 <4941B590.7000903@uchicago.edu>
 <1229043747.12948.5.camel@localhost>
 <4941BA08.8010406@uchicago.edu>
 <1229046401.13926.0.camel@localhost>
 <4941C5AA.5000604@uchicago.edu>
 <1229047804.14275.4.camel@localhost>
 <4941C978.6020102@uchicago.edu>
Message-ID: <1229049681.14869.0.camel@localhost>

now try swift

On Thu, 2008-12-11 at 20:16 -0600, Zhao Zhang wrote:
>
> Mihael Hategan wrote:
> > On Thu, 2008-12-11 at 20:00 -0600, Zhao Zhang wrote:
> >
> >> nope, we don't need to since ssh works for us.
> >>
> >
> > Mmm, obviously not. May I suggest typing "man ssh" and reading the
> > section on authentication?
> >
> By "it works" I mean it works for our ordinary use, we could login IO
> nodes with that host based authentication.
> >
> >> Besides, I have no idea
> >> where the ssh on IO nodes saves the public key.
> >>
> >
> > For public key authentication you need to put the public
> > key (~/.ssh/id_rsa.pub) in ~/.ssh/authorized_keys on the remote
> > machine. This is the public key that corresponds to your private key.
> >
> Ha, it works now.
> > zzhang at login6.surveyor:~/swift/test> ssh -v -o > HostbasedAuthentication=no -l zzh > ang -i /home/zzhang/.ssh/id_rsa ion-1 > OpenSSH_4.2p1, OpenSSL 0.9.8a 11 Oct 2005 > debug1: Reading configuration data /etc/ssh/ssh_config > debug1: Applying options for * > debug1: /etc/ssh/ssh_config line 25: Deprecated option > "RhostsAuthentication" > debug1: Connecting to ion-1 [172.16.3.1] port 22. > debug1: Connection established. > debug1: identity file /home/zzhang/.ssh/id_rsa type 1 > debug1: Remote protocol version 2.0, remote software version OpenSSH_4.2 > debug1: match: OpenSSH_4.2 pat OpenSSH* > debug1: Enabling compatibility mode for protocol 2.0 > debug1: Local version string SSH-2.0-OpenSSH_4.2 > debug1: SSH2_MSG_KEXINIT sent > debug1: SSH2_MSG_KEXINIT received > debug1: kex: server->client aes128-cbc hmac-md5 none > debug1: kex: client->server aes128-cbc hmac-md5 none > debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent > debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP > debug1: SSH2_MSG_KEX_DH_GEX_INIT sent > debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY > debug1: Host 'ion-1' is known and matches the RSA host key. > debug1: Found key in /home/zzhang/.ssh/known_hosts:40 > Warning: the RSA host key for 'ion-1' differs from the key for the IP > address '172.16.3.1' > Offending key for IP in /home/zzhang/.ssh/known_hosts:3 > Matching host key in /home/zzhang/.ssh/known_hosts:40 > debug1: ssh_rsa_verify: signature correct > debug1: SSH2_MSG_NEWKEYS sent > debug1: expecting SSH2_MSG_NEWKEYS > debug1: SSH2_MSG_NEWKEYS received > debug1: SSH2_MSG_SERVICE_REQUEST sent > debug1: SSH2_MSG_SERVICE_ACCEPT received > debug1: Authentications that can continue: > publickey,keyboard-interactive,hostbased > debug1: Next authentication method: publickey > debug1: Offering public key: /home/zzhang/.ssh/id_rsa > debug1: Server accepts key: pkalg ssh-rsa blen 277 > debug1: read PEM private key done: type RSA > debug1: Authentication succeeded (publickey). 
> debug1: channel 0: new [client-session]
> debug1: Entering interactive session.
> Last login: Thu Dec 11 20:15:10 2008 from login6-data.surveyor.alcf.anl.gov
>
>
> BusyBox v1.4.2 (2008-05-07 02:58:20 UTC) Built-in shell (ash)
> Enter 'help' for a list of built-in commands.
>
> /gpfs/home/zzhang $
> >
> >> zhao
> >>
> >> Mihael Hategan wrote:
> >>
> >>> Have you installed the public key on ion-1?

From zhaozhang at uchicago.edu Thu Dec 11 20:52:28 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Thu, 11 Dec 2008 20:52:28 -0600
Subject: [Swift-devel] ssh data provider
In-Reply-To: <1229049681.14869.0.camel@localhost>
References: <494190C7.3030903@uchicago.edu>
 <1229034441.10267.0.camel@localhost>
 <4941B098.1080105@uchicago.edu>
 <1229042368.12545.1.camel@localhost>
 <4941B275.4060904@uchicago.edu>
 <1229042783.12545.6.camel@localhost>
 <4941B590.7000903@uchicago.edu>
 <1229043747.12948.5.camel@localhost>
 <4941BA08.8010406@uchicago.edu>
 <1229046401.13926.0.camel@localhost>
 <4941C5AA.5000604@uchicago.edu>
 <1229047804.14275.4.camel@localhost>
 <4941C978.6020102@uchicago.edu>
 <1229049681.14869.0.camel@localhost>
Message-ID: <4941D1EC.80905@uchicago.edu>

got this

zzhang at login6.surveyor:~/swift/test> swift -sites.file ./sites.xml
-tc.file ./tc.data first.swift
Swift svn swift-r2334 (Swift modified locally) cog-r2216

RunID: 20081211-2021-oi8c3r0b
Progress:
echo started
Sorted: [bgp000:1,000.000(98.545):0/789 overload: 0]
Sorted: [bgp000:999.590(98.544):0/789 overload: 0]
Sorted: [bgp000:999.180(98.544):0/789 overload: 0]
echo failed
Execution failed:
Could not initialize shared directory on bgp000
Caused by:
org.globus.cog.abstraction.impl.file.FileResourceException:
Error while communicating with the SSH server on 172.16.3.2:22
Caused by:
Failed to start the SFTP subsystem on zzhang:@172.16.3.2:22

Mihael Hategan wrote:
> now try swift

From hategan at mcs.anl.gov Thu Dec 11 20:57:44 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 11 Dec 2008 20:57:44 -0600
Subject: [Swift-devel] ssh data provider
In-Reply-To: <4941D1EC.80905@uchicago.edu>
References: <494190C7.3030903@uchicago.edu>
 <1229034441.10267.0.camel@localhost>
 <4941B098.1080105@uchicago.edu>
 <1229042368.12545.1.camel@localhost>
 <4941B275.4060904@uchicago.edu>
 <1229042783.12545.6.camel@localhost>
 <4941B590.7000903@uchicago.edu>
 <1229043747.12948.5.camel@localhost>
 <4941BA08.8010406@uchicago.edu>
 <1229046401.13926.0.camel@localhost>
 <4941C5AA.5000604@uchicago.edu>
 <1229047804.14275.4.camel@localhost>
 <4941C978.6020102@uchicago.edu>
 <1229049681.14869.0.camel@localhost>
 <4941D1EC.80905@uchicago.edu>
Message-ID: <1229050664.15189.1.camel@localhost>

You could ask the folks who administer the BG to enable sftp on the io
nodes. This is enabled by default with openssh.
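A minimal sketch of how the SFTP subsystem could be checked, offered as an illustration rather than as what was actually run on the I/O nodes. In OpenSSH the subsystem is normally declared by a "Subsystem sftp ..." line in the server's sshd_config, so a working ssh login does not by itself show that SFTP is available. The function names has_sftp_subsystem and probe_sftp are hypothetical, and the default path /etc/ssh/sshd_config is an assumption; the I/O nodes may keep their server configuration elsewhere.

```shell
#!/bin/sh
# Sketch only. Function names and the default config path are assumptions,
# not taken from the thread.

# Succeeds if the given sshd configuration file contains an uncommented
# "Subsystem sftp <server>" line.
has_sftp_subsystem() {
    config="${1:-/etc/ssh/sshd_config}"
    grep -Eiq '^[[:space:]]*Subsystem[[:space:]]+sftp[[:space:]]+[^[:space:]]' "$config"
}

# Probe from the login node: run a single "ls" over sftp in batch mode.
# This should fail if the server cannot start the sftp subsystem.
# Assumes the installed sftp accepts "-b -" to read the batch from stdin.
probe_sftp() {
    host="$1"
    echo ls | sftp -b - -o HostbasedAuthentication=no "$host"
}

# Example usage (not executed here):
#   has_sftp_subsystem /etc/ssh/sshd_config && echo "sftp subsystem declared"
#   probe_sftp zzhang@ion-1
```

The first function only inspects a file and is safe to run anywhere; the second needs public key authentication to be working already, since batch mode cannot prompt for a password.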
On Thu, 2008-12-11 at 20:52 -0600, Zhao Zhang wrote:
> got this
>
> zzhang at login6.surveyor:~/swift/test> swift -sites.file ./sites.xml
> -tc.file ./tc.data first.swift
> Swift svn swift-r2334 (Swift modified locally) cog-r2216
>
> RunID: 20081211-2021-oi8c3r0b
> Progress:
> echo started
> Sorted: [bgp000:1,000.000(98.545):0/789 overload: 0]
> Sorted: [bgp000:999.590(98.544):0/789 overload: 0]
> Sorted: [bgp000:999.180(98.544):0/789 overload: 0]
> echo failed
> Execution failed:
> Could not initialize shared directory on bgp000
> Caused by:
> org.globus.cog.abstraction.impl.file.FileResourceException:
> Error while communicating with the SSH server on 172.16.3.2:22
> Caused by:
> Failed to start the SFTP subsystem on zzhang:@172.16.3.2:22

From zhaozhang at uchicago.edu Thu Dec 11 20:57:32 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Thu, 11 Dec 2008 20:57:32 -0600
Subject: [Swift-devel] ssh data provider
In-Reply-To: <1229050664.15189.1.camel@localhost>
References: <494190C7.3030903@uchicago.edu>
 <1229034441.10267.0.camel@localhost>
 <4941B098.1080105@uchicago.edu>
 <1229042368.12545.1.camel@localhost>
 <4941B275.4060904@uchicago.edu>
 <1229042783.12545.6.camel@localhost>
 <4941B590.7000903@uchicago.edu>
 <1229043747.12948.5.camel@localhost>
 <4941BA08.8010406@uchicago.edu>
 <1229046401.13926.0.camel@localhost>
 <4941C5AA.5000604@uchicago.edu>
 <1229047804.14275.4.camel@localhost>
 <4941C978.6020102@uchicago.edu>
 <1229049681.14869.0.camel@localhost>
 <4941D1EC.80905@uchicago.edu>
 <1229050664.15189.1.camel@localhost>
Message-ID: <4941D31C.9050402@uchicago.edu>

I tried to run sftp on IO nodes,

bash-3.1$ sftp
usage: sftp [-1Cv] [-B buffer_size] [-b batchfile] [-F ssh_config]
[-o ssh_option] [-P sftp_server_path] [-R num_requests]
[-S program] [-s subsystem | sftp_server] host
sftp
[[user@]host[:file [file]]] sftp [[user@]host[:dir[/]]] sftp -b batchfile [user@]host it seems that there is a working version zhao Mihael Hategan wrote: > You could ask the folks who administer the BG to enable sftp on the io > nodes. This is enabled by default with openssh. > > On Thu, 2008-12-11 at 20:52 -0600, Zhao Zhang wrote: > >> got this >> >> zzhang at login6.surveyor:~/swift/test> swift -sites.file ./sites.xml >> -tc.file ./tc.data first.swift >> Swift svn swift-r2334 (Swift modified locally) cog-r2216 >> >> RunID: 20081211-2021-oi8c3r0b >> Progress: >> echo started >> Sorted: [bgp000:1,000.000(98.545):0/789 overload: 0] >> Sorted: [bgp000:999.590(98.544):0/789 overload: 0] >> Sorted: [bgp000:999.180(98.544):0/789 overload: 0] >> echo failed >> Execution failed: >> Could not initialize shared directory on bgp000 >> Caused by: >> org.globus.cog.abstraction.impl.file.FileResourceException: >> Error while communicating with the SSH server on 172.16.3.2:22 >> Caused by: >> Failed to start the SFTP subsystem on zzhang:@172.16.3.2:22 >> >> >> Mihael Hategan wrote: >> >>> now try swift >>> >>> On Thu, 2008-12-11 at 20:16 -0600, Zhao Zhang wrote: >>> >>> >>>> Mihael Hategan wrote: >>>> >>>> >>>>> On Thu, 2008-12-11 at 20:00 -0600, Zhao Zhang wrote: >>>>> >>>>> >>>>> >>>>>> nope, we don't need to since ssh works for us. >>>>>> >>>>>> >>>>>> >>>>> Mmm, obviously not. May I suggest typing "man ssh" and reading the >>>>> section on authentication? >>>>> >>>>> >>>>> >>>> By "it works" I mean it works for our ordinary use, we could login IO >>>> nodes with that host based authentication. >>>> >>>> >>>>> >>>>> >>>>> >>>>>> Besides, I have no idea >>>>>> where the ssh on IO nodes saves the public key. >>>>>> >>>>>> >>>>>> >>>>> For public key authentication you need to put the public >>>>> key ?(~/.ssh/id_rsa.pub) in ~/.ssh/authorized_keys on the remote >>>>> machine. This is the public key that corresponds to your private key. >>>>> >>>>> >>>>> >>>> Ha, it works now. 
>>>> >>>> zzhang at login6.surveyor:~/swift/test> ssh -v -o >>>> HostbasedAuthentication=no -l zzh >>>> ang -i /home/zzhang/.ssh/id_rsa ion-1 >>>> OpenSSH_4.2p1, OpenSSL 0.9.8a 11 Oct 2005 >>>> debug1: Reading configuration data /etc/ssh/ssh_config >>>> debug1: Applying options for * >>>> debug1: /etc/ssh/ssh_config line 25: Deprecated option >>>> "RhostsAuthentication" >>>> debug1: Connecting to ion-1 [172.16.3.1] port 22. >>>> debug1: Connection established. >>>> debug1: identity file /home/zzhang/.ssh/id_rsa type 1 >>>> debug1: Remote protocol version 2.0, remote software version OpenSSH_4.2 >>>> debug1: match: OpenSSH_4.2 pat OpenSSH* >>>> debug1: Enabling compatibility mode for protocol 2.0 >>>> debug1: Local version string SSH-2.0-OpenSSH_4.2 >>>> debug1: SSH2_MSG_KEXINIT sent >>>> debug1: SSH2_MSG_KEXINIT received >>>> debug1: kex: server->client aes128-cbc hmac-md5 none >>>> debug1: kex: client->server aes128-cbc hmac-md5 none >>>> debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent >>>> debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP >>>> debug1: SSH2_MSG_KEX_DH_GEX_INIT sent >>>> debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY >>>> debug1: Host 'ion-1' is known and matches the RSA host key. 
>>>> debug1: Found key in /home/zzhang/.ssh/known_hosts:40 >>>> Warning: the RSA host key for 'ion-1' differs from the key for the IP >>>> address '172.16.3.1' >>>> Offending key for IP in /home/zzhang/.ssh/known_hosts:3 >>>> Matching host key in /home/zzhang/.ssh/known_hosts:40 >>>> debug1: ssh_rsa_verify: signature correct >>>> debug1: SSH2_MSG_NEWKEYS sent >>>> debug1: expecting SSH2_MSG_NEWKEYS >>>> debug1: SSH2_MSG_NEWKEYS received >>>> debug1: SSH2_MSG_SERVICE_REQUEST sent >>>> debug1: SSH2_MSG_SERVICE_ACCEPT received >>>> debug1: Authentications that can continue: >>>> publickey,keyboard-interactive,hostbased >>>> debug1: Next authentication method: publickey >>>> debug1: Offering public key: /home/zzhang/.ssh/id_rsa >>>> debug1: Server accepts key: pkalg ssh-rsa blen 277 >>>> debug1: read PEM private key done: type RSA >>>> debug1: Authentication succeeded (publickey). >>>> debug1: channel 0: new [client-session] >>>> debug1: Entering interactive session. >>>> Last login: Thu Dec 11 20:15:10 2008 from login6-data.surveyor.alcf.anl.gov >>>> >>>> >>>> BusyBox v1.4.2 (2008-05-07 02:58:20 UTC) Built-in shell (ash) >>>> Enter 'help' for a list of built-in commands. >>>> >>>> /gpfs/home/zzhang $ >>>> >>>> >>>>> >>>>> >>>>> >>>>>> zhao >>>>>> >>>>>> Mihael Hategan wrote: >>>>>> >>>>>> >>>>>> >>>>>>> Have you installed the public key on ion-1? >>>>>>> >>>>>>> On Thu, 2008-12-11 at 19:10 -0600, Zhao Zhang wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Then it failed >>>>>>>> >>>>>>>> zzhang at login6.surveyor:~/swift/test> ssh -v -o >>>>>>>> HostbasedAuthentication=no -l zzh >>>>>>>> ang -i /home/zzhang/.ssh/id_rsa ion-7 >>>>>>>> OpenSSH_4.2p1, OpenSSL 0.9.8a 11 Oct 2005 >>>>>>>> debug1: Reading configuration data /etc/ssh/ssh_config >>>>>>>> debug1: Applying options for * >>>>>>>> debug1: /etc/ssh/ssh_config line 25: Deprecated option >>>>>>>> "RhostsAuthentication" >>>>>>>> debug1: Connecting to ion-7 [172.16.3.7] port 22. >>>>>>>> debug1: Connection established. 
>>>>>>>> debug1: identity file /home/zzhang/.ssh/id_rsa type 1 >>>>>>>> debug1: Remote protocol version 2.0, remote software version OpenSSH_4.2 >>>>>>>> debug1: match: OpenSSH_4.2 pat OpenSSH* >>>>>>>> debug1: Enabling compatibility mode for protocol 2.0 >>>>>>>> debug1: Local version string SSH-2.0-OpenSSH_4.2 >>>>>>>> debug1: SSH2_MSG_KEXINIT sent >>>>>>>> debug1: SSH2_MSG_KEXINIT received >>>>>>>> debug1: kex: server->client aes128-cbc hmac-md5 none >>>>>>>> debug1: kex: client->server aes128-cbc hmac-md5 none >>>>>>>> debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent >>>>>>>> debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP >>>>>>>> debug1: SSH2_MSG_KEX_DH_GEX_INIT sent >>>>>>>> debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY >>>>>>>> debug1: Host 'ion-7' is known and matches the RSA host key. >>>>>>>> debug1: Found key in /home/zzhang/.ssh/known_hosts:43 >>>>>>>> debug1: ssh_rsa_verify: signature correct >>>>>>>> debug1: SSH2_MSG_NEWKEYS sent >>>>>>>> debug1: expecting SSH2_MSG_NEWKEYS >>>>>>>> debug1: SSH2_MSG_NEWKEYS received >>>>>>>> debug1: SSH2_MSG_SERVICE_REQUEST sent >>>>>>>> debug1: SSH2_MSG_SERVICE_ACCEPT received >>>>>>>> debug1: Authentications that can continue: >>>>>>>> publickey,keyboard-interactive,hostbased >>>>>>>> debug1: Next authentication method: publickey >>>>>>>> debug1: Offering public key: /home/zzhang/.ssh/id_rsa >>>>>>>> debug1: Authentications that can continue: >>>>>>>> publickey,keyboard-interactive,hostbased >>>>>>>> debug1: Next authentication method: keyboard-interactive >>>>>>>> debug1: Authentications that can continue: >>>>>>>> publickey,keyboard-interactive,hostbased >>>>>>>> debug1: No more authentication methods to try. >>>>>>>> Permission denied (publickey,keyboard-interactive,hostbased). 
>>>>>>>> zzhang at login6.surveyor:~/swift/test>
>>>>>>>>
>>>>>>>> Mihael Hategan wrote:
>>>>>>>>
>>>>>>>>> I looked at the ssh logs, and it seems like you're logging in using
>>>>>>>>> hostbased authentication.
>>>>>>>>>
>>>>>>>>> Try ssh -v -o HostbasedAuthentication=no -l zzhang
>>>>>>>>> -i /home/zzhang/.ssh/id_rsa ion-1
>>>>>>>>>
>>>>>>>>> Also, note that you misspelled "id_rsa": Warning: Identity
>>>>>>>>> file /home/zzhang/.ssh/ir_rsa not accessible: No such
>>>>>>>>> file or directory.
>>>>>>>>>
>>>>>>>>> On Thu, 2008-12-11 at 18:51 -0600, Zhao Zhang wrote:
>>>>>>>>>
>>>>>>>>>> Ha, you are right, I put the wrong log here.
>>>>>>>>>>
>>>>>>>>>> I reran it, and it failed with the following message.
>>>>>>>>>>
>>>>>>>>>> zzhang at login6.surveyor:~/swift/test> swift -sites.file ./sites.xml
>>>>>>>>>> -tc.file ./tc.data first.swift
>>>>>>>>>> Swift svn swift-r2334 (Swift modified locally) cog-r2216
>>>>>>>>>>
>>>>>>>>>> RunID: 20081211-1850-rcrr2fk0
>>>>>>>>>> Progress:
>>>>>>>>>> echo started
>>>>>>>>>> Sorted: [bgp000:1,000.000(98.545):0/789 overload: 0]
>>>>>>>>>> Sorted: [bgp000:999.590(98.544):0/789 overload: 0]
>>>>>>>>>> Sorted: [bgp000:999.180(98.544):0/789 overload: 0]
>>>>>>>>>> echo failed
>>>>>>>>>> Execution failed:
>>>>>>>>>> Could not initialize shared directory on bgp000
>>>>>>>>>> Caused by:
>>>>>>>>>> org.globus.cog.abstraction.impl.file.FileResourceException:
>>>>>>>>>> Error while communicating with the SSH server on 172.16.3.7:22
>>>>>>>>>> Caused by:
>>>>>>>>>> Public Key Authentication failed
>>>>>>>>>>
>>>>>>>>>> zhao
>>>>>>>>>>
>>>>>>>>>> Mihael Hategan wrote:
>>>>>>>>>>
>>>>>>>>>>> On Thu, 2008-12-11 at 18:38 -0600, Zhao Zhang wrote:
>>>>>>>>>>>
>>>>>>>>>>>> sure, it is 172.16.3.6.passphrase=
>>>>>>>>>>>>
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> I don't believe you. Can you paste the file? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Mihael Hategan wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> On Thu, 2008-12-11 at 18:30 -0600, Zhao Zhang wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, Mihael >>>>>>>>>>>>>> >>>>>>>>>>>>>> If I put .passphrase= there, I got this: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> With the IP address before .passphrase, of course. I.e. >>>>>>>>>>>>> 172.16.3.6.passphrase= >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>> >>>>> >>>>> >>> >>> > > > From hategan at mcs.anl.gov Thu Dec 11 21:00:44 2008 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 11 Dec 2008 21:00:44 -0600 Subject: [Swift-devel] ssh data provider In-Reply-To: <4941D31C.9050402@uchicago.edu> References: <494190C7.3030903@uchicago.edu> <1229034441.10267.0.camel@localhost> <4941B098.1080105@uchicago.edu> <1229042368.12545.1.camel@localhost> <4941B275.4060904@uchicago.edu> <1229042783.12545.6.camel@localhost> <4941B590.7000903@uchicago.edu> <1229043747.12948.5.camel@localhost> <4941BA08.8010406@uchicago.edu> <1229046401.13926.0.camel@localhost> <4941C5AA.5000604@uchicago.edu> <1229047804.14275.4.camel@localhost> <4941C978.6020102@uchicago.edu> <1229049681.14869.0.camel@localhost> <4941D1EC.80905@uchicago.edu> <1229050664.15189.1.camel@localhost> <4941D31C.9050402@uchicago.edu> Message-ID: <1229050844.15302.0.camel@localhost> Ok. 
Can you paste a link to the log file?

On Thu, 2008-12-11 at 20:57 -0600, Zhao Zhang wrote:
> I tried to run sftp on IO nodes,
>
> bash-3.1$ sftp
> usage: sftp [-1Cv] [-B buffer_size] [-b batchfile] [-F ssh_config]
>             [-o ssh_option] [-P sftp_server_path] [-R num_requests]
>             [-S program] [-s subsystem | sftp_server] host
>        sftp [[user@]host[:file [file]]]
>        sftp [[user@]host[:dir[/]]]
>        sftp -b batchfile [user@]host
>
> it seems that there is a working version
>
> zhao
>
> Mihael Hategan wrote:
> > You could ask the folks who administer the BG to enable sftp on the io
> > nodes. This is enabled by default with openssh.

From zhaozhang at uchicago.edu Thu Dec 11 21:12:45 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Thu, 11 Dec 2008 21:12:45 -0600
Subject: [Swift-devel] ssh data provider
In-Reply-To: <1229050844.15302.0.camel@localhost>
References: <494190C7.3030903@uchicago.edu> <1229034441.10267.0.camel@localhost> <4941B098.1080105@uchicago.edu> <1229042368.12545.1.camel@localhost> <4941B275.4060904@uchicago.edu> <1229042783.12545.6.camel@localhost> <4941B590.7000903@uchicago.edu> <1229043747.12948.5.camel@localhost> <4941BA08.8010406@uchicago.edu> <1229046401.13926.0.camel@localhost> <4941C5AA.5000604@uchicago.edu> <1229047804.14275.4.camel@localhost> <4941C978.6020102@uchicago.edu> <1229049681.14869.0.camel@localhost> <4941D1EC.80905@uchicago.edu> <1229050664.15189.1.camel@localhost> <4941D31C.9050402@uchicago.edu>
 <1229050844.15302.0.camel@localhost>
Message-ID: <4941D6AD.7050801@uchicago.edu>

sure,
http://www.ci.uchicago.edu/~zzhang/first-20081211-2021-oi8c3r0b.log

zhao

Mihael Hategan wrote:
> Ok. Can you paste a link to the log file?
>
> On Thu, 2008-12-11 at 20:57 -0600, Zhao Zhang wrote:
>
>> I tried to run sftp on IO nodes,
>>
>> bash-3.1$ sftp
>> usage: sftp [-1Cv] [-B buffer_size] [-b batchfile] [-F ssh_config]
>>             [-o ssh_option] [-P sftp_server_path] [-R num_requests]
>>             [-S program] [-s subsystem | sftp_server] host
>>        sftp [[user@]host[:file [file]]]
>>        sftp [[user@]host[:dir[/]]]
>>        sftp -b batchfile [user@]host
>>
>> it seems that there is a working version
>>
>> zhao
>>
>> Mihael Hategan wrote:
>>
>>> You could ask the folks who administer the BG to enable sftp on the io
>>> nodes. This is enabled by default with openssh.

From hategan at mcs.anl.gov Thu Dec 11 21:25:31 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 11 Dec 2008 21:25:31 -0600
Subject: [Swift-devel] ssh data provider
In-Reply-To: <1229050844.15302.0.camel@localhost>
References: <494190C7.3030903@uchicago.edu> <1229034441.10267.0.camel@localhost> <4941B098.1080105@uchicago.edu> <1229042368.12545.1.camel@localhost> <4941B275.4060904@uchicago.edu> <1229042783.12545.6.camel@localhost> <4941B590.7000903@uchicago.edu> <1229043747.12948.5.camel@localhost> <4941BA08.8010406@uchicago.edu> <1229046401.13926.0.camel@localhost> <4941C5AA.5000604@uchicago.edu> <1229047804.14275.4.camel@localhost> <4941C978.6020102@uchicago.edu> <1229049681.14869.0.camel@localhost>
 <4941D1EC.80905@uchicago.edu> <1229050664.15189.1.camel@localhost>
 <4941D31C.9050402@uchicago.edu> <1229050844.15302.0.camel@localhost>
Message-ID: <1229052331.15691.1.camel@localhost>

Uh. Sorry. Let me be more clear. They need to enable sftp in the openssh
server. This has little to do with whether the sftp client tool is
installed or not.

On Thu, 2008-12-11 at 21:00 -0600, Mihael Hategan wrote:
> Ok. Can you paste a link to the log file?
>
> On Thu, 2008-12-11 at 20:57 -0600, Zhao Zhang wrote:
> > I tried to run sftp on IO nodes,
> >
> > bash-3.1$ sftp
> > usage: sftp [-1Cv] [-B buffer_size] [-b batchfile] [-F ssh_config]
> >             [-o ssh_option] [-P sftp_server_path] [-R num_requests]
> >             [-S program] [-s subsystem | sftp_server] host
> >        sftp [[user@]host[:file [file]]]
> >        sftp [[user@]host[:dir[/]]]
> >        sftp -b batchfile [user@]host
> >
> > it seems that there is a working version
> >
> > zhao
> >
> > Mihael Hategan wrote:
> > > You could ask the folks who administer the BG to enable sftp on the io
> > > nodes. This is enabled by default with openssh.
> > >
> > > On Thu, 2008-12-11 at 20:52 -0600, Zhao Zhang wrote:
> > >> got this
> > >>
> > >> zzhang at login6.surveyor:~/swift/test> swift -sites.file ./sites.xml -tc.file ./tc.data first.swift
> > >> Swift svn swift-r2334 (Swift modified locally) cog-r2216
> > >>
> > >> RunID: 20081211-2021-oi8c3r0b
> > >> Progress:
> > >> echo started
> > >> Sorted: [bgp000:1,000.000(98.545):0/789 overload: 0]
> > >> Sorted: [bgp000:999.590(98.544):0/789 overload: 0]
> > >> Sorted: [bgp000:999.180(98.544):0/789 overload: 0]
> > >> echo failed
> > >> Execution failed:
> > >> Could not initialize shared directory on bgp000
> > >> Caused by:
> > >> org.globus.cog.abstraction.impl.file.FileResourceException:
> > >> Error while communicating with the SSH server on 172.16.3.2:22
> > >> Caused by:
> > >> Failed to start the SFTP subsystem on zzhang:@172.16.3.2:22
> > >>
> > >> Mihael Hategan wrote:
> > >>> now try swift
> > >>>
> > >>> On Thu, 2008-12-11 at 20:16 -0600, Zhao Zhang wrote:
> > >>>> Mihael Hategan wrote:
> > >>>>> On Thu, 2008-12-11 at 20:00 -0600, Zhao Zhang wrote:
> > >>>>>> nope, we don't need to since ssh works for us.
> > >>>>>
> > >>>>> Mmm, obviously not. May I suggest typing "man ssh" and reading the
> > >>>>> section on authentication?
> > >>>>
> > >>>> By "it works" I mean it works for our ordinary use: we can log in to
> > >>>> the IO nodes with host-based authentication.
> > >>>>
> > >>>>>> Besides, I have no idea
> > >>>>>> where the ssh on IO nodes saves the public key.
> > >>>>>
> > >>>>> For public key authentication you need to put the public
> > >>>>> key (~/.ssh/id_rsa.pub) in ~/.ssh/authorized_keys on the remote
> > >>>>> machine. This is the public key that corresponds to your private key.
> > >>>>
> > >>>> Ha, it works now.
> > >>>>
> > >>>> zzhang at login6.surveyor:~/swift/test> ssh -v -o HostbasedAuthentication=no -l zzhang -i /home/zzhang/.ssh/id_rsa ion-1
> > >>>> OpenSSH_4.2p1, OpenSSL 0.9.8a 11 Oct 2005
> > >>>> debug1: Reading configuration data /etc/ssh/ssh_config
> > >>>> debug1: Applying options for *
> > >>>> debug1: /etc/ssh/ssh_config line 25: Deprecated option "RhostsAuthentication"
> > >>>> debug1: Connecting to ion-1 [172.16.3.1] port 22.
> > >>>> debug1: Connection established.
> > >>>> debug1: identity file /home/zzhang/.ssh/id_rsa type 1
> > >>>> debug1: Remote protocol version 2.0, remote software version OpenSSH_4.2
> > >>>> debug1: match: OpenSSH_4.2 pat OpenSSH*
> > >>>> debug1: Enabling compatibility mode for protocol 2.0
> > >>>> debug1: Local version string SSH-2.0-OpenSSH_4.2
> > >>>> debug1: SSH2_MSG_KEXINIT sent
> > >>>> debug1: SSH2_MSG_KEXINIT received
> > >>>> debug1: kex: server->client aes128-cbc hmac-md5 none
> > >>>> debug1: kex: client->server aes128-cbc hmac-md5 none
> > >>>> debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent
> > >>>> debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
> > >>>> debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
> > >>>> debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
> > >>>> debug1: Host 'ion-1' is known and matches the RSA host key.
> > >>>> debug1: Found key in /home/zzhang/.ssh/known_hosts:40
> > >>>> Warning: the RSA host key for 'ion-1' differs from the key for the IP address '172.16.3.1'
> > >>>> Offending key for IP in /home/zzhang/.ssh/known_hosts:3
> > >>>> Matching host key in /home/zzhang/.ssh/known_hosts:40
> > >>>> debug1: ssh_rsa_verify: signature correct
> > >>>> debug1: SSH2_MSG_NEWKEYS sent
> > >>>> debug1: expecting SSH2_MSG_NEWKEYS
> > >>>> debug1: SSH2_MSG_NEWKEYS received
> > >>>> debug1: SSH2_MSG_SERVICE_REQUEST sent
> > >>>> debug1: SSH2_MSG_SERVICE_ACCEPT received
> > >>>> debug1: Authentications that can continue: publickey,keyboard-interactive,hostbased
> > >>>> debug1: Next authentication method: publickey
> > >>>> debug1: Offering public key: /home/zzhang/.ssh/id_rsa
> > >>>> debug1: Server accepts key: pkalg ssh-rsa blen 277
> > >>>> debug1: read PEM private key done: type RSA
> > >>>> debug1: Authentication succeeded (publickey).
> > >>>> debug1: channel 0: new [client-session]
> > >>>> debug1: Entering interactive session.
> > >>>> Last login: Thu Dec 11 20:15:10 2008 from login6-data.surveyor.alcf.anl.gov
> > >>>>
> > >>>> BusyBox v1.4.2 (2008-05-07 02:58:20 UTC) Built-in shell (ash)
> > >>>> Enter 'help' for a list of built-in commands.
> > >>>>
> > >>>> /gpfs/home/zzhang $
> > >>>>
> > >>>>>> zhao
> > >>>>>>
> > >>>>>> Mihael Hategan wrote:
> > >>>>>>> Have you installed the public key on ion-1?
> > >>>>>>>
> > >>>>>>> On Thu, 2008-12-11 at 19:10 -0600, Zhao Zhang wrote:
> > >>>>>>>> Then it failed
> > >>>>>>>>
> > >>>>>>>> zzhang at login6.surveyor:~/swift/test> ssh -v -o HostbasedAuthentication=no -l zzhang -i /home/zzhang/.ssh/id_rsa ion-7
> > >>>>>>>> OpenSSH_4.2p1, OpenSSL 0.9.8a 11 Oct 2005
> > >>>>>>>> debug1: Reading configuration data /etc/ssh/ssh_config
> > >>>>>>>> debug1: Applying options for *
> > >>>>>>>> debug1: /etc/ssh/ssh_config line 25: Deprecated option "RhostsAuthentication"
> > >>>>>>>> debug1: Connecting to ion-7 [172.16.3.7] port 22.
> > >>>>>>>> debug1: Connection established.
> > >>>>>>>> debug1: identity file /home/zzhang/.ssh/id_rsa type 1
> > >>>>>>>> debug1: Remote protocol version 2.0, remote software version OpenSSH_4.2
> > >>>>>>>> debug1: match: OpenSSH_4.2 pat OpenSSH*
> > >>>>>>>> debug1: Enabling compatibility mode for protocol 2.0
> > >>>>>>>> debug1: Local version string SSH-2.0-OpenSSH_4.2
> > >>>>>>>> debug1: SSH2_MSG_KEXINIT sent
> > >>>>>>>> debug1: SSH2_MSG_KEXINIT received
> > >>>>>>>> debug1: kex: server->client aes128-cbc hmac-md5 none
> > >>>>>>>> debug1: kex: client->server aes128-cbc hmac-md5 none
> > >>>>>>>> debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent
> > >>>>>>>> debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
> > >>>>>>>> debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
> > >>>>>>>> debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
> > >>>>>>>> debug1: Host 'ion-7' is known and matches the RSA host key.
> > >>>>>>>> debug1: Found key in /home/zzhang/.ssh/known_hosts:43
> > >>>>>>>> debug1: ssh_rsa_verify: signature correct
> > >>>>>>>> debug1: SSH2_MSG_NEWKEYS sent
> > >>>>>>>> debug1: expecting SSH2_MSG_NEWKEYS
> > >>>>>>>> debug1: SSH2_MSG_NEWKEYS received
> > >>>>>>>> debug1: SSH2_MSG_SERVICE_REQUEST sent
> > >>>>>>>> debug1: SSH2_MSG_SERVICE_ACCEPT received
> > >>>>>>>> debug1: Authentications that can continue: publickey,keyboard-interactive,hostbased
> > >>>>>>>> debug1: Next authentication method: publickey
> > >>>>>>>> debug1: Offering public key: /home/zzhang/.ssh/id_rsa
> > >>>>>>>> debug1: Authentications that can continue: publickey,keyboard-interactive,hostbased
> > >>>>>>>> debug1: Next authentication method: keyboard-interactive
> > >>>>>>>> debug1: Authentications that can continue: publickey,keyboard-interactive,hostbased
> > >>>>>>>> debug1: No more authentication methods to try.
> > >>>>>>>> Permission denied (publickey,keyboard-interactive,hostbased).
> > >>>>>>>> zzhang at login6.surveyor:~/swift/test>
> > >>>>>>>>
> > >>>>>>>> Mihael Hategan wrote:
> > >>>>>>>>> I looked at the ssh logs, and it seems like you're logging in using
> > >>>>>>>>> hostbased authentication.
> > >>>>>>>>>
> > >>>>>>>>> Try ssh -v -o HostbasedAuthentication=no -l zzhang -i /home/zzhang/.ssh/id_rsa ion-1
> > >>>>>>>>>
> > >>>>>>>>> Also, note that you misspelled "id_rsa": Warning: Identity
> > >>>>>>>>> file /home/zzhang/.ssh/ir_rsa not accessible: No such
> > >>>>>>>>> file or directory.
> > >>>>>>>>>
> > >>>>>>>>> On Thu, 2008-12-11 at 18:51 -0600, Zhao Zhang wrote:
> > >>>>>>>>>> Ha, you are right, I put the wrong log here.
> > >>>>>>>>>>
> > >>>>>>>>>> I reran it; it failed with the following message.
> > >>>>>>>>>>
> > >>>>>>>>>> zzhang at login6.surveyor:~/swift/test> swift -sites.file ./sites.xml -tc.file ./tc.data first.swift
> > >>>>>>>>>> Swift svn swift-r2334 (Swift modified locally) cog-r2216
> > >>>>>>>>>>
> > >>>>>>>>>> RunID: 20081211-1850-rcrr2fk0
> > >>>>>>>>>> Progress:
> > >>>>>>>>>> echo started
> > >>>>>>>>>> Sorted: [bgp000:1,000.000(98.545):0/789 overload: 0]
> > >>>>>>>>>> Sorted: [bgp000:999.590(98.544):0/789 overload: 0]
> > >>>>>>>>>> Sorted: [bgp000:999.180(98.544):0/789 overload: 0]
> > >>>>>>>>>> echo failed
> > >>>>>>>>>> Execution failed:
> > >>>>>>>>>> Could not initialize shared directory on bgp000
> > >>>>>>>>>> Caused by:
> > >>>>>>>>>> org.globus.cog.abstraction.impl.file.FileResourceException:
> > >>>>>>>>>> Error while communicating with the SSH server on 172.16.3.7:22
> > >>>>>>>>>> Caused by:
> > >>>>>>>>>> Public Key Authentication failed
> > >>>>>>>>>>
> > >>>>>>>>>> zhao
> > >>>>>>>>>>
> > >>>>>>>>>> Mihael Hategan wrote:
> > >>>>>>>>>>> On Thu, 2008-12-11 at 18:38 -0600, Zhao Zhang wrote:
> > >>>>>>>>>>>> sure, it is 172.16.3.6.passphrase=
> > >>>>>>>>>>>
> > >>>>>>>>>>> I don't believe you. Can you paste the file?
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Mihael Hategan wrote:
> > >>>>>>>>>>>>> On Thu, 2008-12-11 at 18:30 -0600, Zhao Zhang wrote:
> > >>>>>>>>>>>>>> Hi, Mihael
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> If I put .passphrase= there, I got this:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> With the IP address before .passphrase, of course. I.e.
> > >>>>>>>>>>>>> 172.16.3.6.passphrase=
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

From zhaozhang at uchicago.edu Thu Dec 11 21:40:12 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Thu, 11 Dec 2008 21:40:12 -0600
Subject: [Swift-devel] ssh data provider
In-Reply-To: <1229052331.15691.1.camel@localhost>
References: <494190C7.3030903@uchicago.edu> <1229034441.10267.0.camel@localhost>
 <4941B098.1080105@uchicago.edu> <1229042368.12545.1.camel@localhost>
 <4941B275.4060904@uchicago.edu> <1229042783.12545.6.camel@localhost>
 <4941B590.7000903@uchicago.edu> <1229043747.12948.5.camel@localhost>
 <4941BA08.8010406@uchicago.edu> <1229046401.13926.0.camel@localhost>
 <4941C5AA.5000604@uchicago.edu> <1229047804.14275.4.camel@localhost>
 <4941C978.6020102@uchicago.edu> <1229049681.14869.0.camel@localhost>
 <4941D1EC.80905@uchicago.edu> <1229050664.15189.1.camel@localhost>
 <4941D31C.9050402@uchicago.edu> <1229050844.15302.0.camel@localhost>
 <1229052331.15691.1.camel@localhost>
Message-ID: <4941DD1C.4030000@uchicago.edu>

ok, I see, thanks.

zhao

Mihael Hategan wrote:
> Uh. Sorry. Let me be more clear. They need to enable sftp in the openssh
> server. This has little to do with whether the sftp client tool is
> installed or not.

From bugzilla-daemon at mcs.anl.gov Fri Dec 12 15:24:41 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 12 Dec 2008 15:24:41 -0600 (CST)
Subject: [Swift-devel] [Bug 163] New: ext mapper doesn't like being used for input files.
Message-ID:

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=163

Summary: ext mapper doesn't like being used for input files.
Product: Swift
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: General
AssignedTo: benc at hawaga.org.uk
ReportedBy: benc at hawaga.org.uk

The ext mapper gives this error, apparently whenever it is used to map
input files.

The present unit tests do not test the case of ext being used for inputs.
Below is a patch to add such a test - this test fails at the moment.

--
Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You reported the bug, or are watching the reporter.
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at mcs.anl.gov Fri Dec 12 15:25:40 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 12 Dec 2008 15:25:40 -0600 (CST)
Subject: [Swift-devel] [Bug 163] ext mapper doesn't like being used for input files.
In-Reply-To:
Message-ID: <20081212212540.0E8A3164B3@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=163

------- Comment #1 from benc at hawaga.org.uk 2008-12-12 15:25 -------
Index: cog/modules/vdsk/tests/language-behaviour/07553-ext-mapper-in.swift
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ cog/modules/vdsk/tests/language-behaviour/07553-ext-mapper-in.swift 2008-12-12 16:14:25.000000000 -0500
@@ -0,0 +1,10 @@
+type quo;
+
+app p(quo o) {
+echo "hi";
+}
+
+
+quo a ;
+
+p(a);
Index: cog/modules/vdsk/tests/language-behaviour/07553-ext-mapper-in.sh
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ cog/modules/vdsk/tests/language-behaviour/07553-ext-mapper-in.sh 2008-12-12 16:14:25.000000000 -0500
@@ -0,0 +1,2 @@
+#!/bin/bash
+echo "[0] 07553-ext-mapper-in.in"
Index: cog/modules/vdsk/tests/language-behaviour/07553-ext-mapper-in.in
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ cog/modules/vdsk/tests/language-behaviour/07553-ext-mapper-in.in 2008-12-12 16:15:48.000000000 -0500
@@ -0,0 +1 @@
+foo

From bugzilla-daemon at mcs.anl.gov Fri Dec 12 15:35:37 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 12 Dec 2008 15:35:37 -0600 (CST)
Subject: [Swift-devel] [Bug 164] New: types with single character names do not work
Message-ID:

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=164

Summary: types with single character names do not work
Product: Swift
Version: unspecified
Platform: Macintosh
OS/Version: Mac OS
Status: NEW
Severity: normal
Priority: P2
Component: SwiftScript language
AssignedTo: benc at hawaga.org.uk
ReportedBy: benc at hawaga.org.uk

If a type is declared with a single character name, the xml->kml compiler
throws an error:

Could not start execution.
Failed to convert .xml to .kml for 027-single-character-typename.swift

The attached test cases (against r2366) fail for a single-character type
name and pass for a two-character type name.

From bugzilla-daemon at mcs.anl.gov Fri Dec 12 15:36:21 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 12 Dec 2008 15:36:21 -0600 (CST)
Subject: [Swift-devel] [Bug 164] types with single character names do not work
In-Reply-To:
Message-ID: <20081212213621.E614A164B3@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=164

------- Comment #1 from benc at hawaga.org.uk 2008-12-12 15:36 -------
Index: cog/modules/vdsk/tests/language-behaviour/027-single-character-typename.swift
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ cog/modules/vdsk/tests/language-behaviour/027-single-character-typename.swift 2008-12-12 16:32:43.000000000 -0500
@@ -0,0 +1,5 @@
+
+type q;
+
+q i;
+
Index: cog/modules/vdsk/tests/language-behaviour/028-double-character-typename.swift
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ cog/modules/vdsk/tests/language-behaviour/028-double-character-typename.swift 2008-12-12 16:33:13.000000000 -0500
@@ -0,0 +1,4 @@
+type qq;
+
+qq i;
+

From bugzilla-daemon at mcs.anl.gov Fri Dec 12 15:44:23 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 12 Dec 2008 15:44:23 -0600 (CST)
Subject: [Swift-devel] [Bug 165] New: wrapper.sh and seq.sh name conflicts with "obvious" application-level names
Message-ID:

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=165

Summary: wrapper.sh and seq.sh name conflicts with "obvious" application-level names
Product: Swift
Version: unspecified
Platform: Macintosh
OS/Version: Mac OS
Status: NEW
Severity: normal
Priority: P2
Component: General
AssignedTo: benc at hawaga.org.uk
ReportedBy: benc at hawaga.org.uk

wrapper.sh and seq.sh should be given more obscure names that are less
likely to conflict with user-chosen file names.

I've seen a situation where a user writes a wrapper script which is staged
in with the procedure call. That wrapper, naturally enough, is called
'wrapper.sh'; the presence of that wrapper.sh conflicts in the swift file
cache with the wrapper.sh used by Swift and causes jobs to fail.
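
Until the helper scripts are renamed, a user could guard against the collision described in bug 165 before submitting a run. The following is only a minimal sketch under that assumption: `check_reserved_names` is a hypothetical helper, not part of Swift, and the only names it knows about are the two mentioned in the report (wrapper.sh and seq.sh).

```shell
# Hypothetical pre-flight check (not part of Swift): report any file whose
# basename matches one of the helper script names cited in bug 165.
check_reserved_names() {
    rc=0
    for f in "$@"; do
        case "$(basename "$f")" in
            wrapper.sh|seq.sh)
                # The report says such a file clashes with Swift's own script.
                echo "conflict: $f" >&2
                rc=1
                ;;
        esac
    done
    # Non-zero exit status when at least one conflicting name was seen.
    return "$rc"
}
```

It would be called with the files about to be staged in, for example `check_reserved_names input.txt scripts/wrapper.sh`, and returns a non-zero status if any of them would clash.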
From bugzilla-daemon at mcs.anl.gov Fri Dec 12 15:46:50 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 12 Dec 2008 15:46:50 -0600 (CST)
Subject: [Swift-devel] [Bug 166] New: recursive mapper to map all files in a directory tree
Message-ID:

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=166

           Summary: recursive mapper to map all files in a directory tree
           Product: Swift
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: General
        AssignedTo: benc at hawaga.org.uk
        ReportedBy: benc at hawaga.org.uk
                CC: benc at hawaga.org.uk

Want a mapper that will map all files in a directory tree; the most
intuitive form for this to me is to make filesys_mapper have a recursive
option.

(This could also be achieved with the ext mapper and a script, except
that bug 163 blocks that.)

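As a rough illustration of the "ext mapper and a script" route mentioned in bug 166, the sketch below walks a directory tree and emits one mapping line per file. The helper names (list_tree, mapper_lines) are made up for this note, and the "[index] path" output format is an assumption about what the ext mapper expects; neither is taken from the bug report.

```python
import os


def list_tree(root):
    """Return the relative paths of all regular files under root, sorted."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.append(os.path.relpath(full, root))
    return sorted(found)


def mapper_lines(root):
    """Format each file as "[index] path".

    Assumption: this is the line format an ext mapper script prints;
    check the mapper documentation before relying on it.
    """
    return ["[%d] %s" % (index, os.path.join(root, rel))
            for index, rel in enumerate(list_tree(root))]
```

A script built on this would print each element of mapper_lines(directory) on its own line; sorting keeps the index assignment stable between runs.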
From bugzilla-daemon at mcs.anl.gov Fri Dec 12 15:59:50 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 12 Dec 2008 15:59:50 -0600 (CST)
Subject: [Swift-devel] [Bug 167] New: clustering time limit specification in seconds is awkward for large clustering times
Message-ID:

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=167

           Summary: clustering time limit specification in seconds is
                    awkward for large clustering times
           Product: Swift
           Version: unspecified
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: General
        AssignedTo: benc at hawaga.org.uk
        ReportedBy: benc at hawaga.org.uk

Make the clustering time limit in swift.properties specifiable in units
other than seconds; for users who want to specify clusters with max times
of around an hour, seconds are an awkward unit to use.

From bugzilla-daemon at mcs.anl.gov Fri Dec 12 16:01:02 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 12 Dec 2008 16:01:02 -0600 (CST)
Subject: [Swift-devel] [Bug 167] clustering time limit specification in seconds is awkward for large clustering times
In-Reply-To:
Message-ID: <20081212220102.4A33F164B3@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=167

------- Comment #1 from benc at hawaga.org.uk 2008-12-12 16:01 -------
Using the same format as for maxwalltime might be useful; however, that
would change the meaning of unadorned integers from what they mean now
(they would become minutes rather than seconds).

From bugzilla-daemon at mcs.anl.gov Fri Dec 12 16:05:52 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Fri, 12 Dec 2008 16:05:52 -0600 (CST)
Subject: [Swift-devel] [Bug 168] New: When sites file is specified with non-.xml extension, it is interpreted as non-xml; and poorly reported.
Message-ID:

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=168

           Summary: When sites file is specified with non-.xml extension,
                    it is interpreted as non-xml; and poorly reported.
           Product: Swift
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: minor
          Priority: P2
         Component: General
        AssignedTo: benc at hawaga.org.uk
        ReportedBy: benc at hawaga.org.uk

When the site catalog is specified without an .xml extension, rather than
being interpreted as a sites file, it is interpreted as a karajan
native-format program. When fed a site catalog, this produces unintuitive
error messages.

The error messages should be tidied up, and the format detection should
behave better (either by always taking the file as XML or by rejecting
non-.xml extensions).

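One of the two options floated in bug 168, rejecting non-.xml extensions with a clear message, could look roughly like the sketch below. The function name and the wording of the error are invented for illustration and are not Swift code; the other option (always parsing the file as XML) is not shown.

```python
import os


def require_xml_sites_file(path):
    """Return path unchanged if it ends in .xml, else raise ValueError.

    Hypothetical helper: name and message are illustrative only.
    """
    extension = os.path.splitext(path)[1]
    if extension.lower() != ".xml":
        raise ValueError(
            "sites file %r does not have a .xml extension; "
            "refusing to guess its format" % path
        )
    return path
```

Failing early with the file name in the message avoids the situation described in the bug, where a site catalog is silently handed to a different parser and the resulting errors point nowhere useful.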
From bugzilla-daemon at mcs.anl.gov Sun Dec 14 12:28:45 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sun, 14 Dec 2008 12:28:45 -0600 (CST)
Subject: [Swift-devel] [Bug 162] error message syntax makes distinct values look like a bizarre path
In-Reply-To:
Message-ID: <20081214182845.96488164B2@foxtrot.mcs.anl.gov>

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=162

benc at hawaga.org.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

------- Comment #1 from benc at hawaga.org.uk 2008-12-14 12:28 -------
fixed in r2369

From benc at hawaga.org.uk Sun Dec 14 22:03:10 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 15 Dec 2008 04:03:10 +0000 (GMT)
Subject: [Swift-devel] jobs that go active forever, and their effect on multisite osg runs
Message-ID:

During my experimentation last week with pointing Swift at the OSG Engage
VO, I repeatedly ran into a problem where jobs aimed at a particular small
subset of sites would go into the Active state and then never (for some
multiple-hours value of never) be reported as Completed or Failed.

This was the only site-misbehaviour problem that I encountered which
caused Swift runs to not complete and required manual intervention to
remove those sites before a run. Other site problems were dealt with by
various mechanisms already implemented in Swift (site scoring,
replication).

I'm desirous, then, of some way to get round this problem.

One approach we discussed previously was making maxwalltime enforced at
the client side.

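To make the "maxwalltime enforced at the client side" idea concrete, here is a minimal sketch of a submit-side watchdog. Everything in it is hypothetical (the class name, the grace period, the job ids); it only shows the bookkeeping, not how Swift or the cog providers would cancel or resubmit a job.

```python
import time


class ActiveWatchdog:
    """Track jobs in the Active state and report those past their deadline.

    Hypothetical sketch: a job is overdue once it has been Active for
    longer than its maxwalltime plus a grace period.
    """

    def __init__(self, grace_seconds=300, clock=time.monotonic):
        self._grace = grace_seconds
        self._clock = clock          # injectable so the logic can be tested
        self._deadline = {}          # job id -> time after which it is overdue

    def mark_active(self, job_id, maxwalltime_seconds):
        self._deadline[job_id] = self._clock() + maxwalltime_seconds + self._grace

    def mark_done(self, job_id):
        # Called when the site reports Completed or Failed.
        self._deadline.pop(job_id, None)

    def expired(self):
        now = self._clock()
        return sorted(job for job, limit in self._deadline.items() if now > limit)
```

A scheduler loop would call expired() periodically and treat the returned jobs as failed, so that the existing retry, replication and site-scoring mechanisms can take over.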
-- 

From rynge at renci.org Mon Dec 15 11:52:13 2008
From: rynge at renci.org (Mats Rynge)
Date: Mon, 15 Dec 2008 12:52:13 -0500
Subject: [Swift-devel] jobs that go active forever, and their effect on multisite osg runs
In-Reply-To:
References:
Message-ID: <4946994D.4080804@renci.org>

Ben Clifford wrote:
> During my experimentation last week with pointing Swift at the OSG Engage VO,
> I repeatedly ran into a problem where jobs aimed at a particular small
> subset of sites would go into the Active state and then never (for some
> multiple-hours value of never) be reported as Completed or Failed.
>
> This was the only site-misbehaviour problem that I encountered which
> caused Swift runs to not complete and required manual intervention to
> remove those sites before a run. Other site problems were dealt with by
> various mechanisms already implemented in Swift (site scoring,
> replication).
>
> I'm desirous, then, of some way to get round this problem.
>
> One approach we discussed previously was making maxwalltime enforced at
> the client side.
>

I think you have the same problem in other states of the job cycle. My
last run got stuck at:

Progress: Selecting site:2 Stage in:1 Finished successfully:268 Initializing site shared directory:1

Log file:
http://www.renci.org/~rynge/swift/logs/osg-20081215-1117-eedjvp3c.log

There is a stack trace at the beginning which may or may not have
anything to do with the stuck jobs.

When we use OSG MatchMaker, we have timeouts for all job states, and
that seems to work well.

-- 
Mats Rynge
Renaissance Computing Institute

From benc at hawaga.org.uk Mon Dec 15 18:20:52 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 16 Dec 2008 00:20:52 +0000 (GMT)
Subject: [Swift-devel] jobs that go active forever, and their effect on multisite osg runs
In-Reply-To: <4946994D.4080804@renci.org>
References: <4946994D.4080804@renci.org>
Message-ID:

On Mon, 15 Dec 2008, Mats Rynge wrote:

> When we use OSG MatchMaker, we have timeouts for all job states, and
> that seem to work well.

What durations of timeouts do you use?

-- 

From rynge at renci.org Mon Dec 15 19:34:02 2008
From: rynge at renci.org (Mats Rynge)
Date: Mon, 15 Dec 2008 20:34:02 -0500
Subject: [Swift-devel] jobs that go active forever, and their effect on multisite osg runs
In-Reply-To:
References: <4946994D.4080804@renci.org>
Message-ID: <4947058A.7060809@renci.org>

Ben Clifford wrote:
> On Mon, 15 Dec 2008, Mats Rynge wrote:
>
>> When we use OSG MatchMaker, we have timeouts for all job states, and
>> that seem to work well.
>
> What durations of timeouts do you use?

It depends a little bit on what model we are running, but here is an
example:

Submitting, Staging, other "quick" states - 10 minutes
Pending (sitting in the remote queue) - 30 minutes
Running - 2x the expected runtime

These are all handled on the local side. We set the wallclock time in
the RSL as well, but that is for giving the sites a better shot at job
scheduling, not for job failure detection/recovery.

-- 
Mats Rynge
Renaissance Computing Institute

From benc at hawaga.org.uk Tue Dec 16 12:33:21 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 16 Dec 2008 18:33:21 +0000 (GMT)
Subject: [Swift-devel] jobs that go active forever, and their effect on multisite osg runs
In-Reply-To: <4947058A.7060809@renci.org>
References: <4946994D.4080804@renci.org> <4947058A.7060809@renci.org>
Message-ID:

On Mon, 15 Dec 2008, Mats Rynge wrote:

> These are all handled on the local side. We set the wallclock time in
> the RSL as well, but that is for giving the sites a better shot at job
> scheduling, not for job failure detection/recovery.

ok. I think we need more of that client-side to handle strange site
behaviour.

-- 

From bugzilla-daemon at mcs.anl.gov Tue Dec 16 15:30:20 2008
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Tue, 16 Dec 2008 15:30:20 -0600 (CST)
Subject: [Swift-devel] [Bug 169] New: submit-side timeouts (or other fault detection) to accommodate some byzantine site failures
Message-ID:

http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=169

           Summary: submit-side timeouts (or other fault detection) to
                    accommodate some byzantine site failures
           Product: Swift
           Version: unspecified
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: General
        AssignedTo: benc at hawaga.org.uk
        ReportedBy: benc at hawaga.org.uk

When running with a large number of sites, it's common for runs to not
complete because some sites behave improperly, causing runs to hang.

Some client-side timeout (or other fault detection) would be useful; it
might be useful for this to be at the cog provider layer, or it might
need to be higher in the stack.

From zhaozhang at uchicago.edu Fri Dec 19 12:40:26 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Fri, 19 Dec 2008 12:40:26 -0600
Subject: [Swift-devel] can we disable the gridftp authentication for swift on BGP?
Message-ID: <494BEA9A.9040305@uchicago.edu>

Hi,

I started gridftp on one of the IO nodes with this command:

bash-3.1$ ./globus-gridftp-server -auth-level 0
Server listening at ion-12.surveyor.alcf.anl.gov:59829

Then I modified sites.xml and invoked swift for a sample test, but the
run failed with the message below, which shows that authentication is
still required. Is there any switch in the configuration file that would
turn this off?

The log file is here at
http://www.ci.uchicago.edu/~zzhang/first-20081219-1233-m233r976.log

best wishes
zhangzhao

zzhang at login6.surveyor:~/swift/test> swift -sites.file ./sites.xml -tc.file ./tc.data first.swift
Swift svn swift-r2334 (Swift modified locally) cog-r2216

RunID: 20081219-1233-m233r976
Progress:
echo started
Sorted: [bgp000:1,000.000(98.545):0/789 overload: 0]
Sorted: [bgp000:999.590(98.544):0/789 overload: 0]
Sorted: [bgp000:999.180(98.544):0/789 overload: 0]
echo failed
Execution failed:
        Could not initialize shared directory on bgp000
Caused by:
        org.globus.cog.abstraction.impl.file.FileResourceException: Error communicating with the GridFTP server
Caused by:
        Server refused performing the request. Custom message: Server refused GSSAPI authentication. (error code 1) [Nested exception message: Custom message: Unexpected reply: 530-globus_xio: Server side credential failure
530-globus_gsi_gssapi: Error with gss credential handle
530-globus_credential: Valid credentials could not be found in any of the possible locations specified by the credential search order.
530-Valid credentials could not be found in any of the possible locations specified by the credential search order.
530-
530-Attempt 1
530-
530-globus_credential: Error reading host credential
530-globus_sysconfig: Could not find a valid certificate file: The host cert could not be found in:
530-1) env. var. X509_USER_CERT
530-2) /etc/grid-security/hostcert.pem
530-3) $GLOBUS_LOCATION/etc/hostcert.pem
530-4) $HOME/.globus/hostcert.pem
530-
530-The host key could not be found in:
530-1) env. var. X509_USER_KEY
530-2) /etc/grid-security/hostkey.pem
530-3) $GLOBUS_LOCATION/etc/hostkey.pem
530-4) $HOME/.globus/hostkey.pem
530-
530-
530-
530-Attempt 2
530-
530-globus_credential: Error reading proxy credential
530-globus_sysconfig: Could not find a valid proxy certificate file location
530-globus_sysconfig: Error with key filename
530-globus_sysconfig: File does not exist: /tmp/x509up_u3850 is not a valid file
530-
530-Attempt 3
530-
530-globus_credential: Error reading user credential
530-globus_credential: Key is password protected: GSI does not currently support password protected private keys.
530-OpenSSL Error: pem_lib.c:401: in library: PEM routines, function PEM_do_header: bad password read
530-
530 End.]

From hategan at mcs.anl.gov Fri Dec 19 13:20:18 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 19 Dec 2008 13:20:18 -0600
Subject: [Swift-devel] can we disable the gridftp authentication for swift on BGP?
In-Reply-To: <494BEA9A.9040305@uchicago.edu>
References: <494BEA9A.9040305@uchicago.edu>
Message-ID: <1229714418.6572.0.camel@localhost>

On Fri, 2008-12-19 at 12:40 -0600, Zhao Zhang wrote:
> Hi,
>
> I started gridftp on one of the IO nodes with this command:
>
> bash-3.1$ ./globus-gridftp-server -auth-level 0
> Server listening at ion-12.surveyor.alcf.anl.gov:59829
>
> Then I modified sites.xml and invoked swift for a sample test, but the
> run failed with the message below, which shows that authentication is
> still required. Is there any switch in the configuration file that would
> turn this off?

Have you tried -nosec when starting the server?

From benc at hawaga.org.uk Fri Dec 19 13:21:27 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 19 Dec 2008 19:21:27 +0000 (GMT)
Subject: [Swift-devel] can we disable the gridftp authentication for swift on BGP?
In-Reply-To: <494BEA9A.9040305@uchicago.edu>
References: <494BEA9A.9040305@uchicago.edu>
Message-ID:

I'm led to believe that -auth-level 0 disables authorization checks, not
all of gsi; so with that parameter, you still need to have your server
and client configured with credentials (that is, -auth-level 0 eliminates
the need for a gridmap file, not for credentials).

I'm also led to believe that you could run the server with -aa and use
ftp: as a uri scheme instead of gsiftp: (assuming cog supports that, which
I think it does) in order to get anonymous access.

My suggestion would be to try with a regular ftp client against your
server to check that it is working ok before pointing Swift at it.

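The "try a regular ftp client first" check can also be scripted. The sketch below uses Python's standard ftplib to connect to a given host and port, log in, and request a directory listing; the function name and the ftp_factory parameter are invented for this note, and whether anonymous login succeeds depends entirely on how the server was started.

```python
import ftplib


def check_ftp(host, port, user="anonymous", password="",
              ftp_factory=ftplib.FTP, timeout=30):
    """Connect, log in and list the current directory; return the listing.

    Hypothetical helper. ftp_factory exists only so the logic can be
    exercised without a real server.
    """
    ftp = ftp_factory()
    ftp.connect(host, port, timeout=timeout)
    try:
        ftp.login(user, password)
        # A listing needs a data connection, so it exercises more than
        # the control-port banner does.
        return ftp.nlst()
    finally:
        ftp.close()
```

For the server in this thread the call would be along the lines of check_ftp("ion-12.surveyor.alcf.anl.gov", 59829). Any exception raised by ftplib indicates the server is not usable in that mode yet.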
-- 

From zhaozhang at uchicago.edu Fri Dec 19 13:43:51 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Fri, 19 Dec 2008 13:43:51 -0600
Subject: [Swift-devel] can we disable the gridftp authentication for swift on BGP?
In-Reply-To: <1229714418.6572.0.camel@localhost>
References: <494BEA9A.9040305@uchicago.edu> <1229714418.6572.0.camel@localhost>
Message-ID: <494BF977.7090206@uchicago.edu>

Hi, Mihael

There is no such option -nosec. The "-auth-level 0" Disables all
authorization checks.

zhao

Mihael Hategan wrote:
> Have you tried -nosec when starting the server?

From zhaozhang at uchicago.edu Fri Dec 19 13:45:05 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Fri, 19 Dec 2008 13:45:05 -0600
Subject: [Swift-devel] can we disable the gridftp authentication for swift on BGP?
In-Reply-To:
References: <494BEA9A.9040305@uchicago.edu>
Message-ID: <494BF9C1.70800@uchicago.edu>

Hi, Ben

I tried regular ftp, the connection was refused:
zzhang at login6.surveyor:~/swift/test> ftp ion-10.surveyor.alcf.anl.gov:59829
ftp: connect: Connection refused
ftp: Can't connect or login to host `ion-10.surveyor.alcf.anl.gov'

Then, as the gridftp webpage said, I tried telnet, and that worked fine.
zzhang at login6.surveyor:~/swift/test> telnet ion-10 59829
Trying 172.16.3.10...
Connected to ion-10.
Escape character is '^]'.
220 172.16.3.10 GridFTP Server 2.8 (gcc32dbg, 1217607445-63) [Globus Toolkit 4.0.8] ready.

zhao

Ben Clifford wrote:
> My suggestion would be to try with a regular ftp client against your
> server to check that it is working ok before pointing Swift at it.

From hategan at mcs.anl.gov Fri Dec 19 13:52:43 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 19 Dec 2008 13:52:43 -0600
Subject: [Swift-devel] can we disable the gridftp authentication for swift on BGP?
In-Reply-To: <494BF977.7090206@uchicago.edu>
References: <494BEA9A.9040305@uchicago.edu> <1229714418.6572.0.camel@localhost> <494BF977.7090206@uchicago.edu>
Message-ID: <1229716363.7130.1.camel@localhost>

On Fri, 2008-12-19 at 13:43 -0600, Zhao Zhang wrote:
> Hi, Mihael
>
> There is no such option -nosec.

Good point. That's the wsrf container. Ben mentioned -allow-anonymous.

> The "-auth-level 0" Disables all
> authorization checks.

But not authentication.

From hategan at mcs.anl.gov Fri Dec 19 13:55:27 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 19 Dec 2008 13:55:27 -0600
Subject: [Swift-devel] can we disable the gridftp authentication for swift on BGP?
In-Reply-To: <494BF9C1.70800@uchicago.edu>
References: <494BEA9A.9040305@uchicago.edu> <494BF9C1.70800@uchicago.edu>
Message-ID: <1229716527.7130.4.camel@localhost>

On Fri, 2008-12-19 at 13:45 -0600, Zhao Zhang wrote:
> Hi, Ben
>
> I tried regular ftp, the connection was refused:
> zzhang at login6.surveyor:~/swift/test> ftp ion-10.surveyor.alcf.anl.gov:59829
> ftp: connect: Connection refused
> ftp: Can't connect or login to host `ion-10.surveyor.alcf.anl.gov'
>
>
> Then as gridftp webpage said, I tried telnet that worked fine.
> zzhang at login6.surveyor:~/swift/test> telnet ion-10 59829
> Trying 172.16.3.10...
> Connected to ion-10.
> Escape character is '^]'.
> 220 172.16.3.10 GridFTP Server 2.8 (gcc32dbg, 1217607445-63) [Globus
> Toolkit 4.0.8] ready.

That's suspicious, because the ftp error suggests that telnet shouldn't
work either.

From benc at hawaga.org.uk Fri Dec 19 13:51:14 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 19 Dec 2008 19:51:14 +0000 (GMT)
Subject: [Swift-devel] can we disable the gridftp authentication for swift on BGP?
In-Reply-To: <494BF9C1.70800@uchicago.edu>
References: <494BEA9A.9040305@uchicago.edu> <494BF9C1.70800@uchicago.edu>
Message-ID:

On Fri, 19 Dec 2008, Zhao Zhang wrote:

> I tried regular ftp, the connection was refused:
> zzhang at login6.surveyor:~/swift/test> ftp ion-10.surveyor.alcf.anl.gov:59829
> ftp: connect: Connection refused
> ftp: Can't connect or login to host `ion-10.surveyor.alcf.anl.gov'

At least on os x, that is not correct syntax for the ftp client.

Try:

ftp -P 59829 ion-10.surveryor.alcf.anl.gov

-- 

From benc at hawaga.org.uk Fri Dec 19 13:54:19 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 19 Dec 2008 19:54:19 +0000 (GMT)
Subject: [Swift-devel] can we disable the gridftp authentication for swift on BGP?
In-Reply-To: <494BF9C1.70800@uchicago.edu>
References: <494BEA9A.9040305@uchicago.edu> <494BF9C1.70800@uchicago.edu>
Message-ID:

On Fri, 19 Dec 2008, Zhao Zhang wrote:

> Then as gridftp webpage said, I tried telnet that worked fine.

That isn't really testing enough, though - I was intending for you to
test transferring files. Telnetting to the ftp server control port will
give you the server banner in almost any configuration, so the telnet
test doesn't really give much useful information in this case.

> zzhang at login6.surveyor:~/swift/test> telnet ion-10 59829
> Trying 172.16.3.10...
> Connected to ion-10.
> Escape character is '^]'.
> 220 172.16.3.10 GridFTP Server 2.8 (gcc32dbg, 1217607445-63) [Globus Toolkit
> 4.0.8] ready.

-- 

From zhaozhang at uchicago.edu Fri Dec 19 13:58:13 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Fri, 19 Dec 2008 13:58:13 -0600
Subject: [Swift-devel] can we disable the gridftp authentication for swift on BGP?
In-Reply-To:
References: <494BEA9A.9040305@uchicago.edu> <494BF9C1.70800@uchicago.edu>
Message-ID: <494BFCD5.2050207@uchicago.edu>

I tried this:
zzhang at login6.surveyor:~/swift/test> ftp -P 59829 ion-10.surveryor.alcf.anl.gov
ftp: Name or service not known
ftp> ls
Not connected.
ftp> quit

Then I tried
zzhang at login6.surveyor:~/swift/test> ftp ion-10.surveyor.alcf.anl.gov 59829
Connected to ion-10.surveyor.alcf.anl.gov.
220 172.16.3.10 GridFTP Server 2.8 (gcc32dbg, 1217607445-63) [Globus Toolkit 4.0.8] ready.
Name (ion-10.surveyor.alcf.anl.gov:zzhang): zzhang
331 Password required for zzhang.
Password:
230 User zzhang logged in.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp>

This worked. And I am trying out more now.

zhao

Ben Clifford wrote:
> At least on os x, that is not correct syntax for the ftp client.
>
> Try:
>
> ftp -P 59829 ion-10.surveryor.alcf.anl.gov

From benc at hawaga.org.uk Fri Dec 19 13:57:58 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 19 Dec 2008 19:57:58 +0000 (GMT)
Subject: [Swift-devel] can we disable the gridftp authentication for swift on BGP?
In-Reply-To: <1229716527.7130.4.camel@localhost>
References: <494BEA9A.9040305@uchicago.edu> <494BF9C1.70800@uchicago.edu> <1229716527.7130.4.camel@localhost>
Message-ID:

On Fri, 19 Dec 2008, Mihael Hategan wrote:

> That's suspicious, because the ftp error suggests that telnet shouldn't
> work either.

wrong ftp syntax - the ftp command supplied means connect to the ftp
server on the default port and then CWD 59829.

-- 

From benc at hawaga.org.uk Fri Dec 19 14:00:23 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 19 Dec 2008 20:00:23 +0000 (GMT)
Subject: [Swift-devel] can we disable the gridftp authentication for swift on BGP?
In-Reply-To: <494BFCD5.2050207@uchicago.edu>
References: <494BEA9A.9040305@uchicago.edu> <494BF9C1.70800@uchicago.edu> <494BFCD5.2050207@uchicago.edu>
Message-ID:

On Fri, 19 Dec 2008, Zhao Zhang wrote:

> 331 Password required for zzhang.
> Password:

did you put in a password?

-- 

From zhaozhang at uchicago.edu Fri Dec 19 14:02:18 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Fri, 19 Dec 2008 14:02:18 -0600
Subject: [Swift-devel] can we disable the gridftp authentication for swift on BGP?
In-Reply-To:
References: <494BEA9A.9040305@uchicago.edu> <494BF9C1.70800@uchicago.edu> <494BFCD5.2050207@uchicago.edu>
Message-ID: <494BFDCA.2040104@uchicago.edu>

I simply pressed "Enter", and it went through.

zhao

Ben Clifford wrote:
> On Fri, 19 Dec 2008, Zhao Zhang wrote:
>
>> 331 Password required for zzhang.
>> Password:
>
> did you put in a password?

From zhengxiongh at uchicago.edu Tue Dec 23 10:23:42 2008
From: zhengxiongh at uchicago.edu (Zhengxiong Hou)
Date: Tue, 23 Dec 2008 10:23:42 -0600
Subject: [Swift-devel] Re: shared filesystem or work directory ?
In-Reply-To:
References: <49501689.4010500@uchicago.edu>
Message-ID: <4951108E.1070902@uchicago.edu>

There is an annoying problem on several OSG grid sites, and I'm trying to
solve it. The returned error information is "No status file was found.
Check the *shared filesystem* on Nebraska."

The "*wrapper.sh*" could NOT be executed. In "vdl-int.k", only
"element(initSharedDir, [rhost]" could be executed. So, in the "wfdir",
the "info", "kickstart", "shared" and "status" directories were created,
and "wrapper.sh" and "seq.sh" were also transferred to the "shared"
directory. But "jobs", "wrapper.log", "3", etc., which should be generated
by the execution of "*wrapper.sh*", could NOT be created.

E.g. on Nebraska,

[houzx at login run-by-swift]$ swift -tc.file tc.data -sites.file site-1-red.xml grid-many-dock6-auto.swift
Swift svn swift-r2377 cog-r2125

RunID: 20081222-1554-9bq3l783
Progress:
rundock started
Sorted: [Nebraska:0.000(1.000):0/1 overload: 0]
Progress: Submitted:1
Progress: Submitted:1
Progress: Submitted:1
Failed to transfer wrapper log from grid-many-dock6-auto-20081222-1554-9bq3l783/info/7 on Nebraska
rundock failed
Execution failed:
        Exception in rundock:
Arguments: [disks/tp-gpfs/scratch/houzx/dock-run/databases/KEGG_and_Drugs/C10001.mol2, 1F9Y, C10001.mol2-result.tar.gz]
Host: Nebraska
Directory: grid-many-dock6-auto-20081222-1554-9bq3l783/jobs/7/rundock-7mo1h34j
stderr.txt:

stdout.txt:

----

Caused by:
        No status file was found. Check the shared filesystem on Nebraska

Ben Clifford wrote:
> What are you actually trying to do?
>
> Also, ask technical questions on the swift-devel list so the answers get
> archived and searchable.

From benc at hawaga.org.uk Wed Dec 24 10:28:21 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 24 Dec 2008 16:28:21 +0000 (GMT)
Subject: [Swift-devel] Re: shared filesystem or work directory ?
In-Reply-To: <4951108E.1070902@uchicago.edu>
References: <49501689.4010500@uchicago.edu> <4951108E.1070902@uchicago.edu>
Message-ID:

In a private message you indicated some suspicion about initial working
directories being wrong. So I investigated and found that at least on
red.unl.edu the initial working directory of a job is not the directory
specified in Swift.

Below is a url for a rough patch I just made that will explicitly set the
initial directory.

Apply it by going into your vdsk directory and typing

patch -p3 < condor-pwd-bug

and then rebuilding with ant redist

This makes submission to red.unl.edu/condor work for me and might make
other sites work too. Please send feedback to the list...


http://www.ci.uchicago.edu/~benc/tmp/condor-pwd-bug

--

From zhengxiongh at uchicago.edu Wed Dec 24 15:07:43 2008
From: zhengxiongh at uchicago.edu (Zhengxiong Hou)
Date: Wed, 24 Dec 2008 15:07:43 -0600
Subject: [Swift-devel] Re: work directory or "$PWD" bug fixed
In-Reply-To:
References: <49501689.4010500@uchicago.edu> <4951108E.1070902@uchicago.edu>
Message-ID: <4952A49F.3010605@uchicago.edu>

Hi Ben,

The cause was exactly the wrong work directory. The bug is fixed. Cool!

Thanks, Merry Christmas!

Ben Clifford wrote:
> In a private message you indicated some suspicion about initial working
> directories being wrong. So I investigated and found that at least on
> red.unl.edu the initial working directory of a job is not the directory
> specified in Swift.
>
> Below is a url for a rough patch I just made that will explicitly set the
> initial directory.
>
> Apply it by going into your vdsk directory and typing
>
>     patch -p3 < condor-pwd-bug
>
> and then rebuilding with ant redist
>
> This makes submission to red.unl.edu/condor work for me and might make
> other sites work too. Please send feedback to the list...
>
> http://www.ci.uchicago.edu/~benc/tmp/condor-pwd-bug
>

From benc at hawaga.org.uk Fri Dec 26 08:54:26 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 26 Dec 2008 14:54:26 +0000 (GMT)
Subject: [Swift-devel] running bash wrapper rather than shebbanging it
Message-ID:

At present, the wrapper script is launched by explicitly naming bash in
the vdl:execute call in vdl-int.k and passing the wrapper script name as a
parameter to bash.

In theory, this should be the same as running the wrapper script directly
with a #!/bin/bash as the first line of wrapper.sh

However, I have a feeling that this second way doesn't work right (e.g. it
doesn't always end up with bash and sometimes ends up with the system
shell)

Briefly googling the swift-devel archives doesn't give me anything, though
- does anyone remember what (if any) the problem with this second
approach is?

--

From benc at hawaga.org.uk Fri Dec 26 11:03:49 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Fri, 26 Dec 2008 17:03:49 +0000 (GMT)
Subject: [Swift-devel] Re: work directory or "$PWD" bug fixed
In-Reply-To: <4952A49F.3010605@uchicago.edu>
References: <49501689.4010500@uchicago.edu> <4951108E.1070902@uchicago.edu> <4952A49F.3010605@uchicago.edu>
Message-ID:

swift r2380 should put a tidier version of this into the codebase. I'm
fairly confident about it but I can't test against red.unl.edu at the
moment as it isn't taking my jobs today.

--

From hategan at mcs.anl.gov Fri Dec 26 11:07:53 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 26 Dec 2008 11:07:53 -0600
Subject: [Swift-devel] running bash wrapper rather than shebbanging it
In-Reply-To:
References:
Message-ID: <1230311273.8064.2.camel@localhost>

On Fri, 2008-12-26 at 14:54 +0000, Ben Clifford wrote:
> At present, the wrapper script is launched by explicitly naming bash in
> the vdl:execute call in vdl-int.k and passing the wrapper script name as a
> parameter to bash.
>
> In theory, this should be the same as running the wrapper script directly
> with a #!/bin/bash as the first line of wrapper.sh
>
> However, I have a feeling that this second way doesn't work right (e.g. it
> doesn't always end up with bash and sometimes ends up with the system
> shell)
>
> Briefly googling the swift-devel archives doesn't give me anything, though
> - does anyone remember what (if any) the problem with this second
> approach is?

When copied over using the java gridftp (and possibly other things),
wrapper.sh loses its executable bit, so it is no longer executable. This
could probably be corrected using a chmod, but I don't think that command
is well supported.

From zhengxiongh at uchicago.edu Fri Dec 26 13:51:45 2008
From: zhengxiongh at uchicago.edu (Zhengxiong Hou)
Date: Fri, 26 Dec 2008 13:51:45 -0600
Subject: [Swift-devel] Re: swift r2380
In-Reply-To:
References: <49501689.4010500@uchicago.edu> <4951108E.1070902@uchicago.edu> <4952A49F.3010605@uchicago.edu>
Message-ID: <495535D1.8050404@uchicago.edu>

Good! If you want to test your new stuff, besides "red.unl.edu", there
are some other grid sites with the same problem, such as:
"ce01.cmsaf.mit.edu", "cit-gatekeeper.ultralight.org",
"abitibi.sbgrid.org", "proton.fis.cinvestav.mx", "osg-gw-4.t2.ucsd.edu"

Ben Clifford wrote:
> swift r2380 should put a tidier version of this into the codebase. I'm
> fairly confident about it but I can't test against red.unl.edu at the
> moment as it isn't taking my jobs today.
>

From rynge at renci.org Fri Dec 26 14:11:05 2008
From: rynge at renci.org (Mats Rynge)
Date: Fri, 26 Dec 2008 15:11:05 -0500
Subject: [Swift-devel] Re: shared filesystem or work directory ?
In-Reply-To:
References: <49501689.4010500@uchicago.edu> <4951108E.1070902@uchicago.edu>
Message-ID: <49553A59.3070400@renci.org>

Ben Clifford wrote:
> In a private message you indicated some suspicion about initial working
> directories being wrong. So I investigated and found that at least on
> red.unl.edu the initial working directory of a job is not the directory
> specified in Swift.

Is the problem that the (directory=) RSL is being ignored?

> Below is a url for a rough patch I just made that will explicitly set the
> initial directory.
>
> Apply it by going into your vdsk directory and typing
>
>     patch -p3 < condor-pwd-bug
>
> and then rebuilding with ant redist
>
> This makes submission to red.unl.edu/condor work for me and might make
> other sites work too. Please send feedback to the list...
>
> http://www.ci.uchicago.edu/~benc/tmp/condor-pwd-bug
>

--
Mats Rynge
Renaissance Computing Institute

From benc at hawaga.org.uk Fri Dec 26 18:45:40 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sat, 27 Dec 2008 00:45:40 +0000 (GMT)
Subject: [Swift-devel] Re: shared filesystem or work directory ?
In-Reply-To: <49553A59.3070400@renci.org>
References: <49501689.4010500@uchicago.edu> <4951108E.1070902@uchicago.edu> <49553A59.3070400@renci.org>
Message-ID:

On Fri, 26 Dec 2008, Mats Rynge wrote:

> Is the problem that the (directory=) RSL is being ignored?

approximately yes. The swift code specifies a directory to the cog
provider layer, which goes from there through GRAM and condor and comes
out the far side not right. From what I've seen, it could be anywhere in
there.

From Zhengxiong's private comments, it sounds like it's an interaction
between the way condor is configured on those sites and GRAM - with
condor being configured to always start jobs in a condor-specific working
directory. He might write up more details.

--

From zhaozhang at uchicago.edu Mon Dec 29 13:05:04 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Mon, 29 Dec 2008 13:05:04 -0600
Subject: [Swift-devel] Q of traversing all files in one directory
Message-ID: <49591F60.1070500@uchicago.edu>

Hi, All

Sorry to send out a Q in Holiday time.
My question is about Chapter 3.7 of swift tutorial, the foreach
statement and regexp_mapper.

In the following code example, all input files are listed in the string.

string inputNames = "one.txt two.txt three.txt";

messagefile inputfiles[] <fixed_array_mapper;files=inputNames>;

foreach f in inputfiles {
  countfile c <regexp_mapper;
               source=@f,
               match="(.*)txt",
               transform="\\1count">;
  c = countwords(f);
}

How can I say that

foreach file in a directory {
  outfile = app(file)
}

In swift ?
Those file names are fasta00, fasta01, fasta02, ..., fasta100,
fasta101,..., hundreds of them.

Thanks. Happy New Year

zhao

From benc at hawaga.org.uk Mon Dec 29 13:10:07 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 29 Dec 2008 19:10:07 +0000 (GMT)
Subject: [Swift-devel] Q of traversing all files in one directory
In-Reply-To: <49591F60.1070500@uchicago.edu>
References: <49591F60.1070500@uchicago.edu>
Message-ID:

On Mon, 29 Dec 2008, Zhao Zhang wrote:

> foreach file in a directory {
>   outfile = app(file)
> }

You want to explicitly list the files? Yes/no?

And then generate the filenames for 'outfile' automatically?

--

From zhaozhang at uchicago.edu Mon Dec 29 13:11:51 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Mon, 29 Dec 2008 13:11:51 -0600
Subject: [Swift-devel] Q of traversing all files in one directory
In-Reply-To:
References: <49591F60.1070500@uchicago.edu>
Message-ID: <495920F7.5010109@uchicago.edu>

Hi,

Ben Clifford wrote:
> On Mon, 29 Dec 2008, Zhao Zhang wrote:
>
>> foreach file in a directory {
>>   outfile = app(file)
>> }
>
> You want to explicitly list the files? Yes/no?
>
No, I don't want to explicitly list the files.
> And then generate the filenames for 'outfile' automatically?
>
yes, I think I know how to do this.

zhao

From hategan at mcs.anl.gov Mon Dec 29 13:15:51 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 29 Dec 2008 13:15:51 -0600
Subject: [Swift-devel] Q of traversing all files in one directory
In-Reply-To: <49591F60.1070500@uchicago.edu>
References: <49591F60.1070500@uchicago.edu>
Message-ID: <1230578151.11926.4.camel@localhost>

On Mon, 2008-12-29 at 13:05 -0600, Zhao Zhang wrote:
> Hi, All
>
> Sorry to send out a Q in Holiday time.
> My question is about Chapter 3.7 of swift tutorial, the foreach
> statement and regexp_mapper.
>
> In the following code example, all input files are listed in the string.
>
> string inputNames = "one.txt two.txt three.txt";
>
> messagefile inputfiles[] <fixed_array_mapper;files=inputNames>;
>
> foreach f in inputfiles {
>   countfile c <regexp_mapper;
>                source=@f,
>                match="(.*)txt",
>                transform="\\1count">;
>   c = countwords(f);
> }
>
>
> How can I say that
>
> foreach file in a directory {
>   outfile = app(file)
> }
>
> In swift ?

You use the filesystem mapper instead.

messagefile inputfiles[] ;

(refer to the docs for the full set of parameters for that mapper).

From benc at hawaga.org.uk Mon Dec 29 13:16:33 2008
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 29 Dec 2008 19:16:33 +0000 (GMT)
Subject: [Swift-devel] Q of traversing all files in one directory
In-Reply-To: <495920F7.5010109@uchicago.edu>
References: <49591F60.1070500@uchicago.edu> <495920F7.5010109@uchicago.edu>
Message-ID:

On Mon, 29 Dec 2008, Zhao Zhang wrote:

> > You want to explicitly list the files? Yes/no?
> >
> No, I don't want to explicitly list the files.

You can map some files in a directory, selected by a wildcard pattern, to
an array using filesys_mapper
http://www.ci.uchicago.edu/swift/guides/userguide.php#mapper.filesys_mapper
or the simple_mapper
http://www.ci.uchicago.edu/swift/guides/userguide.php#mapper.simple_mapper

> > And then generate the filenames for 'outfile' automatically?
> >
> yes, I think I know how to do this.
>
> zhao
>
>

From zhaozhang at uchicago.edu Mon Dec 29 14:17:43 2008
From: zhaozhang at uchicago.edu (Zhao Zhang)
Date: Mon, 29 Dec 2008 14:17:43 -0600
Subject: [Swift-devel] Q of traversing all files in one directory
In-Reply-To: <1230578151.11926.4.camel@localhost>
References: <49591F60.1070500@uchicago.edu> <1230578151.11926.4.camel@localhost>
Message-ID: <49593067.50308@uchicago.edu>

Thank you guys. I got it.

zhao

Mihael Hategan wrote:
> On Mon, 2008-12-29 at 13:05 -0600, Zhao Zhang wrote:
>
>> Hi, All
>>
>> Sorry to send out a Q in Holiday time.
>> My question is about Chapter 3.7 of swift tutorial, the foreach
>> statement and regexp_mapper.
>>
>> In the following code example, all input files are listed in the string.
>>
>> string inputNames = "one.txt two.txt three.txt";
>>
>> messagefile inputfiles[] <fixed_array_mapper;files=inputNames>;
>>
>> foreach f in inputfiles {
>>   countfile c <regexp_mapper;
>>                source=@f,
>>                match="(.*)txt",
>>                transform="\\1count">;
>>   c = countwords(f);
>> }
>>
>>
>> How can I say that
>>
>> foreach file in a directory {
>>   outfile = app(file)
>> }
>>
>> In swift ?
>>
>
> You use the filesystem mapper instead.
>
> messagefile inputfiles[] ;
>
> (refer to the docs for the full set of parameters for that mapper).
>
>
>

From hategan at mcs.anl.gov Mon Dec 29 18:25:21 2008
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 29 Dec 2008 18:25:21 -0600
Subject: [Swift-devel] Re: question about Swift scheduling and execution
In-Reply-To: <4959612D.8060307@uchicago.edu>
References: <4959612D.8060307@uchicago.edu>
Message-ID: <1230596721.17133.17.camel@localhost>

Such questions should go to swift-devel.

On Mon, 2008-12-29 at 17:45 -0600, Zhengxiong Hou wrote:
> Hi Mihael,
> Could you please spend some time to help answer the following
> questions about the Swift scheduling and execution mechanism?
>
> (1) How to set the policy, POLICY_WEIGHTED_RANDOM or
> POLICY_BEST_SCORE ?
> ------in WeightedHostScoreScheduler.java-------------
>     if (POLICY.equals(name)) {
>         if (value instanceof String) {
>             value = ((String) value).toLowerCase();
>         }
>         if ("random".equals(value)) {
>             policy = POLICY_WEIGHTED_RANDOM;
>         }
>         else if ("best".equals("value")) {
>             policy = POLICY_BEST_SCORE;
>         }
>         else {
>             throw new KarajanRuntimeException("Unknown policy type: " + value);
>         }
>     }

You can set that in libexec/scheduler.xml, by adding in /.

>
> (2) How to decide whether a site is Overloaded or NOT? Where is the
> load information source?
> ------in OverloadedHostMonitor.java-------------
>     WeightedHost wh = (WeightedHost) i.next();
>     if (wh.isOverloaded() == 0) {

Take a look at the WeightedHost.isOverloaded() method.
The logic is basically overloaded = (load <= maxLoad), where maxLoad =
jobThrottle * tscore + 1.

>         whss.removeOverloaded(wh);
>         i.remove();
>     }
>
> (3) What are the specific constraints?
> ------in WeightedHostScoreScheduler.java-------------
>     protected WeightedHostSet constrain(WeightedHostSet s,
>             ResourceConstraintChecker rcc,
>             TaskConstraints tc) {
>         if (rcc == null) {
>             return s;
>         }

The only one I know being used is the executable (so that swift will
schedule only on sites that have it specified in tc.data), but it is
meant to allow specification of architecture, OS, etc.

> ------in TaskConstraints.java-------------
>     private synchronized Map getMap() {
>         if (map == null) {
>             map = new HashMap();
>         }
>         return map;
>     }
>
>     public void addConstraint(String name, Object value) {
>         getMap().put(name, value);
>     }
>
>     public Object getConstraint(String name) {
>         return getMap().get(name);
>     }
>
> (4) In the standard output information, i.e.
>     Firstly: Sorted: [FLTECH:0.000(1.000):0/1 overload: 0, AGLT2:0.000(1.000):1/1 overload: 0]
>     Finally: Sorted: [AGLT2:229.804(93.821):19/19 overload: 0]
>
> I think the score is the (1.000) and (93.821). This score
> should be less than scoreHighCap (= 100).
> So, what's the meaning of 0.000 and 229.804?

Raw score. score = e^(B*arctan(C*rawScore)), where B and C are
constants. While not exactly the same thing, that function is similar in
principle and purpose to this:
http://en.wikipedia.org/wiki/Gompertz_curve

> And what's the exact meaning of 0/1, 1/1, 19/19 ? (I guess
> that it means the running jobs: 0 out of 1 scheduled job is
> running; 1 out of 1 scheduled job is running; 19 out of 19 scheduled jobs
> are running. Is that right?)

No. The first number shows the number of currently running jobs on that
site, and the second one shows the maximum number of concurrent jobs that
will be allowed on that site (after which it becomes overloaded).
This is all fairly clear if you look at WeightedHost.toString().
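
The score and load figures discussed above can be sketched in a few lines of Python. This is only an illustration, not the WeightedHost code itself. The thread says B and C are constants without giving their values; the values below are assumptions, picked because they keep the score under the scoreHighCap of 100 mentioned in the question and reproduce the sample line AGLT2:229.804(93.821):19/19. The jobThrottle value of 0.2 is likewise an assumption chosen to match that line. The sketch also assumes that load <= maxLoad is the condition under which a host can still take jobs (isOverloaded() == 0 in the quoted code). All function names are hypothetical.

```python
import math

# Assumed constants (not given in the thread): B makes the score
# approach 100 as the raw score grows; C is fitted to the sample output.
B = 2.0 * math.log(100.0) / math.pi
C = 0.2


def score(raw_score, b=B, c=C):
    """score = e^(B * arctan(C * rawScore)), as stated in the thread."""
    return math.exp(b * math.atan(c * raw_score))


def max_load(job_throttle, tscore):
    """maxLoad = jobThrottle * tscore + 1, as stated in the thread."""
    return job_throttle * tscore + 1


def max_jobs(job_throttle, tscore):
    """Whole number of concurrent jobs implied by maxLoad (assumed truncation)."""
    return int(max_load(job_throttle, tscore))


def can_accept_job(load, job_throttle, tscore):
    """Assumption: a host can take more work while load <= maxLoad."""
    return load <= max_load(job_throttle, tscore)


def describe(name, raw_score, load, job_throttle=0.2):
    """Format name:raw(score):running/max like the 'Sorted:' output lines."""
    tscore = score(raw_score)
    limit = max_jobs(job_throttle, tscore)
    return f"{name}:{raw_score:.3f}({tscore:.3f}):{load}/{limit}"


if __name__ == "__main__":
    print(describe("FLTECH", 0.0, 0))
    print(describe("AGLT2", 229.804, 19))
```

Read this way, a raw score of 0 maps to a score of 1 and allows a single job, while a site that keeps succeeding sees its raw score grow without bound but its score, and therefore its job limit, level off.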