[Swift-devel] wrong file staged in

Mihael Hategan hategan at mcs.anl.gov
Fri Jul 6 16:39:19 CDT 2007


Is the behavior consistent or intermittent?

Also, can you attach the Swift source?
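
A grep for one variable name only shows whether that variable is declared
twice. It would not show whether a different variable is also mapped to the
same file, or whether solv_disp_a3.prt shows up in more than one mapping. A
rough sketch of a broader check follows, in Python. It assumes every mapping
has the simple form file NAME <"path">; seen in the quoted snippet.
find_conflicts is a hypothetical helper name, not part of Swift, and any
other mapper syntax is not handled.

```python
import re
from collections import defaultdict

# Matches declarations of the form:  file NAME <"path">;
# This is the only mapping form visible in the quoted snippet (an assumption);
# declarations written any other way are silently skipped.
DECL = re.compile(r'^\s*file\s+(\w+)\s*<\s*"([^"]+)"\s*>\s*;', re.MULTILINE)


def find_conflicts(source):
    """Return (dup_vars, shared_files) for the given workflow source text.

    dup_vars:     variable name -> list of files, for variables declared
                  more than once
    shared_files: file name -> list of variables, for files mapped by more
                  than one distinct variable
    """
    files_by_var = defaultdict(list)
    vars_by_file = defaultdict(list)
    for var, path in DECL.findall(source):
        files_by_var[var].append(path)
        vars_by_file[path].append(var)

    dup_vars = {v: p for v, p in files_by_var.items() if len(p) > 1}
    shared_files = {f: v for f, v in vars_by_file.items() if len(set(v)) > 1}
    return dup_vars, shared_files
```

Calling find_conflicts(open("MolDyn.dtm").read()) would return the two
dictionaries. If both are empty, that rules out this particular cause and
nothing more.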

On Fri, 2007-07-06 at 16:37 -0500, Veronika Nefedova wrote:
> Nope... I checked with grep:
> 
> nefedova at viper:~/alamines> grep solv_repu_0DOT9_1_b1_prt MolDyn.dtm
> file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">;
> (whamfiles[67] , solv_repu_0DOT9_1_b1_crd, solv_repu_0DOT9_1_b1_out,  
> solv_repu_0DOT9_1_b1_done) = CHARMM3 (standn, gaff_prm, gaff_rft,  
> rtf_file, prm_file, psf_file, crd_eq_file, solv_repu_0DOT9_1_b1_prt,  
> ss1, s1, s2, s3, s4, s5, s7, "urandseed:5964163", sprt, "rcut1:0.9",  
> "rcut2:1");
> nefedova at viper:~/alamines>
> 
> On Jul 6, 2007, at 4:31 PM, Mihael Hategan wrote:
> 
> > Wonder if there is another declaration of the same variable mapped to
> > the wrong file.
> >
> > On Fri, 2007-07-06 at 16:03 -0500, Veronika Nefedova wrote:
> >> The wrong file was staged in during the 4th stage of the workflow...
> >>
> >> I have this inside my foreach loop:
> >> <snip>
> >> file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">;
> >> file solv_repu_0DOT9_1_b1_crd  <"solv_repu_0.9_1_b1.crd">;
> >> file solv_repu_0DOT9_1_b1_out <"solv_repu_0.9_1_b1.out">;
> >> file solv_repu_0DOT9_1_b1_done  <"solv_repu_0.9_1_b1_done">;
> >>
> >> (whamfiles[67] , solv_repu_0DOT9_1_b1_crd, solv_repu_0DOT9_1_b1_out,
> >> solv_repu_0DOT9_1_b1_done) = CHARMM3 (standn, gaff_prm, gaff_rft,
> >> rtf_file, prm_file, psf_file, crd_eq_file, solv_repu_0DOT9_1_b1_prt,
> >> ss1, s1, s2, s3, s4, s5, s7, "urandseed:5964163", sprt, "rcut1:0.9",
> >> "rcut2:1");
> >> <snip>
> >>
> >>
> >> The first declared file (the prt one) is an input file for CHARMM3,
> >> and the last three declared files (crd, out and done) are output files.
> >>
> >> When I check my remote directory during execution, I see that the
> >> wrong files were staged in. In particular, the wrong prt file was
> >> staged in:
> >>
> >> solv_disp_a3.prt instead of solv_repu_0.9_1_b1.prt  (aka
> >> solv_repu_0DOT9_1_b1_prt)
> >>
> >> The solv_repu_0.9_1_b1.prt file is not produced by a previous stage;
> >> it is (or is supposed to be) staged in from the submit host.
> >>
> >> The above declaration is the only place where the file
> >> solv_repu_0DOT9_1_b1_prt is declared in the Swift file (I checked
> >> with grep). The kml file also looks OK.
> >>
> >> I am not sure why it has happened -- this piece of code has not been
> >> changed from the previous version...
> >>
> >>
> >> This is the work directory for this job (CHARMM3) on TG-UC:
> >>
> >> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0/
> >> chrm_long-p2v28ydi> ls
> >> m001_am1.prm           solv.inp          solv_m001_eq.crd
> >> stderr.txt
> >> m001_am1.rtf           solv_disp_a3.out  solv_repu_0.9_1_b1.rst
> >> parm03_gaff_all.rtf    solv_disp_a3.prt  solv_repu_0.9_1_b1.trj
> >> parm03_gaffnb_all.prm  solv_m001.psf     solv_repu_0.9_1_b1.wham
> >> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0/
> >> chrm_long-p2v28ydi>
> >>
> >> As you can see, 2 files have the wrong names (solv_disp_a3 instead of
> >> solv_repu_0.9_1_b1) and execution is screwed up since the wrong
> >> parameter file (prt) was staged in...
> >>
> >>
> >> I checked whether the right file was even staged in to the remote
> >> host -- in fact it was:
> >>
> >> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0>
> >> find */ -name solv_repu_0.9_1_b1.prt -print
> >> shared/solv_repu_0.9_1_b1.prt
> >> But it never went to the right working directory...
> >>
> >> Any idea what is going on here?
> >>
> >> Thanks,
> >>
> >> Nika
> >>
> >
> 



