[Swift-devel] wrong file staged in

Mihael Hategan hategan at mcs.anl.gov
Fri Jul 6 16:49:39 CDT 2007


On Fri, 2007-07-06 at 16:44 -0500, Veronika Nefedova wrote:
> I put the dtm file on terminable in ~nefedova/MolDyn.dtm
> 
> I see a few more directories with wrong files staged in, but I didn't
> check them all (130+ of them). I saw at least one with the correct
> files staged in.

Across different runs, that is. Do you get the exact same mess-up each
time, or is it different?

> 
> Nika
> 
> On Jul 6, 2007, at 4:39 PM, Mihael Hategan wrote:
> 
> > Consistent or intermittent behavior?
> >
> > Also, can you attach the swift source?
> >
> > On Fri, 2007-07-06 at 16:37 -0500, Veronika Nefedova wrote:
> >> Nope... I checked with grep:
> >>
> >> nefedova@viper:~/alamines> grep solv_repu_0DOT9_1_b1_prt MolDyn.dtm
> >> file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">;
> >> (whamfiles[67] , solv_repu_0DOT9_1_b1_crd, solv_repu_0DOT9_1_b1_out,
> >> solv_repu_0DOT9_1_b1_done) = CHARMM3 (standn, gaff_prm, gaff_rft,
> >> rtf_file, prm_file, psf_file, crd_eq_file, solv_repu_0DOT9_1_b1_prt,
> >> ss1, s1, s2, s3, s4, s5, s7, "urandseed:5964163", sprt, "rcut1:0.9",
> >> "rcut2:1");
> >> nefedova@viper:~/alamines>
> >>
> >> On Jul 6, 2007, at 4:31 PM, Mihael Hategan wrote:
> >>
> >>> Wonder if there is another declaration of the same variable mapped to
> >>> the wrong file.
> >>>
> >>> On Fri, 2007-07-06 at 16:03 -0500, Veronika Nefedova wrote:
> >>>> The wrong file was staged in during the 4th stage of the workflow...
> >>>>
> >>>> I have this inside my foreach loop:
> >>>> <snip>
> >>>> file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">;
> >>>> file solv_repu_0DOT9_1_b1_crd  <"solv_repu_0.9_1_b1.crd">;
> >>>> file solv_repu_0DOT9_1_b1_out <"solv_repu_0.9_1_b1.out">;
> >>>> file solv_repu_0DOT9_1_b1_done  <"solv_repu_0.9_1_b1_done">;
> >>>>
> >>>> (whamfiles[67], solv_repu_0DOT9_1_b1_crd, solv_repu_0DOT9_1_b1_out,
> >>>> solv_repu_0DOT9_1_b1_done) = CHARMM3 (standn, gaff_prm, gaff_rft, rtf_file,
> >>>> prm_file, psf_file, crd_eq_file, solv_repu_0DOT9_1_b1_prt, ss1, s1, s2, s3,
> >>>> s4, s5, s7, "urandseed:5964163", sprt, "rcut1:0.9", "rcut2:1");
> >>>> <snip>
> >>>>
> >>>>
> >>>> The first file (with DOT) is an input file for CHARMM3, and the last
> >>>> three declared files (out, crd and done) are output files.
> >>>>
> >>>> When I check my remote directory during execution, I see that the
> >>>> wrong files were staged in. In particular, the wrong prt file was
> >>>> staged in:
> >>>>
> >>>> solv_disp_a3.prt instead of solv_repu_0.9_1_b1.prt  (aka
> >>>> solv_repu_0DOT9_1_b1_prt)
> >>>>
> >>>> The solv_repu_0.9_1_b1.prt file is not produced by a previous stage;
> >>>> it is (or is supposed to be) staged in from the submit host.
> >>>>
> >>>> The above declaration is the only place where
> >>>> solv_repu_0DOT9_1_b1_prt is declared in the swift file (I used grep
> >>>> to check). The kml file also looks ok.
> >>>>
> >>>> I am not sure why this happened -- this piece of code has not been
> >>>> changed from the previous version...
> >>>>
> >>>>
> >>>> This is the work directory for this job (CHARMM3) on TG-UC:
> >>>>
> >>>> nefedova@tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0/chrm_long-p2v28ydi> ls
> >>>> m001_am1.prm           solv.inp          solv_m001_eq.crd         stderr.txt
> >>>> m001_am1.rtf           solv_disp_a3.out  solv_repu_0.9_1_b1.rst
> >>>> parm03_gaff_all.rtf    solv_disp_a3.prt  solv_repu_0.9_1_b1.trj
> >>>> parm03_gaffnb_all.prm  solv_m001.psf     solv_repu_0.9_1_b1.wham
> >>>> nefedova@tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0/chrm_long-p2v28ydi>
> >>>>
> >>>> As you can see, 2 files have the wrong names (solv_disp_a3 instead of
> >>>> solv_repu_0.9_1_b1) and execution is screwed up since the wrong
> >>>> parameter file (prt) was staged in...
> >>>>
> >>>>
> >>>> I checked whether that file was even staged in to the remote host --
> >>>> in fact it was:
> >>>>
> >>>> nefedova@tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0> find */ -name solv_repu_0.9_1_b1.prt -print
> >>>> shared/solv_repu_0.9_1_b1.prt
> >>>>
> >>>> But it never went to the right working directory...
> >>>>
> >>>> Any idea what is going on here?
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Nika
> >>>>
> >>>> _______________________________________________
> >>>> Swift-devel mailing list
> >>>> Swift-devel at ci.uchicago.edu
> >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> >>>>
> >>>
> >>
> >
> 
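
The duplicate-declaration question raised in the thread (is the same
variable mapped to a second, wrong file somewhere else?) can also be
checked mechanically rather than with one grep per variable. The sketch
below is only an illustration, not part of Swift: it assumes every mapped
declaration has the simple one-line form  file NAME <"FILENAME">;  as in
the snippets quoted above. The function name find_conflicts and the
conflicting third line in the sample are hypothetical, invented to show
what a hit would look like.

```python
import re
from collections import defaultdict

# Matches one-line mapped declarations of the form seen in the thread:
#   file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">;
# Assumption: declarations are not split across lines or written differently.
DECL = re.compile(r'\bfile\s+(\w+)\s*<\s*"([^"]+)"\s*>\s*;')


def find_conflicts(source):
    """Return (variables mapped to several files, files mapped by several variables)."""
    by_var = defaultdict(set)
    by_file = defaultdict(set)
    for var, path in DECL.findall(source):
        by_var[var].add(path)
        by_file[path].add(var)
    dup_vars = {v: sorted(p) for v, p in by_var.items() if len(p) > 1}
    dup_files = {p: sorted(v) for p, v in by_file.items() if len(v) > 1}
    return dup_vars, dup_files


# Sample input: the first two lines come from the thread; the third is a
# made-up conflicting declaration used only to demonstrate the check.
sample = '''
file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">;
file solv_repu_0DOT9_1_b1_crd  <"solv_repu_0.9_1_b1.crd">;
file solv_repu_0DOT9_1_b1_prt <"solv_disp_a3.prt">;
'''

dup_vars, dup_files = find_conflicts(sample)
for var, paths in dup_vars.items():
    print(var, "is mapped to:", ", ".join(paths))
for path, names in dup_files.items():
    print(path, "is mapped by:", ", ".join(names))
```

To run it against the real workflow, the sample string would be replaced
with the contents of the .dtm file. An empty result would only rule out
conflicting declarations of this particular form; it says nothing about
how the files are handled at staging time.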



