[Swift-devel] wrong file staged in
Mihael Hategan
hategan at mcs.anl.gov
Fri Jul 6 16:49:39 CDT 2007
On Fri, 2007-07-06 at 16:44 -0500, Veronika Nefedova wrote:
> I put the dtm file on terminable in ~nefedova/MolDyn.dtm
>
> I see a few more directories with wrong files staged in, but I didn't
> check them all (130+ of them). I saw at least one with the correct
> files staged in.
Across different runs, that is. Do you get the exact same mess-up each
time, or is it different?
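
With 130+ job directories, grepping for one variable at a time only goes so far. Below is a minimal sketch, in Python, of scanning a whole swift source for conflicting mappings. It assumes declarations have the simple form `file NAME <"filename">;` seen in this thread; the function name `find_conflicts` and the regular expression are illustrative and not part of Swift.

```python
import re
from collections import defaultdict

# Matches simple mapped declarations such as:
#   file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">;
# Assumption: one variable, one quoted file name, no other mapper syntax.
DECL = re.compile(r'\bfile\s+(\w+)\s*<\s*"([^"]+)"\s*>\s*;')

def find_conflicts(source):
    """Return (variables mapped to more than one file,
               files mapped by more than one variable)."""
    files_by_var = defaultdict(set)
    vars_by_file = defaultdict(set)
    for var, path in DECL.findall(source):
        files_by_var[var].add(path)
        vars_by_file[path].add(var)
    dup_vars = {v: sorted(f) for v, f in files_by_var.items() if len(f) > 1}
    dup_files = {f: sorted(v) for f, v in vars_by_file.items() if len(v) > 1}
    return dup_vars, dup_files
```

Calling `find_conflicts(open("MolDyn.dtm").read())` would return two dictionaries; if both are empty, no variable is mapped to two different files and no file is mapped by two variables, at least among declarations of this simple form.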
>
> Nika
>
> On Jul 6, 2007, at 4:39 PM, Mihael Hategan wrote:
>
> > Consistent or intermittent behavior?
> >
> > Also, can you attach the swift source?
> >
> > On Fri, 2007-07-06 at 16:37 -0500, Veronika Nefedova wrote:
> >> Nope... I checked with grep:
> >>
> >> nefedova at viper:~/alamines> grep solv_repu_0DOT9_1_b1_prt MolDyn.dtm
> >> file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">;
> >> (whamfiles[67] , solv_repu_0DOT9_1_b1_crd, solv_repu_0DOT9_1_b1_out,
> >> solv_repu_0DOT9_1_b1_done) = CHARMM3 (standn, gaff_prm, gaff_rft,
> >> rtf_file, prm_file, psf_file, crd_eq_file, solv_repu_0DOT9_1_b1_prt,
> >> ss1, s1, s2, s3, s4, s5, s7, "urandseed:5964163", sprt, "rcut1:0.9",
> >> "rcut2:1");
> >> nefedova at viper:~/alamines>
> >>
> >> On Jul 6, 2007, at 4:31 PM, Mihael Hategan wrote:
> >>
> >>> Wonder if there is another declaration of the same variable mapped to
> >>> the wrong file.
> >>>
> >>> On Fri, 2007-07-06 at 16:03 -0500, Veronika Nefedova wrote:
> >>>> The wrong file was staged in during the 4th stage of the workflow...
> >>>>
> >>>> I have this inside my foreach loop:
> >>>> <snip>
> >>>> file solv_repu_0DOT9_1_b1_prt <"solv_repu_0.9_1_b1.prt">;
> >>>> file solv_repu_0DOT9_1_b1_crd <"solv_repu_0.9_1_b1.crd">;
> >>>> file solv_repu_0DOT9_1_b1_out <"solv_repu_0.9_1_b1.out">;
> >>>> file solv_repu_0DOT9_1_b1_done <"solv_repu_0.9_1_b1_done">;
> >>>>
> >>>> (whamfiles[67] , solv_repu_0DOT9_1_b1_crd, solv_repu_0DOT9_1_b1_out,
> >>>> solv_repu_0DOT9_1_b1_done) = CHARMM3 (standn, gaff_prm, gaff_rft,
> >>>> rtf_file, prm_file, psf_file, crd_eq_file, solv_repu_0DOT9_1_b1_prt,
> >>>> ss1, s1, s2, s3, s4, s5, s7, "urandseed:5964163", sprt, "rcut1:0.9",
> >>>> "rcut2:1");
> >>>> <snip>
> >>>>
> >>>>
> >>>> The first declared file (prt) is an input file for CHARMM3, and the
> >>>> last three declared files (crd, out and done) are output files.
> >>>>
> >>>> When I check my remote directory during execution, I see that the
> >>>> wrong files were staged in. In particular, the wrong prt file was
> >>>> staged in:
> >>>>
> >>>> solv_disp_a3.prt instead of solv_repu_0.9_1_b1.prt (aka
> >>>> solv_repu_0DOT9_1_b1_prt)
> >>>>
> >>>> The solv_repu_0.9_1_b1.prt file is not produced by a previous stage;
> >>>> it is (or is supposed to be) staged in from the submit host.
> >>>>
> >>>> The above declaration is the only place where solv_repu_0DOT9_1_b1_prt
> >>>> is declared in the swift file (I used grep to check). The kml file
> >>>> also looks ok.
> >>>>
> >>>> I am not sure why this happened -- this piece of code has not been
> >>>> changed from the previous version...
> >>>>
> >>>>
> >>>> This is the work directory for this job (CHARMM3) on TG-UC:
> >>>>
> >>>> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0/chrm_long-p2v28ydi> ls
> >>>> m001_am1.prm           solv.inp          solv_m001_eq.crd         stderr.txt
> >>>> m001_am1.rtf           solv_disp_a3.out  solv_repu_0.9_1_b1.rst
> >>>> parm03_gaff_all.rtf    solv_disp_a3.prt  solv_repu_0.9_1_b1.trj
> >>>> parm03_gaffnb_all.prm  solv_m001.psf     solv_repu_0.9_1_b1.wham
> >>>> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0/chrm_long-p2v28ydi>
> >>>>
> >>>> As you can see, 2 files have the wrong names (solv_disp_a3 instead of
> >>>> solv_repu_0.9_1_b1) and execution is screwed up since the wrong
> >>>> parameter file (prt) was staged in...
> >>>>
> >>>>
> >>>> I checked whether that file was even staged in to the remote host --
> >>>> in fact it was:
> >>>>
> >>>> nefedova at tg-login1:/disks/scratchgpfs1/iraicu/MolDyn-zvlc1f9c03pf0> find */ -name solv_repu_0.9_1_b1.prt -print
> >>>> shared/solv_repu_0.9_1_b1.prt
> >>>> But it never went to the right working directory...
> >>>>
> >>>> Any idea what is going on here?
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Nika
> >>>>