[Swift-commit] r5961 - SwiftApps/ParVis/HiRAMTools

wilde at ci.uchicago.edu wilde at ci.uchicago.edu
Wed Oct 10 09:43:26 CDT 2012


Author: wilde
Date: 2012-10-10 09:43:26 -0500 (Wed, 10 Oct 2012)
New Revision: 5961

Added:
   SwiftApps/ParVis/HiRAMTools/STATUS
Modified:
   SwiftApps/ParVis/HiRAMTools/README
Log:
Updates from work in July 2012.

Modified: SwiftApps/ParVis/HiRAMTools/README
===================================================================
--- SwiftApps/ParVis/HiRAMTools/README	2012-10-10 14:42:25 UTC (rev 5960)
+++ SwiftApps/ParVis/HiRAMTools/README	2012-10-10 14:43:26 UTC (rev 5961)
@@ -13,17 +13,21 @@
 
   runall.sh   # called by the user, runs the scripts below
 
-    makeyearly_realization.sh
-      makeyearly.swift
+     combine_realization.sh
+        combine.swift
+        combine.sh (app, specified in tc)
+
+     makeyearly_realization.sh
+        makeyearly.swift
         makeyearly-cdo.sh (app, specified in tc)
   
-    combine_realization.sh
-      combine.swift
-        combine.chk.sh (app, specified in tc)
+  runpfrepps.sh # Called by user, runs the scripts below
+    pfrepps.swift
+      genpfrepps.sh
+      runscript.sh
 
 
-To process a set of realizations (for example, to combine them), perform these
-steps:
+To combine or annualize a set of realizations, perform these steps:
 
 1. Add Swift and a recent Sun Java to your path. For now we will use a Swift
    and Java version maintained by Swift team member Jon Monette:
@@ -57,28 +61,83 @@
    script=makeyearly_realization.sh                          # Script to run for each realization
 
 
-5. Place the list of realizations to process in a file named "real.todo" and
-   create and empty file "real.done". For example, to process 2 realizations,
-   Climo_001 and Climo_023):
+5. Place the list of *full pathnames* of the realizations to process in a file
+   named "real.todo" and create an empty file "real.done". For example, to
+   process 3 realizations (en3rc16Ic1, ...), do:
 
    cat >real.todo
-   Climo_001
-   Climo_023
+   /intrepid-fs0/users/lzamboni/persistent/en3rc16Ic1
+   /intrepid-fs0/users/lzamboni/persistent/en3rth12Ic2
+   /intrepid-fs0/users/lzamboni/persistent/en3rth8Ic2
    ^D
 
-   >real.done   # create an empty real.done file
+   >real.done   # IMPORTANT! create an empty real.done file
 
 6. run:
 
-   ./runall.combine_realizations.sh >& runall.out
+   $HOME/HiRAMTools/runall.sh >& runall.out
 
-Then look for the most recently created run directory, and do:
+Then cd to the most recently created run directory, and do:
+ 
+   cd run042 # for example
+   tail -f swift.out
 
+===
+
+To run pfrep on a set of realizations, do:
+
+1. Create a list of realizations in a file, starting with a header line, in
+   the following format:
+
+--- file pflist --- (next line is first line of file):
+path id
+/full/path/of/realization/history/dir realizationID
+etc
+---
+
+For example, file "pflist" contains:
+
+--- Next line is first line of file. "---" is not in file:
+path id
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/many-en1/run001/run001/ en1eo14Ic2
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/many-en1/run002/run003/ en1eo14Ic3
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/many-en1/run003/run004/ en1eo14Ic4
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/many-en1/run004/run005/ en1eo16Ic1
+---
+
+2. Run the script:
+
+   runpfrepps.sh pflistFile outputDir
+
+   $HOME/HiRAMTools/runpfrepps.sh pflist /intrepid-fs0/users/wilde/persistent/LZ/pfrepps.2012.0630
+
+3. Watch the status
+
+   cd run012
    tail -f swift.out
 
+4. Manual, local execution of pfrepp atmos average scripts (generated from step 3, above)
 
+cd /intrepid-fs0/users/wilde/persistent/LZ/pfrepps
+
+# start scripts here ???
+
+# show script progress:
+
+for s in $(cat set02); do echo "$s: "; tail -15 $s.out; done
+
+
+
 OPEN ISSUES:
 
+- error handling in the leaf scripts is highly suspect: are errors correctly
+  getting caught???
+
 - We see straggler scripts on Eureka nodes. Seems to be doing very slow IO on
-shared disk for unexplained reasons. Ticket is open with ALCF support, being
-investigated by Andrew Cherry.
+  shared disk for unexplained reasons. Ticket is open with ALCF support, being
+  investigated by Andrew Cherry.
+
+- If /scratch is not available, we switch to running entirely on fs0
+
+- Naming conventions for outdir now used in combine need to be added to
+  makeyearly
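
The pflist format described in the README diff above (a "path id" header line followed by one "directory realizationID" pair per line) could be generated with something like the following. This is a minimal sketch and not part of the commit: the function name make_pflist and its argument convention are hypothetical, and it assumes the caller already knows which realization ID belongs to each directory.

```shell
#!/bin/bash
# Hypothetical helper (not in HiRAMTools): write a pflist file.
# Usage: make_pflist OUTFILE PATH1 ID1 [PATH2 ID2 ...]
make_pflist() {
  local out=$1
  shift
  # Arguments after OUTFILE must come in path/id pairs.
  if [ $(( $# % 2 )) -ne 0 ]; then
    echo "make_pflist: expected path/id pairs" >&2
    return 1
  fi
  echo "path id" > "$out"        # header line, as shown in the README
  while [ $# -gt 0 ]; do
    echo "$1 $2" >> "$out"       # one realization per line
    shift 2
  done
}
```

The resulting file would then be passed as the first argument to runpfrepps.sh, as in step 2 of the README.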

Added: SwiftApps/ParVis/HiRAMTools/STATUS
===================================================================
--- SwiftApps/ParVis/HiRAMTools/STATUS	                        (rev 0)
+++ SwiftApps/ParVis/HiRAMTools/STATUS	2012-10-10 14:43:26 UTC (rev 5961)
@@ -0,0 +1,273 @@
+From LZ: here are the versions we know work:
+
+  nco-4.0.9 for yearly reorganization
+
+  nco-3.9.2 for pfrepp and for combine (combine might not use nco at all, but
+  this is our solution today)
+
+  when we know makeyearly, combine and pfrepp work, we may want to test whether
+  the following version works for all cases (as ALCF Support suggests)
+
+  3.9.9-udunits2
+
+----
+
+In: /intrepid-fs0/users/lzamboni/persistent/yearly-nco/many-en1
+
+run001/run001/ run001 en1eo14Ic2
+run002/run003/ run002 en1eo14Ic3
+run003/run004/ run003 en1eo14Ic4
+run004/run005/ run004 en1eo16Ic1
+run005/run006/ run005 en1eo16Ic2
+run006/run007/ run006 en1eo16Ic3
+run007/run008/ run007 en1eo16Ic4
+run008/run009/ run008 en1eo8Ic1
+run009/run010/ run009 en1eo8Ic2
+run010/run011/ run010 en1eo8Ic3
+run011/run012/ run011 en1eo8Ic4
+
+run001/run001/ en1eo14Ic2
+run002/run003/ en1eo14Ic3
+run003/run004/ en1eo14Ic4
+run004/run005/ en1eo16Ic1
+run005/run006/ en1eo16Ic2
+run006/run007/ en1eo16Ic3
+run007/run008/ en1eo16Ic4
+run008/run009/ en1eo8Ic1
+run009/run010/ en1eo8Ic2
+run010/run011/ en1eo8Ic3
+run011/run012/ en1eo8Ic4
+
+run001/run001/
+run002/run003/
+run003/run004/
+run004/run005/
+run005/run006/
+run006/run007/
+run007/run008/
+run008/run009/
+run009/run010/
+run010/run011/
+run011/run012/
+
+740 files per combined dir
+740 * 2.5 = 1850 files per annualized dir
+
+Short yearlies:
+
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/many-en1/run001/run001/ 1540
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/many-en1/run002/run003/ 1405
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/many-en1/run003/run004/ 1844
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/many-en1/run011/run012/ 1151
+
+cdir=/intrepid-fs0/users/wilde/persistent/LZ/combines.toann.2012.0706
+mkdir -p $cdir
+ln -s /intrepid-fs0/users/lzamboni/persistent/combined/many-en1/run001 $cdir/en1eo14Ic2
+ln -s /intrepid-fs0/users/lzamboni/persistent/combined/many-en1/run002 $cdir/en1eo14Ic3
+ln -s /intrepid-fs0/users/lzamboni/persistent/combined/many-en1/run003 $cdir/en1eo14Ic4
+ln -s /intrepid-fs0/users/lzamboni/persistent/combined/many-en1/run011 $cdir/en1eo8Ic4
+
+
+eur$ for d in $cdir/*; do (echo -n $d: ; find $d/19????01 -type f | wc -l ); done
+/intrepid-fs0/users/wilde/persistent/LZ/combines.toann.2012.0706/en1eo14Ic2:740
+/intrepid-fs0/users/wilde/persistent/LZ/combines.toann.2012.0706/en1eo14Ic3:740
+/intrepid-fs0/users/wilde/persistent/LZ/combines.toann.2012.0706/en1eo14Ic4:740
+/intrepid-fs0/users/wilde/persistent/LZ/combines.toann.2012.0706/en1eo8Ic4:740
+eur$ 
+
+
+/intrepid-fs0/users/wilde/persistent/LZ/combines.toann.2012.0706/en1eo14Ic2
+/intrepid-fs0/users/wilde/persistent/LZ/combines.toann.2012.0706/en1eo14Ic3
+/intrepid-fs0/users/wilde/persistent/LZ/combines.toann.2012.0706/en1eo14Ic4
+/intrepid-fs0/users/wilde/persistent/LZ/combines.toann.2012.0706/en1eo8Ic4
+
+
+==== Clean Tracker
+
+en1 realization IDs:
+
+en1eo12Ic1
+en1eo12Ic2
+en1eo12Ic3
+en1eo12Ic4
+en1eo14Ic1
+en1eo14Ic2
+en1eo14Ic3
+en1eo14Ic4
+en1eo16Ic1
+en1eo16Ic2
+en1eo16Ic3
+en1eo16Ic4
+en1eo8Ic1
+en1eo8Ic2
+en1eo8Ic3
+en1eo8Ic4
+
+en1 HiRAM output:
+
+/intrepid-fs0/users/lzamboni/persistent/en1eo12Ic1
+/intrepid-fs0/users/lzamboni/persistent/en1eo12Ic2
+/intrepid-fs0/users/lzamboni/persistent/en1eo12Ic3
+/intrepid-fs0/users/lzamboni/persistent/en1eo12Ic4
+/intrepid-fs0/users/lzamboni/persistent/en1eo14Ic1
+/intrepid-fs0/users/lzamboni/persistent/en1eo14Ic2
+/intrepid-fs0/users/lzamboni/persistent/en1eo14Ic3
+/intrepid-fs0/users/lzamboni/persistent/en1eo14Ic4
+/intrepid-fs0/users/lzamboni/persistent/en1eo16Ic1
+/intrepid-fs0/users/lzamboni/persistent/en1eo16Ic2
+/intrepid-fs0/users/lzamboni/persistent/en1eo16Ic3
+/intrepid-fs0/users/lzamboni/persistent/en1eo16Ic4
+/intrepid-fs0/users/lzamboni/persistent/en1eo8Ic1
+/intrepid-fs0/users/lzamboni/persistent/en1eo8Ic2
+/intrepid-fs0/users/lzamboni/persistent/en1eo8Ic3
+/intrepid-fs0/users/lzamboni/persistent/en1eo8Ic4
+
+Combined output:
+
+/intrepid-fs0/users/lzamboni/persistent/combined/en1eo12Ic1/run004    en1eo12Ic1
+/intrepid-fs0/users/lzamboni/persistent/combined/en1eo12Ic2/run001    en1eo12Ic2
+/intrepid-fs0/users/lzamboni/persistent/combined/en1eo12Ic3/run001    en1eo12Ic3
+/intrepid-fs0/users/lzamboni/persistent/combined/run001               en1eo12Ic4 uncertain realID
+/intrepid-fs0/users/lzamboni/persistent/combined/run002               en1eo14Ic1 uncertain realID
+/intrepid-fs0/users/lzamboni/persistent/combined/many-en1/run001      en1eo14Ic2 BAD
+/intrepid-fs0/users/wilde/persistent/LZ/combined.2012.0707/en1eo14Ic2 en1eo14Ic2
+/intrepid-fs0/users/lzamboni/persistent/combined/many-en1/run002      en1eo14Ic3 BAD
+/intrepid-fs0/users/wilde/persistent/LZ/combined.2012.0707/en1eo14Ic3 en1eo14Ic3
+/intrepid-fs0/users/lzamboni/persistent/combined/many-en1/run003      en1eo14Ic4 BAD
+/intrepid-fs0/users/wilde/persistent/LZ/combined.2012.0707/en1eo14Ic4 en1eo14Ic4
+/intrepid-fs0/users/lzamboni/persistent/combined/many-en1/run004      en1eo16Ic1
+/intrepid-fs0/users/lzamboni/persistent/combined/many-en1/run005      en1eo16Ic2
+/intrepid-fs0/users/lzamboni/persistent/combined/many-en1/run006      en1eo16Ic3
+/intrepid-fs0/users/lzamboni/persistent/combined/many-en1/run007      en1eo16Ic4
+/intrepid-fs0/users/lzamboni/persistent/combined/many-en1/run008      en1eo8Ic1
+/intrepid-fs0/users/lzamboni/persistent/combined/many-en1/run009      en1eo8Ic2
+/intrepid-fs0/users/lzamboni/persistent/combined/many-en1/run010      en1eo8Ic3
+/intrepid-fs0/users/lzamboni/persistent/combined/many-en1/run011      en1eo8Ic4  BAD
+/intrepid-fs0/users/wilde/persistent/LZ/combined.2012.0707/en1eo8Ic4  en1eo8Ic4
+
+Yearly output:
+
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/combined/en1eo12Ic1/run001 en1eo12Ic1 MISSING
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/combined/en1eo12Ic2/run002 en1eo12Ic2 MISSING
+COMPLETED - ignore                                                            en1eo12Ic3 IGNORE
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/combined/run001/run003     en1eo12Ic4 SHORT by half
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/combined/run002/run004     en1eo14Ic1
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/many-en1/run001/run001/        en1eo14Ic2 BAD
+/intrepid-fs0/users/wilde/persistent/LZ/yearly.2012.0708/en1eo14Ic2/run005/run001 en1eo14Ic2
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/many-en1/run002/run003/        en1eo14Ic3 BAD
+/intrepid-fs0/users/wilde/persistent/LZ/yearly.2012.0708/en1eo14Ic3/run006/run002 en1eo14Ic3
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/many-en1/run003/run004/        en1eo14Ic4 BAD
+/intrepid-fs0/users/wilde/persistent/LZ/yearly.2012.0708/en1eo14Ic4/run007/run004 en1eo14Ic4
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/many-en1/run004/run005/        en1eo16Ic1
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/many-en1/run005/run006/        en1eo16Ic2
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/many-en1/run006/run007/        en1eo16Ic3
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/many-en1/run007/run008/        en1eo16Ic4
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/many-en1/run008/run009/        en1eo8Ic1
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/many-en1/run009/run010/        en1eo8Ic2
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/many-en1/run010/run011/        en1eo8Ic3
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/many-en1/run011/run012/        en1eo8Ic4 BAD
+/intrepid-fs0/users/wilde/persistent/LZ/yearly.2012.0708/en1eo8Ic4/run008/run004  en1eo8Ic4 running
+
+Atmos average output from pfrepp:
+
+                                                           en1eo12Ic1
+                                                           en1eo12Ic2
+COMPLETED - IGNORE --------------------------------------  en1eo12Ic3
+                                                           en1eo12Ic4
+/intrepid-fs0/users/wilde/persistent/LZ/pfrepps/en1eo14Ic1 en1eo14Ic1
+/intrepid-fs0/users/wilde/persistent/LZ/pfrepps/en1eo14Ic2 en1eo14Ic2
+/intrepid-fs0/users/wilde/persistent/LZ/pfrepps/en1eo14Ic3 en1eo14Ic3
+                                                           en1eo14Ic4
+/intrepid-fs0/users/wilde/persistent/LZ/pfrepps/en1eo16Ic1 en1eo16Ic1
+/intrepid-fs0/users/wilde/persistent/LZ/pfrepps/en1eo16Ic2 en1eo16Ic2
+/intrepid-fs0/users/wilde/persistent/LZ/pfrepps/en1eo16Ic3 en1eo16Ic3
+/intrepid-fs0/users/wilde/persistent/LZ/pfrepps/en1eo16Ic4 en1eo16Ic4
+/intrepid-fs0/users/wilde/persistent/LZ/pfrepps/en1eo8Ic1  en1eo8Ic1
+/intrepid-fs0/users/wilde/persistent/LZ/pfrepps/en1eo8Ic2  en1eo8Ic2
+/intrepid-fs0/users/wilde/persistent/LZ/pfrepps/en1eo8Ic3  en1eo8Ic3
+                                                           en1eo8Ic4
+
+
+
+
+
+en1eo8Ic1 - not sure why this was omitted from set02???
+
+
+en1eo16Ic1 - completed atmos avg as part of 3/8 steps that completed in a full-run attempt
+en1eo12Ic3 - Laura is running analysis on this realization, so that one must have been pfrepp'ed by her
+
+Set03 to do: 
+
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/combined/run002/run004     en1eo14Ic1
+/intrepid-fs0/users/wilde/persistent/LZ/yearly.2012.0708/en1eo14Ic2/run005/run001 en1eo14Ic2
+/intrepid-fs0/users/wilde/persistent/LZ/yearly.2012.0708/en1eo14Ic3/run006/run002 en1eo14Ic3
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/many-en1/run008/run009/        en1eo8Ic1
+
+set04:
+
+/intrepid-fs0/users/wilde/persistent/LZ/yearly.2012.0708/en1eo14Ic4/run007/run004 en1eo14Ic4
+/intrepid-fs0/users/wilde/persistent/LZ/yearly.2012.0708/en1eo8Ic4/run008/run004  en1eo8Ic4
+
+set05 (tentative):
+
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/combined/en1eo12Ic1/run001 en1eo12Ic1 MISSING
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/combined/en1eo12Ic2/run002 en1eo12Ic2 MISSING
+/intrepid-fs0/users/lzamboni/persistent/yearly-nco/combined/run001/run003     en1eo12Ic4 SHORT by half - Rerun?
+
+
+============= TO FIX ==============
+
+[x] first of all, if you need to run, make sure you're using the following version
+of NCO (which is a package that handles netCDF files)
+
+nco-4.0.9 for yearly reorganization
+
+nco-3.9.2 for pfrepp and for combine (combine might not use nco at all, but
+this is our solution today)
+
+when we know makeyearly, combine and pfrepp work, we may want to test whether
+the following version works for all cases (as ALCF Support suggests)
+
+3.9.9-udunits2
+
+---
+
+You can refer to my working dir /home/lzamboni/SWIFT/output/en1eo12Ic3 to
+track these changes (don't look at either en1eo12Ic1 or en1eo12Ic2):
+
+[no] 1) I renamed combine_realization.sh as combine_realizations.sh (note the "s"
+before .sh) since this is what the script expects.
+
+[x] fixed in README: 2) what is referred to as runall.combine_realizations.sh in the README (see in
+6.) is actually runall.sh so I changed that.
+
+3) On real.todo I added the full path of the input files and not only the name
+  of the realization. (what in the README is Climo_001 needs to be
+  /intrepid-fs0/users/lzamboni/persistent/Climo_001/history instead; what
+  changes between different realizations is only the dirname between
+  persistent and history)
+
+[x] used combine.chk.sh version, renamed it to combine.sh: 4) what is referred to as combine.sh is actually combine.chk.sh
+
+[x] .chk version used $USER: 5) on this combine.chk.sh I changed "wilde" to "lzamboni"
+ 
+6) this is the part that needs adjustments: in runall.config I changed outdir
+(which is where the output goes)
+
+if you look at the dir I pointed you to, you'll see that in that case
+outdir=/intrepid-fs0/users/lzamboni/persistent/combined/en1eo12Ic3 which
+creates "en1eo12Ic3" under /intrepid-fs0/users/lzamboni/persistent/combined/
+
+as is, I cannot process 2 realizations (say en1eo12Ic4 and en1eo14Ic1) and
+have the output in separate directories.
+
+At the moment I am running runall.sh to process 2 realizations (en1eo12Ic4 and
+en1eo14Ic1). I set outdir=/intrepid-fs0/users/lzamboni/persistent/combined/ in
+runall.config
+
+Even if it completes without problems, I would like to be able to specify just
+the name of the realization (real_name) and obtain the output in
+/intrepid-fs0/users/lzamboni/persistent/combined/real_name.  The input data
+are always in /intrepid-fs0/users/lzamboni/persistent/real_name/history.
+
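
The request that closes the STATUS notes (give only a realization name, read input from real_name/history, and write output to combined/real_name) could be sketched as follows. This is not part of the commit and not how runall.sh currently works: the function name realization_paths and the variable base are hypothetical, and the directory layout is taken only from the paths quoted in the notes.

```shell
#!/bin/bash
# Hypothetical helper (not in HiRAMTools): derive per-realization
# input and output directories from the realization name alone.
# "base" is an assumed variable; the default is the path quoted in STATUS.
base=${base:-/intrepid-fs0/users/lzamboni/persistent}

realization_paths() {
  local real_name=$1
  if [ -z "$real_name" ]; then
    echo "usage: realization_paths real_name" >&2
    return 1
  fi
  echo "indir=$base/$real_name/history"      # input data location
  echo "outdir=$base/combined/$real_name"    # separate output dir per realization
}
```

Calling realization_paths once per name (for example en1eo12Ic4 and en1eo14Ic1) gives each realization its own outdir, which is the behavior the notes say is not possible with a single outdir setting in runall.config.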



