[Swift-devel] Issues with Montage & Swift-CDM
Emalayan Vairavanathan
svemalayan at yahoo.com
Fri Mar 23 11:16:37 CDT 2012
Hi Justin,
Sure, we can meet; I am free too. Samer, can you also join?
Thank you
Emalayan
________________________________
From: Justin M Wozniak <wozniak at mcs.anl.gov>
To: Emalayan Vairavanathan <svemalayan at yahoo.com>
Cc: Jonathan Monette <jonmon at mcs.anl.gov>; Justin M Wozniak <wozniak at mcs.anl.gov>; matei <matei at ece.ubc.ca>; "swift-devel at ci.uchicago.edu" <swift-devel at ci.uchicago.edu>; MosaStore <mosastore at googlegroups.com>
Sent: Friday, 23 March 2012 6:25 AM
Subject: Re: [Swift-devel] Issues with Montage & Swift-CDM
For MosaSwift purposes, CDM DIRECT is used to place files in a file system (Mosa) accessible to the worker nodes but not the login node. readData() is executed by the Swift Java process on the login node to read data into script variables.
Can we set up a phone call today? I am free at 2pm Central.
Justin
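
A minimal sketch of the DIRECT/DEFAULT distinction described above, written in Python rather than Swift. It assumes rules are evaluated top to bottom with the first match winning, and that a pattern must match the whole file name; neither is confirmed in this thread. The `_concurrent` DEFAULT rule is hypothetical (one possible way to keep files read by readData() visible to the Swift engine), and `resolve` and `readable_by_engine` are illustrative names, not Swift APIs.

```python
import re

SCRIPTS = "/home/emalayan/app/montage-swift-cdm/SwiftMontage/scripts"

# (pattern, policy, target directory). Order matters under the assumed
# first-match-wins evaluation.
RULES = [
    # Hypothetical rule: keep files read by readData() on the default path.
    (r".*_concurrent.*", "DEFAULT", None),
    (r".*raw_dir.*", "DIRECT", SCRIPTS),
    (r".*header.hdr.*", "DIRECT", SCRIPTS),
    # Catch-all from the thread: everything else goes to node-local storage.
    (r".*.*", "DIRECT", "/tmp/local"),
]


def resolve(name, rules=RULES):
    """Return (policy, target) for the first rule whose pattern matches name."""
    for pattern, policy, target in rules:
        if re.fullmatch(pattern, name):
            return policy, target
    # Assumption: a file matched by no rule falls back to DEFAULT.
    return "DEFAULT", None


def readable_by_engine(name, rules=RULES):
    """True if the file stays on the default path, where the login node can read it."""
    return resolve(name, rules)[0] == "DEFAULT"


if __name__ == "__main__":
    for name in ("_concurrent/back_struct-1/elt-2", "raw_dir/img.fits", "proj_dir/p.fits"):
        print(name, resolve(name), readable_by_engine(name))
```

Under these assumptions, dropping the first rule makes the `_concurrent` name fall through to the catch-all and resolve to DIRECT, which is the situation in which readData() on the login node would not find the file.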
On Thu, 22 Mar 2012, Emalayan Vairavanathan wrote:
> Hi Justin and Jon,
>
> I thought the goal of having CDM is to provide a translation between the file data type (used in swift) and the actual location of the files. This will help to avoid the actual location of the file being hard coded in the swift-script and also help swift to harness the platform specific data transfer mechanisms.
>
> But from what Justin said, it seems the issue is with the rules (usage of CDM_DIRECT vs. CDM DEFAULT). I do not understand how such a translation layer gets confused by CDM_DIRECT / CDM DEFAULT. Do readData() calls not go through this translation layer?
>
>
> Maybe I am wrong here. If so, please correct me and provide more high-level information.
>
>
>
> Jon: What is the action plan? Do we need modifications to the Montage Swift scripts? Or do you suggest that I use CDM rules with CDM_DEFAULT for some files? (In this case these intermediate files will be stored in GPFS.)
>
> Thank you very much
>
> Emalayan
>
>
>
> ________________________________
> From: Jonathan Monette <jonmon at mcs.anl.gov>
> To: Justin M Wozniak <wozniak at mcs.anl.gov>
> Cc: Emalayan Vairavanathan <svemalayan at yahoo.com>; matei <matei at ece.ubc.ca>; "swift-devel at ci.uchicago.edu" <swift-devel at ci.uchicago.edu>; MosaStore <mosastore at googlegroups.com>
> Sent: Thursday, 22 March 2012 3:11 PM
> Subject: Re: [Swift-devel] Issues with Montage & Swift-CDM
>
> So those files are in _concurrent. Those files (the ones mapped with the concurrent mapper) are read by readData or readData2. So if he uses the fs_1.data from a previous email it works, which he said he confirmed.
>
> On Mar 22, 2012, at 5:03 PM, Justin M Wozniak wrote:
>
>>
>> I think SwiftMontage is trying to do a readData() on a CDM DIRECT file. This file is not accessible to the Swift engine, as it will be on the compute nodes in MosaStore.
>>
>> We need to enumerate the file names that must be accessible to Swift for readData(). These will have to be CDM DEFAULT. The rest can be CDM DIRECT.
>>
>> Justin
>>
>> On Thu, 22 Mar 2012, Justin M Wozniak wrote:
>>
>>> Ok, I can run the whole workflow with fs.data:
>>>
>>> rule .*raw_dir.* DIRECT /home/emalayan/app/montage-swift-cdm/SwiftMontage/scripts
>>> rule .*header.hdr.* DIRECT /home/emalayan/app/montage-swift-cdm/SwiftMontage/scripts
>>> # rule .*.* DIRECT /tmp/local
>>>
>>> I am now going to uncomment the last line...
>>>
>>> Emalayan, can you try to run that and see what happens?
>>>
>>> On Thu, 22 Mar 2012, Emalayan Vairavanathan wrote:
>>>
>>>> Thank you Jon. I am using swift-0.93 (this is in our Cluster). So I do not need to check for garbage-collector errors.
>>>> I forgot to mention one point about concurrent mappers. The pipeline Swift benchmark I wrote uses the concurrent mapper too. Last week I ran the benchmark successfully regardless of the concurrent mapper's location (I tried various locations). So I guess the location of the concurrent mapper won't be an issue.
>>>> Thank you
>>>> Emalayan
>>>> ________________________________
>>>> From: Jonathan Monette <jonmon at mcs.anl.gov>
>>>> To: Emalayan Vairavanathan <svemalayan at yahoo.com>
>>>> Cc: Justin M Wozniak <wozniak at mcs.anl.gov>; matei <matei at ece.ubc.ca>; "swift-devel at ci.uchicago.edu" <swift-devel at ci.uchicago.edu>; MosaStore <mosastore at googlegroups.com>
>>>> Sent: Thursday, 22 March 2012 2:45 PM
>>>> Subject: Re: [Swift-devel] Issues with Montage & Swift-CDM
>>>> I'll run with those rules. I saw the garbage collection exception in the log file. The exception may be causing no harm, but it probably shouldn't be throwing an exception at all so this may help fix it in trunk.
>>>> But it just occurred to me, which swift are you using again? I am running with the copy I grabbed from Justin's directory. Are you running that as well or with 0.93? If with 0.93 you will not see that exception because that version does not have the swift garbage collector so we should not waste time with that exception.
>>>> On Mar 22, 2012, at 4:40 PM, Emalayan Vairavanathan wrote:
>>>> Hi Jon,
>>>>> Thank you for your suggestions. By the way, can you try to run Montage with the CDM rules below and see whether the problem is with the concurrent mapper and not because of a location dependency?
>>>>> rule .*raw_dir.* DIRECT /home/emalayan/App/montage-swift-cdm/SwiftMontage/scripts
>>>>> rule .*header.hdr.* DIRECT /home/emalayan/App/montage-swift-cdm/SwiftMontage/scripts/
>>>>> rule .*final.* DIRECT /home/emalayan/App/montage-swift-cdm/SwiftMontage/scripts/
>>>>> rule .*.* DIRECT /tmp/local
>>>>> Meantime I will run the montage again and see whether swift garbage-collector throws some error.
>>>>> Please let me know if you have a better idea to debug the problem.
>>>>> Thank you
>>>>> Emalayan
>>>>> ________________________________
>>>>> From: Jonathan Monette <jonmon at mcs.anl.gov>
>>>>> To: Emalayan Vairavanathan <svemalayan at yahoo.com>
>>>>> Cc: Justin M Wozniak <wozniak at mcs.anl.gov>; matei <matei at ece.ubc.ca>; "swift-devel at ci.uchicago.edu" <swift-devel at ci.uchicago.edu>; MosaStore <mosastore at googlegroups.com>
>>>>> Sent: Thursday, 22 March 2012 2:20 PM
>>>>> Subject: Re: [Swift-devel] Issues with Montage & Swift-CDM
>>>>> So I was able to run it on surveyor and it completed. I see those same SetFieldValue lines but I don't think that is an issue. That is just because I declare an array called projected_images and fill it up inside another function with files I used the regexp mapper on. However I do see the Swift garbage collector kicking in and throwing exceptions:
>>>>> 2012-03-22 20:58:03,070+0000 INFO FileGarbageCollector Failed to clean file://localhost/_concurrent/back_struct-d10301cf-5be0-4918-b0c2-49be758cf53a-7-array//elt-2.-field/b
>>>>> java.lang.RuntimeException: org.globus.cog.abstraction.impl.file.FileNotFoundException: _concurrent/back_struct-d10301cf-5be0-4918-b0c2-49be758cf53a-7-array//elt-2.-field/b not found.
>>>>> at org.griphyn.vdl.mapping.AbsFile.clean(AbsFile.java:191)
>>>>> at org.griphyn.vdl.mapping.file.FileGarbageCollector.run(FileGarbageCollector.java:115)
>>>>> I am not sure if that may be causing problems for when Emalayan tries to run with CDM.
>>>>> Emalayan, it does not look like those scripts expect files to be in a certain location, at least that is not what is intended. In the main swiftscript, when you call the other functions you pass the directory name you want the intermediate files to be stored in. Then in the SwiftMontage_Batch functions it uses those directories you passed to map input/output files. The only things that are expected to be in certain places are the raw_dir and header.hdr. Those have to be in the pwd. However I will continue debugging to make sure those assumptions I made are holding.
>>>>> As to the different CDM setups, I do use the concurrent mapper (files that get dumped to _concurrent), where Swift decides on the names. I did this for a couple of files whose names I did not care about and which were small enough that I didn't care whether they were staged in/out or not. CDM may not like that. Perhaps Swift expects those _concurrent files in a certain place but you told CDM to put them someplace different. I am not sure, that is just a hypothesis. I can always change the scripts to not use the concurrent mapper and use a better mapper for the CDM rules if that turns out to be the case.
>>>>> On Mar 22, 2012, at 4:02 PM, Emalayan Vairavanathan wrote:
>>>>> Hi Justin,
>>>>>> For me the setup in /home/emalayan/app/montage-swift-cdm/SwiftMontage/scripts works.
>>>>>> I tried on login2.surveyor with the swift binaries located in /home/wozniak/Public/swift/bin/swift
>>>>>> Maybe you are using a different swift version or maybe a different login machine. Thank you
>>>>>> Emalayan
>>>>>> ________________________________
>>>>>> From: Justin M Wozniak <wozniak at mcs.anl.gov>
>>>>>> To: Jonathan Monette <jonmon at mcs.anl.gov>
>>>>>> Cc: Emalayan Vairavanathan <svemalayan at yahoo.com>; matei <matei at ece.ubc.ca>; "swift-devel at ci.uchicago.edu" <swift-devel at ci.uchicago.edu>; MosaStore <mosastore at googlegroups.com>
>>>>>> Sent: Thursday, 22 March 2012 1:49 PM
>>>>>> Subject: Re: [Swift-devel] Issues with Montage & Swift-CDM
>>>>>> This run had no CDM file.
>>>>>> On Thu, 22 Mar 2012, Jonathan Monette wrote:
>>>>>>> Was that with CDM in your run? I am going to take a look too as to why that is showing up.
>>>>>>> On Mar 22, 2012, at 3:42 PM, Justin M Wozniak wrote:
>>>>>>>> Ok, I can get it started but I get:
>>>>>>>> 2012-03-22 20:24:06,048+0000 INFO SetFieldValue Set: projected_images[0]=null
>>>>>>>> 2012-03-22 20:24:06,049+0000 INFO SetFieldValue Set: projected_images[6]=null
>>>>>>>> 2012-03-22 20:24:06,050+0000 INFO SetFieldValue Set: projected_images[7]=null
>>>>>>>> resulting in:
>>>>>>>> File not found: /gpfs/home/wozniak/SwiftMontage/scripts/./proj_dir/null
>>>>>>>> This looks like a Swift bug. However, do you guys have an existing workaround?
>>>>>>>> Thanks
>>>>>>>> On Thu, 22 Mar 2012, Emalayan Vairavanathan wrote:
>>>>>>>>> Hi Justin,
>>>>>>>>> Please use ./run_local.sh to run the montage without cdm locally on the headnode.
>>>>>>>>> The rest of the scripts (run-workers.sh, run.sh, run-swift.sh, main.sh) are written to run experiments in our cluster and won't work on Surveyor.
>>>>>>>>> Please let me know if you have questions.
>>>>>>>>> Thank you
>>>>>>>>> Emalayan
>>>>>>>>> ________________________________
>>>>>>>>> From: Justin M Wozniak <wozniak at mcs.anl.gov>
>>>>>>>>> To: Emalayan Vairavanathan <svemalayan at yahoo.com>
>>>>>>>>> Cc: "swift-devel at ci.uchicago.edu" <swift-devel at ci.uchicago.edu>; MosaStore <mosastore at googlegroups.com>; matei <matei at ece.ubc.ca>
>>>>>>>>> Sent: Thursday, 22 March 2012 11:54 AM
>>>>>>>>> Subject: Re: [Swift-devel] Issues with Montage & Swift-CDM
>>>>>>>>> On Wed, 21 Mar 2012, Emalayan Vairavanathan wrote:
>>>>>>>>>> But I just setup everything on Surveyor and it works locally on the head node. You can find the setup here.
>>>>>>>>>> /home/emalayan/app/montage-swift-cdm/SwiftMontage/scripts
>>>>>>>>> What is the entry point?
>>>>>>>>> Are we missing common.sh?
>>>>>>>>> -- Justin M Wozniak
>>>>>>>> --
>>>>>>>> Justin M Wozniak
>>>>>>>> _______________________________________________
>>>>>>>> Swift-devel mailing list
>>>>>>>> Swift-devel at ci.uchicago.edu
>>>>>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>>>>>> -- Justin M Wozniak
>>>
>>>
>>
>> -- Justin M Wozniak
-- Justin M Wozniak