[Swift-devel] feature request
Michael Wilde
wilde at mcs.anl.gov
Wed Apr 29 11:05:21 CDT 2009
Separately, to test on both OSG and TG, you need to have a good
understanding of where data and apps live, and how various directories
are intended to be used.
OSG has a set of generic storage locations (tmp, app, data, etc), and TG
has basically home and scratch for each user, with some provision for
group directories.
We should document the relationships between what the grids specify, how
Swift users should use the various dirs, and how the tests use those
dirs (the tests should exercise what we tell the users to do).
The site tools and the tests need to be (and I think already are)
cognizant of this. But there are a few variations possible, like putting
apps under $APP vs $DATA, how writability depends on your VO, and
how and when subdirectories should be used.
We should locate the relevant Grid doc pages on these topics, and link
to them from the "local details" section of the Swift user guide.
This isn't very complicated once the following are documented:
- each grid's conventions for directories
- permissions and multi-user issues
- transience and space management issues
- group sharing of various directories
In the current simple Swift model where permanent data lives on the
submit host or a specified server and the workdirectory is always
transient, this is not much of an issue.
In the future, Swift might support the caching of files on sites between
workflows, and then we'll need a somewhat more complex model.
What I'm saying here is perhaps not precise enough, but I believe it's
important enough, and confusing enough to most users, that we should
document it clearly and provide pointers and examples.
- Mike
On 4/29/09 10:52 AM, Michael Wilde wrote:
> The message:
>
> Could not find any valid host for task "Task(type=UNKNOWN,
> identity=urn:cog-1241018496704)" with constraints
> {filenames=[Ljava.lang.String;@7f38f3d1, trfqn=cat,
> filecache=org.griphyn.vdl.karajan.lib.cache.CacheMapAdapter@740f5f97,
> tr=cat}
>
> means there was no tc.data entry for cat on any site.
>
> So I suspect this is due to your pool handles not matching your tc.data
> site names.
>
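A minimal sketch of the check described above, in Python: it collects the pool handles from a sites.xml document and the site names from tc.data, then lists the handles that have no tc.data entry for a given transformation. The tc.data layout assumed here (whitespace-separated columns with the site name first and the transformation name second, `#` for comments) and the sample tc.data line are illustrative assumptions, not copied from the failing setup. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def pool_handles(sites_xml_text):
    """Return the set of handle attributes of all <pool> elements."""
    root = ET.fromstring(sites_xml_text)
    # Compare local tag names so a namespaced <config> also works.
    return {el.get("handle") for el in root.iter()
            if el.tag.rsplit("}", 1)[-1] == "pool"}


def tc_entries(tc_data_text):
    """Return (site, transformation) pairs from tc.data-style text.

    Assumption: site name in column 1, transformation in column 2.
    Blank lines and lines starting with '#' are skipped.
    """
    pairs = set()
    for line in tc_data_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            pairs.add((fields[0], fields[1]))
    return pairs


def unmatched_handles(sites_xml_text, tc_data_text, transformation):
    """Handles in sites.xml with no tc.data entry for the transformation."""
    sites_with_tr = {site for site, tr in tc_entries(tc_data_text)
                     if tr == transformation}
    return sorted(pool_handles(sites_xml_text) - sites_with_tr)


# Pool handle as in the generated sites.xml quoted in this thread.
SITES = """<config>
  <pool handle="RENCI-Engagement">
    <workdirectory>/nfs/osg-data/engage/tmp/RENCI-Engagement</workdirectory>
  </pool>
</config>"""

# Hypothetical tc.data line that still uses the older handle.
TC = """# site          transformation  path
renci-engage    cat             /bin/cat
"""

print(unmatched_handles(SITES, TC, "cat"))  # prints ['RENCI-Engagement']
```

The comparison is case-sensitive on purpose, so a handle of `RENCI-Engagement` is reported as unmatched when tc.data only names `renci-engage`.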
> On 4/29/09 10:35 AM, Zhao Zhang wrote:
>> Hi, again.
>>
>> Now I have two sites.xml for renci-engagement. The first one is
>> working, while the second is not. The second sites.xml is generated by
>> the script.
>> Since the only difference between the first and the second is the
>> <workdirectory> value, I am assuming the error is caused by the
>> <workdirectory>.
>> If this is true, how could I find all valid <workdirectory> for all
>> sites on OSG?
>>
>> zhao
>>
>> First:
>> <config>
>> <pool handle="renci-engage">
>> <gridftp url="gsiftp://belhaven-1.renci.org/" />
>> <execution provider="condor" />
>> <workdirectory >/nfs/home/osgedu/benc</workdirectory>
>> <profile namespace="globus" key="jobType">grid</profile>
>> <profile namespace="globus" key="gridResource">gt2
>> belhaven-1.renci.org/jobmanager-fork</profile>
>> </pool>
>> </config>
>>
>> Second:
>> <config>
>> <!-- RENCI-Engagement -->
>> <pool handle="RENCI-Engagement" >
>> <gridftp url="gsiftp://belhaven-1.renci.org/" />
>> <execution provider="condor" />
>> <workdirectory
>> >/nfs/osg-data/engage/tmp/RENCI-Engagement</workdirectory>
>> <profile namespace="globus" key="jobType">grid</profile>
>> <profile namespace="globus" key="gridResource">gt2
>> belhaven-1.renci.org/jobmanager-fork</profile>
>> </pool>
>> </config>
>>
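Before attributing the failure to `<workdirectory>` alone, it may help to compare the two pool definitions mechanically. The following Python sketch flattens each `<pool>` into a dictionary and reports every field that differs. It assumes un-namespaced elements, as in the two configurations quoted above, and the helper names are hypothetical. For these two files it reports the pool handle as well as the work directory.

```python
import xml.etree.ElementTree as ET


def flatten_pool(pool):
    """Flatten a <pool> element into a {label: value} dict."""
    items = {"pool@handle": pool.get("handle")}
    for child in pool:
        # Distinguish <profile> elements by their namespace and key.
        label = child.tag
        if child.get("key"):
            label += "[%s:%s]" % (child.get("namespace"), child.get("key"))
        for name, value in child.attrib.items():
            if name not in ("namespace", "key"):
                items["%s@%s" % (label, name)] = value
        # Collapse whitespace so wrapped lines compare equal.
        text = " ".join((child.text or "").split())
        if text:
            items[label] = text
    return items


def diff_pools(xml_a, xml_b):
    """Return {label: (value_a, value_b)} for every differing field."""
    a = flatten_pool(ET.fromstring(xml_a).find("pool"))
    b = flatten_pool(ET.fromstring(xml_b).find("pool"))
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


COMMON = """<gridftp url="gsiftp://belhaven-1.renci.org/" />
<execution provider="condor" />
<profile namespace="globus" key="jobType">grid</profile>
<profile namespace="globus" key="gridResource">gt2
belhaven-1.renci.org/jobmanager-fork</profile>"""

FIRST = """<config><pool handle="renci-engage">
<workdirectory>/nfs/home/osgedu/benc</workdirectory>
%s</pool></config>""" % COMMON

SECOND = """<config><pool handle="RENCI-Engagement">
<workdirectory>/nfs/osg-data/engage/tmp/RENCI-Engagement</workdirectory>
%s</pool></config>""" % COMMON

for field, (first, second) in diff_pools(FIRST, SECOND).items():
    print(field, ":", first, "->", second)
```

Two fields differ here, `pool@handle` and `workdirectory`, so the work directory is not the only candidate cause.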
>> The error message is:
>> [zzhang@tp-grid1 sites]$ ./run-site condor-g/RENCI-Engagement.xml
>> testing site configuration: condor-g/RENCI-Engagement.xml
>> Removing files from previous runs
>> Running test 061-cattwo at Wed Apr 29 10:21:31 CDT 2009
>> Swift 0.9rc2 swift-r2860 cog-r2388
>>
>> RunID: 20090429-1021-8ayf7v75
>> Progress:
>> Execution failed:
>> Could not find any valid host for task "Task(type=UNKNOWN,
>> identity=urn:cog-1241018496704)" with constraints
>> {filenames=[Ljava.lang.String;@7f38f3d1, trfqn=cat,
>> filecache=org.griphyn.vdl.karajan.lib.cache.CacheMapAdapter@740f5f97,
>> tr=cat}
>> SWIFT RETURN CODE NON-ZERO - test 061-cattwo
>>
>>
>> Ben Clifford wrote:
>>> On Wed, 29 Apr 2009, Zhao Zhang wrote:
>>>
>>>
>>>> I modified the existing swift-osg-ress-site-catalog, and generate a
>>>> sample
>>>> sites.xml at CI network
>>>> /home/zzhang/swift_coaster/cog/modules/swift/tests/sites/condor-g/sample-sites.xml
>>>>
>>>>
>>>> Could you help me check if there is any known error in there? thanks.
>>>>
>>>
>>> I looked briefly and it seems ok
>>>
>>>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel