[Swift-user] oops failures on OSG

Mats Rynge rynge at renci.org
Fri Feb 20 20:14:38 CST 2009


On Fri, Feb 20, 2009 at 06:15:51PM -0600, Mihael Hategan wrote:
> 
> ----- Mats Rynge <rynge at renci.org> wrote:
> > On Fri, Feb 20, 2009 at 04:10:30PM -0600, Mihael Hategan wrote:
> > > I'm seeing this in the logs:
> > > No status file was found. Check the shared filesystem on UCSDT2-B
> > > 
> > 
> > This reminds me to bring up an issue other users have had on UCSDT2 and
> > similar sites on OSG. UCSDT2 and UCSDT2-B are two interfaces to the
> > same cluster (shared filesystem is the same). If you give those as two
> > separate sites to Swift, will Swift be confused about the data caching?
> > That is, is the data caching directory named the same on all sites in a
> > run?
> 
> It's likely that files in the shared directory will get messed up.
> 
> I can see no benefit in swift seeing one site as two, so I would recommend
> against doing this in general.

OSG is not going to stop doing this. The benefits for the site are the
ability to spread the load across multiple gatekeepers, and the ability
to do maintenance on one node while the resource stays available
through the other one.

One fix could be to add something site-specific to the path of the data
cache directory. That would mean duplicate copies of the data on sites
that share a filesystem, but if that is how the sites are advertised, I
think that is acceptable.
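
To make the idea concrete, here is a rough sketch in Python of what I
mean by a site-specific cache path. This is not how Swift builds its
paths today; the function name, the argument names, and the
"<work dir>/<run id>/<site>/shared" layout are all assumptions made up
for illustration.

```python
import posixpath


def site_cache_dir(work_dir, run_id, site_name):
    """Build a cache directory path that includes the site name.

    Hypothetical helper: the directory layout is an assumption, not
    Swift's actual one.
    """
    # Keep only characters that are safe in a single path component.
    safe_name = "".join(
        c if c.isalnum() or c in "-_" else "_" for c in site_name
    )
    # Remote paths are POSIX paths regardless of the submit host's OS.
    return posixpath.join(work_dir, run_id, safe_name, "shared")
```

With this, UCSDT2 and UCSDT2-B would get separate directories even
though they sit on the same filesystem, so the two "sites" could not
step on each other's files. The cost is the duplicate data mentioned
above.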

For users under the OSG Engagement VO, the problematic sites are:

UCSDT2          <->  UCSDT2-B
BNL_ATLAS_1     <->  BNL_ATLAS_2
Purdue-RCAC     <->  Purdue-Steele
FNAL_FERMIGRID  <->  FNAL_GPGRID_1

In your site catalog, make sure you list only one site from each pair
above.
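
If you generate your site list automatically, a small filter along these
lines could drop the second interface of each pair. This is only a
sketch in Python: the pairs are the ones listed above, while the
function name and the idea of working on a plain list of site names
(rather than on the site catalog file itself) are my assumptions.

```python
# Pairs of site names that are two interfaces to the same cluster,
# taken from the list above.
ALIAS_PAIRS = [
    ("UCSDT2", "UCSDT2-B"),
    ("BNL_ATLAS_1", "BNL_ATLAS_2"),
    ("Purdue-RCAC", "Purdue-Steele"),
    ("FNAL_FERMIGRID", "FNAL_GPGRID_1"),
]


def drop_aliased_sites(site_names):
    """Keep the first site seen from each aliased pair, in input order.

    Hypothetical helper; sites not in any pair are passed through.
    """
    # Map each site name to the index of the pair it belongs to.
    group_of = {
        name: index
        for index, pair in enumerate(ALIAS_PAIRS)
        for name in pair
    }
    seen_groups = set()
    kept = []
    for name in site_names:
        group = group_of.get(name)
        if group is not None:
            if group in seen_groups:
                # The other interface to this cluster is already kept.
                continue
            seen_groups.add(group)
        kept.append(name)
    return kept
```

Whichever interface appears first in your list is the one that is kept,
so put the gatekeeper you prefer first.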

-- 
Mats Rynge
Renaissance Computing Institute <http://www.renci.org>


