[AG-TECH] DPPT3 Failure - Conflict description and a fix

Rick Stevens stevens at mcs.anl.gov
Thu Oct 18 21:07:52 CDT 2001


Awesome debugging!!

At 05:57 PM 10/17/2001 -0500, Marty Hoag wrote:
>    For the past few months several of us have had some strange problems
>with DPPT.  Our clients would fail for no apparent reason.  While the
>last message is something like "nullpointer exception" we noticed that
>up a few lines there is a message like
>
>    Caught JSDT exception: name in use
>
>We had this problem on Monday during a virtual conference in genomics
>and bioinformatics that we were the hub for.  Cindy Sievers at LANL
>mentioned a couple of times in the moo that her client was failing, but
>then later NDSU's failed and hers worked.  This pattern was consistent
>throughout the day.  We even tried rebooting our display at one point.
>Kay Gunn was great in calling out slide changes but it wasn't so fun
>for Cindy or Jim Senechal here who had to manually keep up (Jim
>was too far from the screen to read the numbers so we put binoculars
>on our AG Node large event requirements list).
>
>    It struck me that this sounded like some sort of resource conflict
>between LANL and NDSU.  We looked at the agserv command window and
>noticed the clients are identified by userid and host name.  We
>saw that ours was coming in as  ag at agdisplay  .  Cindy had the same
>combination!  Bob Olson has verified that this is indeed a problem
>by checking the code.
>
>    While I think we had the windows "DNS" stuff set up to use both a
>hostname and domain, we did NOT have that set up in the "Net ID" field
>of windows.  We think we have found a quick fix for this to avoid any
>conflict.  There could be some strange implications if you are using
>windows networking in some way but we made the change Monday evening
>and ran all day Tuesday with LANL and NDSU coexisting and had no other
>problems.
>
>    Here is the "quick fix" we used for W2K (we are at SP2 but I don't
>think that should matter).  Please note that I'm not a windows
>expert so my terminology might be wrong, but I think I have our
>procedures down.  Use at your own risk (at least until independently
>confirmed):
>
>1) Go to Start / Settings / Control Panel / System
>
>2) Click on the Network Identification tab.  If your "Full computer
>name" already is fully qualified then your machine should be
>uniquely identified (at least for this problem).  If not go on...
>
>3) Click on the Properties button.
>
>4) The Computer Name is just the first element of the name.  To add a
>domain to qualify this click on the More... button.
>
>5) For primary DNS suffix for this computer enter your domain.  E.g.
>if the machine is agdisplay.foo.bar.edu then you'd add foo.bar.edu
>here.
>
>6) We UNCHECKED the "Change primary DNS suffix when domain membership
>changes" because we didn't understand it and didn't want to do something
>we didn't understand.
>
>7) We OKed those and I think had to reboot.  Then when we connected to
>our agserv we saw our Client ID as  ag at agdisplay.ndsu.nodak.edu .
>
>    More sites are joining the AG and using more standardized software
>and documentation, which probably explains why this seems to be a
>more common problem (or maybe Cindy and NDSU are the only ones
>who are "uncreative").
>
>    Given work on a future replacement for DPPT (rppt) and the apparent
>ease with fixing this (we probably need more than one site to try the
>fix to confirm that) I'm not sure it is worth spending time rewriting
>DPPT at this point.  But that would be for others to decide.
>
>    At least LANL and NDSU should be able to both be clients to the same
>dppt server now.  ;-)  And this is yet another proof of the value of
>collaboration and tools like the moo!  I owe Cindy a couple breakfast
>burritos.
>
>    Marty
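
For anyone following along, the conflict Marty describes can be sketched in a few lines of Python.  This is only an illustration, not DPPT or JSDT code: the Registry class, the NameInUse exception, and the user@host ID format are assumptions standing in for whatever the server really does, and agdisplay.example.org is a made-up second site.

```python
import socket


class NameInUse(Exception):
    """Hypothetical stand-in for the "name in use" error in the client log."""


class Registry:
    """Hypothetical server-side table that rejects duplicate client IDs."""

    def __init__(self):
        self._names = set()

    def join(self, name):
        # A second client with an identical ID is refused.
        if name in self._names:
            raise NameInUse(name)
        self._names.add(name)


def client_id(user, host):
    # Assumed ID format: userid and host name joined with "@".
    return f"{user}@{host}"


def is_fully_qualified(host):
    # Treat a name as qualified if it has a dot besides any trailing one.
    return "." in host.strip(".")


def local_host_is_qualified():
    # socket.getfqdn() reports Python's view of this machine's name,
    # which may differ from what Windows or DPPT reports.
    return is_fully_qualified(socket.getfqdn())


if __name__ == "__main__":
    server = Registry()
    server.join(client_id("ag", "agdisplay"))
    try:
        # Second site with the same userid and short host name.
        server.join(client_id("ag", "agdisplay"))
    except NameInUse as err:
        print("conflict:", err)

    server = Registry()
    # With a domain suffix on each host, the IDs no longer collide.
    server.join(client_id("ag", "agdisplay.ndsu.nodak.edu"))
    server.join(client_id("ag", "agdisplay.example.org"))
    print("this host fully qualified?", local_host_is_qualified())
```

Two sites that both use userid ag on a host called agdisplay produce the same ID, so the second join fails.  Once each host name carries its domain, the IDs differ and both clients can register.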



