<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

<br>

<br>

Mihael Hategan wrote:

<blockquote cite="mid:1213934501.1194.18.camel@localhost" type="cite">

  <pre wrap="">On Thu, 2008-06-19 at 22:24 -0500, Ioan Raicu wrote:

  </pre>

  <blockquote type="cite">

    <pre wrap="">

Mihael Hategan wrote: 

    </pre>

    <blockquote type="cite">

      <pre wrap="">There's probably a misunderstanding. Mike seemed to suggest that, when

using BG/P, there should be multiple services in order to distribute

load. 

      </pre>

    </blockquote>

    <pre wrap="">Yes, he was correct.

    </pre>

    <blockquote type="cite">

      <pre wrap="">That I think is a problem. 

      </pre>

    </blockquote>

    <pre wrap="">I don't follow.  If your goal is to just show that it works at small

scales (100s, maybe 1000s of CPUs), you don't need this, but if you

want to have any chance of scaling to 160K CPUs, I don't think you'll

have many options :(

    </pre>

  </blockquote>

  <pre wrap=""><!---->

If your service scales linearly, then splitting it into multiple

processes does not help. But now you have more services to maintain.

That's because k*n = c*k*(n/c), where k would be your linearity factor.

If it scales worse, say as k*n^2, then dividing makes sense because

c*k*((n/c)^2) = k*(n^2)/c, which is better than k*(n^2).

The point is that I'd rather spend my time making the algorithm linear

than dealing with multiple services.

Now, of course, as you mention, it may not be possible to do so because

the problem is at the networking layer. So we should probably stop

talking until we know what the actual bottleneck is. And I mean *know*.

Do we?

  </pre>

</blockquote>
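The scaling argument in the quoted message can be checked numerically. The Python sketch below uses a hypothetical cost model and is not code from Coaster or Falkon. It assumes that one service handling n clients does work k*n**p, with p = 1 for linear and p = 2 for quadratic, and that splitting spreads the clients evenly over c services. The function names and the sample values (n = 160,000, c = 10) are illustrative assumptions.
<pre wrap="">
```python
# Hypothetical cost model for the scaling argument: one service handling n
# clients does work k * n**p (p = 1 linear, p = 2 quadratic). Splitting the
# clients evenly over c services gives a total of c * k * (n / c)**p.

def single_service_cost(n, k, p):
    """Work done by one service handling all n clients."""
    return k * n ** p


def split_cost(n, k, p, c):
    """Total work when n clients are divided evenly among c services."""
    return c * k * (n / c) ** p


n, k, c = 160_000, 1.0, 10
for p in (1, 2):
    one = single_service_cost(n, k, p)
    many = split_cost(n, k, p, c)
    # Linear (p=1): the ratio is 1, so splitting buys nothing.
    # Quadratic (p=2): the ratio is c, i.e. k*n**2 versus k*n**2/c.
    print(f"p={p}: single={one:.3g} split={many:.3g} ratio={one / many:.3g}")
```
</pre>
With p = 1 the two totals come out equal, so splitting only adds services to maintain. With p = 2 the split total is smaller by exactly a factor of c, which matches k*(n^2)/c.<br>
<br>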

For Falkon, it was a networking issue (coupled with the amount of

CPU/RAM on the node where the service was running) that kept a single

Falkon service from scaling reliably beyond roughly 10K CPUs when using

persistent sockets.  Note that when not using persistent sockets, as is

the case with GT4.0.x WS, we were able to scale to 50K CPUs just fine,

but in that case the service never had to maintain more than a few

hundred TCP connections at the same time, which is why it

scaled so well.  Now, that is not to say that your implementation of

Coaster won't scale to 160K CPUs all from 1 service, but in my

experience, a server (implemented in Java, anyway) using select with

2~4GB of memory and 4 CPU cores will not be able to handle 100K+

concurrent TCP connections that are all active at the same time. 

Anyway, I never did a thorough study of this to see which part of the

networking stack or which OS-level calls were the problem... I'd be curious to

see how far Coaster will scale with a single service using TCP, so it

might be worth running 1 Coaster service on a login node and trying to

see how many CPUs it can manage before running into trouble.<br>
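The per-service load behind this argument comes down to simple arithmetic. The sketch below makes two hypothetical simplifying assumptions. First, each CPU holds one persistent TCP connection, spread evenly across services. Second, the roughly 10K-CPU point where a single Falkon service stopped scaling reliably acts as a hard per-service ceiling. The function names are illustrative only.
<pre wrap="">
```python
import math

# Illustrative arithmetic only. ASSUMPTIONS: one persistent TCP connection
# per CPU, spread evenly across services, and a hard per-service ceiling
# taken from the ~10K-CPU point observed with a single Falkon service.
TARGET_CPUS = 160_000
PER_SERVICE_LIMIT = 10_000  # hypothetical per-service connection ceiling


def services_needed(cpus, per_service_limit):
    """Smallest number of services that keeps each at or under the limit."""
    return math.ceil(cpus / per_service_limit)


def connections_per_service(cpus, services):
    """Concurrent connections the busiest service holds under an even split."""
    return math.ceil(cpus / services)


c = services_needed(TARGET_CPUS, PER_SERVICE_LIMIT)
print(c, connections_per_service(TARGET_CPUS, c))
```
</pre>
Under these assumptions, 160K CPUs would need 16 services of about 10K connections each. A higher real per-service ceiling for Coaster, which the single-service test on a login node would reveal, would shrink that count proportionally.<br>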

<br>

Ioan<br>


<br>

<pre class="moz-signature" cols="72">-- 

===================================================

Ioan Raicu

Ph.D. Candidate

===================================================

Distributed Systems Laboratory

Computer Science Department

University of Chicago

1100 E. 58th Street, Ryerson Hall

Chicago, IL 60637

===================================================

Email: <a class="moz-txt-link-abbreviated" href="mailto:iraicu@cs.uchicago.edu">iraicu@cs.uchicago.edu</a>

Web:   <a class="moz-txt-link-freetext" href="http://www.cs.uchicago.edu/~iraicu">http://www.cs.uchicago.edu/~iraicu</a>

<a class="moz-txt-link-freetext" href="http://dev.globus.org/wiki/Incubator/Falkon">http://dev.globus.org/wiki/Incubator/Falkon</a>

<a class="moz-txt-link-freetext" href="http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page">http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page</a>

===================================================

</pre>

</body>

</html>