<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

<br>

<br>

Mihael Hategan wrote:

<blockquote cite="mid:1256597730.10196.49.camel@localhost" type="cite">

  <pre wrap="">On Mon, 2009-10-26 at 16:36 -0500, Ioan Raicu wrote:

  </pre>

  <blockquote type="cite">


    <pre wrap="">Here were our experiences with running scripts from GPFS. The #s below

represent the throughput for invoking scripts (a bash script that

invoked a sleep 0) from GPFS on 4 workers, 256 workers, and 2048

workers. 

Number of Processors    Invoke script throughput (ops/sec)
                   4                               125.214
                 256                              109.3272
                2048                              823.0374

    </pre>

  </blockquote>

  <pre wrap=""><!---->

Looks right. What I saw was that things were getting shitty at around

10000 cores. Lower if info writing, directory making, and file copying

were involved.

  </pre>

</blockquote>

Right.<br>

<blockquote cite="mid:1256597730.10196.49.camel@localhost" type="cite">

  <pre wrap="">

  </pre>

  <blockquote type="cite">

    <blockquote type="cite">

      <pre wrap="">[...]  

      </pre>

    </blockquote>

    <pre wrap="">In our experience with Falkon, the limit came much sooner than 64K. In

Falkon, using the C worker code (which runs on the BG/P), each worker

consumes 2 TCP/IP connections to the Falkon service.

    </pre>

  </blockquote>

  <pre wrap=""><!---->

Well, the coaster workers use only one connection.

  </pre>

</blockquote>

Is that 1 connection per core, or per node? Zhao tried to reduce it to 1

connection per node, but the worker was not stable, so we left it alone

in the interest of time. The last time I looked at it, the workers used

2 connections per core, or 8 connections per node. Quite inefficient at

scale, but not an issue given that each service only handles 256 cores.
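<br>
<br>
As a rough sketch of the arithmetic behind those numbers (the function and constant names are hypothetical, and the 4 cores per node is only inferred from 2 connections per core giving 8 per node):<br>
<pre wrap="">
```python
import math

CONNECTIONS_PER_CORE = 2   # C worker: 2 TCP connections per core (from the text)
CORES_PER_NODE = 4         # assumption: inferred from 8 connections per node
CORES_PER_SERVICE = 256    # each distributed service handles 256 cores


def connections_for(cores, per_core=CONNECTIONS_PER_CORE):
    """Total TCP connections opened by the given number of worker cores."""
    return cores * per_core


def services_needed(cores, cores_per_service=CORES_PER_SERVICE):
    """Number of distributed services needed to cover the given cores."""
    return math.ceil(cores / cores_per_service)


if __name__ == "__main__":
    # Compare one service's share with a large centralized run.
    for cores in (256, 8192, 20000):
        print(cores, connections_for(cores), services_needed(cores))
```
</pre>
With 256 cores per service, each service holds only 512 connections, whereas a single centralized service at 20K cores would hold 40K.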

<br>

<blockquote cite="mid:1256597730.10196.49.camel@localhost" type="cite">

  <pre wrap="">

  </pre>

  <blockquote type="cite">

    <pre wrap=""> In the centralized Falkon service version, this racks up connections

pretty quickly. I don't recall exactly at what point we started having

issues, but it was somewhere in the range of 10K~20K CPU cores.

Essentially, we could establish all the connections (20K~40K TCP

connections), but when the experiment would actually start, and data

needed to flow over these connections, all sorts of weird stuff started

happening: TCP connections would get reset, workers were failing (e.g.

their TCP connection was being severed and not being re-established),

etc. I want to say that 8K (maybe 16K) cores was the largest test we

ran on the BG/P with a centralized Falkon service that was stable

and successful. 

    </pre>

  </blockquote>

  <pre wrap=""><!---->

Possible. I haven't properly tested above 12k workers. I was just

mentioning a theoretical limitation that doesn't seem possible to beat

without having things distributed.

[...]

  </pre>

  <blockquote type="cite">

    <pre wrap="">For the BG/P specifically, I think the distribution of the Falkon

service to the I/O nodes gave us a low maintenance, robust, and

scalable solution!

    </pre>

  </blockquote>

  <pre wrap=""><!---->

Lower than if you only had to run one service on the head node?

  </pre>

</blockquote>

Yes, in fact it was for Falkon. If we ran Falkon on the head node, the

user would have to start it manually, on an available port, and then

shut it down when finished. Running things on the I/O nodes was tougher

at the beginning, but once we got it all configured and running, it was

great! The Falkon service starts up at I/O node boot time, on a

specific port (no need to check if it's available as the I/O node is

dedicated to the user), all compute nodes can easily find their

respective I/O nodes at the same location (some 192.xxx private

address), and when the run is over, the I/O nodes terminate and the

services stop all on their own. At least for Falkon, it really made the

difference between having a turn-key solution that always works, and

one that would require constant tinkering (starting and stopping) and

configuration (e.g. ports). <br>

<br>

Again, the downside of the distributed one was the overhead of

implementing and testing it, and also the load balancing, which required

a bit of fine-tuning in Swift to get just right.<br>

<br>

Ioan<br>


<br>

<pre class="moz-signature" cols="72">-- 

=================================================================

Ioan Raicu, Ph.D.

NSF/CRA Computing Innovation Fellow

=================================================================

Center for Ultra-scale Computing and Information Security (CUCIS)

Department of Electrical Engineering and Computer Science

Northwestern University

2145 Sheridan Rd, Tech M384 

Evanston, IL 60208-3118

=================================================================

Cel:   1-847-722-0876

Tel:   1-847-491-8163

Email: <a class="moz-txt-link-abbreviated" href="mailto:iraicu@eecs.northwestern.edu">iraicu@eecs.northwestern.edu</a>

Web:   <a class="moz-txt-link-freetext" href="http://www.eecs.northwestern.edu/~iraicu/">http://www.eecs.northwestern.edu/~iraicu/</a>

       <a class="moz-txt-link-freetext" href="https://wiki.cucis.eecs.northwestern.edu/">https://wiki.cucis.eecs.northwestern.edu/</a>

=================================================================


</pre>

</body>

</html>