[MPICH2-dev] Fwd: [SiCortex-ANL] Re: visit day 1 summary

Rusty Lusk lusk at mcs.anl.gov
Wed Sep 12 09:30:46 CDT 2007


There is a cryptic note in here about user-space RDMA on the
SiCortex, which the SiCortex guys might already be using for parts of
MPI. But we should probably learn more about it. Acquisition of this
6000-processor machine for research purposes seems to be on track at
this point.

Begin forwarded message:

> From: Kazutomo Yoshii <kazutomo at mcs.anl.gov>
> Date: September 11, 2007 8:29:56 PM CDT
> To: sicortex-anl at googlegroups.com
> Subject: [SiCortex-ANL] Re: visit day 1 summary
> Reply-To: sicortex-anl at googlegroups.com
>
>
>
> Thank you for summarizing today's work, Narayan.
>
> As Narayan described, the later boot stages use nbd, but the early
> boot stages (before the Linux kernel) are interesting. There are
> three of them, something like:
> stage1: the CPU loads instructions (one by one!) from JTAG,
>         initializes registers, and loads stage2 (into L1 cache?)
> stage2: loads stage3 into L2 cache
> stage3: initializes main memory and copies vmlinux into it
>
> The binaries at each stage are very small. According to Larry,
> those early stages take about 45 seconds, and I don't believe there
> is much room to optimize them. (Putting the binaries into PROM
> might reduce boot time, but I guess they wouldn't like that.)
>
> I'm just using scethernet as a short-term solution, but to beat
> Lustre we have to master SiCortex RDMA programming. Lustre already
> uses RDMA natively.
>
> The RDMA API is not documented yet (I think they are still
> designing it). Larry also gave me an overview of RDMA and showed me
> some code. The API works well from userspace: it is completely
> memory-mapped I/O, so we don't have to touch any weird special
> registers, which is very nice.
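>
> To make "completely memory-mapped" concrete, here is a minimal
> sketch of the general pattern such interfaces follow. This is not
> the actual SiCortex API (which, as noted, is undocumented); the
> device name, descriptor layout, and addresses below are all
> invented placeholders:
>
>   #include <fcntl.h>
>   #include <stdint.h>
>   #include <stdio.h>
>   #include <sys/mman.h>
>   #include <unistd.h>
>
>   /* Hypothetical descriptor format; the real layout is unknown. */
>   struct dma_desc {
>       uint64_t local_addr;   /* source buffer (registered memory) */
>       uint64_t remote_addr;  /* destination window on the peer */
>       uint32_t length;       /* bytes to transfer */
>       uint32_t flags;        /* hypothetical doorbell/"go" bit */
>   };
>
>   int main(void)
>   {
>       int fd = open("/dev/scdma0", O_RDWR);   /* placeholder name */
>       if (fd < 0) { perror("open"); return 1; }
>
>       /* Map the engine's command ring into our address space; from
>        * here on, posting a transfer is just ordinary stores, with
>        * no syscall or special-register access per operation. */
>       volatile struct dma_desc *ring =
>           mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
>       if (ring == MAP_FAILED) { perror("mmap"); return 1; }
>
>       ring[0].local_addr  = 0x10000;  /* would be a registered buffer */
>       ring[0].remote_addr = 0x20000;  /* peer's exposed address */
>       ring[0].length      = 4096;
>       ring[0].flags       = 1;        /* hypothetical "go" bit */
>
>       munmap((void *)ring, 4096);
>       close(fd);
>       return 0;
>   }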
>
> Thanks,
> Kaz
>
>> Anthony, Kaz, and I have nearly finished our first day at
>> SiCortex. Here is where things are.
>>
>> A first bit of news is that SiCortex has abandoned Lustre for boot,
>> at least for now (they may revisit this later). Instead, they are
>> using an nbd device over scethernet (re-exported from an internal
>> node), so we don't need to do much work to completely replace
>> Lustre on the system overall; we just need to get PVFS up as a
>> parallel filesystem.
>>
>> As far as system I/O goes, there are two issues. The first is that
>> the I/O (gateway) nodes are wimpy. The second is that we need to
>> forward traffic from the internal nodes over the internal network
>> to the I/O nodes and then pass the data out of the machine. They
>> are really excited about the prospect of using ZOID for this; it
>> seems like they haven't really thought a ton about I/O so far.
>> (They seem to be looking to us to motivate them in this regard.)
>>
>> The best news so far is that Kaz has managed to boot a kernel that
>> he built on the compute nodes. PVFS doesn't yet work; it appears
>> that the MIPS kernel doesn't export a function that the PVFS module
>> needs. Kaz expects this to be pretty easy to fix. He has copious
>> details on how the boot process works at this point.
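>>
>> For reference, this kind of fix is typically a one-line kernel
>> patch that exports the symbol to loadable modules. A minimal
>> sketch, with a placeholder name since the missing function isn't
>> identified above:
>>
>>   /* In the kernel source file that defines the function the PVFS
>>    * module needs ("the_missing_helper" is a placeholder): */
>>   #include <linux/module.h>
>>
>>   EXPORT_SYMBOL(the_missing_helper);  /* visible to pvfs2.ko */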
>>
>> Anthony has spent the day on the application environment. There
>> were some issues getting the compilers up and running, but he has
>> successfully run an application and gotten Jumpshot running on the
>> logfile. So far, it appears that the timing data is pretty low
>> resolution, so the profiling trace is less than useful. He's got a
>> handle on compiling and running codes. There are a few apps issues:
>>  - MPI_WTIME_IS_GLOBAL is misleading
>>  - no MPI-IO in the provided MPI (this is expected)
>>  - MPI_Wtime is too low resolution to be useful (a quick check is
>>    sketched below)
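>>
>> A quick way to verify both timer complaints with standard MPI
>> calls (a minimal sketch; nothing SiCortex-specific here):
>>
>>   #include <stdio.h>
>>   #include <mpi.h>
>>
>>   int main(int argc, char **argv)
>>   {
>>       int flag, *is_global;
>>
>>       MPI_Init(&argc, &argv);
>>
>>       /* MPI_WTIME_IS_GLOBAL is a predefined attribute on
>>        * MPI_COMM_WORLD; the attribute value is a pointer to int. */
>>       MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_WTIME_IS_GLOBAL,
>>                         &is_global, &flag);
>>       if (flag)
>>           printf("MPI_WTIME_IS_GLOBAL = %d\n", *is_global);
>>
>>       /* MPI_Wtick reports the timer resolution in seconds. */
>>       printf("MPI_Wtick = %g s\n", MPI_Wtick());
>>
>>       MPI_Finalize();
>>       return 0;
>>   }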
>>
>> I've spent the day on logistics and I/O. The logistics have worked  
>> out
>> reasonably, and we've worked up a plan for I/O testing and software
>> setup while we are here.
>>
>> They are getting a real server together for tomorrow to run PVFS.
>> We are going to try using IP forwarding to start with, to see how
>> things perform (we expect not too well). ZOID might be another
>> option: while Kamil's ZOID won't work for this system without some
>> work, maybe Ivan's prototype will. We are going to look at these
>> things tomorrow morning.
>>  -nld + Anthony + Kaz
>>
