[Swift-devel] Swift-issues (PBS+NFS Cluster)

Michael Wilde wilde at mcs.anl.gov
Tue May 12 12:10:36 CDT 2009



On 5/12/09 12:04 PM, Michael Wilde wrote:
> Indeed.
> 
> I should note that the intent in Swift is to reduce or eliminate 
> accesses to shared filesystems, so EBS and S3 would become similar.
> 
> The basic model of running a job would be: pull in the data you need, 
> operate on it on local disk, push it back. Cache locally where possible. 
> Only read directly from shared storage when the data is too big.
> 
> This is pretty much what happens now, except that the "pull" and "push" 
> are staged via a site shared filesystem.
(and that it doesn't do any size discrimination, I should add)
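
(For concreteness, here is a minimal sketch of that pull/compute/push model
with the size discrimination added. All paths, names, and the 1 GB cutoff
are hypothetical, and the copies stand in for whatever transport a site
actually provides:)

    import os
    import shutil
    import subprocess

    SIZE_LIMIT = 1 * 1024**3    # hypothetical cutoff: 1 GB
    SHARED = "/shared"          # shared storage (site FS, S3 mount, ...)
    SCRATCH = "/tmp/scratch"    # node-local disk

    def stage_in(name):
        """Pull one input to local disk, unless it is too big to copy."""
        src = os.path.join(SHARED, name)
        if os.path.getsize(src) > SIZE_LIMIT:
            return src                   # too big: read direct from shared storage
        dst = os.path.join(SCRATCH, name)
        if not os.path.exists(dst):      # cache locally where possible
            shutil.copy(src, dst)
        return dst

    def run_job(app, inputs, output):
        """Pull inputs, run the app on local disk, push the result back."""
        if not os.path.isdir(SCRATCH):
            os.makedirs(SCRATCH)
        local_out = os.path.join(SCRATCH, output)
        subprocess.check_call([app] + [stage_in(i) for i in inputs] + [local_out])
        shutil.copy(local_out, os.path.join(SHARED, output))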
> 
> - Mike
> 
> 
> On 5/12/09 11:53 AM, Tim Freeman wrote:
>> On Tue, 12 May 2009 11:42:00 -0500
>> yizhu <yizhu at cs.uchicago.edu> wrote:
>>
>>> That's great. It makes file management between EC2 and S3 much 
>>> easier; I will definitely check it out.
>>>
>>
>> Note that every I/O op to S3 will be significantly slower than EBS (and
>> probably also slower than the local scratch disk). Add in FUSE (userspace)
>> and that only compounds the problem. So if you use this option I would
>> suggest taking measurements to make sure it is acceptable for the task at
>> hand.
>>
>> Tim
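
(A rough sketch of the measurement Tim suggests, comparing read bandwidth on
the three kinds of storage; the paths are hypothetical and assume the S3
bucket is already mounted via s3fs as below:)

    import time

    def time_read(path, bufsize=1024 * 1024):
        """Read a file end to end and report the effective bandwidth."""
        start = time.time()
        nbytes = 0
        f = open(path, "rb")
        while True:
            buf = f.read(bufsize)
            if not buf:
                break
            nbytes += len(buf)
        f.close()
        elapsed = time.time() - start
        print("%s: %.1f MB in %.2f s (%.1f MB/s)"
              % (path, nbytes / 1e6, elapsed, nbytes / 1e6 / elapsed))

    time_read("/mnt/s3/testfile")    # S3 via FUSE
    time_read("/mnt/ebs/testfile")   # EBS volume
    time_read("/tmp/testfile")       # local scratch disk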
>>
>>> Michael Wilde wrote:
>>>> Maybe relevant - there's a FUSE filesystem to mount an S3 bucket as 
>>>> a filesystem:
>>>>
>>>> http://code.google.com/p/s3fs/wiki/FuseOverAmazon
>>>>
>>>> - Mike
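
(If you try it, the invocation is roughly the following; the bucket name and
mount point are hypothetical, credentials go in the AWSACCESSKEYID and
AWSSECRETACCESSKEY environment variables or a passwd file, and the wiki page
above has the current details:)

    s3fs mybucket /mnt/s3    # mount bucket "mybucket" at /mnt/s3 (needs FUSE)
    ls /mnt/s3               # use it like a normal directory
    umount /mnt/s3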
>>>>
>>>>
>>>> On 5/12/09 11:19 AM, Ioan Raicu wrote:
>>>>> Hi,
>>>>>
>>>>> yizhu wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I've now got the Swift system running on Amazon EC2, with Swift
>>>>>> installed on the head node of the cluster. I will soon let the user
>>>>>> submit jobs from their own machine, after I solve the Globus
>>>>>> authentication issues.
>>>>>>
>>>>>>
>>>>>> I think my next step is to write a sample Swift script to check
>>>>>> whether we can have Swift grab input files from S3, process them, and
>>>>>> then write the output files back to S3.
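
(A rough sketch of the job-side half of such a test, using the boto library;
the bucket and key names are hypothetical, and a Swift app would simply be
declared as a wrapper like this around the real executable:)

    import subprocess
    from boto.s3.connection import S3Connection

    conn = S3Connection()   # picks up the AWS keys from the environment
    bucket = conn.get_bucket("my-swift-bucket")   # hypothetical bucket

    # pull the input from S3 to local disk
    bucket.get_key("inputs/data.txt").get_contents_to_filename("/tmp/data.txt")

    # run the application on the local copy
    out = open("/tmp/out.txt", "w")
    subprocess.check_call(["wc", "-l", "/tmp/data.txt"], stdout=out)
    out.close()

    # push the output back to S3
    bucket.new_key("outputs/out.txt").set_contents_from_filename("/tmp/out.txt")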
>>>>>>
>>>>>>
>>>>>> Since each Amazon virtual node has only limited storage space
>>>>>> (1.7 GB for a Small instance, 7.5 GB for a Large instance),
>>>>> Are you sure? That sounds like the amount of RAM each instance
>>>>> gets. The last time I used EC2 (more than a year ago), each instance
>>>>> had a disk of 100GB+, which could be treated as a scratch disk that
>>>>> would be lost when the instance was powered down.
>>>>>> we may need to use EBS (Elastic Block Store) to store temp files
>>>>>> created by Swift. An EBS volume behaves like a hard disk and can be
>>>>>> mounted by any virtual node. But here a problem arises: since a volume
>>>>>> can only be attached to one instance at a time [1], the files stored
>>>>>> on an EBS volume mounted by one node can't be shared with any other
>>>>>> nodes;
>>>>> The EBS sounds like the same thing as the local disk, except that 
>>>>> it is persisted in S3, and can be recovered when the instance is 
>>>>> started again later.
>>>>>
>>>>> You can use these local disks, or EBS disks, to create a 
>>>>> shared/parallel file system, or manage them yourself.
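
(Scripting the volume handling is straightforward, e.g. with the boto
library; the size, zone, and instance ID below are hypothetical, and note
the one-instance-at-a-time restriction:)

    from boto.ec2.connection import EC2Connection

    conn = EC2Connection()   # picks up the AWS keys from the environment
    vol = conn.create_volume(50, "us-east-1a")   # 50 GB, same zone as the instance
    conn.attach_volume(vol.id, "i-12345678", "/dev/sdf")   # ONE instance at a time
    # on the instance: mkfs /dev/sdf once, then mount it, e.g. at /mnt/ebs
    # later: conn.detach_volume(vol.id); the data persists across instances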
>>>>>> we lose the file-sharing ability, which I think is a fundamental
>>>>>> requirement for Swift.
>>>>> The last time I worked with the Workspace Service (now part of 
>>>>> Nimbus), we were able to use NFS to create a shared file system 
>>>>> across our virtual cluster. This allowed us to run Swift without 
>>>>> modifications on our virtual cluster. Some of the more recent work on
>>>>> Swift might have eased up the requirements for a shared file system, so
>>>>> you might be able to run Swift without one if you configure it just
>>>>> right. Mike, is this true, or are we not quite there yet?
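
(For reference, the NFS side of such a virtual cluster is just the standard
setup; the addresses are hypothetical. On the head node, /etc/exports would
contain something like:)

    # /etc/exports on the head node: share /home with the worker nodes
    /home 10.0.0.0/24(rw,sync,no_root_squash)

(then run "exportfs -ra" there, and on each worker mount head:/home on
/home.)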
>>>>>
>>>>> Ioan
>>>>>>
>>>>>> -Yi
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Michael Wilde wrote:
>>>>>>>
>>>>>>> On 5/7/09 12:54 PM, Tim Freeman wrote:
>>>>>>>> On Thu, 07 May 2009 11:39:40 -0500
>>>>>>>> Yi Zhu <yizhu at cs.uchicago.edu> wrote:
>>>>>>>>
>>>>>>>>> Michael Wilde wrote:
>>>>>>>>>> Very good!
>>>>>>>>>>
>>>>>>>>>> Now, what kind of tests can you do next?
>>>>>>>>> Next, I will try to get Swift running on Amazon EC2.
>>>>>>>>>
>>>>>>>>>> Can you exercise the cluster with an interesting workflow?
>>>>>>>>> Yes. Are there any more complex samples or tools I can use
>>>>>>>>> (rather than first.swift) to test Swift performance? Is there any
>>>>>>>>> benchmark available I can compare against?
>>>>>>>>>
>>>>>>>>>> How large of a cluster can you assemble in a Nimbus workspace ?
>>>>>>>>> Since the VM image I use to test Swift is based on an NFS shared
>>>>>>>>> file system, the performance may not be satisfactory if we have a
>>>>>>>>> large cluster. After I get Swift running on Amazon EC2, I will try
>>>>>>>>> to make a dedicated VM image using GPFS or any other shared file
>>>>>>>>> system you recommend.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Can you aggregate VM's from a few different physical clusters 
>>>>>>>>>> into one Nimbus workspace?
>>>>>>>>> I don't think so. Tim may comment on it.
>>>>>>>> There is some work going on right now making auto-configuration
>>>>>>>> easier to do over multiple clusters (that is possible now, it's just
>>>>>>>> very 'manual' and non-ideal, unlike with one physical cluster). You
>>>>>>>> wouldn't really want to do NFS across a WAN, though.
>>>>>>> Indeed. Now that I think this through more clearly, one workspace 
>>>>>>> == one cluster == one Swift "site", so we could aggregate the 
>>>>>>> resources of multiple workspaces through Swift to execute a 
>>>>>>> multi-site workflow.
>>>>>>>
>>>>>>> - Mike
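
(Concretely, each workspace would then get its own entry in the Swift sites
file, along these lines; the handles, hostnames, and paths are hypothetical:)

    <config>
      <pool handle="cloud-cluster-1">
        <gridftp url="gsiftp://head1.example.org"/>
        <execution provider="gt2" jobmanager="pbs" url="head1.example.org"/>
        <workdirectory>/home/yizhu/swiftwork</workdirectory>
      </pool>
      <pool handle="cloud-cluster-2">
        <gridftp url="gsiftp://head2.example.org"/>
        <execution provider="gt2" jobmanager="pbs" url="head2.example.org"/>
        <workdirectory>/home/yizhu/swiftwork</workdirectory>
      </pool>
    </config>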
>>>>>>>
>>>>>>>>>> What's the largest cluster you can assemble with Nimbus?
>>>>>>>>> I am not quite sure; I will do some tests on it soon. Since it is
>>>>>>>>> an EC2-like cloud, it should easily be configured as a cluster with
>>>>>>>>> hundreds of nodes. Tim may comment on it.
>>>>>>>> I've heard of EC2 deployments in the 1000s at once; it's up to your
>>>>>>>> EC2 account limitations (they seem pretty efficient about raising
>>>>>>>> your quota ever higher). The Nimbus installation at Teraport maxes
>>>>>>>> out at 16 nodes; there are other 'science clouds' but I don't know
>>>>>>>> their node counts. EC2 is the place where you will really be able
>>>>>>>> to test scaling something.
>>>>>>>>
>>>>>>>> Tim
>>>>>>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel



