[Swift-devel] Swift-issues (PBS+NFS Cluster)

yizhu yizhu at cs.uchicago.edu
Tue May 12 11:42:00 CDT 2009


That's great. It would make file management between EC2 and S3 much 
easier; I will definitely check it out.
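
If it works the way the wiki Mike points to below describes, then once 
the bucket is mounted (something like "s3fs mybucket /mnt/s3" -- I'm 
guessing at the exact mount command), S3 objects should look like 
ordinary files. A minimal Python sketch of what staging would then 
reduce to (the mount point and file names are made up):

  import shutil

  # With the bucket mounted at /mnt/s3 via s3fs, stage-in and
  # stage-out become plain local file copies.
  shutil.copy('/mnt/s3/input.txt', '/tmp/work/input.txt')
  shutil.copy('/tmp/work/output.txt', '/mnt/s3/output.txt')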

Michael Wilde wrote:
> Maybe relevant - there's a FUSE filesystem to mount an S3 bucket as a 
> filesystem:
> 
> http://code.google.com/p/s3fs/wiki/FuseOverAmazon
> 
> - Mike
> 
> 
> On 5/12/09 11:19 AM, Ioan Raicu wrote:
>> Hi,
>>
>> yizhu wrote:
>>> Hi,
>>>
>>> I've now got the Swift system running on Amazon EC2, with Swift 
>>> installed on the head node of the cluster. I will soon let users 
>>> submit jobs from their own machines, once I solve the Globus 
>>> authentication issues.
>>>
>>>
>>> I think my next step is to write a sample Swift script to check 
>>> whether we can have Swift grab input files from S3, execute the 
>>> workflow, and then write the output files back to S3.
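>>>
>>> Roughly the pattern I have in mind, as a plain Python sketch using 
>>> the boto library (the bucket name, key names, and command here are 
>>> made up for illustration; in the real test the stage-in/stage-out 
>>> steps would be wrapped as Swift app invocations):
>>>
>>>   import subprocess
>>>   import boto
>>>
>>>   # Stage in: fetch the input file from S3 to local disk.
>>>   s3 = boto.connect_s3()  # credentials from environment/boto config
>>>   bucket = s3.get_bucket('my-swift-test-bucket')  # hypothetical
>>>   bucket.get_key('input.txt').get_contents_to_filename('input.txt')
>>>
>>>   # Execute: run the application on the local copy.
>>>   with open('output.txt', 'w') as out:
>>>       subprocess.check_call(['wc', '-l', 'input.txt'], stdout=out)
>>>
>>>   # Stage out: write the output file back to S3.
>>>   bucket.new_key('output.txt').set_contents_from_filename('output.txt')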
>>>
>>>
>>> Since each Amazon virtual node has only limited storage space 
>>> (1.7GB for a Small instance, 7.5GB for a Large instance), 
>> Are you sure? That sounds like the amount of RAM each instance gets. 
>> The last time I used EC2 (more than a year ago), each instance had 
>> 100GB+ of local disk, which could be treated as a scratch disk that 
>> would be lost when the instance was powered down.
>>> we may need to use EBS (Elastic Block Store) to store the temporary 
>>> files created by Swift. An EBS volume behaves like a hard disk and 
>>> can be mounted by any virtual node. But a problem arises here: since 
>>> a volume can only be attached to one instance at a time [1], the 
>>> files stored on an EBS volume mounted by one node cannot be shared 
>>> with any other nodes; 
>> The EBS sounds like the same thing as the local disk, except that it 
>> is persisted in S3, and can be recovered when the instance is started 
>> again later.
>>
>> You can use these local disks, or EBS disks, to create a 
>> shared/parallel file system, or manage them yourself.
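>>
>> Creating and attaching volumes is scriptable, too. A minimal boto 
>> sketch (the size, zone, instance id, and device name are made up), 
>> which also shows where the sharing limitation comes from, since 
>> attach_volume takes exactly one instance:
>>
>>   import boto
>>
>>   ec2 = boto.connect_ec2()  # credentials from environment/boto config
>>   # Create a 10 GB volume in the same availability zone as the node.
>>   vol = ec2.create_volume(10, 'us-east-1a')
>>   # A volume can be attached to only one instance at a time.
>>   ec2.attach_volume(vol.id, 'i-12345678', '/dev/sdh')
>>   # After attaching, the node formats/mounts /dev/sdh like any disk.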
>>> we lose the file-sharing ability, which I think is a fundamental 
>>> requirement for Swift.
>> The last time I worked with the Workspace Service (now part of 
>> Nimbus), we were able to use NFS to create a shared file system across 
>> our virtual cluster. This allowed us to run Swift without 
>> modifications on our virtual cluster. Some of the more recent work on 
>> Swift might have eased up some of the requirements for a shared file 
>> system, so you might be able to run Swift without a shared file 
>> system, if you configure Swift just right. Mike, is this true, or 
>> are we not quite there yet?
>>
>> Ioan
>>>
>>>
>>> -Yi
>>>
>>>
>>>
>>>
>>>
>>> Michael Wilde wrote:
>>>>
>>>>
>>>> On 5/7/09 12:54 PM, Tim Freeman wrote:
>>>>> On Thu, 07 May 2009 11:39:40 -0500
>>>>> Yi Zhu <yizhu at cs.uchicago.edu> wrote:
>>>>>
>>>>>> Michael Wilde wrote:
>>>>>>> Very good!
>>>>>>>
>>>>>>> Now, what kind of tests can you do next?
>>>>>> Next, I will try to get Swift running on Amazon EC2.
>>>>>>
>>>>>>> Can you exercise the cluster with an interesting workflow?
>>>>>> Yes. Are there any complex samples/tools I can use (rather than 
>>>>>> first.swift) to test Swift performance? Is there any benchmark 
>>>>>> available that I can compare against?
>>>>>>
>>>>>>> How large of a cluster can you assemble in a Nimbus workspace?
>>>>>> Since the vm-image I use to test Swift is based on an NFS shared 
>>>>>> file system, the performance may not be satisfactory if we have 
>>>>>> a large cluster. After I get Swift running on Amazon EC2, I will 
>>>>>> try to make a dedicated vm-image using GPFS or another shared 
>>>>>> file system you recommend.
>>>>>>
>>>>>>
>>>>>>> Can you aggregate VMs from a few different physical clusters 
>>>>>>> into one Nimbus workspace?
>>>>>> I don't think so. Tim may comment on it.
>>>>>
>>>>> There is some work going on right now to make auto-configuration 
>>>>> easier over multiple clusters (that is possible now, it's just 
>>>>> very 'manual' and non-ideal, unlike with one physical cluster). 
>>>>> You wouldn't really want to do NFS across a WAN, though.
>>>>
>>>> Indeed. Now that I think this through more clearly, one workspace == 
>>>> one cluster == one Swift "site", so we could aggregate the resources 
>>>> of multiple workspaces through Swift to execute a multi-site workflow.
>>>>
>>>> - Mike
>>>>
>>>>>>
>>>>>>> What's the largest cluster you can assemble with Nimbus?
>>>>>> I am not quite sure; I will do some tests on it soon. Since it 
>>>>>> is an EC2-like cloud, it should be easy to configure a cluster 
>>>>>> with hundreds of nodes. Tim may comment on it.
>>>>>
>>>>> I've heard of EC2 deployments in the 1000s at once; it's up to 
>>>>> your EC2 account limitations (they seem pretty efficient at 
>>>>> raising your quota ever higher). The Nimbus installation at 
>>>>> Teraport maxes out at 16 nodes; there are other 'science clouds', 
>>>>> but I don't know their node counts. EC2 is the place where you 
>>>>> will really be able to test scaling something.
>>>>>
>>>>> Tim
>>>>
>>>
>>>
>>
> 



