[Swift-devel] Swift-issues (PBS+NFS Cluster)
yizhu
yizhu at cs.uchicago.edu
Tue May 12 11:42:00 CDT 2009
That's great. It makes file management between EC2 and S3 much easier; I
will definitely check it out.
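For reference, a minimal s3fs mount might look like the sketch below. The bucket name and mount point are made up, and it assumes s3fs is installed and an S3 access-key file exists:

```shell
# Sketch only: mount an S3 bucket as a local filesystem via FUSE.
# "swift-data" and /mnt/s3 are hypothetical names; credentials are
# assumed to be in ~/.passwd-s3fs (format ACCESS_KEY:SECRET_KEY,
# permissions 600).
mkdir -p /mnt/s3
s3fs swift-data /mnt/s3 -o passwd_file=${HOME}/.passwd-s3fs

ls /mnt/s3             # bucket objects now appear as ordinary files
fusermount -u /mnt/s3  # unmount when done
```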
Michael Wilde wrote:
> Maybe relevant - there's a FUSE filesystem to mount an S3 bucket as a
> filesystem:
>
> http://code.google.com/p/s3fs/wiki/FuseOverAmazon
>
> - Mike
>
>
> On 5/12/09 11:19 AM, Ioan Raicu wrote:
>> Hi,
>>
>> yizhu wrote:
>>> Hi,
>>>
>>> I've now got Swift running on Amazon EC2, with Swift installed on
>>> the head node of the cluster. I will soon let users submit jobs from
>>> their own machines, once I solve the Globus authentication issues.
>>>
>>>
>>> I think my next step is to write a sample Swift script to check
>>> whether Swift can grab input files from S3, run the job, and then
>>> write the output files back to S3.
>>>
>>>
>>> Since each Amazon virtual node has only limited storage space
>>> (1.7 GB for a Small instance, 7.5 GB for a Large instance),
>> Are you sure? That sounds like the amount of RAM each instance
>> gets. The last time I used EC2 (more than a year ago), each instance
>> had a disk of 100 GB+, which could be treated as a scratch disk
>> that would be lost when the instance was powered down.
>>> we may need to use EBS (Elastic Block Store) to store temp files
>>> created by Swift. An EBS volume behaves like a hard disk and can be
>>> mounted by any virtual node. But a problem arises here: since a
>>> volume can only be attached to one instance at a time [1], files
>>> stored on an EBS volume mounted by one node cannot be shared with
>>> any other node;
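A sketch of the create/attach/mount workflow being described, using the EC2 API tools of that era. The volume and instance IDs, size, zone, and paths are placeholders, not real values:

```shell
# Sketch only: create an EBS volume, attach it to a single instance,
# then format and mount it on that instance. IDs below are made up.
ec2-create-volume -s 50 -z us-east-1a     # note the returned vol-xxxxxxxx
ec2-attach-volume vol-xxxxxxxx -i i-yyyyyyyy -d /dev/sdf

# On the instance: create a filesystem (first use only) and mount it.
mkfs -t ext3 /dev/sdf
mkdir -p /mnt/ebs
mount /dev/sdf /mnt/ebs
```

As the thread notes, the volume is visible only on the one instance it is attached to; sharing it requires re-exporting it (e.g. over NFS).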
>> EBS sounds like the same thing as the local disk, except that it is
>> persisted in S3 and can be recovered when the instance is started
>> again later.
>>
>> You can use these local disks, or EBS disks, to create a
>> shared/parallel file system, or manage them yourself.
>>> we lose the file-sharing ability, which I think is a fundamental
>>> requirement for Swift.
>> The last time I worked with the Workspace Service (now part of
>> Nimbus), we were able to use NFS to create a shared file system across
>> our virtual cluster. This allowed us to run Swift without
>> modifications on our virtual cluster. Some of the more recent work on
>> Swift might have relaxed some of the requirements for a shared file
>> system, so you might be able to run Swift without one, if you
>> configure Swift just right. Mike, is this true, or are we not quite
>> there yet?
>>
>> Ioan
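The NFS setup described above might be sketched roughly as follows. The paths and the 10.0.0.0/24 subnet are assumptions, and the NFS server packages must already be installed on the head node:

```shell
# Sketch only: share a head-node directory over NFS so every worker in
# the virtual cluster sees the same Swift work directory.

# On the head node: export the shared directory to the cluster subnet.
echo '/home/swift 10.0.0.0/24(rw,sync,no_subtree_check)' >> /etc/exports
exportfs -ra

# On each worker node: mount it at the same path.
mkdir -p /home/swift
mount -t nfs head-node:/home/swift /home/swift
```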
>>>
>>>
>>> -Yi
>>>
>>>
>>>
>>>
>>>
>>> Michael Wilde wrote:
>>>>
>>>>
>>>> On 5/7/09 12:54 PM, Tim Freeman wrote:
>>>>> On Thu, 07 May 2009 11:39:40 -0500
>>>>> Yi Zhu <yizhu at cs.uchicago.edu> wrote:
>>>>>
>>>>>> Michael Wilde wrote:
>>>>>>> Very good!
>>>>>>>
>>>>>>> Now, what kind of tests can you do next?
>>>>>> Next, I will try to get Swift running on Amazon EC2.
>>>>>>
>>>>>>> Can you exercise the cluster with an interesting workflow?
>>>>>> Yes. Are there any complex samples or tools I can use (rather
>>>>>> than first.swift) to test Swift performance? Is there any
>>>>>> benchmark available I can compare with?
>>>>>>
>>>>>>> How large of a cluster can you assemble in a Nimbus workspace ?
>>>>>> Since the VM image I use to test Swift is based on an NFS shared
>>>>>> file system, performance may not be satisfactory if we have a
>>>>>> large cluster. After I get Swift running on Amazon EC2, I will
>>>>>> try to make a dedicated VM image using GPFS or any other shared
>>>>>> file system you recommend.
>>>>>>
>>>>>>
>>>>>>> Can you aggregate VM's from a few different physical clusters
>>>>>>> into one Nimbus workspace?
>>>>>> I don't think so. Tim may comment on it.
>>>>>
>>>>> There is some work going on right now to make auto-configuration
>>>>> over multiple clusters easier (that is possible now; it's just
>>>>> very 'manual' and non-ideal, unlike with one physical cluster).
>>>>> You wouldn't really want to do NFS across a WAN, though.
>>>>
>>>> Indeed. Now that I think this through more clearly, one workspace ==
>>>> one cluster == one Swift "site", so we could aggregate the resources
>>>> of multiple workspaces through Swift to execute a multi-site workflow.
>>>>
>>>> - Mike
>>>>
>>>>>>
>>>>>>> What's the largest cluster you can assemble with Nimbus?
>>>>>> I am not quite sure; I will do some tests on it soon. Since it
>>>>>> is an EC2-like cloud, it should be easy to configure a cluster
>>>>>> with hundreds of nodes. Tim may comment on it.
>>>>>
>>>>> I've heard of EC2 deployments in the 1000s at once; it's up to
>>>>> your EC2 account limits (they seem pretty efficient about raising
>>>>> your quota ever-higher). The Nimbus installation at Teraport maxes
>>>>> out at 16 nodes; there are other 'science clouds', but I don't
>>>>> know their node counts. EC2 is the place where you will really be
>>>>> able to test scaling.
>>>>>
>>>>> Tim
>>>>
>>>
>>>
>>
>