[Swift-devel] Iterative PageRank in Swift

ZHAO ZHANG zhaozhang at uchicago.edu
Sun Jun 2 15:41:14 CDT 2013


Hi Mike,

On Jun 2, 2013, at 1:29 PM, Michael Wilde wrote:

> I should clarify that by "Is the partition() app coded to do this, including making the parent directories ./_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array/ ?" I meant:
> 
> Is the partition() app coded to *return these exact 16 files*, including making the parent directories ./_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array/ ?

Ah, I see what you mean. Here is my partition app definition:
app (file t[]) partition (file f) {
    partition @filename(f) "16";
}

I did not pass the output files as a parameter, and I think that is why it does not work.
The command executes as:
bin/partition.py input_file 16 

and it produces 16 output files with the naming convention input_file-0, input_file-1, ..., input_file-15.
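For Swift to find those outputs, the names printed by the ext mapper (bin/mapper.sh, which was not posted) must be exactly the names partition.py writes, so its source parameter has to be the same path that partition.py receives as its first argument. A minimal Python sketch of that mapping logic follows. It assumes the ext mapper convention of printing one `[index] filename` line per array element, and it assumes the mapper parameters arrive as `-source NAME -scale N` pairs; the script and its function names are hypothetical stand-ins for mapper.sh.

```python
import sys


def ext_mapper_lines(source, scale):
    """Return one '[k] <source>-k' line per partition, k = 0 .. scale-1.

    Hypothetical re-implementation of bin/mapper.sh; it assumes the
    partition app names its outputs <input>-0 ... <input>-(scale-1).
    """
    return [f"[{k}] {source}-{k}" for k in range(scale)]


def parse_args(argv):
    """Parse '-name value' pairs (assumed ext mapper calling convention)."""
    params = {}
    for i in range(0, len(argv) - 1, 2):
        params[argv[i].lstrip("-")] = argv[i + 1]
    return params


def main(argv):
    params = parse_args(argv)
    for line in ext_mapper_lines(params["source"], int(params["scale"])):
        print(line)


# Only act when invoked with arguments, e.g. by Swift's ext mapper.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Run as `mapper.py -source temp/links-part-0001 -scale 16`, it would print `[0] temp/links-part-0001-0` through `[15] temp/links-part-0001-15`.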

I can now think of one way to correct this.
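One way to do that, sketched here as an assumption rather than the final fix, is to let Swift choose the output names and pass them to the app, e.g. `partition @filename(f) @filenames(t);` in the app body, so the script writes exactly the files Swift checks for. The Python sketch below is a hypothetical replacement for bin/partition.py: the real partitioning rule was not posted, so routing each line by the CRC32 of its first field is a stand-in. It also creates each output's parent directory (for example the ./_concurrent/... directories Mike mentions) and creates every output even if some end up empty, since Swift reports any missing output as an error.

```python
import os
import sys
import zlib


def partition(input_path, output_paths):
    """Split input_path line by line into len(output_paths) files.

    Hypothetical rule: a line goes to output k, where k is the CRC32 of
    its first whitespace-separated field modulo the number of outputs,
    so all lines for the same node land in the same partition.
    """
    n = len(output_paths)
    for path in output_paths:
        parent = os.path.dirname(path)
        if parent:
            os.makedirs(parent, exist_ok=True)  # e.g. _concurrent/... dirs
    # Open (and thus create) every output up front, even ones left empty.
    outs = [open(path, "w") for path in output_paths]
    try:
        with open(input_path) as src:
            for line in src:
                fields = line.split()
                if not fields:
                    continue  # skip blank lines
                k = zlib.crc32(fields[0].encode()) % n
                outs[k].write(line)
    finally:
        for f in outs:
            f.close()


# Usage: partition.py INPUT OUT_0 OUT_1 ... OUT_{n-1}
if __name__ == "__main__" and len(sys.argv) > 2:
    partition(sys.argv[1], sys.argv[2:])
```

CRC32 is used instead of Python's built-in hash() because string hashing is randomized per process, which would send the same node to different partitions on different runs.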

I will be offline for the next couple of hours and will let you know my progress once I figure it out. You don't have to try anything for now, as I think I know where the problem is.

best
Zhao

> 
> I.e., partition() needs to look at its first command-line arg $1 and create $2 (in this case, 16) filenames just like $1, including directories, relative to the current working dir $PWD.
> 
> Is it coded to do that?
> 
> - Mike
> 
> ----- Original Message -----
>> From: "Michael Wilde" <wilde at mcs.anl.gov>
>> To: "ZHAO ZHANG" <zhaozhang at uchicago.edu>
>> Cc: "swift-devel" <swift-devel at ci.uchicago.edu>
>> Sent: Sunday, June 2, 2013 3:21:24 PM
>> Subject: Re: [Swift-devel] Iterative PageRank in Swift
>> 
>> Zhao,
>> 
>> The immediate failure in the run below seems to be due to partition()
>> not creating the following 16 files below its *work* directory:
>> 
>> _concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-0
>> _concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-1
>> ...
>> _concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-15
>> 
>> I'm assuming mapper.sh gets the name of an element of fn[], which has
>> been mapped by the concurrent mapper, and returns an array mapping
>> of the original name suffixed by -0 through -15?
>> 
>> Is that the expected behavior?
>> 
>> Is the partition() app coded to do this, including making the parent
>> directories ./_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array/ ?
>> 
>> I can't yet explain why/how this worked when you serialized the main
>> (open code) function, but there's enough going on in this code that
>> I'd carefully check the behavior of each stage.
>> 
>> - Mike
>> 
>> ----- Original Message -----
>>> From: "Michael Wilde" <wilde at mcs.anl.gov>
>>> To: "ZHAO ZHANG" <zhaozhang at uchicago.edu>
>>> Cc: "swift-devel" <swift-devel at ci.uchicago.edu>
>>> Sent: Sunday, June 2, 2013 2:48:58 PM
>>> Subject: Re: [Swift-devel] Iterative PageRank in Swift
>>> 
>>> Zhao, I'm studying this. Can you post a copy of mapper.sh?
>>> 
>>> Can you put a copy (both the failing and the working version)
>>> on a local machine here that I can experiment with?
>>> 
>>> Thanks,
>>> 
>>> - Mike
>>> 
>>> 
>>> ----- Original Message -----
>>>> From: "ZHAO ZHANG" <zhaozhang at uchicago.edu>
>>>> To: "Swift Devel" <swift-devel at ci.uchicago.edu>
>>>> Sent: Sunday, June 2, 2013 2:12:30 PM
>>>> Subject: [Swift-devel] Iterative PageRank in Swift
>>>> 
>>>> Dear all,
>>>> 
>>>> I have been working with my cousin on an iterative PageRank
>>>> implementation in Swift for his graduation project. We have
>>>> encountered a problem: we are trying to use "file fn[]" as
>>>> intermediate data between two stages, but it does not work well.
>>>> 
>>>> The app and stage definitions look like this:
>>>> zhaozhang at bigben:/var/tmp/workplace$ cat PageRank-new.swift
>>>> type file;
>>>> 
>>>> app (file t) distribution (file f, file s) {
>>>>    distribution @filename(f) @filename(t) @filename(s);
>>>> }
>>>> 
>>>> app (file t[]) partition (file f) {
>>>>    partition @filename(f) "16";
>>>> }
>>>> 
>>>> app (file t) aggregation (file f[]){
>>>>    aggregation @filename(t) @filenames(f);
>>>> }
>>>> 
>>>> app (file t) cat (file f[]){
>>>>    cat @filenames(f) stdout=@filename(t);
>>>> }
>>>> 
>>>> app (file t) sort (file f){
>>>>    sort "-nrk 2" @filename(f) stdout=@filename(t);
>>>> }
>>>> 
>>>> (file fn[])map(file input[], file score){
>>>>   foreach f,i in input {
>>>>      file c<regexp_mapper;
>>>>         source=@f,
>>>>         match="input/(.*)",
>>>>         transform="temp/\\1">;
>>>>      c = distribution(f, score);
>>>>      fn[i] = c;
>>>>   }
>>>> }
>>>> 
>>>> (file matrix[][])shuffle(file fn[]){
>>>>   foreach c, j in fn{
>>>>      file output[] <ext; exec="bin/mapper.sh", source=@filename(c),
>>>>                     scale=16>;
>>>>      output = partition(c);
>>>>      foreach f, k in output{
>>>>         matrix[k][j] = output[k];
>>>>      }
>>>>   }
>>>> }
>>>> 
>>>> (file final)reduce(file matrix[][]){
>>>>   file result[];
>>>>   foreach fl, k in matrix{
>>>>      file output <single_file_mapper;
>>>>                   file=@strcat("result/result-", @toString(k))>;
>>>>      output = aggregation(fl);
>>>>      result[k] = output;
>>>>   }
>>>> 
>>>>   final = cat(result);
>>>> }
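The shuffle() and reduce() stages above amount to a transpose: output k of partition(fn[j]) becomes matrix[k][j], so row k gathers partition k from every mapped file and aggregation() turns that row into result/result-k. A small Python model of just that indexing, with plain strings standing in for Swift file handles and assuming every file is split into the same number of partitions (16 here); shuffle_matrix is an illustrative name, not part of the Swift script:

```python
def shuffle_matrix(partitions):
    """Transpose per-file partitions into per-partition rows.

    partitions[j][k] models output k of partition(fn[j]); row k of the
    result models matrix[k], the array reduce() passes to aggregation().
    """
    if not partitions:
        return []
    num_parts = len(partitions[0])
    return [[partitions[j][k] for j in range(len(partitions))]
            for k in range(num_parts)]


parts = [[f"fn{j}-{k}" for k in range(16)] for j in range(3)]
matrix = shuffle_matrix(parts)
print(matrix[5])  # ['fn0-5', 'fn1-5', 'fn2-5']
```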
>>>> 
>>>> 
>>>> If I write the main function as below, it does not work: it seems the
>>>> intermediate files are not mapped to the expected file names.
>>>> 
>>>> //below are main function
>>>> file input[] <filesys_mapper; location="input", prefix="links-">;
>>>> file matrix[][];
>>>> file fn[];
>>>> 
>>>> int loop=0;
>>>> file score <single_file_mapper; file=@strcat("score.txt.", @toString(loop))>;
>>>> file final <single_file_mapper; file=@strcat("score.txt.", @toString(loop+1))>;
>>>> file sorted <single_file_mapper; file=@strcat("score.txt.", @toString(loop+1), ".sorted")>;
>>>> 
>>>> fn = map(input, score);
>>>> matrix = shuffle(fn);
>>>> final = reduce(matrix);
>>>> sorted = sort(final);
>>>> 
>>>> The execution failed with the following message:
>>>> Swift 0.94 swift-r6492 cog-r3658
>>>> 
>>>> RunID: 20130602-1348-yresjj56
>>>> Progress:  time: Sun, 02 Jun 2013 13:48:49 -0500
>>>> Progress:  time: Sun, 02 Jun 2013 13:48:51 -0500  Selecting site:3  Checking status:1
>>>> Progress:  time: Sun, 02 Jun 2013 13:48:52 -0500  Selecting site:3  Checking status:1  Finished successfully:2
>>>> Progress:  time: Sun, 02 Jun 2013 13:48:53 -0500  Selecting site:3  Checking status:1  Finished successfully:4
>>>> Execution failed:
>>>> 	Exception in partition:
>>>>    Arguments: [temp/links-part-0001, 16]
>>>>    Host: localhost
>>>>    Directory: PageRank-new-20130602-1348-yresjj56/jobs/i/partition-ishh5dal
>>>>    stderr.txt:
>>>>    stdout.txt:
>>>> Caused by:
>>>> 	The following output files were not created by the application:
>>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-0,
>>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-1,
>>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-2,
>>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-3,
>>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-4,
>>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-5,
>>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-6,
>>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-7,
>>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-8,
>>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-9,
>>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-10,
>>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-11,
>>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-12,
>>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-13,
>>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-14,
>>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-15
>>>> 	partition, PageRank-new.swift, line 38
>>>> 	shuffle, PageRank-new.swift, line 88
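The two name lists in this message show the mismatch directly: partition was invoked with temp/links-part-0001, so by the naming convention described in Zhao's reply it writes temp/links-part-0001-0 through -15, while Swift checks for the elements mapped under _concurrent/.../elt-3-*. A hypothetical Python check of that comparison, assuming mapper.sh appends -k to its source as Mike supposes (the helper names are illustrative):

```python
# Array directory copied from the error message above.
ARRAY = "_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array/"


def suffixed(name, n):
    """Return <name>-0 ... <name>-(n-1): the naming partition.py uses
    and, by assumption, the naming mapper.sh produces."""
    return [f"{name}-{k}" for k in range(n)]


def missing_outputs(app_input, mapper_source, n=16):
    """Names the mapper assigns to the output array that the app never writes."""
    written = set(suffixed(app_input, n))
    return [p for p in suffixed(mapper_source, n) if p not in written]


missing = missing_outputs("temp/links-part-0001", ARRAY + "/elt-3")
print(len(missing))  # 16: every expected output is missing
```

When the mapper source and the app's input name agree, missing_outputs returns an empty list, which is what handing the mapped output names to the app directly is meant to ensure.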
>>>> 
>>>> 
>>>> 
>>>> However, if I put the stages inside an iterate control structure, it works.
>>>> 
>>>> //below are main function
>>>> file input[] <filesys_mapper; location="input", prefix="links-">;
>>>> file matrix[][];
>>>> file fn[];
>>>> 
>>>> iterate loop{
>>>>   iterate i{
>>>>      if (i==0){
>>>>         file score <single_file_mapper; file=@strcat("score.txt.", @toString(loop))>;
>>>>         fn = map(input, score);
>>>>      }
>>>>      if(i==1){
>>>>         matrix = shuffle(fn);
>>>>      }
>>>>      if(i==2){
>>>>         file final <single_file_mapper; file=@strcat("score.txt.", @toString(loop+1))>;
>>>>         final = reduce(matrix);
>>>>         file sorted <single_file_mapper; file=@strcat("score.txt.", @toString(loop+1), ".sorted")>;
>>>>         sorted = sort(final);
>>>>      }
>>>>   }until(i==3);
>>>> }until(loop==1);
>>>> 
>>>> 
>>>> I also checked the SwiftMontage implementation, which is written the
>>>> same way, so I assumed that code written like the first draft worked
>>>> at some point. Is this an already known problem?
>>>> 
>>>> Best
>>>> Zhao
>>>> 
>>>> _______________________________________________
>>>> Swift-devel mailing list
>>>> Swift-devel at ci.uchicago.edu
>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>>>> 