[Swift-devel] Iterative PageRank in Swift

ZHAO ZHANG zhaozhang at uchicago.edu
Sun Jun 2 15:23:29 CDT 2013


Hi Mike,

Sorry for my late response. I am setting things up on communicado, and will let you know once it is ready for you to test.

zhao

On Jun 2, 2013, at 1:21 PM, Michael Wilde wrote:

> Zhao,
> 
> The immediate failure in the run below seems to be due to partition() not creating the following 16 files below its *work* directory:
> 
> _concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-0
> _concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-1
> ...
> _concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-15
> 
> I'm assuming mapper.sh gets the name of an element of fn[], which has been mapped by the concurrent mapper, and returns an array mapping of the original name suffixed by -0 through -15?
> 
> Is that the expected behavior?
> 
> Is the partition() app coded to do this, including making the parent directories ./_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array/ ?
> 
> I can't yet explain why/how this worked when you serialized the main (open code) function, but there's enough going on in this code that I'd carefully check the behavior of each stage.
> 
> - Mike
> 
> ----- Original Message -----
>> From: "Michael Wilde" <wilde at mcs.anl.gov>
>> To: "ZHAO ZHANG" <zhaozhang at uchicago.edu>
>> Cc: "swift-devel" <swift-devel at ci.uchicago.edu>
>> Sent: Sunday, June 2, 2013 2:48:58 PM
>> Subject: Re: [Swift-devel] Iterative PageRank in Swift
>> 
>> Zhao, I'm studying this.  Can you post a copy of mapper.sh?
>> 
>> Can you put a copy on a local machine here (both failing and working
>> version) that I can experiment with?
>> 
>> Thanks,
>> 
>> - Mike
>> 
>> 
>> ----- Original Message -----
>>> From: "ZHAO ZHANG" <zhaozhang at uchicago.edu>
>>> To: "Swift Devel" <swift-devel at ci.uchicago.edu>
>>> Sent: Sunday, June 2, 2013 2:12:30 PM
>>> Subject: [Swift-devel] Iterative PageRank in Swift
>>> 
>>> Dear all,
>>> 
>>> I have been working with my cousin on an iterative PageRank
>>> implementation with Swift for his graduation project. We have now
>>> encountered a problem: we try to use "file fn[]" as intermediate
>>> data between two stages, but it does not work well.
>>> 
>>> The app and stage definitions look like this:
>>> zhaozhang at bigben:/var/tmp/workplace$ cat PageRank-new.swift
>>> type file;
>>> 
>>> app (file t) distribution (file f, file s) {
>>>    distribution @filename(f) @filename(t) @filename(s);
>>> }
>>> 
>>> app (file t[]) partition (file f) {
>>>    partition @filename(f) "16";
>>> }
>>> 
>>> app (file t) aggregation (file f[]){
>>>    aggregation @filename(t) @filenames(f);
>>> }
>>> 
>>> app (file t) cat (file f[]){
>>>    cat @filenames(f) stdout=@filename(t);
>>> }
>>> 
>>> app (file t) sort (file f){
>>>    sort "-nrk 2" @filename(f) stdout=@filename(t);
>>> }
>>> 
>>> (file fn[])map(file input[], file score){
>>>   foreach f,i in input {
>>>      file c<regexp_mapper;
>>>         source=@f,
>>>         match="input/(.*)",
>>>         transform="temp/\\1">;
>>>      c = distribution(f, score);
>>>      fn[i] = c;
>>>   }
>>> }
>>> 
>>> (file matrix[][])shuffle(file fn[]){
>>>   foreach c, j in fn{
>>> 	file output[] <ext; exec="bin/mapper.sh", source=@filename(c),
>>> 	scale=16>;
>>> 	output = partition(c);
>>> 	foreach f, k in output{
>>> 		matrix[k][j] = output[k];
>>> 	}
>>>   }
>>> }
>>> 
>>> (file final)reduce(file matrix[][]){
>>>   file result[];
>>>   foreach fl, k in matrix{
>>>      file output <single_file_mapper;
>>>      file=@strcat("result/result-",
>>>      @toString(k))>;
>>>      output = aggregation(fl);
>>>      result[k] = output;
>>>   }
>>> 
>>>   final = cat(result);
>>> }
>>> 
>>> 
>>> If I write the main function as below, it does not work: it seems
>>> the intermediate files are not mapped to the expected file names.
>>> 
>>> //below are main function
>>> file input[] <filesys_mapper; location="input", prefix="links-">;
>>> file matrix[][];
>>> file fn[];
>>> 
>>> int loop=0;
>>> file score <single_file_mapper; file=@strcat("score.txt.",
>>> @toString(loop))>;
>>> file final <single_file_mapper;file=@strcat("score.txt.",
>>> @toString(loop+1))>;
>>> file sorted <single_file_mapper;file=@strcat("score.txt.",
>>> @toString(loop+1), ".sorted")>;
>>> 
>>> fn = map(input, score);
>>> matrix = shuffle(fn);
>>> final = reduce(matrix);
>>> sorted = sort(final);
>>> 
>>> The execution failed with the following message:
>>> Swift 0.94 swift-r6492 cog-r3658
>>> 
>>> RunID: 20130602-1348-yresjj56
>>> Progress:  time: Sun, 02 Jun 2013 13:48:49 -0500
>>> Progress:  time: Sun, 02 Jun 2013 13:48:51 -0500  Selecting site:3  Checking status:1
>>> Progress:  time: Sun, 02 Jun 2013 13:48:52 -0500  Selecting site:3  Checking status:1  Finished successfully:2
>>> Progress:  time: Sun, 02 Jun 2013 13:48:53 -0500  Selecting site:3  Checking status:1  Finished successfully:4
>>> Execution failed:
>>> 	Exception in partition:
>>>    Arguments: [temp/links-part-0001, 16]
>>>    Host: localhost
>>>    Directory: PageRank-new-20130602-1348-yresjj56/jobs/i/partition-ishh5dal
>>>    stderr.txt:
>>>    stdout.txt:
>>> Caused by:
>>> 	The following output files were not created by the application:
>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-0,
>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-1,
>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-2,
>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-3,
>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-4,
>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-5,
>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-6,
>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-7,
>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-8,
>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-9,
>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-10,
>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-11,
>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-12,
>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-13,
>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-14,
>>> 	_concurrent/fn-139240b8-81cc-4b22-8088-aa5aedd98afe--array//elt-3-15
>>> 	partition, PageRank-new.swift, line 38
>>> 	shuffle, PageRank-new.swift, line 88
>>> 
>>> 
>>> 
>>> However, if I put the stages in an iterate control structure, it works.
>>> 
>>> //below are main function
>>> file input[] <filesys_mapper; location="input", prefix="links-">;
>>> file matrix[][];
>>> file fn[];
>>> 
>>> /*iterate loop{
>>>   iterate i{
>>>      if (i==0){
>>>         file score <single_file_mapper; file=@strcat("score.txt.",
>>>         @toString(loop))>;
>>>         fn = map(input, score);
>>>      }
>>>      if(i==1){
>>>         matrix = shuffle(fn);
>>>      }
>>>      if(i==2){
>>>         file final <single_file_mapper;file=@strcat("score.txt.",
>>>         @toString(loop+1))>;
>>>         final = reduce(matrix);
>>>         file sorted <single_file_mapper;file=@strcat("score.txt.",
>>>         @toString(loop+1), ".sorted")>;
>>>         sorted = sort(final);
>>>      }
>>>   }until(i==3);
>>> }until(loop==1);*/
>>> 
>>> 
>>> I also checked the SwiftMontage implementation, which is also
>>> written in this way, so I assumed the first draft would have worked
>>> at some point in the past. Is this an already known problem?
>>> 
>>> Best
>>> Zhao
>>> 
>>> _______________________________________________
>>> Swift-devel mailing list
>>> Swift-devel at ci.uchicago.edu
>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>>> 
>> 
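Mike's question about mapper.sh can be made concrete with a small sketch. The script below is not the mapper.sh from this thread, which was not posted here. It is a hypothetical ext mapper that behaves the way Mike guesses, assuming Swift's ext mapper passes its extra parameters as "-name value" pairs and reads one "[index] filename" line per array element. The function name map_partitions and the "-<index>" naming scheme are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical ext mapper sketch; NOT the mapper.sh discussed in the thread.
# Assumption: invoked as  mapper.sh -source <name> -scale <n>
# and expected to print one "[index] filename" line per array element.

map_partitions() (
    source=""
    scale=0
    # Parse the "-name value" pairs; ignore anything unrecognized.
    while [ $# -gt 0 ]; do
        case "$1" in
            -source) source="$2"; shift 2 ;;
            -scale)  scale="$2";  shift 2 ;;
            *)       shift ;;
        esac
    done
    # Emit one mapping per partition, for indices 0 .. scale-1.
    i=0
    while [ "$i" -lt "$scale" ]; do
        echo "[$i] ${source}-${i}"
        i=$((i + 1))
    done
)

map_partitions "$@"
```

Run with "-source temp/links-part-0001 -scale 16", this prints "[0] temp/links-part-0001-0" through "[15] temp/links-part-0001-15". Under the same assumption, if @filename(c) resolved to a _concurrent/.../elt-3 name at mapping time, the mapper would produce the elt-3-0 through elt-3-15 names listed in the error, and partition would have to write to exactly those paths.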
