Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite loop while( my $rs = scroller->next ) #99

Open
lbe opened this issue Apr 8, 2018 · 4 comments
Open

Infinite loop while( my $rs = scroller->next ) #99

lbe opened this issue Apr 8, 2018 · 4 comments

Comments

@lbe
Copy link

lbe commented Apr 8, 2018

While researching a curious question this afternoon regarding Devel-* distributions, I came across what I believe to be a bug when trying to use the next iterator on a result set from a release query. If I use the caching as shown in the MetaCPAN::Client Summary with a while state as contained in the author_release.pl example, my loop never terminates. If I run the same code but without using the cache, it completes fine. My guess is that the bug is not in the MetaCPAN::Client code, but I don't have the time, or possibly skill :), to try to get to the root cause at this time.

Please see the following code that can be used to replicate the problem. I have run this code on both my Mac and a Debbie Jessie box with the same results. Details also provided below.

Thanks,

lbe

Run the program below to with a '-c' command line parameter to used a UA with caching.

Run without cache

perl mcpan-client-test.pl
Use Cache = No
Releases found: 2870
Parsing: 2745 : 2870 Devel-IPerl-Plugin-Chart-Plotly
Unique distribution versions found: 2745

This code ran to successful completion. It found 2745 unique distribution versions from the 2870 releases returned from the server

Run with cache

rm -fR /tmp/metacpan-cache     #delete existing cache
perl mcpan-client-test.pl -c
Use Cache = Yes
Releases found: 2870
^Crsing: 1926 : 10049 Devel-SawAmpersand

I had to ^C to cancel the loop. The 1926 is the number of unique distribution versions stored in %rs where as the 10049 is the number of loop iterations which is much larger than the 2745 in the no-cache case.

When I initially discovered the behavior, I thought the problem was due to the iterator not terminating properly. If this were the case, then I would expect the count of unique distribution versions to max at 2,745 as show above.

I believe that the server returns a max of 1000 rows at a time. If so, it appears that there is a problem beginning with the 3rd pull from the server.

EDIT: I changed the search options from 'Devel-*' to 'Data-*'. In this case, the server returned 5451 records with 5358 unique distribution versions when run with caching. When run with caching, the unique distribution stopped at 1980 and I had to ^C to cancel the program after the number of iterations was 6018. It appears that the problem is indeed related to the 3rd pull from the server

**EDIT(2): I added scroller_size => 100 to the options has of the release method the unique distribution versions for the caching only reached 198. The issue is definitely related to the 3rd update from the server.

mcpan-client-test.pl

#!/usr/bin/env perl

use strict;
use warnings;
use utf8;

use CHI;
use Data::Printer;
use Getopt::Std;
use HTTP::Tiny::Mech;
use MetaCPAN::Client;
use WWW::Mechanize::Cached;

my %opts;
getopts('c', \%opts);

my $mcpan;

print "Use Cache = ", $opts{c} ? "Yes\n" : "No\n";
if ( $opts{c} ) {
	$mcpan = MetaCPAN::Client->new(
		ua => HTTP::Tiny::Mech->new(
			mechua => WWW::Mechanize::Cached->new(
				cache => CHI->new(
					driver   => 'File',
					root_dir => '/tmp/metacpan-cache',
				),
			),
		),
	);
}
else {
	$mcpan = MetaCPAN::Client->new;
}

my $r = $mcpan->release(
	{ distribution => 'Devel-*' },
    { fields => [ 'distribution', 'date', 'version', 'version_numified' ] }
);

print STDERR "Releases found: ", $r->total, "\n";

my $rs = {};
my $i = 0;
my $j;
while ( my $rn = $r->next ) {
	$i++;
	$j = 0;
	foreach my $d ( keys %{ $rs } ) {
		foreach my $v ( keys %{ $rs->{$d} } ) {
			$j++;
		}
	}
	my $s = "Parsing: $j : $i " . $rn->distribution;
	$s .= ' ' x (80 - length($s));
	print STDERR "$s\r";
	$rs->{ $rn->distribution }->{ $rn->version_numified } = { version => $rn->version, date => $rn->date };
}
print STDERR "\n";

print "Unique distribution versions found: $j\n";

Machine Configuration Information:

Mac

perl -v
This is perl 5, version 22, subversion 3 (v5.22.3) built for darwin-thread-multi-2level
...

cpanm CHI HTTP::Tiny::Mech MetaCPAN::Client WWW::Mechanize::Cached
CHI is up to date. (0.60)
HTTP::Tiny::Mech is up to date. (1.001002)
MetaCPAN::Client is up to date. (2.023000)
WWW::Mechanize::Cached is up to date. (1.51)

Debian Jessie

perl -v

This is perl 5, version 24, subversion 1 (v5.24.1) built for x86_64-linux-thread-multi

cpanm CHI HTTP::Tiny::Mech MetaCPAN::Client WWW::Mechanize::Cached
CHI is up to date. (0.60)
HTTP::Tiny::Mech is up to date. (1.001002)
MetaCPAN::Client is up to date. (2.023000)
WWW::Mechanize::Cached is up to date. (1.51)
@oalders
Copy link
Member

oalders commented Apr 9, 2018

Could you try adding use LWP::ConsoleLogger::Everywhere; to the top of your script? I wonder if this could help debug where the loop is happening.

@lbe
Copy link
Author

lbe commented Apr 16, 2018

Sorry for the delayed response, its been a busy week. I added the module as shown below.

#!/usr/bin/env perl

use strict;
use warnings;
use utf8;

use LWP::ConsoleLogger::Everywhere;
use CHI;
...

On Debian Jessie, I received two surprises:

  1. I received no output whatsoever from the logger
  2. The cached mode works fine now.

The only change made to this host was the installation of the logger and the supporting modules.

On my Mac, I observed the following:

  1. When running without the cache, I received not logging output
  2. When running with the cache, the logger produced output for the first two updates and then nothing when the third update should have happened.
  3. My additional print code displayed a pattern leading me to believe that a third update from the scroller was never posted. See the annotated output below:

The first column of numbers is a unique count for distribution name and version.
The second column of numbers is a running counter of rows process when calling ->next

Parsing: 1481 : 1537 Devel-Debug-Server <- this and the next 2 rows were from scroller
Parsing: 1482 : 1538 Devel-Debug-Server
Parsing: 1483 : 1539 Devel-Debug-Server
Parsing: 1922 : 2537 Devel-Debug-Server <- these three rows are same as above offset by 1000
Parsing: 1922 : 2538 Devel-Debug-Server
Parsing: 1922 : 2539 Devel-Debug-Server
Parsing: 1922 : 3537 Devel-Debug-Server <- these three rows are same as above offset by 2000
Parsing: 1922 : 3538 Devel-Debug-Server
Parsing: 1922 : 3539 Devel-Debug-Server

This pattern continues until I kill the program.

I have uploaded all of the response to the following gist: https://gist.github.com/lbe/42d647422b45fef1a638326c6618636a

The only value of the content in gist, other than what I have identified above, is to show that the UA does not make a network call after the second call to ->next.

I have a few more ideas to help localize the bug. I'll let you know what I find.

Thanks,

lbe

@lbe
Copy link
Author

lbe commented Apr 18, 2018

None of the things that I thought would help me localize the error did much to help.

Sorry

lbe

@mickeyn
Copy link
Contributor

mickeyn commented Apr 19, 2018

I'm not convinced this is a client issue, I would try to reduce it to direct ES scrolled searches.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants