Merge pull request #46 from luyadev/purging
purging
nadar authored Apr 28, 2022
2 parents 1e8a6cf + 2d95d74 commit 9bbe03b
Showing 3 changed files with 21 additions and 1 deletion.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,10 @@
All notable changes to this project will be documented in this file. This project adheres to [Semantic Versioning](http://semver.org/).
In order to read more about upgrading and BC breaks have a look at the [UPGRADE Document](UPGRADE.md).

## 3.5.0 (28. April 2022)

+ [#46](https://github.com/luyadev/luya-module-crawler/pull/46) Prevent the crawler from purging the full index when the builder index is empty. To allow purging anyway, pass the new option `--purging=1`.

## 3.4.1 (28. April 2022)

+ [#45](https://github.com/luyadev/luya-module-crawler/pull/45) Use a transaction to sync the index table when the crawler finishes.
8 changes: 7 additions & 1 deletion src/crawler/ResultHandler.php
@@ -10,6 +10,7 @@
use Nadar\Crawler\Interfaces\HandlerInterface;
use Nadar\Crawler\Result;
use Yii;
use yii\console\Exception;
use yii\helpers\Console;

/**
@@ -89,8 +90,13 @@ public function onEnd(Crawler $crawler)
$transaction = Yii::$app->db->beginTransaction();
try {
$keepIndexIds = [];

$currentTotal = (int) Index::find()->count();
$total = (int) Builderindex::find()->count();

if (!$this->controller->purging && ($currentTotal > 0 && $total == 0)) {
throw new Exception("The old index contained {$currentTotal} entries while the new index is empty. This may indicate a misconfiguration or an error while crawling the website. To force an empty index, use --purging=1.");
}

$i = 0;
if ($this->controller->verbose) {
Console::startProgress(0, $total, 'synchronize index: ', false);
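The guard added to `onEnd()` boils down to a single condition on the two counts and the `purging` flag. The following is a minimal standalone sketch of that check, not the module's actual code: the function names `assertPurgeAllowed()` and `syncAllowed()` are hypothetical, and `RuntimeException` stands in for `yii\console\Exception` so the snippet runs without Yii.

```php
<?php
// Hypothetical standalone version of the guard added to ResultHandler::onEnd().
// RuntimeException replaces yii\console\Exception so this runs without Yii.

/**
 * Throws when a sync would replace a non-empty index with an empty one,
 * unless purging has been explicitly allowed.
 */
function assertPurgeAllowed(int $currentTotal, int $builderTotal, bool $purging): void
{
    if (!$purging && $currentTotal > 0 && $builderTotal === 0) {
        throw new RuntimeException(
            "The old index contained {$currentTotal} entries while the new index is empty. "
            . "To force an empty index, use --purging=1."
        );
    }
}

/** Returns true if the guard lets the sync proceed, false if it would abort. */
function syncAllowed(int $currentTotal, int $builderTotal, bool $purging): bool
{
    try {
        assertPurgeAllowed($currentTotal, $builderTotal, $purging);
        return true;
    } catch (RuntimeException $e) {
        return false;
    }
}

var_dump(syncAllowed(120, 0, false));  // bool(false): an empty crawl would wipe 120 entries
var_dump(syncAllowed(120, 0, true));   // bool(true): purging explicitly requested
var_dump(syncAllowed(0, 0, false));    // bool(true): nothing to lose
var_dump(syncAllowed(120, 95, false)); // bool(true): normal crawl
```

In the real handler the exception is raised after `beginTransaction()`, so, assuming the surrounding `catch` rolls back, the existing index is left untouched.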
10 changes: 10 additions & 0 deletions src/frontend/commands/CrawlController.php
@@ -55,6 +55,15 @@ class CrawlController extends \luya\console\Command
*/
public $concurrent = 15;

/**
* @var boolean If enabled, the crawler is allowed to fully purge the index. This is disabled by default: if the crawler starts
* but the target host returns no content (for example because it is down or blocked by a firewall), the crawl
* finishes with 0 builder index entries and would overwrite a fully populated index with an empty one. With purging disabled,
* an exception is thrown whenever the builder index is empty while the index contains more than 0 entries: `if ($builderIndexCount == 0 && $indexCount > 0) { Exception }`
* @since 3.5.0
*/
public $purging = false;

/**
* {@inheritDoc}
*/
@@ -64,6 +73,7 @@ public function options($actionID)
$options[] = 'linkcheck';
$options[] = 'pdfs';
$options[] = 'concurrent';
$options[] = 'purging';
return $options;
}

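The `purging` property becomes a CLI flag because it is listed in `options()`. To our understanding, Yii 2 console controllers copy an option such as `--purging=1` onto the property of the same name and cast the raw string to the type of the property's default value. The sketch below re-creates that behaviour in plain PHP under that assumption; `FakeCrawlCommand` and `applyOption()` are hypothetical names, not part of Yii or this module.

```php
<?php
// Simplified, hypothetical re-creation of how a console option "--name=value"
// could be copied onto a public property: the raw value is cast to the type
// of the property's default value (assumed to mirror Yii 2's behaviour).

class FakeCrawlCommand
{
    public $concurrent = 15;
    public $purging = false;

    /** Names of the properties that may be set from the command line. */
    public function options(): array
    {
        return ['concurrent', 'purging'];
    }
}

function applyOption(object $command, string $name, $rawValue): void
{
    if (!in_array($name, $command->options(), true)) {
        throw new InvalidArgumentException("Unknown option --{$name}");
    }
    $default = $command->$name;
    if ($default !== null) {
        // "1" becomes true for a bool default, "20" becomes 20 for an int default
        settype($rawValue, gettype($default));
    }
    $command->$name = $rawValue;
}

$cmd = new FakeCrawlCommand();
applyOption($cmd, 'purging', '1');
applyOption($cmd, 'concurrent', '20');
var_dump($cmd->purging);    // bool(true)
var_dump($cmd->concurrent); // int(20)
```

In this sketch a non-empty string such as `"false"` casts to `true`, so `--purging=0` (or omitting the flag) is the unambiguous way to keep the protection enabled.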
