PHP Email Crawler: Crawl Web site pages to extract email addresses

Version		License		PHP version		Categories
`email-crawl` 1.0.0		GNU General Publi...		5		Email, PHP 5, Searching, Web services, C...

Author

Ujah Chigozie peter

Description

This package can crawl Web site pages to extract email addresses.

It can take the URL of a given site and retrieve the page contents.

The package can parse the page to extract any email addresses that it contains and links to other pages.
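The parsing step described above can be sketched roughly as follows. This is only an illustration: `extractEmails()` and `extractLinks()` are hypothetical helpers, not functions from the package, and the email pattern is deliberately simple, so it will miss some valid addresses.

```php
<?php
// Hypothetical helpers for illustration; not part of the email-crawl package.

/** Return the unique, lowercased email addresses found in HTML or text. */
function extractEmails(string $html): array
{
    // Deliberately simple pattern; it does not cover every valid address.
    preg_match_all('/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/', $html, $matches);
    return array_values(array_unique(array_map('strtolower', $matches[0])));
}

/** Return the unique href values of anchor tags, skipping mailto: links. */
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return [];
    }
    $dom = new DOMDocument();
    // Collect libxml warnings internally; real-world markup is often malformed.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '' && stripos($href, 'mailto:') !== 0) {
            $links[] = $href;
        }
    }
    return array_values(array_unique($links));
}

$html = '<p>Write to Info@Example.com or sales@example.com.</p>'
      . '<a href="/about">About</a> <a href="mailto:info@example.com">Mail</a>';

print_r(extractEmails($html));
print_r(extractLinks($html));
```

Relative links such as /about would still need to be resolved against the page URL before they can be crawled.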

It can then crawl the linked pages recursively to extract any further email addresses they contain.

The count of crawled pages can be limited to a given number.
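One way to picture a crawl with a page limit is a breadth-first loop over a queue of links. The sketch below is an assumption about the general technique, not the package's code: `crawlEmails()` and the `$fetch` callback are hypothetical, and an in-memory array stands in for real HTTP requests so the example runs offline.

```php
<?php
// Hypothetical sketch; crawlEmails() and $fetch are not part of email-crawl.

/**
 * Breadth-first crawl from $start, visiting at most $maxPages pages.
 * $fetch takes a URL and returns ['emails' => string[], 'links' => string[]].
 */
function crawlEmails(string $start, int $maxPages, callable $fetch): array
{
    $queue = [$start];
    $visited = [];
    $emails = [];

    while ($queue !== [] && count($visited) < $maxPages) {
        $url = array_shift($queue);
        if (isset($visited[$url])) {
            continue; // Already crawled.
        }
        $visited[$url] = true;

        $page = $fetch($url);
        foreach ($page['emails'] as $email) {
            $emails[$email] = true; // Array keys de-duplicate the addresses.
        }
        foreach ($page['links'] as $link) {
            if (!isset($visited[$link])) {
                $queue[] = $link;
            }
        }
    }
    return array_keys($emails);
}

// In-memory "site" standing in for real pages.
$site = [
    'https://example.com/'  => ['emails' => ['a@example.com'], 'links' => ['https://example.com/b', 'https://example.com/c']],
    'https://example.com/b' => ['emails' => ['b@example.com'], 'links' => ['https://example.com/']],
    'https://example.com/c' => ['emails' => ['c@example.com'], 'links' => []],
];
$fetch = function (string $url) use ($site): array {
    return $site[$url] ?? ['emails' => [], 'links' => []];
};

print_r(crawlEmails('https://example.com/', 2, $fetch));
```

With a limit of 2, only the start page and the first linked page are visited, so the address on the third page is not collected.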

The email addresses found are returned in an array.

A report of the crawl can be printed to the console or saved to a file.

Name:	Ujah Chigozie peter `<contact>`
Classes:	30 packages by Ujah Chigozie peter
Country:	Nigeria
Age:	33
All time rank:	1973	10 in Nigeria
Week rank:	19	2 in Nigeria

Example


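<?php
// Show every error while experimenting with the crawler.
error_reporting(E_ALL);
ini_set('display_errors', '1');

require __DIR__ . '/plugins/autoload.php';

use Peterujah\NanoBlock\EmailCrawl;

// Defaults used when no command line argument is given.
$target = "https://default.com/contact";
$limit = 50;

if (!empty($argv[1])) {
    if (filter_var($argv[1], FILTER_VALIDATE_URL)) {
        // Direct call: php craw.php <url> [limit]
        $target = $argv[1];
        $limit = (int) ($argv[2] ?? 50);
    } else {
        // Call from exec.php: base64-encoded, serialized options array.
        // allowed_classes => false prevents objects from being instantiated.
        $req = unserialize(base64_decode($argv[1]), ['allowed_classes' => false]);
        if (is_array($req) && isset($req["target"])) {
            $target = $req["target"];
            $limit = (int) ($req["max"] ?? 50);
        }
    }
}

$craw = new EmailCrawl($target, $limit);
$resInstance = $craw->craw()->getResponse();
$data = $resInstance->inLine();
$resInstance->printCommandResult($data)->saveAs(__DIR__ . "/craw/", $data);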

Details

email-crawl

PHP Email Web Crawler is a simple, easy-to-use class that uses cURL and the command line interface to extract email addresses from websites. It can also extract emails in depth by following the links found on the initial target page.
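As a rough idea of what retrieving a page with cURL can look like, here is a minimal sketch. `fetchPage()`, its timeout, and its user agent string are assumptions made for illustration; the package's own request code may be organized differently.

```php
<?php
// Hypothetical helper for illustration; not the package's own code.

/**
 * Fetch a page body over HTTP(S) with cURL.
 * Returns null for non-HTTP URLs, transport errors, or non-2xx responses.
 */
function fetchPage(string $url, int $timeoutSeconds = 10): ?string
{
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if (!filter_var($url, FILTER_VALIDATE_URL) || !in_array($scheme, ['http', 'https'], true)) {
        return null;
    }

    $handle = curl_init($url);
    curl_setopt_array($handle, [
        CURLOPT_RETURNTRANSFER => true, // Return the body instead of printing it.
        CURLOPT_FOLLOWLOCATION => true, // Follow redirects.
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
        CURLOPT_USERAGENT      => 'email-crawl-sketch/0.1', // Assumed value.
    ]);
    $body = curl_exec($handle);
    $status = curl_getinfo($handle, CURLINFO_RESPONSE_CODE);

    if ($body === false || $status < 200 || $status >= 300) {
        return null;
    }
    return $body;
}

// Rejected before any network access because the scheme is not HTTP(S).
var_dump(fetchPage('ftp://example.com/file.txt'));
```

Rejecting non-HTTP schemes up front keeps the crawler from following links such as ftp: or javascript: that it finds on a page.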

Installation

Installation is super-easy via Composer:

composer require peterujah/email-crawl

Basic Usage

Initialize an email crawl instance

$craw = new EmailCrawl("https://example.com", 200);

Start the crawl

$craw->craw();

Get the scan result as a CrawlResponse instance

$response = $craw->getResponse();

Get the emails separated by new lines

$data = $response->inLine();

Get the emails separated by commas

$data = $response->withComma();

Get the emails as an array

$data = $response->asArray();

Print the emails

$response->printCommandResult($data);

Save the emails to a file. The result is saved as a JSON string

$response->save("/path/save/craw/");

Save the emails to a file. If string data is passed, that string is saved; otherwise the result is saved as a JSON string

$response->saveAs("/path/save/craw/", $data);
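To make the output formats and the "string or JSON" save rule above more concrete, here is a small stand-alone sketch. It does not call the package: the separators, the file names, and `saveCrawlResult()` are assumptions based only on what the method names and descriptions suggest.

```php
<?php
// Hypothetical stand-ins: separators, file names, and saveCrawlResult()
// are assumptions for illustration, not the CrawlResponse implementation.
$emails = ['info@example.com', 'sales@example.com'];

$inLine    = implode(PHP_EOL, $emails); // One address per line.
$withComma = implode(',', $emails);     // Comma-separated list.

/** Write $data if given, otherwise the emails as JSON; return the file path. */
function saveCrawlResult(string $directory, array $emails, ?string $data = null): string
{
    if (!is_dir($directory) && !mkdir($directory, 0775, true) && !is_dir($directory)) {
        throw new RuntimeException("Cannot create directory: $directory");
    }
    $isJson = $data === null;
    $contents = $isJson ? json_encode($emails, JSON_PRETTY_PRINT) : $data;
    $path = rtrim($directory, '/\\') . DIRECTORY_SEPARATOR . ($isJson ? 'emails.json' : 'emails.txt');
    if (file_put_contents($path, $contents) === false) {
        throw new RuntimeException("Cannot write file: $path");
    }
    return $path;
}

$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'craw-demo';
$jsonPath = saveCrawlResult($dir, $emails);          // No string passed: JSON.
$textPath = saveCrawlResult($dir, $emails, $inLine); // String passed: saved as is.

echo $jsonPath, PHP_EOL, $textPath, PHP_EOL;
```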

Example

Create a file named craw.php and add the example code below. With this example you can run the crawler from the command line, from a browser, or through PHP's shell_exec.

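<?php
// Show every error while experimenting with the crawler.
error_reporting(E_ALL);
ini_set('display_errors', '1');

require __DIR__ . '/plugins/autoload.php';

use Peterujah\NanoBlock\EmailCrawl;

// Defaults used when no command line argument is given.
$target = "https://example.com/contact";
$limit = 50;

if (!empty($argv[1])) {
    if (filter_var($argv[1], FILTER_VALIDATE_URL)) {
        // Direct call: php craw.php <url> [limit]
        $target = $argv[1];
        $limit = (int) ($argv[2] ?? 50);
    } else {
        // Call from exec.php: base64-encoded, serialized options array.
        // allowed_classes => false prevents objects from being instantiated.
        $req = unserialize(base64_decode($argv[1]), ['allowed_classes' => false]);
        if (is_array($req) && isset($req["target"])) {
            $target = $req["target"];
            $limit = (int) ($req["max"] ?? 50);
        }
    }
}

$craw = new EmailCrawl($target, $limit);
$response = $craw->craw()->getResponse();
$data = $response->inLine();
$response->printCommandResult($data)->saveAs(__DIR__ . "/craw/", $data);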

To run the crawler from the command line interface, use the command below

php craw.php https://google.com 50

To run the crawler through PHP's shell_exec, create a file called exec.php and add the example script below. Note: change PHP_SHELL_EXECUTION_PATH to the path of your PHP executable. Once done, navigate to https://mycraw.example.com/exec.php

<?php
define("PHP_SHELL_EXECUTION_PATH", "path/to/php");

// Options passed to craw.php as a single base64-encoded argument.
$crawOptions = array(
    'target' => 'https://example.com',
    'max' => 50,
);
$crawRequest = base64_encode(serialize($crawOptions));
$crawScript = __DIR__ . "/craw.php";
$crawLogs = __DIR__ . "/craw_logs.log";

// Escape every argument so paths containing spaces cannot break the command.
shell_exec(
    escapeshellarg(PHP_SHELL_EXECUTION_PATH) . " " . escapeshellarg($crawScript) . " " . escapeshellarg($crawRequest)
    . " 'alert' >> " . escapeshellarg($crawLogs) . " 2>&1"
);
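Since craw.php receives its options as a base64-encoded, serialized array, it can help to decode and validate that argument in one place. The sketch below shows one possible way to do so; `decodeCrawRequest()` is a hypothetical helper, and the fallback limit of 50 simply mirrors the default used in the example script.

```php
<?php
// Hypothetical helper; decodeCrawRequest() is not part of the package.

/** Decode a request built with base64_encode(serialize([...])) and validate it. */
function decodeCrawRequest(string $encoded): ?array
{
    $raw = base64_decode($encoded, true); // Strict mode rejects invalid characters.
    if ($raw === false) {
        return null;
    }
    // allowed_classes => false prevents objects from being instantiated;
    // @ silences the notice raised for data that is not serialized.
    $request = @unserialize($raw, ['allowed_classes' => false]);
    if (!is_array($request) || !isset($request['target'])
        || !filter_var($request['target'], FILTER_VALIDATE_URL)) {
        return null;
    }
    return [
        'target' => $request['target'],
        'max'    => max(1, (int) ($request['max'] ?? 50)),
    ];
}

$crawRequest = base64_encode(serialize(['target' => 'https://example.com', 'max' => 50]));
print_r(decodeCrawRequest($crawRequest));
var_dump(decodeCrawRequest('not-a-valid-request'));
```

Returning null for anything that is not a well-formed request lets the calling script fall back to its defaults instead of crawling an empty target.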

ATTENTION

It is advisable to run this code from the command line interface for better performance.

Files (6)

File	Role	Description
`src` (2 files)
`composer.json`	Data	Auxiliary data
`craw.php`	Example	Example script
`exec.php`	Aux.	Auxiliary script
`README.md`	Doc.	Documentation

src

File	Role	Description
`CrawlResponse.php`	Class	Class source
`EmailCrawl.php`	Class	Class source

	email-crawl-2022-05-05.zip 5KB
	email-crawl-2022-05-05.tar.gz 4KB