PHP Classes

PHP Scraping HTML Tag Elements: Retrieve HTML pages and scrape tag elements

Version: scrap-html-elements 1.0.2
License: The PHP License
PHP version: 5
Categories: HTML, PHP 5, Parsers

Author

HtmlScrapingRequest - github.com

Description

This package can retrieve HTML pages and scrape tag elements.

One class sends an HTTP request to a given server and retrieves the requested HTML page.

Another class finds the tag elements that match a given selector expression and returns those elements together with the data they contain.
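
To illustrate the second step, here is a minimal sketch of selector-based extraction. It does not use the package's own classes; it swaps in PHP's built-in DOM extension (`DOMDocument`, `DOMXPath`) and assumes PHP 7 or later. The helper names `simpleSelectorToXPath()` and `findOuterHtml()` are hypothetical, and only a tiny selector subset (`tag`, `#id`, `.class`, `tag.class`, `tag#id`) is handled:

```php
<?php
// Hypothetical helper, not part of the package: translate a very small
// subset of CSS selectors into an equivalent XPath expression.
function simpleSelectorToXPath(string $selector): string
{
    $pattern = '/^([a-zA-Z][a-zA-Z0-9]*)?(?:([#.])([\w-]+))?$/D';
    if ($selector === '' || !preg_match($pattern, $selector, $m)) {
        throw new InvalidArgumentException("Unsupported selector: $selector");
    }

    // No tag name means "any element".
    $tag = (isset($m[1]) && $m[1] !== '') ? strtolower($m[1]) : '*';

    if (!isset($m[2])) {
        return '//' . $tag;
    }
    if ($m[2] === '#') {
        return sprintf("//%s[@id='%s']", $tag, $m[3]);
    }
    // Match one class name inside a space-separated class attribute.
    return sprintf(
        "//%s[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $tag,
        $m[3]
    );
}

// Hypothetical helper: return the outer HTML of every element in $html
// that matches $selector.
function findOuterHtml(string $html, string $selector): array
{
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world pages are rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $matches = [];
    foreach ($xpath->query(simpleSelectorToXPath($selector)) as $node) {
        $matches[] = $doc->saveHTML($node);
    }
    return $matches;
}
```

For example, `findOuterHtml($pageHtml, 'img.thumbnail')` would return the outer HTML of every `<img>` element whose class list contains `thumbnail`, where `$pageHtml` stands for a page that has already been downloaded.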

Name: Truong Van Phu
Country: Viet Nam

Example

<?php
// Run the scraper only when the form has been submitted.
if ($_SERVER['REQUEST_METHOD'] == 'POST') {
    scraping();
}

/**
 * Fetch the page named in the POST data and echo the matching elements as JSON.
 */
function scraping()
{
    require_once 'class.HttpRequest.php';

    $http = new Httprequest();
    if (isset($_POST['domain'])) {
        // URL of the page to scrape.
        $http->setServer($_POST['domain']);
    }

    $html = $http->send();
    if (isset($html->error)) {
        // Report the request error to the client and stop.
        echo json_encode($html);
        exit();
    }

    $result = $html->contents;
    if (isset($_POST['tag-element'])) {
        // Selector of the elements to extract, e.g. img.thumbnail.
        $response = array();
        $checkData = $result->find($_POST['tag-element']);
        if (count($checkData)) {
            foreach ($checkData as $key => $check) {
                array_push($response, $check->outertext());
            }
        }
        echo json_encode($response);
        exit();
    }
}

?>
<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="utf-8">
        <meta http-equiv="X-UA-Compatible" content="IE=edge">
        <meta name="viewport" content="width=device-width, initial-scale=1">
        <title>Title Page</title>

        <!-- Bootstrap CSS -->
        <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstrap.min.css" integrity="sha384-1q8mTJOASx8j1Au+a5WDVnPi2lkFfwwEAa8hDDdjZlpLegxhjVME1fgjWPGmkzs7" crossorigin="anonymous">

        <!-- HTML5 Shim and Respond.js IE8 support of HTML5 elements and media queries -->
        <!-- WARNING: Respond.js doesn't work if you view the page via file:// -->
        <!--[if lt IE 9]>
            <script src="https://oss.maxcdn.com/libs/html5shiv/3.7.2/html5shiv.min.js"></script>
            <script src="https://oss.maxcdn.com/libs/respond.js/1.4.2/respond.min.js"></script>
        <![endif]-->
    </head>
    <body>
        <div class="container-fluid">
            <div class="page-header">
              <h1>Scraping HTML tag elements<small> v1.0</small></h1>
            </div>
            <div class="container-fluid">
                <div class="col-xs-6 col-sm-6 col-md-6 col-lg-6">
                    <div class="panel panel-info">
                        <div class="panel-heading">
                            <h3 class="panel-title">Scraping options</h3>
                        </div>
                        <div class="panel-body">
                            <div class="row">
                                <div class="col-xs-12 col-sm-12 col-md-12 col-lg-12">
                                    <form action="" method="POST" class="form" role="form">
                                        <div class="form-group">
                                            <label class="" for="inputURL">URL to scrape: <span> http://example.com</span></label>
                                            <input type="text" name="domain" id="inputURL" class="form-control" value="" required="required" title="URL Scrap target" placeholder="Enter a domain">
                                        </div>
                                        <div class="form-group">
                                            <label class="" for="inputTagElement">Selector: <span>(#id, .class)</span></label>
                                            <input type="text" name="tag-element" id="inputTagElement" class="form-control" value="" required="required" title="Selector to find" placeholder="Enter a selector, e.g. img.thumbnail">
                                        </div>
                                        <div class="form-group">
                                            <label class="" for="">Display type</label>
                                            <div class="radio">
                                                <label>
                                                    <input type="radio" name="display-html" value="0">
                                                    Html elements
                                                </label>
                                            </div>
                                            <div class="radio">
                                                <label>
                                                    <input type="radio" name="display-html" value="1" checked="checked">
                                                    Html string
                                                </label>
                                            </div>
                                        </div>
                                        <div class="form-group">
                                            <div class="row">
                                                <div class="col-sm-12">
                                                    <button type="submit" class="btn btn-primary pull-right">Submit</button>
                                                </div>
                                            </div>
                                        </div>
                                    </form>
                                </div>
                            </div>
                        </div>
                    </div>
                </div>
                <div class="col-xs-6 col-sm-6 col-md-6 col-lg-6">
                    <div class="panel panel-info">
                        <div class="panel-heading">
                            <h3 class="panel-title">Scraping response</h3>
                        </div>
                        <div class="panel-body" style="text-align: center;overflow: scroll;">
                            The scraped HTML is displayed here
                            <div id="result" style="text-align: center;">
                            </div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
        <!-- jQuery -->
        <script src="//code.jquery.com/jquery.js"></script>
        <!-- Bootstrap JavaScript -->
        <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/js/bootstrap.min.js" integrity="sha384-0mSbJDEHialfmuBBQP6A4Qrprq5OVfW37PRR3j5ELqxss1yVqOtnepnHVP9aJ7xS" crossorigin="anonymous"></script>
        <!-- Page-specific script -->
         <script src="js/script.js"></script>
    </body>
</html>
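
The form above posts a `display-html` field (0 for "Html elements", 1 for "Html string"), but the PHP handler in this example never reads it, and how the package applies the choice is not shown on this page. As a minimal sketch, assuming the choice were applied on the server, the hypothetical helper `formatFragments()` below (not part of the package, PHP 7 or later) escapes each scraped fragment when the string view is selected, so that the browser shows the markup as text instead of rendering it:

```php
<?php
// Hypothetical helper, not part of the package.
// $fragments: outer-HTML strings returned by the scraper.
// $asString:  true for the "Html string" option, false for "Html elements".
function formatFragments(array $fragments, bool $asString): array
{
    $fragments = array_values($fragments);
    if (!$asString) {
        // "Html elements": pass the markup through unchanged.
        return $fragments;
    }

    // "Html string": escape the markup so it is displayed literally.
    return array_map(
        static function (string $fragment): string {
            return htmlspecialchars($fragment, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
        },
        $fragments
    );
}
```

Inside `scraping()`, the result could then be encoded with something like `json_encode(formatFragments($response, $asString))`, where `$asString` is derived from the posted `display-html` value.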


Details

HtmlScrapingRequest

  • Enter the URL of the site you want to scrape.
  • Enter a selector, e.g. img.thumbnail.

How to use

  1. Require the class: require_once 'class.HttpRequest.php';
  2. Create the request object: $http = new Httprequest();
  3. Set the target URL: $http->setServer($_POST['domain']);

    if (isset($_POST['domain'])) { // URL of the page you want to scrape
        $http->setServer($_POST['domain']);
    }

    $html = $http->send();
    if (isset($html->error)) {
        echo json_encode($html);
        exit();
    }

    $result = $html->contents;
    if (isset($_POST['tag-element'])) { // a selector, e.g. img.thumbnail
        $response = array();
        $checkData = $result->find($_POST['tag-element']);
        if (count($checkData)) {
            foreach ($checkData as $key => $check) {
                array_push($response, $check->outertext());
            }
        }
        echo json_encode($response); // return the result as JSON, or in any other format
        exit();
    }
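
The usage snippet passes `$_POST['domain']` to `setServer()` unchanged. Below is a minimal sketch of checking that value first, assuming only `http` and `https` targets should be fetched and that a missing scheme defaults to `http://`. Both rules are assumptions, as is the helper name `normalizeTargetUrl()`, which is not part of the package (PHP 7.1 or later):

```php
<?php
// Hypothetical helper, not part of the package.
// Returns a cleaned-up http(s) URL, or null when the input is not acceptable.
function normalizeTargetUrl($input): ?string
{
    $url = trim((string) $input);
    if ($url === '') {
        return null;
    }

    // Assumption: input without a scheme, such as "example.com", means http.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }

    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }

    // Assumption: only web pages are valid targets, so reject file://, ftp://, etc.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if (!in_array($scheme, ['http', 'https'], true)) {
        return null;
    }

    return $url;
}

// Possible use with the package's request class (left as comments because
// it needs class.HttpRequest.php):
// $url = normalizeTargetUrl($_POST['domain'] ?? '');
// if ($url !== null) { $http->setServer($url); }
```

This only checks the shape and scheme of the URL; it does not restrict which hosts may be fetched.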



Files

  • class.HttpRequest.php (Class): class source
  • class.SimpleHtmlDom.php (Class): class source
  • index.php (Example): example script
  • README.md (Doc.): documentation
  • img/default.gif (Icon): icon image
  • js/script.js (Data): auxiliary data

User comment by Velimir Majstorov (6 years ago, rated 40%): "Vulnerable to many attacks."