I have the code below, which tries to take HTML elements one by one, in order, including the tag itself but without styles or classes. Plus, I'm failing to get the images.
$client = new Client();
$crawler = $client->request('GET', 'http://www.tutorialspoint.com/laravel/laravel_ajax.htm');
$crawler->filter('h1, h2, h3, h4, h5, h6, p, pre, p > img, div > img, p > a')->each(function (Crawler $node, $i) {
    if ($node->filter('p')) {
        echo $node->text()."<br/>";
    } else if ($node->filter('pre')) {
        echo '<code>'.$node->html().'</code><br/>';
    }
});
But whatever I do, I'm either getting only the text when I use $node->text(), or the HTML as it appears in the page when I use $node->html().
What I'm trying to get is, for example:
- for p: <p>text here</p>
- for img: <img src="default.jp"/>
The condition $node->filter('p') will always evaluate to true, since the value returned by the filter function is a Crawler object, so the second else if is never reached.
If you want to check whether the crawler has any nodes in it, you can use the count() function.
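To make the difference concrete, here is a minimal sketch. It assumes symfony/dom-crawler and symfony/css-selector are installed through Composer, and the markup is made up purely for illustration:

```php
<?php
// Assumption: symfony/dom-crawler and symfony/css-selector are installed via Composer.
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup, only for illustration: a <pre> with no <p> inside it.
$crawler = new Crawler('<div><pre>echo 1;</pre></div>');
$pre = $crawler->filter('pre');

// filter() returns a Crawler object even when nothing matches,
// and an object is truthy in PHP, so this is true.
$isTruthy = (bool) $pre->filter('p');
var_dump($isTruthy); // bool(true)

// count() reports how many nodes the selection actually holds.
$matches = $pre->filter('p')->count();
var_dump($matches); // int(0)
```

So a condition such as `if ($node->filter('p')->count() > 0)` tests what the original `if` was presumably meant to test.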
As for your code - I'm not sure what you are doing. Your code checks whether the current element has a <p> child element (is this what you are trying to do?), and if it has one, it prints the text content of the parent node.
In order to get a node as a DOMElement from the crawler ($node) you can use $node->getNode(0), and using this node you can check its nodeName (== tag name), its textContent (the content of the tag), etc.
Here is an example that you can use:
// $client is the same client instance as in the question.
$crawler = $client->request('GET', 'http://www.tutorialspoint.com/laravel/laravel_ajax.htm');
$crawler->filter('h1, h2, h3, h4, h5, h6, p, pre, p > img, div > img, p > a')->each(function (Crawler $node, $i) {
    // The underlying DOMElement; nodeName and textContent are case-sensitive property names.
    $element = $node->getNode(0);
    if (in_array($element->nodeName, ['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p', 'a'])) {
        // Headings, paragraphs and links: print the tag name and its text.
        echo "{$element->nodeName} => {$element->textContent}.<br/>\n";
    } elseif ($element->nodeName == 'pre') {
        // Preformatted blocks: keep the inner HTML.
        echo "pre => <code>".$node->html()."</code><br/>\n";
    } elseif ($element->nodeName == 'img') {
        // Images have no text, so print the src attribute instead.
        echo 'img => src="'.$element->getAttribute('src')."\" <br/>\n";
    }
});
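If the goal is the output format from the question (the tag itself, with no classes or styles), one possible approach is to rebuild the tag from nodeName and textContent. The sketch below uses only PHP's built-in DOM extension so it runs on its own. The helper name bareTag and the sample markup are hypothetical and not part of the original code:

```php
<?php
// Hypothetical helper (not from the original post): rebuilds an element as
// its bare tag, dropping attributes such as class and style.
function bareTag(DOMElement $element): string
{
    $name = $element->nodeName;
    if ($name === 'img') {
        // Images carry no text, so keep only the src attribute.
        return '<img src="' . htmlspecialchars($element->getAttribute('src')) . '"/>';
    }
    // Other tags: wrap the escaped text content in the bare tag.
    return "<{$name}>" . htmlspecialchars($element->textContent) . "</{$name}>";
}

// Sample markup made up for illustration.
$doc = new DOMDocument();
$doc->loadHTML('<p class="x" style="color:red">text here</p><img class="y" src="default.jp"/>');

$p = bareTag($doc->getElementsByTagName('p')->item(0));
$img = bareTag($doc->getElementsByTagName('img')->item(0));

echo $p, "\n";   // <p>text here</p>
echo $img, "\n"; // <img src="default.jp"/>
```

Inside the each() callback the same helper could be called as bareTag($node->getNode(0)), since getNode(0) returns the underlying DOMElement. Note that this flattens nested markup (for example a link inside a paragraph) to plain text.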