Computer Hope

Internet & Networking => Web design => Topic started by: Zylstra on July 17, 2006, 12:18:02 AM

Title: Web Crawler
Post by: Zylstra on July 17, 2006, 12:18:02 AM
I love to experiment with web stuff. My new goal (sort of) is to make a web crawler that will go through a bunch of pages, find URLs, and follow the links.

I would like to make a web crawler that can collect a large number of URLs and save them in a simple text file.

How would I do this?
(Not looking for anything fancy; I've got limited bandwidth, you know.)
Title: Re: Web Crawler
Post by: Rob Pomeroy on July 17, 2006, 07:52:20 AM
I can't remember where you're up to - have you got PHP under your belt yet?

First thing to bear in mind: you are considering using an automated process to retrieve information from web sites.  Some webmasters would consider that abuse of their bandwidth, never mind yours.  ;)  You should really ensure that whatever code you write respects robots.txt files - but you're on your own there.  I am not an expert when it comes to their syntax.
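Just to give you a flavour, here is a very rough, untested sketch of a naive robots.txt check. It only looks at Disallow lines and ignores User-agent sections entirely, so treat it as a starting point rather than anything correct or complete:

<?php
// Naive robots.txt check -- a rough sketch only, not a real parser.
// It ignores User-agent sections and only looks at Disallow lines.
function is_probably_allowed($url)
{
    $parts  = parse_url($url);
    $robots = $parts['scheme'] . '://' . $parts['host'] . '/robots.txt';

    $txt = @file_get_contents($robots);
    if ($txt === false) {
        return true; // no robots.txt found -- assume crawling is permitted
    }

    $path = isset($parts['path']) ? $parts['path'] : '/';
    foreach (explode("\n", $txt) as $line) {
        if (preg_match('/^Disallow:\s*(\S+)/i', trim($line), $m)) {
            if (strpos($path, $m[1]) === 0) {
                return false; // path falls under a Disallow rule
            }
        }
    }
    return true;
}
?>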

Have a look at >the PHP filesystem reference< (http://www.php.net/manual/en/ref.filesystem.php) - in particular fopen, which can open a URL, not just a file.  You can do more complex stuff with the >streams< (http://www.php.net/manual/en/ref.stream.php) library.
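Something along these lines might get you started (an untested sketch, assuming allow_url_fopen is enabled; the starting URL is just a placeholder, and the regex link extraction is deliberately crude - a proper crawler would use an HTML parser and resolve relative links):

<?php
// Rough sketch of the fetch-and-extract step using file_get_contents,
// which (like fopen) can read straight from a URL when allow_url_fopen is on.
$start = 'http://www.example.com/';   // placeholder starting page
$html  = @file_get_contents($start);

$found = array();
if ($html !== false) {
    // Grab everything that looks like an absolute http:// link in an href
    if (preg_match_all('/href\s*=\s*["\'](http:\/\/[^"\']+)["\']/i', $html, $m)) {
        $found = array_unique($m[1]);
    }
}

// Append the URLs to a plain text file, one per line
file_put_contents('urls.txt', implode("\n", $found) . "\n", FILE_APPEND);

echo count($found) . " URLs saved to urls.txt\n";
?>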
Title: Re: Web Crawler
Post by: Zylstra on July 17, 2006, 08:49:33 PM
Looks a bit complicated...
Title: Re: Web Crawler
Post by: Rob Pomeroy on July 18, 2006, 03:26:41 AM
You're not joking!