Author Topic: Web Crawler (Read 4118 times)

Zylstra · « **on:** July 17, 2006, 12:18:02 AM »

I love to experiment with web stuff. My new goal (sort of) is to make a web crawler that will go through a bunch of pages, find URLs, and link them.

I would like to make a web crawler that can collect a large number of URLs and save them in a simple text file.

How would I do this?
(Not looking for anything fancy. I've got limited bandwidth, you know.)

Rob Pomeroy · « **Reply #1 on:** July 17, 2006, 07:52:20 AM »

I can't remember where you're up to - have you got PHP under your belt yet?

First thing to bear in mind: you are considering using an automated process to retrieve information from web sites. Some webmasters would consider that abuse of their bandwidth, never mind yours.

You should really ensure that whatever code you write respects robots.txt files - but you're on your own there. I am not an expert when it comes to their syntax.
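
For what it's worth, here is a minimal sketch of a robots.txt check. It is written in Python rather than PHP, purely for illustration, and uses the standard library's `urllib.robotparser`. The robots.txt content, the `example.com` URLs, the `ExampleCrawler` user agent, and the helper names are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs without
# touching the network. A real crawler would download this file from
# the root of the site it is about to crawl.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def allowed(rules, url, user_agent="ExampleCrawler"):
    """Return True if the rules let this (hypothetical) user agent fetch the URL."""
    return rules.can_fetch(user_agent, url)


rules = build_rules(SAMPLE_ROBOTS_TXT)
print(allowed(rules, "http://example.com/index.html"))
print(allowed(rules, "http://example.com/private/page.html"))
```

The idea is simply to call something like `allowed()` before every request and skip any URL it rejects.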

Have a look at the PHP filesystem reference, in particular fopen, which can open a URL, not just a file. You can do more complex stuff with the streams library.
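
To give a rough idea of the "find URLs and save them to a text file" part, here is a small sketch. It is in Python rather than PHP, for illustration only, and it works on an inline HTML string so it runs without any network access. The sample HTML, the `example.com` addresses, and the names `LinkCollector`, `extract_links`, and `save_links` are assumptions invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links such as "/about.html" into full URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in the given HTML text."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def save_links(links, path):
    """Append one URL per line to a plain text file."""
    with open(path, "a", encoding="utf-8") as out:
        for link in links:
            out.write(link + "\n")


# Hypothetical page content standing in for a downloaded page.
SAMPLE_HTML = (
    '<p><a href="/about.html">About</a> '
    '<a href="http://example.org/">Other site</a></p>'
)
found = extract_links(SAMPLE_HTML, "http://example.com/index.html")
print(found)
```

For a real page you would first download the HTML (in Python, `urllib.request.urlopen` plays roughly the role that fopen with a URL plays in PHP), then pass the decoded text to `extract_links` and the result to `save_links`. Pausing between requests keeps the load on both ends low.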

Zylstra · « **Reply #2 on:** July 17, 2006, 08:49:33 PM »

Looks a bit complicated...

Rob Pomeroy · « **Reply #3 on:** July 18, 2006, 03:26:41 AM »

You're not joking!

Computer Hope Forum
