Computer Hope
Internet & Networking => Web design => Topic started by: Zylstra on July 17, 2006, 12:18:02 AM
-
I love to experiment with web stuff. My new goal (sort of) is to make a web crawler that will go through a bunch of pages, find URLs, and follow them.
I would like the crawler to be able to gather a large number of URLs and save them in a simple text file.
How would I do this?
(Not looking for anything fancy; I've got limited bandwidth, you know.)
-
I can't remember where you're up to - have you got PHP under your belt yet?
First thing to bear in mind: you are considering using an automated process to retrieve information from web sites. Some webmasters would consider that abuse of their bandwidth, never mind yours. ;) You should really ensure that whatever code you write respects robots.txt files - but you're on your own there. I am not an expert when it comes to their syntax.
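For what it's worth, a very rough check might look something like this. This is only a sketch based on my guess at the basics - it ignores User-agent sections and wildcards completely, and the function name is just something I made up - so don't treat it as a proper robots.txt parser:

<?php
// Rough sketch only: ignores User-agent sections and wildcards,
// so this is a starting point, not a real robots.txt parser.
function isAllowed($host, $path)
{
    $robots = @file_get_contents("http://$host/robots.txt");
    if ($robots === false) {
        return true; // no robots.txt found - assume crawling is permitted
    }
    foreach (explode("\n", $robots) as $line) {
        if (preg_match('/^Disallow:\s*(\S+)/i', trim($line), $m)) {
            // Skip the URL if its path starts with a disallowed prefix
            if (strpos($path, $m[1]) === 0) {
                return false;
            }
        }
    }
    return true;
}

// e.g. isAllowed('www.example.com', '/somepage.html')
?>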
Have a look at >the PHP filesystem reference< (http://www.php.net/manual/en/ref.filesystem.php) - in particular fopen, which can open a URL, not just a file. You can do more complex stuff with the >streams< (http://www.php.net/manual/en/ref.stream.php) library.
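Something along these lines is the sort of thing I mean - a rough, untested sketch, where the target address, the regex, and the urls.txt filename are only placeholders:

<?php
// Minimal sketch: fetch one page over HTTP, pull out the href values
// with a regex, and append them to a plain text file.
$page = @file_get_contents('http://www.example.com/');
if ($page === false) {
    die("Could not fetch the page.\n");
}

// Very naive link extraction - a regex will miss some links and catch
// some junk, but it is enough to experiment with.
preg_match_all('/href\s*=\s*["\']([^"\']+)["\']/i', $page, $matches);

$fp = fopen('urls.txt', 'a'); // append, so repeated runs keep adding URLs
foreach (array_unique($matches[1]) as $url) {
    fwrite($fp, $url . "\n");
}
fclose($fp);

echo count($matches[1]) . " links found.\n";
?>

From there it is mostly a matter of looping: take the URLs you just saved, fetch each of those pages in turn, and repeat - keeping in mind the bandwidth and robots.txt caveats above.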
-
Looks a bit complicated...
-
You're not joking!