
Author Topic: Web Crawler  (Read 4118 times)


Zylstra

    Topic Starter
  • Moderator


  • Hacker

  • The Techinator!
  • Thanked: 45
    • Yes
    • Technology News and Information
  • Certifications: List
  • Computer: Specs
  • Experience: Guru
  • OS: Windows 7
Web Crawler
« on: July 17, 2006, 12:18:02 AM »
I love to experiment with web stuff. My new goal (sort of) is to make a web crawler that will go through a bunch of pages, find URLs, and link them.

I would like to make a web crawler that can collect a large number of URLs and save them in a simple text file.

How would I do this?
(I'm not looking for anything fancy; I've got limited bandwidth, you know.)

Rob Pomeroy



    Prodigy

  • Systems Architect
  • Thanked: 124
    • Me
  • Experience: Expert
  • OS: Other
Re: Web Crawler
« Reply #1 on: July 17, 2006, 07:52:20 AM »
I can't remember where you're up to - have you got PHP under your belt yet?

First thing to bear in mind: you are considering using an automated process to retrieve information from web sites.  Some webmasters would consider that abuse of their bandwidth, never mind yours.  ;)  You should really ensure that whatever code you write respects robots.txt files - but you're on your own there.  I am not an expert when it comes to their syntax.
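As a rough illustration of what respecting robots.txt can look like, here is a minimal sketch in Python rather than PHP (a swapped-in language, not something from this thread), using the standard library's `urllib.robotparser`. The user agent name `example-crawler`, the helper names, and the sample rules are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used so the example runs without a network.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def build_robots_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, url: str,
            user_agent: str = "example-crawler") -> bool:
    """Return True if the rules let this (assumed) user agent fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = build_robots_parser(EXAMPLE_ROBOTS)
    print(allowed(parser, "http://example.com/index.html"))
    print(allowed(parser, "http://example.com/private/page.html"))
```

Against a live site, one would probably call `set_url()` with the site's robots.txt address and then `read()` instead of passing the text in by hand, and check `allowed()` before every request.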

Have a look at the PHP filesystem reference - in particular fopen, which can open a URL, not just a file.  You can do more complex stuff with the streams library.
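To make the fetch-a-page, collect-the-URLs, save-to-a-text-file idea concrete, here is a small sketch in Python instead of PHP (again a swapped-in language, so `urlopen` stands in for the `fopen` approach mentioned above). The names `LinkCollector`, `extract_links`, `fetch`, and `save_links`, the start address, and the output file `urls.txt` are all assumptions for illustration. It handles one page only and does not follow links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links such as "/page.html" into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list:
    """Return the unique link URLs found in the HTML, in order of appearance."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return list(dict.fromkeys(collector.links))


def fetch(url: str, timeout: float = 10) -> str:
    """Download one page and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def save_links(links, path: str) -> None:
    """Write one URL per line to a plain text file."""
    with open(path, "w", encoding="utf-8") as handle:
        for link in links:
            handle.write(link + "\n")


if __name__ == "__main__":
    start = "http://example.com/"  # placeholder start page
    save_links(extract_links(fetch(start), start), "urls.txt")
```

Extending this into a crawler would mean feeding the saved URLs back into `fetch()`, ideally with a delay between requests and a cap on the number of pages, given the bandwidth concerns raised in this thread.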
« Last Edit: July 17, 2006, 07:52:46 AM by robpomeroy »
Only able to visit the forums sporadically, sorry.

Geek & Dummy - honest news, reviews and howtos

Zylstra

Re: Web Crawler
« Reply #2 on: July 17, 2006, 08:49:33 PM »
Looks a bit complicated...

Rob Pomeroy



Re: Web Crawler
« Reply #3 on: July 18, 2006, 03:26:41 AM »
You're not joking!