Topic: My Robots.txt file

Zylstra

    Topic Starter
  • Moderator


  • Hacker

  • The Techinator!
  • Thanked: 45
    • Yes
    • Technology News and Information
  • Certifications: List
  • Computer: Specs
  • Experience: Guru
  • OS: Windows 7
My Robots.txt file
« on: August 04, 2006, 02:26:12 PM »
I have finished my Robots.txt file, and would like someone to check it. I believe that I have everything right, but am not completely sure.

It's here:
www.jessez.mbhosting.com/robots.txt

I worked hard on it, and the blocked robots are all robots that are really new, have no contact information, or have been reported as having a "strange or unusual" indexing pattern.
It will not block any major search engines.

If anyone knows a list of bots I should block that haven't been blocked on this list already, please tell me.
I am thinking about adding the same ones at www.computerhope.com/robots.txt as well as some other sites for extra security.
« Last Edit: August 04, 2006, 02:27:25 PM by zylstra555 »

Rob Pomeroy



    Prodigy

  • Systems Architect
  • Thanked: 124
    • Me
  • Experience: Expert
  • OS: Other
Re: My Robots.txt file
« Reply #1 on: August 04, 2006, 03:48:36 PM »
You do know that indexing spiders are generally doing you a favour, right?
Only able to visit the forums sporadically, sorry.

Geek & Dummy - honest news, reviews and howtos

Zylstra

Re: My Robots.txt file
« Reply #2 on: August 04, 2006, 04:09:44 PM »
Quote
You do know that indexing spiders are generally doing you a favour, right?
Yes, but none of the most common search engines are blocked.
The ones that are blocked are ones without contact information; they don't seem to have any origin.
They could be email address collectors. (I don't want that!)

If you think I should, Rob, I could just include the same ones that www.computerhope.com/robots.txt has...

Rob Pomeroy



Re: My Robots.txt file
« Reply #3 on: August 05, 2006, 06:30:42 AM »
What makes you think that email-harvesting spiders would pay any attention to robots.txt files, telling them where they shouldn't look?  ;)

That's my point really: robots.txt files do not come with any mechanism to enforce their use.  They are for guidance only, hence useful for telling genuine search engines that there's no point indexing certain parts of your web site.
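
A quick way to see the "guidance only" point is to look at what a well-behaved crawler does: it chooses to consult the rules before each request, and nothing on the server makes it do so. Below is a minimal Python sketch using the standard library's urllib.robotparser. The bot names, rules and example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, as they might appear in a site's robots.txt.
RULES = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /email/
"""


def polite_may_fetch(agent: str, url: str) -> bool:
    """A polite crawler asks this before every request; nothing forces it to."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(agent, url)


print(polite_may_fetch("BadBot", "http://example.com/index.htm"))        # False
print(polite_may_fetch("GoodBot", "http://example.com/index.htm"))       # True
print(polite_may_fetch("GoodBot", "http://example.com/email/list.htm"))  # False
```

An email harvester simply never makes this call and requests the page anyway, which is why the file is no defence against it.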

Zylstra

Re: My Robots.txt file
« Reply #4 on: August 05, 2006, 10:38:06 AM »
I see your point.
I will go by this rule:
If I decide I am having too much trouble with robot crawlers, then I will enforce a strict setting and block new and no-contact robots.

Should I just copy/paste the blocked robots on the ComputerHope Robots.txt file?
(It seems as if Nathan himself has quite the list of blocks)

Rob Pomeroy



Re: My Robots.txt file
« Reply #5 on: August 06, 2006, 01:14:21 AM »
There's no harm in that. :)

unlovedwarrior



    Guru

  • someday this name will be known
  • Thanked: 13
    Re: My Robots.txt file
    « Reply #6 on: August 09, 2006, 10:22:25 AM »
    ummm.. do you mind if i use your robots.txt for my site? if not can i model mine off of it?


    thanks

    and of course i'll give you credit on the file and i'll just update it..

    Zylstra

    Re: My Robots.txt file
    « Reply #7 on: August 09, 2006, 02:25:31 PM »
    Quote
    ummm.. do you mind if i use your robots.txt for my site? if not can i model mine off of it?


    thanks

    and of course i'll give you credit on the file and i'll just update it..
    Well, as Rob said, robots are generally doing you a favor.
    Go ahead and use my list if you want.
    I will be modifying it later.
    My list is simply set up to block new robots with no contact information, and robots with strange patterns.

    You could also use the robot names from the ComputerHope robots.txt file,
    and check out www.computerhope.com/promote.htm, as there is more information on directing robots there.

    unlovedwarrior



      Re: My Robots.txt file
      « Reply #8 on: August 09, 2006, 02:28:08 PM »
      k where do i put it? just in the folder with my site?

      Zylstra

      Re: My Robots.txt file
      « Reply #9 on: August 09, 2006, 02:33:41 PM »
      Quote
      k where do i put it? just in the folder with my site?
      The root folder (where your index.htm file is)
      You may want to add this to the Meta area of your index.htm file...

      Code: [Select]
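      <!-- "index, follow" is the default and means the same as "all", so one robots tag is enough -->
      <meta name="robots" content="index, follow">
      <!-- revisit-after is not part of any standard; the major search engines ignore it -->
      <meta name="revisit-after" content="15 days">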

      This tells robots that your site can be indexed.

      BTW: My robots.txt file only blocks some of the YaBB files, which, quite frankly, isn't a really good idea... if you have a forum, you will want to block the admin directories and the member directories. This will keep important information about your users from being searched if the search engines manage to do that.
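
To make the "root folder" point concrete: crawlers look for the file only at the top level of the host, so a robots.txt inside a subfolder such as /forum/ is never read. Here is a small Python sketch of that lookup. The function name and the example.com address are placeholders, not anything from a real site.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address a crawler would check for this page's host."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; the path is always /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.example.com/forum/YaBB.pl?board=1"))
# http://www.example.com/robots.txt
```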

      unlovedwarrior



        Re: My Robots.txt file
        « Reply #10 on: August 09, 2006, 02:37:27 PM »
        and how do i do that? :o

        Zylstra

        Re: My Robots.txt file
        « Reply #11 on: August 09, 2006, 02:42:14 PM »
        Quote
        and how do i do that? :o
        Which part?
        Adding the meta tag, or blocking locations on your site from robots?

        or something else?

        unlovedwarrior



          Re: My Robots.txt file
          « Reply #12 on: August 09, 2006, 02:45:45 PM »
          i got the meta tag down i just need to know how to do the other things...


          blocking location of site and what not

          Zylstra

          Re: My Robots.txt file
          « Reply #13 on: August 09, 2006, 02:52:59 PM »
          Quote
          i got the meta tag down i just need to know how to do the other things...


          blocking location of site and what not
          Basic Robots.txt files:

          Code: [Select]
          # Number signs start comments in a robots.txt file. Everything in this
          # example could be saved as a robots.txt file as-is.

          # To ask one robot to stay out of some directories, name it...
          User-agent: robotname
          # ...and then list the disallowed directories
          Disallow: /blocked/
          Disallow: /email/

          # To ask ALL robots to stay out of those directories, use * instead
          User-agent: *
          Disallow: /blocked/
          Disallow: /email/


          For an even better example, by all means look at the www.computerhope.com/robots.txt file.
          Or, to learn how to make one yourself, just visit www.computerhope.com/promote.htm
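
Before uploading a hand-written file, it can help to check how its lines group together. The following is a rough Python sketch, not a full parser of the robots rules: it only reads User-agent and Disallow lines and ignores everything else. The name parse_groups is made up for this example.

```python
EXAMPLE = """\
# To ask one robot to stay out of some directories, name it...
User-agent: robotname
Disallow: /blocked/
Disallow: /email/

# To ask ALL robots to stay out of those directories, use * instead
User-agent: *
Disallow: /blocked/
Disallow: /email/
"""


def parse_groups(text: str) -> dict[str, list[str]]:
    """Map each user-agent to the paths listed in its Disallow lines."""
    groups: dict[str, list[str]] = {}
    current: list[str] = []   # agents named by the group being read
    reading_rules = False     # True once the group has at least one rule
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if reading_rules:  # a User-agent line after rules starts a new group
                current, reading_rules = [], False
            current.append(value)
            groups.setdefault(value, [])
        elif field == "disallow":
            reading_rules = True
            if value:  # an empty Disallow means nothing is disallowed
                for agent in current:
                    groups[agent].append(value)
    return groups


for agent, paths in parse_groups(EXAMPLE).items():
    print(agent, paths)
```

If a Disallow line ends up under the wrong robot, or under none at all, it shows up straight away in the printed groups.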

          unlovedwarrior



            Re: My Robots.txt file
            « Reply #14 on: August 09, 2006, 02:55:30 PM »
            k..

            i'll try to make one myself and have you guys check it over..

            how do i find the robots names??

            Zylstra

            Re: My Robots.txt file
            « Reply #15 on: August 09, 2006, 03:00:59 PM »
            Quote
            k..

            i'll try to make one myself and have you guys check it over..

            how do i find the robots names??
            Find it here

            unlovedwarrior



              Re: My Robots.txt file
              « Reply #16 on: August 09, 2006, 03:03:54 PM »
              k thanks

              Rob Pomeroy



              Re: My Robots.txt file
              « Reply #17 on: August 09, 2006, 08:21:09 PM »
              Quote
              if you have a forum, you will want to block the admin directories and the member directories. This will keep important information about your users from being searched if the search engines manage to do that.
              Still not convinced you've quite got this, by virtue of the fact you're using the word "block".  It's best not to have (or give) the impression that robots.txt files can actually block anything.  Consider them more as a polite request, which will sometimes be ignored.

              You need to protect your /admin tree in much more robust ways.

              Zylstra

              Re: My Robots.txt file
              « Reply #18 on: August 09, 2006, 08:32:34 PM »
              Quote
              Quote
              if you have a forum, you will want to block the admin directories and the member directories. This will keep important information about your users from being searched if the search engines manage to do that.
              Still not convinced you've quite got this, by virtue of the fact you're using the word "block".  It's best not to have (or give) the impression that robots.txt files can actually block anything.  Consider them more as a polite request, which will sometimes be ignored.

              You need to protect your /admin tree in much more robust ways.
              Mind explaining how?

              Dilbert

              • Moderator


              • Egghead

              • Welcome to ComputerHope!
              • Thanked: 44
                Re: My Robots.txt file
                « Reply #19 on: August 09, 2006, 09:08:12 PM »
                Yes, please elaborate, I can use this info. :)
                "The geek shall inherit the Earth."

                unlovedwarrior



                  Re: My Robots.txt file
                  « Reply #20 on: August 10, 2006, 08:23:21 AM »
                  im confused rob :-[ explain plz so do i really need robots.txt??

                  Rob Pomeroy



                  Re: My Robots.txt file
                  « Reply #21 on: August 11, 2006, 01:13:16 AM »
                  Quote
                  Mind explaining how?
                  Quote
                  Yes, please elaborate, I can use this info. :)
                  Sure.  With Apache, you can use an .htaccess file as an added layer of protection.  I suggest using password and IP-based protection.  If you will only access /admin from your local subnet, then only allow access from that subnet.  If you are connecting to a remote server, but from a fixed IP address, only allow access from that IP address.

                  You can do something similar with IIS and Windows-based authentication.
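
The IP-based half of this advice comes down to a simple allow-list check, sketched here in Python purely to show the logic. This is not Apache or IIS configuration, and in practice the web server performs the check itself, ideally combined with a password. The subnet, the fixed address and the function name are placeholders.

```python
from ipaddress import ip_address, ip_network

# Hypothetical allow-list: one local subnet and one fixed remote address.
ALLOWED = [ip_network("192.168.1.0/24"), ip_network("203.0.113.7/32")]


def may_access_admin(client_ip: str) -> bool:
    """True only if the client address falls inside an allowed network."""
    addr = ip_address(client_ip)
    return any(addr in net for net in ALLOWED)


print(may_access_admin("192.168.1.42"))   # True: inside the local subnet
print(may_access_admin("203.0.113.7"))    # True: the fixed remote address
print(may_access_admin("198.51.100.9"))   # False: everyone else is refused
```

Unlike a robots.txt entry, a check like this is enforced by the server, so a crawler that ignores the rules still gets refused.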

                  Quote
                  im confused rob :-[ explain plz so do i really need robots.txt??
                  Well this is my point.  The Computer Hope article explains it quite well.  You tell spiders not to index parts of the web tree that it would be pointless for them to access - e.g. recursive directories (not something you're likely to encounter for a while) or any thread on YaBB that contains a post by Mac...
                  « Last Edit: August 11, 2006, 01:13:35 AM by robpomeroy »

                  Zylstra

                  Re: My Robots.txt file
                  « Reply #22 on: August 11, 2006, 01:15:13 AM »
                  Hmm. I already use .htaccess, but not on my YaBB directories...

                  unlovedwarrior



                    Re: My Robots.txt file
                    « Reply #23 on: August 11, 2006, 08:23:28 AM »
                    i have the htaccess file also im using phpbb

                    Dilbert

                      Re: My Robots.txt file
                      « Reply #24 on: August 11, 2006, 09:32:59 AM »
                      Quote
                      or any thread on YaBB that contains a post by Mac...

                      ROTFL ;D ;D ;D

                      Thanks, Rob, I'll keep that in mind.

                      So now I know what robots.txt is... thanks, Rob. :)

                      Google



                        Mentor

                        Thanked: 2
                        • Certifications: List
                        • Experience: Experienced
                        • OS: Windows 7
                        Re: My Robots.txt file
                        « Reply #25 on: August 11, 2006, 10:16:29 AM »
                        You have reached the point of a "VERY HOT TOPIC" at [timestamp=1155312980]!
                        CONGRATULATIONS!