Computer Hope

Microsoft => Microsoft Windows => Windows XP => Topic started by: turbodiesel on November 14, 2009, 12:26:13 AM

Title: help with manipulating url
Post by: turbodiesel on November 14, 2009, 12:26:13 AM
I have a list of 500 similar URLs.
Approximately 490 of these link to the same page.
How can I find the remaining 10 in the list which do not take me to this page?
(without, obviously, clicking on each one)

TIA

Turbodiesel
Title: Re: help with manipulating url
Post by: Salmon Trout on November 14, 2009, 01:18:33 AM
old.txt

http://www.cat.com/collars.htm
http://www.cat.com/fish.htm
http://www.dog.com/bones.htm
http://www.cat.com/milk.htm
http://www.cat.com/scratch.htm
http://www.dog.com/collars.htm

type old.txt | find /v "www.cat.com" > new.txt

Only lines from old.txt which do not contain www.cat.com will end up in new.txt.


Title: Re: help with manipulating url
Post by: Salmon Trout on November 14, 2009, 01:26:14 AM

From the command line you can use find.exe. The /v switch excludes lines which contain the search string. You can use the > filename syntax to redirect the output to a file.


If you remove the /v, then only lines which do contain www.cat.com will be copied.

Code:
C:\>type old.txt
http://www.cat.com/collars.htm
http://www.cat.com/fish.htm
http://www.dog.com/bones.htm
http://www.cat.com/milk.htm
http://www.cat.com/scratch.htm
http://www.dog.com/collars.htm

Code:
C:\>type old.txt | find /v "www.cat.com"
http://www.dog.com/bones.htm
http://www.dog.com/collars.htm

Code:
C:\>type old.txt | find "www.cat.com"
http://www.cat.com/collars.htm
http://www.cat.com/fish.htm
http://www.cat.com/milk.htm
http://www.cat.com/scratch.htm

Code:
C:\>type old.txt | find "collars"
http://www.cat.com/collars.htm
http://www.dog.com/collars.htm
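For comparison, the same filtering can be sketched in Python. This is a hypothetical equivalent of the find examples above, with the URL list inlined instead of read from old.txt; the filter_lines helper is an illustration, not part of any real tool:

```python
# Mimic cmd's find filtering on the example URL list from old.txt.
def filter_lines(lines, search, invert=True):
    """invert=True behaves like `find /v` (keep lines WITHOUT the string);
    invert=False behaves like plain `find` (keep lines WITH the string)."""
    return [line for line in lines if (search in line) != invert]

urls = [
    "http://www.cat.com/collars.htm",
    "http://www.cat.com/fish.htm",
    "http://www.dog.com/bones.htm",
    "http://www.cat.com/milk.htm",
    "http://www.cat.com/scratch.htm",
    "http://www.dog.com/collars.htm",
]

# Equivalent of: type old.txt | find /v "www.cat.com" > new.txt
kept = filter_lines(urls, "www.cat.com")
```

Passing invert=False matches the plain find examples, keeping only the lines that do contain the search string.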
Title: Re: help with manipulating url
Post by: turbodiesel on November 15, 2009, 08:34:03 AM
Thanks for taking the time to reply m8 :)
I am referring to the destination page rather than the actual content of the original URL.
For example....

http://www.cat.com/collars/page1
http://www.cat.com/collars/page2
http://www.cat.com/collars/page3
...
http://www.cat.com/collars/page499
http://www.cat.com/collars/page500


490 of these URLs will redirect me to a page displaying a picture of a cat.

10 of these URLs will redirect me to 10 different pages displaying 10 different images.

How do I find the 10 "different" links without clicking on each of the 500 links to see where each takes me?
Title: Re: help with manipulating url
Post by: Spoiler on November 16, 2009, 09:20:05 AM
You really can't do this from the URL alone. You have to preview the page to know what you want to keep.

And even if the URL works for you today, there is nothing stopping the web site from changing the URL to show up as one of the other 500 you already have...

Is this one of those sites that has a picture-of-the-day thing going on?

Title: Re: help with manipulating url
Post by: Salmon Trout on November 16, 2009, 10:39:53 AM
You could play around with wget.
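The wget idea could also be sketched with Python's standard library: request each URL, see where it finally lands, and group the list by destination. This is only a rough sketch under assumptions; the final_url helper is hypothetical, it hits the network (so real use needs error handling and politeness delays), and the threshold for "different" is illustrative:

```python
# Sketch: resolve where each URL actually lands, then group by destination.
from urllib.request import urlopen

def final_url(url):
    """Follow redirects and return the URL we end up at.
    urlopen follows HTTP redirects by default; geturl() reports
    the address of the final response."""
    with urlopen(url) as resp:
        return resp.geturl()

def group_by_destination(urls, resolver=final_url):
    """Map each destination URL to the list of input URLs landing on it."""
    groups = {}
    for url in urls:
        groups.setdefault(resolver(url), []).append(url)
    return groups

# The ~10 "different" links would be the ones whose destination group
# is small, e.g. (threshold chosen for illustration):
# odd = [u for dest, members in group_by_destination(urls).items()
#        if len(members) < 100 for u in members]
```

Since the resolver is a parameter, the grouping can be exercised without any network access by passing a stand-in function.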