Computer Hope
Microsoft => Microsoft Windows => Windows XP => Topic started by: turbodiesel on November 14, 2009, 12:26:13 AM
-
I have a list of 500 similar URLs.
Approximately 490 of these link to the same page.
How can I find the remaining 10 in the list that do not take me to this page?
(Obviously without clicking on each one.)
TIA
Turbodiesel
-
From the command line you can use find.exe. The /v switch excludes lines which contain the search string. You can use the > filename syntax to redirect the output to a file.
old.txt
http://www.cat.com/collars.htm
http://www.cat.com/fish.htm
http://www.dog.com/bones.htm
http://www.cat.com/milk.htm
http://www.cat.com/scratch.htm
http://www.dog.com/collars.htm
type old.txt | find /v "www.cat.com" > new.txt
Only lines from old.txt that do not contain www.cat.com will end up in new.txt.
If you remove the /v, only lines that do contain www.cat.com will be copied.
C:\>type old.txt
http://www.cat.com/collars.htm
http://www.cat.com/fish.htm
http://www.dog.com/bones.htm
http://www.cat.com/milk.htm
http://www.cat.com/scratch.htm
http://www.dog.com/collars.htm
C:\>type old.txt | find /v "www.cat.com"
http://www.dog.com/bones.htm
http://www.dog.com/collars.htm
C:\>type old.txt | find "www.cat.com"
http://www.cat.com/collars.htm
http://www.cat.com/fish.htm
http://www.cat.com/milk.htm
http://www.cat.com/scratch.htm
C:\>type old.txt | find "collars"
http://www.cat.com/collars.htm
http://www.dog.com/collars.htm
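For anyone without find.exe to hand, roughly the same filtering can be sketched in Python. This is only an illustration of what the /v switch does. The function name filter_lines and its invert parameter are made up for this example, and the match is case-sensitive, as find is when /i is not given.

```python
def filter_lines(lines, needle, invert=False):
    """Keep lines that contain needle.

    With invert=True, keep the lines that do NOT contain it,
    which mirrors what find /v does.
    """
    return [line for line in lines if (needle in line) != invert]


# Sample data copied from old.txt in the post above.
old = [
    "http://www.cat.com/collars.htm",
    "http://www.cat.com/fish.htm",
    "http://www.dog.com/bones.htm",
    "http://www.cat.com/milk.htm",
    "http://www.cat.com/scratch.htm",
    "http://www.dog.com/collars.htm",
]

# Equivalent in spirit to: type old.txt | find /v "www.cat.com"
for line in filter_lines(old, "www.cat.com", invert=True):
    print(line)
```

Like the find command, this only looks at the text of each line; it never visits the URL.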
-
Thanks for taking the time to reply m8 :)
I am referring to the destination page rather than the actual content of the original URL.
For example:
http://www.cat.com/collars/page1
http://www.cat.com/collars/page2
http://www.cat.com/collars/page3
...
http://www.cat.com/collars/page499
http://www.cat.com/collars/page500
490 of these URLs will redirect me to a page displaying a picture of a cat.
10 of these URLs will redirect me to 10 different pages displaying 10 different images.
How do I find the 10 "different" links without clicking on each of the 500 links to see where it takes me?
-
You really can't do this from the list alone. You have to load each page to know what you want to keep.
And even if a URL works for you today, there is nothing stopping the web site from changing where it leads, so it could end up matching one of the others you already have.
Is this one of those sites that have a picture of the day thing going on?
-
You could play around with wget, which follows redirects and reports where each URL ends up.
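
The idea of letting a tool visit each URL and report where it lands can also be sketched in Python instead of wget. This is a rough, untested-against-your-site sketch under a few assumptions: the list lives in a file such as old.txt (the name used earlier in the thread), the 490 "same" URLs really do end up at one identical final address, and the helper names final_url, body_hash and find_odd_urls are invented for this example. It fetches every URL, works out the most common destination, and returns the URLs that land somewhere else (or fail to load).

```python
import hashlib
import urllib.request
from collections import Counter


def final_url(url, timeout=10):
    """Return the address reached after following any HTTP redirects."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.geturl()


def body_hash(url, timeout=10):
    """Alternative signature: a hash of the page body, for sites that
    serve the same page under many URLs without redirecting."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return hashlib.sha256(resp.read()).hexdigest()


def find_odd_urls(urls, fetch=final_url):
    """Return the URLs whose signature differs from the most common one."""
    signatures = {}
    for url in urls:
        try:
            signatures[url] = fetch(url)
        except OSError as exc:
            # URLError, HTTPError and timeouts are all OSError subclasses.
            # A failed URL gets its own signature so it is reported as odd.
            signatures[url] = ("error", str(exc))
    if not signatures:
        return []
    # The signature shared by the largest number of URLs is treated as
    # "the same page"; everything else is reported.
    common, _count = Counter(signatures.values()).most_common(1)[0]
    return [url for url in urls if signatures[url] != common]


# Example use (old.txt is an assumed file name, one URL per line):
#   with open("old.txt") as f:
#       urls = [line.strip() for line in f if line.strip()]
#   for url in find_odd_urls(urls):
#       print(url)
```

If the site does not redirect but simply shows the same cat picture under every address, passing fetch=body_hash compares page contents instead. That only helps when the repeated page is byte-for-byte identical, and, as mentioned above, the result is only valid for the day you run it.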