Computer Hope
Microsoft => Microsoft DOS => Topic started by: iONik on November 30, 2021, 09:43:17 AM
-
OK, this is totally over my head here. Never learned all this (for /f "tokens) stuff.
I have a large text file that has many, many, many URLs embedded in it. Going through this and copying and pasting all these URLs would be quite painful, mentally and perhaps even physically, let alone very time consuming. In the end I'd like to have a new text document that lists all the URLs.
Each URL starts with (http) and ends with ("), absent the parentheses. Removing the (") from the result would be great but not critical.
Can anyone help me out?
I would appreciate it!
Brian
-
This can be done easily with a VBScript or a PowerShell script that uses a regular expression (regex).
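A minimal sketch of the regex idea in VBScript, assuming the rule from the question: a URL starts with http and runs up to the next double quote. The file name extract_urls.vbs and the pattern are illustrative assumptions, not a solution checked against every possible input.

```vbscript
' extract_urls.vbs (hypothetical name): print every URL found on standard input.
' Assumption: a URL starts with http:// or https:// and ends just before
' the next double quote or whitespace character.
Option Explicit

Dim re, match, text
text = WScript.StdIn.ReadAll          ' read the whole input file at once

Set re = New RegExp
re.Global = True                      ' find all matches, not only the first
re.IgnoreCase = True
re.Pattern = "https?://[^""\s]+"      ' "" is an escaped double quote in a VBScript string

For Each match In re.Execute(text)
    WScript.StdOut.WriteLine match.Value
Next
```

It would be run from a command prompt as: cscript //nologo extract_urls.vbs < input.txt > urls.txt (the names input.txt and urls.txt are placeholders). It has to be started with cscript rather than wscript, because WScript.StdIn is only available in the console host. Because the match stops before the closing quote, the (") is left out of the result.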
-
eeks! ...and I know even less about those languages.
-
Can you give us a small example of the input file and the result you expect after extraction, I mean the output?
We can write a script that lets the user drag and drop the input file onto the batch script or the VBScript and get the result in another file.
So, I'm waiting for what I asked for above.
@+
-
A sample file:
"thumb_source_type":"screen","thumb_url":"","group_id":"2","screen_delay":0,"get_screen_method":"auto","need_sync_screen":0,"update_interval":"","thumb_width":728,"thumb_height":454.9101678183613,"position":6,"clicks":5,"deny":0,"screen_maked":1,"global_id":"vWNVsZDrcsdhsCGdFonDn5BkW9nRRJ","thumb_version":1,"id":14,"thumb":"filesystem:moz-extension://f65056c1-5c41-4f9e-b577-96f5fd992fb0/persistent/sd_previews/vWNVsZDrcssCGdFonDn55EBkW9nRRJ.png","rowid":14,"auto_title":"PortableApps.com - Portable software for USB, portable, and cloud drives","last_preview_update":1619905296092},{"url":"https://google.com/","title":"Unlock, speed up and easily transfer ","thumb_source_type":"screen","group_id":"2","get_screen_method":"auto","thumb":"filesystem:moz-extension://f65056c1-5c41-4f9e-b577-96f5fd992fb0/persistent/sd_previews/zN4KncDseuEPF65WBllpPVbEke9m1u.png","screen_delay":0,"position":5,"clicks":191,"deny":0,"screen_maked":1,"global_id":"zN4cDseuEPF65t6WBllpPVbEke9m1u","thumb_version":1,"id":16,"last_preview_update":1619905289896,"auto_title":"Unlock, speed up and easily transfer content from the cloud - Offcloud.com","thumb_width":728,"thumb_height":454.9101678183613,"thumb_url":"","need_sync_screen":0,"update_interval":"","rowid":16},{"url":"https://drive.google.com/drive/","title":"My Drive - Google
Sample Output file:
https://google.com/"
https://drive.google.com/drive/"
or
https://google.com/
https://drive.google.com/drive/
The match can't stop at (/), since some URLs have multiple subdirectories, i.e. .../text1/text2/, as well as different top-level domains.
-
You can give this batch file a try: Extracting_Links.bat
@echo off
Mode 70,4 & color 0B
Title Extracting URLs links from InputFile by Drag and Drop
rem %~1 strips any surrounding quotes from the dropped path, so it can be re-quoted safely below
Set "InputFile=%~1"
If not defined InputFile Call :Help
Set "Tmpvbs=%Tmp%\%~n0.vbs"
Set "OutPutFile=%~n1_Output.txt"
echo( & echo( Please wait ... Extracting URLs and Links from "%~nx1"
Call :Extract "%InputFile%" "%OutPutFile%"
Start "" "%OutPutFile%" & exit
::--------------------------------------------------------------------------------------------------------------------
:Extract <InputData> <OutPutData>
(
echo Data = WScript.StdIn.ReadAll
echo Data = Extract(Data,"(http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%%&:/~\+#]*[\w\-\@?^=%%&/~\+#])?"^)
echo WScript.StdOut.WriteLine Data
echo Function Extract(Data,Pattern^)
echo Dim oRE,oMatches,Match,Line
echo set oRE = New RegExp
echo oRE.IgnoreCase = True
echo oRE.Global = True
echo oRE.Pattern = Pattern
echo set oMatches = oRE.Execute(Data^)
echo If not isEmpty(oMatches^) then
echo For Each Match in oMatches
echo Line = Line ^& Match.Value ^& vbcrlf
echo Next
echo Extract = Line
echo End if
echo End Function
)>"%Tmpvbs%"
cscript //nologo "%Tmpvbs%" < "%~1" > "%~2"
If Exist "%Tmpvbs%" Del "%Tmpvbs%"
exit /b
::--------------------------------------------------------------------------------------------------------------------
:Help
Color 0C
echo(
echo( You should drag and drop a file over,
echo( this script "%~nx0" in order to extract URLs and Links
Timeout /T 10 /NoBreak>nul
Exit
::--------------------------------------------------------------------------------------------------------------------
8) ;)
-
I N S A N E !
This script would have taken me 10x longer to write than to manually extract all the websites.
Worked perfectly!
Thank You!
You don't know how much I appreciate your help.
Brian
;D
-
For posterity, here is a link to the OP's follow-up question about the code provided above.
Inserting extracted code into new file surrounded by specific text (https://stackoverflow.com/questions/70222069/inserting-extracted-code-into-new-file-surrounded-by-specific-text)