
Author Topic: Replace multiple lines in multiple html files issue  (Read 6579 times)


DaveLembke

    Topic Starter


    Sage
  • Thanked: 662
  • Certifications: List
  • Computer: Specs
  • Experience: Expert
  • OS: Windows 10
Replace multiple lines in multiple html files issue
« on: October 20, 2019, 03:11:55 PM »
I'm looking for a way to change the paths of hyperlinks in multiple HTML files that all share the same structure. I tried Notepad's Replace function, but Notepad only allows single-line replacements, and it would be a manual process for each file unless I made a macro to repeat it once per file.

Scripts that I have used in the past are also only single line or targeted character replacements.

I was wondering if anyone had suggestions for replacing about 74 lines of HTML with corrected lines, without having to run the scripts I already have, which replace single lines. I figure there has to be an easier way than writing 74 find-and-replace instructions for a script that parses the HTML files and overwrites them to correct the paths. Maybe there is a way to match all 74 lines at once and, when a match is detected, replace it with the correction. That would be far more efficient than a script making 74 passes through each HTML file looking for a match and replacing it.

Looking online, I haven't found anything yet that gets around needing 74 single-line replace instructions. I thought sed or awk would do it, but I've found nothing so far, so I figured I'd ask here. If scripting 74 replace instructions is the best method, then I will run with that.  :-\
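
For readers with the same question: a plain string replace is not limited to one line, because a multi-line block is still a single string once the whole file is read in. The following is a minimal Python sketch, not a method anyone in this thread used. The two-link block and the example.com / example.org URLs are hypothetical stand-ins for the real 74 lines.

```python
def replace_block(text, old_block, new_block):
    """Replace every exact occurrence of old_block; also report the hit count."""
    count = text.count(old_block)
    return text.replace(old_block, new_block), count


# Hypothetical page and navigation blocks (stand-ins for the real 74 lines).
page = (
    "<html>\n<body>\n"
    '<a target="_top" href="default.htm"> Home</a>\n'
    '<a target="_top" href="Chapter1.htm">Chapter 1</a>\n'
    "<p>Page content</p>\n</body>\n</html>\n"
)
old_nav = (
    '<a target="_top" href="default.htm"> Home</a>\n'
    '<a target="_top" href="Chapter1.htm">Chapter 1</a>\n'
)
new_nav = (
    '<a target="_top" href="https://www.example.com/"> Home</a>\n'
    '<a target="_top" href="https://www.example.org/intro">Chapter 1</a>\n'
)

updated, hits = replace_block(page, old_nav, new_nav)
print(hits)
```

The match is exact, so a single differing character anywhere in the block means no replacement happens, which is why checking the hit count is worthwhile.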

patio

  • Moderator


  • Genius
  • Maud' Dib
  • Thanked: 1769
    • Yes
  • Experience: Beginner
  • OS: Windows 7
Re: Replace multiple lines in multiple html files issue
« Reply #1 on: October 20, 2019, 05:45:31 PM »
I believe Notepad++ does this... I'll check.
" Anyone who goes to a psychiatrist should have his head examined. "

DaveLembke

Re: Replace multiple lines in multiple html files issue
« Reply #2 on: October 20, 2019, 06:59:19 PM »
Thanks for checking on Notepad++ for this. The best part of this replacement is that it's a solid 74-line "paragraph-like chunk" that needs to be replaced, with no jumping around, so I figured there must be an easier method than 74 line-replacement instructions, since all the lines are consecutive within the 200+ line HTML files. If the hyperlink paths to be replaced were scattered through the HTML, with the structure differing between files, then single-line replacements would be required.

I'm thinking I might at some point move to Dynamic HTML so that changes like this are easier, but for now I'm hoping for a band-aid fix.

Hackoo



    Hopeful
  • Thanked: 42
  • Experience: Expert
  • OS: Windows 10
Re: Replace multiple lines in multiple html files issue
« Reply #3 on: October 20, 2019, 07:57:25 PM »
Hi  ;)
Perhaps this can be done with a batch file using regex in VBScript!
So, if you can attach some of the HTML files and explain more about which strings should be replaced, I will try, if I have enough time, to make a batch script like this one:
==> Replace with regular expression using Batch multiple text (Windows) <==

DaveLembke

Re: Replace multiple lines in multiple html files issue
« Reply #4 on: October 22, 2019, 12:02:10 PM »
Hello Hackoo

I checked out the link you provided but haven't tried tweaking it for my 74-line replacement yet. Work has me on a 60+ hour week, and this project is my own rather than work related, so it gets set aside when life gets crazy busy.

Here is a sample of what I have. The real thing is 74 consecutive lines that get changed from one path to another; I put just 8 lines here to test with and to show the structure without making you scroll through all 74. Also, while it says chapters, the links don't actually point to chapters. Instead of listing the actual names, I gave this example a simplistic structure, since what is to be replaced is just as simplistic.

The original hyperlinks are like below:

<a target="_top" href="default.htm"> Home</a>
<a target="_top" href="Chapter1.htm">Chapter 1</a>
<a target="_top" href="Chapter2.htm">Chapter 2</a>
<a target="_top" href="Chapter3.htm">Chapter 3</a>
<a target="_top" href="Chapter4.htm">Chapter 4</a>
<a target="_top" href="Chapter5.htm">Chapter 5</a>
<a target="_top" href="Chapter6.htm">Chapter 6</a>
<a target="_top" href="Chapter7.htm">Chapter 7</a>

The hyperlink targets are changing to full URLs on multiple domains. Basically, I had a local intranet-style set of web pages that used to be populated from multiple internet sources using HTTrack. Instead of using an offline cache of collected information, I am no longer grabbing copies of sites with HTTrack; the pages will go out to the source URLs for information that is current, rather than as of the last time HTTrack grabbed a copy via scheduled task.

I originally grabbed offline copies because internet access used to not be everywhere. These days, with hotspots everywhere (Xfinity now turns just about every home and business into a hotspot), there is no longer a need for offline data gathered through HTTrack, so I am trying to undo the local paths and point the pages at the original sources of the content.

Because I have built on this for about 6 years, I have many HTML documents that were all constructed similarly, with a navigation bar on the far left side. It's an easy fix in that the navigation bars are all the same, but tedious in that I have lots of HTML files to edit, some of which pivot off others as the site branches out. Rather than specifying 74 individual line replacements, I thought there might be a better method: read in a multi-line string, compare, and when a match is found, replace it with the correct information.

I just wanted to add that the whitespace in the navigation HTML doesn't differ between the files either, but maybe whitespace would need to be taken into account when a multi-line string is read in as a single unit for comparison.  :-\
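
If whitespace ever did differ (indentation, trailing spaces, or Windows vs Unix line endings), one option is to build a regular expression from the block's lines instead of comparing raw text. This is only a sketch under that assumption. The helper names `block_pattern` and `replace_block_loose` are made up for illustration.

```python
import re


def block_pattern(block):
    """Build a regex matching the block's lines in order, ignoring each line's
    leading/trailing spaces and tabs and accepting CRLF or LF line endings."""
    lines = [line.strip() for line in block.splitlines() if line.strip()]
    escaped = [re.escape(line) for line in lines]
    # Between two lines: optional trailing blanks, a line break, optional indent.
    return re.compile(r"[ \t]*\r?\n[ \t]*".join(escaped))


def replace_block_loose(text, old_block, new_block):
    """Return (new_text, number_of_replacements)."""
    # A function is used as the replacement so that backslashes in
    # new_block are inserted literally rather than treated as escapes.
    return block_pattern(old_block).subn(lambda match: new_block, text)
```

The text inside each line still has to match exactly; only the whitespace around the lines is relaxed. The indentation in front of the first matched line is left in place, and the replacement block is inserted as given.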

When doing single-line replacements in the past, I have at times had to read the contents into an array and then change the characters at targeted elements of that array, but this only seems to work when the data is always at the same position in the file being edited. The header information above the navigation hyperlinks differs among the HTML files, which makes targeting specific array elements deeper in the read-in a mess.  And I've done this with C++ ::)

Hackoo



Re: Replace multiple lines in multiple html files issue
« Reply #5 on: October 22, 2019, 03:32:04 PM »
Hi DaveLembke  ;)
So, just to pin the idea down using one file:
For example, if the source code of your HTML file had something like this:
Code: [Select]
<a target="_top" href="default.htm"> Home</a>
<a target="_top" href="Chapter1.htm">Chapter 1</a>
<a target="_top" href="Chapter2.htm">Chapter 2</a>
<a target="_top" href="Chapter3.htm">Chapter 3</a>
<a target="_top" href="Chapter4.htm">Chapter 4</a>
<a target="_top" href="Chapter5.htm">Chapter 5</a>
<a target="_top" href="Chapter6.htm">Chapter 6</a>
<a target="_top" href="Chapter7.htm">Chapter 7</a>

And, for example, you want to replace those with https://www.somedomain.com links like this:

Code: [Select]
<a target="_top" href="https://www.somedomain.com/default.htm"> Home</a>
<a target="_top" href="https://www.somedomain.com/Chapter1.htm">Chapter 1</a>
<a target="_top" href="https://www.somedomain.com/Chapter2.htm">Chapter 2</a>
<a target="_top" href="https://www.somedomain.com/Chapter3.htm">Chapter 3</a>
<a target="_top" href="https://www.somedomain.com/Chapter4.htm">Chapter 4</a>
<a target="_top" href="https://www.somedomain.com/Chapter5.htm">Chapter 5</a>
<a target="_top" href="https://www.somedomain.com/Chapter6.htm">Chapter 6</a>
<a target="_top" href="https://www.somedomain.com/Chapter7.htm">Chapter 7</a>

Is this what you want to get?


DaveLembke

Re: Replace multiple lines in multiple html files issue
« Reply #6 on: October 22, 2019, 04:28:06 PM »
Hi Hackoo

The URL would be replaced with a path that is nothing like the original local path, so chapter1.htm would be replaced with https://www.somedomain.com/.......? where .....? is not related in naming convention to the local path I had created for it.

So injecting https://www.somedomain.com/ in front of the name chapter1.htm wouldn't work.  :(

That would be so easy if it were that simple.  ;D
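
When the new URLs have no relation to the old file names, a lookup table from old path to new URL still lets a script handle every link in one pass. The sketch below is one possible approach, not something from the linked batch script. The entries in `LINK_MAP` are hypothetical, since the real targets are not given in this thread.

```python
import re

# Hypothetical mapping: local file name -> unrelated full URL.
LINK_MAP = {
    "default.htm": "https://www.example.com/",
    "Chapter1.htm": "https://www.example.org/some/unrelated/page",
}

# Captures the value of every double-quoted href attribute.
HREF = re.compile(r'href="([^"]*)"')


def remap_links(html, link_map):
    """Swap each href listed in link_map for its new URL; leave others alone."""
    def swap(match):
        old = match.group(1)
        return 'href="{}"'.format(link_map.get(old, old))
    return HREF.sub(swap, html)
```

Unlike a block replacement, this touches every matching href in the file, including any outside the navigation bar, so it only fits when the same local path should always map to the same URL.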

My thought was that a solution would likely have to read what to search for, a 74-line string of HTML, from one text file, then parse through the hundreds of HTML files looking for a match and, if a match is found, replace it with the contents of another text file. This solution would be dynamic and could be reused later for other content replacements, whereas static instructions inside the script would need customizing for each application. But I just don't know how to pull that off. It exceeds my programming skills because I only know single-line methods, especially since the placement of the 74 lines may vary between the HTML files even though their structure is always the same. Because the placement differs, I think it would be very difficult to target from a large-array standpoint: each character would be placed into an array element, so text that differs in word length from one page to another, or an additional line in one page, would mean the element to change is somewhere else in the array.  :-\

However, a static method could also be created that does not use two files, where File1.txt holds the multi-line string to search for and File2.txt holds the multi-line string to put in its place in the HTML file. My thinking with the dynamic approach is that it could work for someone else visiting Computer Hope in the future with the same issue: they could just run with what is shared here as a free solution, rather than having to build a static solution specific to their own multi-line replacement needs.
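
The two-file idea described above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the method that was eventually used in this thread: the function name `replace_in_tree`, the `*.htm` pattern, and all file and folder names in the demo are hypothetical, and the search file is assumed to use the same encoding and line endings as the HTML files, because the comparison is done on raw bytes.

```python
import tempfile
from pathlib import Path


def replace_in_tree(src_dir, dst_dir, find_file, replace_file, pattern="*.htm"):
    """Write an edited copy of every matching file in src_dir to dst_dir.

    The block to search for and the block to insert are read from two files,
    so nothing about the content is hard-coded. Returns the names of the
    files in which the search block was found.
    """
    old_block = Path(find_file).read_bytes()
    new_block = Path(replace_file).read_bytes()
    if not old_block:
        raise ValueError("search block is empty")
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    changed = []
    for src in sorted(Path(src_dir).glob(pattern)):
        data = src.read_bytes()
        # The block is located by content, so its position in the file
        # does not matter.
        if old_block in data:
            data = data.replace(old_block, new_block)
            changed.append(src.name)
        (dst / src.name).write_bytes(data)  # originals stay untouched
    return changed


# Small demo in a temporary folder with made-up files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "site").mkdir()
    (root / "find.txt").write_bytes(
        b'<a href="a.htm">A</a>\n<a href="b.htm">B</a>\n')
    (root / "replace.txt").write_bytes(
        b'<a href="https://www.example.com/x">A</a>\n'
        b'<a href="https://www.example.com/y">B</a>\n')
    (root / "site" / "page1.htm").write_bytes(
        b'<h1>One</h1>\n<a href="a.htm">A</a>\n<a href="b.htm">B</a>\n<p>text</p>\n')
    (root / "site" / "page2.htm").write_bytes(b"<p>no navigation here</p>\n")
    changed = replace_in_tree(root / "site", root / "out",
                              root / "find.txt", root / "replace.txt")
    result = (root / "out" / "page1.htm").read_bytes()

print(changed)
```

Writing to a separate output folder keeps the source files as a backup, and the returned list makes it easy to spot any file where the block did not match.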


DaveLembke

Re: Replace multiple lines in multiple html files issue
« Reply #7 on: November 12, 2019, 04:43:28 PM »
Just wanted to state that I found a workaround that was able to edit all those files with the proper multi-line changes. I ended up using a batch file along with a keyboard and mouse macro that I compiled as an EXE and called from the batch file once per file. The macro copied each file to an alternate location, used Find in Notepad to jump to the target so that the text to replace was always in the same place for the X,Y mouse moves, pasted in the replacement HTML, and saved the file at the alternate location. The batch file repeated this for every file in the source location: it simply listed the EXE 327 times, once for each of the 327 HTML files needing the same 74-line replacement.  ;D

I just needed to go through the process once while recording the mouse and keyboard actions, compile the recording as an EXE, and call that from the batch file. It took about 4 hours and 15 minutes for my computer to go through all the files doing the redundant edits. So glad I didn't have to do this manually myself  ;D . The macro tool I used was Jitbit Macro Recorder, which I bought years ago; they were cool enough to give me an updated version that works with Windows 10 because I still had proof of purchase from 2006.

A scripted method, versus this copycat follow-what-I-did-327-times method, would have been faster than 4 hours and 15 minutes, probably done in about 54.5 minutes if it took 10 seconds per file for the 74 lines. But it's a one-time fix, and from here on out I will have it all pointing to the web instead of locally.

So all set now. I'm saving a reference to the method Hackoo shared here, though, because I can see it being useful later for other projects where edits are needed.  8)
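
As a quick check of the timing figures in the last post, the arithmetic can be worked through like this. The 10 seconds per file is the post's own guess for a scripted run, not a measurement.

```python
FILES = 327
ASSUMED_SECONDS_PER_FILE = 10  # the post's guess for a scripted edit

# Estimated scripted run time in minutes.
scripted_minutes = FILES * ASSUMED_SECONDS_PER_FILE / 60

# The macro run took about 4 hours 15 minutes in total.
macro_minutes = 4 * 60 + 15
macro_seconds_per_file = macro_minutes * 60 / FILES

print(scripted_minutes)                  # 54.5
print(round(macro_seconds_per_file, 1))  # 46.8
```

So the macro averaged roughly 47 seconds per file, and the 54.5-minute estimate follows directly from 327 files at 10 seconds each.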