
Topic: Embedded for loop.  (Read 6322 times)


Betty

    Topic Starter


    Rookie

    • Experience: Beginner
    • OS: Windows XP
    Embedded for loop.
    « on: August 21, 2011, 03:11:59 AM »
    Win XP Pro SP3 and all updates.

    I want to interleave lines from two plain text files so that the first line of File1 is followed by the first line of File2, the second line of File1 is followed by the second line of File2 etc...  Both files contain the same number of lines.

    Can this be achieved in For loops?   I tried the following script and the first lines appear ok but the script seems to loop through File2 before returning to pick up the second line of File1.  Thanks

    Code:
    @echo off
    cls
    setlocal

    for /f "tokens=*" %%A in (file1.txt) do (

    for /f "tokens=*" %%a in (file2.txt) do (

    (
    echo %%A
    echo %%a
    )>>newfile.txt


    REM  At this point I want to loop to the first For loop to input
    REM  the second line of File1.txt...

      )
    )

    type newfile.txt
    To the world you are just one person,
    To one person you are the world.
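A rough way to see what Betty's script does is to model the two nested FOR /F loops in Python. The function name `nested_for` is made up for this illustration, and the model ignores batch details such as skipped blank lines. Because the inner loop runs to the end of file2 for every line of file1, the output contains every pairing of lines rather than one pair per line number:

```python
def nested_for(file1_lines, file2_lines):
    """Model Betty's script: the inner loop runs over ALL of file2 for each line of file1."""
    out = []
    for a in file1_lines:        # outer FOR /F over file1.txt
        for b in file2_lines:    # inner FOR /F reads the whole of file2.txt again
            out.append(a)        # echo %%A
            out.append(b)        # echo %%a
    return out


result = nested_for(["A1", "A2"], ["B1", "B2"])
print(result)  # ['A1', 'B1', 'A1', 'B2', 'A2', 'B1', 'A2', 'B2']
```

With n lines in each file this writes 2n² lines instead of 2n.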

    Salmon Trout

    • Guest
    Re: Embedded for loop.
    « Reply #1 on: August 21, 2011, 05:16:49 AM »
    The FOR /F command will, as you have noticed, process the whole of the file before terminating. So while your idea of an outer loop processing file1 and an inner loop processing file2 is OK, you will have to resort to some extra stuff to isolate the line you want in the inner loop. Using delayed expansion allows you to set and read line count variables inside the loops, so you can grab the right line in the inner loop. A first attempt might look something like this:

    Code:
    @echo off
    setlocal enabledelayedexpansion
    set File1LineCount=1
    for /f "delims=" %%A in (file1.txt) do (
        set File1Line=%%A
        set File2LineCount=1
        rem Read every line of file2, but only write the one whose number matches
        for /f "delims=" %%B in (file2.txt) do (
            set File2Line=%%B
            if !File1LineCount! equ !File2LineCount! (
                >>File3.txt echo !File1Line!
                >>File3.txt echo !File2Line!
            )
            set /a File2LineCount+=1
        )
        set /a File1LineCount+=1
    )

    File1.txt:
    Code:
    This is line 1 of file1.txt
    This is line 2 of file1.txt
    This is line 3 of file1.txt
    This is line 4 of file1.txt
    This is line 5 of file1.txt
    This is line 6 of file1.txt
    This is line 7 of file1.txt
    This is line 8 of file1.txt
    This is line 9 of file1.txt
    This is line 10 of file1.txt

    File2.txt:
    Code:
    This is line 1 of file2.txt
    This is line 2 of file2.txt
    This is line 3 of file2.txt
    This is line 4 of file2.txt
    This is line 5 of file2.txt
    This is line 6 of file2.txt
    This is line 7 of file2.txt
    This is line 8 of file2.txt
    This is line 9 of file2.txt
    This is line 10 of file2.txt

    File3.txt:
    Code:
    This is line 1 of file1.txt
    This is line 1 of file2.txt
    This is line 2 of file1.txt
    This is line 2 of file2.txt
    This is line 3 of file1.txt
    This is line 3 of file2.txt
    This is line 4 of file1.txt
    This is line 4 of file2.txt
    This is line 5 of file1.txt
    This is line 5 of file2.txt
    This is line 6 of file1.txt
    This is line 6 of file2.txt
    This is line 7 of file1.txt
    This is line 7 of file2.txt
    This is line 8 of file1.txt
    This is line 8 of file2.txt
    This is line 9 of file1.txt
    This is line 9 of file2.txt
    This is line 10 of file1.txt
    This is line 10 of file2.txt
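The counter-matching idea can be modelled in Python as a sketch. The function `method1` and the `reads` counter are illustrative names, not part of the batch script. It produces the same interleaved output as File3.txt above, but also counts how many file2 lines get read along the way:

```python
def method1(file1_lines, file2_lines):
    """Interleave by matching line counters; file2 is rescanned for every file1 line."""
    out = []
    reads = 0                                        # total file2 lines read
    for n1, line1 in enumerate(file1_lines, start=1):
        for n2, line2 in enumerate(file2_lines, start=1):
            reads += 1
            if n1 == n2:                             # keep only the matching line number
                out.append(line1)
                out.append(line2)
    return out, reads


f1 = [f"This is line {i} of file1.txt" for i in range(1, 11)]
f2 = [f"This is line {i} of file2.txt" for i in range(1, 11)]
merged, reads = method1(f1, f2)
print(merged[:4])
print(reads)  # 100
```

For 10-line files that is already 100 reads of file2 to produce 10 pairs.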

    However, although it is possible to demonstrate this method convincingly with dinky little test files of 10 or so lines and see the result (in effect) at once, it does not scale well as the number of lines increases. For each line of File 1 you are forced to read in every line of File 2, even though you only need one. On my system (3.0 GHz quad core, 64-bit Windows 7, 4 GB RAM, 7200 rpm SATA drive) I got these times with lines of around 25 characters each:

    10 lines per file: 30 milliseconds
    50 lines per file: 280 milliseconds
    100 lines per file: 880 milliseconds
    500 lines per file: 21.33 seconds
    1000 lines per file: 78.2 seconds
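The times grow much faster than the line counts because the inner loop reads all n lines of file2 for each of the n lines of file1, i.e. n × n reads. A quick check of that arithmetic against the figures above (the read counts are computed; the times are the ones quoted):

```python
def method1_reads(n):
    """File2 lines read by the nested-loop method when both files have n lines."""
    return n * n


# Times quoted above, in seconds
measured = {10: 0.030, 50: 0.280, 100: 0.880, 500: 21.33, 1000: 78.2}

for n, t in measured.items():
    print(f"{n:5d} lines: {method1_reads(n):9,d} reads, {t:7.3f} s")

# Doubling n from 500 to 1000 quadruples the reads...
print(method1_reads(1000) / method1_reads(500))    # 4.0
# ...and the measured time grows by a similar factor
print(round(measured[1000] / measured[500], 2))    # 3.67
```

The measured ratio is close to the fourfold increase that the n × n count predicts.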

    You could achieve a reduction of around 50% (with larger line counts) by taking the inner loop out and making it a CALLed subroutine; that way you can jump out of processing File 2 as soon as you have the line you want.

    Note: goto :EOF (with the colon; EOF is not actually a label present in the script) causes a jump out of the called subroutine back to the line following the CALL statement in the main code.

    Code:
    @echo off
    setlocal enabledelayedexpansion
    set File1LineCount=1
    for /f "delims=" %%A in (file1.txt) do (
        set File1Line=%%A
        >>File3.txt echo !File1Line!
        call :SubRoutine
        set /a File1LineCount+=1
    )
    goto end

    :SubRoutine
    rem Scan file2 from the top and leave as soon as the matching line is written
    set File2LineCount=1
    for /f "delims=" %%B in (file2.txt) do (
        set File2Line=%%B
        if !File1LineCount! equ !File2LineCount! (
            >>File3.txt echo !File2Line!
            goto :EOF
        )
        set /a File2LineCount+=1
    )

    :end
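Leaving the subroutine at the matching line means line k of file1 costs only k reads of file2, so the total drops from n × n to n(n + 1)/2, about half. A small Python count under that assumption (`method2_reads` is a hypothetical helper):

```python
def method2_reads(n):
    """Count file2 lines read when the subroutine stops at the matching line."""
    total = 0
    for line_no in range(1, n + 1):     # one CALL per line of file1
        for n2 in range(1, n + 1):      # scan file2 from the top
            total += 1
            if n2 == line_no:           # goto :EOF once the line is found
                break
    return total


print(method2_reads(1000))                   # 500500, i.e. n * (n + 1) / 2
print(method2_reads(1000) / (1000 * 1000))   # about half of the nested-loop reads
```

Reading lines is not the only cost in the batch version, so the real saving need not be a full 50%.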

    Of course the time may not matter, but this is precisely the sort of thing that other scripting languages such as VBScript (and definitely PowerShell!) are good at: read each file once (into an array), then read out the pairs of lines and write them to the output file.
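A minimal Python sketch of that read-once approach, assuming both files fit in memory and have the same number of lines. `interleave_files` is an illustrative name, and the demo uses temporary files so it runs anywhere:

```python
import os
import tempfile
from pathlib import Path


def interleave_files(path1, path2, out_path):
    """Read each input file once, then write their lines alternately."""
    lines1 = Path(path1).read_text().splitlines()
    lines2 = Path(path2).read_text().splitlines()
    if len(lines1) != len(lines2):
        raise ValueError("the two files have different line counts")
    with open(out_path, "w") as out:
        for i in range(len(lines1)):
            out.write(lines1[i] + "\n")
            out.write(lines2[i] + "\n")


# Demo with throwaway files
with tempfile.TemporaryDirectory() as folder:
    p1 = os.path.join(folder, "file1.txt")
    p2 = os.path.join(folder, "file2.txt")
    p3 = os.path.join(folder, "File3.txt")
    Path(p1).write_text("a1\na2\n")
    Path(p2).write_text("b1\nb2\n")
    interleave_files(p1, p2, p3)
    merged = Path(p3).read_text()

print(merged, end="")
```

Each file is read exactly once, so the work grows in step with the number of lines instead of with its square.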

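    Bear in mind: with the batch approach, if either or both of your text files to be merged contain "poison characters" such as & < > etc., you may hit problems. Because these scripts use delayed expansion, any ! characters in the text will also be mangled or lost.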



    « Last Edit: August 21, 2011, 05:49:02 AM by Salmon Trout »

    Salmon Trout

    Re: Embedded for loop.
    « Reply #2 on: August 21, 2011, 09:59:35 AM »
    By passing the File1 line count as a parameter to the subroutine, subtracting 1, and using the result as the SKIP value in FOR /F (only if it is greater than 0), you can get the time down by another 50% or so:

    Code:
    @echo off
    setlocal enabledelayedexpansion
    set File1LineCount=1
    for /f "delims=" %%A in (file1.txt) do (
        set File1Line=%%A
        >>File3.txt echo !File1Line!
        call :SubRoutine !File1LineCount!
        set /a File1LineCount+=1
    )
    goto end

    :SubRoutine
    rem The first argument is the file1 line number; skip the file2 lines before it
    set /a skipnum=%1-1
    rem Only add skip= when there is something to skip, so skip=0 is never used
    if %skipnum% gtr 0 (
        set parblock="skip=%skipnum% delims="
    ) else (
        set parblock="delims="
    )
    rem Write the first line read after skipping, then leave the subroutine
    for /f %parblock% %%B in (file2.txt) do (
        >>File3.txt echo %%B
        goto :EOF
    )

    :end
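The SKIP trick amounts to "discard the first k - 1 lines, then take one". In Python, itertools.islice expresses the same step; this is a sketch, and `nth_line` is a hypothetical helper. Unlike the batch version, which branches so as never to pass skip=0, islice accepts a start of 0 directly:

```python
from itertools import islice


def nth_line(lines, line_no):
    """Return line number line_no (1-based): skip line_no - 1 lines, then take one."""
    return next(islice(iter(lines), line_no - 1, None))


file2 = ["b1", "b2", "b3"]
print(nth_line(file2, 1))   # b1 (nothing to skip)
print(nth_line(file2, 3))   # b3 (skip 2 lines)
```

If line_no is past the end of the list, next() raises StopIteration; the batch subroutine in that situation simply writes nothing.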


    So I got these times (1000 lines per file, average of 10 runs each):

    Method 1 (straight code, no subroutine, process all lines of File2 for each line of File1) 83 seconds
    Method 2 (subroutine, jump out when correct line number match reached) 51 seconds (so not quite 50%)
    Method 3 (subroutine, use SKIP and jump out after reading 1 line)  19 seconds (a bit more than 50%)
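Working the percentages out from those averages shows what "not quite 50%" and "a bit more than 50%" mean in numbers (a plain calculation on the quoted times, nothing more):

```python
# Average times for 1000-line files, as quoted above (seconds)
method1, method2, method3 = 83, 51, 19


def reduction(before, after):
    """Percentage of time saved going from 'before' to 'after'."""
    return 100 * (before - after) / before


print(f"Method 1 -> 2: {reduction(method1, method2):.1f}% less time")   # 38.6%
print(f"Method 2 -> 3: {reduction(method2, method3):.1f}% less time")   # 62.7%
print(f"Method 1 -> 3: {reduction(method1, method3):.1f}% less time")   # 77.1%
```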




    Betty

      Re: Embedded for loop.
      « Reply #3 on: August 21, 2011, 03:46:05 PM »
      Salmon Trout - Thank you for devoting so much time to my query; you've given me lots of things to ponder.

      B.

      Salmon Trout

      Re: Embedded for loop.
      « Reply #4 on: August 22, 2011, 07:10:24 AM »
      When I posted the script examples above, intended to interleave the lines of two arbitrary text files, I chose to use as demos input files which had lines of the form:

      This is line 1 of file1.txt
      This is line 2 of file1.txt

      This is line 1 of file2.txt
      This is line 2 of file2.txt

      ... etc


      This was so that the output could easily be verified as correct. In fact, the scripts I posted would work with text files containing any content allowed by FOR /F (no poison characters, no blank lines, etc.).
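One practical consequence of that restriction: FOR /F never passes blank lines (or, with the default EOL character, lines beginning with ;) to the loop body, so in the counter-based scripts the line numbers drift and lines get paired with the wrong partners. A rough Python model of that filtering (`for_f_lines` is illustrative and only covers these two rules):

```python
def for_f_lines(lines):
    """Rough model of FOR /F "delims=": blank lines and lines starting with ';'
    (the default EOL character) never reach the loop body."""
    return [line for line in lines if line and not line.startswith(";")]


file1 = ["first", "", "third"]     # file1 has a blank second line
file2 = ["one", "two", "three"]

pairs = list(zip(for_f_lines(file1), for_f_lines(file2)))
# 'third' ends up paired with 'two', and 'three' is never written
print(pairs)   # [('first', 'one'), ('third', 'two')]
```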

      However, the profoundly dumb (banned) troll who persists in infesting this thread keeps posting code, in spite of moderator deletion, which only works with input files that have lines of that type ("This is line 1") etc.

      Now I don't know if he really is so stupid that he thinks that is an acceptable solution, or if he is just making mischief because of his mental illness. Whatever.
      « Last Edit: August 22, 2011, 07:32:22 AM by Salmon Trout »

      Salmon Trout

      Re: Embedded for loop.
      « Reply #5 on: August 22, 2011, 07:23:32 AM »
      Anyhow, here's one way to do it in VBScript.

      Time to process 1,000 lines on my hardware: down from 19 seconds (batch) to 0.07 seconds.
      Time to process 10,000 lines: 0.28 seconds.

      A win for VBScript, I think.

      Code:
      Dim fso, f1, f2
      Const ForReading = 1, ForWriting = 2, ForAppending = 8
      Const FormatSystemDefault = -2, FormatUnicode = -1, FormatASCII = 0

      ' Supply the 3 file names as parameters
      ' E.g.   Cscript.exe myscript.vbs "Text file 1.txt" "Text file 2.txt" "Interleaved output.txt"

      InputFileName1=wscript.arguments(0)
      InputFileName2=wscript.arguments(1)
      OutPutFileName=wscript.arguments(2)

      Set fso = CreateObject("Scripting.FileSystemObject")

      ' Read file 1 into array; drop one trailing CR/LF first so that
      ' Split does not return an extra empty element at the end
      Set f1 = fso.OpenTextFile(InputFileName1, ForReading)
      FileContents1 =  f1.ReadAll
      f1.Close
      If Right(FileContents1, 2) = vbCrLf Then FileContents1 = Left(FileContents1, Len(FileContents1) - 2)
      FileArray1 = Split(FileContents1,vbcrlf)

      ' Read file 2 into array the same way
      Set f2 = fso.OpenTextFile(InputFileName2, ForReading)
      FileContents2 =  f2.ReadAll
      f2.Close
      If Right(FileContents2, 2) = vbCrLf Then FileContents2 = Left(FileContents2, Len(FileContents2) - 2)
      FileArray2 = Split(FileContents2,vbcrlf)

      ' Get line counts (arrays are zero-based, so count = UBound + 1)
      LineCount1 = Ubound(FileArray1) + 1
      LineCount2 = Ubound(FileArray2) + 1

      'Display information
      wscript.echo "File 1 name     : " & InputFileName1
      wscript.echo "File 2 name     : " & InputFileName2
      wscript.echo "Lines in file 1 : " & LineCount1
      wscript.echo "Lines in file 2 : " & LineCount2

      ' Check they have same number of lines, if not, quit
      If LineCount2 <> LineCount1 Then
      wscript.echo
      wscript.echo "Line counts not equal"
      wscript.echo
      Wscript.Quit
      End If

      ' Create file 3 (output file)
      Set OutPutFile = fso.CreateTextFile(OutPutFileName, True)

      ' Read out contents of the 2 arrays, interleaved
      For i = 0 to Ubound(FileArray1)
      OutPutFile.WriteLine ( FileArray1 (i) )
      OutPutFile.WriteLine ( FileArray2 (i) )
      Next

      ' Close output file
      OutPutFile.Close

      « Last Edit: August 22, 2011, 07:34:50 AM by Salmon Trout »
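VBScript's Split keeps an empty last element when the text ends with a line break, so the upper bound of the array equals the number of real lines only for files that end with CR/LF. Python's str.split behaves the same way, which makes this easy to check; `split_lines` and `split_lines_normalised` are illustrative helpers:

```python
def split_lines(contents):
    """Mimic Split(contents, vbCrLf) and report the array's upper bound (last index)."""
    parts = contents.split("\r\n")
    return parts, len(parts) - 1


def split_lines_normalised(contents):
    """Drop one trailing CR/LF before splitting, so there is no empty last element."""
    if contents.endswith("\r\n"):
        contents = contents[:-2]
    return contents.split("\r\n")


print(split_lines("line 1\r\nline 2\r\n"))   # (['line 1', 'line 2', ''], 2)
print(split_lines("line 1\r\nline 2"))       # (['line 1', 'line 2'], 1)
print(split_lines_normalised("line 1\r\nline 2\r\n"))   # ['line 1', 'line 2']
```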

      Salmon Trout

      Re: Embedded for loop.
      « Reply #6 on: August 22, 2011, 08:54:52 AM »
      And here, shorn of bells and whistles, is a PowerShell script to do the same thing... see how far Windows scripting has come!

      It's a bit slower than the VBScript, but look at the compact, clear and simple-looking code!

      Code:
      # @() makes sure each variable holds an array, even when a file has only one line
      $a=@(Get-Content "text1.txt")
      $b=@(Get-Content "text2.txt")
      $c=@()
      # Append line i of each file in turn
      For ($i=0; $i -lt $a.length; $i++) {
      $c=$c+$a[$i]
      $c=$c+$b[$i]
      }
      $c | out-file "text3.txt" -encoding ascii
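For comparison, the same interleaving is often written in Python with zip and itertools.chain.from_iterable. A sketch, assuming equal-length inputs (zip stops silently at the shorter one); `interleave` is an illustrative name:

```python
from itertools import chain


def interleave(a, b):
    """Alternate the items of a and b: a[0], b[0], a[1], b[1], ..."""
    return list(chain.from_iterable(zip(a, b)))


print(interleave(["a1", "a2"], ["b1", "b2"]))   # ['a1', 'b1', 'a2', 'b2']
```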

      Betty

        Re: Embedded for loop.
        « Reply #7 on: August 22, 2011, 05:33:22 PM »
        Salmon Trout - thank you again for all the work you've done. PowerShell looks interesting and I will be investigating it shortly.

        Quote
        However, the profoundly dumb (banned) troll who persists in infesting this thread keeps posting code, in spite of moderator deletion, which only works with input files that have lines of that type ("This is line 1") etc.

        Now I don't know if he really is so stupid that he thinks that is an acceptable solution, or if he is just making mischief because of his mental illness. Whatever.

        I have received several PMs from member NatHeim, who persists in using Findstr in the scripts he has sent me. I have advised him that no lines of the two files will contain identical strings, and asked for his permission to post his scripts on this forum to allow other members to comment. Could he be the member to whom you refer in the above quote? Your mention of mental illness makes me a little uneasy.


        Salmon Trout

        Re: Embedded for loop.
        « Reply #8 on: August 22, 2011, 11:59:20 PM »
        The "script" you showed would prefix every line it read in with a number and a colon. These would appear in the output lines. Furthermore, the output would start to get stupid with files having more than 10 lines.

        edited by Allan
        « Last Edit: August 23, 2011, 05:12:23 AM by Allan »

        patio

        • Moderator


        • Genius
        • Maud' Dib
        • Thanked: 1769
        • Experience: Beginner
        • OS: Windows 7
        Re: Embedded for loop.
        « Reply #9 on: August 25, 2011, 07:27:19 PM »
        Congrats Betty... you have met BillRich...
        You can safely ignore all of his advice...and his PM's...

        Sorry 'bout your luck...but it happens.
        " Anyone who goes to a psychiatrist should have his head examined. "

        Betty

          Re: Embedded for loop.
          « Reply #10 on: August 26, 2011, 04:09:02 PM »
          Patio - thank you, but who is BillRich? The PMs I got, and am still getting, are from NatHeim. Salmon Trout's latest post has me confused: I don't recall posting any script which adds a prefix to each line, and Allan has edited the post for some reason.

          I think I'd best remove myself from this 'funny farm' until the inmates settle down.

          Salmon Trout

          Re: Embedded for loop.
          « Reply #11 on: August 26, 2011, 05:01:14 PM »
          About two years ago, a person whose screen name was "Bill Richardson" joined this forum. He pretty soon became a problem: annoying other users, posting nonsense "solutions" and then abusing people who pointed out their uselessness. In short, he is a "troll". He was banned and his account was revoked. He re-registered as "BillRich", and that is the moniker by which we know him. Over the last couple of years he has been back countless times under different names. The moderators have got wise to him to a certain extent and usually delete his posts pretty promptly. Having become aware of this, he has resorted to sending his "solutions" by personal message (PM). You made a post into which you pasted some code which he (posting as "NatHeim") had sent you in a PM. This was the code which added a prefix (a number and a colon) to each output line. It was useless nonsense.

          He is perhaps the biggest threat to Computerhope's credibility at this time. The "Microsoft DOS" section of the forum is gravely compromised by his apparent impunity in posting (and PMing) whenever he wants. If you ask me, I would only allow new members to PM after a certain number of posts. 25 ought to be enough for us to recognise Bill the troll. He isn't very good at disguising himself.

          Do you understand now?

          Betty

            Re: Embedded for loop.
            « Reply #12 on: August 26, 2011, 09:54:40 PM »
            Salmon Trout - thank you, that clarifies the situation for me.

            ghostdog74



              Specialist

              Thanked: 27
              Re: Embedded for loop.
              « Reply #13 on: August 26, 2011, 11:40:59 PM »
              @Betty, specialized tools for this kind of task have existed in the *nix world for a long time. If you can use external tools, download coreutils for Windows; it includes a tool called paste, and with it the whole job is a single line:
              Code:
              paste file1 file2 >newfile

              Betty

                Re: Embedded for loop.
                « Reply #14 on: August 27, 2011, 03:36:08 AM »
                @Ghostdog74 - Thank you for your interest.

                Paste appears not to do what I want, in that it merges corresponding lines from each file; it does not appear to interleave the lines.

                Salmon Trout

                Re: Embedded for loop.
                « Reply #15 on: August 27, 2011, 03:40:26 AM »
                Quote
                @Ghostdog74 - Thank you for your interest.

                Paste appears not to do what I want, in that it merges corresponding lines from each file; it does not appear to interleave the lines.

                Yes, it does this:

                (File 1)
                Line A
                Line B

                (File 2)
                Line 1
                Line 2

                Output

                Line A <TAB> Line 1
                Line B <TAB> Line 2

                You can select delimiters other than the default TAB, but essentially that's what it does, I think. If the delimiter could be CR/LF, that might be a solution?



                ghostdog74



                  Re: Embedded for loop.
                  « Reply #16 on: August 27, 2011, 04:15:50 AM »
                  Yes, Salmon is right. If Betty wants to interleave the lines, use a newline as the delimiter:
                  Code:
                  paste -d "\n" fileA.txt fileB.txt
                  Otherwise, the default places the lines "side by side".
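A simplified Python model of the two `paste` invocations, for two equal-length files only. Real paste also cycles through multi-character delimiter lists, which this ignores, and `paste` here is a hypothetical function, not the coreutils tool:

```python
def paste(lines1, lines2, delim="\t"):
    """Join corresponding lines with delim, roughly like `paste -d DELIM file1 file2`."""
    return "".join(a + delim + b + "\n" for a, b in zip(lines1, lines2))


left = ["Line A", "Line B"]
right = ["Line 1", "Line 2"]
print(paste(left, right), end="")               # tab-separated, side by side
print(paste(left, right, delim="\n"), end="")   # one line after the other
```

With a newline as the delimiter, "side by side" turns into one line after the other, which is the interleaving Betty wanted.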

                  CN-DOS



                    Rookie

                    • Experience: Experienced
                    • OS: Windows Vista
                    Re: Embedded for loop.
                    « Reply #17 on: August 31, 2011, 05:20:18 AM »
                    Quote
                    Method 1 (straight code, no subroutine, process all lines of File2 for each line of File1) 83 seconds
                    Method 2 (subroutine, jump out when correct line number match reached) 51 seconds (so not quite 50%)
                    Method 3 (subroutine, use SKIP and jump out after reading 1 line)  19 seconds (a bit more than 50%)

                    Hi Salmon Trout,

                    Could you please help test the performance of this one:

                    Code:
                    @echo off
                    setlocal enabledelayedexpansion
                    (for /f "delims=" %%a in (file1.txt) do (
                        set /p _f=
                        echo,%%a
                        echo,!_f!
                    ))<file2.txt >file3.txt
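CN-DOS's script reads file1 with FOR /F while `set /p` pulls the next line of file2 from the redirected input, so each file is read only once. The same two-cursor idea in Python, with io.StringIO standing in for real files (`interleave_streams` is an illustrative name):

```python
import io


def interleave_streams(file1, file2, out):
    """Walk file1 line by line and take the next line of file2 each time."""
    for line1 in file1:
        line2 = file2.readline()            # one line of file2 per line of file1
        out.write(line1.rstrip("\n") + "\n")
        out.write(line2.rstrip("\n") + "\n")


out = io.StringIO()
interleave_streams(io.StringIO("a1\na2\n"), io.StringIO("b1\nb2\n"), out)
print(out.getvalue(), end="")
```

If file2 runs out first, readline() returns an empty string and this sketch writes a blank line.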

                    Sidewinder



                      Guru

                      Thanked: 139
                    • Experience: Familiar
                    • OS: Windows 10
                    Re: Embedded for loop.
                    « Reply #18 on: August 31, 2011, 06:08:44 AM »
                    Quote
                    Both files contain the same number of lines.

                    I understand the OP's specs, but what if the files are of unequal length? Batch code does not support arrays; however, borrowing from REXX, you can mimic the stem.tails technique. I'm also a big fan of prompts for non-automation scripts; they make them more generic.

                    This is an alternate approach, nothing more:

                    Code:
                    @echo off
                    setlocal enabledelayedexpansion

                    :file1
                      set /p file1=Enter File 1 Label:
                      if not exist "%file1%" goto file1

                    :file2
                      set /p file2=Enter File 2 Label:
                      if not exist "%file2%" goto file2

                    rem Load file 1 into the pseudo-array array.1, array.2, ...
                    set idx=0
                    for /f "usebackq tokens=* delims=" %%y in ("%file1%") do (
                      set /a idx+=1
                      set array.!idx!=%%y
                    )

                    rem array.0 holds the number of lines in file 1
                    set array.0=%idx%

                    rem Write each line of file 2, preceded by the matching line of file 1
                    set index=0
                    for /f "usebackq tokens=* delims=" %%i in ("%file2%") do (
                      set /a index+=1
                      if !index! LEQ %array.0% (
                        call echo %%array.!index!%% >> c:\temp\merged.txt
                      )
                      echo %%i >> c:\temp\merged.txt
                    )

                    rem If file 1 is longer, append its remaining lines, starting after the last one used
                    set /a start=index+1
                    if %index% LSS %array.0% (
                      for /l %%i in (%start%, 1, %array.0%) do (
                        echo !array.%%i! >> c:\temp\merged.txt
                      )
                    )

                    PowerShell and VBScript can also be used with varying degrees of simplicity.
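Sidewinder's handling of unequal lengths (interleave while both files have lines, then copy what is left of the longer one) can be sketched in Python with itertools.zip_longest. Unlike FOR /F, this keeps blank lines; `interleave_uneven` is an illustrative name:

```python
from itertools import zip_longest


def interleave_uneven(lines1, lines2):
    """Interleave two lists; once the shorter runs out, copy the rest of the longer one."""
    missing = object()                      # sentinel, so empty-string lines are kept
    out = []
    for a, b in zip_longest(lines1, lines2, fillvalue=missing):
        if a is not missing:
            out.append(a)
        if b is not missing:
            out.append(b)
    return out


print(interleave_uneven(["a1", "a2", "a3"], ["b1"]))   # ['a1', 'b1', 'a2', 'a3']
```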

                     8)
                    The true sign of intelligence is not knowledge but imagination.

                    -- Albert Einstein

                    Salmon Trout

                    Re: Embedded for loop.
                    « Reply #19 on: August 31, 2011, 10:02:56 AM »
                    Quote
                    Could you please help test the performance of this one:

                    What performance result did you get?


                    CN-DOS



                      Re: Embedded for loop.
                      « Reply #20 on: August 31, 2011, 10:08:10 PM »
                      I used timeit.exe to count the time, and 1000 lines for each file.

                      Method 3 of Salmon Trout:
                      Elapsed Time:     0:00:06.364
                      Process Time:     0:00:04.368

                      Method of CN-DOS:
                      Elapsed Time:     0:00:00.468
                      Process Time:     0:00:00.265
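Expressed as a speed-up, CN-DOS's single-pass script is roughly 14 to 16 times faster than Method 3 on his timings (a plain calculation on the numbers above):

```python
# timeit.exe results for 1000-line files, as quoted above (seconds)
method3_elapsed, cndos_elapsed = 6.364, 0.468
method3_process, cndos_process = 4.368, 0.265

print(f"elapsed: {method3_elapsed / cndos_elapsed:.1f}x faster")   # 13.6x
print(f"process: {method3_process / cndos_process:.1f}x faster")   # 16.5x
```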

                      BTW, the PMs from NatHeim are really boring. Is it possible for a moderator to stop him from using PMs? Or does this forum have a blacklist function, so I can put him on it?

                      Salmon Trout

                      Re: Embedded for loop.
                      « Reply #21 on: September 01, 2011, 12:03:22 AM »
                      Method of CN-DOS: 0.38 seconds elapsed time.

                      In your profile you have an Ignore list.