Topic: Exclamation Marks [!] in DOS text files cause 'FOR /F' processing problems.

TomTheCabinBoy

    Topic Starter


    Greenhorn

    • Experience: Experienced
    • OS: Windows XP
    I'm using 'FOR /F' to change every occurrence of 'X' to 'Y' in every line of a text file.

    So far so good, but problems occur if the text file contains exclamation marks [!]. The exclamation marks, and any text between them, are lost.

    My input file [Test-myfile.txt] contains...

    Line 01 containing A but not X!
    Line 02 containing A but not X!Watch this space!X
    Line 03 containing X but not Y! and another xylophone.
    Line 04 containing X and Y and another x and a word like axiom.
    Line 05 containing Y but not X;
    Line 06 containing Xxx but not Yyy:
    Line 07 containing B and Z and Y~
    Line 08 containing Y but not X#
    Line 09 containing X and Y and another xylophone

    My MS-DOS batch code [Test_Change_CharactersInStrings_110508.bat] is as follows...

    @ECHO OFF
    COLOR F5

    REM   ====================================================
    REM   Code to change every occurrence of 'X' to 'Y' --
    REM   in every line of 'Test-myfile.txt'...
    REM   ====================================================
    REM   Need to check why exclamation marks [!], and any text between them are lost???
    REM   ====================================================

       @ECHO OFF > newfile.txt
       SETLOCAL EnableDelayedExpansion
       FOR /F "tokens=1,2,3,4,5,6,7-15* delims= " %%a in (Test-myfile.txt) DO (
    REM      PAUSE
          ECHO a[%%a]
    REM      PAUSE
          ECHO b[%%b]
          ECHO c[%%c]
          ECHO d[%%d]
          ECHO e[%%e]
          ECHO f[%%f]
          ECHO g[%%g]
          ECHO h[%%h]
          ECHO i[%%i]
          ECHO j[%%j]
          ECHO k[%%k]
          ECHO l[%%l]
          ECHO m[%%m]
          ECHO n[%%n]
          ECHO o[%%o]
    REM      SET LINE_TEXT=%%a
          SET LINE_TEXT=%%a %%b %%c %%d %%e %%f %%g %%h %%i %%j %%k %%l %%m %%n %%o
          ECHO Original line...!LINE_TEXT!
          SET LINE_TEXT=!LINE_TEXT:X=Y!
          ECHO Modified line...!LINE_TEXT!
          ECHO !LINE_TEXT! >> newfile.txt
          )
       ECHO Done!
       SETLOCAL DisableDelayedExpansion
       ECHO Last modified line...%LINE_TEXT%
       PAUSE
       EXIT

    After running the code, my output file [newfile.txt] contains...

    Line 01 containing A but not Y         
    Line 02 containing A but not YY       
    Line 03 containing Y but not Y and another Yylophone.     
    Line 04 containing Y and Y and another Y and a word like aYiom. 
    Line 05 containing Y but not Y;         
    Line 06 containing YYY but not Yyy:         
    Line 07 containing B and Z and Y~       
    Line 08 containing Y but not Y#         
    Line 09 containing Y and Y and another Yylophone

    Any ideas of how I can prevent exclamation marks, and any text between them, from being lost?

    Salmon Trout

    • Guest
    The string variable for each line is not being created correctly because, with delayed expansion enabled, exclamation marks are variable delimiters, so they are dropped along with everything between them. Batch scripts get terribly fiddly when special characters ("poison characters") are embedded in the strings to be processed. The worst is the ampersand, but parentheses and percent signs are among the others.
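
    A minimal sketch of that effect, assuming a hypothetical sample string modelled on Line 02 of the input file. The same FOR variable is echoed once with delayed expansion off and once with it on:

    ```batch
    @echo off
    setlocal DisableDelayedExpansion
    rem The sample text is hypothetical, modelled on Line 02 of the input file.
    for /f "delims=" %%a in ("not X!Watch this space!X") do (
        rem Delayed expansion is off here, so the text is echoed unchanged.
        echo OFF: %%a
        setlocal EnableDelayedExpansion
        rem Delayed expansion is on here, so the marked text is read as a variable name.
        echo ON:  %%a
        endlocal
    )
    endlocal
    ```

    The first ECHO should show the text intact. The second should show "not XX", because the text between the exclamation marks is looked up as an undefined variable and expands to nothing.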

    What I did below is take the tokens spat out by FOR /F and join them into one string enclosed in quotes, so that it can be passed as a single parameter to a CALLed subroutine. There, the tilde modifier (%~1) strips off the quotes, the X=Y substitution takes place, and the result is written to the file. No delayed expansion needed.

    Consider "CALL :replace" to be equivalent to "GOSUB replace" in BASIC, and "GOTO :eof" to be equivalent to "RETURN" in BASIC. The colons are obligatory, not optional like they are when using GOTO with labels.

    batch script
    Code: [Select]
    @ECHO OFF
    REM   ====================================================
    REM   Code to change every occurrence of 'X' to 'Y' --
    REM   in every line of 'Test-myfile.txt'...
    REM   ====================================================
    @ECHO OFF > newfile.txt
    FOR /F "tokens=1,2,3,4,5,6,7-15* delims= " %%a in (Test-myfile.txt) DO (
        call :replace "%%a %%b %%c %%d %%e %%f %%g %%h %%i %%j %%k %%l %%m %%n %%o"
       )
    ECHO Done!
    pause
    exit
       
    :replace
    set string1=%~1
    set string2=%string1:X=Y%
    echo %string2% >> newfile.txt
    goto :eof

    newfile.txt
    Code: [Select]
    Line 01 containing A but not Y!         
    Line 02 containing A but not Y!Watch this space!Y       
    Line 03 containing Y but not Y! and another Yylophone.     
    Line 04 containing Y and Y and another Y and a word like aYiom. 
    Line 05 containing Y but not Y;         
    Line 06 containing YYY but not Yyy:         
    Line 07 containing B and Z and Y~       
    Line 08 containing Y but not Y#         
    Line 09 containing Y and Y and another Yylophone       

    TomTheCabinBoy

      ST, thank you very much for providing a solution to my problem -- I had unsuccessfully searched several websites looking for information on it before you came up with the goods.

      One further point about my example: I notice that the character substitution changes both 'X' and 'x' (upper and lower case) to 'Y', i.e. the replacement always takes the case specified. Is there a way to preserve the original case of the target character (so that 'X' becomes 'Y' and 'x' becomes 'y')? Thanks in anticipation.


      TomTheCabinBoy

        Incidentally, ST, I tried to thank you by clicking the 'Thank Salmon Trout' button but got...

        "An Error Has Occurred! Sorry, you can't repeat a karma action without waiting 2 hours."

        Don't know what that's all about!

        Salmon Trout

        • Guest
        Give this a try

        Code: [Select]
        @ECHO OFF
        REM   ====================================================
        REM   Code to change every occurrence of 'X' to 'Y' --
        REM   in every line of 'Test-myfile.txt'...
        REM   ====================================================
        @ECHO OFF > newfile.txt
        FOR /F "tokens=1,2,3,4,5,6,7-15* delims= " %%a in (Test-myfile.txt) DO (
            call :replace "%%a %%b %%c %%d %%e %%f %%g %%h %%i %%j %%k %%l %%m %%n %%o"
           )
        ECHO Done!
        pause
        exit
           
        :replace
        set string1=%~1
        set string2=
        set j=0
        :Loop
        call set inchar=%%string1:~%j%,1%%
        if "%inchar%"=="" goto ExitLoop
        set outchar=%inchar%
        IF "%inchar%"=="X" set outchar=Y
        IF "%inchar%"=="x" set outchar=y
        set "string2=%string2%%outchar%"
        set /a j=%j%+1
        goto Loop
        :ExitLoop
        echo %string2% >> newfile.txt
        goto :eof
           

        Salmon Trout

        • Guest
        An alternative in which a one-line VBScript is created, used and destroyed; it will run slightly more slowly. Note that FOR loop metavariables are case sensitive, so %%A will not interfere with %%a. Note also that the first FOR loop can be written on one line.

        Code: [Select]
        @ECHO OFF
        REM   ====================================================
        REM   Code to change every occurrence of 'X' to 'Y' --
        REM   in every line of 'Test-myfile.txt'...
        REM   ====================================================
        echo wscript.echo Replace(wscript.arguments(0), wscript.arguments(1), wscript.arguments(2))>Srep.vbs
        @ECHO OFF > newfile.txt
        FOR /F "tokens=1,2,3,4,5,6,7-15* delims= " %%a in (Test-myfile.txt) DO call :replace "%%a %%b %%c %%d %%e %%f %%g %%h %%i %%j %%k %%l %%m %%n %%o"
        ECHO Done!
        del Srep.vbs>nul
        pause
        exit
           
        :replace
        set string2=%~1
        for /f "delims=" %%A in ( ' cscript //nologo Srep.vbs "%string2%" "X" "Y" ' ) do set string2=%%A
        for /f "delims=" %%A in ( ' cscript //nologo Srep.vbs "%string2%" "x" "y" ' ) do set string2=%%A
        echo %string2% >> newfile.txt
        goto :eof 
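
        The point about case-sensitive metavariables can be checked with a small sketch. The words "outer" and "inner" are placeholders chosen for illustration:

        ```batch
        @echo off
        rem %%a and %%A are two different loop variables, so the nested loop
        rem does not overwrite the outer one.
        for %%a in (outer) do for %%A in (inner) do echo %%a %%A
        ```

        This should print "outer inner", showing that both values are available at the same time.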
           
        « Last Edit: May 10, 2011, 11:16:26 AM by Salmon Trout »

        TomTheCabinBoy

          ST, thanks again for the 2 solutions which you provided -- both of which preserve case sensitivity.

          Am I right in assuming that the use of double-quotes in the [SET "string2=%string2%%outchar%"] statement -- in the non-VBS version -- is a way of ensuring that spaces, currently at the end of the string being processed, are not lost?

          Salmon Trout

          • Guest
          Quote
          ST, thanks again for the 2 solutions which you provided -- both of which preserve case sensitivity.

          Am I right in assuming that the use of double-quotes in the [SET "string2=%string2%%outchar%"] statement -- in the non-VBS version -- is a way of ensuring that spaces, currently at the end of the string being processed, are not lost?

          Yes, that is correct. The SET command used without quotes ignores trailing spaces, so to be sure that, if %outchar% should happen to be a space, it is added, the quotes are used as you saw.

          Salmon Trout

          • Guest
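          [Update] I tested the batch without quotes in that line, and it works just the same, so I had simply assumed they were needed. Unquoted SET keeps trailing spaces rather than dropping them; what the quoted form guards against is an unintended trailing space, and special characters such as & in the value.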
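
          A small sketch of how SET treats trailing spaces with and without quotes. The variable names are made up for the example:

          ```batch
          @echo off
          rem Unquoted form: the space before the ampersand becomes part of the value.
          set unquoted=one & rem stored as "one" followed by a space
          rem Quoted form: the value ends at the closing quote.
          set "quoted=two" & rem stored as "two" with nothing after it
          echo [%unquoted%]
          echo [%quoted%]
          ```

          This should print [one ] and then [two]. The unquoted SET keeps the stray space, while the quoted form stores exactly what lies between the equals sign and the closing quote.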

          TomTheCabinBoy

            ST, you warned me, in your initial reply, that "poison characters" embedded in text strings can cause problems when processing them in batch scripts. Sure enough, when my test text file was modified to contain ampersands [&], percent signs [%], and double quotes ["], in addition to the dreaded exclamation marks [!], problems were encountered. Sorry to say, all 3 of your solutions -- which worked fine for exclamation marks [!] -- failed when processing the aforementioned additional rogue characters.

            I think you might be interested in a solution I have developed, which seems to cope with all of the above "poison characters". It is based on some of your conversion code, arrived at after many experiments with enabling/disabling delayed expansion, inside and outside subroutines. I have commented the script as a reminder of how different methods work, or don't work, in certain situations.

            My input file [Test-myfile.txt] now contains...

            Code: [Select]
            Line 01 containing x but not X!
            Line 02 containing A but not X!Watch this space!x
            Line 03 containing X but not Y! and another xylophone.
            Line 3A containing X and 1st percent % sign, 2nd percent % sign, and 3rd percent % sign.
            Line 3B containing X and just 2 percent % signs. (Here's the 2nd percent % sign).
            Line 3C containing X and 1st percent ^% sign, 2nd percent ^% sign, and 3rd percent ^% sign.
            Line 3D containing X and just 2 percent ^% signs. (Here's the 2nd percent ^% sign).
            Line 04 containing X and Y and another x and a word like axiom.
            Line 4A containing x and an ampersand & and another & followed by one more & and a percent % sign.
            Line 4B containing x and an ampersand & on its own.
            Line 4C containing 2 ampersands -- this one & and this one & as well.
            Line 05 containing Y but not x; also an upper-case 'X' in single quotes
            Line 5A containing W but not X; also a lower-case "x" in double quotes
            Line 06 containing Xxx (but not Yyy):
            Line 07 containing B and Z and Y~
            Line 08 containing Y but not X#
            Line 09 containing X and Y and another Xylophone

            My MS-DOS batch code is now as follows...

            Code: [Select]
            @ECHO OFF
            REM ====================================================
            REM Code to change every occurrence of 'X' to 'Y' --
            REM in every line of 'Test-myfile.txt' -- without losing --
            REM 'special characters'. e.g. [!][%][&][#][~]["]['].
            REM Preserving upper/lower case is selectable.
            REM ====================================================

            REM Display/obtain case preservation options/choice...
            ECHO.
            ECHO When changing 'X' to 'Y' should the original case be preserved?
            ECHO.
            ECHO   [0] Case-preservation is not important. (Default - Quicker)
            ECHO   [1] Preserve original case. (Slower)
            ECHO.
            SET /P CASE_CHOICE=Enter option...
            ECHO.
            IF "%CASE_CHOICE%" EQU "1" ECHO Original case will be preserved. (Slower option)
            IF "%CASE_CHOICE%" NEQ "1" ECHO Original case may not be preserved. (Quicker option)
            ECHO.
            PAUSE

            REM Initialise a count for lines processed.
            SET LINES_COUNT=0

            REM Create an empty Output File.
            @ECHO OFF > newfile.txt

            REM One at a time, read each complete line of the Input File into a single string.
            FOR /F "tokens=1* delims=" %%a in (Test-myfile.txt) DO (
            REM ECHO Original line..."%%a"
            REM At this point, complete string is intact.

            REM Copy the complete intact string --
            REM enclosing in quotes to protect special characters (most).
            SET string1="%%a"
            REM At this point, string1 contains the complete intact quoted string --
            REM but appears to be empty/undefined if echoed.

            REM For each line of the Input File, string processing requires 'EnableDelayedExpansion' --
            REM which can only be executed a maximum of 16 times before reaching the --
            REM 'Maximum setlocal recursion level' (despite also executing 'DisableDelayedExpansion').
            REM Also, within 'EnableDelayedExpansion' %%a will lose exclamation marks --
            REM fortunately we no longer need %%a (for the current line) as it has been copied to string1.
            REM For these reasons, the main string processing is done in a subroutine...
            CALL :REPLACE_CHARS
            )

            ENDLOCAL
            ECHO Done!
            PAUSE
            EXIT

            :REPLACE_CHARS

            REM Increment lines processed count...
            SET /A LINES_COUNT+=1

            REM Enable delayed environment variable expansion --
            REM so that the value of certain variables can be dynamically redefined at run time --
            REM using !Variable! instead of %Variable% (or a combination)...
            SETLOCAL EnableDelayedExpansion

            REM Display progress...
            ECHO Original Line !LINES_COUNT! ...!string1!...
            REM At this point, string1 still contains the complete intact quoted string.

            REM Ascertain whether preservation of original case is required...
            IF "!CASE_CHOICE!" EQU "1" GOTO PRESERVE_CASE

            REM Original case is not required to be preserved...

            REM Convert all 'X's in the string (line) to 'Y's --
            REM upper and lower-case 'X's will be converted to upper-case 'Y's.
            SET string2=!string1:X=Y!
            GOTO WRITE_LINE

            REM Original case must be preserved...

            :PRESERVE_CASE

            REM Initialise an index for the current character position --
            REM and an empty string to construct the converted line...
            set j=0
            set string2=

            :Loop

            REM One at a time, isolate each individual character in the string (line)...
            set inchar=!string1:~%j%,1!
            IF "!inchar!END"=="END" GOTO :ExitLoop

            REM Copy the current character for passing to the modified string (line)...
            SET outchar=!inchar!

            REM Convert 'X's to 'Y's -- preserving the original case...
            IF "!inchar!"=="X" set outchar=Y
            IF "!inchar!"=="x" set outchar=y

            REM Construct the new string (line) by concatenating the current/modified character...
            SET "string2=!string2!!outchar!"

            REM Increment the index for the next character position, and repeat...
            SET /A j+=1
            GOTO :Loop
            :ExitLoop

            :WRITE_LINE

            REM Strip-off the leading/trailing quotes from the processed string (line) --
            REM then write it to the Output File -- ensuring no spaces are added at end of line.
            ECHO !string2:~1,-1!>> newfile.txt

            REM Display progress...
            ECHO Modified Line !LINES_COUNT! ...!string2!...

            REM Close the SETLOCAL opened above, which turns delayed expansion off again.
            REM (LINES_COUNT was set before that SETLOCAL, so it keeps its new value.)
            ENDLOCAL

            REM Return whence we came.
            GOTO :EOF

            Salmon Trout

            • Guest
            Quote
            REM      For each line of the Input File, string processing requires 'EnableDelayedExpansion' --
            REM      which can only be executed a maximum of 16 times before reaching the --
            REM      'Maximum setlocal recursion level' (despite also executing 'DisableDelayedExpansion').

            I'm not sure what you mean by this - you only enable delayed expansion once, at any point before the loop or other structure. You don't have to re-enable it for each line that is read from a file!
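
            For what it is worth, the "Maximum setlocal recursion level" limit is on how deeply SETLOCAL calls are nested (32 levels), not on how many times SETLOCAL is run. Two unmatched SETLOCALs per line would explain a failure after 16 lines. A minimal sketch, assuming the same Test-myfile.txt and the case-insensitive X=Y substitution, that pairs each SETLOCAL with an ENDLOCAL so the nesting never grows:

            ```batch
            @echo off
            setlocal DisableDelayedExpansion
            for /f "usebackq delims=" %%a in ("Test-myfile.txt") do (
                rem Copy the line while delayed expansion is off, so exclamation marks survive.
                set "line=%%a"
                setlocal EnableDelayedExpansion
                rem Substitute and print; the expanded text is not parsed a second time.
                echo !line:X=Y!
                rem Match the SETLOCAL above so the nesting depth stays constant.
                endlocal
            )
            endlocal
            ```

            This is only a sketch: a line containing an odd number of double quotes next to an ampersand can still upset the SET, and FOR /F skips blank lines and lines that start with a semicolon.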




            Salmon Trout

            • Guest
            Isn't this better? Sooner or later if you are doing serious textfile processing you are going to need to look at something else. Visual Basic Script is present in every Windows installation these days.

            Save as a .vbs file and run with

            Cscript //nologo yourname.vbs


            Code: [Select]
            Set fso = CreateObject("Scripting.FileSystemObject")
            Const ForReading = 1, ForWriting = 2, ForAppending = 8
            Const FormatSystemDefault = -2, FormatUnicode = -1, FormatASCII = 0
            ReadFileName  = "Input.txt"
            WriteFileName = "Output.txt"
            Wscript.Echo "Read file..."
            ' OpenTextFile(filename, iomode, create, format): the third argument
            ' says whether to create the file if it does not already exist.
            Set InputFile  = fso.OpenTextFile(ReadFileName,  ForReading, False, FormatASCII)
            Set OutputFile = fso.OpenTextFile(WriteFileName, ForWriting, True,  FormatASCII)
            Do While Not InputFile.AtEndOfStream
                InputLine = InputFile.ReadLine
                Wscript.Echo "Input  " & InputLine
                TempLine  = InputLine
                ' Replace is case-sensitive by default, so each case is handled separately.
                TempLine  = Replace(TempLine, "X", "Y")
                TempLine  = Replace(TempLine, "x", "y")
                OutputLine = TempLine
                Wscript.Echo "Output " & OutputLine
                OutputFile.WriteLine(OutputLine)
            Loop
            InputFile.Close
            OutputFile.Close
            Set fso = Nothing