Computer Hope

Microsoft => Microsoft DOS => Topic started by: TomTheCabinBoy on May 08, 2011, 11:55:30 AM

Title: Exclamation Marks [!] in DOS text files cause 'FOR /F' processing problems.
Post by: TomTheCabinBoy on May 08, 2011, 11:55:30 AM
I'm using 'FOR /F' to change every occurrence of 'X' to 'Y' in every line of a text file.

So far so good, but problems occur if the text file contains exclamation marks [!]. The exclamation marks, and any text between them, are lost.

My input file [Test-myfile.txt] contains...

Line 01 containing A but not X!
Line 02 containing A but not X!Watch this space!X
Line 03 containing X but not Y! and another xylophone.
Line 04 containing X and Y and another x and a word like axiom.
Line 05 containing Y but not X;
Line 06 containing Xxx but not Yyy:
Line 07 containing B and Z and Y~
Line 08 containing Y but not X#
Line 09 containing X and Y and another xylophone

My MS-DOS batch code [Test_Change_CharactersInStrings_110508.bat] is as follows...

@ECHO OFF
COLOR F5

REM   ====================================================
REM   Code to change every occurrence of 'X' to 'Y' --
REM   in every line of 'Test-myfile.txt'...
REM   ====================================================
REM   Need to check why exclamation marks [!], and any text between them are lost???
REM   ====================================================

   @ECHO OFF > newfile.txt
   SETLOCAL EnableDelayedExpansion
   FOR /F "tokens=1,2,3,4,5,6,7-15* delims= " %%a in (Test-myfile.txt) DO (
REM      PAUSE
      ECHO a[%%a]
REM      PAUSE
      ECHO b[%%b]
      ECHO c[%%c]
      ECHO d[%%d]
      ECHO e[%%e]
      ECHO f[%%f]
      ECHO g[%%g]
      ECHO h[%%h]
      ECHO i[%%i]
      ECHO j[%%j]
      ECHO k[%%k]
      ECHO l[%%l]
      ECHO m[%%m]
      ECHO n[%%n]
      ECHO o[%%o]
REM      SET LINE_TEXT=%%a
      SET LINE_TEXT=%%a %%b %%c %%d %%e %%f %%g %%h %%i %%j %%k %%l %%m %%n %%o
      ECHO Original line...!LINE_TEXT!
      SET LINE_TEXT=!LINE_TEXT:X=Y!
      ECHO Modified line...!LINE_TEXT!
      ECHO !LINE_TEXT! >> newfile.txt
      )
   ECHO Done!
   SETLOCAL DisableDelayedExpansion
   ECHO Last modified line...%LINE_TEXT%
   PAUSE
   EXIT

After running the code, my output file [newfile.txt] contains...

Line 01 containing A but not Y         
Line 02 containing A but not YY       
Line 03 containing Y but not Y and another Yylophone.     
Line 04 containing Y and Y and another Y and a word like aYiom. 
Line 05 containing Y but not Y;         
Line 06 containing YYY but not Yyy:         
Line 07 containing B and Z and Y~       
Line 08 containing Y but not Y#         
Line 09 containing Y and Y and another Yylophone

Any ideas of how I can prevent exclamation marks, and any text between them, from being lost?
Title: Re: Exclamation Marks [!] in DOS text files cause 'FOR /F' processing problems.
Post by: Salmon Trout on May 08, 2011, 01:58:14 PM
The string variable for each line is not being created correctly because, with delayed expansion enabled, exclamation marks are variable delimiters, so they are discarded along with everything between them. Batch scripts get terribly fiddly when special characters ("poison characters") are embedded in strings to be processed. The worst is the ampersand, but parentheses and percent signs are among the others.
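
To see the effect in isolation, here is a minimal sketch that echoes the same FOR variable with delayed expansion off and then on. The variable name `sample` is made up for this illustration, and the sample text is one line from the test file in the first post.

```batch
@echo off
setlocal DisableDelayedExpansion
rem Hypothetical sample line, copied from the test file in the first post.
set "sample=Line 02 containing A but not X!Watch this space!X"

for /f "delims=" %%a in ("%sample%") do (
    echo Delayed expansion off: %%a
    setlocal EnableDelayedExpansion
    echo Delayed expansion on:  %%a
    endlocal
)
endlocal
```

With delayed expansion off the line is echoed intact. With it on, `!Watch this space!` is treated as a reference to an undefined variable and vanishes, leaving `...but not XX`, which matches the `YY` in the output file above once the X=Y substitution has run.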

What I did below is to take the set of tokens spat out by FOR /F and create a string enclosed in quotes so I could pass it as one single parameter to a CALLed subroutine where the tilde variable modifier strips off the quotes and then the X=Y substitution takes place, followed by the output to the file. No delayed expansion needed.

Consider "CALL :replace" to be equivalent to "GOSUB replace" in BASIC, and "GOTO :eof" to be equivalent to "RETURN" in BASIC. The colons are obligatory, not optional like they are when using GOTO with labels.

batch script
Code:
@ECHO OFF
REM   ====================================================
REM   Code to change every occurrence of 'X' to 'Y' --
REM   in every line of 'Test-myfile.txt'...
REM   ====================================================
@ECHO OFF > newfile.txt
FOR /F "tokens=1,2,3,4,5,6,7-15* delims= " %%a in (Test-myfile.txt) DO (
    call :replace "%%a %%b %%c %%d %%e %%f %%g %%h %%i %%j %%k %%l %%m %%n %%o"
   )
ECHO Done!
pause
exit
   
:replace
set string1=%~1
set string2=%string1:X=Y%
echo %string2% >> newfile.txt
goto :eof

newfile.txt
Code:
Line 01 containing A but not Y!         
Line 02 containing A but not Y!Watch this space!Y       
Line 03 containing Y but not Y! and another Yylophone.     
Line 04 containing Y and Y and another Y and a word like aYiom. 
Line 05 containing Y but not Y;         
Line 06 containing YYY but not Yyy:         
Line 07 containing B and Z and Y~       
Line 08 containing Y but not Y#         
Line 09 containing Y and Y and another Yylophone       
Title: Re: Exclamation Marks [!] in DOS text files cause 'FOR /F' processing problems.
Post by: TomTheCabinBoy on May 10, 2011, 07:37:06 AM
ST, thank you very much for providing a solution to my problem -- I had unsuccessfully searched several websites looking for information on it before you came up with the goods.

One further point about my example: I notice that the character substitution changes both 'X' and 'x' (upper and lower case) to 'Y' (the replacement always takes the case I specified). Is there a way to preserve the original case of the target character (i.e. 'X' becomes 'Y', and 'x' becomes 'y')? Thanks in anticipation.
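
The behaviour described above can be reproduced with a short sketch. The variable name `s` and the sample text are assumptions chosen for illustration.

```batch
@echo off
setlocal
rem Hypothetical sample text, loosely based on Line 04 of the test file.
set "s=another x and a word like axiom"
rem The search text in a substitution is matched without regard to case.
rem The replacement text is inserted exactly as typed.
echo %s:X=Y%
endlocal
```

This prints `another Y and a word like aYiom`: both lower-case x characters match the search for X, and each becomes an upper-case Y.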

Title: Re: Exclamation Marks [!] in DOS text files cause 'FOR /F' processing problems.
Post by: TomTheCabinBoy on May 10, 2011, 07:44:07 AM
Incidentally, ST, I tried to thank you by clicking the 'Thank Salmon Trout' button but got...

"An Error Has Occurred! Sorry, you can't repeat a karma action without waiting 2 hours."

Don't know what that's all about!
Title: Re: Exclamation Marks [!] in DOS text files cause 'FOR /F' processing problems.
Post by: Salmon Trout on May 10, 2011, 09:57:52 AM
Give this a try

Code:
@ECHO OFF
REM   ====================================================
REM   Code to change every occurrence of 'X' to 'Y' --
REM   in every line of 'Test-myfile.txt'...
REM   ====================================================
@ECHO OFF > newfile.txt
FOR /F "tokens=1,2,3,4,5,6,7-15* delims= " %%a in (Test-myfile.txt) DO (
    call :replace "%%a %%b %%c %%d %%e %%f %%g %%h %%i %%j %%k %%l %%m %%n %%o"
   )
ECHO Done!
pause
exit
   
:replace
set string1=%~1
set string2=
set j=0
:Loop
call set inchar=%%string1:~%j%,1%%
if "%inchar%"=="" goto ExitLoop
set outchar=%inchar%
IF "%inchar%"=="X" set outchar=Y
IF "%inchar%"=="x" set outchar=y
set "string2=%string2%%outchar%"
set /a j=%j%+1
goto Loop
:ExitLoop
echo %string2% >> newfile.txt
goto :eof
   
Title: Re: Exclamation Marks [!] in DOS text files cause 'FOR /F' processing problems.
Post by: Salmon Trout on May 10, 2011, 10:52:06 AM
An alternative in which a one-line VBScript is created, used and destroyed; it will run slightly more slowly. Note that FOR loop metavariables are case sensitive, so %%A will not interfere with %%a. Note also that the first FOR loop can be one line.

Code:
@ECHO OFF
REM   ====================================================
REM   Code to change every occurrence of 'X' to 'Y' --
REM   in every line of 'Test-myfile.txt'...
REM   ====================================================
echo wscript.echo Replace(wscript.arguments(0), wscript.arguments(1), wscript.arguments(2))>Srep.vbs
@ECHO OFF > newfile.txt
FOR /F "tokens=1,2,3,4,5,6,7-15* delims= " %%a in (Test-myfile.txt) DO call :replace "%%a %%b %%c %%d %%e %%f %%g %%h %%i %%j %%k %%l %%m %%n %%o"
ECHO Done!
del Srep.vbs>nul
pause
exit
   
:replace
set string2=%~1
for /f "delims=" %%A in ( ' cscript //nologo Srep.vbs "%string2%" "X" "Y" ' ) do set string2=%%A
for /f "delims=" %%A in ( ' cscript //nologo Srep.vbs "%string2%" "x" "y" ' ) do set string2=%%A
echo %string2% >> newfile.txt
goto :eof 
   
Title: Re: Exclamation Marks [!] in DOS text files cause 'FOR /F' processing problems.
Post by: TomTheCabinBoy on May 11, 2011, 04:59:14 AM
ST, thanks again for the 2 solutions which you provided -- both of which preserve case sensitivity.

Am I right in assuming that the use of double-quotes in the [SET "string2=%string2%%outchar%"] statement -- in the non-VBS version -- is a way of ensuring that spaces, currently at the end of the string being processed, are not lost?
Title: Re: Exclamation Marks [!] in DOS text files cause 'FOR /F' processing problems.
Post by: Salmon Trout on May 11, 2011, 10:07:48 AM
Quote
ST, thanks again for the 2 solutions which you provided -- both of which preserve case sensitivity.

Am I right in assuming that the use of double-quotes in the [SET "string2=%string2%%outchar%"] statement -- in the non-VBS version -- is a way of ensuring that spaces, currently at the end of the string being processed, are not lost?

Yes, that is correct. The SET command used without quotes ignores trailing spaces, so to be sure that, if %outchar% should happen to be a space, it is added, the quotes are used as you saw.
Title: Re: Exclamation Marks [!] in DOS text files cause 'FOR /F' processing problems.
Post by: Salmon Trout on May 11, 2011, 11:37:03 AM
[Update] I tested the batch without quotes in that line, and it works just the same: SET without quotes keeps trailing spaces too. I think I just assumed you needed them.
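
For anyone who wants to check this on their own system, here is a minimal sketch. The variable names `sp`, `a`, `b`, `d` and `e` are made up for the illustration.

```batch
@echo off
setlocal
rem Hypothetical variable names, for illustration only.
rem Build a single space without relying on invisible trailing whitespace.
set "sp= "

rem Unquoted SET: the trailing space is stored.
set a=abc%sp%
rem Quoted SET: the trailing space inside the quotes is stored as well.
set "b=abc%sp%"
echo [%a%] [%b%]

rem The two forms differ when something follows the value on the same line.
set d=abc & rem the space before the ampersand becomes part of d
set "e=abc" & rem nothing after the closing quote is stored in e
echo [%d%] [%e%]
endlocal
```

The first ECHO prints `[abc ] [abc ]`, so both forms keep a trailing space that is part of the value. The second prints `[abc ] [abc]`: the quoted form only guards against stray characters that follow the value on the same line.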
Title: Re: Exclamation Marks [!] in DOS text files cause 'FOR /F' processing problems.
Post by: TomTheCabinBoy on June 01, 2011, 07:16:02 AM
ST, you warned me, in your initial reply, that "poison characters" embedded in text strings can cause problems when processing them in batch scripts. Sure enough, when my test text file was modified to contain ampersands [&], percent signs [%], and double quotes ["], in addition to the dreaded exclamation marks [!], problems were encountered. Sorry to say, all 3 of your solutions -- which worked fine for exclamation marks [!] -- failed when processing the aforementioned additional rogue characters.

I think you might be interested in a solution I have developed, which seems to cope with all of the above "poison characters". It is based on some of your conversion code, arrived at after many experiments with enabling and disabling delayed expansion, inside and outside subroutines. I have commented the script as a reminder of how the different methods work, or don't work, in certain situations.

My input file [Test-myfile.txt] now contains...

Code:
Line 01 containing x but not X!
Line 02 containing A but not X!Watch this space!x
Line 03 containing X but not Y! and another xylophone.
Line 3A containing X and 1st percent % sign, 2nd percent % sign, and 3rd percent % sign.
Line 3B containing X and just 2 percent % signs. (Here's the 2nd percent % sign).
Line 3C containing X and 1st percent ^% sign, 2nd percent ^% sign, and 3rd percent ^% sign.
Line 3D containing X and just 2 percent ^% signs. (Here's the 2nd percent ^% sign).
Line 04 containing X and Y and another x and a word like axiom.
Line 4A containing x and an ampersand & and another & followed by one more & and a percent % sign.
Line 4B containing x and an ampersand & on its own.
Line 4C containing 2 ampersands -- this one & and this one & as well.
Line 05 containing Y but not x; also an upper-case 'X' in single quotes
Line 5A containing W but not X; also a lower-case "x" in double quotes
Line 06 containing Xxx (but not Yyy):
Line 07 containing B and Z and Y~
Line 08 containing Y but not X#
Line 09 containing X and Y and another Xylophone

My MS-DOS batch code is now as follows...

Code:
@ECHO OFF
REM ====================================================
REM Code to change every occurrence of 'X' to 'Y' --
REM in every line of 'Test-myfile.txt' -- without losing --
REM 'special characters'. e.g. [!][%][&][#][~]["]['].
REM Preserving upper/lower case is selectable.
REM ====================================================

REM Display/obtain case preservation options/choice...
ECHO.
ECHO When changing 'X' to 'Y' should the original case be preserved?
ECHO.
ECHO   [0] Case-preservation is not important. (Default - Quicker)
ECHO   [1] Preserve original case. (Slower)
ECHO.
SET /P CASE_CHOICE=Enter option...
ECHO.
IF "%CASE_CHOICE%" EQU "1" ECHO Original case will be preserved. (Slower option)
IF "%CASE_CHOICE%" NEQ "1" ECHO Original case may not be preserved. (Quicker option)
ECHO.
PAUSE

REM Initialise a count for lines processed.
SET LINES_COUNT=0

REM Create an empty Output File.
@ECHO OFF > newfile.txt

REM One at a time, read each complete line of the Input File into a single string.
FOR /F "tokens=1* delims=" %%a in (Test-myfile.txt) DO (
REM ECHO Original line..."%%a"
REM At this point, complete string is intact.

REM Copy the complete intact string --
REM enclosing in quotes to protect special characters (most).
SET string1="%%a"
REM At this point, string1 contains the complete intact quoted string --
REM but appears to be empty/undefined if echoed.

REM For each line of the Input File, string processing requires 'EnableDelayedExpansion' --
REM which can only be executed a maximum of 16 times before reaching the --
REM 'Maximum setlocal recursion level' (despite also executing 'DisableDelayedExpansion').
REM Also, within 'EnableDelayedExpansion' %%a will lose exclamation marks --
REM fortunately we no longer need %%a (for the current line) as it has been copied to string1.
REM For these reasons, the main string processing is done in a subroutine...
CALL :REPLACE_CHARS
)

ENDLOCAL
ECHO Done!
PAUSE
EXIT

:REPLACE_CHARS

REM Increment lines processed count...
SET /A LINES_COUNT+=1

REM Enable delayed environment variable expansion --
REM so that the value of certain variables can be dynamically redefined at run time --
REM using !Variable! instead of %Variable% (or a combination)...
SETLOCAL EnableDelayedExpansion

REM Display progress...
ECHO Original Line !LINES_COUNT! ...!string1!...
REM At this point, string1 still contains the complete intact quoted string.

REM Ascertain whether preservation of original case is required...
IF "!CASE_CHOICE!" EQU "1" GOTO PRESERVE_CASE

REM Original case is not required to be preserved...

REM Convert all 'X's in the string (line) to 'Y's --
REM upper and lower-case 'X's will be converted to upper-case 'Y's.
SET string2=!string1:X=Y!
GOTO WRITE_LINE

REM Original case must be preserved...

:PRESERVE_CASE

REM Initialise an index for the current character position --
REM and an empty string to construct the converted line...
set j=0
set string2=

:Loop

REM One at a time, isolate each individual character in the string (line)...
set inchar=!string1:~%j%,1!
IF "!inchar!END"=="END" GOTO :ExitLoop

REM Copy the current character for passing to the modified string (line)...
SET outchar=!inchar!

REM Convert 'X's to 'Y's -- preserving the original case...
IF "!inchar!"=="X" set outchar=Y
IF "!inchar!"=="x" set outchar=y

REM Construct the new string (line) by concatenating the current/modified character...
SET "string2=!string2!!outchar!"

REM Increment the index for the next character position, and repeat...
SET /A j+=1
GOTO :Loop
:ExitLoop

:WRITE_LINE

REM Strip-off the leading/trailing quotes from the processed string (line) --
REM then write it to the Output File -- ensuring no spaces are added at end of line.
ECHO !string2:~1,-1!>> newfile.txt

REM Display progress...
ECHO Modified Line !LINES_COUNT! ...!string2!...

REM End the localisation started by 'SETLOCAL EnableDelayedExpansion' above --
REM 'ENDLOCAL' pops that level, whereas a second 'SETLOCAL' would push another one.
ENDLOCAL

REM Return whence we came.
GOTO :EOF
Title: Re: Exclamation Marks [!] in DOS text files cause 'FOR /F' processing problems.
Post by: Salmon Trout on June 01, 2011, 10:46:56 AM
Quote
REM      For each line of the Input File, string processing requires 'EnableDelayedExpansion' --
REM      which can only be executed a maximum of 16 times before reaching the --
REM      'Maximum setlocal recursion level' (despite also executing 'DisableDelayedExpansion').

I'm not sure what you mean by this - you only enable delayed expansion once, at any point before the loop or other structure. You don't have to re-enable it for each line that is read from a file!
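
One common way to reconcile the two concerns (exclamation marks in the data, and the setlocal nesting limit) is to pair every SETLOCAL with an ENDLOCAL inside the loop. The following is a sketch under stated assumptions, not a tested replacement for the script above: the file names follow the thread and `line` is a hypothetical variable name.

```batch
@echo off
setlocal DisableDelayedExpansion
rem File names follow the thread; "line" is a hypothetical variable name.
type nul > newfile.txt

for /f "usebackq delims=" %%a in ("Test-myfile.txt") do (
    rem Copy the line while delayed expansion is off, so exclamation marks survive.
    set "line=%%a"
    rem Switch delayed expansion on only for the substitution and the write.
    setlocal EnableDelayedExpansion
    >>newfile.txt echo(!line:X=Y!
    rem Pop the level again so the nesting never grows from one line to the next.
    endlocal
)
endlocal
```

Because each line's SETLOCAL is undone by its ENDLOCAL, the nesting depth stays constant however long the input file is. Note that the substitution is still case-insensitive, that `echo(` is used so the line is printed as-is whatever it contains, and that FOR /F skips blank lines and, by default, lines beginning with a semicolon.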



Title: Re: Exclamation Marks [!] in DOS text files cause 'FOR /F' processing problems.
Post by: Salmon Trout on June 01, 2011, 01:50:32 PM
Isn't this better? Sooner or later if you are doing serious textfile processing you are going to need to look at something else. Visual Basic Script is present in every Windows installation these days.

Save as a .vbs file and run with

Cscript //nologo yourname.vbs


Code:
Set fso = CreateObject("Scripting.FileSystemObject")
Const ForReading = 1, ForWriting = 2, ForAppending = 8
Const FormatSystemDefault = -2, FormatUnicode = -1, FormatASCII = 0
ReadfileName ="Input.txt"
WriteFileName="Output.txt"
Wscript.echo "Read file..."
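' OpenTextFile takes (filename, iomode, create, format): the third argument
' is the Create flag, so the format constant has to be passed fourth.
Set InputFile =  fso.OpenTextFile(ReadFileName,  ForReading, False, FormatASCII)
Set OutputFile = fso.OpenTextFile(WriteFileName, ForWriting, True,  FormatASCII)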
Do While Not InputFile.AtEndOfStream
    InputLine = InputFile.readline
    Wscript.Echo "Input  " & InputLine
    TempLine  = InputLine
    TempLine  = Replace(TempLine, "X", "Y")
    TempLine  = Replace(TempLine, "x", "y")
    OutputLine = TempLine
    Wscript.Echo "Output " & OutputLine
    OutputFile.WriteLine(OutputLine)
Loop
InputFile.Close   
Outputfile.Close
Set fso = Nothing