
Topic: MSDOS Bat File program to Copy the Next Line if Findstr Line is True


Squashman



« Reply #15 on: December 28, 2013, 09:06:50 PM »
Quote
You are using type + for loop? Or other methods? How about timing? What are you using to time the script?
The Windows Resource Kit comes with a program called TIMEIT. It works quite well.

BC_Programmer


« Reply #16 on: December 28, 2013, 10:26:22 PM »
Quote
It's almost a fact that a lower-level language (the likes of C/C++) and the algorithm used to read big files both play a part in performance.

The latter far more so than the former, even in this case.

C# is compiled to IL (Intermediate Language) and is subsequently run on the .NET CLR.

I created an 8-million-line file (lines numbered 0 to 7,999,999) in which each line reads "this is an example line of text, line #<num>", where <num> is the current iteration, starting from 0. The resulting file is 382,888,890 bytes.
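
The stated size is consistent with exactly 8,000,000 lines if each line ends in CRLF: every line is 39 fixed characters, plus the digits of its number, plus 2 bytes of line ending, and summing that over lines #0 to #7,999,999 gives 382,888,890 bytes. The post doesn't say how the file was generated, so the helpers below (`write_test_file` and `expected_size`, both hypothetical names) are only a sketch of one way to produce a file in that format and predict its size, assuming CRLF line endings.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical helper (not the original poster's code): writes lines
// "this is an example line of text, line #0" ... "#<count-1>",
// each terminated by CRLF, and returns the number of bytes written.
std::uint64_t write_test_file(const std::string& path, std::uint64_t count)
{
    // Binary mode so "\r\n" is written exactly as-is on every platform.
    std::ofstream out(path, std::ios::binary);
    if (!out)
        return 0;
    std::uint64_t bytes = 0;
    for (std::uint64_t i = 0; i < count; ++i) {
        std::string line = "this is an example line of text, line #" + std::to_string(i) + "\r\n";
        out << line;
        bytes += line.size();
    }
    return bytes;
}

// Predicts the same size arithmetically: each line is 39 fixed characters,
// plus the digits of its number, plus 2 bytes of CRLF.
std::uint64_t expected_size(std::uint64_t count)
{
    std::uint64_t total = 0;
    std::uint64_t lo = 0, hi = 10;  // numbers in [lo, hi) have `digits` digits
    for (std::uint64_t digits = 1; lo < count; ++digits, lo = hi, hi *= 10) {
        std::uint64_t n = (count < hi ? count : hi) - lo;
        total += n * (39 + digits + 2);
    }
    return total;
}
```

For example, `expected_size(8000000)` returns 382888890, and `write_test_file("testoutput.txt", 8000000)` should write a file of that size.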

A C# program that simply reads the file and counts its lines:

Code: [Select]
    using System;
    using System.Diagnostics;
    using System.IO;

    class Program
    {
        static void Main(string[] args)
        {
            Stopwatch watch = new Stopwatch();
            watch.Start();
            int linecount = 0;
            using (StreamReader reader = new StreamReader("D:\\testoutput.txt"))
            {
                // Read (and discard) every line, counting as we go
                while (!reader.EndOfStream)
                {
                    reader.ReadLine();
                    linecount++;
                }
            }
            watch.Stop();
            Console.WriteLine("Finished. Total Time:" + watch.Elapsed.ToString() + ", read " + linecount + " Lines.");
            Console.ReadKey(); // keep the console window open
        }
    }


Output:

Code: [Select]
Finished. Total Time:00:00:07.8800219, read 8000000 Lines.

About 8 seconds to process the entire file.

My VBScript is a bit rusty, but I came up with this:

Code: [Select]
Option Explicit
Dim FSO, TStream
Dim StringRead, LineCount
Dim StartTime, EndTime
Set FSO = CreateObject("Scripting.FileSystemObject")
Set TStream = FSO.OpenTextFile("D:\testoutput.txt")
LineCount = 0
StartTime = Timer
' Read (and discard) every line, counting as we go
Do While Not TStream.AtEndOfStream
   StringRead = TStream.ReadLine()
   LineCount = LineCount + 1
Loop
TStream.Close
EndTime = Timer
' Elapsed seconds (Timer counts seconds since midnight)
WScript.Echo EndTime - StartTime

This should be functionally similar. It gave me this back:

Code: [Select]
47.14844

So the first thought would be that this extra time must be because VBScript is interpreted.

However, I'm not entirely certain that is the case, and inserting the same code into a Visual Basic 6 project bears this suspicion out to some degree. Visual Basic 6 supports compiling to native code; doing so yields a time of 55 seconds, roughly 8 seconds slower than VBScript. Interestingly, compiling to P-code (an intermediate language of sorts) made the program finish almost 2 seconds faster (53.2 seconds).

In VBScript, every variable is a Variant. This effectively means that every access or assignment has to pack and unpack an OLE Automation VARIANT structure (internally, of course). Additionally, VBScript is late-bound, which means that all of its access to COM objects (such as the FileSystemObject) goes through IDispatch. Suffice it to say that this is much slower than an early-bound call, and it pretty much means the method name has to be looked up each time it is used. Here that is a problem, because the loop body contains both variable access (incrementing the line count) and late-bound member calls (the AtEndOfStream check in the loop condition as well as the ReadLine() call itself).
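
To make the early/late binding distinction concrete, here is a rough C++ analogy. It is not a model of how COM or IDispatch actually work: `count_late_bound` simply looks a method up by name in a table on every iteration, while `count_early_bound` calls it directly, so the compiler can resolve (and usually inline) the call. All names are made up for the illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>

// Analogy only: this is NOT how IDispatch works internally.
struct Counter {
    long value = 0;
    void increment() { ++value; }
};

// Early-bound style: the compiler knows exactly which function is called.
long count_early_bound(long iterations)
{
    Counter c;
    for (long i = 0; i < iterations; ++i)
        c.increment();
    return c.value;
}

// Late-bound style: the method is found by its name in a table on every
// call, loosely like asking an object for a method by name before invoking it.
long count_late_bound(long iterations)
{
    Counter c;
    std::unordered_map<std::string, std::function<void(Counter&)>> methods{
        {"increment", [](Counter& self) { self.increment(); }},
    };
    for (long i = 0; i < iterations; ++i)
        methods.at("increment")(c);  // string hash + lookup + indirect call each time
    return c.value;
}
```

Both functions return the same count; timing them with a large iteration count should show the per-call lookup dominating, though the exact ratio depends on the compiler and machine.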

In Visual Basic 6, I made two changes: I referenced the Microsoft Scripting Runtime (allowing early-bound calls) and made all variables strongly typed. This reduced processing time to 28.6 seconds. That is still not as fast as C#, but C# always goes through IL and is compiled by the JIT at run time, while here Visual Basic 6 compiles straight to native code, so clearly "lower-level" doesn't translate directly into faster performance. The C# version is most likely faster because the JIT compiler can use newer processor features and run in long mode (64-bit, rather than 32-bit under WoW64), which changes the native code it emits. Visual Basic 6 has a native code compiler, but it always optimizes for the Pentium. Even enabling all advanced optimizations and the "Favor Pentium Pro" option didn't get it below about 26 seconds.

You might think this is specific to Visual Basic. That appears to be only partly true. Using Visual Studio 2013 and C++, in a Release build with all optimizations set to full, I ran the following code:

Code: [Select]
#include <ctime>
#include <fstream>
#include <iostream>
#include <string>

using namespace std;

int main()
{
    string line;
    int linecount = 0;
    ifstream myfile("D:\\testoutput.txt");
    cout << "processing..." << endl;
    clock_t startTime = clock();
    // Note: testing good() before getline() runs the body once more after the
    // last line has been read, so a file ending in a newline is over-counted
    // by one (hence 8000001 in the output below).
    while (myfile.good()) {
        getline(myfile, line);
        linecount++;
    }
    cout << double(clock() - startTime) / CLOCKS_PER_SEC << " seconds." << endl;
    cout << "Finished." << endl;
    cout << "processed " << linecount << " lines.";
    int test;
    cin >> test; // keep the console window open
}

It produced this output:

Code: [Select]
processing...
9.535 seconds.
Finished.
processed 8000001 lines.

(This was with all optimizations set to full and favoring speed: /Ox, /Ot.) My first thought for the small difference was that the C# program ran in native 64-bit mode, whereas the C++ project compiles to 32-bit by default, but switching the C++ program to x64 made it take about twice as long. My best guess as to why it is slower than C# here is the ifstream library.
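
As an aside, the usual C++ idiom tests `getline()` itself rather than `good()`, so the body only runs when a line was actually read; that version would report 8,000,000 lines for this file. And if ifstream overhead is the suspect, one way to check is to skip per-line strings entirely and count newline bytes in large binary chunks. Both functions below are a sketch with hypothetical names; no timings are claimed for either.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Idiomatic loop: the body runs only when getline() produced a line,
// so there is no extra count for the failed read at end of file.
long count_lines_getline(const std::string& path)
{
    std::ifstream in(path);
    std::string line;
    long count = 0;
    while (std::getline(in, line))
        ++count;
    return count;
}

// Alternative that avoids building a string per line: read large binary
// chunks and count '\n' bytes. Matches the getline count when the file ends
// with a newline (an unterminated last line is not counted here).
long count_lines_buffered(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    std::vector<char> buf(1 << 20);  // 1 MiB chunks (arbitrary choice)
    long count = 0;
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        std::streamsize got = in.gcount();  // bytes actually read, even on the last partial chunk
        for (std::streamsize i = 0; i < got; ++i)
            if (buf[i] == '\n')
                ++count;
    }
    return count;
}
```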




Salmon Trout

« Reply #17 on: December 29, 2013, 02:16:00 AM »
Quote
You are using type + for loop?

First batch method

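@echo off
setlocal
REM %~1 strips any quotes the caller used, so the path can be quoted consistently below
set "Tfile=%~1"
set line=1
echo "%Tfile%"
echo %date% %time%
REM Walk the file line by line; each "echo | find" pipe starts new processes, which is slow
for /f "delims=" %%L in ('type "%Tfile%"') do (
    set /a line+=1
    echo %%L | find "Trigger" >nul && goto found
    )
echo Trigger not found
goto :eof
:found
echo Found previous
echo %date% %time%
echo Line  %line%
REM Skip every line up to and including the trigger line, then print the next one
set /a sk=%line%-1
for /f "skip=%sk% delims=" %%L in ('type "%Tfile%"') do (
    echo %%L
    goto done
    )
:done
echo %date% %time%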
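
For comparison, the task this batch file performs (print the line that follows the first line containing "Trigger") can be done in a single pass with plain sequential reads, stopping as soon as the wanted line has been read. A minimal C++17 sketch; `line_after_trigger` is a hypothetical name, and the match is a plain case-sensitive substring test like `find "Trigger"`.

```cpp
#include <cassert>
#include <fstream>
#include <optional>
#include <string>

// Returns the line that follows the first line containing `trigger`, reading
// the file once and stopping as soon as it has the answer. Returns
// std::nullopt if there is no trigger line or it is the last line.
std::optional<std::string> line_after_trigger(const std::string& path, const std::string& trigger)
{
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        if (line.find(trigger) != std::string::npos) {
            if (std::getline(in, line))
                return line;       // the wanted next line
            return std::nullopt;   // trigger was on the last line
        }
    }
    return std::nullopt;           // trigger not found
}
```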


Quote
Or other methods?

Second batch method

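@echo off
setlocal
set "Tfile=%~1"
set "triggerlinenumber="
echo "%Tfile%"
echo FIND start   %date% %time%
REM find /N prefixes each match with [n]; keep only the number of the first match
for /f "delims=[] tokens=1*" %%A in ('find /N "Trigger" "%Tfile%" ^| find "Trigger"') do if not defined triggerlinenumber set triggerlinenumber=%%A
echo FIND end     %date% %time%
if not defined triggerlinenumber (
    echo Trigger not found
    goto :eof
    )
echo FOR /F start %date% %time%
REM Skip the first n lines (up to and including the trigger) and print the next one
for /f "skip=%triggerlinenumber% delims=" %%L in ('type "%Tfile%"') do (
    echo %%L
    goto done
    )
:done
echo FOR /F end   %date% %time%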


VBScript:

CONST ForReading = 1
strTextFile = wscript.arguments(0)
wscript.echo strTextFile
Set objFSO = CreateObject("Scripting.FileSystemObject")

start = Timer
strData = objFSO.OpenTextFile(strTextFile,ForReading).ReadAll
wscript.echo "Read file   " & formatnumber(Timer-start, 4, True) & " secs"
start = Timer
arrLines = Split(strData,vbCrLf)
wscript.echo "Split array " & formatnumber(Timer-start, 4, True) & " secs"

start = Timer
iArrayIndex=0
Do While iArrayIndex <= Ubound(arrLines)
    if instr(arrLines(iArrayIndex), "Trigger") > 0 then
        Exit Do
    End If
    iArrayIndex = iArrayIndex + 1
Loop
wscript.echo "Find line   " & formatnumber(Timer-start, 4, True) & " secs"
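' Guard against the trigger being missing or on the last line
If iArrayIndex < UBound(arrLines) Then
    wscript.Echo "Wanted line " & arrLines(iArrayIndex+1)
Else
    wscript.Echo "Trigger not found, or it is on the last line"
End If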
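
The VBScript version trades memory for simplicity: it reads the whole file with ReadAll, splits it into an array on vbCrLf, then scans the array. A rough C++17 equivalent of that shape is sketched below (hypothetical name `line_after_trigger_in_memory`); like Split, it keeps an empty final element when the file ends with CRLF, and it returns no value if the trigger is missing or sits in the final element.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Mirrors the VBScript approach: read the whole file into memory, split it
// on CRLF, then scan for the first element containing `trigger`.
std::optional<std::string> line_after_trigger_in_memory(const std::string& path, const std::string& trigger)
{
    std::ifstream in(path, std::ios::binary);  // binary: keep "\r\n" intact, like ReadAll
    std::ostringstream ss;
    ss << in.rdbuf();
    const std::string data = ss.str();

    // Equivalent of Split(strData, vbCrLf)
    std::vector<std::string> lines;
    std::size_t start = 0, pos;
    while ((pos = data.find("\r\n", start)) != std::string::npos) {
        lines.push_back(data.substr(start, pos - start));
        start = pos + 2;
    }
    lines.push_back(data.substr(start));  // Split also keeps the text after the last CRLF

    for (std::size_t i = 0; i + 1 < lines.size(); ++i)
        if (lines[i].find(trigger) != std::string::npos)
            return lines[i + 1];
    return std::nullopt;
}
```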


Quote
How about timing? What are you using to time the script?

Batch:

echo Start %date% %time%
[Command]
echo End %date% %time%


VBScript:

start = Timer
[Code lines]
wscript.echo "Elapsed " & formatnumber(Timer-start, 4, True) & " secs"
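
The same start/stop pattern in C++ would normally use a monotonic clock. One practical difference: VBScript's Timer returns seconds since midnight, so a run that crosses midnight yields a negative difference, whereas `std::chrono::steady_clock` never goes backwards. A minimal sketch (`elapsed_seconds` is a hypothetical name):

```cpp
#include <cassert>
#include <chrono>

// Runs `work` once and returns the wall-clock time it took, in seconds,
// measured with a monotonic clock.
template <typename F>
double elapsed_seconds(F&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```

For example: `double secs = elapsed_seconds([] { /* code to time */ });`.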