
Author Topic: how to get the Count of string in file  (Read 36672 times)


vishuvishal



    Beginner
  • Thanked: 3
    Re: how to get the Count of string in file
    « Reply #90 on: August 15, 2010, 05:45:55 PM »
    He he...
    I hope this is a forum for DOS,
    not for VBS, VB or C.

    Don't mind me,
    but I have started to like batch programming.
    I really appreciate your knowledge and expertise.
    As I see it, Windows functionality can be operated from DOS, because Windows itself is a DOS-operated operating system.
    So I think you should rely on batch rather than other languages.


    If I said anything that offends any programmer, I sincerely apologize.
    I didn't mean it that way.
    But can you point out which is the best IDE at the moment,
    the way C is the best language?

    Comments appreciated.

    I know this is going off topic.



    Thanks and regards.
    Vishu

    ghostdog74



      Specialist

      Thanked: 27
      Re: how to get the Count of string in file
      « Reply #91 on: August 15, 2010, 06:11:30 PM »
      Quote
      One thousand and twenty-five thousand million and two bytes (1,025,000,002) as I posted above.

      So it's 1 million (but the filename passed to your VBScript says 1 billion).

      Quote
      Did I imply that I did not already realise this?

      It appears so to me. You showed a benchmark between BCP's code and yours, then said BCP's is sluggish after a while, without stating your reasons or the conclusion of your findings. Makes one wonder why that happens, right?

      vishuvishal



        Beginner
      • Thanked: 3
        Re: how to get the Count of string in file
        « Reply #92 on: August 15, 2010, 06:15:13 PM »
        Quote
        So it's 1 million (but the filename passed to your VBScript says 1 billion). It appears so to me. You showed a benchmark between BCP's code and yours, then said BCP's is sluggish after a while, without stating your reasons or the conclusion of your findings. Makes one wonder why that happens, right?

        I really don't know what you are talking about.

        ghostdog74



          Specialist

          Thanked: 27
          Re: How to get the Count of string in file
          « Reply #93 on: August 15, 2010, 06:17:40 PM »
          Quote
          The Sed solution is the best solution.

          Not really! If it's a big file, your method of substituting the word to add newlines (which is expensive compared to pure string counting) and then piping to 2 calls of the find command to get the count is not the best way to go. The best way is to count the occurrences AS YOU ITERATE THE FILE (with whatever tool is processing it) and keep the running count in memory. That said, sed is not the best tool to use in this case.
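To illustrate counting while iterating, here is a minimal Python sketch. Python is only my choice for illustration, nobody in this thread posted it. The function names are made up, and the sketch assumes the search string never spans a line break.

```python
def count_in_stream(lines, needle):
    """Return how many times needle occurs, consuming one line at a time."""
    if not needle:
        raise ValueError("needle must not be empty")
    total = 0
    for line in lines:
        # str.count returns the number of non-overlapping occurrences.
        total += line.count(needle)
    return total


def count_in_file(path, needle):
    """Stream a text file through count_in_stream without loading it whole."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return count_in_stream(handle, needle)
```

Only the current line and a single integer are held in memory, so the file size does not change the memory footprint. Note that this is plain substring counting, so "thesis" still counts as a hit for "the".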


          ghostdog74



            Specialist

            Thanked: 27
            Re: how to get the Count of string in file
            « Reply #94 on: August 15, 2010, 06:18:37 PM »

            Quote
            Really don't know what you talking about.

            Sorry, I don't care whether you know or not. My words were not meant for you.

            BC_Programmer


              Mastermind
            • Typing is no substitute for thinking.
            • Thanked: 1140
              • Yes
              • Yes
              • BC-Programming.com
            • Certifications: List
            • Computer: Specs
            • Experience: Beginner
            • OS: Windows 11
            Re: how to get the Count of string in file
            « Reply #95 on: August 15, 2010, 06:19:59 PM »
            Quote
            One thousand and twenty-five thousand million and two bytes (1,025,000,002)
            so its 1 million (but your filename passed to your vbscript states 1 billion. )

            A billion is a thousand million... (in North America, at least)


            I was trying to dereference Null Pointers before it was cool.

            ghostdog74



              Specialist

              Thanked: 27
              Re: how to get the Count of string in file
              « Reply #96 on: August 15, 2010, 06:25:33 PM »
              Quote
              so its 1 million (but your filename passed to your vbscript states 1 billion. )

              a Billion is a thousand millions... (In North America, at least)

              OK, OK. But I am talking about post #83, where ST said he downloaded "1 million places of pi", yet the file name he used for the benchmark is "1 billion places of pi". He is showing a benchmark, and when there are ambiguities, it's only natural for an inquisitive mind to ask questions.

              BC_Programmer


                Mastermind
              • Typing is no substitute for thinking.
              • Thanked: 1140
              Re: how to get the Count of string in file
              « Reply #97 on: August 15, 2010, 06:31:11 PM »
              Quote
              ok ok. But i am talking about post #83. where ST said he download "1 million places of pi", then his file name for testing the benchmark is "1 billion places of pi". He is showing a benchmark, and when there are ambiguities, someone like me will question.

              It doesn't much matter whether it's a billion or a million, as long as the same inputs were used to test both. The exact size is more of a curiosity (except in some cases).

              ghostdog74



                Specialist

                Thanked: 27
                Re: how to get the Count of string in file
                « Reply #98 on: August 15, 2010, 06:37:35 PM »
                A billion and a million are different.

                ghostdog74



                  Specialist

                  Thanked: 27
                  Re: how to get the Count of string in file
                  « Reply #99 on: August 15, 2010, 06:42:03 PM »
                  Quote
                  C:\test>type   cntstr.bat
                  rem @echo  off
                  sed s/%1/%1\n/g %2 | egrep -c %1

                  C:\test>cntstr.bat  the yz.txt

                  C:\test>rem @echo  off

                  C:\test>sed s/the/the\n/g yz.txt   | egrep -c the
                  10

                  C:\test>type yz.txt
                  the
                  the
                  the
                  the
                  the the the
                  the the the

                  This example will also count words like thesis, other, etc., which merely contain "the" and are not exactly the word "the". egrep is also deprecated; use grep -E.
                  Code:
                  grep -Eo "\bthe\b" file | wc -l
                  The above does not need to run a substitution over the entire file, and it matches only the exact word.
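For anyone without grep at hand, roughly the same whole-word count can be sketched in Python. This is an illustration only: `count_whole_word` is a made-up name, and I am assuming Python's `\b` behaves closely enough to GNU grep's for plain ASCII words.

```python
import re


def count_whole_word(lines, word):
    """Count occurrences of word that are delimited by word boundaries."""
    # re.escape keeps any regex metacharacters in the word literal.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return sum(len(pattern.findall(line)) for line in lines)


if __name__ == "__main__":
    # yz.txt is the sample file from the transcript quoted above.
    with open("yz.txt", "r", encoding="utf-8") as handle:
        print(count_whole_word(handle, "the"))
```

Like the grep pipeline, this counts every match on a line rather than the number of matching lines, and it skips words that only contain "the".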

                  BC_Programmer


                    Mastermind
                  • Typing is no substitute for thinking.
                  • Thanked: 1140
                  Re: how to get the Count of string in file
                  « Reply #100 on: August 15, 2010, 06:54:07 PM »
                  Quote
                  a billion and a million is different.

                  Not in this case. What difference would it make to the results? Sure, the numbers will be larger for a billion than for a million, but it's not the actual number that's important; it's how the two numbers compare.

                  ST performed two tests: one with a smaller file and one with a larger file. The two tests revealed that with a larger amount of data to read, my method hits a large I/O bottleneck. Two points of reference are enough for a crude line-chart comparison of the two, and while it may not be entirely accurate, it can reveal specific trends in the two functions. For example, we can determine that my routine seems to run at something like O((n/4)^2), whereas his is a linear method whose running time is proportional to the length of the file. Mine is not linear because additional overhead is required for the system to manage the larger amount of memory used to store the entire string.
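As a rough illustration of reading a trend off two data points: if you assume the running time follows t = c * n^k, two measurements are enough to solve for k. The Python sketch below does exactly that. The function name and the sample numbers are hypothetical, they are not ST's actual timings.

```python
import math


def growth_exponent(n_small, t_small, n_large, t_large):
    """Estimate k in t = c * n**k from two (size, time) measurements."""
    if min(n_small, t_small, n_large, t_large) <= 0 or n_small == n_large:
        raise ValueError("need two distinct, positive measurements")
    return math.log(t_large / t_small) / math.log(n_large / n_small)


if __name__ == "__main__":
    # Made-up timings: 10x the input taking 100x as long suggests k near 2.
    print(growth_exponent(1_000_000, 0.5, 10_000_000, 50.0))
    # Made-up timings: 10x the input taking 10x as long suggests k near 1.
    print(growth_exponent(1_000_000, 0.5, 10_000_000, 5.0))
```

With only two points this is a crude estimate. It cannot separate a genuinely quadratic routine from a linear one that starts paging once the data no longer fits in memory.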

                  What is important here is that we are comparing the programs used. As long as the inputs are the same, the comparisons are valid.

                  If you test Program A and Program B with Input C, it's a fair comparison between A and B as long as C is the same for both.

                  It doesn't matter if there was a mix-up over the specifics of the size of C. The comparison was between A and B.

                  If you compare a Quick Sort with a Merge Sort, whether you are testing with a million or a billion elements is largely beside the point; what's important is the comparison. If there were confusion over the layout of the data (such as how a quicksort takes longer than a merge sort on a nearly sorted array) and it was relevant, then yes, I would agree. But while there is indeed some ambiguity, it's irrelevant.
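A comparison of that kind can be sketched as a tiny timing harness in Python. Everything here is illustrative: `program_a` and `program_b` are stand-ins of my own, not the programs ST benchmarked, and the input text is made up.

```python
import re
import time


def time_call(func, data, repeats=3):
    """Return the best wall-clock time of func(data) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - start)
    return best


def compare(candidates, data, repeats=3):
    """Time every candidate against the very same input object."""
    return {name: time_call(func, data, repeats) for name, func in candidates.items()}


# Two stand-in "programs" that count the same substring in different ways.
def program_a(text):
    return text.count("the")


def program_b(text):
    return len(re.findall("the", text))


if __name__ == "__main__":
    input_c = "the quick brown fox jumps over the lazy dog\n" * 100_000
    for name, seconds in compare({"A": program_a, "B": program_b}, input_c).items():
        print(f"{name}: {seconds:.4f} s")
```

Because both candidates receive the identical `input_c`, the relative timings are comparable whatever its exact size happens to be.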

                  ghostdog74



                    Specialist

                    Thanked: 27
                    Re: how to get the Count of string in file
                    « Reply #101 on: August 15, 2010, 07:04:58 PM »
                    Quote
                    Not in this case. What difference would it have on the results? sure, the numbers will be larger for a billion than for a million, but it's not the actual number that's important, it's how the two numbers compare.

                    If it's a larger file, then your method of slurping everything into memory is not a good solution. That's the difference. Why do you say it's not important? If the test files were, say, 1 thousand vs. 100, then of course your method would work. The size of the test samples does matter when doing benchmarks, as it affects the design of the algorithm being used.

                    victoria



                      Beginner

                      Thanked: 1
                      Re: how to get the Count of string in file
                      « Reply #102 on: August 15, 2010, 07:52:35 PM »
                      Quote
                      this example will also count words like thesis, other, etc, which is not exactly the word "the". egrep is also deprecated. Use grep -E
                      Code:
                      grep -Eo "\bthe\b" file|wc -l
                      the above does not need to do substitution on the entire file and gets the exact string.

                      Ghost,
                      Your grep works. I had an old 2005 version.

                      Your skill level has improved. Who is your Tutor?

                      C:\test>grep  -Eo the  yz.txt
                      the
                      the
                      the
                      the
                      the
                      the
                      the
                      the
                      the
                      the

                      C:\test>grep  -Eo the  yz.txt  |  wc -l
                            10

                      C:\test>type yz.txt
                      the
                      the
                      the
                      the
                      the the the
                      the the the
                      C:\test>
                      « Last Edit: August 15, 2010, 08:28:07 PM by victoria »
                      Have a Nice Day

                      BC_Programmer


                        Mastermind
                      • Typing is no substitute for thinking.
                      • Thanked: 1140
                      Re: how to get the Count of string in file
                      « Reply #103 on: August 15, 2010, 08:04:13 PM »
                      Quote
                      If its a larger file, then your method of slurping all into memory is not a good solution. That's the difference. why do you say its not important? If the test files are like 1 thousand vs 100 , then of course your method will work. Size of the test samples do matter when doing benchmarks as it will affect the design of the algorithm being used.

                      Reread my post.

                      ghostdog74



                        Specialist

                        Thanked: 27
                        Re: how to get the Count of string in file
                        « Reply #104 on: August 15, 2010, 08:54:59 PM »
                        Quote
                        Reread my post.

                        I reiterate my point. The size of a file does not matter if what you are comparing is the output of 2 pieces of code, that is, if you only want to make sure both pieces of code produce the same output. The size of the file does matter in a benchmark, where you are concerned with the way the program is written and the algorithm used, that is, whether you have used the most optimized method for dealing with big files.

                        Because of the size of the file, you have chosen to read the file in chunks. That's a direct consequence of taking size into consideration when designing your program. That's why size does matter in a benchmark. 1 million is way different from 1 billion!
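For illustration, chunked counting can be sketched like this in Python. This is not BC_Programmer's actual code, just a minimal version under my own assumptions: the name `count_in_chunks` and the default chunk size are made up, and the stream is read as bytes. The one subtle part is carrying a short tail across reads so that a match split over a chunk boundary is still found.

```python
import io


def count_in_chunks(stream, needle, chunk_size=64 * 1024):
    """Count non-overlapping occurrences of needle in a binary stream."""
    if not needle:
        raise ValueError("needle must not be empty")
    total = 0
    tail = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return total
        buffer = tail + chunk
        pos = 0
        while True:
            hit = buffer.find(needle, pos)
            if hit == -1:
                break
            total += 1
            pos = hit + len(needle)
        # Keep at most len(needle) - 1 unmatched bytes: enough to complete a
        # match split across chunks, never enough to count one twice.
        keep_from = max(pos, len(buffer) - (len(needle) - 1))
        tail = buffer[keep_from:]


if __name__ == "__main__":
    data = io.BytesIO(b"the the the\n" * 1000)
    print(count_in_chunks(data, b"the", chunk_size=7))
```

Memory use stays bounded by the chunk size plus the needle length, which is the property that starts to matter once the input reaches a billion bytes rather than a million.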