
Topic: Awk - A nifty little tool for text manipulation and more.


briandams

    Topic Starter


    Beginner

    Thanked: 2
    • Experience: Guru
    • OS: Unknown
    Awk - A nifty little tool for text manipulation and more.
    « on: January 16, 2014, 05:47:09 AM »
    Awk has been around since the Unix days of the 1970s and is a standard tool on most Unix-like operating systems. It is primarily used for text and string manipulation. GNU awk (gawk) is one of the most widely used awk versions today, and it has also been ported to Windows, so it is convenient to use as part of your batch-scripting toolset.

    In this thread, I will show some examples of how to use this tool for easy file and text manipulation in a Windows batch environment (if you are able to download third-party tools that are not installed by default). It is aimed mostly at beginners to awk and at anyone looking for a tool to parse strings and text.

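    The syntax for awk is simple. awk reads its input one record (by default, one line) at a time and, for each record that matches the pattern, runs the action. If the pattern is omitted, the action runs for every record; if the action is omitted, matching records are printed.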
    Code: [Select]
    pattern { action }
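As a rough illustration of the three shapes a rule can take, here is a small sketch written for a Unix shell (the awk program goes in single quotes; on cmd.exe you would use double quotes, as in the examples in this thread). The shell function names are hypothetical helpers, not part of awk.

```shell
#!/bin/sh
# Sketch: the three shapes of an awk rule, each reading lines from stdin.

# Pattern only: matching records are printed (the default action).
pattern_only() {
    awk '/apple/'
}

# Action only: with no pattern, the action runs for every record.
action_only() {
    awk '{ print "line:", $0 }'
}

# Pattern and action: the action runs only for matching records.
pattern_and_action() {
    awk '/apple/ { print "found:", $0 }'
}

printf 'apple pie\nbanana\n' | pattern_only          # apple pie
printf 'apple pie\nbanana\n' | action_only           # line: apple pie, then line: banana
printf 'apple pie\nbanana\n' | pattern_and_action    # found: apple pie
```
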

    For example, to print a file

    Code: [Select]
    C:\> awk "{print}" myFile.txt

    In the above example, there is no pattern and the action is "print", so every line is printed. This is the equivalent of the command
    Code: [Select]
    type myFile.txt

    cmd.exe on Windows does not treat single quotes as quoting characters, so we have to use double quotes around the awk program.

    awk has two special patterns, BEGIN and END. The action of a BEGIN block is executed only once, before awk reads the first record. For example, you can initialize variables inside this block:

    Code: [Select]
    awk "BEGIN{a=10} ....." myFile.txt

    or just do some calculation, using awk as a simple calculator (when the program has only a BEGIN block, awk does not read any input):
    Code: [Select]
    C:\>awk "BEGIN { print 1+2 } "
    3

    Likewise, the END block is executed only once, after all the records in the file have been read. For example, to print the last line of a file:
    Code: [Select]
    C:\> more myFile.txt
    C:\original\1\2\3
    C:\original\1\2\4
    C:\original\1\2\5
    C:\original\1\2\36
    test

    C:\>awk "END{ print $0} " myFile.txt
    test

    "$0" means the current line/record.
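To see BEGIN, the main rule and END working together, here is a minimal sketch (Unix-shell quoting; the function name sum_first_field is hypothetical) that adds up the numbers in the first field of each line and reports the count, total and average at the end. It assumes the first field is numeric.

```shell
#!/bin/sh
# Sketch: BEGIN, a main rule and END working together.
# Assumes the first field of every line is a number.
sum_first_field() {
    awk '
        BEGIN { total = 0; count = 0 }   # once, before any input is read
        { total += $1; count++ }         # once for every record
        END {                            # once, after the last record
            if (count > 0)
                printf "count=%d total=%d average=%.2f\n", count, total, total / count
        }
    '
}

printf '10\n20\n30\n' | sum_first_field   # count=3 total=60 average=20.00
```
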

    to be continued...

    - brianadams

    briandams

      Parsing Structured Text
      « Reply #1 on: January 16, 2014, 05:49:53 AM »
      One common task is extracting information from files. If you have a field-delimited text file to parse, awk might be just the tool you need.

      For example, suppose you have this simple delimited file, where the delimiter is "|":

      Code: [Select]
      C:\>type myFile.txt
      1|2|3|4|5
      6|7|8|9|10
      a|b|c|d|e

      and you want to get the 3rd column. In awk, the 3rd column is denoted by $3, the 2nd column by $2, and so on. So to get the 3rd column, issue this command:

      Code: [Select]
      C:\>awk -F"|" "{print $3}" myFile.txt
      3
      8
      c

      The -F option sets the field delimiter; here, "|" is the delimiter. awk breaks each record into fields, each referred to by "$" followed by its number: $1 is the first field, $9 the 9th field, and so on.
      The above is similar to the DOS for /f command with the tokens and delims options:
      Code: [Select]
      for /f "tokens=3 delims=|" ..........


      NF holds the number of fields in the current record, so $NF is the last field and $(NF-1) is the second-to-last field.

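A quick sketch of $NF and $(NF-1), assuming "|"-delimited input like the file above (Unix-shell quoting; last_two_fields is a hypothetical name). Note the parentheses in $(NF-1): without them, $NF-1 would mean "the last field minus 1".

```shell
#!/bin/sh
# Sketch: $NF is the last field, $(NF-1) the one before it.
# Fields are separated by "|".
last_two_fields() {
    awk -F'|' '{ print $(NF-1), $NF }'
}

printf '1|2|3|4|5\na|b|c\n' | last_two_fields
# 4 5
# b c
```
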
      One useful feature of -F is that it accepts several characters, or a regular expression, as the field delimiter (a delimiter longer than one character is treated as a regular expression). For example, here is a file with the delimiter ",%#":

      Code: [Select]
      C:\>type myFile.txt
      1,%#2,%#3,%#4,%#5
      6,%#7,%#8,%#9,%#10
      a,%#b,%#c,%#d,%#e
      Passing ",%#" to -F in the same command as before and printing the 2nd column:
      Code: [Select]
      C:\>awk -F",%#" "{print $2}" myFile.txt
      2
      7
      b
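Because a multi-character -F value is treated as a regular expression, a bracket expression lets several different characters act as the delimiter. A minimal sketch, assuming fields separated by either "," or ";" (Unix-shell quoting; second_field is a hypothetical name):

```shell
#!/bin/sh
# Sketch: a bracket expression as the field separator, so either
# a comma or a semicolon ends a field.
second_field() {
    awk -F'[,;]' '{ print $2 }'
}

printf 'a,b;c\n1;2,3\n' | second_field
# b
# 2
```
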

      to be continued

      - brianadams

      briandams

        Topic Starter


        Beginner

        Thanked: 2
        • Experience: Guru
        • OS: Unknown
        Length of a string or record
        « Reply #2 on: January 16, 2014, 05:52:45 AM »
        Often we need the length of a string or of a line in a file. awk provides the length() function for this; length on its own, without parentheses, means length($0), the length of the current record. For example:
        Code: [Select]
        C:\>echo "test"| awk "{print length}"
        6

        Why is it 6 and not 4? Because the echo command in DOS outputs the double quotes literally, so awk receives "test" including its quotes. To get the length of the string itself, pipe it to awk without the quotes (and with no space before the |, since echo would output that space too):
        Code: [Select]
        C:\>echo test| awk "{print length}"
        4

        Use the usual DOS for /f loop to capture the result in a variable.

        How about going through a file and displaying only the lines of a certain length?
        For example, given this file:
        Code: [Select]
        C:\>type myFile.txt
        abcd
        abcd
        abcdefghi
        abcdefghi
        abcdefghijklmn

        and we want to get those lines whose length is 4.

        Code: [Select]
        C:\>awk "length==4" myFile.txt
        abcd
        abcd

        Written this way, length==4 is the "pattern" part of the awk syntax. There is no action, so awk uses the default action and prints each matching record. It is not written like this:
        Code: [Select]
        c:\>awk "{length==4}" myFile.txt

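        That version puts the comparison in the action, where it is evaluated for every line but its result is discarded, so nothing is printed. The "pattern" part of the awk syntax is usually a regular expression or a condition; a record matches when the condition is true (non-zero or non-empty).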

        As another example, searching for lines whose length is greater than 4 and less than 10 gives:

        Code: [Select]
        C:\>awk "length>4 && length <10" myFile.txt
        abcdefghi
        abcdefghi

        If you want to write out the "action" part explicitly, the above is the same as:
        Code: [Select]
        C:\>awk "length>4 && length <10 {print} " myFile.txt
        abcdefghi
        abcdefghi
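Patterns and actions can also work together with a variable. As a sketch (Unix-shell quoting; longest_line is a hypothetical name), this remembers the longest line seen so far and prints its length and text in the END block. If several lines tie, the first one wins because the comparison is strictly greater-than.

```shell
#!/bin/sh
# Sketch: keep the longest line seen so far and print it at the end.
longest_line() {
    awk '
        length($0) > max { max = length($0); line = $0 }   # new longest line
        END { print max ": " line }
    '
}

printf 'abcd\nabcdefghi\nabcdefghijklmn\nabc\n' | longest_line   # 14: abcdefghijklmn
```
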

        to be continued..

        - brianadams

        briandams

          Operators
          « Reply #3 on: January 16, 2014, 05:55:20 AM »
          awk provides the usual math operators for performing calculations in your scripts. Here I list only the commonly used ones.

          Exponentiation
          x ^ y
          x ** y  (gawk extension; x ^ y is the portable form)

          Addition, subtraction, multiplication, division : +, -, *, /

          Modulus (remainder) : %

          x++ , ++x : post- and pre-increment operators
          x-- , --x : post- and pre-decrement operators

          x += 1 : adds 1 to the value of x
          x -= 1 : subtracts 1 from the value of x


          Boolean operators:
          ! : logical NOT
          && : logical AND
          || : logical OR

          Relational operators
          <, <=, >, >=, == (equal), != (not equal)

          Regular expression matching operators
          ~ : matching
          !~ : non-matching

          Ternary operator (conditional expression)
          condition ? value-if-true : value-if-false


          For square roots, there is the sqrt() function, e.g. sqrt(100).

          For trigonometry, there are the sin(), cos() and atan2(y, x) functions, which work in radians. There is no built-in tan(); use sin(x)/cos(x) instead.
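A short sketch exercising some of these operators and functions (Unix-shell quoting; math_demo is a hypothetical name). Since awk has no built-in pi constant, it is computed here as atan2(0, -1).

```shell
#!/bin/sh
# Sketch: a few arithmetic operators and built-in math functions.
math_demo() {
    awk 'BEGIN {
        x = 17; y = 5
        print x % y                             # modulus: 2
        print 2 ^ 10                            # exponentiation: 1024
        print int(x / y)                        # int() truncates 3.4 to 3
        print sqrt(16)                          # 4
        print ((x % 2 == 0) ? "even" : "odd")   # ternary operator: odd
        pi = atan2(0, -1)                       # no built-in pi, so compute it
        printf "%.4f\n", sin(pi / 2)            # 1.0000
    }'
}

math_demo
```
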

          To generate random numbers, there is the rand() function, which returns a number greater than or equal to 0 and less than 1. For example:
          Code: [Select]
          C:\>awk "BEGIN{ print rand() }"
          0.237788

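          In gawk, rand() produces the same sequence of numbers every time awk runs. To get different random numbers each time you run the awk command, call the srand() function first; with no argument, it seeds the generator from the current time of day.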

          Code: [Select]
          C:\>gawk "BEGIN{ srand(); print rand() }"
          0.14306

          C:\>gawk "BEGIN{ srand(); print rand() }"
          0.807121

          C:\>gawk "BEGIN{ srand(); print rand() }"
          0.663245
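To get random integers in a range, scale rand() and truncate it with int(). A minimal sketch simulating a six-sided die (Unix-shell quoting; roll_die is a hypothetical name). With gawk, srand() with no argument seeds from the time of day in seconds, so two runs within the same second print the same number.

```shell
#!/bin/sh
# Sketch: rand() is in [0, 1), so int(rand() * 6) is 0..5
# and adding 1 gives a die roll from 1 to 6.
roll_die() {
    awk 'BEGIN { srand(); print int(rand() * 6) + 1 }'
}

roll_die
```
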


          To concatenate strings , just write them next to each other, like this
          Code: [Select]
          C:\>awk "BEGIN{ print \"2\" \"3\" }"
          23

          When writing the awk program on the cmd.exe command line, any double quotes used inside the program have to be escaped with a backslash. In a Unix shell, the program is wrapped in single quotes instead, so it can be written like this:

          Code: [Select]
          awk 'BEGIN{ print "2" "3"}'

          For more information on operators, please consult the manual.

          to be continued ...

          - brianadams

          briandams

            Simple string manipulation
            « Reply #4 on: January 16, 2014, 05:59:24 AM »
            Here I cover simple string manipulation in awk using its built-in string functions:
            1) Getting part of a string (substrings)
            2) Getting index of a string
            3) Splitting a string
            4) Uppercase and lowercase

            1) Getting part of a string

            awk provides the substr() function to get part of a string, for example
            Code: [Select]
            C:\>echo chimpanzee| awk "{print substr($0,2,5) }"
            himpa

            $0 is the current record/line; in this case, it is the standard input passed to awk through the pipe. substr($0,2,5) returns the 5 characters starting at position 2 of the current record (awk counts positions from 1). It is the same as the DOS built-in substring syntax, which counts offsets from 0:

            Code: [Select]
            %variable:~1,5%

            where %variable% is "chimpanzee". Note that the echo command in DOS also outputs any spaces before the | (ref: foxidrive), so in the example above there is no space after "echo chimpanzee".
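substr() also accepts just two arguments, in which case it returns everything from the given position to the end of the string. Combined with length(), that gives the last few characters of each line. A sketch (Unix-shell quoting; last_chars is a hypothetical name; -v n=... passes a shell value into the awk variable n):

```shell
#!/bin/sh
# Sketch: substr(s, m) with two arguments returns everything from
# position m to the end, so this keeps the last n characters of a line.
last_chars() {
    # $1 = how many characters to keep from the end of each line
    awk -v n="$1" '{ print substr($0, length($0) - n + 1) }'
}

echo chimpanzee | last_chars 3   # zee
```
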

            2) Getting index of a string
            index(s, t) returns the position of the first occurrence of string t inside string s. For example, to find the first occurrence of the letter "h" in "elephant":

            Code: [Select]
            C:\>echo elephant| awk "{print index($0,\"h\") }"
            5

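            (Note the escaping of the double quotes when writing on the command line.)
            If the string is not found, index() returns 0, which makes it useful for checking whether one string contains another. Note that this 0 is printed output, not awk's exit code, so ERRORLEVEL stays 0 either way; in a batch file, capture the printed value with a for /f loop, or have awk set its exit code with the exit statement.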

            Code: [Select]
            C:\>echo elephant| awk "{print index($0,\"z\") }"
            0
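Since the printed 0 does not change awk's exit code, one way to get a result that ERRORLEVEL can see is to set the exit code yourself with exit. A minimal sketch, written for a Unix shell where the exit code is tested with if (contains is a hypothetical name; -v s=... passes the search string into awk). It exits with 0 if any input line contains the string and 1 otherwise; in a batch file, the equivalent check would be if errorlevel 1.

```shell
#!/bin/sh
# Sketch: report "found" through the exit code instead of printed output.
# Exits with 0 if any input line contains the string $1, otherwise 1.
contains() {
    awk -v s="$1" '
        BEGIN { rc = 1 }           # assume not found
        index($0, s) { rc = 0 }    # index() > 0 means found
        END { exit rc }
    '
}

if echo elephant | contains h; then echo found; else echo "not found"; fi   # found
if echo elephant | contains z; then echo found; else echo "not found"; fi   # not found
```
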

            3) Splitting a string
            awk provides the split() function to split a string on a pattern. For example, let's split the word "euphoria" on the letter "p":
            Code: [Select]
            C:\>echo euphoria| awk "{ n=split($0,array,\"p\") } END{ print array[1], n} "
            eu 2

            Again, $0 means the current record (here, "euphoria" passed in from standard input). split() takes the string to split as its first argument, the array to store the pieces in as its second, and the pattern to split on as its third. This pattern can be a regular expression.

            The pieces are stored in "array", starting at index 1. In the above example, we print the first element of the array in the END block. split() returns the number of elements created, so "n" has the value 2, meaning there are 2 items in the array.
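Because split() numbers the pieces 1 through n, a counting for loop visits them in order. A sketch (Unix-shell quoting; split_words is a hypothetical name, and its argument is the separator):

```shell
#!/bin/sh
# Sketch: split() stores the pieces at indices 1..n, so a counting
# for loop visits them in order.
split_words() {
    # $1 = the separator to split on
    awk -v sep="$1" '{
        n = split($0, parts, sep)
        for (i = 1; i <= n; i++)
            print i ": " parts[i]
    }'
}

echo "a-b-c" | split_words -
# 1: a
# 2: b
# 3: c
```
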

            4) Uppercase and lowercase
            Often you might want to change the case of words or strings. awk provides the built-in functions tolower() and toupper().
            For example, this one-liner changes all the characters in the file to uppercase:

            Code: [Select]
            C:\>type myFile.txt
            computerhope.com

            C:\>awk "{ print toupper($0) ;}" myFile.txt
            COMPUTERHOPE.COM

            To change only one string:
            Code: [Select]
            C:\>echo test|awk "{print toupper($0) }"
            TEST

            C:\>echo TEST|awk "{print tolower($0) }"
            test

            As usual, capture the result using a DOS for loop.
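toupper() and tolower() combine well with substr(). As a sketch (Unix-shell quoting; capitalize_words is a hypothetical name), this capitalizes the first letter of every word and lowercases the rest. Assigning to a field makes awk rebuild the line, so runs of spaces between words become single spaces.

```shell
#!/bin/sh
# Sketch: capitalize the first letter of every word, lowercase the rest.
capitalize_words() {
    awk '{
        for (i = 1; i <= NF; i++)
            $i = toupper(substr($i, 1, 1)) tolower(substr($i, 2))
        print
    }'
}

echo "hello WORLD from awk" | capitalize_words   # Hello World From Awk
```
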


            to be continued...

            - brianadams

            briandams

              Printing in awk
              « Reply #5 on: January 16, 2014, 06:02:01 AM »
              awk provides two statements for printing output, plus a function for formatting:
              1) print
              2) printf
              3) sprintf()

              1) print.
              The basic statement for displaying output is print. It shouldn't be too difficult to understand how to use it. Just write:
              Code: [Select]
              print "your string"

              You can also redirect output to a file from inside awk with the output redirection operator ">" (use ">>" to append instead):
              Code: [Select]
              C:\>awk "BEGIN{print \"computerhope.com\" > \"testfile\" }"

              C:\>type testfile
              computerhope.com

              2) printf().
              The printf statement looks like this:
              Code: [Select]
              printf("format" , item1, item2 ...)

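              printf is very similar to printf() in the C language: you can use format specifiers such as %s (string), %d (integer) and %f (floating point). Unlike print, printf does not add a newline at the end, so add \n to the format when you need one. (Inside a batch file, each % must be doubled as %%.) For example, to format numbers to 2 decimal places: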
              Code: [Select]
              C:\>awk "BEGIN{ printf(\"%.2f\" , 100) }"
              100.00
              C:\>awk "BEGIN{ printf(\"%.2f\" , 3.14244) }"
              3.14

              To right-justify a string in a field 15 characters wide:
              Code: [Select]
              C:\>awk "BEGIN{ printf(\"%15s\" , \"mystring\") }"
                     mystring

              To pad a number with leading zeros:
              Code: [Select]
              C:\>awk "BEGIN{ printf(\"%05d\" , 100) }"
              00100
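Field widths are handy for aligned, report-style output. A sketch, assuming comma-separated name,count,price input (Unix-shell quoting; report is a hypothetical name): %-8s left-justifies in 8 columns, %5d right-justifies an integer in 5 columns and %6.2f prints 2 decimals in 6 columns.

```shell
#!/bin/sh
# Sketch: aligned columns with printf field widths.
report() {
    awk -F',' '{ printf "%-8s|%5d|%6.2f\n", $1, $2, $3 }'
}

printf 'apple,3,1.5\nbanana,12,0.25\n' | report
# apple   |    3|  1.50
# banana  |   12|  0.25
```
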


              3) sprintf().
              sprintf() accepts the same arguments as printf, but instead of printing the result, it returns the formatted string, so it can be saved to a variable:
              Code: [Select]
              C:\>awk "BEGIN{PI = sprintf(\"%.4f\", 22/7); print PI }"
              3.1429

              Here, the value of 22/7, formatted to 4 decimal places, is saved to the variable "PI". This variable can then be used in other parts of the awk script.

              For more info and examples on print, printf and sprintf please consult the manual.

              to be continued

              - brianadams

              briandams

                Awk Loops
                « Reply #6 on: January 16, 2014, 06:03:43 AM »
                awk loops work the same way as loops in C. Here I cover 2 of the most common ones:
                1) for loop
                2) while loop

                The syntax for a "for" loop in awk is this

                Code: [Select]

                for (initialization; condition; increment)
                       body
                For example, to generate the numbers from 1 to 9:
                Code: [Select]
                C:\>awk "BEGIN{ for(i=1;i<10;i++){ print i }  }"
                1
                2
                3
                4
                5
                6
                7
                8
                9

                Use a DOS for /f loop to capture each number and use it as desired. This is similar to the batch command
                Code: [Select]
                FOR /L %%G IN (1,1,9) DO echo %%G

                The while loop is another common looping construct found in most programming languages. For example, counting down from 10 and printing a "*" each time:
                Code: [Select]
                C:\>awk "BEGIN{count=10; while(count>0 ){ print \"*\" ; count--}  }"
                *
                *
                *
                *
                *
                *
                *
                *
                *
                *
                Written out more clearly, as it would look in a script file run with awk -f (where the double quotes need no escaping):

                Code: [Select]
                BEGIN{
                    count=10          # set the count to 10
                    while(count>0 ) {
                        print "*"     # print *
                        count--       # decrement the count each time through the loop
                    }
                }

                Of course, the above can be written with the for loop as well
                Code: [Select]
                C:\>awk "BEGIN{for(c=10;c>0;c--) print \"*\"  }"
                *
                *
                *
                *
                *
                *
                *
                *
                *
                *
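Loops are often used to walk over the fields of a record. A sketch that prints the fields of each line in reverse order with a decrementing for loop (Unix-shell quoting; reverse_fields is a hypothetical name):

```shell
#!/bin/sh
# Sketch: print the fields of each line from last to first.
reverse_fields() {
    awk '{
        out = $NF
        for (i = NF - 1; i >= 1; i--)
            out = out " " $i
        print out
    }'
}

echo "one two three" | reverse_fields   # three two one
```
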


                to be continued

                - brianadams

                briandams

                  Awk arrays
                  « Reply #7 on: January 16, 2014, 06:06:27 AM »
                  Most programming languages support data structures such as arrays, which store a collection of related items instead of separate variables. awk has arrays too, called associative arrays: each array is a collection of index/value pairs.

                  Here is a simple example of using arrays in awk:

                  Code: [Select]
                  C:\>awk "BEGIN{a[1]=\"one\" ; a[2]=10; print a[1]\",\"a[2] }"
                  one,10


                  In the above example, we create array "a" with index 1 holding the value "one" (a string) and index 2 holding the value 10 (a number). Indices can be numbers or strings (awk stores them as strings, so a[1] and a["1"] are the same element), and values can be of either type. For example:
                  Code: [Select]
                  C:\>awk "BEGIN{a[\"two\"]=2; print a[\"two\"] }"
                  2
                  Here, the index is "two" (a string) and the value is the number 2.


                  To iterate over an array, use the for (index in array) loop:
                  Code: [Select]
                  C:\>awk "BEGIN{a[1]=\"one\" ; a[\"two\"]=2; for(item in a) {print item\" \"a[item]  } }"
                  two 2
                  1 one

                  To put it more clearly, in script-file form (without the escaping needed on the command line):
                  Code: [Select]
                  BEGIN{
                      a[1]="one"
                      a["two"]=2
                      for( item in a )  {
                         print item" "a[item]
                      }
                  }

                  In awk, arrays are unordered, unlike arrays in C, so the order in which for (item in a) visits the elements is unspecified. That is why the output above comes out in an arbitrary order.

                  To get the size of an array, gawk lets you use the length() function described earlier:
                  Code: [Select]
                  C:\>awk "BEGIN{a[1]=\"one\" ; a[\"two\"]=2; a[2]=100; print length(a) }"
                  3

                  To check whether an index exists in an array, use the in operator, for example inside an if statement:
                  Code: [Select]
                  C:\>awk "BEGIN{a[1]=\"one\" ; a[\"one\"]=1; a[2]=100; if (2 in a) { print \"ok\"}  }"
                  ok

                  More clearly, in script-file form:
                  Code: [Select]
                  BEGIN{
                      a[1] = "one"    # define array indices and values
                      a["one"] = 1
                      a[2] = 100
                      if (2 in a) {
                         print "ok"
                      }
                  }
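One detail worth knowing: simply referring to a[key] in an expression creates that element with an empty value, while the in operator only tests for it. A small sketch demonstrating this (Unix-shell quoting; membership_demo is a hypothetical name):

```shell
#!/bin/sh
# Sketch: reading a[key] creates the element; "in" only tests for it.
membership_demo() {
    awk 'BEGIN {
        before = ("k" in a)   # 0: no such element yet
        x = a["k"]            # merely reading a["k"] creates it, empty
        after = ("k" in a)    # 1: the element now exists
        print before, after
    }'
}

membership_demo   # 0 1
```
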

                  To remove an element from an array, use the delete statement:
                  Code: [Select]
                  C:\>awk "BEGIN{a[1]=\"one\" ; a[\"one\"]=1; a[2]=100; delete a[2]; if (2 in a) { print \"ok\"}else {print \"not ok\"}  }"
                  not ok

                  More clearly, in script-file form:
                  Code: [Select]

                  BEGIN{
                      a[1]="one"
                      a["one"]=1
                      a[2]=100
                      delete a[2]    # delete the element with index 2
                      if (2 in a) {
                          print "ok"
                      } else {
                          print "not ok"
                      }
                  }

                  To delete a whole array, use delete array (supported by gawk and most other awks; a portable alternative is split("", array)).

                  See the manual for more elaborate examples on using arrays
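A classic use of associative arrays is counting. As a sketch (Unix-shell quoting; word_count is a hypothetical name), this counts how often each whitespace-separated word occurs, using the word itself as the index. The output is piped through sort because for (w in count) visits the indices in no particular order.

```shell
#!/bin/sh
# Sketch: count how often each word occurs, with the word as the index.
word_count() {
    awk '
        { for (i = 1; i <= NF; i++) count[$i]++ }
        END { for (w in count) print w, count[w] }
    ' | sort   # for (w in count) has no fixed order, so sort the output
}

printf 'a b a\nc a b\n' | word_count
# a 3
# b 2
# c 1
```
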

                  to be continued ...

                  - brianadams

                  briandams

                    Awk Flow control
                    « Reply #8 on: January 16, 2014, 06:08:35 AM »
                    Making decisions is part of our thought process every day. For a program to make decisions, the language must provide if/else constructs.  :)

                    awk provides the usual if / else if / else constructs that most languages have.

                    Code: [Select]
                    C:\>awk "BEGIN{ b=2; if( b==2 ) print \"it is 2\" }"
                    it is 2

                    Basic construct
                    Code: [Select]
                    if ( condition ) {
                       ....
                    } else if (condition) {
                       ...
                    } else {
                      ....
                    }


                    The break statement jumps out of the innermost loop that contains it, like this:
                    Code: [Select]
                    for ( conditions ){
                       ...
                       break   #breaks out of for loop
                       ...
                    }


                    The continue statement is also used in loops. It skips the rest of the loop body, so the next iteration of the loop begins immediately. For example:
                    Code: [Select]
                       for (x = 0; x <= 10; x++) {
                            if (x == 2) {
                               continue     # this continue skips the print statement below
                            }
                            print "something"
                       }


                    gawk also supports a switch statement (an extension to standard awk), but it is seldom needed, as if/else is good enough for most tasks. If you want to know more about switch statements, check the manual.
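Putting if / else if / else to work, here is a sketch that labels the number in the first field of each line (Unix-shell quoting; classify is a hypothetical name):

```shell
#!/bin/sh
# Sketch: if / else if / else choosing a label for the first field.
classify() {
    awk '{
        if ($1 < 0)
            print $1, "negative"
        else if ($1 == 0)
            print $1, "zero"
        else
            print $1, "positive"
    }'
}

printf '%s\n' -5 0 7 | classify
# -5 negative
# 0 zero
# 7 positive
```
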

                    to be continued

                    - brianadams

                    briandams

                      Some common Awk Variables
                      « Reply #9 on: January 16, 2014, 06:10:52 AM »
                      awk has some built-in variables that you should be familiar with for parsing strings and files:
                      1) NR
                      2) NF
                      3) FS
                      4) RS
                      5) ORS
                      6) OFS


                      1) NR
                      NR is the number of input records awk has read so far since the program started. For example, to find the line count of a file:
                      Code: [Select]
                      C:\>type myFile.txt
                      1,%#2,%#3,%#4,%#5
                      6,%#7,%#8,%#9,%#10
                      a,%#b,%#c,%#d,%#e

                      C:\>type myFile.txt | awk "END{print NR}"
                      3
                      This is essentially what the Unix wc -l command gives you (wc -l counts newline characters, so it reports one less if the last line has no trailing newline).
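NR is also useful as a pattern or inside an action. A sketch (Unix-shell quoting; print_line_2 and number_lines are hypothetical names): the first prints only line 2, the second prefixes every line with its line number.

```shell
#!/bin/sh
# Sketch: NR as a pattern, and NR inside an action.
print_line_2() {
    awk 'NR == 2'
}
number_lines() {
    awk '{ print NR ": " $0 }'
}

printf 'first\nsecond\nthird\n' | print_line_2   # second
printf 'first\nsecond\n' | number_lines
# 1: first
# 2: second
```
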

                      2) NF
                      NF stands for the number of fields in the current input record. For example
                      Code: [Select]
                      C:\>type myFile.txt
                      1,2,3,4,5
                      6,7,8,9,0,10

                      C:\> awk -F"," "{print NF}" myFile.txt
                      5
                      6
                      Here, because the -F option (field delimiter) is set to a comma, the first record has 5 fields and the 2nd record has 6.


                      3) FS
                      FS is the input field separator; setting it has the same effect as the -F option. It is usually set in the BEGIN block, before any records are read:
                      Code: [Select]
                      awk "BEGIN{FS=\",\"} {print $1}" myFile.txt
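                      FS can be a single character, several characters or a regular expression; a value longer than one character is treated as a regular expression. The default, a single space, is special: fields are separated by runs of spaces and tabs, and leading and trailing blanks are ignored.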
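The difference between the default FS and a single space written as a bracket expression shows up with repeated blanks. A sketch (Unix-shell quoting; default_fs and bracket_fs are hypothetical names):

```shell
#!/bin/sh
# Sketch: default FS versus a single space written as "[ ]".
default_fs() {
    # default FS: runs of blanks separate fields, leading blanks ignored
    awk '{ print NF ": " $2 }'
}
bracket_fs() {
    # "[ ]": every single space is a separator, so empty fields appear
    awk -F'[ ]' '{ print NF }'
}

printf '   a    b   c\n' | default_fs   # 3: b
printf '   a    b   c\n' | bracket_fs   # 11
```
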

                      4) RS
                      RS is the input record separator. By default it is the newline character, which is why awk processes input line by line. You can set RS to a different value; for example, to display each number of the above myFile.txt on a line by itself:
                      Code: [Select]
                      C:\>more myFile.txt
                      1,2,3,4,5
                      6,7,8,9,0,10

                      C:\>awk "BEGIN{RS=\",\"}{ print $0 } " myFile.txt
                      1
                      2
                      3
                      4
                      5
                      6
                      7
                      8
                      9
                      0
                      10
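                      Here, RS is set to a comma, so each record is (mostly) a single number. Note that the newline is no longer a separator, so "5" and "6", with the newline between them, actually form one record that just happens to print on two lines. gawk also accepts a regular expression as RS, such as "[,\n]", which splits on both commas and newlines.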

                      5) ORS
                      ORS is the output record separator. Its default is the newline "\n", and it is written at the end of every print statement. For example, to join the lines of a file into a single line:
                      Code: [Select]
                      C:\>awk "BEGIN{ORS=\"#\"}{ print $0 } " myFile.txt
                      1,2,3,4,5#6,7,8,9,0,10#


                      Here ORS is changed from "\n" to "#", so each print ends with "#" instead of a newline, and the records are joined into a single line (note the trailing "#" after the last record).
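If the trailing separator is unwanted, one common alternative to ORS is to print the separator before every record except the first. A sketch that joins lines with commas (Unix-shell quoting; join_lines is a hypothetical name); the print "" in END supplies the final newline.

```shell
#!/bin/sh
# Sketch: join lines with "," and no trailing separator, by printing
# the separator before every record except the first.
join_lines() {
    awk '{ printf "%s%s", (NR > 1 ? "," : ""), $0 } END { print "" }'
}

printf 'a\nb\nc\n' | join_lines   # a,b,c
```
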


                      6) OFS
                      OFS is the output field separator. Its default is a single space, and it is written between the items of a print statement that are separated by commas (for example, print $1, $2). For example, to change the separator between the fields from "," to "#":
                      Code: [Select]
                      C:\>type myFile.txt
                      1,2,3,4,5
                      6,7,8,9,0,10

                      C:\>awk "BEGIN{OFS=\"#\"; FS=\",\"}{$1=$1;print } " myFile.txt
                      1#2#3#4#5
                      6#7#8#9#0#10


                      Changing OFS does not change $0 by itself: awk rebuilds the record, joining the fields with OFS, only when a field is assigned. Hence the common idiom $1=$1, which assigns a field to itself to force the rebuild (see the manual for details).


                      to be continued ..

                      -brianadams

                      briandams

                        Pattern Matching and Substitution
                        « Reply #10 on: January 16, 2014, 06:13:28 AM »
                        awk has built-in pattern matching and functions for string substitution. Here I show some basic examples of simple matching and substitution. Regular expressions are a vast topic, so for in-depth coverage please consult a regex book. My favorite is Mastering Regular Expressions from O'Reilly.

                        Pattern matching
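                        In awk, a regular expression between slashes, such as /divID/, can be used on its own as a pattern that is matched against the whole record, or with the ~ operator to match a specific field. (All examples use this myFile.txt.)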
                        Code: [Select]
                        C:\> type myFile.txt
                        yearID,lgID,teamID,Half,divID,DivWin,Rank,G,W,L
                        1981,NL,ATL,1,W,N,4,54,25,29
                        1981,NL,ATL,2,W,N,5,52,25,27
                        1981,AL,BAL,1,E,N,2,54,31,23
                        1981,AL,BAL,2,E,N,4,51,28,23
                        1981,AL,BOS,1,E,N,5,56,30,26
                        1981,AL,BOS,2,E,N,2,52,29,23
                        1981,AL,CAL,1,W,N,4,60,31,29
                        1981,AL,CAL,2,W,N,6,50,20,30
                        1981,AL,CHA,1,W,N,3,53,31,22
                        1981,AL,CHA,2,W,N,6,53,23,30
                        1981,NL,CHN,1,E,N,6,52,15,37
                        1981,NL,CHN,2,E,N,5,51,23,28
                        1981,NL,CIN,1,W,N,2,56,35,21
                        1981,NL,CIN,2,W,N,2,52,31,21
                        1981,AL,CLE,1,E,N,6,50,26,24
                        1981,AL,CLE,2,E,N,5,53,26,27
                        1981,AL,DET,1,E,N,4,57,31,26
                        1981,AL,DET,2,E,N,2,52,29,23
                        1981,NL,HOU,1,W,N,3,57,28,29
                        1981,NL,HOU,2,W,N,1,53,33,20
                        1981,AL,KCA,1,W,N,5,50,20,30
                        1981,AL,KCA,2,W,N,1,53,30,23
                        1981,NL,LAN,1,W,N,1,57,36,21
                        1981,NL,LAN,2,W,N,4,53,27,26
                        1981,AL,MIN,1,W,N,7,56,17,39
                        1981,AL,MIN,2,W,N,4,53,24,29

                        C:\>awk "/divID/" myFile.txt
                        yearID,lgID,teamID,Half,divID,DivWin,Rank,G,W,L

                        The above finds every line that contains the string "divID". A regex on its own, such as /divID/, is short for $0 ~ /divID/; the regex to find is enclosed in / /.

                        For a case-insensitive search in gawk, set the IGNORECASE variable (a gawk extension; other awks treat it as an ordinary variable):
                        Code: [Select]
                        C:\>awk "BEGIN{IGNORECASE=1}/divid/" myFile.txt
                        yearID,lgID,teamID,Half,divID,DivWin,Rank,G,W,L

                        Setting IGNORECASE to 0 toggles it back to case-sensitive.
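IGNORECASE only works in gawk. A portable sketch that works with other awks too lowercases the record before matching it against a lowercase regex (Unix-shell quoting; find_ci is a hypothetical name):

```shell
#!/bin/sh
# Sketch: case-insensitive match without IGNORECASE, by lowercasing
# the record and matching it against a lowercase regex.
find_ci() {
    awk 'tolower($0) ~ /divid/'
}

printf 'yearID,lgID,teamID,Half,divID\n1981,NL,ATL,1,W\n' | find_ci
# yearID,lgID,teamID,Half,divID
```
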

                        If you want to find all records with 2nd column starting with "A", then
                        Code: [Select]
                        C:\>awk -F"," "$2 ~ /^A/ {print}" myFile.txt
                        1981,AL,BAL,1,E,N,2,54,31,23
                        1981,AL,BAL,2,E,N,4,51,28,23
                        1981,AL,BOS,1,E,N,5,56,30,26
                        1981,AL,BOS,2,E,N,2,52,29,23
                        1981,AL,CAL,1,W,N,4,60,31,29
                        1981,AL,CAL,2,W,N,6,50,20,30
                        1981,AL,CHA,1,W,N,3,53,31,22
                        1981,AL,CHA,2,W,N,6,53,23,30
                        1981,AL,CLE,1,E,N,6,50,26,24
                        1981,AL,CLE,2,E,N,5,53,26,27
                        1981,AL,DET,1,E,N,4,57,31,26
                        1981,AL,DET,2,E,N,2,52,29,23
                        1981,AL,KCA,1,W,N,5,50,20,30
                        1981,AL,KCA,2,W,N,1,53,30,23
                        1981,AL,MIN,1,W,N,7,56,17,39
                        1981,AL,MIN,2,W,N,4,53,24,29

                        First, give the -F"," option because the file is comma-delimited. Then use $2, because it is the 2nd column, and match it against the regex /^A/, where "^" means "starts with". The {print} action then prints the matching records.
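Regex matches and numeric comparisons can be combined in one pattern with &&. A sketch against the same kind of data (Unix-shell quoting; al_over_30 is a hypothetical name) that prints the team ($3) and wins ($9) of American League rows with more than 30 wins; NR > 1 skips the header line.

```shell
#!/bin/sh
# Sketch: regex test on $2, numeric test on $9, header skipped with NR > 1.
al_over_30() {
    awk -F',' 'NR > 1 && $2 ~ /^AL$/ && $9 > 30 { print $3, $9 }'
}

printf '%s\n' 'yearID,lgID,teamID,Half,divID,DivWin,Rank,G,W,L' \
    '1981,AL,BAL,1,E,N,2,54,31,23' \
    '1981,AL,BOS,1,E,N,5,56,30,26' \
    '1981,NL,CIN,1,W,N,2,56,35,21' | al_over_30
# BAL 31
```
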

                        In awk, you can negate a match with the !~ operator. For example, to find the records that do not have "DET" in the 3rd field:
                        Code: [Select]
                        C:\>awk -F"," "$3 !~ /DET/{print}" myFile.txt
                        yearID,lgID,teamID,Half,divID,DivWin,Rank,G,W,L
                        1981,NL,ATL,1,W,N,4,54,25,29
                        1981,NL,ATL,2,W,N,5,52,25,27
                        1981,AL,BAL,1,E,N,2,54,31,23
                        1981,AL,BAL,2,E,N,4,51,28,23
                        1981,AL,BOS,1,E,N,5,56,30,26
                        1981,AL,BOS,2,E,N,2,52,29,23
                        1981,AL,CAL,1,W,N,4,60,31,29
                        1981,AL,CAL,2,W,N,6,50,20,30
                        1981,AL,CHA,1,W,N,3,53,31,22
                        1981,AL,CHA,2,W,N,6,53,23,30
                        1981,NL,CHN,1,E,N,6,52,15,37
                        1981,NL,CHN,2,E,N,5,51,23,28
                        1981,NL,CIN,1,W,N,2,56,35,21
                        1981,NL,CIN,2,W,N,2,52,31,21
                        1981,AL,CLE,1,E,N,6,50,26,24
                        1981,AL,CLE,2,E,N,5,53,26,27
                        1981,NL,HOU,1,W,N,3,57,28,29
                        1981,NL,HOU,2,W,N,1,53,33,20
                        1981,AL,KCA,1,W,N,5,50,20,30
                        1981,AL,KCA,2,W,N,1,53,30,23
                        1981,NL,LAN,1,W,N,1,57,36,21
                        1981,NL,LAN,2,W,N,4,53,27,26
                        1981,AL,MIN,1,W,N,7,56,17,39
                        1981,AL,MIN,2,W,N,4,53,24,29


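                        If you just want the records that don't contain the string "DET" anywhere in the line, negate a whole-line match with the "!" operator: !/DET/. When a pattern has no action, awk prints the matching records, so {print} can be left out: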
                        Code: [Select]
                        C:\>awk -F"," "!/DET/" myFile.txt
                        yearID,lgID,teamID,Half,divID,DivWin,Rank,G,W,L
                        1981,NL,ATL,1,W,N,4,54,25,29
                        1981,NL,ATL,2,W,N,5,52,25,27
                        1981,AL,BAL,1,E,N,2,54,31,23
                        1981,AL,BAL,2,E,N,4,51,28,23
                        1981,AL,BOS,1,E,N,5,56,30,26
                        1981,AL,BOS,2,E,N,2,52,29,23
                        1981,AL,CAL,1,W,N,4,60,31,29
                        1981,AL,CAL,2,W,N,6,50,20,30
                        1981,AL,CHA,1,W,N,3,53,31,22
                        1981,AL,CHA,2,W,N,6,53,23,30
                        1981,NL,CHN,1,E,N,6,52,15,37
                        1981,NL,CHN,2,E,N,5,51,23,28
                        1981,NL,CIN,1,W,N,2,56,35,21
                        1981,NL,CIN,2,W,N,2,52,31,21
                        1981,AL,CLE,1,E,N,6,50,26,24
                        1981,AL,CLE,2,E,N,5,53,26,27
                        1981,NL,HOU,1,W,N,3,57,28,29
                        1981,NL,HOU,2,W,N,1,53,33,20
                        1981,AL,KCA,1,W,N,5,50,20,30
                        1981,AL,KCA,2,W,N,1,53,30,23
                        1981,NL,LAN,1,W,N,1,57,36,21
                        1981,NL,LAN,2,W,N,4,53,27,26
                        1981,AL,MIN,1,W,N,7,56,17,39
                        1981,AL,MIN,2,W,N,4,53,24,29

                        These are very simple examples of using the regex match operators ~ and !~ to search for strings.
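Keep in mind that /DET/ matches DET anywhere in the text it is applied to, so a longer value that contains DET would match too. For an exact field value, anchor the regex (/^DET$/) or compare strings with ==. A small sketch, where DETX is a made-up team code used only to show the difference:

```awk
# Sketch: regex (substring) match vs. exact comparison on the 3rd field.
# "DETX" is a made-up team code, used only for illustration.
BEGIN {
    FS = ","
    n = split("1981,AL,DET,1,E,N,4,57,31,26;1981,AL,DETX,1,E,N,4,57,31,26;1981,AL,BAL,1,E,N,2,54,31,23", rows, ";")
    regex_hits = 0; anchored_hits = 0; exact_hits = 0
    for (i = 1; i <= n; i++) {
        $0 = rows[i]
        if ($3 ~ /DET/)    regex_hits++      # matches "DET" and "DETX"
        if ($3 ~ /^DET$/)  anchored_hits++   # the whole field must be DET
        if ($3 == "DET")   exact_hits++      # plain string comparison
    }
    print "regex: " regex_hits ", anchored: " anchored_hits ", exact: " exact_hits
}
```

So $3 !~ /^DET$/ or $3 != "DET" is the stricter way to leave out exactly the DET records.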

                        String replacement
                        Awk provides the sub() and gsub() functions for replacing text.
                        The syntax for sub() is:
                        sub(regexp, replacement [, target])
                        If target is omitted, the whole record ($0) is used.

                        For example, to replace "LAN" with "NAL":

                        Code: [Select]
                        C:\>awk "{sub(\"LAN\",\"NAL\", $0); print }" myFile.txt
                        yearID,lgID,teamID,Half,divID,DivWin,Rank,G,W,L
                        1981,NL,ATL,1,W,N,4,54,25,29
                        1981,NL,ATL,2,W,N,5,52,25,27
                        1981,AL,BAL,1,E,N,2,54,31,23
                        1981,AL,BAL,2,E,N,4,51,28,23
                        1981,AL,BOS,1,E,N,5,56,30,26
                        1981,AL,BOS,2,E,N,2,52,29,23
                        1981,AL,CAL,1,W,N,4,60,31,29
                        1981,AL,CAL,2,W,N,6,50,20,30
                        1981,AL,CHA,1,W,N,3,53,31,22
                        1981,AL,CHA,2,W,N,6,53,23,30
                        1981,NL,CHN,1,E,N,6,52,15,37
                        1981,NL,CHN,2,E,N,5,51,23,28
                        1981,NL,CIN,1,W,N,2,56,35,21
                        1981,NL,CIN,2,W,N,2,52,31,21
                        1981,AL,CLE,1,E,N,6,50,26,24
                        1981,AL,CLE,2,E,N,5,53,26,27
                        1981,AL,DET,1,E,N,4,57,31,26
                        1981,AL,DET,2,E,N,2,52,29,23
                        1981,NL,HOU,1,W,N,3,57,28,29
                        1981,NL,HOU,2,W,N,1,53,33,20
                        1981,AL,KCA,1,W,N,5,50,20,30
                        1981,AL,KCA,2,W,N,1,53,30,23
                        1981,NL,NAL,1,W,N,1,57,36,21
                        1981,NL,NAL,2,W,N,4,53,27,26
                        1981,AL,MIN,1,W,N,7,56,17,39
                        1981,AL,MIN,2,W,N,4,53,24,29

                        sub() replaces only the first match in the target. To replace every match, use gsub(), which takes the same arguments as sub().
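To see the difference between the two functions, and the fact that both return the number of replacements they made, here is a small sketch on a made-up string:

```awk
# Sketch: sub() replaces the first match only, gsub() replaces every match.
# Both return the number of replacements made.
BEGIN {
    line1 = "LAN,LAN,LAN"
    line2 = line1
    n_sub  = sub(/LAN/, "NAL", line1)     # line1 is now "NAL,LAN,LAN"
    n_gsub = gsub(/LAN/, "NAL", line2)    # line2 is now "NAL,NAL,NAL"
    print n_sub " replacement(s): " line1
    print n_gsub " replacement(s): " line2
}
```

In the replacement text, & stands for the matched text, so to insert a literal ampersand you write "\\&" in the string.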

                        To replace "BAL" on the 4th line only, use NR==4 as the pattern and call sub() in its action. The second rule, {print}, has no pattern, so it prints every line.
                        Code: [Select]

                        C:\>awk "NR==4 { sub(\"BAL\",\"LAB\") } {print}" myFile.txt
                        yearID,lgID,teamID,Half,divID,DivWin,Rank,G,W,L
                        1981,NL,ATL,1,W,N,4,54,25,29
                        1981,NL,ATL,2,W,N,5,52,25,27
                        1981,AL,LAB,1,E,N,2,54,31,23
                        1981,AL,BAL,2,E,N,4,51,28,23
                        1981,AL,BOS,1,E,N,5,56,30,26
                        .....
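You can also give a field as the target, e.g. $3, so that only that column is changed. Changing a field makes awk rebuild $0 using OFS, which is a single space by default, so with comma-delimited data you will usually want OFS="," as well. A sketch on one row from the sample file:

```awk
# Sketch: replace text inside the 3rd field only of a CSV record.
BEGIN {
    FS = ","
    OFS = ","                     # without this, the rebuilt $0 would be space-separated
    $0 = "1981,AL,BAL,1,E,N,2,54,31,23"
    sub(/BAL/, "LAB", $3)         # target is field 3; $0 is rebuilt with OFS
    result = $0
    print result
}
```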

                        to be continued

                        - brianadams

                        briandams

                          Writing an awk script - Averaging example.
                          « Reply #11 on: January 16, 2014, 06:15:54 AM »
                          Awk is not limited to the one-liners we have seen so far. You can put awk commands in a script (a plain text file) and have awk run them for you, much like writing a VBScript file and having the cscript engine run it.

                          To run an awk script, pass the script file with the -f option:
                          Code: [Select]
                          c:\> awk -f myawkscript.awk input_file.csv

                          Let's round up this part of the primer with an example. Say you have the last 20 days' worth of Google stock data in a comma-delimited CSV file. You want to find the average closing price (column 5) and find out how many of the days (records) closed above that average.

                          Code: [Select]
                          Date,Open,High,Low,Close,Volume,Adj Close
                          2014-01-03,1115.00,1116.93,1104.93,1105.00,1666700,1105.00
                          2014-01-02,1115.46,1117.75,1108.26,1113.12,1821400,1113.12
                          2013-12-31,1112.24,1121.00,1106.26,1120.71,1357900,1120.71
                          2013-12-30,1120.34,1120.50,1109.02,1109.46,1236100,1109.46
                          2013-12-27,1120.00,1120.28,1112.94,1118.40,1569700,1118.40
                          2013-12-26,1114.01,1119.00,1108.69,1117.46,1337800,1117.46
                          2013-12-24,1114.97,1115.24,1108.10,1111.84,734200,1111.84
                          2013-12-23,1107.84,1115.80,1105.12,1115.10,1721600,1115.10
                          2013-12-20,1088.30,1101.17,1088.00,1100.62,3261600,1100.62
                          2013-12-19,1080.77,1091.99,1079.08,1086.22,1665700,1086.22
                          2013-12-18,1071.85,1084.95,1059.04,1084.75,2210300,1084.75
                          2013-12-17,1072.82,1080.76,1068.38,1069.86,1535700,1069.86
                          2013-12-16,1064.00,1074.69,1062.01,1072.98,1602000,1072.98
                          2013-12-13,1075.40,1076.29,1057.89,1060.79,2162400,1060.79
                          2013-12-12,1079.57,1082.94,1069.00,1069.96,1595900,1069.96
                          2013-12-11,1087.40,1091.32,1075.17,1077.29,1695800,1077.29
                          2013-12-10,1076.15,1092.31,1075.65,1084.66,1853900,1084.66
                          2013-12-09,1070.99,1082.31,1068.02,1078.14,1482600,1078.14
                          2013-12-06,1069.79,1070.00,1060.08,1069.87,1428800,1069.87
                          2013-12-05,1057.20,1059.66,1051.09,1057.34,1133700,1057.34

                          This is a bit too "complicated" for a one-liner, so we put the commands in a file. You can use any text editor to create your script.
                          The basic layout of a script goes like this (BEGIN and END must be written in capitals):
                          Code: [Select]
                          BEGIN{
                           # here you can initialize variables
                          }

                          {
                            # here you do processing for every record
                          }

                          END {
                             # here you can do end processing, like printing the final result
                          }

                          Here's the script (save it as average.awk):
                          Code: [Select]
                          BEGIN{
                           # here you can initialize variables
                           FS = ","     # set the field delimiter to comma
                           sum = 0      # sum stores the total of column 5
                           n = 0        # n counts the data rows (the header is not counted)
                          }

                          NR>1{
                            # use NR > 1 to exclude the header row
                            # here you do processing for every record
                            sum += $5   # awk implicitly converts each column 5 value to a number
                            n++
                          }


                          END {
                             # here you can do end processing, like printing the final result
                             print "The total sum is " sum
                             print "The average is " sum/n
                          }
                          NR is the total number of records read, and that includes the header line, so dividing by NR would give an average that is too low. That's why the script counts the data rows in n and divides the sum by n in the END block.

                          Running the script gives
                          Code: [Select]
                          C:\>awk -f average.awk google.csv
                          The total sum is 21823.6
                          The average is 1091.18


                          Next, find the days in the file whose closing price is greater than the average, and count them. This is the code:
                          Code: [Select]
                          BEGIN{
                           # here you can initialize variables
                           FS = ","     # set the field delimiter to comma
                           sum = 0      # sum stores the total of column 5
                           n = 0        # n counts the data rows (the header is not counted)
                          }

                          NR>1{
                            # use NR > 1 to exclude the header row
                            # here you do processing for every record
                            sum += $5      # awk implicitly converts each column 5 value to a number
                            n++
                            days[$1] = $5  # store the closing price in an array, with the date (column 1) as index
                          }


                          END {
                             # here you can do end processing, like printing the final result
                             average = sum/n
                             above = 0
                             print "The total sum is " sum
                             print "The average is " average
                             print "Days greater than average"

                             for( d in days ) {
                                   if ( days[d] > average ) {
                                      print d, days[d]
                                      above++
                                   }
                             }
                             print "Number of days greater than average: " above
                          }

                          Running the script gives
                          Code: [Select]

                          C:\>awk -f average.awk google.csv
                          The total sum is 21823.6
                          The average is 1091.18
                          Days greater than average
                          2013-12-20 1100.62
                          2013-12-30 1109.46
                          2013-12-31 1120.71
                          2013-12-23 1115.10
                          2014-01-02 1113.12
                          2013-12-24 1111.84
                          2014-01-03 1105.00
                          2013-12-26 1117.46
                          2013-12-27 1118.40
                          Number of days greater than average: 9
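The days above come out in no particular order, because for (d in days) visits array elements in an unspecified order. If you want them in the order of the file, one approach is to index the arrays by position instead of by date and loop with a numeric for. A sketch, assuming three rows (date and closing price only) are embedded in the script instead of read from google.csv; in a real script the body of the first loop would go in the NR>1 rule:

```awk
# Sketch: print the above-average days in input order.
# Three embedded rows stand in for google.csv (date, closing price).
BEGIN {
    FS = ","
    n = split("2014-01-03,1105.00;2013-12-05,1057.34;2013-12-31,1120.71", rows, ";")
    sum = 0
    for (i = 1; i <= n; i++) {
        $0 = rows[i]
        date[i]  = $1            # the index is the position in the input
        price[i] = $2
        sum += $2
    }
    average = sum / n
    above = 0
    for (i = 1; i <= n; i++) {            # a numeric loop visits 1, 2, 3 ... in order
        if (price[i] + 0 > average) {     # "+ 0" forces a numeric comparison
            print date[i], price[i]
            above++
        }
    }
    print above " day(s) above the average of " average
}
```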


                          This is a very simple example to illustrate the concepts shown so far. I hope it helps you see how to use simple awk scripts in your batch files.

                          to be continued ..

                          - brianadams

                          briandams

                            Writing an awk script - Parsing systeminfo example
                            « Reply #12 on: January 16, 2014, 06:17:42 AM »

                            Let's say you want to pull some information out of the output of the systeminfo command, e.g. the values of these items:

                            OS Name
                            System type
                            System Up Time
                            Original Install Date
                            Total Physical Memory
                            Available Physical Memory
                            BIOS Version
                            OS Version

                            Here is the code; save it as parse_systeminfo.awk:
                            Code: [Select]
                            BEGIN{
                             # here you can initialize variables
                             FS = ":[ ]+"      # set the field delimiter to a colon followed by one or more spaces
                             
                             # initialize lookup table
                             array["OS Name"]=""
                             array["System type"] = ""
                             array["System Up Time"] = ""
                             array["Original Install Date"] = ""
                             array["Total Physical Memory"] = ""
                             array["Available Physical Memory"] = ""
                             array["BIOS Version"] = ""
                             array["OS Version"] = ""
                            }

                            {
                                # update table
                                if ( $1 in array  ){
                                    array[$1] = $2
                                }
                            }


                            END {
                                for( item in array ){
                                    # beautify output by adjusting width using printf
                                    printf("%-30s  ===> %-30s\n" , item,  array[item])
                                }
                            }


                            Another way to do it is to use a regex as the pattern of the main rule, e.g.

                            Code: [Select]
                            /OS Name|BIOS Version|....../ {
                                # regex matching is case-sensitive, so "Bios" would not match "BIOS"
                                array[$1] = $2
                            }


                            Results:
                            Code: [Select]
                            C:\>systeminfo | awk -f parse_systeminfo.awk
                            System Up Time                  ===> 0 Days, 8 Hours, 25 Minutes, 58 Seconds
                            OS Version                      ===> 5.1.2600 Service Pack 3 Build 2600
                            System type                     ===> X86-based PC
                            Available Physical Memory       ===> 244 MB
                            Total Physical Memory           ===> 575 MB
                            BIOS Version                    ===> VBOX   - 1
                            OS Name                         ===> Microsoft Windows XP Professional
                            Original Install Date           ===> 2013/12/09, 12:04:49 AM
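Notice that the results come out in a different order from the list we asked for, because for (item in array) visits the elements in no particular order. If the order matters, one option is to keep the wanted names in a numerically indexed array and loop over that. A sketch, assuming a few lines copied from the output above stand in for real systeminfo output (keys, wanted and lines are illustrative names):

```awk
# Sketch: print systeminfo items in a fixed order instead of for-in order.
# The sample lines stand in for real systeminfo output.
BEGIN {
    FS = ":[ ]+"
    # the wanted items, listed once, in the order they should be printed
    nkeys = split("OS Name;OS Version;System type;Total Physical Memory", keys, ";")
    for (k = 1; k <= nkeys; k++)
        wanted[keys[k]] = ""
    nlines = split("Total Physical Memory:     575 MB;" \
                   "OS Name:                   Microsoft Windows XP Professional;" \
                   "System type:               X86-based PC;" \
                   "OS Version:                5.1.2600 Service Pack 3 Build 2600", lines, ";")
    for (i = 1; i <= nlines; i++) {
        $0 = lines[i]                     # split on ":" plus spaces
        if ($1 in wanted)
            wanted[$1] = $2
    }
    for (k = 1; k <= nkeys; k++)          # numeric loop = fixed order
        printf("%-30s  ===> %s\n", keys[k], wanted[keys[k]])
}
```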

                            briandams

                              Awk User Defined Functions
                              « Reply #13 on: January 16, 2014, 06:20:10 AM »
                              In this section I am going to introduce user-defined functions. As you can see from the features covered so far, awk is in fact a small programming language, and like other languages it lets you define your own functions in a script. Functions are a way to package code that you would otherwise repeat in several places. The syntax is similar to that of other languages:

                              Code: [Select]
                                   function name( argument1, argument2 ... )
                                   {
                                        body-of-function
                                        return [expression]
                                   }
                              You can put function definitions anywhere between the rules, for example before the BEGIN block. Say you want a function that prints a horizontal line at various points in your code:
                              Code: [Select]
                              function horizontal_line(    i){
                                  # function prints 100 dashes
                                  # i is listed as an extra parameter so that it stays local to the function
                                  for(i=0;i<100;i++){
                                      printf "-"
                                  }
                                  print       # add the final newline
                              }

                              BEGIN{
                                  print "Initializing..."
                                  horizontal_line()
                                  print "After horizontal_line function is called ..."
                              }


                              Output:
                              Code: [Select]
                              C:\>awk -f myScript.awk
                              Initializing...
                              ----------------------------------------------------------------------------------------------------
                              After horizontal_line function is called ...


                              This is a simple example of a function with no arguments.
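Functions can also take arguments. awk has no separate declaration for local variables; the usual convention is to list them as extra parameters after the real ones (often separated by extra spaces) and simply not pass them. A sketch of a more general line-drawing function, where the name repeat_char is hypothetical:

```awk
# Sketch: a function with arguments and local variables.
# "i" and "s" are extra parameters the caller never passes,
# so they act as locals and do not clobber global variables.
function repeat_char(ch, n,    i, s) {
    s = ""
    for (i = 0; i < n; i++)
        s = s ch
    return s
}

BEGIN {
    i = 99                        # a global "i" that must survive the call
    line = repeat_char("=", 10)
    print line
    print "i is still " i
}
```

When calling a user-defined function, write the name and the opening parenthesis together, as in repeat_char("=", 10); GNU awk rejects a space between them.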

                              In awk, arrays are passed to functions "by reference". All other arguments (strings and numbers) are passed "by value". For example, a string is passed by value:


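                              Code: [Select]
                              function zoo( string ){
                                  print string       # prints monkey
                                  string = "snake"   # changes only the function's local copy
                                  print string       # prints snake
                              }

                              BEGIN{
                                  animal = "monkey"
                                  zoo( animal )
                                  print animal       # still prints monkey
                              }

                              The function zoo does not change the value of "animal" in the main code. This is what "passed by value" means.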

                              Arrays, on the other hand, are passed by reference, as in this example:
                              Code: [Select]
                              function zoo(b){
                                  b[1] = "hippo"   # here we change the item to hippo
                              }

                              BEGIN{
                                  # main code
                                  a[1] = "test"     # define an item in array
                                  print "a[1] before function is: " a[1]
                                  zoo(a)             # call zoo function
                                  print "a[1] after function is: " a[1]
                              }


                              result:
                              Code: [Select]

                              C:\>awk -f myScript.awk
                              a[1] before function is: test
                              a[1] after function is: hippo


                              We can see that the array element was changed in the main code by the call to zoo.

                              Values can be passed back to the calling program by using the return keyword.
                              Code: [Select]
                              function calculate(){
                                 .. calculation code here...
                                 result = ....
                                 return result
                              }

                              This is a simple introduction to user-defined functions in awk.
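Here is a concrete version of the return pattern shown above, as a hedged sketch: a hypothetical mean() function that receives an array by reference plus its element count, and returns the average to the caller.

```awk
# Sketch: a function that receives an array (by reference) and a count,
# and hands a value back to the caller with "return".
function mean(values, n,    i, total) {
    if (n <= 0)
        return 0                  # avoid dividing by zero on empty input
    total = 0
    for (i = 1; i <= n; i++)
        total += values[i]
    return total / n
}

BEGIN {
    count = split("2 4 9", nums, " ")   # nums[1]=2, nums[2]=4, nums[3]=9
    avg = mean(nums, count)
    print "mean is " avg
}
```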

                              to be continued...

                              - brianadams

                              briandams

                                Getting User Input and File Reading
                                « Reply #14 on: January 16, 2014, 06:22:54 AM »
                                In awk, you can read user input with getline, e.g.
                                Code: [Select]
                                BEGIN{
                                    print "Enter something"
                                    getline entered
                                    print "You entered " entered
                                }
                                result
                                Code: [Select]
                                C:\>awk -f test.awk
                                Enter something
                                test
                                You entered test

                                Here, the variable "entered" holds whatever the user typed. Because no input file is named on the command line, getline reads from standard input, i.e. the keyboard.


                                Another common use of getline is reading a file. Here's an example of how to read a file inside an awk script:
                                Code: [Select]
                                BEGIN{     
                                    while ( ( getline line < "myFile.txt" ) > 0 ){
                                        print "Read: " line
                                    }
                                }

                                result
                                Code: [Select]
                                C:\>type myFile.txt
                                computerhope.com
                                is
                                the
                                best

                                C:\>awk -f myScript.awk
                                Read: computerhope.com
                                Read: is
                                Read: the
                                Read: best


                                Let's dissect the while loop. First, getline reads the next line of the file into the variable line:
                                Code: [Select]
                                ( getline line < "myFile.txt" )

                                getline returns 1 each time it successfully reads a line, so test whether the result is greater than 0:
                                Code: [Select]
                                ( getline line < "myFile.txt" ) > 0

                                Then use a while loop to iterate over the file:
                                Code: [Select]
                                    while ( ( getline line < "myFile.txt" ) > 0 ){
                                        # do something with line
                                    }

                                The condition is checked on every pass. When getline reaches the end of the file it returns 0, and the while loop ends.
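If you need to read the same file more than once, close() it after the loop; otherwise the next getline continues from where the last one stopped, which is the end of the file. A sketch, assuming a hypothetical count_lines() helper and a scratch file getline_demo.txt that the script writes first, with the same four lines as myFile.txt:

```awk
# Sketch: count a file's lines with getline, then close() it so it can be
# read again from the top. getline_demo.txt is a scratch file created here.
function count_lines(file,    line, n) {
    n = 0
    while ((getline line < file) > 0)
        n++
    close(file)                   # without this, a second call would start at EOF
    return n
}

BEGIN {
    file = "getline_demo.txt"
    print "computerhope.com" > file
    print "is" > file
    print "the" > file
    print "best" > file
    close(file)                   # flush and close the output before reading it

    first  = count_lines(file)
    second = count_lines(file)    # works again because count_lines closed the file
    print first, second
}
```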

                                Lastly, another common way to use getline is with a pipe. Let's say you want to process the output of the DOS "dir" command inside awk. You still use a while loop with getline, but the input now comes from a command instead of a file:
                                Code: [Select]
                                BEGIN{
                                   
                                    while ( ("dir" | getline line ) > 0 ){
                                        print "Read: " line
                                    }
                                    close("dir")    # close the pipe properly for next use in the program
                                }

                                result
                                Code: [Select]
                                C:\>awk -f myScript.awk
                                Read:  Volume in drive C has no label.
                                Read:  Volume Serial Number is DCEB-67C9
                                Read:
                                Read:  Directory of C:\
                                Read:
                                ....
                                ... [ too long ] ...
                                That's how you can run an external DOS command and process its output inside the awk program itself.
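Instead of printing each line, you can also collect a command's output into a variable. A minimal sketch with a hypothetical run_command() helper; "echo hello" is used because it behaves the same in cmd.exe and in a Unix shell:

```awk
# Sketch: collect all output lines of a command into one string.
function run_command(cmd,    line, out, sep) {
    out = ""; sep = ""
    while ((cmd | getline line) > 0) {
        out = out sep line        # join the lines with newlines
        sep = "\n"
    }
    close(cmd)                    # close the pipe so the command can be run again
    return out
}

BEGIN {
    greeting = run_command("echo hello")
    print "Captured: " greeting
}
```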

                                getline returns 1 if it finds a record, 0 at the end of the file, and -1 if there is an error, such as a file that cannot be opened. It is good practice to always test explicitly for > 0 when reading a file or handling input from a pipe: -1 also counts as "true", so a plain while ( getline line < file ) would loop forever on a file that can't be opened.
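Here is a minimal sketch of that explicit test, assuming a hypothetical helper read_all() and two hypothetical file names (the script creates demo_ok.txt itself, and no_such_file.txt is assumed not to exist). It keeps the return code of the last getline so it can tell the end of the file (0) from an error (-1).

```awk
# Sketch: tell "end of file" (0) apart from "cannot read" (-1).
function read_all(file,    line, n, rc) {
    n = 0
    while ((rc = (getline line < file)) > 0)
        n++
    if (rc < 0)
        return -1                 # e.g. the file does not exist
    close(file)
    return n                      # number of lines read
}

BEGIN {
    print "first line" > "demo_ok.txt"
    close("demo_ok.txt")
    ok      = read_all("demo_ok.txt")
    missing = read_all("no_such_file.txt")
    print ok, missing
    # In a real script you could finish with "exit 1" when read_all()
    # returns -1, so the calling batch file sees it in %ERRORLEVEL%.
}
```

awk's exit status becomes %ERRORLEVEL% in the calling batch file, so an explicit exit with a non-zero value is one way to report the failure back. GNU awk also stores a description of the error in its ERRNO variable.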

                                to be continued

                                - brianadams

                                briandams

                                  Date and Time
                                  « Reply #15 on: January 16, 2014, 06:24:41 AM »
                                  Dealing with dates and times is a common task in batch scripting. GNU awk provides simple time functions for basic date/time manipulation:
                                  1) systime()
                                  2) strftime()
                                  3) mktime()

                                  1) systime().
                                  systime() returns the current time as the number of seconds since the system epoch (on most systems, midnight UTC on 1 January 1970). systime is commonly used to seed the random number generator.

                                  Code: [Select]
                                  C:\>awk "BEGIN{ print systime(); } "
                                  1389169226
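As a small illustration of seeding, here is a sketch (the helper name roll_die is made up) that seeds rand() with systime(), a GNU awk extension, and rolls a six-sided die a few times. Note that srand() with no argument also seeds from the time of day, so srand(systime()) mainly makes the intent explicit.

```awk
# Sketch: seed the random number generator from systime() and roll a die.
function roll_die() {
    return int(rand() * 6) + 1    # rand() is in [0,1), so this gives 1..6
}

BEGIN {
    srand(systime())              # a different seed every second
    ok = 1
    for (t = 1; t <= 5; t++) {
        r = roll_die()
        printf "%d ", r
        if (r < 1 || r > 6) ok = 0
    }
    print ""
}
```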

                                  2) strftime().
                                  strftime() formats a timestamp according to the contents of a format string; if no timestamp is given, it uses the current time. This is useful for creating timestamps on Windows. E.g. to get the full 4-digit year, use the "%Y" format:
                                  Code: [Select]
                                  C:\>awk "BEGIN{ print strftime(\"%Y\") } "
                                  2014


                                  To get a YYYY-MM-DD-HH-mm-ss timestamp:
                                  Code: [Select]
                                  C:\>awk "BEGIN{ print strftime(\"%Y-%m-%d-%H-%M-%S\") } "
                                  2014-01-08-16-24-23

                                  You can then capture the result in your batch file with the usual for /f loop.
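A typical use is a date-stamped file name. This sketch assumes GNU awk (strftime() and mktime(); mktime() is described as item 3 below), and the backup_ prefix and .log suffix are hypothetical. It passes a fixed timestamp built with mktime() so the result is predictable, and also shows the form without a timestamp, which uses the current time.

```awk
# Sketch: build a date-stamped file name with strftime().
BEGIN {
    ts   = mktime("2014 01 08 12 00 00")            # a fixed local time for the demo
    name = "backup_" strftime("%Y%m%d-%H%M%S", ts) ".log"
    print name
    today = "backup_" strftime("%Y%m%d") ".log"     # no timestamp: current time
    print today
}
```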


                                  3) mktime( date specs )
                                  The "date specs" argument to mktime is a string of the form "YYYY MM DD HH mm SS", with the fields separated by spaces:
                                  YYYY = full year
                                  MM = month, 1 to 12
                                  DD = day, 1 to 31
                                  HH = hour, 0 to 23
                                  mm = minute, 0 to 59
                                  SS = seconds, 0 to 59
                                  mktime interprets the date as local time and returns a timestamp in the same form as systime() (seconds since the epoch),
                                  e.g.
                                  Code: [Select]
                                  C:\>awk "BEGIN{string=\"2014 01 01 0 0 0\"; print mktime(string) } "
                                  1388505600

                                  mktime is commonly used to compute time differences, e.g. compare the date "2014 01 01 0 0 0" with the current time and get the difference in seconds:

                                  Code: [Select]
                                  C:\>awk "BEGIN{string=\"2014 01 01 0 0 0\"; s=mktime(string); print (systime() - s) } "
                                  664866
                                  This is useful, for example, when you are parsing a log file and filtering its date/time column for a specific date.
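In log files the date column often looks like 2014-01-08 16:24:23 rather than the space-separated form mktime() wants. A minimal sketch, assuming GNU awk and that exact log format (to_epoch is a hypothetical helper name): replace the - and : separators with spaces, then call mktime().

```awk
# Sketch: convert "YYYY-MM-DD HH:MM:SS" to an mktime() timestamp and
# compute the difference between two such timestamps.
function to_epoch(stamp) {
    gsub(/[-:]/, " ", stamp)      # "2014-01-08 16:24:23" -> "2014 01 08 16 24 23"
    return mktime(stamp)
}

BEGIN {
    start = to_epoch("2014-01-01 00:00:00")
    now   = to_epoch("2014-01-08 16:24:23")
    diff  = now - start
    days  = int(diff / 86400)     # 86400 seconds per day
    print diff " seconds (" days " full days)"
}
```

Because both times are interpreted as local time, the difference is only affected by a daylight-saving change if one falls between the two dates.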


                                  to be continued

                                  - brianadams

                                  briandams

                                    Merging strings of similar items (keys).
                                    « Reply #16 on: January 16, 2014, 06:25:53 AM »
                                    Sometimes you may want to merge the values of items that share the same key. For example, turn this:
                                    Code: [Select]
                                    C_1,KOG0155
                                    C_1,KOG0306
                                    C_2,KOG3259
                                    C_3,KOG0931
                                    C_2,KOG3638
                                    C_4,KOG0956
                                    C_6,KOG0155
                                    C_1,KOG0306
                                    C_3,KOG3259
                                    C_4,KOG0931
                                    C_5,KOG3638
                                    C_1,KOG0956

                                    into something like this:
                                    Code: [Select]
                                    C_1,KOG0155,KOG0306,KOG0306,KOG0956
                                    C_2,KOG3259,KOG3638
                                    C_3,KOG0931,KOG3259
                                    C_4,KOG0956,KOG0931
                                    C_6,KOG0155
                                    C_5,KOG3638

                                    You can make use of associative arrays in awk

                                    Code: [Select]
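C:\>awk -F"," "{ array[$1] = array[$1] \",\" $2 } END { for (idx in array) print idx array[idx] }" myFile.txt
C_3,KOG0931,KOG3259
C_4,KOG0956,KOG0931
C_5,KOG3638
C_6,KOG0155
C_1,KOG0155,KOG0306,KOG0306,KOG0956
C_2,KOG3259,KOG3638

Here myFile.txt holds the sample data above. Note that for (idx in array) visits the keys in no guaranteed order, so your output lines may come out in a different order.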
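If you want to check the logic outside awk, here is a rough Python sketch of the same per-key accumulation. It is my own illustration, not something awk provides, and merge_by_key and data are made-up names. One difference worth knowing: a Python dict keeps keys in first-seen order, while awk's for (idx in array) does not promise any order.

```python
from collections import defaultdict


def merge_by_key(lines):
    """Collect field 2 under field 1, like awk's array[$1] = array[$1] "," $2."""
    merged = defaultdict(str)  # a missing key starts as "", like an unset awk variable
    for line in lines:
        fields = line.rstrip("\n").split(",")  # like awk -F","
        merged[fields[0]] += "," + fields[1]
    # dicts remember insertion order, so keys come out in first-seen order
    return [key + values for key, values in merged.items()]


data = ["C_1,KOG0155", "C_1,KOG0306", "C_2,KOG3259", "C_3,KOG0931",
        "C_2,KOG3638", "C_4,KOG0956", "C_6,KOG0155", "C_1,KOG0306",
        "C_3,KOG3259", "C_4,KOG0931", "C_5,KOG3638", "C_1,KOG0956"]

for row in merge_by_key(data):
    print(row)
```

With the sample data this prints the six merged lines in the same order as the desired output shown earlier.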

                                    Salmon Trout

                                    Re: Awk - A nifty little tool for text manipulation and more.
                                    « Reply #17 on: January 16, 2014, 08:06:49 AM »
                                    Lots of awk stuff lately from you.

                                    Squashman



                                    Re: Getting User Input and File Reading
                                    « Reply #18 on: January 16, 2014, 08:49:21 AM »

Quote
That's how you can call an external DOS command and have it displayed inside awk program itself.

                                    getline returns 1 if it finds a record, and 0 if the end of the file is encountered. If there is some error in getting a record, such as a file that cannot be opened, then getline returns -1. It is generally good practice to always explicitly test for >0 while reading a file or handling input from pipes.

                                    to be continued

                                    - brianadams
                                    The error is stored within AWK's error variable.  It does not pass the error back to the calling batch file or CMD window you have open.

                                    briandams

                                      Re: Getting User Input and File Reading
                                      « Reply #19 on: January 16, 2014, 09:03:05 AM »
Quote
The error is stored within AWK's error variable.  It does not pass the error back to the calling batch file or CMD window you have open.


awk itself has no built-in way to check whether a file exists, like the -f test in Linux shells. So if you want to do that, you have to either use system() to run an external command, or call getline and check for -1. (In the examples below, ddd is a file that does not exist.)
                                      Code: [Select]

                                      C:\>awk "BEGIN{ x=getline < \"ddd\"   ; print x  }"
                                      -1
ERRNO is just an internal awk string variable (a GNU awk feature) that holds the error message:
                                      Code: [Select]
                                      C:\>awk "BEGIN{ getline < \"ddd\"   ; print ERRNO  }"
                                      No such file or directory

So it doesn't get returned to the cmd.exe ERRORLEVEL. You can pass the status back yourself, though, using exit():
                                      Code: [Select]
                                      C:\>awk "BEGIN{ x=getline < \"ddd\"   ; exit(x)  }"
                                      C:\>echo %errorlevel%
                                      -1

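Or use the command form of getline. Note that "ddd" | getline runs ddd as a command rather than opening it as a file, so this variant actually tests whether that command printed anything. The 2>nul hides cmd's "not recognized" error message.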
                                      Code: [Select]
                                      C:\> awk "BEGIN{ if ((\"ddd\" | getline) <= 0 ) exit(-1) ;   }"  2>nul
                                      C:\>echo %errorlevel%
                                      -1
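The same 1 / 0 / -1 convention is easy to mimic elsewhere. Below is a hypothetical Python helper, getline_status, sketched only to show the idea. It is not part of awk, and Python's err.strerror merely plays the role of gawk's ERRNO message.

```python
import sys


def getline_status(path):
    """Mimic the return value of awk's getline from a file:
    1 = a line was read, 0 = end of file (empty file), -1 = the file could not be opened."""
    try:
        with open(path) as f:
            return 1 if f.readline() else 0
    except OSError as err:
        # err.strerror is the rough counterpart of gawk's ERRNO, e.g. "No such file or directory"
        print(err.strerror, file=sys.stderr)
        return -1


print(getline_status("ddd"))  # -1 as long as there is no file called ddd
```

Passing the result to sys.exit() would hand the status back to a calling batch file, much like exit(x) above. Keep in mind that on Linux an exit status of -1 shows up as 255.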

                                      Squashman



                                      Re: Getting User Input and File Reading
                                      « Reply #20 on: January 16, 2014, 09:42:31 AM »

Quote
awk itself has no built-in way to check whether a file exists, like the -f test in Linux shells. So if you want to do that, you have to either use system() to run an external command, or call getline and check for -1.

Then why not use the shell's built-in functionality to check for the file's existence before running your awk command?

                                      Code: [Select]
                                      IF EXIST foo.txt awk.........

                                      BC_Programmer


                                      Re: Getting User Input and File Reading
                                      « Reply #21 on: January 16, 2014, 10:42:34 AM »
                                      Quote
                                      The error is stored within AWK's error variable.  It does not pass the error back to the calling batch file or CMD window you have open.
awk itself has no built-in way to check whether a file exists, like the -f test in Linux shells. So if you want to do that, you have to either use system() to run an external command, or call getline and check for -1.

I can see copy-pasting your posts from another forum practically verbatim, since they had never been posted here and so could be valuable to some. But when responses like the above are copy-pasted verbatim to rather different questions, that's just a bit weird, I think.

                                      Salmon Trout

                                      Re: Getting User Input and File Reading
                                      « Reply #22 on: January 16, 2014, 11:12:34 AM »
Quote
copy-pasting your posts from another forum

                                      I wondered about that.

                                      briandams

                                        Re: Getting User Input and File Reading
                                        « Reply #23 on: January 16, 2014, 04:21:14 PM »
Quote
Then why not use the shell's built-in functionality to check for the file's existence before running your awk command?

                                        Code: [Select]
                                        IF EXIST foo.txt awk.........
This can be done in awk as well, as shown in the examples, but if you want to do it in the shell, that's up to the individual.

                                        briandams

                                          Re: Getting User Input and File Reading
                                          « Reply #24 on: January 16, 2014, 04:24:26 PM »
Quote
But when responses like the above are copy-pasted verbatim to rather different questions, that's just a bit weird, I think.

The author of that DosTips thread is yours truly, so I can copy and paste all I want. I don't have a blog; if I did, I would just redirect readers there. And that's not a different question: it looked similar to one I had already answered on DosTips, hence the copy and paste.

                                          briandams

                                            Re: Getting User Input and File Reading
                                            « Reply #25 on: January 16, 2014, 04:27:38 PM »
Quote
I wondered about that.

As explained, I am the original author of that DosTips thread.

                                            Squashman



                                            Re: Awk - A nifty little tool for text manipulation and more.
                                            « Reply #26 on: January 16, 2014, 06:02:27 PM »
It's not that hard to start a free blog or website.

                                            briandams

                                              Filtering items in a file with another file
                                              « Reply #27 on: January 17, 2014, 02:41:37 AM »
Sometimes you may need to filter a file using keywords from another file. Say you have file1.txt and file2.txt:
                                              Code: [Select]

                                              C:\>type file1.txt
                                              cheese
                                              milk
                                              sausage

                                              C:\>type file2.txt
                                              milk
                                              cheese
                                              popcorn
                                              pasta
                                              milk
                                              sausage
                                              cheese
                                              melon

Suppose you want to filter file2.txt with file1.txt so that only the lines not found in file1.txt remain, e.g.
                                              Code: [Select]
                                              popcorn
                                              pasta
                                              melon

We can do this with an awk one-liner.
                                              Code: [Select]
                                              C:\>awk "FNR==NR{ a[$1] ;next} { if ( !($0 in a) ) { print }    }" file1.txt file2.txt
                                              popcorn
                                              pasta
                                              melon

                                              Explanation:
FNR==NR: FNR is the record number within the current file, while NR is the total number of records read so far across all files. The two are equal only while awk is reading the first file, so the FNR==NR block stores every record of the first file as a key in array a, and next skips the rest of the program for those records.
Once awk moves on to the second file, FNR restarts at 1 while NR keeps counting, so the two differ and the rest of the program runs for each record of file2.txt. In this case the
                                              Code: [Select]
                                              if ( !($0 in a) ) { print }
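statement prints the record only if it is not a key in the array, i.e. not present in file1.txt. (One caveat: if file1.txt is empty, FNR==NR is also true for the records of file2.txt, so nothing gets printed.)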
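For comparison, here is a rough Python sketch of the same two-pass "keep what is not in the other file" filter. filter_out is a made-up name, and a Python set stands in for awk's array. The awk version stores $1 and tests $0, which is the same thing for single-word lines like these; this sketch simply compares whole lines.

```python
def filter_out(key_lines, data_lines):
    """Return the lines of data_lines that do not appear in key_lines."""
    keys = set(key_lines)  # first pass: like storing every line of file1.txt in array a
    # second pass: like  if ( !($0 in a) ) { print }
    return [line for line in data_lines if line not in keys]


file1 = ["cheese", "milk", "sausage"]
file2 = ["milk", "cheese", "popcorn", "pasta", "milk", "sausage", "cheese", "melon"]
print(filter_out(file1, file2))  # ['popcorn', 'pasta', 'melon']
```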

                                              briandams

                                                Handy one liners
                                                « Reply #28 on: January 17, 2014, 07:59:42 PM »
Here are some commonly used one-liners for file/text parsing:

                                                1) Deleting last line of a file
                                                2) Deleting first line of file
                                                3) Print a range of lines
                                                4) Print lines not in a range
                                                5) Concatenating two files
                                                6) Transposing a file
                                                7) Print first and last line
                                                8) Print the line above and below a pattern
                                                9) Print all lines until a matched pattern
10) Print from a matched pattern till the end of file


                                                1) Deleting last line of a file
                                                Code: [Select]
                                                C:\>type myFile.txt
                                                CAT
                                                MAT
                                                RAT

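C:\>awk "NR>1{ print prev } { prev=$0 }" myFile.txt
CAT
MAT

Each line is held back in prev and printed only when the next line is read, so the last line is never printed. This also works when lines contain spaces.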
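Deleting the last line amounts to holding each line back until the next one arrives. Here is a minimal Python sketch of that idea (drop_last is a hypothetical name). Like awk, it streams, so it never needs the whole file in memory.

```python
def drop_last(lines):
    """Yield every line except the last one, holding each line back by one."""
    prev = None
    for i, line in enumerate(lines):
        if i > 0:
            yield prev  # the previous line is safe to output: it was not the last
        prev = line


print(list(drop_last(["CAT", "MAT", "RAT"])))  # ['CAT', 'MAT']
```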


                                                2) Deleting first line of file
                                                Code: [Select]
                                                C:\> awk "NR>1 { print  } " myFile.txt
                                                MAT
                                                RAT


3) Print a range of lines, e.g. print lines 3 to 5
                                                Code: [Select]
                                                C:\> type myFile.txt
                                                CAT
                                                MAT
                                                RAT
                                                BAT
                                                TAT
                                                DAT
                                                PAT

                                                C:\> awk "NR==3,NR==5{ print  } " myFile.txt
                                                RAT
                                                BAT
                                                TAT
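NR==3,NR==5 is a range pattern: it switches on at line 3 and off after line 5. A rough Python equivalent, using a hypothetical helper line_range with 1-based, inclusive numbering like NR:

```python
def line_range(lines, start, end):
    """Return lines start..end, 1-based and inclusive, like awk "NR==start,NR==end"."""
    return [line for nr, line in enumerate(lines, start=1) if start <= nr <= end]


animals = ["CAT", "MAT", "RAT", "BAT", "TAT", "DAT", "PAT"]
print(line_range(animals, 3, 5))  # ['RAT', 'BAT', 'TAT']
```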

4) Print lines not in a range, e.g. don't print lines 3 to 5
                                                Code: [Select]
                                                C:\>awk "!(NR>=3 && NR<=5) { print  }"  myFile.txt
                                                CAT
                                                MAT
                                                DAT
                                                PAT
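The negated condition translates directly. Here is a small Python sketch (outside_range is a made-up name) that keeps every line whose 1-based number falls outside start..end:

```python
def outside_range(lines, start, end):
    """Return every line except lines start..end (1-based, inclusive)."""
    return [line for nr, line in enumerate(lines, start=1) if not (start <= nr <= end)]


animals = ["CAT", "MAT", "RAT", "BAT", "TAT", "DAT", "PAT"]
print(outside_range(animals, 3, 5))  # ['CAT', 'MAT', 'DAT', 'PAT']
```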

                                                5) Concatenating two files
                                                Code: [Select]
                                                C:\>awk "{print}"  file1 file2 > newFile.txt


                                                6) Transposing a file
                                                Code: [Select]
                                                C:\> awk "BEGIN{ORS=\" \"}{print}" myFile.txt
                                                CAT MAT RAT BAT TAT DAT PAT
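Setting ORS to a space makes awk separate output records with spaces instead of newlines, so all lines end up on one line. For comparison, a Python sketch (join_lines is a hypothetical name). One small difference: awk's version also writes a space after the last item and no final newline, while str.join puts the separator only between items.

```python
def join_lines(lines, sep=" "):
    """Join all lines into one, the way ORS=" " does in awk (minus the trailing space)."""
    return sep.join(line.rstrip("\n") for line in lines)


print(join_lines(["CAT", "MAT", "RAT", "BAT", "TAT", "DAT", "PAT"]))  # CAT MAT RAT BAT TAT DAT PAT
```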

                                                7) Print first and last line
                                                Code: [Select]
                                                C:\> awk "NR==1;END{print}" myFile.txt
                                                CAT
                                                PAT
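In the awk version, NR==1 prints the first line, and in the END block $0 still holds the last record read (GNU awk behaves this way). A rough single-pass Python equivalent, with first_and_last as a made-up name:

```python
def first_and_last(lines):
    """Return [first, last] like awk "NR==1;END{print}".

    A one-line input gives that line twice, as the awk version does;
    an empty input gives an empty list here."""
    first = last = None
    count = 0
    for line in lines:
        count += 1
        if count == 1:
            first = line
        last = line  # keep overwriting; after the loop this is the final line
    return [first, last] if count else []


print(first_and_last(["CAT", "MAT", "RAT", "BAT", "TAT", "DAT", "PAT"]))  # ['CAT', 'PAT']
```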

8) Print the line above and below a pattern, e.g. search for "RAT" and print the lines above and below it
                                                Code: [Select]
                                                C:\> type myFile.txt
                                                CAT
                                                MAT
                                                RAT
                                                BAT
                                                TAT
                                                DAT
                                                PAT

                                                C:\> awk "/RAT/{print y;print;f=1;next}f{print;f=0}{y=$0}" myFile.txt
                                                MAT
                                                RAT
                                                BAT
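The one-liner keeps two pieces of state: y remembers the previous line, and f flags that the next line should be printed. Here is a Python sketch that mirrors those variables (context_around is a hypothetical name, and Python's re stands in for awk's regex match). Like the awk version, a match on the very first line outputs an empty "previous" line.

```python
import re


def context_around(lines, pattern):
    """Return the line before each match, the match, and the line after it."""
    out = []
    prev = ""      # awk's y: the previous line ("" before the first line)
    after = False  # awk's f: set when the next line should be output
    for line in lines:
        if re.search(pattern, line):
            out.extend([prev, line])
            after = True
            continue  # like awk's next: prev is not updated to the matching line
        if after:
            out.append(line)
            after = False
        prev = line
    return out


animals = ["CAT", "MAT", "RAT", "BAT", "TAT", "DAT", "PAT"]
print(context_around(animals, "RAT"))  # ['MAT', 'RAT', 'BAT']
```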


9) Print all lines until a matched pattern, e.g. print until the word "BAT" is found
                                                Code: [Select]
C:\> awk "/BAT/{exit}{print}" myFile.txt
CAT
MAT
RAT
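Stopping at the first match is what itertools.takewhile does in Python. A minimal sketch (until_match is a made-up name; re.search plays the part of the /BAT/ regex):

```python
import re
from itertools import takewhile


def until_match(lines, pattern):
    """Return lines up to, but not including, the first line matching the regex pattern."""
    return list(takewhile(lambda line: not re.search(pattern, line), lines))


animals = ["CAT", "MAT", "RAT", "BAT", "TAT", "DAT", "PAT"]
print(until_match(animals, "BAT"))  # ['CAT', 'MAT', 'RAT']
```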

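10) Print from a matched pattern till the end of the file. Here the range pattern ends with the condition 0, which is never true, so the range runs to the last line.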
                                                Code: [Select]
                                                C:\> awk "/TAT/,0" myFile.txt
                                                TAT
                                                DAT
                                                PAT
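The Python counterpart of "start at the first match and print the rest" is itertools.dropwhile. The matching line itself is kept, just as the awk range includes its starting line. A minimal sketch, with from_match as a hypothetical name:

```python
import re
from itertools import dropwhile


def from_match(lines, pattern):
    """Return lines from the first line matching the regex pattern through the end."""
    return list(dropwhile(lambda line: not re.search(pattern, line), lines))


animals = ["CAT", "MAT", "RAT", "BAT", "TAT", "DAT", "PAT"]
print(from_match(animals, "TAT"))  # ['TAT', 'DAT', 'PAT']
```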