Saturday, 6 August 2011

AWK - Introduction

AWK is a convenient and expressive programming language that can be applied to a wide variety of computing and data-manipulation tasks. AWK was created at Bell Labs in the 1970s. It was one of the early tools to appear in Version 7 Unix and gained popularity as a way to add computational features to a Unix pipeline. There are three main variants of AWK:
  • AWK - the original from AT&T
  • NAWK - a newer, improved version from AT&T
  • GAWK - the Free Software Foundation's version
The awk utility is a pattern scanning and processing language.
It searches one or more files for lines that match specified patterns and then performs the associated actions, such as writing the line to the standard output or incrementing a counter each time it finds a match.
Some of the features of awk are:
  • Its ability to treat a text file as a textual database made up of records and fields.
  • Its use of variables to manipulate the database.
  • Its use of arithmetic and string operators.
  • Its use of common programming constructs such as loops and conditionals.
  • Its ability to generate formatted reports.
Basic Structure
An awk program consists of one or more program lines containing a pattern and/or action in the following format:
        pattern { action }

The pattern selects lines from the input file, and awk performs the action on every line that the pattern selects. If the pattern is omitted, the action is performed on every input line; if the action is omitted, each selected line is printed. You must enclose the action within braces so that awk can tell it apart from the pattern. Two other important patterns are specified by the keywords "BEGIN" and "END". As you might expect, these specify actions to be taken before any lines are read and after the last line has been read.
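A minimal sketch of the three forms, run from a POSIX shell. The fruit-and-count input is made up for illustration, and NR (awk's built-in current line number) is used only to label the output:

```shell
# Made-up sample input: a name and a count on each line
input='apple 3
banana 7
cherry 12'

# Pattern and action: print the first field of lines whose second field exceeds 5
printf '%s\n' "$input" | awk '$2 > 5 { print $1 }'

# Pattern only: every line matching /an/ is printed unchanged
printf '%s\n' "$input" | awk '/an/'

# Action only: it runs on every line (NR is the current line number)
printf '%s\n' "$input" | awk '{ print NR ": " $1 }'
```

The first command prints banana and cherry, the second prints "banana 7", and the third numbers all three lines.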
The AWK program below:

BEGIN { print "START" }
      { print }
END   { print "STOP" }

adds one line before and one line after the input file. This isn't very useful, but with a simple change, we can make this into a typical AWK program:

BEGIN { print "File\tOwner" }
{ print $8, "\t", $3}
END { print " - DONE -" }

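The characters "\t" indicate a tab character. The "$8" and "$3" have a meaning similar to that in a shell script, but instead of the eighth and third arguments, they mean the eighth and third fields of the input line. You can think of a field as a column, and the action you specify operates on each line, or row, that is read in. The program is intended for the output of "ls -l", where the third field is the file's owner; the file name is the eighth field when ls -l omits the group column and the ninth when it includes it.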
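Here is a small, hedged demonstration of BEGIN, END and field references, wrapped in shell commands so it runs as-is. The input lines are made up for illustration:

```shell
# BEGIN runs before the first line is read and END after the last one
printf 'first\nsecond\n' | awk 'BEGIN { print "START" } { print } END { print "STOP" }'

# $3 and $1 are the third and first whitespace-separated fields of each line
printf '%s\n' 'alpha beta gamma' | awk '{ print $3, $1 }'
```

The first command prints START, first, second and STOP on separate lines; the second prints "gamma alpha".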

There are two differences between AWK and the shell worth noting here.
  1. AWK understands escape sequences, a "\" followed by a character such as "t". The Bourne and C UNIX shells do not.
  2. AWK does not evaluate variables within strings.
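Both differences can be seen in a minimal sketch like the one below (the variable name x is just an example). The whole awk program is in single quotes, so the shell does not touch the "$x" either:

```shell
awk 'BEGIN {
    x = 5
    print "a\tb"       # \t is turned into a real tab character
    print "x is $x"    # no substitution inside strings: prints x is $x
    print "x is " x    # concatenate the variable instead: prints x is 5
}'
```

To use a variable's value in output, place it next to the string so that AWK concatenates them.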

Since AWK is an interpreter, you can save yourself a step and make the file executable by adding one line at the beginning of the file (the location of awk varies between systems; /usr/bin/awk is also common):
#!/bin/awk -f
BEGIN { print "File\tOwner" }
{ print $9, "\t", $3 }
END { print " - DONE -" }

Change the permissions with the chmod command (e.g. "chmod +x awk_filename.awk") and the script becomes a new command. The "-f" option tells awk to read its program from the named file; when you run the script, the system starts awk with "-f" followed by the script's own path. As you can see, AWK treats lines that start with "#" as comments, just like the shell.
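The steps above can be sketched in a shell session as follows. This is a minimal sketch under a few assumptions: the script is written to a scratch directory from mktemp, the "#!" line is built from wherever awk is installed (the /bin/awk path in the text may not exist on every system), and the input is one made-up line in ls -l style rather than real ls -l output:

```shell
# Scratch directory, so nothing is written to the current directory
dir=$(mktemp -d)

# Build the #! line from the local awk path (assumption: it may not be /bin/awk)
printf '#!%s -f\n' "$(command -v awk)" > "$dir/fileowner.awk"

# Append the program; awk ignores the #! line and the comment below
cat >> "$dir/fileowner.awk" <<'EOF'
# Lines starting with # are comments
BEGIN { print "File\tOwner" }
{ print $9, "\t", $3 }
END { print " - DONE -" }
EOF

chmod +x "$dir/fileowner.awk"

# Run the script like any other command on a made-up ls -l style line
printf '%s\n' '-rw-r--r-- 1 alice staff 120 Aug 6 10:00 notes.txt' | "$dir/fileowner.awk"
```

On a real system you would pipe ls -l into the script instead; because print separates its items with a space, the middle line of output is "notes.txt", a space, a tab, a space and "alice".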

AWK command syntax

awk [ -Fc ] [ -f program-file | 'program source' ] [ variable=value | file ] ...

Command Line Option   Purpose
-f program-file       Reads the awk program code to execute from program-file. This is an alternative to writing the code on the command line as the program source.
program source        Specifies the awk code on the command line itself. The code is best enclosed in single quotes (') to protect it from the shell.
-Fc                   Sets the field separator (FS) to the character c. By default, fields are separated by whitespace (spaces and tabs). To make the digit zero the field separator, add -F0 or -F"0" to the command line.
variable=value        Initialises a variable on the command line. The assignment takes effect when awk reaches it in the argument list, that is, just before the input file that follows it is read (and after any BEGIN action has run). To set a variable before BEGIN runs, use -v variable=value instead.
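A short, hedged tour of these options follows. The sample input and the throwaway program file are made up, and the last line uses the standard -v option, which the table above does not list, to contrast it with a plain variable=value assignment:

```shell
# -F: use a colon as the field separator instead of whitespace
printf 'root:x:0\nalice:x:1000\n' | awk -F: '{ print $1 }'

# -f: read the program from a file (a temporary file, removed afterwards)
prog=$(mktemp)
printf '%s\n' '{ print "line " NR ": " $0 }' > "$prog"
printf 'foo\nbar\n' | awk -f "$prog"
rm -f "$prog"

# variable=value: assigned before the input that follows it is read
# (here there is no file operand, so it is set before standard input is read)
printf 'a\nb\n' | awk '{ print prefix $0 }' prefix='> '

# -v: assigned before BEGIN runs, so BEGIN can use it
awk -v n=3 'BEGIN { print n * 2 }'
```

These print root and alice, then "line 1: foo" and "line 2: bar", then "> a" and "> b", and finally 6.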


AWK Functions
Awk provides us with several built-in functions for manipulating numbers and strings.
Function Name                       Operation
length(string)                      returns the number of characters in string.
int(number)                         returns the integer portion of number, truncating toward zero.
index(string1, string2)             returns the position (counting from 1) at which string2 first occurs in string1, or 0 if string2 is not present.
split(string, array, separator)     splits string at each occurrence of separator, stores the pieces in array[1]...array[n] and returns n, the number of pieces. If separator is omitted, the field separator FS is used.
sprintf(format, arguments)          formats arguments according to format and returns the formatted string; it mimics the C function of the same name.
substr(string, position, length)    returns the substring of string that begins at position and is length characters long. If length is omitted, the substring runs to the end of string.
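A minimal sketch calling each of the functions above in a BEGIN block, so no input is needed. The sample strings are arbitrary, and the expected output of each line is shown in its comment:

```shell
awk 'BEGIN {
    s = "Bell Labs"
    print length(s)                    # 9
    print int(3.9), int(-3.9)          # 3 -3 (truncates toward zero)
    print index(s, "Labs")             # 6 (positions start at 1)
    n = split("a:b:c", parts, ":")     # parts[1]="a", parts[2]="b", parts[3]="c"
    print n, parts[1], parts[3]        # 3 a c
    print sprintf("%05.1f", 3.14159)   # 003.1 (width 5, zero padded, 1 decimal)
    print substr(s, 6, 3)              # Lab
}'
```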






Arithmetic Expressions
There are several arithmetic operators, similar to those in C. These are the binary operators, which operate on two operands:

+          Arithmetic   Addition
-          Arithmetic   Subtraction
*          Arithmetic   Multiplication
/          Arithmetic   Division
%          Arithmetic   Modulo
<space>    String       Concatenation
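A quick sketch of the binary operators, with the values 7 and 2 chosen arbitrarily. Note that division is not integer division, and that simply writing two values side by side concatenates them:

```shell
awk 'BEGIN {
    x = 7; y = 2
    print x + y, x - y, x * y, x / y, x % y   # prints 9 5 14 3.5 1
    print x y                                 # concatenation: prints 72
}'
```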

Unary arithmetic operators
The "+" and "-" operators can be used before variables and numbers. If x equals 4, then the statement:
print -x;
will print "-4".

Autoincrement and Autodecrement operators
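AWK also supports the "++" and "--" operators of C. They can be applied only to a single variable, and can be placed before or after it. Placed before the variable, the operator changes the variable first and then yields the new value; placed after it, the operator yields the current value and then changes the variable. As an example, if x has the value 3, then the AWK statement
print x++, " ", ++x;
would print the numbers 3 and 5: x++ yields 3 and leaves x at 4, and ++x then raises x to 5 and yields 5.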
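The unary minus and the two forms of "++" can be checked with a small sketch like this one. Separate assignments are used so the result does not depend on the order in which a print list is evaluated:

```shell
awk 'BEGIN {
    x = 4
    print -x        # unary minus: prints -4
    x = 3
    a = x++         # postfix: a gets 3, then x becomes 4
    b = ++x         # prefix: x becomes 5, then b gets 5
    print a, b      # prints 3 5
}'
```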

Assignment Operators
Variables can be assigned new values with the assignment operators.
variable = arithmetic_expression
Certain operators have precedence over others, and parentheses can be used to control grouping. Concatenation has lower precedence than the arithmetic operators, so the statement
x=1+2*3 4;
is the same as
x = (1 + (2 * 3)) "4";
and assigns the string "74" to x. The compound assignment operators are:

+=    Add result to variable
-=    Subtract result from variable
*=    Multiply variable by result
/=    Divide variable by result
%=    Replace variable with variable modulo result
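A minimal sketch of the precedence example and of each compound assignment in turn, starting from an arbitrary value of 10; the running value is shown in each comment:

```shell
awk 'BEGIN {
    x = 1 + 2 * 3 4    # (1 + (2 * 3)) then concatenated with "4"
    print x            # prints 74
    n = 10
    n += 5             # 15
    n -= 3             # 12
    n *= 2             # 24
    n /= 4             # 6
    n %= 4             # 2
    print n            # prints 2
}'
```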

Relational operators
Relational operators compare two values and produce a boolean result, 1 for true and 0 for false:
==    Is equal to
!=    Is not equal to
>     Is greater than
>=    Is greater than or equal to
<     Is less than
<=    Is less than or equal to
These operators are the same as the C operators. They can be used to compare numbers or strings. Strings are compared character by character, so in the ASCII character set lowercase letters are greater than uppercase letters.
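The difference between numeric and string comparison is easiest to see side by side. In this sketch LC_ALL=C is an assumption added so that strings are compared in plain ASCII order regardless of the local locale settings:

```shell
LC_ALL=C awk 'BEGIN {
    a = (2 < 10)         # both numbers: numeric comparison, true (1)
    b = ("2" < "10")     # both strings: "2" sorts after "1", so false (0)
    c = ("a" > "Z")      # ASCII: lowercase letters come after uppercase, true (1)
    print a, b, c        # prints 1 0 1
}'
```

The comparisons are wrapped in parentheses because, inside a print statement, an unparenthesised ">" would be taken as output redirection.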

Regular Expressions
Two operators are used to compare strings to regular expressions:
~     Matches
!~    Doesn't match
The regular expression is normally enclosed in slashes and comes after the operator. AWK supports extended regular expressions, so the following are examples of valid tests:
word !~ /START/
lawrence_welk ~ /(one|two|three)/
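Both operators can be tried as patterns on a few made-up input lines:

```shell
# ~ : report lines whose first field matches one, two or three
printf 'one\nfour\nthree\n' | awk '$1 ~ /(one|two|three)/ { print "match:", $1 }'

# !~ : print only the lines whose first field does not match START
printf 'START\nmiddle\nSTOP\n' | awk '$1 !~ /START/'
```

The first command prints "match: one" and "match: three"; the second prints middle and STOP.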

And/Or/Not
You can combine two conditional expressions with the "and" and "or" operators, "&&" and "||". There is also the unary "not" operator, "!".
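Here is a small sketch using each of the three operators as a pattern, on three made-up numbers. Every pattern is tested against every line, so one line can trigger more than one action:

```shell
printf '5\n15\n25\n' | awk '
$1 > 10 && $1 < 20 { print "in range:", $1 }   # both conditions must hold
$1 < 10 || $1 > 20 { print "outside:", $1 }    # either condition is enough
!($1 == 15)        { print "not 15:", $1 }     # ! negates the test
'
```

The output is "outside: 5", "not 15: 5", "in range: 15", "outside: 25" and "not 15: 25", in that order.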

Commands
There are only a few statements in AWK. Their syntax follows:

if ( conditional ) statement [ else statement ]
while ( conditional ) statement
for ( expression ; conditional ; expression ) statement
for ( variable in array ) statement
break
continue
{ [ statement ] ... }
variable = expression
print [ expression-list ] [ > expression ]
printf format [ , expression-list ] [ > expression ]
next
exit

Here "next" skips the remaining actions for the current line and moves on to the next input line, and "exit" stops reading input (any END action still runs).
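Several of these statements can be combined in one short program. This is a minimal sketch on made-up name-and-score data, using next to skip comment lines, an array filled by assignment, a for-in loop with an if, and printf for the report:

```shell
printf 'alice 90\nbob 72\n# skipped\ncarol 85\n' | awk '
/^#/ { next }                        # next: move straight on to the next input line
{ score[$1] = $2; total += $2; n++ } # a block of assignment statements
END {
    for (name in score)              # for-in visits every index of the array
        if (score[name] >= 80)
            high++
    printf "average %.1f, %d of %d scored 80 or more\n", total / n, high, n
}'
```

It prints "average 82.3, 2 of 3 scored 80 or more". The for-in loop visits array indexes in an unspecified order, which is why it is used here only for counting.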
If you want to know more about AWK, visit "Awk - A Tutorial and Introduction".
