gawk - GNU awk

Awk is a program that you can use to select particular records in a text file and perform operations upon them. When you run awk, you specify an awk script (containing a list of rules) that tells awk what to do.

If the rule list is short, it is easiest to include it in the command that runs awk, like this:

	awk 'rules' datafile1 datafile2 ...

The complete syntax is:

	awk [-v var=value] [-Fre] [--] 'pattern {action}' [var=value] [datafile(s)]

When the rule list is long, it is usually more convenient to put the rules in a script file and run it with a command like this:

	awk -f scriptfile datafile1 datafile2 ...

The complete syntax is:

	awk [-v var=value] [-Fre] -f scriptfile [--] [var=value] [datafile(s)]

-v initialization options take effect before the program is started and the -v must be repeated for each variable being initialized.

	awk -v var1=value1 -v var2=value2 ...

Other initializations may be interspersed with data filenames.

	awk '{...}' var1=value1 datafile1 var1=value2 datafile2

Here datafile1 is processed with var1=value1 and datafile2 is processed with var1=value2.
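
The difference between the two kinds of initialization can be seen in a small sketch. The file names and the variables prefix and tag are hypothetical, chosen only for this illustration; the sample files are created in a temporary directory.

```shell
# Hypothetical sample files, created only for this illustration.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/file1"
printf 'c\n'    > "$tmp/file2"

# prefix is set with -v, so it is already visible in the BEGIN rule.
# tag is assigned between the file names, so its value changes per file.
out=$(awk -v prefix='>' '
BEGIN { print "prefix in BEGIN: " prefix }
      { print prefix, tag, $0 }
' tag=first "$tmp/file1" tag=second "$tmp/file2")

printf '%s\n' "$out"
rm -rf "$tmp"
```

Each input line is printed with the tag that was in effect when its file was opened.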

The -Fre option redefines the default field separator (white space). The argument is a regular expression. The field separator can also be set with the FS built-in variable.

	awk -F '\t' '{...}' files FS="[\f\v]" files

Arguments are processed left to right, so the -F option applies to the first group of files and the FS assignment applies to the second group of files.
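
A minimal sketch of switching the field separator between files follows. The file names tabs.txt and colons.txt and their contents are assumptions made up for the example.

```shell
# Hypothetical sample files: one tab-separated, one colon-separated.
tmp=$(mktemp -d)
printf 'alice\t30\n' > "$tmp/tabs.txt"
printf 'bob:25\n'    > "$tmp/colons.txt"

# -F sets the separator for the first file; the FS assignment takes
# effect just before the second file is opened.
out=$(awk -F '\t' '{ print $2 }' "$tmp/tabs.txt" FS=':' "$tmp/colons.txt")

printf '%s\n' "$out"
rm -rf "$tmp"
```

The second field of each file is extracted even though the two files use different separators.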

The -- option indicates that there are no further options for awk. Any remaining arguments, even those that begin with -, are passed to the script through ARGC and ARGV.

A lone - used as a filename represents standard input (/dev/stdin). /dev/stdout and /dev/stderr are also available within awk scripts.
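
The following sketch illustrates both points under simple assumptions: the operands -x and data.txt are placeholders that are never opened (the program has only a BEGIN rule), and the piped text is invented for the example.

```shell
# After --, awk stops looking for its own options, so the operands
# that follow the program text are simply stored in ARGV.
out_args=$(awk -- 'BEGIN { for (i = 1; i < ARGC; i++) print i, ARGV[i] }' -x data.txt)
printf '%s\n' "$out_args"

# A lone - names standard input, so piped data is read like a file.
out_stdin=$(printf 'from stdin\n' | awk '{ print NR ": " $0 }' -)
printf '%s\n' "$out_stdin"
```

The loop starts at index 1 because ARGV[0] holds the name of the awk command itself.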

awk can also be invoked with a self-contained script using the '#!' script (shebang) mechanism.

	#!/usr/bin/awk -f

Self-contained awk scripts (also known as 'shell wrapper' scripts) are useful when you want users to be able to run a script without knowing that it is written in awk. Once the file has been made executable, it is invoked like this:

	./scriptname

Awk parameters and the input filename can be specified on the command line that invokes the script.

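Awk can also be invoked from a shell script:

	#!/bin/sh -
	cat datafile | awk '
	BEGIN {...}
	/pattern/ {... long awk script ...}
	END {...}'
	exit 0

Alternatively, keep the awk program in a shell variable so that the interpreter can be overridden through the AWK environment variable:

	#!/bin/sh -
	AWK=${AWK:-nawk}
	AWKPROG='
	  ... long script here ...
	'
	$AWK "$AWKPROG" "$@"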

The script consists of a series of rules. Each rule specifies one pattern to search for and/or one action to perform upon finding the pattern. The action is enclosed in curly braces to separate it from the pattern. Newlines usually separate rules. The pattern can consist of:

  • /regular expression/ (must be enclosed in slashes)
  • relational expression (uses < <= > >= != == and the match operators ~ !~)
  • BEGIN
  • END
  • pattern, pattern (addresses a range of lines as in sed)

Therefore, an awk script looks like this:

	pattern { action }
	pattern { action }
	pattern (no action - the default action is to print the line)
	{ action } (no pattern - the action is applied to all lines)
	...
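
The pattern types listed above can be sketched in one short program. The input records and the START and STOP markers are invented for this illustration.

```shell
out=$(printf '%s\n' 'alice 30' 'bob 17' 'START' 'inside' 'STOP' 'carol 45' | awk '
BEGIN            { print "begin" }                  # runs before any input
/^a/             { print "regex:", $1 }             # regular expression
$2 > 40          { print "relational:", $1 }        # relational expression
/START/, /STOP/  { print "range:", $0 }             # range of lines
END              { print "end, records read:", NR } # runs after all input
')

printf '%s\n' "$out"
```

Every record is tested against every rule, so a single line can trigger more than one action.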

Technical information is available from

	info gawk

A user's guide can be found at http://www.gnu.org/software/gawk/manual/html_node/

The original version of awk was written in 1977 at AT&T Bell Laboratories. The name awk comes from the initials of its designers: Alfred V. Aho, Peter J. Weinberger and Brian W. Kernighan.

Paul Rubin wrote the GNU implementation, gawk, in 1986. It is fully compatible with the System V Release 4 version of awk. gawk is also compatible with the POSIX specification of the awk language. This means that all properly written awk scripts should work with gawk. Thus, we usually don't distinguish between gawk and other awk implementations.

awk is a full-featured programming language. It supports conditionals, loops, arrays, user-defined functions, variables, relational and Boolean operators, and I/O.

There are two big differences between awk and traditional programming languages: awk expects a text file to operate on, and its script consists of three parts: a BEGIN section, a main loop section and an END section.

Each line of the input text is processed in the main loop section: the line is tested against every rule in turn, and wherever it passes the pattern test, that rule's action is applied to it. The BEGIN section executes before the main loop is entered, and the END section executes after the main loop completes for the last time.
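
As a rough sketch of the three sections working together, the program below prints a header, accumulates the second field of every line, and reports a total at the end. The item names and quantities are made-up sample data.

```shell
out=$(printf '%s\n' 'apples 3' 'pears 5' 'plums 4' | awk '
BEGIN { total = 0; print "item count" }               # before the first line
      { total += $2; print $1, $2 }                   # once per input line
END   { print "total", total, "from", NR, "lines" }   # after the last line
')

printf '%s\n' "$out"
```

The variable total keeps its value across input lines, which is why the END rule can report the sum.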

Here is a skeleton shell wrapper script for awk:

#!/usr/bin/awk -f
BEGIN {
  # Rules here execute first. No input lines are processed.
}

/<Regex pattern>/ {
  # Actions here process the input one line at a time if
  # it matches the Regex pattern. In an awk rule, either the pattern or the
  # action can be omitted. If the pattern is omitted, then the
  # action is performed for every input line. If the action is omitted, the
  # default action is to print all lines that match the pattern.
}

END {
  # Rules here execute just before the script terminates
}

All three blocks are optional. If used, the braces are mandatory. Here is a complete awk script:

#!/usr/bin/awk -f
BEGIN { print "Hello, world!" }

Technically, there is only one section to an awk script, which contains a sequence of pattern-action statements with the format

	pattern command
	pattern command
	etc

If the pattern matches the current input line, the command is executed. BEGIN and END are special patterns that are not tested against the input. The words action and command are synonymous.

Conditional and looping constructs follow. Multiple action statements must be enclosed in braces.

Conditionals

if ( expression )
  action1
else
  action2

Another form:

if ( expression ) action1; else action2

Ternary Conditional (selects between two values, not two statements)

expression ? value1 : value2
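
A small sketch of both conditional forms follows. The threshold of 10 and the variable names size and parity are arbitrary choices for the example.

```shell
out=$(printf '%s\n' 5 12 | awk '
{
    # if/else chooses between two statements.
    if ($1 > 10)
        size = "big"
    else
        size = "small"

    # The ternary operator chooses between two values.
    parity = ($1 % 2 == 0) ? "even" : "odd"

    print $1, size, parity
}')

printf '%s\n' "$out"
```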

Loops

while (condition)
  action

do
  action
while (condition)

for (set_counter; test_counter; increment_counter)
  action
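
The three loop forms can be tried out in a BEGIN rule, which needs no input file. The counters i, j and k and their limits are arbitrary; this is only a sketch.

```shell
out=$(awk 'BEGIN {
    # while: the condition is tested before each pass.
    i = 1
    while (i <= 3) { printf "while %d\n", i; i++ }

    # do-while: the body runs at least once, the test comes afterwards.
    j = 3
    do { printf "do %d\n", j; j-- } while (j > 0)

    # for: initialization, test and increment in one header.
    for (k = 0; k < 6; k += 2) printf "for %d\n", k
}')

printf '%s\n' "$out"
```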
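
Arrays

All arrays are associative. An index can be numeric or string.
Arrays can be multi-dimensional: array[0,1]

array[index] = value        (assign an element)
print array[index]          (read an element)
delete array[index]         (delete one element)
for (item in array) action  (loop over every index)
if (index in array) action  (test whether an index exists)
delete array                (delete every element)
split("", array)            (portable way to delete every element)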
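
The array operations above might be combined as in the following sketch, which counts how often each word occurs. The colour names and the array names count and grid are invented for the example. Because the order of a for (item in array) loop is unspecified, the loop is used only to count elements.

```shell
out=$(printf '%s\n' red blue red green red | awk '
{ count[$1]++ }                       # string index, created on first use
END {
    print "red:", count["red"]
    print "has blue:", ("blue" in count)
    print "has pink:", ("pink" in count)

    delete count["blue"]              # remove a single element
    n = 0
    for (colour in count) n++
    print "after delete:", n

    grid[0, 1] = "x"                  # multi-dimensional index
    if ((0, 1) in grid) print "grid: found"

    split("", count)                  # remove every element
    n = 0
    for (colour in count) n++
    print "after clear:", n
}')

printf '%s\n' "$out"
```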

External commands: The system() function executes a command specified as a character string. The command executes and displays its output; however the output is not available for use in the script. For that, use a command pipeline. The pipeline can contain arbitrary shell commands.

	"date" | getline now
	close("date")
	print "The current time is", now

To use a command pipeline in a loop:

	command = "head -n 15 /etc/hosts"
	while ((command | getline s) > 0)
	  print s
	close(command)
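
To contrast the two approaches, the sketch below runs one command with system() and another through a pipeline into getline. The echo commands are stand-ins for any shell command.

```shell
out=$(awk 'BEGIN {
    # system() lets the command write straight to standard output;
    # only its exit status is returned to the script.
    status = system("echo printed by system")
    print "exit status:", status

    # A pipeline into getline stores the output in a variable instead.
    cmd = "echo captured by getline"
    if ((cmd | getline line) > 0)
        print "got: " line
    close(cmd)
}')

printf '%s\n' "$out"
```

Closing the command after reading lets the same command string be run again later with fresh output.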