AWK cheat sheet
Summary
This AWK cheat sheet provides essential commands, syntax, variables, and examples for text processing and data extraction. It covers environment, built-in, and user-defined variables, field separators, pattern matching, conditional statements, loops, and arithmetic operations.
Introduction #
AWK is a versatile programming language designed for text processing and data extraction. Named after its creators — Alfred Aho, Peter Weinberger, and Brian Kernighan — AWK is particularly useful for working with structured data, such as log files, CSV files, and other text-based formats.
Use this cheat sheet as a quick reference to help you master AWK.
Basic syntax #
The general syntax of an AWK command is:
awk 'pattern { action }' filename
pattern: Defines when the action should be executed.
action: Specifies what to do when the pattern matches.
Example:
awk '{ print $1 }' file.txt
This command prints the first column of each line in file.txt.
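As a minimal sketch of how a pattern and an action work together, the command below prints the first field only for lines whose third field is greater than 100. The sample data and the threshold are made up for illustration.

```shell
# Hypothetical sample input: name, department, amount.
out=$(printf 'alice sales 120\nbob ops 80\ncarol sales 200\n' |
  awk '$3 > 100 { print $1 }')   # pattern: $3 > 100, action: print $1
printf '%s\n' "$out"
```

If the pattern is omitted, the action runs for every line. If the action is omitted, AWK prints each matching line.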
Variables #
Environment variables #
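Different AWK implementations support different environment variables and implementation-specific built-in variables. Of the entries below, only AWKPATH, AWKLIBPATH, POSIXLY_CORRECT, and LC_CTYPE are environment variables read from the shell; the others are variables set or read inside an AWK program: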
Variable | Description | Supported By |
---|---|---|
AWKPATH | Search path for AWK programs | gawk |
AWKLIBPATH | Search path for dynamically loaded libraries | gawk |
POSIXLY_CORRECT | Enables strict POSIX compliance | gawk, mawk |
LC_CTYPE | Defines character encoding | gawk, mawk, nawk |
ARGC | Number of command-line arguments | nawk, gawk |
ARGV | Array of command-line arguments | nawk, gawk |
ARGIND | Index in ARGV of current file being processed | gawk |
FNR | Record number in the current file | nawk, gawk |
OFMT | Output format for numbers (default %.6g ) | nawk, gawk |
RSTART | Start position of the most recent match | nawk, gawk |
RLENGTH | Length of the most recent match | nawk, gawk |
SUBSEP | Separator for multi-dimensional array indices | nawk, gawk |
ENVIRON | Array containing environment variables | nawk, gawk, mawk |
IGNORECASE | Controls case-insensitive matching (1 = enabled) | gawk |
CONVFMT | Number conversion format (default %.6g ) | nawk, gawk, mawk |
ERRNO | System error messages from failed I/O operations | gawk |
FIELDWIDTHS | Specifies fixed-width fields for input parsing | gawk |
These variables provide control over input handling, error reporting, and numerical formatting. Depending on your AWK implementation, some of them may not be available.
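The following sketch shows one way to read an environment variable from inside an AWK program through the ENVIRON array, and how OFMT changes the way print formats a non-integer number. GREETING is a hypothetical variable set only for this one command.

```shell
# GREETING is a made-up environment variable, passed to awk for this call only.
greeting=$(GREETING="hello" awk 'BEGIN { print ENVIRON["GREETING"] }')
printf '%s\n' "$greeting"

# OFMT controls how print formats non-integer numbers.
rounded=$(awk 'BEGIN { OFMT = "%.2f"; print 3.14159 }')
printf '%s\n' "$rounded"
```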
Built-in variables #
AWK provides several built-in variables that help with data manipulation:
Variable | Description |
---|---|
$0 | The entire current line |
$1, $2, ... | Fields in the line |
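NR | Number of records (lines) read so far, across all input files |
NF | Number of fields in the current line |
FS | Field separator (default is a single space, which splits on runs of spaces and tabs) |
OFS | Output field separator (default is a single space) |
RS | Record separator (default is a newline) |
ORS | Output record separator (default is a newline) |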
FNR | Record number in the current file |
ARGC | Number of command-line arguments |
ARGV | Array of command-line arguments |
FILENAME | Name of the current input file |
Example:
awk '{ print NR, $0 }' file.txt
This prints each line prefixed with its line number.
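As a small illustration with made-up input, NR, NF, and $NF can be combined to show the record number, the field count, and the last field of each line:

```shell
# $NF is the field whose index is NF, i.e. the last field on the line.
out=$(printf 'a b c\nd e\n' | awk '{ print NR, NF, $NF }')
printf '%s\n' "$out"
```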
User defined variables #
AWK allows you to define and use custom variables within your scripts.
Custom variables do not need to be declared before use. An unset variable evaluates to an empty string in string context and to zero in numeric context.
Defining and using variables #
This AWK script defines two variables. It performs the following steps:
BEGIN { x = 10; y = 20; print "Sum:", x + y }
- BEGIN block: The BEGIN block is a special pattern in AWK that is executed before any input is processed. It is typically used for initialization tasks.
- Variable assignment: Inside the BEGIN block, two variables, x and y, are assigned the values 10 and 20, respectively.
- Print statement: The print command outputs the string "Sum:" followed by the result of the expression x + y, which is the sum of x and y.
Modifying variables inside a script #
This AWK script modifies the variables. It performs the following steps:
{ count += 1; total += $2 }
END { print "Total records:", count; print "Sum of column 2:", total }
- Main block ({ ... }):
  - This block is executed for every line of the input.
  - count += 1: Increments the count variable by 1 for each line processed. This effectively counts the total number of records (lines) in the input.
  - total += $2: Adds the value of the second field (column) of the current line to the total variable. This calculates the cumulative sum of all values in the second column.
- END block:
  - The END block is executed after all input lines have been processed.
  - print "Total records:", count: Prints the total number of records (lines) processed.
  - print "Sum of column 2:", total: Prints the sum of all values in the second column.
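The same accumulate-then-report idea can be sketched for an average. The input is invented for illustration, and the guard on count is an added precaution against dividing by zero on empty input.

```shell
# Accumulate in the main block, report once in END.
out=$(printf 'x 10\ny 20\nz 30\n' |
  awk '{ count++; total += $2 }
       END { if (count > 0) print "Average:", total / count }')
printf '%s\n' "$out"
```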
User-defined variables in command line #
You can also define variables on the command line with the -v option. They are assigned before the BEGIN block runs:
awk -v myvar="Hello" 'BEGIN { print myvar }'
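A common use of -v is passing a shell variable into an AWK program. In this sketch, limit is a hypothetical shell variable and the input is made up. Because the value looks numeric, AWK compares it as a number.

```shell
limit=50   # hypothetical shell variable
out=$(printf 'a 40\nb 60\nc 75\n' |
  awk -v limit="$limit" '$2 > limit { print $1 }')
printf '%s\n' "$out"
```

Passing the value with -v avoids splicing shell text into the AWK program itself, which keeps quoting simple.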
Field separators #
AWK uses field separators to split input lines into fields, which can be accessed using $1, $2, and so on. The default field separator is any run of whitespace (spaces or tabs), and leading and trailing whitespace is ignored. You can modify field separators using the FS (input field separator) and OFS (output field separator) variables.
Changing the field separator #
You can change the field separator by setting FS within an awk command:
awk 'BEGIN { FS="," } { print $1, $2 }' file.csv
This sets FS to a comma, allowing AWK to process simple comma-separated values. Quoted fields that themselves contain commas are not handled by this approach.
Using -F to set the field separator #
Instead of using FS in a BEGIN block, you can specify the separator with the -F option:
awk -F":" '{ print $1, $2 }' /etc/passwd
This sets the field separator to a colon (:), commonly used in system files like /etc/passwd.
Multi-character field separators #
When FS is longer than a single character, AWK treats it as a regular expression, so you can describe more complex field separators:
awk 'BEGIN { FS="[,:]" } { print $1, $2 }' file.txt
This splits fields on either a comma or a colon.
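As a further sketch, a regular-expression separator can also swallow the spaces that follow a delimiter. The input line is made up for illustration.

```shell
# FS matches a comma or colon followed by any number of spaces.
out=$(printf 'name: alice, age: 30\n' |
  awk 'BEGIN { FS = "[,:] *" } { print $2 "|" $4 }')
printf '%s\n' "$out"
```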
Output field separator (OFS) #
To change how AWK prints output fields, modify OFS:
awk 'BEGIN { FS=","; OFS="|" } { print $1, $2 }' file.csv
This prints the first two fields of a comma-separated file in pipe-separated (|) format.
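To convert every field rather than a fixed number of them, a widely used idiom is to assign a field to itself. This forces AWK to rebuild $0 using OFS. A minimal sketch with made-up input:

```shell
# $1 = $1 changes nothing in the field, but makes AWK rejoin all fields with OFS.
out=$(printf 'a,b,c\n' |
  awk 'BEGIN { FS = ","; OFS = "|" } { $1 = $1; print }')
printf '%s\n' "$out"
```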
Fixed-width field parsing with FIELDWIDTHS #
For fixed-width data, FIELDWIDTHS (a gawk extension) can define column widths explicitly:
awk 'BEGIN { FIELDWIDTHS="5 10 8" } { print $1, $2, $3 }' file.txt
This sets the first field to 5 characters, the second to 10, and the third to 8.
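Where gawk is not available, the same 5/10/8 layout can be approximated with substr(), which exists in every AWK implementation. This is a sketch of an alternative technique, not what FIELDWIDTHS does internally, and the sample record is invented.

```shell
# substr(string, start, length) uses 1-based start positions.
out=$(printf 'ABCDE0123456789abcdefgh\n' |
  awk '{ print substr($0, 1, 5), substr($0, 6, 10), substr($0, 16, 8) }')
printf '%s\n' "$out"
```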
Pattern matching #
Pattern matching in AWK is primarily done using regular expressions. A pattern specifies a condition that must be met for the associated action to be executed. AWK supports built-in pattern-matching operators such as /pattern/, ~ (matches), and !~ (does not match).
- /pattern/ matches lines that contain the specified pattern.
- expression ~ /pattern/ checks if an expression matches the pattern.
- expression !~ /pattern/ checks if an expression does not match the pattern.
Example:
awk '/error/ { print $0 }' logfile.txt
This prints all lines in logfile.txt that contain the string “error”. The match is case-sensitive and also hits longer words such as “errors”.
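The ~ and !~ operators restrict a match to a single field. The sketch below uses a made-up three-line log: the first command matches the second field case-insensitively via tolower(), the second prints the level of every line whose second field is not an error level.

```shell
log='2024-01-01 ERROR disk
2024-01-01 INFO started
2024-01-02 error timeout'

# Lines whose second field is "error" in any letter case.
matches=$(printf '%s\n' "$log" | awk 'tolower($2) ~ /^error$/ { print $1, $3 }')
# Lines whose second field is neither "ERROR" nor "error".
others=$(printf '%s\n' "$log" | awk '$2 !~ /^(ERROR|error)$/ { print $2 }')
printf '%s\n' "$matches" "$others"
```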
Conditionals and loops #
You can use if, for, and while for control flow in AWK scripts.
If statement #
Example using if:
awk '{ if ($3 > 50) print $1, $3 }' file.txt
This prints the first and third columns only if the third column is numerically greater than 50.
For loop #
Example using for loop:
awk '{ for (i = 1; i <= NF; i++) print $i }' file.txt
This prints each field on a new line.
While loop #
Example using while loop:
awk '{ i = 1; while (i <= NF) { print $i; i++ } }' file.txt
This prints each field on a new line using a while loop.
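Loops over fields are often combined with an accumulator. As a minimal sketch with made-up numeric input, this sums all fields on each line:

```shell
# sum is reset for every record, then each field is added in turn.
out=$(printf '1 2 3\n10 20\n' |
  awk '{ sum = 0; for (i = 1; i <= NF; i++) sum += $i; print "line " NR ": " sum }')
printf '%s\n' "$out"
```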
Arithmetic operations #
AWK supports a variety of arithmetic operations:
Basic arithmetic operators #
Operator | Description |
---|---|
+ | Addition |
- | Subtraction |
* | Multiplication |
/ | Division |
% | Modulus (remainder) |
^ or ** | Exponentiation (** is an extension that not every AWK supports) |
Unary arithmetic operators #
Operator | Description |
---|---|
+ | Unary plus (positive value) |
- | Unary minus (negative value) |
Autoincrement and autodecrement operators #
Operator | Description |
---|---|
++var | Pre-increment (increments before use) |
var++ | Post-increment (increments after use) |
--var | Pre-decrement (decrements before use) |
var-- | Post-decrement (decrements after use) |
Assignment operators #
Operator | Description |
---|---|
= | Assign value |
+= | Add and assign |
-= | Subtract and assign |
*= | Multiply and assign |
/= | Divide and assign |
%= | Modulus and assign |
^= or **= | Exponentiation and assign (**= is an extension that not every AWK supports) |
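A few of these operators in action, as a short sketch. Division in AWK is floating-point, so int() is used here when an integer quotient is wanted.

```shell
out=$(awk 'BEGIN {
  print 7 / 2        # floating-point division
  print int(7 / 2)   # truncated to an integer
  print 7 % 2        # remainder
  print 2 ^ 10       # exponentiation
  n = 5
  x = n++            # post-increment: x gets the old value, n is then 6
  print x, n
}')
printf '%s\n' "$out"
```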
FAQs #
Most common questions and brief, easy-to-understand answers on the topic:
What is AWK used for?
AWK is a text-processing language used for pattern scanning and processing in files or streams. It is commonly used in data extraction, transformation, and reporting.
How do I run an AWK command?
You can run an AWK command directly in the terminal using: awk '{print $1}' filename. This example prints the first column of a file.
How do you run an AWK script?
You can run an AWK script using the command awk -f script.awk inputfile, where script.awk contains your AWK commands and inputfile is the data file.
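As a sketch of the -f workflow, the snippet below writes a tiny AWK program to a temporary file and runs it against made-up input read from standard input. It assumes the mktemp utility is available; the file name it produces is arbitrary.

```shell
# Write a small AWK program to a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the second field of every record that has at least two fields.
NF >= 2 { print $2 }
EOF

out=$(printf 'a b\nc\nd e f\n' | awk -f "$script")
rm -f "$script"
printf '%s\n' "$out"
```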
What are AWK variables?
AWK provides built-in variables like NR (number of records), NF (number of fields), and $0 (entire line). You can also define custom variables.
Can AWK be used with regular expressions?
Yes, AWK supports regular expressions for pattern matching. For example, awk '/pattern/ {print $0}' file prints lines containing 'pattern'.
What is the difference between AWK and sed?
AWK is a full-fledged programming language for text processing, while sed is a stream editor mainly used for simple text substitutions and line-based editing.
What are AWK patterns and actions?
AWK works by matching patterns in the input data and executing actions. A pattern is a condition that, when met, triggers the associated action, such as printing a line or performing a calculation.
Can AWK handle multiple input files?
Yes, AWK can process multiple input files sequentially. You can specify multiple files in the command, e.g., awk '{print}' file1 file2.
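When several files are processed, FILENAME and FNR track the current file while NR keeps counting across all of them. The sketch below creates two throwaway files (the names one.txt and two.txt are hypothetical) in a temporary directory, assuming mktemp -d is available.

```shell
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one.txt"
printf 'c\n' > "$dir/two.txt"

# FNR restarts at 1 for each file; NR does not.
out=$(cd "$dir" && awk '{ print FILENAME, FNR, NR }' one.txt two.txt)
rm -rf "$dir"
printf '%s\n' "$out"
```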
License
AWK cheat sheet by Jonas Jared Jacek is licensed under CC BY-SA 4.0.
This license requires that reusers give credit to the creator and distribute adaptations under the same license. It allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, including for commercial purposes. To give credit, provide a link back to the original source, the author, and the license, e.g. like this:
<p xmlns:cc="http://creativecommons.org/ns#" xmlns:dct="http://purl.org/dc/terms/"><a property="dct:title" rel="cc:attributionURL" href="https://www.ditig.com/awk-cheat-sheet">AWK cheat sheet</a> by <a rel="cc:attributionURL dct:creator" property="cc:attributionName" href="https://www.j15k.com/">Jonas Jared Jacek</a> is licensed under <a href="https://creativecommons.org/licenses/by-sa/4.0/" target="_blank" rel="license noopener noreferrer">CC BY-SA 4.0</a>.</p>
For more information see the Ditig legal page.