AWK cheat sheet

Summary

This AWK cheat sheet provides essential commands, syntax, variables, and examples for text processing and data extraction. It covers environment, built-in, and user-defined variables, field separators, pattern matching, conditional statements, loops, and arithmetic operations.

Introduction #

AWK is a versatile programming language designed for text processing and data extraction. Named after its creators — Alfred Aho, Peter Weinberger, and Brian Kernighan — AWK is particularly useful for working with structured data, such as log files, CSV files, and other text-based formats.

Use this cheat sheet as a quick reference to help you master AWK.

Basic syntax #

The general syntax of an AWK command is:

awk 'pattern { action }' filename
  • pattern: Defines when the action should be executed.
  • action: Specifies what to do when the pattern matches.

Example:

awk '{ print $1 }' file.txt

This command prints the first column of each line in file.txt.
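A pattern and an action can also appear together, so the action runs only for lines the pattern selects. A minimal sketch, assuming hypothetical sample data (a name and a score per line) piped in with printf rather than read from a file:

```shell
# Hypothetical sample data: a name and a score per line.
# The pattern $2 > 80 selects lines; the action prints the first field.
result=$(printf 'alice 90\nbob 72\ncarol 85\n' | awk '$2 > 80 { print $1 }')
printf '%s\n' "$result"
```

If the pattern is omitted, the action runs for every line; if the action is omitted, matching lines are printed unchanged.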

Variables #

Environment variables #

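Different implementations of AWK support different variables. In the table below, only AWKPATH, AWKLIBPATH, POSIXLY_CORRECT, and LC_CTYPE are true environment variables that you set in the shell; the remaining entries are variables built into AWK itself and are listed here with the implementations that support them: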

| Variable | Description | Supported by |
| --- | --- | --- |
| AWKPATH | Search path for AWK programs | gawk |
| AWKLIBPATH | Search path for dynamically loaded libraries | gawk |
| POSIXLY_CORRECT | Enables strict POSIX compliance | gawk, mawk |
| LC_CTYPE | Defines character encoding | gawk, mawk, nawk |
| ARGC | Number of command-line arguments | nawk, gawk |
| ARGV | Array of command-line arguments | nawk, gawk |
| ARGIND | Index in ARGV of current file being processed | gawk |
| FNR | Record number in the current file | nawk, gawk |
| OFMT | Output format for numbers (default %.6g) | nawk, gawk |
| RSTART | Start position of the most recent match | nawk, gawk |
| RLENGTH | Length of the most recent match | nawk, gawk |
| SUBSEP | Separator for multi-dimensional array indices | nawk, gawk |
| ENVIRON | Array containing environment variables | nawk, gawk |
| IGNORECASE | Controls case-insensitive matching (1 = enabled) | gawk |
| CONVFMT | Number conversion format (default %.6g) | nawk, gawk |
| ERRNO | System error messages from failed I/O operations | gawk |
| FIELDWIDTHS | Specifies fixed-width fields for input parsing | gawk |

These variables give you control over input handling, error reporting, and numerical formatting. Depending on your AWK implementation, some of them may not be available.

Built-in variables #

AWK provides several built-in variables that help with data manipulation:

| Variable | Description |
| --- | --- |
| $0 | The entire current line |
| $1, $2, ... | Fields in the line |
| NR | Current record (line) number |
| NF | Number of fields in the current line |
| FS | Field separator (default is space or tab) |
| OFS | Output field separator |
| RS | Record separator |
| ORS | Output record separator |
| FNR | Record number in the current file |
| ARGC | Number of command-line arguments |
| ARGV | Array of command-line arguments |
| FILENAME | Name of the current input file |

Example:

awk '{ print NR, $0 }' file.txt

This prints each line prefixed with its line number.
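Several built-in variables can be combined in one print statement. The following sketch assumes hypothetical input with a varying number of fields per line and shows NR, NF, and $NF (the value of the last field):

```shell
# Hypothetical input with a varying number of fields per line.
# NR is the record number, NF the field count, and $NF the last field.
result=$(printf 'a b c\nd e\n' | awk '{ print NR, NF, $NF }')
printf '%s\n' "$result"
```

Because NF holds a number, $NF is evaluated as "the field whose index is NF", which is a common way to reach the last column without knowing how many columns there are.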

User defined variables #

AWK allows you to define and use custom variables within your scripts.

Custom variables do not need to be declared before use. A variable that has not been assigned yet evaluates to the empty string in a string context and to zero in a numeric context.
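The default values can be observed directly. In this small sketch, n and s are hypothetical names that are never assigned:

```shell
# n and s are never assigned (hypothetical names chosen for illustration).
# n + 0 forces numeric context (0); "[" s "]" forces string context (empty).
result=$(awk 'BEGIN { print n + 0; print "[" s "]" }')
printf '%s\n' "$result"
```

This is why counters and accumulators such as count += 1 work without any prior initialization.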

Defining and using variables #

This AWK script defines two variables. It performs the following steps:

BEGIN { x = 10; y = 20; print "Sum:", x + y }
  1. BEGIN Block:
    The BEGIN block is a special pattern in AWK that is executed before any input is processed. It is typically used for initialization tasks.
  2. Variable Assignment:
    Inside the BEGIN block, two variables, x and y, are assigned the values 10 and 20, respectively.
  3. Print Statement:
    The print command outputs the string "Sum:" followed by the result of the expression x + y, which is the sum of x and y.

Modifying variables inside a script #

This AWK script updates two variables, count and total, while it reads the input. It performs the following steps:

{ count += 1; total += $2 } 
END { print "Total records:", count; print "Sum of column 2:", total }
  1. Main Block ({ ... }):
    • This block is executed for every line of the input.
    • count += 1: Increments the count variable by 1 for each line processed. This effectively counts the total number of records (lines) in the input.
    • total += $2: Adds the value of the second field (column) of the current line to the total variable. This calculates the cumulative sum of all values in the second column.
  2. END Block:
    • The END block is executed after all input lines have been processed.
    • print "Total records:", count: Prints the total number of records (lines) processed.
    • print "Sum of column 2:", total: Prints the sum of all values in the second column.
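The steps above can be tried on a small input. This sketch assumes hypothetical sample data (an item name and a quantity per line) and passes the same program to awk on the command line:

```shell
# Hypothetical sample data: an item name and a quantity per line.
result=$(printf 'apples 3\npears 5\nplums 2\n' |
  awk '{ count += 1; total += $2 }
       END { print "Total records:", count; print "Sum of column 2:", total }')
printf '%s\n' "$result"
```

Neither count nor total is initialized, so both start at zero the first time they are used in arithmetic.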

User-defined variables in command line #

You can also define variables directly in the command line:

awk -v myvar="Hello" 'BEGIN { print myvar }'
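The -v option is also a convenient way to pass a shell value into an AWK program without splicing it into the program text. A minimal sketch, where threshold (shell) and limit (AWK) are hypothetical names and the input is made-up sample data:

```shell
# threshold and limit are hypothetical names; -v copies the shell value into AWK.
threshold=80
result=$(printf 'alice 90\nbob 72\n' | awk -v limit="$threshold" '$2 > limit { print $1 }')
printf '%s\n' "$result"
```

Variables set with -v are available from the BEGIN block onward.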

Field separators #

AWK uses field separators to split input lines into fields, which can be accessed using $1, $2, and so on. By default, fields are separated by runs of whitespace (spaces or tabs), and leading and trailing whitespace is ignored. You can modify field separators using the FS (input field separator) and OFS (output field separator) variables.

Changing the field separator #

You can change the field separator by setting FS within an awk command:

awk 'BEGIN { FS="," } { print $1, $2 }' file.csv

This sets FS to a comma, allowing AWK to process comma-separated values.

Using -F to set the field separator #

Instead of using FS in a BEGIN block, you can specify the separator with the -F option:

awk -F":" '{ print $1, $2 }' /etc/passwd

This sets the field separator to a colon (:), commonly used in system files like /etc/passwd.

Multi-character field separators #

When FS is longer than a single character, AWK treats it as a regular expression, which allows more complex field separators:

awk 'BEGIN { FS="[,:]" } { print $1, $2 }' file.txt

This splits fields on either a comma or a colon.
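The same mechanism handles separators that are genuinely several characters long. A minimal sketch, assuming hypothetical input that uses "::" between fields:

```shell
# Hypothetical input that uses a two-character separator, "::".
# Because FS is longer than one character, AWK treats it as a regular expression.
result=$(printf 'alice::admin::active\nbob::guest::locked\n' |
  awk 'BEGIN { FS = "::" } { print $1, $3 }')
printf '%s\n' "$result"
```

Since the separator is a regular expression, characters with a special meaning in regular expressions (such as | or .) need to be escaped or placed in a bracket expression.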

Output field separator (OFS) #

To change how AWK prints output fields, modify OFS:

awk 'BEGIN { FS=","; OFS="|" } { print $1, $2 }' file.csv

This prints the first two fields of each comma-separated line, joined by a pipe (|) instead of a comma.
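OFS is applied between the arguments of print, so the command above outputs only the two fields it names. To re-separate every field of a record, a common idiom is to assign a field to itself, which makes AWK rebuild $0 with OFS. A minimal sketch on hypothetical data, assuming simple CSV without quoted fields or embedded commas:

```shell
# Hypothetical CSV input. Assigning $1 = $1 forces AWK to rebuild $0
# using OFS, so every field (not just selected ones) gets the new separator.
result=$(printf 'a,b,c\nd,e,f\n' |
  awk 'BEGIN { FS = ","; OFS = "|" } { $1 = $1; print }')
printf '%s\n' "$result"
```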

Fixed-width field parsing with FIELDWIDTHS #

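For fixed-width data, the FIELDWIDTHS variable can define column widths explicitly. It is a gawk extension, so the following command may not work with other AWK implementations: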

awk 'BEGIN { FIELDWIDTHS="5 10 8" } { print $1, $2, $3 }' file.txt

This sets the first field to 5 characters, the second to 10, and the third to 8.

Pattern matching #

Pattern matching in AWK is primarily done using regular expressions. A pattern specifies a condition that must be met for the associated action to be executed. AWK supports built-in pattern-matching operators such as /pattern/, ~ (matches), and !~ (does not match).

  • /pattern/ matches lines that contain the specified pattern.
  • expression ~ /pattern/ checks if an expression matches the pattern.
  • expression !~ /pattern/ checks if an expression does not match the pattern.

Example:

awk '/error/ { print $0 }' logfile.txt

This prints all lines in logfile.txt that contain the string “error”. The match is case-sensitive and also succeeds inside longer words such as “errors”.
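The ~ and !~ operators restrict a match to a single field or expression instead of the whole line. A minimal sketch, assuming hypothetical log lines whose first field is a severity level:

```shell
# Hypothetical log lines: a level in field 1 and a message after it.
log='ERROR disk full
INFO backup done
WARN error rate rising'
# $1 ~ /^ERROR$/ tests only the first field; /error/ alone would test the whole line.
matched=$(printf '%s\n' "$log" | awk '$1 ~ /^ERROR$/ { print $2, $3 }')
# $1 !~ /^INFO$/ keeps lines whose first field is not INFO.
others=$(printf '%s\n' "$log" | awk '$1 !~ /^INFO$/ { print $1 }')
printf '%s\n%s\n' "$matched" "$others"
```

The third line contains the lowercase word "error" in its message, but it is not selected by the first command because only $1 is tested.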

Conditionals and loops #

You can use if, for, and while for control flow in AWK scripts.

If statement #

Example using if:

awk '{ if ($3 > 50) print $1, $3 }' file.txt

This prints the first and third columns only if the value in the third column is greater than 50.
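An if statement can be paired with else to handle both outcomes. A minimal sketch, assuming hypothetical input with a name, a team, and a score on each line:

```shell
# Hypothetical input: name, team, score.
# Braces around each branch keep the if/else unambiguous.
result=$(printf 'alice red 75\nbob blue 40\n' |
  awk '{ if ($3 > 50) { print $1, "pass" } else { print $1, "fail" } }')
printf '%s\n' "$result"
```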

For loop #

Example using a for loop:

awk '{ for (i = 1; i <= NF; i++) print $i }' file.txt

This prints each field on a new line.

While loop #

Example using a while loop:

awk '{ i = 1; while (i <= NF) { print $i; i++ } }' file.txt

This prints each field on a new line using a while loop.
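Loops over fields are often combined with an accumulator. The following sketch, on hypothetical rows of numbers, adds up all fields of each line:

```shell
# Hypothetical input: rows of numbers. The for loop visits fields 1..NF.
# sum is reset for every line so each row gets its own total.
result=$(printf '1 2 3\n10 20\n' |
  awk '{ sum = 0; for (i = 1; i <= NF; i++) sum += $i; print sum }')
printf '%s\n' "$result"
```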

Arithmetic operations #

AWK supports a variety of arithmetic operations:

Basic arithmetic operators #

| Operator | Description |
| --- | --- |
| + | Addition |
| - | Subtraction |
| * | Multiplication |
| / | Division |
| % | Modulus (remainder) |
| ^ or ** | Exponentiation |

Unary arithmetic operators #

| Operator | Description |
| --- | --- |
| + | Unary plus (positive value) |
| - | Unary minus (negative value) |

Autoincrement and autodecrement operators #

| Operator | Description |
| --- | --- |
| ++var | Pre-increment (increments before use) |
| var++ | Post-increment (increments after use) |
| --var | Pre-decrement (decrements before use) |
| var-- | Post-decrement (decrements after use) |

Assignment operators #

| Operator | Description |
| --- | --- |
| = | Assign value |
| += | Add and assign |
| -= | Subtract and assign |
| *= | Multiply and assign |
| /= | Divide and assign |
| %= | Modulus and assign |
| ^= or **= | Exponentiation and assign |
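A few of these operators in action, as a small sketch that runs entirely in a BEGIN block and therefore needs no input file (n is a hypothetical variable name):

```shell
result=$(awk 'BEGIN {
  print 7 % 3          # modulus
  print 2 ^ 10         # exponentiation (^ is the spelling POSIX specifies)
  print 7 / 2          # division is floating point
  print int(7 / 2)     # int() truncates toward zero
  n = 5; n += 3; n++   # compound assignment, then post-increment
  print n
}')
printf '%s\n' "$result"
```

AWK has no separate integer division operator; wrapping a division in int() is the usual way to discard the fractional part.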

FAQs #

Most common questions and brief, easy-to-understand answers on the topic:

What is AWK used for?

AWK is a text-processing language used for pattern scanning and processing in files or streams. It is commonly used in data extraction, transformation, and reporting.

How do I run an AWK command?

You can run an AWK command directly in the terminal using: awk '{print $1}' filename. This example prints the first column of a file.

How do you run an AWK script?

You can run an AWK script using the command awk -f script.awk inputfile, where script.awk contains your AWK commands and inputfile is the data file.
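As a minimal sketch of that workflow, the following writes a one-rule program to a temporary file and runs it with -f. The program text and the sample input are hypothetical, and mktemp is assumed to be available:

```shell
# Write a small program to a temporary file (hypothetical content), then run it with -f.
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the record number and the first field of each line.
{ print NR ": " $1 }
EOF
result=$(printf 'alice 90\nbob 72\n' | awk -f "$script")
rm -f "$script"
printf '%s\n' "$result"
```

Inside a script file the program is not wrapped in shell quotes, which makes longer programs easier to read and lets them contain single quotes.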

What are AWK variables?

AWK provides built-in variables like NR (current record number), NF (number of fields in the current record), and $0 (entire line). You can also define custom variables.

Can AWK be used with regular expressions?

Yes, AWK supports regular expressions for pattern matching. For example, awk '/pattern/ {print $0}' file prints lines containing 'pattern'.

What is the difference between AWK and sed?

AWK is a full-fledged programming language for text processing, while sed is a stream editor mainly used for simple text substitutions and line-based editing.

What are AWK patterns and actions?

AWK works by matching patterns in the input data and executing actions. A pattern is a condition that, when met, triggers the associated action, such as printing a line or performing a calculation.

Can AWK handle multiple input files?

Yes, AWK can process multiple input files sequentially. You can specify multiple files in the command, e.g., awk '{print}' file1 file2.
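When several files are read, NR keeps counting across all of them while FNR restarts at 1 for each file, and FILENAME names the file being read. A minimal sketch using two hypothetical files created in a temporary directory (mktemp -d is assumed to be available):

```shell
# Create two small hypothetical input files in a temporary directory.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/file1"
printf 'c\n' > "$dir/file2"
# NR counts records across both files; FNR restarts for each file.
result=$(cd "$dir" && awk '{ print FILENAME, NR, FNR, $0 }' file1 file2)
rm -rf "$dir"
printf '%s\n' "$result"
```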


Author

Jonas Jared Jacek (J15k)

Jonas has worked as a project manager, web designer, and web developer since 2001. On top of that, he is a Linux system administrator with a broad interest in things related to programming, architecture, and design. See: https://www.j15k.com/

License

AWK cheat sheet by Jonas Jared Jacek is licensed under CC BY-SA 4.0.

This license requires that reusers give credit to the creator. It allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, even for commercial purposes, as long as adaptations are shared under the same license. To give credit, provide a link back to the original source, the author, and the license, for example like this:

<p xmlns:cc="http://creativecommons.org/ns#" xmlns:dct="http://purl.org/dc/terms/"><a property="dct:title" rel="cc:attributionURL" href="https://www.ditig.com/awk-cheat-sheet">AWK cheat sheet</a> by <a rel="cc:attributionURL dct:creator" property="cc:attributionName" href="https://www.j15k.com/">Jonas Jared Jacek</a> is licensed under <a href="https://creativecommons.org/licenses/by-sa/4.0/" target="_blank" rel="license noopener noreferrer">CC BY-SA 4.0</a>.</p>

For more information see the Ditig legal page.
