GNU Info

Info Node: (gawk.info)Basic Data Typing

Data Values in a Computer
=========================

   In a program, you keep track of information and values in things
called "variables".  A variable is just a name for a given value, such
as `first_name', `last_name', `address', and so on.  `awk' has several
pre-defined variables, and it has special names to refer to the current
input record and the fields of the record.  You may also group multiple
associated values under one name, as an array.

   Data, particularly in `awk', consists of either numeric values, such
as 42 or 3.1415927, or string values.  String values are essentially
anything that's not a number, such as a name.  Strings are sometimes
referred to as "character data", since they store the individual
characters that comprise them.  Individual values, whether numeric or
string, are referred to as "scalar" values.  Groups of values, such as
arrays, are not scalars.
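
   As a rough illustration of the distinction, the sketch below assigns
one numeric scalar, one string scalar, and an array that groups two
associated values.  The variable values and the `scalars_demo' wrapper
name are made up for the example.

```shell
# scalars_demo is a hypothetical wrapper name; the awk program is the point.
scalars_demo() {
    awk 'BEGIN {
        count = 42                       # numeric scalar
        first_name = "Ada"               # string scalar (made-up value)

        # An array groups associated values under one name.
        person["first_name"] = first_name
        person["last_name"]  = "Lovelace"

        print count + 1                  # arithmetic on a numeric value
        print person["first_name"] " " person["last_name"]
    }'
}
scalars_demo
```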

   Within computers, there are two kinds of numeric values: "integers",
and "floating-point".  In school, integer values were referred to as
"whole" numbers--that is, numbers without any fractional part, such as
1, 42, or -17.  The advantage to integer numbers is that they represent
values exactly.  The disadvantage is that their range is limited.  For
the 32-bit integers common on modern systems, this range is
-2,147,483,648 to 2,147,483,647.

   Integer values come in two flavors: "signed" and "unsigned".  Signed
values may be negative or positive, with the range of values just
described.  Unsigned values are never negative.  For 32-bit unsigned
integers, the range is from 0 to 4,294,967,295.
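
   The limits quoted above are those of 32-bit integers, and they follow
directly from the number of bits.  The sketch below recomputes them,
assuming a width of 32; the `int_ranges' wrapper name and the `bits'
variable are illustrative only.  It uses `%.0f' rather than `print'
because some `awk' implementations switch to exponential notation for
large values.

```shell
# int_ranges is a hypothetical wrapper name.
int_ranges() {
    awk 'BEGIN {
        bits = 32    # assumed integer width

        # Signed: one bit is spent on the sign.
        printf "signed:   %.0f to %.0f\n", -(2 ^ (bits - 1)), 2 ^ (bits - 1) - 1

        # Unsigned: all bits hold magnitude.
        printf "unsigned: 0 to %.0f\n", 2 ^ bits - 1
    }'
}
int_ranges
```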

   Floating-point numbers represent what are called "real" numbers;
i.e., those that may have a fractional part, such as 3.1415927.  The
advantage to floating-point numbers is that they can represent a much
larger range of values.  The disadvantage is that there are numbers
that they cannot represent exactly.  `awk' uses "double-precision"
floating-point numbers, which can hold more digits than
"single-precision" floating-point numbers.  Floating-point issues are
discussed more fully in Note: Floating-Point Number Caveats.
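
   A common way to see the inexactness is to add 0.1 and 0.2.  The
sketch below assumes IEEE 754 double-precision arithmetic, which is what
typical systems provide; the `float_demo' wrapper name is made up.  By
default `print' rounds to six significant digits, so the difference
shows up only when more digits are requested.

```shell
# float_demo is a hypothetical wrapper name.
float_demo() {
    awk 'BEGIN {
        sum = 0.1 + 0.2

        print sum                  # default output format hides the error
        printf "%.17g\n", sum      # 17 significant digits reveal it

        # Exact comparison fails because neither operand is stored exactly.
        verdict = (sum == 0.3) ? "equal" : "not equal"
        print verdict
    }'
}
float_demo
```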


   At the very lowest level, computers store values as groups of binary
digits, or "bits".  Modern computers group bits into groups of eight,
called "bytes".  Advanced applications sometimes have to manipulate
bits directly, and `gawk' provides functions for doing so.

   While you are probably used to the idea of a number that stands for
nothing (i.e., zero), it takes a bit more getting used to the idea of
zero-length character data.  Nevertheless, such a thing exists.  It is
called the "null string".  The null string is character data that
contains no characters.  In other words, it is empty.  It is written in
`awk' programs like this: `""'.
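
   A short sketch of the null string in use follows.  The variable
names and the `null_demo' wrapper are made up.  The last test relies on
the fact that an `awk' variable that has never been assigned also
compares equal to the null string.

```shell
# null_demo is a hypothetical wrapper name.
null_demo() {
    awk 'BEGIN {
        empty = ""

        print length(empty)        # the null string has length zero
        print "[" empty "]"        # nothing appears between the brackets

        # A variable that was never assigned also holds the null string.
        if (never_set == "")
            print "never_set is null"
    }'
}
null_demo
```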

   Humans are used to working in decimal; i.e., base 10.  In base 10,
the digits in each column go from 0 to 9, and then "roll over" into the
next column.  (Remember grade school? 42 is 4 times 10 plus 2.)

   There are other number bases though.  Computers commonly use base 2
or "binary", base 8 or "octal", and base 16 or "hexadecimal".  In
binary, each column represents two times the value in the column to its
right. Each column may contain either a 0 or a 1.  Thus, binary 1010
represents 1 times 8, plus 0 times 4, plus 1 times 2, plus 0 times 1,
or decimal 10.  Octal and hexadecimal are discussed more in Note: Octal
and Hexadecimal Numbers.
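
   The column-by-column reading of a binary number can be sketched as a
small function.  `bin2dec' is a hypothetical helper, not a built-in: it
walks the digits from left to right, doubling the running total before
adding each digit, which is equivalent to weighting the columns by 8,
4, 2, and 1.

```shell
# bin2dec_demo and bin2dec are hypothetical names used for illustration.
bin2dec_demo() {
    awk '
    # Convert a string of 0s and 1s to its decimal value.
    function bin2dec(bits,    i, value) {
        value = 0
        for (i = 1; i <= length(bits); i++)
            value = value * 2 + substr(bits, i, 1)
        return value
    }

    BEGIN {
        print bin2dec("1010")      # 1*8 + 0*4 + 1*2 + 0*1
        print bin2dec("101010")    # 32 + 8 + 2
    }'
}
bin2dec_demo
```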

   Programs are written in programming languages.  Hundreds, if not
thousands, of programming languages exist.  One of the most popular is
the C programming language.  The C language had a very strong influence
on the design of the `awk' language.

   There have been several versions of C.  The first is often referred
to as "K&R" C, after the initials of Brian Kernighan and Dennis Ritchie,
the authors of the first book on C.  (Dennis Ritchie created the
language, and Brian Kernighan was one of the creators of `awk'.)

   In the mid-1980s, an effort began to produce an international
standard for C.  This work culminated in 1989, with the production of
the ANSI standard for C.  This standard became an ISO standard in 1990.
Where it makes sense, POSIX `awk' is compatible with 1990 ISO C.

   In 1999, a revised ISO C standard was approved and released.  Future
versions of `gawk' will be as compatible as possible with this standard.

