Filename Generation
===================

If a word contains an unquoted instance of one of the characters `*',
`(', `|', `<', `[', or `?', it is regarded as a pattern for filename
generation, unless the GLOB option is unset.  If the EXTENDED_GLOB
option is set, the `^' and `#' characters also denote a pattern;
otherwise they are not treated specially by the shell.

The word is replaced with a list of sorted filenames that match the
pattern.  If no matching filename is found, the shell gives an error
message, unless the NULL_GLOB option is set, in which case the word is
deleted; or unless the NOMATCH option is unset, in which case the word
is left unchanged.

In filename generation, the character `/' must be matched explicitly;
also, a `.' must be matched explicitly at the beginning of a pattern or
after a `/', unless the GLOB_DOTS option is set.  No filename
generation pattern matches the files `.' or `..'.  In other instances
of pattern matching, the `/' and `.' are not treated specially.

Glob Operators
--------------

*
     Matches any string, including the null string.

?
     Matches any character.

[...]
     Matches any of the enclosed characters.  Ranges of characters can
     be specified by separating two characters by a `-'.  A `-' or `]'
     may be matched by including it as the first character in the list.
     There are also several named classes of characters, in the form
     `[:NAME:]' with the following meanings:  `[:alnum:]' alphanumeric,
     `[:alpha:]' alphabetic, `[:blank:]' space or tab, `[:cntrl:]'
     control character, `[:digit:]' decimal digit, `[:graph:]'
     printable character except whitespace, `[:lower:]' lowercase
     letter, `[:print:]' printable character, `[:punct:]' printable
     character neither alphanumeric nor whitespace, `[:space:]'
     whitespace character, `[:upper:]' uppercase letter, `[:xdigit:]'
     hexadecimal digit.  These use the macros provided by the operating
     system to test for the given character combinations, including any
     modifications due to local language settings:  see man page
     ctype(3).  Note that the square brackets are additional to those
     enclosing the whole set of characters, so to test for a single
     alphanumeric character you need `[[:alnum:]]'.  Named character
     sets can be used alongside other types, e.g. `[[:alpha:]0-9]'.

[^...]
[!...]
     Like [...], except that it matches any character which is not in
     the given set.

<[X]-[Y]>
     Matches any number in the range X to Y, inclusive.  Either of the
     numbers may be omitted to make the range open-ended; hence `<->'
     matches any number.  To match individual digits, the [...] form is
     more efficient.

     Be careful when using other wildcards adjacent to patterns of this
     form; for example, <0-9>* will actually match any number
     whatsoever at the start of the string, since the `<0-9>' will
     match the first digit, and the `*' will match any others.  This is
     a trap for the unwary, but is in fact an inevitable consequence of
     the rule that the longest possible match always succeeds.
     Expressions such as `<0-9>[^[:digit:]]*' can be used instead.

(...)
     Matches the enclosed pattern.  This is used for grouping.  If the
     KSH_GLOB option is set, then a `@', `*', `+', `?' or `!'
     immediately preceding the `(' is treated specially, as detailed
     below. The option SH_GLOB prevents bare parentheses from being
     used in this way, though the KSH_GLOB option is still available.

     Note that grouping cannot extend over multiple directories: it is
     an error to have a `/' within a group (this only applies for
     patterns used in filename generation).  There is one exception:  a
     group of the form (PAT/)# appearing as a complete path segment can
     match a sequence of directories.  For example, foo/(a*/)#bar
     matches foo/bar, foo/any/bar, foo/any/anyother/bar, and so on.

X|Y
     Matches either X or Y.  This operator has lower precedence than
     any other.  The `|' character must be within parentheses, to avoid
     interpretation as a pipeline.

^X
     (Requires EXTENDED_GLOB to be set.)  Matches anything except the
     pattern X.  This has a higher precedence than `/', so `^foo/bar'
     will search directories in `.' except `./foo' for a file named
     `bar'.

X~Y
     (Requires EXTENDED_GLOB to be set.)  Match anything that matches
     the pattern X but does not match Y.  This has lower precedence
     than any operator except `|', so `*/*~foo/bar' will search for all
     files in all directories in `.'  and then exclude `foo/bar' if
     there was such a match.  Multiple patterns can be excluded by
     `FOO~BAR~BAZ'.  In the exclusion pattern (Y), `/' and `.' are not
     treated specially the way they usually are in globbing.

X#
     (Requires EXTENDED_GLOB to be set.)  Matches zero or more
     occurrences of the pattern X.  This operator has high precedence;
     `12#' is equivalent to `1(2#)', rather than `(12)#'.  It is an
     error for an unquoted `#' to follow something which cannot be
     repeated; this includes an empty string, a pattern already
     followed by `##', or parentheses when part of a KSH_GLOB pattern
     (for example, `!(FOO)#' is invalid and must be replaced by
     `*(!(FOO))').

X##
     (Requires EXTENDED_GLOB to be set.)  Matches one or more
     occurrences of the pattern X.  This operator has high precedence;
     `12##' is equivalent to `1(2##)', rather than `(12)##'.  No more
     than two active `#' characters may appear together.

ksh-like Glob Operators
-----------------------

If the KSH_GLOB option is set, the effects of parentheses can be
modified by a preceding `@', `*', `+', `?' or `!'.  This character need
not be unquoted to have special effects, but the `(' must be.

@(...)
     Match the pattern in the parentheses.  (Like `(...)'.)

*(...)
     Match any number of occurrences.  (Like `(...)#'.)

+(...)
     Match at least one occurrence.  (Like `(...)##'.)

?(...)
     Match zero or one occurrence.  (Like `(|...)'.)

!(...)
     Match anything but the expression in parentheses.  (Like
     `(^(...))'.)

Precedence
----------

The precedence of the operators given above is (highest) `^', `/', `~',
`|' (lowest); the remaining operators are simply treated from left to
right as part of a string, with `#' and `##' applying to the shortest
possible preceding unit (i.e. a character, `?', `[...]', `<...>', or a
parenthesised expression).  As mentioned above, a `/' used as a
directory separator may not appear inside parentheses, while a `|' must
do so; in patterns used in other contexts than filename generation (for
example, in case statements and tests within `[[...]]'), a `/' is not
special; and `/' is also not special after a `~' appearing outside
parentheses in a filename pattern.

Globbing Flags
--------------

There are various flags which affect any text to their right up to the
end of the enclosing group or to the end of the pattern; they require
the EXTENDED_GLOB option. All take the form (#X) where X may have one
of the following forms:

i
     Case insensitive:  upper or lower case characters in the pattern
     match upper or lower case characters.

l
     Lower case characters in the pattern match upper or lower case
     characters; upper case characters in the pattern still only match
     upper case characters.

I
     Case sensitive:  locally negates the effect of i or l from that
     point on.

b
     Activate backreferences for parenthesised groups in the pattern;
     this does not work in filename generation.  When a pattern with a
     set of active parentheses is matched, the strings matched by the
     groups are stored in the array $match, the indices of the
     beginning of the matched parentheses in the array $mbegin, and the
     indices of the end in the array $mend, with the first element of
     each array corresponding to the first parenthesised group, and so
     on.  These arrays are not otherwise special to the shell.  The
     indices use the same convention as does parameter substitution, so
     that elements of $mend and $mbegin may be used in subscripts; the
     KSH_ARRAYS option is respected.  Sets of globbing flags are not
     considered parenthesised groups; only the first nine active
     parentheses can be referenced.

     For example,

          foo="a string with a message"
          if [[ $foo = (a|an)' '(#b)(*)' '* ]]; then
            print ${foo[$mbegin[1],$mend[1]]}
          fi

     prints `string with a'.  Note that the first parenthesis is before
     the (#b) and does not create a backreference.

     Backreferences work with all forms of pattern matching other than
     filename generation, but note that when performing matches on an
     entire array, such as ${ARRAY#PATTERN}, or a global substitution,
     such as ${PARAM//PAT/REPL}, only the data for the last match
     remains available.  In the case of global replacements this may
     still be useful.  See the example for the m flag below.

     The numbering of backreferences strictly follows the order of the
     opening parentheses from left to right in the pattern string,
     although sets of parentheses may be nested.  There are special
     rules for parentheses followed by `#' or `##'.  Only the last
     match of the parenthesis is remembered: for example, in `[[ abab =
     (#b)([ab])# ]]', only the final `b' is stored in match[1].  Thus
     extra parentheses may be necessary to match the complete segment:
     for example, use `X((ab|cd)#)Y' to match a whole string of either
     `ab' or `cd' between `X' and `Y', using the value of $match[1]
     rather than $match[2].

     If the match fails none of the parameters is altered, so in some
     cases it may be necessary to initialise them beforehand.  If some
     of the backreferences fail to match -- which happens if they are
     in an alternate branch which fails to match, or if they are
     followed by # and matched zero times -- then the matched string is
     set to the empty string, and the start and end indices are set to
     -1.

     Pattern matching with backreferences is slightly slower than
     without.

B
     Deactivate backreferences, negating the effect of the b flag from
     that point on.

m
     Set references to the match data for the entire string matched;
     this is similar to backreferencing and does not work in filename
     generation.  The flag must be in effect at the end of the pattern,
     i.e. not local to a group.  The parameters $MATCH, $MBEGIN and
     $MEND will be set to the string matched and to the indices of the
     beginning and end of the string, respectively.  This is most
     useful in parameter substitutions, as otherwise the string matched
     is obvious.

     For example,

          arr=(veldt jynx grimps waqf zho buck)
          print ${arr//(#m)[aeiou]/${(U)MATCH}}

     forces all the matches (i.e. all vowels) into uppercase, printing
     `vEldt jynx grImps wAqf zhO bUck'.

     Unlike backreferences, there is no speed penalty for using match
     references, other than the extra substitutions required for the
     replacement strings in cases such as the example shown.

M
     Deactivate the m flag, hence no references to match data will be
     created.

aNUM
     Approximate matching: NUM errors are allowed in the string matched
     by the pattern.  The rules for this are described in the next
     subsection.

s, e
     Unlike the other flags, these have only a local effect, and each
     must appear on its own:  `(#s)' and `(#e)' are the only valid
     forms.  The `(#s)' flag succeeds only at the start of the test
     string, and the `(#e)' flag succeeds only at the end of the test
     string; they correspond to `^' and `$' in standard regular
     expressions.  They are useful for matching path segments in
     patterns other than those in filename generation (where path
     segments are in any case treated separately).  For example,
     `*((#s)|/)test((#e)|/)*' matches a path segment `test' in any of
     the following strings: test, test/at/start, at/end/test,
     in/test/middle.

     Another use is in parameter substitution; for example
     `${array/(#s)A*Z(#e)}' will remove only elements of an array which
     match the complete pattern `A*Z'.  There are other ways of
     performing many operations of this type, however the combination
     of the substitution operations `/' and `//' with the `(#s)' and
     `(#e)' flags provides a single simple and memorable method.

     Note that assertions of the form `(^(#s))' also work, i.e. match
     anywhere except at the start of the string, although this actually
     means `anything except a zero-length portion at the start of the
     string'; you need to use `(""~(#s))' to match a zero-length
     portion of the string not at the start.

For example, the test string fooxx can be matched by the pattern
(#i)FOOXX, but not by (#l)FOOXX, (#i)FOO(#I)XX or ((#i)FOOX)X.  The
string (#ia2)readme specifies case-insensitive matching of readme with
up to two errors.

When using the ksh syntax for grouping, both KSH_GLOB and EXTENDED_GLOB
must be set and the left parenthesis should be preceded by @.  Note
also that the flags do not affect letters inside [...] groups, in other
words (#i)[a-z] still matches only lowercase letters.  Finally, note
that when examining whole paths case-insensitively every directory must
be searched for all files which match, so that a pattern of the form
(#i)/foo/bar/... is potentially slow.

Approximate Matching
--------------------

When matching approximately, the shell keeps a count of the errors
found, which cannot exceed the number specified in the (#aNUM) flags.
Four types of error are recognised:

1.
     Different characters, as in fooxbar and fooybar.

2.
     Transposition of characters, as in banana and abnana.

3.
     A character missing in the target string, as with the pattern road
     and target string rod.

4.
     An extra character appearing in the target string, as with stove
     and strove.

Thus, the pattern (#a3)abcd matches dcba, with the errors occurring by
using the first rule twice and the second once, grouping the string as
[d][cb][a] and [a][bc][d].

Non-literal parts of the pattern must match exactly, including
characters in character ranges: hence (#a1)??? matches strings of
length four, by applying rule 4 to an empty part of the pattern, but not
strings of length two, since all the ? must match.  Other characters
which must match exactly are initial dots in filenames (unless the
GLOB_DOTS option is set), and all slashes in filenames, so that a/bc is
two errors from ab/c (the slash cannot be transposed with another
character).  Similarly, errors are counted separately for
non-contiguous strings in the pattern, so that (ab|cd)ef is two errors
from aebf.

When using exclusion via the ~ operator, approximate matching is
treated entirely separately for the excluded part and must be activated
separately.  Thus, (#a1)README~READ_ME matches READ.ME but not READ_ME,
as the trailing READ_ME is matched without approximation.  However,
(#a1)README~(#a1)READ_ME does not match any pattern of the form READ?ME
as all such forms are now excluded.

Apart from exclusions, there is only one overall error count; however,
the maximum errors allowed may be altered locally, and this can be
delimited by grouping.  For example, (#a1)cat((#a0)dog)fox allows one
error in total, which may not occur in the dog section, and the pattern
(#a1)cat(#a0)dog(#a1)fox is equivalent.  Note that the point at which
an error is first found is the crucial one for establishing whether to
use approximation; for example, (#a1)abc(#a0)xyz will not match
abcdxyz, because the error occurs at the `x', where approximation is
turned off.

Entire path segments may be matched approximately, so that
`(#a1)/foo/d/is/available/at/the/bar' allows one error in any path
segment.  This is much less efficient than without the (#a1), however,
since every directory in the path must be scanned for a possible
approximate match.  It is best to place the (#a1) after any path
segments which are known to be correct.

Recursive Globbing
------------------

A pathname component of the form `(FOO/)#' matches a path consisting of
zero or more directories matching the pattern FOO.

As a shorthand, `**/' is equivalent to `(*/)#'; note that this
therefore matches files in the current directory as well as
subdirectories.  Thus:

     ls (*/)#bar

or

     ls **/bar

does a recursive directory search for files named `bar' (potentially
including the file `bar' in the current directory).  This form does not
follow symbolic links; the alternative form `***/' does, but is
otherwise identical.  Neither of these can be combined with other forms
of globbing within the same path segment; in that case, the `*'
operators revert to their usual effect.

Glob Qualifiers
---------------

Patterns used for filename generation may end in a list of qualifiers
enclosed in parentheses.  The qualifiers specify which filenames that
otherwise match the given pattern will be inserted in the argument list.

If the option BARE_GLOB_QUAL is set, then a trailing set of parentheses
containing no `|' or `(' characters (or `~' if it is special) is taken
as a set of glob qualifiers.  A glob subexpression that would normally
be taken as glob qualifiers, for example `(^x)', can be forced to be
treated as part of the glob pattern by doubling the parentheses, in
this case producing `((^x))'.

A qualifier may be any one of the following:

/
     directories

.
     plain files

@
     symbolic links

=
     sockets

p
     named pipes (FIFOs)

*
     executable plain files (0100)

%
     device files (character or block special)

%b
     block special files

%c
     character special files

r
     owner-readable files (0400)

w
     owner-writable files (0200)

x
     owner-executable files (0100)

A
     group-readable files (0040)

I
     group-writable files (0020)

E
     group-executable files (0010)

R
     world-readable files (0004)

W
     world-writable files (0002)

X
     world-executable files (0001)

s
     setuid files (04000)

S
     setgid files (02000)

t
     files with the sticky bit (01000)

fSPEC
     files with access rights matching SPEC.  This SPEC may be an octal
     number optionally preceded by a `=', a `+', or a `-'.  If none of
     these characters is given, the behavior is the same as for `='.
     The octal number describes the mode bits to be expected.  With a
     `=', the value given must match the file-modes exactly; with a
     `+', at least the bits in the given number must be set in the
     file-modes; and with a `-', the bits in the number must not be
     set.  Giving a `?' instead of an octal digit anywhere in the
     number ensures that the corresponding bits in the file-modes are
     not checked; this is only useful in combination with `='.

     If the qualifier `f' is followed by any other character, anything
     up to the next matching character (`[', `{', and `<' match `]',
     `}', and `>' respectively; any other character matches itself) is
     taken as a list of comma-separated SUB-SPECs.  Each SUB-SPEC may
     be either an octal number as described above or a list of any of
     the characters `u', `g', `o', and `a', followed by a `=', a `+',
     or a `-', followed by a list of any of the characters `r', `w',
     `x', `s', and `t', or an octal digit.  The first list of
     characters specifies which access rights are to be checked: `u'
     selects those of the owner of the file, `g' those of the group,
     `o' those of other users, and `a' all three.  The `=', `+', and
     `-' again say how the modes are to be checked and have the same
     meaning as described for the first form above.  The second list
     of characters says which access rights are to be expected: `r'
     for read access, `w' for write access, `x' for the right to
     execute the file (or to search a directory), `s' for the setuid
     and setgid bits, and `t' for the sticky bit.

     Thus, `*(f70?)' gives the files for which the owner has read,
     write, and execute permission, and for which other group members
     have no rights, independent of the permissions for other users.
     The pattern `*(f-100)' gives all files for which the owner does
     not have execute permission, and `*(f:gu+w,o-rx:)' gives the files
     for which the owner and the other members of the group have at
     least write permission, and for which other users don't have read
     or execute permission.

eSTRING
     The STRING will be executed as shell code.  The filename will be
     included in the list if and only if the code returns a zero status
     (usually the status of the last command).  The first character
     after the `e' will be used as a separator and anything up to the
     next matching separator will be taken as the STRING; `[', `{',
     and `<' match `]', `}', and `>', respectively, while any other
     character matches itself. Note that expansions must be quoted in
     the STRING to prevent them from being expanded before globbing is
     done.

     During the execution of STRING the filename currently being tested
     is available in the parameter REPLY; the parameter may be altered
     to a string to be inserted into the list instead of the original
     filename.  In addition, the parameter reply may be set to an array
     or a string, which overrides the value of REPLY.  If set to an
     array, the latter is inserted into the command line word by word.

     For example, suppose a directory contains a single file `lonely'.
     Then the expression `*(e:'reply=(${REPLY}{1,2})':)' will cause the
     words `lonely1 lonely2' to be inserted into the command line.
     Note the quotation marks.

dDEV
     files on the device DEV

l[-|+]CT
     files having a link count less than CT (-), greater than CT (+),
     or equal to CT

U
     files owned by the effective user ID

G
     files owned by the effective group ID

uID
     files owned by user ID ID if it is a number; if not, then the
     character after the `u' will be used as a separator and the string
     between it and the next matching separator (`[', `{', and `<'
     match `]', `}', and `>' respectively, any other character matches
     itself) will be taken as a user name, and the user ID of this user
     will be taken (e.g. `u:foo:' or `u[foo]' for user `foo')

gID
     like uID but with group IDs or names

a[Mwhms][-|+]N
     files accessed exactly N days ago.  Files accessed within the last
     N days are selected using a negative value for N (-N).  Files
     accessed more than N days ago are selected by a positive N value
     (+N).  Optional unit specifiers `M', `w', `h', `m' or `s' (e.g.
     `ah5') cause the check to be performed with months (of 30 days),
     weeks, hours, minutes or seconds instead of days, respectively.
     For instance, `echo *(ah-5)' would echo files accessed within the
     last five hours.

m[Mwhms][-|+]N
     like the file access qualifier, except that it uses the file
     modification time.

c[Mwhms][-|+]N
     like the file access qualifier, except that it uses the file inode
     change time.

L[+|-]N
     files less than N bytes (-), more than N bytes (+), or exactly N
     bytes in length. If this flag is directly followed by a `k' (`K'),
     `m' (`M'), or `p' (`P') (e.g. `Lk-50') the check is performed with
     kilobytes, megabytes, or blocks (of 512 bytes) instead.

^
     negates all qualifiers following it

-
     toggles between making the qualifiers work on symbolic links (the
     default) and the files they point to

M
     sets the MARK_DIRS option for the current pattern

T
     appends a trailing qualifier mark to the filenames, analogous to
     the LIST_TYPES option, for the current pattern (overrides M)

N
     sets the NULL_GLOB option for the current pattern

D
     sets the GLOB_DOTS option for the current pattern

n
     sets the NUMERIC_GLOB_SORT option for the current pattern

oC
     specifies how the names of the files should be sorted. If C is n
     they are sorted by name (the default); if it is L they are sorted
     depending on the size (length) of the files; if l they are sorted
     by the number of links; if a, m, or c they are sorted by the time
     of the last access, modification, or inode change respectively; if
     d, files in subdirectories appear before those in the current
     directory at each level of the search -- this is best combined
     with other criteria, for example `odon' to sort on names for files
     within the same directory.  Note that a, m, and c compare the age
     against the current time, hence the first name in the list is the
     youngest file. Also note that the modifiers ^ and - are used, so
     `*(^-oL)' gives a list of all files sorted by file size in
     descending order, following any symbolic links.

OC
     like `o', but sorts in descending order; i.e. `*(^oc)' is the same
     as `*(Oc)' and `*(^Oc)' is the same as `*(oc)'; `Od' puts files in
     the current directory before those in subdirectories at each level
     of the search.

[BEG[,END]]
     specifies which of the matched filenames should be included in the
     returned list. The syntax is the same as for array subscripts. BEG
     and the optional END may be mathematical expressions. As in
     parameter subscripting they may be negative to make them count
     from the last match backward. E.g.: `*(-OL[1,3])' gives a list of
     the names of the three largest files.

More than one of these lists can be combined, separated by commas. The
whole list matches if at least one of the sublists matches (they are
`or'ed, the qualifiers in the sublists are `and'ed).  Some qualifiers,
however, affect all matches generated, independent of the sublist in
which they are given.  These are the qualifiers `M', `T', `N', `D',
`n', `o', `O' and the subscripts given in brackets (`[...]').

If a `:' appears in a qualifier list, the remainder of the expression in
parentheses is interpreted as a modifier (see Modifiers in History
Expansion).  Note that each modifier must be introduced
by a separate `:'.  Note also that the result after modification does
not have to be an existing file.  The name of any existing file can be
followed by a modifier of the form `(:..)' even if no actual filename
generation is performed.  Thus:

     ls *(-/)

lists all directories and symbolic links that point to directories, and

     ls *(%W)

lists all world-writable device files in the current directory, and

     ls *(W,X)

lists all files in the current directory that are world-writable or
world-executable, and

     echo /tmp/foo*(u0^@:t)

outputs the basename of all root-owned files beginning with the string
`foo' in /tmp, ignoring symlinks, and

     ls *.*~(lex|parse).[ch](^D^l1)

lists all files having a link count of one whose names contain a dot
(but not those starting with a dot, since GLOB_DOTS is explicitly
switched off) except for lex.c, lex.h, parse.c and parse.h.

