Info Node: (python2.1-api.info)Unicode Objects

Unicode Objects
---------------

This manual section was written by Marc-Andre Lemburg <mal@lemburg.com>.
These are the basic Unicode object types used for the Unicode
implementation in Python:

`Py_UNICODE'
     This type represents a 16-bit unsigned storage type which is used
     by Python internally as the basis for holding Unicode ordinals. On
     platforms where `wchar_t' is available and is also 16 bits wide,
     `Py_UNICODE' is a typedef alias for `wchar_t' to enhance native
     platform compatibility. On all other platforms, `Py_UNICODE' is a
     typedef alias for `unsigned short'.
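
The selection rule for `Py_UNICODE' can be sketched with the
preprocessor. This is an illustrative stand-in, not the actual Python
header: the names `demo_unicode' and `demo_unicode_bits' are
hypothetical, and comparing `WCHAR_MAX' against 0xFFFF is only an
assumed way of detecting an unsigned 16-bit `wchar_t'.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>   /* WCHAR_MAX */
#include <wchar.h>    /* wchar_t */

/* Hypothetical stand-in for Py_UNICODE: reuse wchar_t only when it is an
   unsigned 16-bit type, otherwise fall back to unsigned short. */
#if WCHAR_MAX == 0xFFFF
typedef wchar_t demo_unicode;
#else
typedef unsigned short demo_unicode;
#endif

/* Number of bits in one storage unit of the chosen type. */
int demo_unicode_bits(void)
{
    return (int)(sizeof(demo_unicode) * CHAR_BIT);
}
```

On a platform with a 32-bit `wchar_t' the fallback branch is taken, so
the storage unit stays an unsigned type of at least 16 bits either way.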

`PyUnicodeObject'
     This subtype of `PyObject' represents a Python Unicode object.

`PyTypeObject PyUnicode_Type'
     This instance of `PyTypeObject' represents the Python Unicode type.

The following APIs are really C macros and can be used to do fast
checks and to access internal read-only data of Unicode objects:

`int PyUnicode_Check(PyObject *o)'
     Returns true if the object O is a Unicode object.

`int PyUnicode_GET_SIZE(PyObject *o)'
     Returns the size of the object.  o has to be a PyUnicodeObject
     (not checked).

`int PyUnicode_GET_DATA_SIZE(PyObject *o)'
     Returns the size of the object's internal buffer in bytes. o has
     to be a PyUnicodeObject (not checked).

`Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)'
     Returns a pointer to the internal Py_UNICODE buffer of the object.
     o has to be a PyUnicodeObject (not checked).

`const char* PyUnicode_AS_DATA(PyObject *o)'
     Returns a (const char *) pointer to the internal buffer of the
     object.  o has to be a PyUnicodeObject (not checked).
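
The relationship between the four accessors can be illustrated with a
small model. This is a sketch under stated assumptions: `demo_unit' and
`demo_ustr' are hypothetical stand-ins for `Py_UNICODE' and
`PyUnicodeObject' (the real object layout is not described here), and
the `demo_*' functions only mimic what the macros are documented to
return. Like the macros, they do not check their argument.

```c
typedef unsigned short demo_unit;   /* stand-in for Py_UNICODE */

/* Hypothetical, simplified stand-in for PyUnicodeObject. */
typedef struct {
    int length;        /* number of storage units */
    demo_unit *str;    /* internal buffer */
} demo_ustr;

/* Analogue of PyUnicode_GET_SIZE: size in storage units. */
int demo_get_size(const demo_ustr *o)
{
    return o->length;
}

/* Analogue of PyUnicode_GET_DATA_SIZE: size of the buffer in bytes. */
int demo_get_data_size(const demo_ustr *o)
{
    return o->length * (int)sizeof(demo_unit);
}

/* Analogue of PyUnicode_AS_UNICODE: the buffer as storage units. */
demo_unit *demo_as_unicode(const demo_ustr *o)
{
    return o->str;
}

/* Analogue of PyUnicode_AS_DATA: the same buffer viewed as bytes. */
const char *demo_as_data(const demo_ustr *o)
{
    return (const char *)o->str;
}
```

In this model the byte size is always the unit count multiplied by
`sizeof(demo_unit)', and both pointer accessors refer to the same
memory.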

Unicode provides many different character properties. The most commonly
needed ones are available through these macros, which are mapped to C
functions depending on the Python configuration.

`int Py_UNICODE_ISSPACE(Py_UNICODE ch)'
     Returns 1/0 depending on whether CH is a whitespace character.

`int Py_UNICODE_ISLOWER(Py_UNICODE ch)'
     Returns 1/0 depending on whether CH is a lowercase character.

`int Py_UNICODE_ISUPPER(Py_UNICODE ch)'
     Returns 1/0 depending on whether CH is an uppercase character.

`int Py_UNICODE_ISTITLE(Py_UNICODE ch)'
     Returns 1/0 depending on whether CH is a titlecase character.

`int Py_UNICODE_ISLINEBREAK(Py_UNICODE ch)'
     Returns 1/0 depending on whether CH is a linebreak character.

`int Py_UNICODE_ISDECIMAL(Py_UNICODE ch)'
     Returns 1/0 depending on whether CH is a decimal character.

`int Py_UNICODE_ISDIGIT(Py_UNICODE ch)'
     Returns 1/0 depending on whether CH is a digit character.

`int Py_UNICODE_ISNUMERIC(Py_UNICODE ch)'
     Returns 1/0 depending on whether CH is a numeric character.

`int Py_UNICODE_ISALPHA(Py_UNICODE ch)'
     Returns 1/0 depending on whether CH is an alphabetic character.

`int Py_UNICODE_ISALNUM(Py_UNICODE ch)'
     Returns 1/0 depending on whether CH is an alphanumeric character.

These APIs can be used for fast direct character conversions:

`Py_UNICODE Py_UNICODE_TOLOWER(Py_UNICODE ch)'
     Returns the character CH converted to lower case.

`Py_UNICODE Py_UNICODE_TOUPPER(Py_UNICODE ch)'
     Returns the character CH converted to upper case.

`Py_UNICODE Py_UNICODE_TOTITLE(Py_UNICODE ch)'
     Returns the character CH converted to title case.

`int Py_UNICODE_TODECIMAL(Py_UNICODE ch)'
     Returns the character CH converted to a non-negative decimal
     integer. Returns -1 in case this is not possible. Does not raise
     exceptions.

`int Py_UNICODE_TODIGIT(Py_UNICODE ch)'
     Returns the character CH converted to a single digit integer.
     Returns -1 in case this is not possible. Does not raise exceptions.

`double Py_UNICODE_TONUMERIC(Py_UNICODE ch)'
     Returns the character CH converted to a (positive) double.
     Returns -1.0 in case this is not possible. Does not raise
     exceptions.
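
Because the numeric conversions signal failure with -1 instead of
raising an exception, the caller has to test every result. The
following sketch shows that pattern for parsing a run of decimal
characters. It rests on assumptions: `demo_unit' stands in for
`Py_UNICODE', and `demo_todecimal' is a hypothetical substitute for
`Py_UNICODE_TODECIMAL' that knows only the ASCII and Arabic-Indic digit
ranges, whereas the real macro consults Python's Unicode database.

```c
#include <limits.h>   /* LONG_MAX */
#include <stddef.h>   /* NULL */

typedef unsigned short demo_unit;   /* stand-in for Py_UNICODE */

/* Hypothetical stand-in for Py_UNICODE_TODECIMAL: returns the decimal
   value of ch, or -1 if ch is not one of the digits it knows about. */
int demo_todecimal(demo_unit ch)
{
    if (ch >= 0x0030 && ch <= 0x0039)   /* ASCII 0-9 */
        return ch - 0x0030;
    if (ch >= 0x0660 && ch <= 0x0669)   /* Arabic-Indic digits */
        return ch - 0x0660;
    return -1;
}

/* Parse n units as a non-negative decimal number. Returns 0 and stores
   the value in *out on success; returns -1 on bad arguments, empty
   input, a non-decimal character, or overflow. */
int demo_parse_decimal(const demo_unit *s, int n, long *out)
{
    long value = 0;
    int i;

    if (s == NULL || out == NULL || n <= 0)
        return -1;
    for (i = 0; i < n; i++) {
        int d = demo_todecimal(s[i]);
        if (d < 0)
            return -1;                     /* -1 sentinel: not a decimal */
        if (value > (LONG_MAX - d) / 10)
            return -1;                     /* would overflow a long */
        value = value * 10 + d;
    }
    *out = value;
    return 0;
}
```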

To create Unicode objects and access their basic sequence properties,
use these APIs:

`PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, int size)'
     Create a Unicode Object from the Py_UNICODE buffer U of the given
     size. If U is not `NULL', the buffer is copied into the new
     object. U may be `NULL', in which case the contents are undefined
     and it is the caller's responsibility to fill in the needed data.
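
The two ways of calling `PyUnicode_FromUnicode' (copy an existing
buffer, or pass `NULL' and fill the data in afterwards) can be modelled
as follows. This is only a sketch: `demo_ustr', `demo_from_units' and
`demo_free' are hypothetical names, the model uses `malloc' rather than
Python's allocator, and it does no reference counting. The extra
trailing unit is a convenience of the sketch, not a documented
guarantee.

```c
#include <stdlib.h>   /* malloc, free */
#include <string.h>   /* memcpy */

typedef unsigned short demo_unit;   /* stand-in for Py_UNICODE */

/* Hypothetical, simplified stand-in for PyUnicodeObject. */
typedef struct {
    int length;
    demo_unit *str;
} demo_ustr;

/* Model of PyUnicode_FromUnicode: allocates room for size units. If u
   is non-NULL its contents are copied; otherwise the first size units
   are left undefined for the caller to fill in. Returns NULL on
   failure. */
demo_ustr *demo_from_units(const demo_unit *u, int size)
{
    demo_ustr *o;

    if (size < 0)
        return NULL;
    o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    o->str = malloc(((size_t)size + 1) * sizeof(demo_unit));
    if (o->str == NULL) {
        free(o);
        return NULL;
    }
    if (u != NULL)
        memcpy(o->str, u, (size_t)size * sizeof(demo_unit));
    o->str[size] = 0;      /* trailing unit, added by this sketch only */
    o->length = size;
    return o;
}

/* Release an object created by demo_from_units. */
void demo_free(demo_ustr *o)
{
    if (o != NULL) {
        free(o->str);
        free(o);
    }
}
```

Since the data is copied, later changes to the source buffer do not
affect the new object.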

`Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)'
     Return a read-only pointer to the Unicode object's internal
     `Py_UNICODE' buffer.

`int PyUnicode_GetSize(PyObject *unicode)'
     Return the length of the Unicode object.

`PyObject* PyUnicode_FromEncodedObject(PyObject *obj, const char *encoding, const char *errors)'
     Coerce an encoded object obj to a Unicode object and return a
     reference with incremented refcount.

     Coercion is done in the following way:
       1. Unicode objects are passed back as-is with incremented
          refcount. Note: these cannot be decoded; passing a non-NULL
          value for encoding will result in a TypeError.

       2. String and other char buffer compatible objects are decoded
          according to the given encoding and using the error handling
          defined by errors. Both can be NULL to have the interface use
          the default values (see the next section for details).

       3. All other objects cause an exception.

     The API returns NULL in case of an error. The caller is
     responsible for decref'ing the returned objects.
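
The three coercion rules amount to a small decision table, sketched
below. The enums and the function `demo_coerce' are hypothetical and
exist only to restate the rules; the sketch classifies the outcome and
performs no decoding.

```c
#include <stddef.h>   /* NULL */

/* Hypothetical classification of the input object. */
typedef enum { DEMO_UNICODE, DEMO_CHAR_BUFFER, DEMO_OTHER } demo_kind;

/* Hypothetical classification of what the coercion does. */
typedef enum { DEMO_PASS_THROUGH, DEMO_DECODE, DEMO_ERROR } demo_outcome;

/* Restates the coercion rules for an object of the given kind and an
   encoding argument that may be NULL. */
demo_outcome demo_coerce(demo_kind kind, const char *encoding)
{
    switch (kind) {
    case DEMO_UNICODE:
        /* Rule 1: passed back as-is; a non-NULL encoding is an error
           (a TypeError) because Unicode objects cannot be decoded. */
        return encoding == NULL ? DEMO_PASS_THROUGH : DEMO_ERROR;
    case DEMO_CHAR_BUFFER:
        /* Rule 2: decoded; a NULL encoding selects the default. */
        return DEMO_DECODE;
    default:
        /* Rule 3: all other objects cause an exception. */
        return DEMO_ERROR;
    }
}
```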

`PyObject* PyUnicode_FromObject(PyObject *obj)'
     Shortcut for PyUnicode_FromEncodedObject(obj, NULL, "strict")
     which is used throughout the interpreter whenever coercion to
     Unicode is needed.

If the platform supports `wchar_t' and provides the header file `wchar.h',
Python can interface directly to this type using the following
functions. Support is optimized if Python's own `Py_UNICODE' type is
identical to the system's `wchar_t'.

`PyObject* PyUnicode_FromWideChar(const wchar_t *w, int size)'
     Create a Unicode Object from the `wchar_t' buffer W of the given
     size. Returns `NULL' on failure.

`int PyUnicode_AsWideChar(PyUnicodeObject *unicode, wchar_t *w, int size)'
     Copies the Unicode Object contents into the `wchar_t' buffer W.
     At most SIZE `wchar_t' characters are copied.  Returns the number
     of `wchar_t' characters copied or -1 in case of an error.
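
The "at most SIZE characters" contract of `PyUnicode_AsWideChar' can be
modelled on a plain buffer. This is a sketch, not the Python
implementation: `demo_unit' stands in for `Py_UNICODE', `demo_as_wide'
is a hypothetical name, and treating bad arguments as the -1 error case
is an assumption. The sketch does not append a terminating null
character.

```c
#include <stddef.h>   /* NULL */
#include <wchar.h>    /* wchar_t */

typedef unsigned short demo_unit;   /* stand-in for Py_UNICODE */

/* Copies at most size units from src (which holds length units) into
   the wchar_t buffer w. Returns the number of characters copied, or
   -1 if an argument is invalid. */
int demo_as_wide(const demo_unit *src, int length, wchar_t *w, int size)
{
    int i, n;

    if (src == NULL || w == NULL || length < 0 || size < 0)
        return -1;
    n = length < size ? length : size;   /* truncate to the output size */
    for (i = 0; i < n; i++)
        w[i] = (wchar_t)src[i];
    return n;
}
```

A return value equal to SIZE therefore cannot distinguish an exact fit
from a truncated copy, so the caller may want to compare it with the
object's length.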
