Whole document tree
    

Whole document tree

libstdc++-v3 HOWTO: Extensions

Extensions

Here we will make an attempt at describing the non-Standard extensions to the library. Some of these are from SGI's STL, some of these are GNU's, and some just seemed to appear on the doorstep.

Before you leap in and use these, be aware of two things:

  1. Non-Standard means exactly that. The behavior, and the very existence, of these extensions may change with little or no warning. (Ideally, the really good ones will appear in the next revision of C++.) Also, other platforms, other compilers, other versions of g++ or libstdc++-v3 may not recognize these names, or treat them differently, or...
  2. You should know how to access these headers properly.


Contents


Ropes and trees and hashes, oh my!

The SGI headers

     <bvector>
     <hash_map>
     <hash_set>
     <rope>
     <slist>
     <tree>
     
are all here; <bvector> exposes the old bit_vector class that was used before specialization of vector<bool> was available (it's actually a typedef for the specialization now). <hash_map> and <hash_set> are discussed further below. <rope> is the SGI specialization for large strings ("rope," "large strings," get it? love those SGI folks). <slist> is a singly-linked list, for when the doubly-linked list<> is too much space overhead, and <tree> exposes the red-black tree classes used in the implementation of the standard maps and sets.

Okay, about those hashing classes... I'm going to foist most of the work off onto SGI's own site.

Each of the associative containers map, multimap, set, and multiset have a counterpart which uses a hashing function to do the arranging, instead of a strict weak ordering function. The classes take as one of their template parameters a function object that will return the hash value; by default, an instantiation of hash. You should specialize this functor for your class, or define your own, before trying to use one of the hashing classes.

The hashing classes support all the usual associative container functions, as well as some extra constructors specifying the number of buckets, etc.

Why would you want to use a hashing class instead of the "normal" implementations? Matt Austern writes:

[W]ith a well chosen hash function, hash tables generally provide much better average-case performance than binary search trees, and much worse worst-case performance. So if your implementation has hash_map, if you don't mind using nonstandard components, and if you aren't scared about the possibility of pathological cases, you'll probably get better performance from hash_map.

(Side note: for those of you wondering, "Why wasn't a hash table included in the Standard in the first #!$@ place?" I'll give a quick answer: it was proposed, but too late and in too unorganized a fashion. Some sort of hashing will undoubtedly be included in a future Standard.

Return to top of page or to the FAQ.


Added members and types

Some of the classes in the Standard Library have additional publicly-available members, and some classes are themselves not in the standard. Of those, some are intended purely for the implementors, for example, additional typedefs. Those won't be described here (or anywhere else).

  • The extensions added by SGI are so numerous that they have their own page. Since the SGI STL is no longer actively maintained, we will try and keep this code working ourselves.
  • filebufs have another ctor with this signature:
    basic_filebuf(__c_file_type*, ios_base::openmode, int_type);
    This comes in very handy in a number of places, such as attaching Unix sockets, pipes, and anything else which uses file descriptors, into the IOStream buffering classes. The three arguments are as follows:
    • __c_file_type* F // the __c_file_type typedef usually boils down to stdio's FILE
    • ios_base::openmode M // same as all the other uses of openmode
    • int_type B // buffer size, defaults to BUFSIZ
    For those wanting to use file descriptors instead of FILE*'s, I invite you to contemplate the mysteries of C's fdopen().

Return to top of page or to the FAQ.


Allocators

Thread-safety, space efficiency, high speed, portability... this is a mess. Where to begin?

The Rules

The C++ standard only gives a few directives in this area:

  • When you add elements to a container, and the container must allocate more memory to hold them, the container makes the request via its Allocator template parameter. This includes adding char's to the string class, which acts as a regular STL container in this respect.
  • The default Allocator of every container-of-T is std::allocator<T>.
  • The interface of the allocator<T> class is extremely simple. It has about 20 public declarations (nested typedefs, member functions, etc), but the two which concern us most are:
          T*    allocate   (size_type n, const void* hint = 0);
          void  deallocate (T* p, size_type n);
    (This is a simplicifcation; the real signatures use nested typedefs.) The "n" arguments in both those functions is a count of the number of T's to allocate space for, not their total size.
  • "The storage is obtained by calling ::operator new(size_t), but it is unspecified when or how often this function is called. The use of hint is unspecified, but intended as an aid to locality if an implementation so desires." [20.4.1.1]/6

Problems and Possibilities

The easiest way of fulfilling the requirements is to call operator new each time a container needs memory, and to call operator delete each time the container releases memory. BUT this method is horribly slow.

Or we can keep old memory around, and reuse it in a pool to save time. The old libstdc++-v2 used a memory pool, and so do we. As of 3.0, it's on by default. The pool is shared among all the containers in the program: when your program's std::vector<int> gets cut in half and frees a bunch of its storage, that memory can be reused by the private std::list<WonkyWidget> brought in from a KDE library that you linked against. And we don't have to call operator's new and delete to pass the memory on, ether, which is a speed bonus. BUT...

What about threads? No problem: in a threadsafe environment, the memory pool is manipulated atomically, so you can grow a container in one thread and shrink it in another, etc. BUT what if threads in libstdc++-v3 aren't set up properly? That's been answered already.

BUT what if you want to use your own allocator? What if you plan on using a runtime-loadable version of malloc() which uses shared telepathic anonymous mmap'd sections serializable over a network, so that memory requests should go through malloc? And what if you need to debug it?

Well then:

Available allocators in namespace std

First I'll describe the situation as it exists for the code which will be released in GCC 3.1. This situation is extremely fluid. Then I'll describe the differences for 3.0.x, which will not change much in this respect.

As a general rule of thumb, users are not allowed to use names which begin with an underscore. This means that to be portable between compilers, none of the following may be used in your program directly. (If you decide to be unportable, then you're free do do what you want, but it's not our fault if stuff breaks.) They are presented here for information for maintainers and contributors in addition to users, but we will probably make them available for users in 3.1 somehow.

These classes are always available:

  • __new_alloc simply wraps ::operator new and ::operator delete.
  • __malloc_alloc_template<int inst> simply wraps malloc and free. There is also a hook for an out-of-memory handler (for new/delete this is taken care of elsewhere). The inst parameter is described below. This class was called malloc_alloc in earlier versions.
  • allocator<T> has already been described; it is The Standard Allocator for instances of T. It uses the internal __alloc typedef (see below) to satisy its requests.
  • __simple_alloc<T,A> is a wrapper around another allocator, A, which itself is an allocator for instances of T. This is primarily used in an internal "allocator traits" class which helps encapsulate the different styles of allocators.
  • __debug_alloc<A> is also a wrapper around an arbitrary allocator A. It passes on slightly increased size requests to A, and uses the extra memory to store size information. When a pointer is passed to deallocate(), the stored size is checked, and assert() is used to guarantee they match.
  • __allocator<T,A> is an adaptor. Many of these allocator classes have a consistent yet non-standard interface. Such classes can be changed to a conforming interface with this wrapper: __allocator<T, __alloc> is thus the same as allocator<T>.

An internal typedef, __mem_interface , is defined to be __new_alloc by default.

Normally, __default_alloc_template<bool thr, int inst> is also available. This is the high-speed pool, called the default node allocator. The reusable memory is shared among identical instantiations of this type. It calls through __mem_interface to obtain new memory when its lists run out. If a client container requests a block larger than a certain threshold size, then the pool is bypassed, and the allocate/deallocate request is passed to __mem_interface directly.

Its inst parameter is described below. The thr boolean determines whether the pool should be manipulated atomically or not. Two typedefs are provided: __alloc is defined as this node allocator with thr=true, and therefore is threadsafe, while __single_client_alloc defines thr=false, and is slightly faster but unsafe for multiple threads.

(Note that the GCC thread abstraction layer allows us to provide safe zero-overhead stubs for the threading routines, if threads were disabled at configuration time. In this situation, __alloc should not be noticably slower than __single_client_alloc.)

[Another threadsafe allocator where each thread keeps its own free list, so that no locking is needed, might be described here.]

A cannon to swat a fly: __USE_MALLOC

If you've already read this advice and decided to define this macro, then the situation changes thusly:

  1. __mem_interface, and
  2. __alloc, and
  3. __single_client_alloc are all typedef'd to __malloc_alloc_template.
  4. __default_alloc_template is no longer available. At all. Anywhere.

Writing your own allocators

Depending on your application (a specific program, a generic library, etc), allocator classes tend to be one of two styles: "SGI" or "standard". See the comments in stl_alloc.h for more information on this crucial difference.

At the bottom of that header is a helper type, _Alloc_traits, and various specializations of it. This allows the container classes to make possible compile-time optimizations based on features of the allocator. You should provide a specialization of this type for your allocator (doing so takes only two or three statements).

Using non-default allocators

You can specify different memory management schemes on a per-container basis, by overriding the default Allocator template parameter. For example, an easy (but nonportable) method of specifying that only malloc/free should be used instead of the default node allocator is:

    std::list <my_type, std::__malloc_alloc_template<0> >  my_malloc_based_list;
Likewise, a debugging form of whichever allocator is currently in use:
    std::deque <my_type, std::__debug_alloc<std::__alloc> >  debug_deque;

inst

The __malloc_alloc_template and __default_alloc_template classes take an integer parameter, called inst here. This number is completely unused.

The point of the number is to allow multiple instantiations of the classes without changing the semantics at all. All three of

    typedef  __default_alloc_template<true,0>    normal;
    typedef  __default_alloc_template<true,1>    private;
    typedef  __default_alloc_template<true,42>   also_private;
behave exactly the same way. However, the memory pool for each type (and remember that different instantiations result in different types) remains separate.

The library uses 0 in all its instantiations. If you wish to keep separate free lists for a particular purpose, use a different number.

3.0.x

For 3.0.x, many of the names were incorrectly not prefixed with underscores. So symbols such as "std::single_client_alloc" are present. Be very careful to not depend on these names any more than you would depend on implementation-only names.

Certain macros like _NOTHREADS and __STL_THREADS can affect the 3.0.x allocators. Do not use them. Those macros have been completely removed for 3.1.

More notes as we remember them...

Return to top of page or to the FAQ.


Compile-time checks

Currently libstdc++-v3 uses the concept checkers from the Boost library to perform optional compile-time checking of template instantiations of the standard containers. They are described in the linked-to page.

Return to top of page or to the FAQ.


LWG Issues

Everybody's got issues. Even the C++ Standard Library.

The Library Working Group, or LWG, is the ISO subcommittee responsible for making changes to the library. They periodically publish an Issues List containing problems and possible solutions. As they reach a consensus on proposed solutions, we often incorporate the solution into libstdc++-v3.

Here are the issues which have resulted in code changes to the library. The links are to the specific defect reports from a partial copy of the Issues List. You can read the full version online at the ISO C++ Committee homepage, linked to on the GCC "Readings" page. If you spend a lot of time reading the issues, we recommend downloading the ZIP file and reading them locally.

(NB: partial copy means that not all links within the lwg-*.html pages will work. Specifically, links to defect reports that have not been accorded full DR status will probably break. Rather than trying to mirror the entire issues list on our overworked web server, we recommend you go to the LWG homepage instead.)

If a DR is not listed here, we may simply not have gotten to it yet; feel free to submit a patch. Search the include/bits and src directories for appearances of _GLIBCPP_RESOLVE_LIB_DEFECTS for examples of style. Note that we usually do not make changes to the code until an issue has reached DR status.

5: string::compare specification questionable
This should be two overloaded functions rather than a single function.
17: Bad bool parsing
Apparently extracting Boolean values was messed up...
22: Member open vs flags
Re-opening a file stream does not clear the state flags.
25: String operator<< uses width() value wrong
Padding issues.
48: Use of non-existent exception constructor
An instance of ios_base::failure is constructed instead.
49: Underspecification of ios_base::sync_with_stdio
The return type is the previous state of synchronization.
50: Copy constructor and assignment operator of ios_base
These members functions are declared private and are thus inaccessible. Specifying the correct semantics of "copying stream state" was deemed too complicated.
68: Extractors for char* should store null at end
And they do now. An editing glitch in the last item in the list of [27.6.1.2.3]/7.
74: Garbled text for codecvt::do_max_length
The text of the standard was gibberish. Typos gone rampant.
83: string::npos vs. string::max_size()
Safety checks on the size of the string should test against max_size() rather than npos.
109: Missing binders for non-const sequence elements
The binder1st and binder2nd didn't have an operator() taking a non-const parameter.
110: istreambuf_iterator::equal not const
This was not a const member function. Note that the DR says to replace the function with a const one; we have instead provided an overloaded version with identical contents.
117: basic_ostream uses nonexistent num_put member functions
num_put::put() was overloaded on the wrong types.
118: basic_istream uses nonexistent num_get member functions
Same as 117, but for num_get::get().
129: Need error indication from seekp() and seekg()
These functions set failbit on error now.
136: seekp, seekg setting wrong streams?
seekp should only set the output stream, and seekg should only set the input stream.
167: Improper use of traits_type::length()
op<< with a const char* was calculating an incorrect number of characters to write.
181: make_pair() unintended behavior
This function used to take its arguments as reference-to-const, now it copies them (pass by value).
195: Should basic_istream::sentry's constructor ever set eofbit?
Yes, it can, specifically if EOF is reached while skipping whitespace.
211: operator>>(istream&, string&) doesn't set failbit
If nothing is extracted into the string, op>> now sets failbit (which can cause an exception, etc, etc).
214: set::find() missing const overload
Both set and multiset were missing overloaded find, lower_bound, upper_bound, and equal_range functions for const instances.
251: basic_stringbuf missing allocator_type
This nested typdef was originally not specified.
265: std::pair::pair() effects overly restrictive
The default ctor would build its members from copies of temporaries; now it simply uses their respective default ctors.
266: bad_exception::~bad_exception() missing Effects clause
The bad_* classes no longer have destructors (they are trivial), since no description of them was ever given.
275: Wrong type in num_get::get() overloads
Similar to 118.

Return to top of page or to the FAQ.


See license.html for copying conditions. Comments and suggestions are welcome, and may be sent to the libstdc++ mailing list.