Parallel Mode

The libstdc++ parallel mode is an experimental parallel
implementation of many algorithms of the C++ Standard Library.
Several of the standard algorithms, for instance
std::sort, are made parallel using OpenMP
annotations. These parallel mode constructs can be invoked by
explicit source declaration or by compiling existing sources with a
specific compiler flag.

  Intro

The following library components in the header
numeric are included in the parallel mode:

  std::accumulate
  std::adjacent_difference
  std::inner_product
  std::partial_sum

The following library components in the header
algorithm are included in the parallel mode:

  std::adjacent_find
  std::count
  std::count_if
  std::equal
  std::find
  std::find_if
  std::find_first_of
  std::for_each
  std::generate
  std::generate_n
  std::lexicographical_compare
  std::mismatch
  std::search
  std::search_n
  std::transform
  std::replace
  std::replace_if
  std::max_element
  std::merge
  std::min_element
  std::nth_element
  std::partial_sort
  std::partition
  std::random_shuffle
  std::set_union
  std::set_intersection
  std::set_symmetric_difference
  std::set_difference
  std::sort
  std::stable_sort
  std::unique_copy

  Semantics

The parallel mode STL algorithms are currently not exception-safe,
i.e. user-defined functors must not throw exceptions.
Also, the order of execution is not guaranteed for some functions.
Therefore, user-defined functors should not have any concurrent side effects.

Since the current GCC OpenMP implementation does not support
OpenMP parallel regions in concurrent threads,
it is not possible to call parallel STL algorithms in
concurrent threads, either.
It might work with other compilers, though.

  Using

  Prerequisite Compiler Flags

Any use of parallel functionality requires additional compiler
and runtime support, in particular support for OpenMP. Adding this support is
not difficult: just compile your application with the compiler
flag -fopenmp. This will link in libgomp, the GNU
OpenMP implementation, whose presence is mandatory.

In addition, hardware that supports atomic operations and a compiler
capable of producing atomic operations are mandatory: GCC defaults to no
support for atomic operations on some common hardware
architectures. Activating atomic operations may require explicit
compiler flags on some targets (like sparc and x86), such
as -march=i686, -march=native or -mcpu=v9. See
the GCC manual for more information.

  Using Parallel Mode

To use the libstdc++ parallel mode, compile your application with
the prerequisite flags as detailed above, and in addition
add -D_GLIBCXX_PARALLEL. This will convert all
use of the standard (sequential) algorithms to the appropriate parallel
equivalents. Please note that this doesn't necessarily mean that
everything will end up being executed in a parallel manner, but
rather that the heuristics and settings coded into the parallel
versions will be used to determine if all, some, or no algorithms
will be executed using parallel variants.

Note that the _GLIBCXX_PARALLEL define may change the
sizes and behavior of standard templates such as
std::search, and therefore one can only link code
compiled with parallel mode and code compiled without parallel mode
if no instantiation of a container is passed between the two
translation units. Parallel mode functionality has distinct linkage,
and cannot be confused with normal mode symbols.

  Using Specific Parallel Components

When it is not feasible to recompile your entire application, or
only specific algorithms need to be parallel-aware, individual
parallel algorithms can be made available explicitly. These
parallel algorithms are functionally equivalent to the standard
drop-in algorithms used in parallel mode, but they are available in
a separate namespace as GNU extensions and may be used in programs
compiled in either release mode or parallel mode.

An example of using a parallel version
of std::sort, but no other parallel algorithms, is:

#include <vector>
#include <parallel/algorithm>

int main()
{
  std::vector<int> v(100);

  // ...

  // Explicitly force a call to parallel sort.
  __gnu_parallel::sort(v.begin(), v.end());
  return 0;
}

Then compile this code with the prerequisite compiler flags
(-fopenmp and any necessary architecture-specific
flags for atomic operations).

The following table provides the names and headers of all the
parallel algorithms that can be used in a similar manner:


        
      
      
Parallel Algorithms

Algorithm                      Header     Parallel algorithm                        Parallel header
std::accumulate                numeric    __gnu_parallel::accumulate                parallel/numeric
std::adjacent_difference       numeric    __gnu_parallel::adjacent_difference       parallel/numeric
std::inner_product             numeric    __gnu_parallel::inner_product             parallel/numeric
std::partial_sum               numeric    __gnu_parallel::partial_sum               parallel/numeric
std::adjacent_find             algorithm  __gnu_parallel::adjacent_find             parallel/algorithm
std::count                     algorithm  __gnu_parallel::count                     parallel/algorithm
std::count_if                  algorithm  __gnu_parallel::count_if                  parallel/algorithm
std::equal                     algorithm  __gnu_parallel::equal                     parallel/algorithm
std::find                      algorithm  __gnu_parallel::find                      parallel/algorithm
std::find_if                   algorithm  __gnu_parallel::find_if                   parallel/algorithm
std::find_first_of             algorithm  __gnu_parallel::find_first_of             parallel/algorithm
std::for_each                  algorithm  __gnu_parallel::for_each                  parallel/algorithm
std::generate                  algorithm  __gnu_parallel::generate                  parallel/algorithm
std::generate_n                algorithm  __gnu_parallel::generate_n                parallel/algorithm
std::lexicographical_compare   algorithm  __gnu_parallel::lexicographical_compare   parallel/algorithm
std::mismatch                  algorithm  __gnu_parallel::mismatch                  parallel/algorithm
std::search                    algorithm  __gnu_parallel::search                    parallel/algorithm
std::search_n                  algorithm  __gnu_parallel::search_n                  parallel/algorithm
std::transform                 algorithm  __gnu_parallel::transform                 parallel/algorithm
std::replace                   algorithm  __gnu_parallel::replace                   parallel/algorithm
std::replace_if                algorithm  __gnu_parallel::replace_if                parallel/algorithm
std::max_element               algorithm  __gnu_parallel::max_element               parallel/algorithm
std::merge                     algorithm  __gnu_parallel::merge                     parallel/algorithm
std::min_element               algorithm  __gnu_parallel::min_element               parallel/algorithm
std::nth_element               algorithm  __gnu_parallel::nth_element               parallel/algorithm
std::partial_sort              algorithm  __gnu_parallel::partial_sort              parallel/algorithm
std::partition                 algorithm  __gnu_parallel::partition                 parallel/algorithm
std::random_shuffle            algorithm  __gnu_parallel::random_shuffle            parallel/algorithm
std::set_union                 algorithm  __gnu_parallel::set_union                 parallel/algorithm
std::set_intersection          algorithm  __gnu_parallel::set_intersection          parallel/algorithm
std::set_symmetric_difference  algorithm  __gnu_parallel::set_symmetric_difference  parallel/algorithm
std::set_difference            algorithm  __gnu_parallel::set_difference            parallel/algorithm
std::sort                      algorithm  __gnu_parallel::sort                      parallel/algorithm
std::stable_sort               algorithm  __gnu_parallel::stable_sort               parallel/algorithm
std::unique_copy               algorithm  __gnu_parallel::unique_copy               parallel/algorithm

  Design

  Interface Basics

All parallel algorithms are intended to have signatures that are
equivalent to the ISO C++ algorithms replaced. For instance, the
std::adjacent_find function is declared as:

namespace std
{
  template<typename _FIter>
    _FIter
    adjacent_find(_FIter, _FIter);
}

This means that there should be something equivalent for the parallel
version. Indeed, this is the case:

namespace std
{
  namespace __parallel
  {
    template<typename _FIter>
      _FIter
      adjacent_find(_FIter, _FIter);

    ...
  }
}

But why the ellipses?

The ellipses in the example above represent additional overloads
required for the parallel version of the function. These additional
overloads are used to dispatch calls from the ISO C++ function
signature to the appropriate parallel function (or sequential
function, if no parallel functions are deemed worthy), based on either
compile-time or run-time conditions.

The available signature options are specific to the different
algorithms/algorithm classes.

The general view of overloads for the parallel algorithms looks like this:

   ISO C++ signature
   ISO C++ signature + sequential_tag argument
   ISO C++ signature + algorithm-specific tag type
    (several signatures)

Please note that the implementation may use additional functions
(designated with the _switch suffix) to dispatch from the
ISO C++ signature to the correct parallel version. Also, some of the
algorithms do not have support for run-time conditions, so the last
overload is missing for them.

  Configuration and Tuning

  Setting up the OpenMP Environment

Several aspects of the overall runtime environment can be manipulated
by standard OpenMP function calls.

To specify the number of threads to be used for the algorithms globally,
use the function omp_set_num_threads. An example:

#include <stdlib.h>
#include <omp.h>

int main()
{
  // Explicitly set number of threads.
  const int threads_wanted = 20;
  omp_set_dynamic(false);
  omp_set_num_threads(threads_wanted);

  // Call parallel mode algorithms.

  return 0;
}

Some algorithms allow the number of threads to be set for a particular call
by augmenting the algorithm variant.
See the next section for further information.

Other parts of the runtime environment that can be manipulated include
nested parallelism (omp_set_nested), schedule kind
(omp_set_schedule), and others. See the OpenMP
documentation for more information.

  Compile Time Switches

  Compile Time Switches

To force an algorithm to execute sequentially, even though parallelism

To force an algorithm to execute sequentially, even though parallelism

is switched on in general via the macro _GLIBCXX_PARALLEL,

is switched on in general via the macro _GLIBCXX_PARALLEL,

add __gnu_parallel::sequential_tag() to the end

add __gnu_parallel::sequential_tag() to the end

of the algorithm's argument list.

of the algorithm's argument list.

Like so:

Like so:

std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag());

std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag());

Some parallel algorithm variants can be excluded from compilation by

Some parallel algorithm variants can be excluded from compilation by

preprocessor defines. See the doxygen documentation on

preprocessor defines. See the doxygen documentation on

compiletime_settings.h and features.h for details.

compiletime_settings.h and features.h for details.

For some algorithms, the desired variant can be chosen at compile-time by

For some algorithms, the desired variant can be chosen at compile-time by

appending a tag object. The available options are specific to the particular

appending a tag object. The available options are specific to the particular

algorithm (class).

algorithm (class).

For the "embarrassingly parallel" algorithms, there is only one "tag object

For the "embarrassingly parallel" algorithms, there is only one "tag object

type", the enum _Parallelism.

type", the enum _Parallelism.

It takes one of the following values,

It takes one of the following values,

__gnu_parallel::parallel_tag,

__gnu_parallel::parallel_tag,

__gnu_parallel::balanced_tag,

__gnu_parallel::balanced_tag,

__gnu_parallel::unbalanced_tag,

__gnu_parallel::unbalanced_tag,

__gnu_parallel::omp_loop_tag,

__gnu_parallel::omp_loop_tag,

__gnu_parallel::omp_loop_static_tag.

__gnu_parallel::omp_loop_static_tag.

This means that the actual parallelization strategy is chosen at run-time.

This means that the actual parallelization strategy is chosen at run-time.

(Choosing the variants at compile-time will come soon.)

(Choosing the variants at compile-time will come soon.)

For the following algorithms, in general, we have
__gnu_parallel::parallel_tag and
__gnu_parallel::default_parallel_tag, in addition to
__gnu_parallel::sequential_tag.
__gnu_parallel::default_parallel_tag chooses the default
algorithm at compile-time, as does omitting the tag.
__gnu_parallel::parallel_tag postpones the decision to run-time
(see next section).
For all tags, the number of threads desired for this call can optionally be
passed to the respective tag's constructor.

The multiway_merge algorithm comes with the additional choices
__gnu_parallel::exact_tag and
__gnu_parallel::sampling_tag.
Exact and sampling are the two available splitting strategies.

For the sort and stable_sort algorithms, there are
several additional choices, namely
__gnu_parallel::multiway_mergesort_tag,
__gnu_parallel::multiway_mergesort_exact_tag,
__gnu_parallel::multiway_mergesort_sampling_tag,
__gnu_parallel::quicksort_tag, and
__gnu_parallel::balanced_quicksort_tag.
Multiway mergesort comes with the two splitting strategies for multi-way
merging. The quicksort options cannot be used for stable_sort.
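The idea of choosing a variant by appending a tag object can be illustrated without the parallel mode headers. The following is a minimal sketch only: the namespace tag_sketch, the function tagged_sort, and the two tag types are hypothetical stand-ins, and the standard sequential algorithms take the place of the real parallel variants. It shows overload resolution on the tag type picking a variant at compile-time, a tag constructor that optionally carries a thread count, and the default taken when the tag is omitted.

```cpp
#include <algorithm>
#include <string>
#include <vector>

namespace tag_sketch  // hypothetical namespace, not part of libstdc++
{
  // Base tag: optionally carries the number of threads requested for a call.
  struct parallel_tag
  {
    unsigned num_threads;  // 0 means "no preference"
    explicit parallel_tag(unsigned n = 0) : num_threads(n) { }
  };

  // Variant tags derive from the base tag.
  struct mergesort_tag : parallel_tag
  {
    explicit mergesort_tag(unsigned n = 0) : parallel_tag(n) { }
  };

  struct quicksort_tag : parallel_tag
  {
    explicit quicksort_tag(unsigned n = 0) : parallel_tag(n) { }
  };

  // Overload resolution on the tag type selects the variant at compile-time.
  // Each overload returns the name of the variant it stands for.
  template<typename Iter>
  std::string tagged_sort(Iter first, Iter last, mergesort_tag)
  {
    std::stable_sort(first, last);  // stand-in for a merge-based variant
    return "mergesort";
  }

  template<typename Iter>
  std::string tagged_sort(Iter first, Iter last, quicksort_tag)
  {
    std::sort(first, last);  // stand-in for a quicksort-based variant
    return "quicksort";
  }

  // Omitting the tag falls back to a default variant.
  template<typename Iter>
  std::string tagged_sort(Iter first, Iter last)
  {
    return tagged_sort(first, last, mergesort_tag());
  }
}
```

In this sketch, a call such as tag_sketch::tagged_sort(v.begin(), v.end(), tag_sketch::quicksort_tag(4)) selects the quicksort stand-in and records a request for four threads in the tag. The stand-ins themselves ignore that value.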

  Run Time Settings and Defaults

The default parallelization strategy, the choice of specific algorithm
strategy, the minimum threshold limits for individual parallel
algorithms, and aspects of the underlying hardware can be specified as
desired by manipulating
__gnu_parallel::_Settings member data.

First off, the choice of parallelization strategy: serial, parallel,
or heuristically deduced. This corresponds
to __gnu_parallel::_Settings::algorithm_strategy and is a
value of enum __gnu_parallel::_AlgorithmStrategy
type. Choices
include: heuristic, force_sequential,
and force_parallel. The default is heuristic.

Next, the sub-choices for algorithm variant, if not fixed at compile-time.
Specific algorithms like find or sort
can be implemented in multiple ways: when this is the case,
a __gnu_parallel::_Settings member exists to
pick the default strategy. For
example, __gnu_parallel::_Settings::sort_algorithm can
take any value of
enum __gnu_parallel::_SortAlgorithm: MWMS, QS,
or QS_BALANCED.

Likewise for setting the minimal threshold for algorithm
parallelization.  Parallelism always incurs some overhead. Thus, it is
not helpful to parallelize operations on very small sets of
data. Because of this, measures are taken to avoid parallelizing below
a certain, pre-determined threshold. For each algorithm, a minimum
problem size is encoded as a variable in the
active __gnu_parallel::_Settings object.  This
threshold variable follows the following naming scheme:
__gnu_parallel::_Settings::[algorithm]_minimal_n.  So,
for fill, the threshold variable
is __gnu_parallel::_Settings::fill_minimal_n.
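How the strategy setting and a per-algorithm threshold might combine can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the namespace threshold_sketch, the struct settings, the function use_parallel_fill, and the threshold value 1000 are all hypothetical, and any other checks the real heuristic may apply are left out.

```cpp
#include <cstddef>

namespace threshold_sketch  // hypothetical names modelled on the text above
{
  enum algorithm_strategy { heuristic, force_sequential, force_parallel };

  struct settings
  {
    algorithm_strategy strategy;
    std::size_t fill_minimal_n;  // minimum problem size for a parallel fill

    // 1000 is an arbitrary example value, not the library's default.
    settings() : strategy(heuristic), fill_minimal_n(1000) { }
  };

  // Decide whether a call on n elements should take the parallel path.
  inline bool use_parallel_fill(const settings& s, std::size_t n)
  {
    switch (s.strategy)
      {
      case force_sequential: return false;
      case force_parallel:   return true;
      case heuristic:        return n >= s.fill_minimal_n;
      }
    return false;  // not reached for valid enumerator values
  }
}
```

With the heuristic strategy, small inputs stay sequential and only inputs at or above the threshold are parallelized. The two forcing strategies bypass the threshold entirely.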

Finally, hardware details like L1/L2 cache size can be hardwired
via __gnu_parallel::_Settings::L1_cache_size and friends.
All these configuration variables can be changed by the user, if
desired.

There exists one global instance of the class _Settings,
i.e., it is a singleton. It can be read and written by calling
__gnu_parallel::_Settings::get and
__gnu_parallel::_Settings::set, respectively.
Please note that the first call returns a const object, so direct manipulation
is forbidden.
See
  settings.h
for complete details.

A small example of tuning the default:

#include <parallel/algorithm>
#include <parallel/settings.h>

int main()
{
  // Start from a default-constructed settings object and change one field.
  __gnu_parallel::_Settings s;
  s.algorithm_strategy = __gnu_parallel::force_parallel;
  // Install the modified copy as the active global settings.
  __gnu_parallel::_Settings::set(s);

  // Do work... all algorithms will be parallelized, always.

  return 0;
}
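The access pattern described for the settings singleton (read through a const object, write by handing over a complete copy) can be modelled in a few lines of standalone C++. This is a sketch only: the namespace settings_sketch, the class settings, its single field, and the helper set_sort_threshold are hypothetical and merely mirror the get/set shape described above.

```cpp
namespace settings_sketch  // hypothetical model, not the library's class
{
  class settings
  {
  public:
    int sort_minimal_n;  // example field; 1000 is an arbitrary value

    settings() : sort_minimal_n(1000) { }

    // Read access: a const reference to the single global instance,
    // so callers cannot modify it directly.
    static const settings& get() { return instance(); }

    // Write access: copy a complete settings object over the global one.
    static void set(const settings& s) { instance() = s; }

  private:
    static settings& instance()
    {
      static settings s;  // the one global instance
      return s;
    }
  };

  // Read-modify-write: copy the active settings, change the copy,
  // then install it.
  inline void set_sort_threshold(int n)
  {
    settings s = settings::get();
    s.sort_minimal_n = n;
    settings::set(s);
  }
}
```

Copying the active settings before changing them, as set_sort_threshold does, keeps all other fields at their current values.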

  Implementation Namespaces

 One namespace contains versions of code that are always
explicitly sequential:
__gnu_serial.

 Two namespaces contain the parallel mode:
std::__parallel and __gnu_parallel.

 Parallel implementations of standard components, including
template helpers to select parallelism, are defined in namespace
std::__parallel. For instance, std::transform from algorithm
has a parallel counterpart in
std::__parallel::transform from parallel/algorithm. In addition, these parallel
implementations are injected into namespace
__gnu_parallel with using declarations.

 Support and general infrastructure is in namespace
__gnu_parallel.

 More information, and an organized index of types and functions
related to the parallel mode on a per-namespace basis, can be found in
the generated source documentation.

  Testing

    Both the normal conformance and regression tests and the
    supplemental performance tests work.

    To run the conformance and regression tests with the parallel mode
    active,

  make check-parallel

    The log and summary files for conformance testing are in the
    testsuite/parallel directory.

    To run the performance tests with the parallel mode active,

  make check-performance-parallel

    The results of performance testing are in the
    testsuite directory, in the file
    libstdc++_performance.sum. In addition, the
    policy-based containers have their own visualizations, which have
    more software dependencies than the usual bare-bones text
    file, and can be generated by using the make
    doc-performance rule in the testsuite's Makefile.

Bibliography

    Parallelization of Bulk Operations for STL Dictionaries
      Johannes Singler, Leonor Frias
      Workshop on Highly Parallel Processing on a Chip (HPPC) 2007. (LNCS)

    The Multi-Core Standard Template Library
      Johannes Singler, Peter Sanders, Felix Putze
      Euro-Par 2007: Parallel Processing. (LNCS 4641)
