OpenCores
URL https://opencores.org/ocsvn/openrisc/openrisc/trunk

Subversion Repositories openrisc

[/] [openrisc/] [tags/] [gnu-src/] [gcc-4.5.1/] [gcc-4.5.1-or32-1.0rc4/] [libstdc++-v3/] [doc/] [xml/] [manual/] [parallel_mode.xml] - Blame information for rev 519

Details | Compare with Previous | View Log

Line No. Rev Author Line
1 424 jeremybenn
2
3
 "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"
4
[ ]>
5
 
6
7
8
 
9
10
  
11
    
12
      C++
13
    
14
    
15
      library
16
    
17
    
18
      parallel
19
    
20
  
21
22
 
23
Parallel Mode
24
 
25
 The libstdc++ parallel mode is an experimental parallel
26
implementation of many algorithms the C++ Standard Library.
27
28
 
29
30
Several of the standard algorithms, for instance
31
std::sort, are made parallel using OpenMP
32
annotations. These parallel mode constructs and can be invoked by
33
explicit source declaration or by compiling existing sources with a
34
specific compiler flag.
35
36
 
37
 
38
39
  Intro
40
 
41
The following library components in the include
42
numeric are included in the parallel mode:
43
44
  std::accumulate
45
  std::adjacent_difference
46
  std::inner_product
47
  std::partial_sum
48
49
 
50
The following library components in the include
51
algorithm are included in the parallel mode:
52
53
  std::adjacent_find
54
  std::count
55
  std::count_if
56
  std::equal
57
  std::find
58
  std::find_if
59
  std::find_first_of
60
  std::for_each
61
  std::generate
62
  std::generate_n
63
  std::lexicographical_compare
64
  std::mismatch
65
  std::search
66
  std::search_n
67
  std::transform
68
  std::replace
69
  std::replace_if
70
  std::max_element
71
  std::merge
72
  std::min_element
73
  std::nth_element
74
  std::partial_sort
75
  std::partition
76
  std::random_shuffle
77
  std::set_union
78
  std::set_intersection
79
  std::set_symmetric_difference
80
  std::set_difference
81
  std::sort
82
  std::stable_sort
83
  std::unique_copy
84
85
 
86
87
 
88
89
  Semantics
90
 
91
 The parallel mode STL algorithms are currently not exception-safe,
92
i.e. user-defined functors must not throw exceptions.
93
Also, the order of execution is not guaranteed for some functions, of course.
94
Therefore, user-defined functors should not have any concurrent side effects.
95
96
 
97
 Since the current GCC OpenMP implementation does not support
98
OpenMP parallel regions in concurrent threads,
99
it is not possible to call parallel STL algorithm in
100
concurrent threads, either.
101
It might work with other compilers, though.
102
 
103
104
 
105
106
  Using
107
 
108
109
  Prerequisite Compiler Flags
110
 
111
112
  Any use of parallel functionality requires additional compiler
113
  and runtime support, in particular support for OpenMP. Adding this support is
114
  not difficult: just compile your application with the compiler
115
  flag -fopenmp. This will link
116
  in libgomp, the GNU
117
  OpenMP implementation,
118
  whose presence is mandatory.
119
120
 
121
122
In addition, hardware that supports atomic operations and a compiler
123
  capable of producing atomic operations is mandatory: GCC defaults to no
124
  support for atomic operations on some common hardware
125
  architectures. Activating atomic operations may require explicit
126
  compiler flags on some targets (like sparc and x86), such
127
  as -march=i686,
128
  -march=native or -mcpu=v9. See
129
  the GCC manual for more information.
130
131
 
132
133
 
134
135
  Using Parallel Mode
136
 
137
138
  To use the libstdc++ parallel mode, compile your application with
139
  the prerequisite flags as detailed above, and in addition
140
  add -D_GLIBCXX_PARALLEL. This will convert all
141
  use of the standard (sequential) algorithms to the appropriate parallel
142
  equivalents. Please note that this doesn't necessarily mean that
143
  everything will end up being executed in a parallel manner, but
144
  rather that the heuristics and settings coded into the parallel
145
  versions will be used to determine if all, some, or no algorithms
146
  will be executed using parallel variants.
147
148
 
149
Note that the _GLIBCXX_PARALLEL define may change the
150
  sizes and behavior of standard class templates such as
151
  std::search, and therefore one can only link code
152
  compiled with parallel mode and code compiled without parallel mode
153
  if no instantiation of a container is passed between the two
154
  translation units. Parallel mode functionality has distinct linkage,
155
  and cannot be confused with normal mode symbols.
156
157
158
 
159
160
  Using Specific Parallel Components
161
 
162
When it is not feasible to recompile your entire application, or
163
  only specific algorithms need to be parallel-aware, individual
164
  parallel algorithms can be made available explicitly. These
165
  parallel algorithms are functionally equivalent to the standard
166
  drop-in algorithms used in parallel mode, but they are available in
167
  a separate namespace as GNU extensions and may be used in programs
168
  compiled with either release mode or with parallel mode.
169
170
 
171
 
172
An example of using a parallel version
173
of std::sort, but no other parallel algorithms, is:
174
175
 
176
177
#include <vector>
178
#include <parallel/algorithm>
179
 
180
int main()
181
{
182
  std::vector<int> v(100);
183
 
184
  // ...
185
 
186
  // Explicitly force a call to parallel sort.
187
  __gnu_parallel::sort(v.begin(), v.end());
188
  return 0;
189
}
190
191
 
192
193
Then compile this code with the prerequisite compiler flags
194
(-fopenmp and any necessary architecture-specific
195
flags for atomic operations.)
196
197
 
198
 The following table provides the names and headers of all the
199
  parallel algorithms that can be used in a similar manner:
200
201
 
202
203
Parallel Algorithms
204
205
206
207
208
209
 
210
211
  
212
    Algorithm
213
    Header
214
    Parallel algorithm
215
    Parallel header
216
  
217
218
 
219
220
  
221
    std::accumulate
222
    numeric
223
    __gnu_parallel::accumulate
224
    parallel/numeric
225
  
226
  
227
    std::adjacent_difference
228
    numeric
229
    __gnu_parallel::adjacent_difference
230
    parallel/numeric
231
  
232
  
233
    std::inner_product
234
    numeric
235
    __gnu_parallel::inner_product
236
    parallel/numeric
237
  
238
  
239
    std::partial_sum
240
    numeric
241
    __gnu_parallel::partial_sum
242
    parallel/numeric
243
  
244
  
245
    std::adjacent_find
246
    algorithm
247
    __gnu_parallel::adjacent_find
248
    parallel/algorithm
249
  
250
 
251
  
252
    std::count
253
    algorithm
254
    __gnu_parallel::count
255
    parallel/algorithm
256
  
257
 
258
  
259
    std::count_if
260
    algorithm
261
    __gnu_parallel::count_if
262
    parallel/algorithm
263
  
264
 
265
  
266
    std::equal
267
    algorithm
268
    __gnu_parallel::equal
269
    parallel/algorithm
270
  
271
 
272
  
273
    std::find
274
    algorithm
275
    __gnu_parallel::find
276
    parallel/algorithm
277
  
278
 
279
  
280
    std::find_if
281
    algorithm
282
    __gnu_parallel::find_if
283
    parallel/algorithm
284
  
285
 
286
  
287
    std::find_first_of
288
    algorithm
289
    __gnu_parallel::find_first_of
290
    parallel/algorithm
291
  
292
 
293
  
294
    std::for_each
295
    algorithm
296
    __gnu_parallel::for_each
297
    parallel/algorithm
298
  
299
 
300
  
301
    std::generate
302
    algorithm
303
    __gnu_parallel::generate
304
    parallel/algorithm
305
  
306
 
307
  
308
    std::generate_n
309
    algorithm
310
    __gnu_parallel::generate_n
311
    parallel/algorithm
312
  
313
 
314
  
315
    std::lexicographical_compare
316
    algorithm
317
    __gnu_parallel::lexicographical_compare
318
    parallel/algorithm
319
  
320
 
321
  
322
    std::mismatch
323
    algorithm
324
    __gnu_parallel::mismatch
325
    parallel/algorithm
326
  
327
 
328
  
329
    std::search
330
    algorithm
331
    __gnu_parallel::search
332
    parallel/algorithm
333
  
334
 
335
  
336
    std::search_n
337
    algorithm
338
    __gnu_parallel::search_n
339
    parallel/algorithm
340
  
341
 
342
  
343
    std::transform
344
    algorithm
345
    __gnu_parallel::transform
346
    parallel/algorithm
347
  
348
 
349
  
350
    std::replace
351
    algorithm
352
    __gnu_parallel::replace
353
    parallel/algorithm
354
  
355
 
356
  
357
    std::replace_if
358
    algorithm
359
    __gnu_parallel::replace_if
360
    parallel/algorithm
361
  
362
 
363
  
364
    std::max_element
365
    algorithm
366
    __gnu_parallel::max_element
367
    parallel/algorithm
368
  
369
 
370
  
371
    std::merge
372
    algorithm
373
    __gnu_parallel::merge
374
    parallel/algorithm
375
  
376
 
377
  
378
    std::min_element
379
    algorithm
380
    __gnu_parallel::min_element
381
    parallel/algorithm
382
  
383
 
384
  
385
    std::nth_element
386
    algorithm
387
    __gnu_parallel::nth_element
388
    parallel/algorithm
389
  
390
 
391
  
392
    std::partial_sort
393
    algorithm
394
    __gnu_parallel::partial_sort
395
    parallel/algorithm
396
  
397
 
398
  
399
    std::partition
400
    algorithm
401
    __gnu_parallel::partition
402
    parallel/algorithm
403
  
404
 
405
  
406
    std::random_shuffle
407
    algorithm
408
    __gnu_parallel::random_shuffle
409
    parallel/algorithm
410
  
411
 
412
  
413
    std::set_union
414
    algorithm
415
    __gnu_parallel::set_union
416
    parallel/algorithm
417
  
418
 
419
  
420
    std::set_intersection
421
    algorithm
422
    __gnu_parallel::set_intersection
423
    parallel/algorithm
424
  
425
 
426
  
427
    std::set_symmetric_difference
428
    algorithm
429
    __gnu_parallel::set_symmetric_difference
430
    parallel/algorithm
431
  
432
 
433
  
434
    std::set_difference
435
    algorithm
436
    __gnu_parallel::set_difference
437
    parallel/algorithm
438
  
439
 
440
  
441
    std::sort
442
    algorithm
443
    __gnu_parallel::sort
444
    parallel/algorithm
445
  
446
 
447
  
448
    std::stable_sort
449
    algorithm
450
    __gnu_parallel::stable_sort
451
    parallel/algorithm
452
  
453
 
454
  
455
    std::unique_copy
456
    algorithm
457
    __gnu_parallel::unique_copy
458
    parallel/algorithm
459
  
460
461
462
463
 
464
465
 
466
467
 
468
469
  Design
470
  
471
  
472
473
  Interface Basics
474
 
475
476
All parallel algorithms are intended to have signatures that are
477
equivalent to the ISO C++ algorithms replaced. For instance, the
478
std::adjacent_find function is declared as:
479
480
481
namespace std
482
{
483
  template<typename _FIter>
484
    _FIter
485
    adjacent_find(_FIter, _FIter);
486
}
487
488
 
489
490
Which means that there should be something equivalent for the parallel
491
version. Indeed, this is the case:
492
493
 
494
495
namespace std
496
{
497
  namespace __parallel
498
  {
499
    template<typename _FIter>
500
      _FIter
501
      adjacent_find(_FIter, _FIter);
502
 
503
    ...
504
  }
505
}
506
507
 
508
But.... why the ellipses?
509
510
 
511
 The ellipses in the example above represent additional overloads
512
required for the parallel version of the function. These additional
513
overloads are used to dispatch calls from the ISO C++ function
514
signature to the appropriate parallel function (or sequential
515
function, if no parallel functions are deemed worthy), based on either
516
compile-time or run-time conditions.
517
518
 
519
 The available signature options are specific for the different
520
algorithms/algorithm classes.
521
 
522
 The general view of overloads for the parallel algorithms look like this:
523
524
525
   ISO C++ signature
526
   ISO C++ signature + sequential_tag argument
527
   ISO C++ signature + algorithm-specific tag type
528
    (several signatures)
529
530
 
531
 Please note that the implementation may use additional functions
532
(designated with the _switch suffix) to dispatch from the
533
ISO C++ signature to the correct parallel version. Also, some of the
534
algorithms do not have support for run-time conditions, so the last
535
overload is therefore missing.
536
537
 
538
 
539
540
 
541
542
  Configuration and Tuning
543
 
544
 
545
546
  Setting up the OpenMP Environment
547
 
548
549
Several aspects of the overall runtime environment can be manipulated
550
by standard OpenMP function calls.
551
552
 
553
554
To specify the number of threads to be used for the algorithms globally,
555
use the function omp_set_num_threads. An example:
556
557
 
558
559
#include <stdlib.h>
560
#include <omp.h>
561
 
562
int main()
563
{
564
  // Explicitly set number of threads.
565
  const int threads_wanted = 20;
566
  omp_set_dynamic(false);
567
  omp_set_num_threads(threads_wanted);
568
 
569
  // Call parallel mode algorithms.
570
 
571
  return 0;
572
}
573
574
 
575
576
 Some algorithms allow the number of threads being set for a particular call,
577
 by augmenting the algorithm variant.
578
 See the next section for further information.
579
580
 
581
582
Other parts of the runtime environment able to be manipulated include
583
nested parallelism (omp_set_nested), schedule kind
584
(omp_set_schedule), and others. See the OpenMP
585
documentation for more information.
586
587
 
588
589
 
590
591
  Compile Time Switches
592
 
593
594
To force an algorithm to execute sequentially, even though parallelism
595
is switched on in general via the macro _GLIBCXX_PARALLEL,
596
add __gnu_parallel::sequential_tag() to the end
597
of the algorithm's argument list.
598
599
 
600
601
Like so:
602
603
 
604
605
std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag());
606
607
 
608
609
Some parallel algorithm variants can be excluded from compilation by
610
preprocessor defines. See the doxygen documentation on
611
compiletime_settings.h and features.h for details.
612
613
 
614
615
For some algorithms, the desired variant can be chosen at compile-time by
616
appending a tag object. The available options are specific to the particular
617
algorithm (class).
618
619
 
620
621
For the "embarrassingly parallel" algorithms, there is only one "tag object
622
type", the enum _Parallelism.
623
It takes one of the following values,
624
__gnu_parallel::parallel_tag,
625
__gnu_parallel::balanced_tag,
626
__gnu_parallel::unbalanced_tag,
627
__gnu_parallel::omp_loop_tag,
628
__gnu_parallel::omp_loop_static_tag.
629
This means that the actual parallelization strategy is chosen at run-time.
630
(Choosing the variants at compile-time will come soon.)
631
632
 
633
634
For the following algorithms in general, we have
635
__gnu_parallel::parallel_tag and
636
__gnu_parallel::default_parallel_tag, in addition to
637
__gnu_parallel::sequential_tag.
638
__gnu_parallel::default_parallel_tag chooses the default
639
algorithm at compiletime, as does omitting the tag.
640
__gnu_parallel::parallel_tag postpones the decision to runtime
641
(see next section).
642
For all tags, the number of threads desired for this call can optionally be
643
passed to the respective tag's constructor.
644
645
 
646
647
The multiway_merge algorithm comes with the additional choices,
648
__gnu_parallel::exact_tag and
649
__gnu_parallel::sampling_tag.
650
Exact and sampling are the two available splitting strategies.
651
652
 
653
654
For the sort and stable_sort algorithms, there are
655
several additional choices, namely
656
__gnu_parallel::multiway_mergesort_tag,
657
__gnu_parallel::multiway_mergesort_exact_tag,
658
__gnu_parallel::multiway_mergesort_sampling_tag,
659
__gnu_parallel::quicksort_tag, and
660
__gnu_parallel::balanced_quicksort_tag.
661
Multiway mergesort comes with the two splitting strategies for multi-way
662
merging. The quicksort options cannot be used for stable_sort.
663
664
 
665
666
 
667
668
  Run Time Settings and Defaults
669
 
670
671
The default parallelization strategy, the choice of specific algorithm
672
strategy, the minimum threshold limits for individual parallel
673
algorithms, and aspects of the underlying hardware can be specified as
674
desired via manipulation
675
of __gnu_parallel::_Settings member data.
676
677
 
678
679
First off, the choice of parallelization strategy: serial, parallel,
680
or heuristically deduced. This corresponds
681
to __gnu_parallel::_Settings::algorithm_strategy and is a
682
value of enum __gnu_parallel::_AlgorithmStrategy
683
type. Choices
684
include: heuristic, force_sequential,
685
and force_parallel. The default is heuristic.
686
687
 
688
 
689
690
Next, the sub-choices for algorithm variant, if not fixed at compile-time.
691
Specific algorithms like find or sort
692
can be implemented in multiple ways: when this is the case,
693
a __gnu_parallel::_Settings member exists to
694
pick the default strategy. For
695
example, __gnu_parallel::_Settings::sort_algorithm can
696
have any values of
697
enum __gnu_parallel::_SortAlgorithm: MWMS, QS,
698
or QS_BALANCED.
699
700
 
701
702
Likewise for setting the minimal threshold for algorithm
703
parallelization.  Parallelism always incurs some overhead. Thus, it is
704
not helpful to parallelize operations on very small sets of
705
data. Because of this, measures are taken to avoid parallelizing below
706
a certain, pre-determined threshold. For each algorithm, a minimum
707
problem size is encoded as a variable in the
708
active __gnu_parallel::_Settings object.  This
709
threshold variable follows the following naming scheme:
710
__gnu_parallel::_Settings::[algorithm]_minimal_n.  So,
711
for fill, the threshold variable
712
is __gnu_parallel::_Settings::fill_minimal_n,
713
714
 
715
716
Finally, hardware details like L1/L2 cache size can be hardwired
717
via __gnu_parallel::_Settings::L1_cache_size and friends.
718
719
 
720
721
722
 
723
724
All these configuration variables can be changed by the user, if
725
desired.
726
There exists one global instance of the class _Settings,
727
i. e. it is a singleton. It can be read and written by calling
728
__gnu_parallel::_Settings::get and
729
__gnu_parallel::_Settings::set, respectively.
730
Please note that the first call return a const object, so direct manipulation
731
is forbidden.
732
See 
733
  settings.h
734
for complete details.
735
736
 
737
738
A small example of tuning the default:
739
740
 
741
742
#include <parallel/algorithm>
743
#include <parallel/settings.h>
744
 
745
int main()
746
{
747
  __gnu_parallel::_Settings s;
748
  s.algorithm_strategy = __gnu_parallel::force_parallel;
749
  __gnu_parallel::_Settings::set(s);
750
 
751
  // Do work... all algorithms will be parallelized, always.
752
 
753
  return 0;
754
}
755
756
 
757
758
 
759
760
 
761
762
  Implementation Namespaces
763
 
764
 One namespace contain versions of code that are always
765
explicitly sequential:
766
__gnu_serial.
767
768
 
769
 Two namespaces contain the parallel mode:
770
std::__parallel and __gnu_parallel.
771
772
 
773
 Parallel implementations of standard components, including
774
template helpers to select parallelism, are defined in namespace
775
std::__parallel. For instance, std::transform from algorithm has a parallel counterpart in
776
std::__parallel::transform from parallel/algorithm. In addition, these parallel
777
implementations are injected into namespace
778
__gnu_parallel with using declarations.
779
780
 
781
 Support and general infrastructure is in namespace
782
__gnu_parallel.
783
784
 
785
 More information, and an organized index of types and functions
786
related to the parallel mode on a per-namespace basis, can be found in
787
the generated source documentation.
788
789
 
790
791
 
792
793
 
794
795
  Testing
796
 
797
  
798
    Both the normal conformance and regression tests and the
799
    supplemental performance tests work.
800
  
801
 
802
  
803
    To run the conformance and regression tests with the parallel mode
804
    active,
805
  
806
 
807
  
808
  make check-parallel
809
  
810
 
811
  
812
    The log and summary files for conformance testing are in the
813
    testsuite/parallel directory.
814
  
815
 
816
  
817
    To run the performance tests with the parallel mode active,
818
  
819
 
820
  
821
  make check-performance-parallel
822
  
823
 
824
  
825
    The result file for performance testing are in the
826
    testsuite directory, in the file
827
    libstdc++_performance.sum. In addition, the
828
    policy-based containers have their own visualizations, which have
829
    additional software dependencies than the usual bare-boned text
830
    file, and can be generated by using the make
831
    doc-performance rule in the testsuite's Makefile.
832
833
834
 
835
836
Bibliography
837
 
838
  
839
    </code></pre></td>
      </tr>
      <tr valign="middle">
         <td>840</td>
         <td></td>
         <td></td>
         <td class="code"><pre><code>      Parallelization of Bulk Operations for STL Dictionaries</code></pre></td>
      </tr>
      <tr valign="middle">
         <td>841</td>
         <td></td>
         <td></td>
         <td class="code"><pre><code>    
842
 
843
    
844
      Johannes
845
      Singler
846
    
847
    
848
      Leonor
849
      Frias
850
    
851
 
852
    
853
      2007
854
      
855
    
856
 
857
    
858
      
859
        Workshop on Highly Parallel Processing on a Chip (HPPC) 2007. (LNCS)
860
      
861
    
862
  
863
 
864
  
865
    </code></pre></td>
      </tr>
      <tr valign="middle">
         <td>866</td>
         <td></td>
         <td></td>
         <td class="code"><pre><code>      The Multi-Core Standard Template Library</code></pre></td>
      </tr>
      <tr valign="middle">
         <td>867</td>
         <td></td>
         <td></td>
         <td class="code"><pre><code>    
868
 
869
    
870
      Johannes
871
      Singler
872
    
873
    
874
      Peter
875
      Sanders
876
    
877
    
878
      Felix
879
      Putze
880
    
881
 
882
    
883
      2007
884
      
885
    
886
 
887
    
888
      
889
         Euro-Par 2007: Parallel Processing. (LNCS 4641)
890
      
891
    
892
  
893
 
894
895
 
896

powered by: WebSVN 2.1.0

© copyright 1999-2024 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.