OpenCores
URL https://opencores.org/ocsvn/openrisc/openrisc/trunk

Subversion Repositories openrisc

[/] [openrisc/] [trunk/] [gnu-dev/] [or1k-gcc/] [libstdc++-v3/] [doc/] [xml/] [manual/] [parallel_mode.xml] - Blame information for rev 742

Details | Compare with Previous | View Log

Line No. Rev Author Line
1 742 jeremybenn
2
         xml:id="manual.ext.parallel_mode" xreflabel="Parallel Mode">
3
4
 
5
Parallel Mode
6
  
7
    
8
      C++
9
    
10
    
11
      library
12
    
13
    
14
      parallel
15
    
16
  
17
18
 
19
 
20
 
21
 The libstdc++ parallel mode is an experimental parallel
22
implementation of many algorithms the C++ Standard Library.
23
24
 
25
26
Several of the standard algorithms, for instance
27
std::sort, are made parallel using OpenMP
28
annotations. These parallel mode constructs and can be invoked by
29
explicit source declaration or by compiling existing sources with a
30
specific compiler flag.
31
32
 
33
 
34
Intro
35
 
36
 
37
The following library components in the include
38
numeric are included in the parallel mode:
39
40
  std::accumulate
41
  std::adjacent_difference
42
  std::inner_product
43
  std::partial_sum
44
45
 
46
The following library components in the include
47
algorithm are included in the parallel mode:
48
49
  std::adjacent_find
50
  std::count
51
  std::count_if
52
  std::equal
53
  std::find
54
  std::find_if
55
  std::find_first_of
56
  std::for_each
57
  std::generate
58
  std::generate_n
59
  std::lexicographical_compare
60
  std::mismatch
61
  std::search
62
  std::search_n
63
  std::transform
64
  std::replace
65
  std::replace_if
66
  std::max_element
67
  std::merge
68
  std::min_element
69
  std::nth_element
70
  std::partial_sort
71
  std::partition
72
  std::random_shuffle
73
  std::set_union
74
  std::set_intersection
75
  std::set_symmetric_difference
76
  std::set_difference
77
  std::sort
78
  std::stable_sort
79
  std::unique_copy
80
81
 
82
83
 
84
Semantics
85
 
86
 
87
 The parallel mode STL algorithms are currently not exception-safe,
88
i.e. user-defined functors must not throw exceptions.
89
Also, the order of execution is not guaranteed for some functions, of course.
90
Therefore, user-defined functors should not have any concurrent side effects.
91
92
 
93
 Since the current GCC OpenMP implementation does not support
94
OpenMP parallel regions in concurrent threads,
95
it is not possible to call parallel STL algorithm in
96
concurrent threads, either.
97
It might work with other compilers, though.
98
 
99
100
 
101
Using
102
 
103
 
104
Prerequisite Compiler Flags
105
 
106
 
107
108
  Any use of parallel functionality requires additional compiler
109
  and runtime support, in particular support for OpenMP. Adding this support is
110
  not difficult: just compile your application with the compiler
111
  flag -fopenmp. This will link
112
  in libgomp, the
113
  OpenMP GNU implementation,
114
  whose presence is mandatory.
115
116
 
117
118
In addition, hardware that supports atomic operations and a compiler
119
  capable of producing atomic operations is mandatory: GCC defaults to no
120
  support for atomic operations on some common hardware
121
  architectures. Activating atomic operations may require explicit
122
  compiler flags on some targets (like sparc and x86), such
123
  as -march=i686,
124
  -march=native or -mcpu=v9. See
125
  the GCC manual for more information.
126
127
 
128
129
 
130
Using Parallel Mode
131
 
132
 
133
134
  To use the libstdc++ parallel mode, compile your application with
135
  the prerequisite flags as detailed above, and in addition
136
  add -D_GLIBCXX_PARALLEL. This will convert all
137
  use of the standard (sequential) algorithms to the appropriate parallel
138
  equivalents. Please note that this doesn't necessarily mean that
139
  everything will end up being executed in a parallel manner, but
140
  rather that the heuristics and settings coded into the parallel
141
  versions will be used to determine if all, some, or no algorithms
142
  will be executed using parallel variants.
143
144
 
145
Note that the _GLIBCXX_PARALLEL define may change the
146
  sizes and behavior of standard class templates such as
147
  std::search, and therefore one can only link code
148
  compiled with parallel mode and code compiled without parallel mode
149
  if no instantiation of a container is passed between the two
150
  translation units. Parallel mode functionality has distinct linkage,
151
  and cannot be confused with normal mode symbols.
152
153
154
 
155
Using Specific Parallel Components
156
 
157
 
158
When it is not feasible to recompile your entire application, or
159
  only specific algorithms need to be parallel-aware, individual
160
  parallel algorithms can be made available explicitly. These
161
  parallel algorithms are functionally equivalent to the standard
162
  drop-in algorithms used in parallel mode, but they are available in
163
  a separate namespace as GNU extensions and may be used in programs
164
  compiled with either release mode or with parallel mode.
165
166
 
167
 
168
An example of using a parallel version
169
of std::sort, but no other parallel algorithms, is:
170
171
 
172
173
#include <vector>
174
#include <parallel/algorithm>
175
 
176
int main()
177
{
178
  std::vector<int> v(100);
179
 
180
  // ...
181
 
182
  // Explicitly force a call to parallel sort.
183
  __gnu_parallel::sort(v.begin(), v.end());
184
  return 0;
185
}
186
187
 
188
189
Then compile this code with the prerequisite compiler flags
190
(-fopenmp and any necessary architecture-specific
191
flags for atomic operations.)
192
193
 
194
 The following table provides the names and headers of all the
195
  parallel algorithms that can be used in a similar manner:
196
197
 
198
199
Parallel Algorithms
200
 
201
202
203
204
205
206
 
207
208
  
209
    Algorithm
210
    Header
211
    Parallel algorithm
212
    Parallel header
213
  
214
215
 
216
217
  
218
    std::accumulate
219
    numeric
220
    __gnu_parallel::accumulate
221
    parallel/numeric
222
  
223
  
224
    std::adjacent_difference
225
    numeric
226
    __gnu_parallel::adjacent_difference
227
    parallel/numeric
228
  
229
  
230
    std::inner_product
231
    numeric
232
    __gnu_parallel::inner_product
233
    parallel/numeric
234
  
235
  
236
    std::partial_sum
237
    numeric
238
    __gnu_parallel::partial_sum
239
    parallel/numeric
240
  
241
  
242
    std::adjacent_find
243
    algorithm
244
    __gnu_parallel::adjacent_find
245
    parallel/algorithm
246
  
247
 
248
  
249
    std::count
250
    algorithm
251
    __gnu_parallel::count
252
    parallel/algorithm
253
  
254
 
255
  
256
    std::count_if
257
    algorithm
258
    __gnu_parallel::count_if
259
    parallel/algorithm
260
  
261
 
262
  
263
    std::equal
264
    algorithm
265
    __gnu_parallel::equal
266
    parallel/algorithm
267
  
268
 
269
  
270
    std::find
271
    algorithm
272
    __gnu_parallel::find
273
    parallel/algorithm
274
  
275
 
276
  
277
    std::find_if
278
    algorithm
279
    __gnu_parallel::find_if
280
    parallel/algorithm
281
  
282
 
283
  
284
    std::find_first_of
285
    algorithm
286
    __gnu_parallel::find_first_of
287
    parallel/algorithm
288
  
289
 
290
  
291
    std::for_each
292
    algorithm
293
    __gnu_parallel::for_each
294
    parallel/algorithm
295
  
296
 
297
  
298
    std::generate
299
    algorithm
300
    __gnu_parallel::generate
301
    parallel/algorithm
302
  
303
 
304
  
305
    std::generate_n
306
    algorithm
307
    __gnu_parallel::generate_n
308
    parallel/algorithm
309
  
310
 
311
  
312
    std::lexicographical_compare
313
    algorithm
314
    __gnu_parallel::lexicographical_compare
315
    parallel/algorithm
316
  
317
 
318
  
319
    std::mismatch
320
    algorithm
321
    __gnu_parallel::mismatch
322
    parallel/algorithm
323
  
324
 
325
  
326
    std::search
327
    algorithm
328
    __gnu_parallel::search
329
    parallel/algorithm
330
  
331
 
332
  
333
    std::search_n
334
    algorithm
335
    __gnu_parallel::search_n
336
    parallel/algorithm
337
  
338
 
339
  
340
    std::transform
341
    algorithm
342
    __gnu_parallel::transform
343
    parallel/algorithm
344
  
345
 
346
  
347
    std::replace
348
    algorithm
349
    __gnu_parallel::replace
350
    parallel/algorithm
351
  
352
 
353
  
354
    std::replace_if
355
    algorithm
356
    __gnu_parallel::replace_if
357
    parallel/algorithm
358
  
359
 
360
  
361
    std::max_element
362
    algorithm
363
    __gnu_parallel::max_element
364
    parallel/algorithm
365
  
366
 
367
  
368
    std::merge
369
    algorithm
370
    __gnu_parallel::merge
371
    parallel/algorithm
372
  
373
 
374
  
375
    std::min_element
376
    algorithm
377
    __gnu_parallel::min_element
378
    parallel/algorithm
379
  
380
 
381
  
382
    std::nth_element
383
    algorithm
384
    __gnu_parallel::nth_element
385
    parallel/algorithm
386
  
387
 
388
  
389
    std::partial_sort
390
    algorithm
391
    __gnu_parallel::partial_sort
392
    parallel/algorithm
393
  
394
 
395
  
396
    std::partition
397
    algorithm
398
    __gnu_parallel::partition
399
    parallel/algorithm
400
  
401
 
402
  
403
    std::random_shuffle
404
    algorithm
405
    __gnu_parallel::random_shuffle
406
    parallel/algorithm
407
  
408
 
409
  
410
    std::set_union
411
    algorithm
412
    __gnu_parallel::set_union
413
    parallel/algorithm
414
  
415
 
416
  
417
    std::set_intersection
418
    algorithm
419
    __gnu_parallel::set_intersection
420
    parallel/algorithm
421
  
422
 
423
  
424
    std::set_symmetric_difference
425
    algorithm
426
    __gnu_parallel::set_symmetric_difference
427
    parallel/algorithm
428
  
429
 
430
  
431
    std::set_difference
432
    algorithm
433
    __gnu_parallel::set_difference
434
    parallel/algorithm
435
  
436
 
437
  
438
    std::sort
439
    algorithm
440
    __gnu_parallel::sort
441
    parallel/algorithm
442
  
443
 
444
  
445
    std::stable_sort
446
    algorithm
447
    __gnu_parallel::stable_sort
448
    parallel/algorithm
449
  
450
 
451
  
452
    std::unique_copy
453
    algorithm
454
    __gnu_parallel::unique_copy
455
    parallel/algorithm
456
  
457
458
459
460
 
461
462
 
463
464
 
465
Design
466
 
467
  
468
  
469
Interface Basics
470
 
471
 
472
473
All parallel algorithms are intended to have signatures that are
474
equivalent to the ISO C++ algorithms replaced. For instance, the
475
std::adjacent_find function is declared as:
476
477
478
namespace std
479
{
480
  template<typename _FIter>
481
    _FIter
482
    adjacent_find(_FIter, _FIter);
483
}
484
485
 
486
487
Which means that there should be something equivalent for the parallel
488
version. Indeed, this is the case:
489
490
 
491
492
namespace std
493
{
494
  namespace __parallel
495
  {
496
    template<typename _FIter>
497
      _FIter
498
      adjacent_find(_FIter, _FIter);
499
 
500
    ...
501
  }
502
}
503
504
 
505
But.... why the ellipses?
506
507
 
508
 The ellipses in the example above represent additional overloads
509
required for the parallel version of the function. These additional
510
overloads are used to dispatch calls from the ISO C++ function
511
signature to the appropriate parallel function (or sequential
512
function, if no parallel functions are deemed worthy), based on either
513
compile-time or run-time conditions.
514
515
 
516
 The available signature options are specific for the different
517
algorithms/algorithm classes.
518
 
519
 The general view of overloads for the parallel algorithms look like this:
520
521
522
   ISO C++ signature
523
   ISO C++ signature + sequential_tag argument
524
   ISO C++ signature + algorithm-specific tag type
525
    (several signatures)
526
527
 
528
 Please note that the implementation may use additional functions
529
(designated with the _switch suffix) to dispatch from the
530
ISO C++ signature to the correct parallel version. Also, some of the
531
algorithms do not have support for run-time conditions, so the last
532
overload is therefore missing.
533
534
 
535
 
536
537
 
538
Configuration and Tuning
539
 
540
 
541
 
542
Setting up the OpenMP Environment
543
 
544
 
545
546
Several aspects of the overall runtime environment can be manipulated
547
by standard OpenMP function calls.
548
549
 
550
551
To specify the number of threads to be used for the algorithms globally,
552
use the function omp_set_num_threads. An example:
553
554
 
555
556
#include <stdlib.h>
557
#include <omp.h>
558
 
559
int main()
560
{
561
  // Explicitly set number of threads.
562
  const int threads_wanted = 20;
563
  omp_set_dynamic(false);
564
  omp_set_num_threads(threads_wanted);
565
 
566
  // Call parallel mode algorithms.
567
 
568
  return 0;
569
}
570
571
 
572
573
 Some algorithms allow the number of threads being set for a particular call,
574
 by augmenting the algorithm variant.
575
 See the next section for further information.
576
577
 
578
579
Other parts of the runtime environment able to be manipulated include
580
nested parallelism (omp_set_nested), schedule kind
581
(omp_set_schedule), and others. See the OpenMP
582
documentation for more information.
583
584
 
585
586
 
587
Compile Time Switches
588
 
589
 
590
591
To force an algorithm to execute sequentially, even though parallelism
592
is switched on in general via the macro _GLIBCXX_PARALLEL,
593
add __gnu_parallel::sequential_tag() to the end
594
of the algorithm's argument list.
595
596
 
597
598
Like so:
599
600
 
601
602
std::sort(v.begin(), v.end(), __gnu_parallel::sequential_tag());
603
604
 
605
606
Some parallel algorithm variants can be excluded from compilation by
607
preprocessor defines. See the doxygen documentation on
608
compiletime_settings.h and features.h for details.
609
610
 
611
612
For some algorithms, the desired variant can be chosen at compile-time by
613
appending a tag object. The available options are specific to the particular
614
algorithm (class).
615
616
 
617
618
For the "embarrassingly parallel" algorithms, there is only one "tag object
619
type", the enum _Parallelism.
620
It takes one of the following values,
621
__gnu_parallel::parallel_tag,
622
__gnu_parallel::balanced_tag,
623
__gnu_parallel::unbalanced_tag,
624
__gnu_parallel::omp_loop_tag,
625
__gnu_parallel::omp_loop_static_tag.
626
This means that the actual parallelization strategy is chosen at run-time.
627
(Choosing the variants at compile-time will come soon.)
628
629
 
630
631
For the following algorithms in general, we have
632
__gnu_parallel::parallel_tag and
633
__gnu_parallel::default_parallel_tag, in addition to
634
__gnu_parallel::sequential_tag.
635
__gnu_parallel::default_parallel_tag chooses the default
636
algorithm at compiletime, as does omitting the tag.
637
__gnu_parallel::parallel_tag postpones the decision to runtime
638
(see next section).
639
For all tags, the number of threads desired for this call can optionally be
640
passed to the respective tag's constructor.
641
642
 
643
644
The multiway_merge algorithm comes with the additional choices,
645
__gnu_parallel::exact_tag and
646
__gnu_parallel::sampling_tag.
647
Exact and sampling are the two available splitting strategies.
648
649
 
650
651
For the sort and stable_sort algorithms, there are
652
several additional choices, namely
653
__gnu_parallel::multiway_mergesort_tag,
654
__gnu_parallel::multiway_mergesort_exact_tag,
655
__gnu_parallel::multiway_mergesort_sampling_tag,
656
__gnu_parallel::quicksort_tag, and
657
__gnu_parallel::balanced_quicksort_tag.
658
Multiway mergesort comes with the two splitting strategies for multi-way
659
merging. The quicksort options cannot be used for stable_sort.
660
661
 
662
663
 
664
Run Time Settings and Defaults
665
 
666
 
667
668
The default parallelization strategy, the choice of specific algorithm
669
strategy, the minimum threshold limits for individual parallel
670
algorithms, and aspects of the underlying hardware can be specified as
671
desired via manipulation
672
of __gnu_parallel::_Settings member data.
673
674
 
675
676
First off, the choice of parallelization strategy: serial, parallel,
677
or heuristically deduced. This corresponds
678
to __gnu_parallel::_Settings::algorithm_strategy and is a
679
value of enum __gnu_parallel::_AlgorithmStrategy
680
type. Choices
681
include: heuristic, force_sequential,
682
and force_parallel. The default is heuristic.
683
684
 
685
 
686
687
Next, the sub-choices for algorithm variant, if not fixed at compile-time.
688
Specific algorithms like find or sort
689
can be implemented in multiple ways: when this is the case,
690
a __gnu_parallel::_Settings member exists to
691
pick the default strategy. For
692
example, __gnu_parallel::_Settings::sort_algorithm can
693
have any values of
694
enum __gnu_parallel::_SortAlgorithm: MWMS, QS,
695
or QS_BALANCED.
696
697
 
698
699
Likewise for setting the minimal threshold for algorithm
700
parallelization.  Parallelism always incurs some overhead. Thus, it is
701
not helpful to parallelize operations on very small sets of
702
data. Because of this, measures are taken to avoid parallelizing below
703
a certain, pre-determined threshold. For each algorithm, a minimum
704
problem size is encoded as a variable in the
705
active __gnu_parallel::_Settings object.  This
706
threshold variable follows the following naming scheme:
707
__gnu_parallel::_Settings::[algorithm]_minimal_n.  So,
708
for fill, the threshold variable
709
is __gnu_parallel::_Settings::fill_minimal_n,
710
711
 
712
713
Finally, hardware details like L1/L2 cache size can be hardwired
714
via __gnu_parallel::_Settings::L1_cache_size and friends.
715
716
 
717
718
719
 
720
721
All these configuration variables can be changed by the user, if
722
desired.
723
There exists one global instance of the class _Settings,
724
i. e. it is a singleton. It can be read and written by calling
725
__gnu_parallel::_Settings::get and
726
__gnu_parallel::_Settings::set, respectively.
727
Please note that the first call return a const object, so direct manipulation
728
is forbidden.
729
See 
730
  settings.h
731
for complete details.
732
733
 
734
735
A small example of tuning the default:
736
737
 
738
739
#include <parallel/algorithm>
740
#include <parallel/settings.h>
741
 
742
int main()
743
{
744
  __gnu_parallel::_Settings s;
745
  s.algorithm_strategy = __gnu_parallel::force_parallel;
746
  __gnu_parallel::_Settings::set(s);
747
 
748
  // Do work... all algorithms will be parallelized, always.
749
 
750
  return 0;
751
}
752
753
 
754
755
 
756
757
 
758
Implementation Namespaces
759
 
760
 
761
 One namespace contain versions of code that are always
762
explicitly sequential:
763
__gnu_serial.
764
765
 
766
 Two namespaces contain the parallel mode:
767
std::__parallel and __gnu_parallel.
768
769
 
770
 Parallel implementations of standard components, including
771
template helpers to select parallelism, are defined in namespace
772
std::__parallel. For instance, std::transform from algorithm has a parallel counterpart in
773
std::__parallel::transform from parallel/algorithm. In addition, these parallel
774
implementations are injected into namespace
775
__gnu_parallel with using declarations.
776
777
 
778
 Support and general infrastructure is in namespace
779
__gnu_parallel.
780
781
 
782
 More information, and an organized index of types and functions
783
related to the parallel mode on a per-namespace basis, can be found in
784
the generated source documentation.
785
786
 
787
788
 
789
790
 
791
Testing
792
 
793
 
794
  
795
    Both the normal conformance and regression tests and the
796
    supplemental performance tests work.
797
  
798
 
799
  
800
    To run the conformance and regression tests with the parallel mode
801
    active,
802
  
803
 
804
  
805
  make check-parallel
806
  
807
 
808
  
809
    The log and summary files for conformance testing are in the
810
    testsuite/parallel directory.
811
  
812
 
813
  
814
    To run the performance tests with the parallel mode active,
815
  
816
 
817
  
818
  make check-performance-parallel
819
  
820
 
821
  
822
    The result file for performance testing are in the
823
    testsuite directory, in the file
824
    libstdc++_performance.sum. In addition, the
825
    policy-based containers have their own visualizations, which have
826
    additional software dependencies than the usual bare-boned text
827
    file, and can be generated by using the make
828
    doc-performance rule in the testsuite's Makefile.
829
830
831
 
832
Bibliography
833
 
834
 
835
  
836
    
837
      Parallelization of Bulk Operations for STL Dictionaries
838
    
839
 
840
    JohannesSingler
841
    LeonorFrias
842
 
843
    
844
      2007
845
      
846
    
847
 
848
    
849
      
850
        Workshop on Highly Parallel Processing on a Chip (HPPC) 2007. (LNCS)
851
      
852
    
853
  
854
 
855
  
856
    
857
      The Multi-Core Standard Template Library
858
    
859
 
860
    JohannesSingler
861
    PeterSanders
862
    FelixPutze
863
 
864
    
865
      2007
866
      
867
    
868
 
869
    
870
      
871
         Euro-Par 2007: Parallel Processing. (LNCS 4641)
872
      
873
    
874
  
875
 
876
877
 
878

powered by: WebSVN 2.1.0

© copyright 1999-2024 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.