------------------------------------------------------------------------
The list of most significant changes made over time in
Intel(R) Threading Building Blocks (Intel(R) TBB).
Intel TBB 2017 Update 7
TBB_INTERFACE_VERSION == 9107
Changes (w.r.t. Intel TBB 2017 Update 6):
- In huge pages mode, the memory allocator can now also use
transparent huge pages.
Preview Features:
- Added support for Intel TBB integration into CMake-aware
projects, with valuable guidance and feedback provided by Brad King
(Kitware).
Bugs fixed:
- Fixed scalable_allocation_command(TBBMALLOC_CLEAN_ALL_BUFFERS, 0)
to process memory left after exited threads.
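Each entry in this file records the TBB_INTERFACE_VERSION value of its
release, so client code can gate features at compile time. The following
is a minimal sketch, assuming the macro is supplied by the TBB headers in
a real build. The fallback definition and the names
has_this_task_arena_max_concurrency and interface_at_least are
hypothetical and exist only so the snippet compiles on its own. The
threshold 9100 is the value this file lists for Intel TBB 2017, the
release that added tbb::this_task_arena::max_concurrency().

```cpp
#include <cassert>

// In a real build the TBB headers define this macro. The fallback below is
// an assumption made only so this sketch compiles without TBB installed.
#ifndef TBB_INTERFACE_VERSION
#define TBB_INTERFACE_VERSION 9107  // Intel TBB 2017 Update 7, per this file
#endif

// Hypothetical feature flag: true when the library is at least
// Intel TBB 2017 (TBB_INTERFACE_VERSION == 9100).
#if TBB_INTERFACE_VERSION >= 9100
constexpr bool has_this_task_arena_max_concurrency = true;
#else
constexpr bool has_this_task_arena_max_concurrency = false;
#endif

// Hypothetical helper for comparing two interface version numbers,
// for example values read from a configuration or a log.
constexpr bool interface_at_least(int actual, int required) {
    return actual >= required;
}
```

Because the values in this file increase from release to release, a plain
integer comparison is enough to order them.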
------------------------------------------------------------------------
Intel TBB 2017 Update 6
TBB_INTERFACE_VERSION == 9106
Changes (w.r.t. Intel TBB 2017 Update 5):
- Added support for Android* NDK r14.
Preview Features:
- Added a blocking terminate extension to the task_scheduler_init class
that allows an object to wait for termination of worker threads.
Bugs fixed:
- Fixed compilation and testing issues with MinGW (GCC 6).
- Fixed compilation with /std:c++latest option of VS 2017
(https://github.com/01org/tbb/issues/13).
------------------------------------------------------------------------
Intel TBB 2017 Update 5
TBB_INTERFACE_VERSION == 9105
Changes (w.r.t. Intel TBB 2017 Update 4):
- Added support for Microsoft* Visual Studio* 2017.
- Added graph/matmult example to demonstrate support for compute offload
to Intel(R) Graphics Technology in the flow graph API.
- The "compiler" build option now allows specifying a full path to the
compiler.
Changes affecting backward compatibility:
- Constructors for many classes, including graph nodes, concurrent
containers, thread-local containers, etc., are declared explicit and
cannot be used for implicit conversions anymore.
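The effect of declaring a constructor explicit can be illustrated without
TBB. The sketch below uses a hypothetical class, buffer_like, as a
stand-in for an affected class; it is not a TBB type, and the helper
capacity_of is likewise an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a container-like class; not a TBB type.
class buffer_like {
public:
    explicit buffer_like(std::size_t capacity) : capacity_(capacity) {}
    std::size_t capacity() const { return capacity_; }
private:
    std::size_t capacity_;
};

// Takes the class by value, so a call such as capacity_of(8) would need an
// implicit conversion from an integer and therefore does not compile.
inline std::size_t capacity_of(buffer_like b) { return b.capacity(); }

// Direct construction from a size still works...
static_assert(std::is_constructible<buffer_like, std::size_t>::value,
              "explicit construction is allowed");
// ...but the implicit conversion is rejected.
static_assert(!std::is_convertible<std::size_t, buffer_like>::value,
              "implicit conversion is not allowed");
```

Code that relied on such conversions needs to spell out the constructor
call, for example capacity_of(buffer_like(8)).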
Bugs fixed:
- Added a workaround for bug 16657 in the GNU C Library (glibc)
affecting the debug version of tbb::mutex.
- Fixed a crash in pool_identify() called for an object allocated in
another thread.
------------------------------------------------------------------------
Intel TBB 2017 Update 4
TBB_INTERFACE_VERSION == 9104
Changes (w.r.t. Intel TBB 2017 Update 3):
- Added support for C++11 move semantics in parallel_do.
- Added support for FreeBSD* 11.
Changes affecting backward compatibility:
- Minimal compiler versions required for support of C++11 move semantics
raised to GCC 4.5, VS 2012, and Intel(R) C++ Compiler 14.0.
Bugs fixed:
- The workaround for crashes in the library compiled with GCC 6
(-flifetime-dse=1) was extended to Windows*.
------------------------------------------------------------------------
Intel TBB 2017 Update 3
TBB_INTERFACE_VERSION == 9103
Changes (w.r.t. Intel TBB 2017 Update 2):
- Added support for Android* 7.0 and Android* NDK r13, r13b.
Preview Features:
- Added template class gfx_factory to the flow graph API. It implements
the Factory concept for streaming_node to offload computations to
Intel(R) processor graphics.
Bugs fixed:
- Fixed a possible deadlock caused by missed wakeup signals in
task_arena::execute().
Open-source contributions integrated:
- A build fix for Linux* s390x platform by Jerry J.
------------------------------------------------------------------------
Intel TBB 2017 Update 2
TBB_INTERFACE_VERSION == 9102
Changes (w.r.t. Intel TBB 2017 Update 1):
- Removed the long-outdated support for Xbox* consoles.
Bugs fixed:
- Fixed the issue with task_arena::execute() not being processed when
the calling thread cannot join the arena.
- Fixed dynamic memory allocation replacement failure on macOS* 10.12.
------------------------------------------------------------------------
Intel TBB 2017 Update 1
TBB_INTERFACE_VERSION == 9101
Changes (w.r.t. Intel TBB 2017):
Bugs fixed:
- Fixed dynamic memory allocation replacement failures on Windows* 10
Anniversary Update.
- Fixed emplace() method of concurrent unordered containers to not
require a copy constructor.
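The emplace() fix above concerns element types that cannot be copied. As
a standard-library analogue only, the sketch below uses
std::unordered_map in place of the TBB containers and assumes their
emplace() follows the same in-place construction model; the function name
emplace_move_only_demo is hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>
#include <utility>

// Standard-library analogue: std::unordered_map stands in for a concurrent
// unordered container. This is an illustration, not TBB code.
inline int emplace_move_only_demo() {
    std::unordered_map<int, std::unique_ptr<int>> table;
    // std::unique_ptr has no copy constructor. emplace() forwards the
    // rvalue and constructs the element in place, so no copy is needed.
    table.emplace(1, std::unique_ptr<int>(new int(42)));
    return *table.at(1);
}
```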
------------------------------------------------------------------------
Intel TBB 2017
TBB_INTERFACE_VERSION == 9100
Changes (w.r.t. Intel TBB 4.4 Update 5):
- static_partitioner class is now a fully supported feature.
- async_node class is now a fully supported feature.
- Improved dynamic memory allocation replacement on Windows* OS to skip
DLLs for which replacement cannot be done, instead of aborting.
- Intel TBB no longer performs dynamic memory allocation replacement
for Microsoft* Visual Studio* 2008.
- For 64-bit platforms, quadrupled the worst-case limit on the amount
of memory the Intel TBB allocator can handle.
- Added TBB_USE_GLIBCXX_VERSION macro to specify the version of GNU
libstdc++ when it cannot be properly recognized, e.g. when used
with Clang on Linux* OS. Inspired by a contribution from David A.
- Added graph/stereo example to demonstrate tbb::flow::async_msg.
- Removed a few cases of excessive user data copying in the flow graph.
- Reworked split_node to eliminate unnecessary overheads.
- Added support for C++11 move semantics to the argument of
tbb::parallel_do_feeder::add() method.
- Added C++11 move constructor and assignment operator to
tbb::combinable template class.
- Added tbb::this_task_arena::max_concurrency() function and
max_concurrency() method of class task_arena returning the maximal
number of threads that can work inside an arena.
- Deprecated tbb::task_arena::current_thread_index() static method;
use tbb::this_task_arena::current_thread_index() function instead.
- All examples for the commercial version of the library have moved
online: https://software.intel.com/en-us/product-code-samples.
Examples are available as a standalone package or as a part of
Intel(R) Parallel Studio XE or Intel(R) System Studio Online Samples
packages.
Changes affecting backward compatibility:
- Renamed the following methods and types in async_node class:
Old                   New
async_gateway_type => gateway_type
async_gateway()    => gateway()
async_try_put()    => try_put()
async_reserve()    => reserve_wait()
async_commit()     => release_wait()
- Internal layout of some flow graph nodes has changed; recompilation
is recommended for all binaries that use the flow graph.
Preview Features:
- Added template class streaming_node to the flow graph API. It allows
a flow graph to offload computations to other devices through
streaming or offloading APIs.
- Template class opencl_node reimplemented as a specialization of
streaming_node that works with OpenCL*.
- Added tbb::this_task_arena::isolate() function to isolate execution
of a group of tasks or an algorithm from other tasks submitted
to the scheduler.
Bugs fixed:
- Added a workaround for GCC bug #62258 in std::rethrow_exception()
to prevent possible problems in case of exception propagation.
- Fixed parallel_scan to provide correct result if the initial value
of an accumulator is not the operation identity value.
- Fixed a memory corruption in the memory allocator when it meets
internal limits.
- Fixed the memory allocator on 64-bit platforms to align memory
to 16 bytes by default for all allocations bigger than 8 bytes.
- As a workaround for crashes in the Intel TBB library compiled with
GCC 6, added -flifetime-dse=1 to compilation options on Linux* OS.
- Fixed a race in the flow graph implementation.
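For the parallel_scan fix listed above, a sequential reference is a
convenient way to check results when the accumulator starts from a value
other than the operation identity. The sketch below is one plausible
reading of the expected behavior for addition: the initial value
contributes once to every prefix. The helper inclusive_sum_from is
hypothetical and is not TBB code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential reference for an inclusive prefix sum that starts from `init`.
// Hypothetical helper for checking results; not part of TBB.
inline std::vector<int> inclusive_sum_from(const std::vector<int>& in,
                                           int init) {
    std::vector<int> out;
    out.reserve(in.size());
    int running = init;  // need not be 0, the identity of addition
    for (std::size_t i = 0; i < in.size(); ++i) {
        running += in[i];
        out.push_back(running);
    }
    return out;
}
```

A parallel result can then be compared element by element against the
output of this function for the same input and initial value.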
Open-source contributions integrated:
- Enabling use of C++11 'override' keyword by Raf Schietekat.
------------------------------------------------------------------------
Intel TBB 4.4 Update 6
TBB_INTERFACE_VERSION == 9006
Changes (w.r.t. Intel TBB 4.4 Update 5):
- For 64-bit platforms, quadrupled the worst-case limit on the amount
of memory the Intel TBB allocator can handle.
Bugs fixed:
- Fixed a memory corruption in the memory allocator when it meets
internal limits.
- Fixed the memory allocator on 64-bit platforms to align memory
to 16 bytes by default for all allocations bigger than 8 bytes.
- Fixed parallel_scan to provide correct result if the initial value
of an accumulator is not the operation identity value.
- As a workaround for crashes in the Intel TBB library compiled with
GCC 6, added -flifetime-dse=1 to compilation options on Linux* OS.
------------------------------------------------------------------------
Intel TBB 4.4 Update 5
TBB_INTERFACE_VERSION == 9005
Changes (w.r.t. Intel TBB 4.4 Update 4):
- Modified graph/fgbzip2 example to remove unnecessary data queuing.
Preview Features:
- Added a Python* module which is able to replace Python's thread pool
class with the implementation based on Intel TBB task scheduler.
Bugs fixed:
- Fixed the implementation of 64-bit tbb::atomic for IA-32 architecture
to work correctly with GCC 5.2 in C++11/14 mode.
- Fixed a possible crash when tasks with affinity (e.g. specified via
affinity_partitioner) are used simultaneously with task priority
changes.
------------------------------------------------------------------------
Intel TBB 4.4 Update 4
TBB_INTERFACE_VERSION == 9004
Changes (w.r.t. Intel TBB 4.4 Update 3):
- Removed a few cases of excessive user data copying in the flow graph.
- Improved robustness of concurrent_bounded_queue::abort() in case of
simultaneous push and pop operations.
Preview Features:
- Added tbb::flow::async_msg, a special message type to support
communications between the flow graph and external asynchronous
activities.
- async_node modified to support use with C++03 compilers.
Bugs fixed:
- Fixed a bug in dynamic memory allocation replacement for Windows* OS.
- Fixed excessive memory consumption on Linux* OS caused by enabling
zero-copy realloc.
- Fixed performance regression on Intel(R) Xeon Phi(tm) coprocessor with
auto_partitioner.
------------------------------------------------------------------------
Intel TBB 4.4 Update 3
TBB_INTERFACE_VERSION == 9003
Changes (w.r.t. Intel TBB 4.4 Update 2):
- Modified parallel_sort to not require a default constructor for values
and to use iter_swap() for value swapping.
- Added support for creating or initializing a task_arena instance that
is connected to the arena currently used by the thread.
- graph/binpack example modified to use multifunction_node.
- For performance analysis, use Intel(R) VTune(TM) Amplifier XE 2015
and higher; older versions are no longer supported.
- Improved support for compilation with disabled RTTI, by omitting its use
in auxiliary code, such as assertions. However, some functionality,
particularly the flow graph, does not work if RTTI is disabled.
- The tachyon example for Android* can be built using Android Studio 1.5
and higher with experimental Gradle plugin 0.4.0.
Preview Features:
- Added class opencl_subbuffer that allows using OpenCL* sub-buffer
objects with opencl_node.
- Class global_control supports the value of 1 for
max_allowed_parallelism.
Bugs fixed:
- Fixed a race causing "TBB Warning: setaffinity syscall failed" message.
- Fixed a compilation issue on OS X* with Intel(R) C++ Compiler 15.0.
- Fixed a bug in queuing_rw_mutex::downgrade() that could temporarily
block new readers.
- Fixed speculative_spin_rw_mutex to stop using the lazy subscription
technique due to its known flaws.
- Fixed memory leaks in the tool support code.
------------------------------------------------------------------------
Intel TBB 4.4 Update 2
TBB_INTERFACE_VERSION == 9002
Changes (w.r.t. Intel TBB 4.4 Update 1):
- Improved interoperability with Intel(R) OpenMP RTL (libiomp) on Linux:
OpenMP affinity settings do not affect the default number of threads
used in the task scheduler. Intel(R) C++ Compiler 16.0 Update 1
or later is required.
- Added a new flow graph example with different implementations of the
Cholesky Factorization algorithm.
Preview Features:
- Added template class opencl_node to the flow graph API. It allows a
flow graph to offload computations to OpenCL* devices.
- Extended join_node to use type-specified message keys. It simplifies
the API of the node by obtaining message keys via functions
associated with the message type (instead of node ports).
- Added static_partitioner that minimizes overhead of parallel_for and
parallel_reduce for well-balanced workloads.
- Improved template class async_node in the flow graph API to support
user-settable concurrency limits.
Bugs fixed:
- Fixed a possible crash in the GUI layer for library examples on Linux.
------------------------------------------------------------------------
Intel TBB 4.4 Update 1
TBB_INTERFACE_VERSION == 9001
Changes (w.r.t. Intel TBB 4.4):
- Added support for Microsoft* Visual Studio* 2015.
- Intel TBB no longer performs dynamic replacement of memory allocation
functions for Microsoft Visual Studio 2005 and earlier versions.
- For GCC 4.7 and higher, the intrinsics-based platform isolation layer
uses __atomic_* built-ins instead of the legacy __sync_* ones.
This change is inspired by a contribution from Mathieu Malaterre.
- Improvements in task_arena:
Several application threads may join a task_arena and execute tasks
simultaneously. The amount of concurrency reserved for application
threads at task_arena construction can be set to any value between
0 and the arena concurrency limit.
- The fractal example was modified to demonstrate class task_arena
and moved to examples/task_arena/fractal.
Bugs fixed:
- Fixed a deadlock during destruction of task_scheduler_init objects
when one of destructors is set to wait for worker threads.
- Added a workaround for a possible crash on OS X* when dynamic memory
allocator replacement (libtbbmalloc_proxy) is used and memory is
released during application startup.
- Usage of mutable functors with task_group::run_and_wait() and
task_arena::enqueue() is disabled. An attempt to pass a functor
whose operator()() is not const will produce compilation errors.
- Makefiles and environment scripts now properly recognize GCC 5.0 and
higher.
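The item above about mutable functors can be pictured with a small
compile-time check. This is a sketch under stated assumptions: the trait
is_const_callable and the function run_once are hypothetical and only
model an interface that invokes the functor through a const reference;
they are not the TBB implementation.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Detects whether a const object of type F can be called with no arguments.
template <typename F, typename = void>
struct is_const_callable : std::false_type {};

template <typename F>
struct is_const_callable<F, decltype(void(std::declval<const F&>()()))>
    : std::true_type {};

struct const_functor {
    void operator()() const {}
};

struct mutable_functor {
    int calls;
    void operator()() { ++calls; }  // non-const call operator
};

static_assert(is_const_callable<const_functor>::value,
              "const operator()() is accepted");
static_assert(!is_const_callable<mutable_functor>::value,
              "non-const operator()() is rejected");

// Hypothetical stand-in for an API that invokes the functor through a
// const reference, so only const call operators compile.
template <typename F>
void run_once(const F& f) {
    static_assert(is_const_callable<F>::value,
                  "functor must have a const operator()()");
    f();
}
```

A lambda that is not declared mutable has a const call operator, so it
passes this check even when it modifies variables captured by reference.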
Open-source contributions integrated:
- Improved performance of parallel_for_each for inputs allowing random
access, by Raf Schietekat.
------------------------------------------------------------------------
Intel TBB 4.4
TBB_INTERFACE_VERSION == 9000
Changes (w.r.t. Intel TBB 4.3 Update 6):
- The following features are now fully supported:
tbb::flow::composite_node;
additional policies of tbb::flow::graph_node::reset().
- Platform abstraction layer for Windows* OS updated to use compiler
intrinsics for most atomic operations.
- The tbb/compat/thread header updated to automatically include
C++11 <thread> where available.
- Fixes and refactoring in the task scheduler and class task_arena.
- Added key_matching policy to tbb::flow::join_node, which removes
the restriction on the type that can be compared against.
- For tag_matching join_node, tag_value is redefined to be 64 bits
wide on all architectures.
- Expanded the documentation for the flow graph with details about
node semantics and behavior.
- Added dynamic replacement of C11 standard function aligned_alloc()
under Linux* OS.
- Added C++11 move constructors and assignment operators to
tbb::enumerable_thread_specific container.
- Added hashing support for tbb::tbb_thread::id.
- On OS X*, binaries that depend on libstdc++ are not provided anymore.
In the makefiles, libc++ is now used by default; for building with
libstdc++, specify stdlib=libstdc++ in the make command line.
Preview Features:
- Added a new example, graph/fgbzip2, that shows usage of
tbb::flow::async_node.
- Modification to the low-level API for memory pools:
added a function for finding a memory pool by an object allocated
from that pool.
- tbb::memory_pool now does not request memory till the first allocation
from the pool.
Changes affecting backward compatibility:
- Internal layout of flow graph nodes has changed; recompilation is
recommended for all binaries that use the flow graph.
- Resetting a tbb::flow::source_node will immediately activate it,
unless it was created in inactive state.
Bugs fixed:
- Failure at creation of a memory pool will not cause process
termination anymore.
Open-source contributions integrated:
- Supported building TBB with Clang on AArch64 with use of built-in
intrinsics by David A.
------------------------------------------------------------------------
Intel TBB 4.3 Update 6
TBB_INTERFACE_VERSION == 8006
Changes (w.r.t. Intel TBB 4.3 Update 5):
- Supported zero-copy realloc for objects >1MB under Linux* via
mremap system call.
- C++11 move-aware insert and emplace methods have been added to
concurrent_hash_map container.
- install_name is set to @rpath/<library name> on OS X*.
Preview Features:
- Added template class async_node to the flow graph API. It allows a
flow graph to communicate with an external activity managed by
the user or another runtime.
- Improved speed of flow::graph::reset() clearing graph edges.
rf_extract flag has been renamed rf_clear_edges.
- extract() method of graph nodes now takes no arguments.
Bugs fixed:
- concurrent_unordered_{set,map} behaves correctly for degenerate
hashes.
- Fixed a race condition in the memory allocator that may lead to
excessive memory consumption under high multithreading load.
------------------------------------------------------------------------
Intel TBB 4.3 Update 5
TBB_INTERFACE_VERSION == 8005
Changes (w.r.t. Intel TBB 4.3 Update 4):
- Added add_ref_count() method of class tbb::task.
Preview Features:
- Added class global_control for application-wide control of allowed
parallelism and thread stack size.
- memory_pool_allocator now throws the std::bad_alloc exception on
allocation failure.
- Exceptions thrown by memory pool constructors changed from
std::bad_alloc to std::invalid_argument and std::runtime_error.
Bugs fixed:
- scalable_allocator now throws the std::bad_alloc exception on
allocation failure.
- Fixed a race condition in the memory allocator that may lead to
excessive memory consumption under high multithreading load.
- A new scheduler created right after destruction of the previous one
might be unable to modify the number of worker threads.
Open-source contributions integrated:
- (Added but not enabled) push_front() method of class tbb::task_list
by Raf Schietekat.
------------------------------------------------------------------------
Intel TBB 4.3 Update 4
TBB_INTERFACE_VERSION == 8004
Changes (w.r.t. Intel TBB 4.3 Update 3):
- Added a C++11 variadic constructor for enumerable_thread_specific.
The arguments from this constructor are used to construct
thread-local values.
- Improved exception safety for enumerable_thread_specific.
- Added documentation for tbb::flow::tagged_msg class and
tbb::flow::output_port function.
- Fixed build errors for systems that do not support dynamic linking.
- C++11 move-aware insert and emplace methods have been added to
concurrent unordered containers.
Preview Features:
- Interface-breaking change: typedefs changed for node predecessor and
successor lists, affecting copy_predecessors and copy_successors
methods.
- Added template class composite_node to the flow graph API. It packages
a subgraph to represent it as a first-class flow graph node.
- make_edge and remove_edge now accept multiport nodes as arguments,
automatically using the node port with index 0 for an edge.
Open-source contributions integrated:
- Draft code for enumerable_thread_specific constructor with multiple
arguments (see above) by Adrien Guinet.
- Fix for GCC invocation on IBM* Blue Gene*
by Jeff Hammond and Raf Schietekat.
- Extended testing with smart pointers for Clang & libc++
by Raf Schietekat.
------------------------------------------------------------------------
Intel TBB 4.3 Update 3
TBB_INTERFACE_VERSION == 8003
Changes (w.r.t. Intel TBB 4.3 Update 2):
- Move constructor and assignment operator were added to unique_lock.
Preview Features:
- Time overhead for memory pool destruction was reduced.
Open-source contributions integrated:
- Build error fix for iOS* by Raf Schietekat.
------------------------------------------------------------------------
Intel TBB 4.3 Update 2
TBB_INTERFACE_VERSION == 8002
Changes (w.r.t. Intel TBB 4.3 Update 1):
- Binary files for 64-bit Android* applications were added as part of the
Linux* OS package.
- Exact exception propagation is enabled for Intel C++ Compiler on OS X*.
- concurrent_vector::shrink_to_fit was optimized for types that support
C++11 move semantics.
Bugs fixed:
- Fixed concurrent unordered containers to insert elements much faster
in debug mode.
- Fixed concurrent priority queue to support types that do not have
copy constructors.
- Fixed enumerable_thread_specific to forbid copying from an instance
with a different value type.
Open-source contributions integrated:
- Support for PathScale* EKOPath* Compiler by Erik Lindahl.
------------------------------------------------------------------------
Intel TBB 4.3 Update 1
TBB_INTERFACE_VERSION == 8001
Changes (w.r.t. Intel TBB 4.3):
- The ability to split blocked_ranges in a proportion, used by
affinity_partitioner since version 4.2 Update 4, became a formal
extension of the Range concept.
- More checks for an incorrect address to release added to the debug
version of the memory allocator.
- Different kinds of solutions for each TBB example were merged.
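The proportional splitting mentioned in the first item of this list can
be sketched on a plain index range. Everything here is hypothetical:
index_range is not tbb::blocked_range, and split_in_proportion only
illustrates the idea of dividing a range in a left:right ratio; the exact
rounding used by the library may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical half-open index range [begin, end); not tbb::blocked_range.
struct index_range {
    std::size_t begin;
    std::size_t end;
    std::size_t size() const { return end - begin; }
};

// Splits `r` so that the sizes of the two parts are roughly in the ratio
// left:right. `r` keeps the left part and the right part is returned.
// Assumes left + right > 0 and that size() * right does not overflow.
inline index_range split_in_proportion(index_range& r,
                                       std::size_t left,
                                       std::size_t right) {
    std::size_t right_size = r.size() * right / (left + right);
    index_range tail = { r.end - right_size, r.end };
    r.end = tail.begin;
    return tail;
}
```

An even split is the special case left == right, which is what a plain
splitting constructor would typically produce.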
Preview Features:
- Task priorities are re-enabled in preview binaries.
Bugs fixed:
- Fixed a duplicate symbol when TBB_PREVIEW_VARIADIC_PARALLEL_INVOKE is
used in multiple compilation units.
- Fixed a crash in __itt_fini_ittlib seen on Ubuntu 14.04.
- Fixed a crash in memory release after dynamic replacement of the
OS X* memory allocator.
- Fixed incorrect indexing of arrays in seismic example.
- Fixed a data race in lazy initialization of task_arena.
Open-source contributions integrated:
- Fix for dumping information about gcc and clang compiler versions
by Misty De Meo.
------------------------------------------------------------------------
Intel TBB 4.3
TBB_INTERFACE_VERSION == 8000
Changes (w.r.t. Intel TBB 4.2 Update 5):
- The following features are now fully supported: flow::indexer_node,
task_arena, speculative_spin_rw_mutex.
- Compatibility with C++11 standard improved for tbb/compat/thread
and tbb::mutex.
- C++11 move constructors have been added to concurrent_queue and
concurrent_bounded_queue.
- C++11 move constructors and assignment operators have been added to
concurrent_vector, concurrent_hash_map, concurrent_priority_queue,
concurrent_unordered_{set,multiset,map,multimap}.
- C++11 move-aware emplace/push/pop methods have been added to
concurrent_vector, concurrent_queue, concurrent_bounded_queue,
concurrent_priority_queue.
- Methods to insert a C++11 initializer list have been added:
concurrent_vector::grow_by(), concurrent_hash_map::insert(),
concurrent_unordered_{set,multiset,map,multimap}::insert().
- Testing for compatibility of containers with some C++11 standard
library types has been added.
- Dynamic replacement of standard memory allocation routines has been
added for OS X*.
- Microsoft* Visual Studio* projects for Intel TBB examples updated
to VS 2010.
- For open-source packages, debugging information (line numbers) in
precompiled binaries now matches the source code.
- Debug information was added to release builds for OS X*, Solaris*,
FreeBSD* operating systems and MinGW*.
- Various improvements in documentation, debug diagnostics and examples.
Preview Features:
- Additional actions on reset of graphs, and extraction of individual
nodes from a graph (TBB_PREVIEW_FLOW_GRAPH_FEATURES).
- Support for an arbitrary number of arguments in parallel_invoke
(TBB_PREVIEW_VARIADIC_PARALLEL_INVOKE).
Changes affecting backward compatibility:
- For compatibility with C++11 standard, copy and move constructors and
assignment operators are disabled for all mutex classes. To allow
the old behavior, use TBB_DEPRECATED_MUTEX_COPYING macro.
- flow::sequencer_node rejects messages with repeating sequence numbers.
- Changed internal interface between tbbmalloc and tbbmalloc_proxy.
- The following deprecated functionality has been removed:
old debugging macros TBB_DO_ASSERT & TBB_DO_THREADING_TOOLS;
no-op depth-related methods in class task;
tbb::deprecated::concurrent_queue;
deprecated variants of concurrent_vector methods.
- register_successor() and remove_successor() are deprecated as methods
to add and remove edges in flow::graph; use make_edge() and
remove_edge() instead.
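The mutex change in the list above follows the C++11 model, in which
mutexes are neither copyable nor movable. The sketch below uses
std::mutex as the reference point and shows one common way to keep an
owning class copyable: copy the data and give each object its own mutex.
The class counter is hypothetical and is not part of TBB.

```cpp
#include <cassert>
#include <mutex>
#include <type_traits>

// std::mutex is the C++11 model: it can be neither copied nor moved.
static_assert(!std::is_copy_constructible<std::mutex>::value, "no copy");
static_assert(!std::is_move_constructible<std::mutex>::value, "no move");
static_assert(!std::is_copy_assignable<std::mutex>::value, "no assignment");

// Hypothetical class that owns a mutex and still needs to be copyable.
// Only the protected data is copied; every object has its own mutex.
class counter {
public:
    counter() : value_(0) {}
    counter(const counter& other) : value_(other.get()) {}
    counter& operator=(const counter& other) {
        if (this != &other) {
            int v = other.get();  // read under the other object's lock
            std::lock_guard<std::mutex> lock(mutex_);
            value_ = v;
        }
        return *this;
    }
    void increment() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++value_;
    }
    int get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }
private:
    mutable std::mutex mutex_;
    int value_;
};
```

Classes that previously relied on copying a mutex member can usually be
adapted this way instead of defining TBB_DEPRECATED_MUTEX_COPYING.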
Bugs fixed:
- Fixed incorrect scalable_msize() implementation for aligned objects.
- Flow graph buffering nodes now destroy their copy of forwarded items.
- Multiple fixes in task_arena implementation, including for:
inconsistent task scheduler state inside executed functions;
incorrect floating-point settings and exception propagation;
possible stalls in concurrent invocations of execute().
- Fixed floating-point settings propagation when the same instance of
task_group_context is used in different arenas.
- Fixed compilation error in pipeline.h with Intel Compiler on OS X*.
- Added missed headers for individual components to tbb.h.
Open-source contributions integrated:
- Range interface addition to parallel_do, parallel_for_each and
parallel_sort by Stephan Dollberg.
- Variadic template implementation of parallel_invoke
by Kizza George Mbidde (see Preview Features).
- Improvement in Seismic example for MacBook Pro* with Retina* display
by Raf Schietekat.
------------------------------------------------------------------------
Intel TBB 4.2 Update 5
TBB_INTERFACE_VERSION == 7005
Changes (w.r.t. Intel TBB 4.2 Update 4):
- The second template argument of class aligned_space<T,N> now is set
to 1 by default.
Preview Features:
- Better support for exception safety, task priorities and floating
point settings in class task_arena.
- task_arena::current_slot() has been renamed to
task_arena::current_thread_index().
Bugs fixed:
- Task priority change possibly ignored by a worker thread entering
a nested parallel construct.
- Memory leaks inside the task scheduler when running on
Intel(R) Xeon Phi(tm) coprocessor.
Open-source contributions integrated:
- Improved detection of X Window support for Intel TBB examples
and other feedback by Raf Schietekat.
------------------------------------------------------------------------
Intel TBB 4.2 Update 4
TBB_INTERFACE_VERSION == 7004
Changes (w.r.t. Intel TBB 4.2 Update 3):
- Added the ability to specify floating-point settings at invocation
of most parallel algorithms (including flow::graph) via
task_group_context.
- Added dynamic replacement of malloc_usable_size() under
Linux*/Android* and dlmalloc_usable_size() under Android*.
- Added new methods to concurrent_vector:
grow_by() that appends a sequence between two given iterators;
grow_to_at_least() that initializes new elements with a given value.
- Improved affinity_partitioner for better performance on balanced
workloads.
- Improvements in the task scheduler, including better scalability
when threads search for a task arena, and better diagnostics.
- Improved allocation performance for workloads that do intensive
allocation/releasing of same-size objects larger than ~8KB from
multiple threads.
- Exception support is enabled by default for 32-bit MinGW compilers.
- The tachyon example for Android* can be built for all targets
supported by the installed NDK.
- Added Windows Store* version of the tachyon example.
- GettingStarted/sub_string_finder example ported to offload execution
on Windows* for Intel(R) Many Integrated Core Architecture.
Preview Features:
- Removed task_scheduler_observer::on_scheduler_leaving() callback.
- Added task_scheduler_observer::may_sleep() callback.
- The CPF or_node has been renamed indexer_node. The input to
indexer_node is now a list of types. The output of indexer_node is
a tagged_msg type composed of a tag and a value. For indexer_node,
the tag is a size_t.
Bugs fixed:
- Fixed data races in preview extensions of task_scheduler_observer.
- Added noexcept(false) for destructor of task_group_base to avoid
crash on cancellation of structured task group in C++11.
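The noexcept(false) item above reflects a C++11 rule: destructors are
implicitly noexcept, so an exception leaving one terminates the program
unless the destructor is declared noexcept(false). A minimal sketch with
hypothetical types, not the task_group_base code:

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>

// In C++11 a destructor is noexcept unless declared otherwise.
struct default_dtor {
    ~default_dtor() {}
};

// Hypothetical type whose destructor is allowed to throw.
struct throwing_dtor {
    ~throwing_dtor() noexcept(false) {
        throw std::runtime_error("cancelled");
    }
};

static_assert(std::is_nothrow_destructible<default_dtor>::value,
              "implicitly noexcept");
static_assert(!std::is_nothrow_destructible<throwing_dtor>::value,
              "declared noexcept(false)");

// Returns true if the exception thrown by the destructor reaches the
// handler instead of terminating the program.
inline bool destructor_exception_is_catchable() {
    try {
        throwing_dtor t;
        (void)t;  // destroyed at the end of this scope, which throws
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

Without noexcept(false), the same throw would call std::terminate, which
matches the crash described in the entry.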
Open-source contributions integrated:
- Improved concurrency detection for BG/Q, and other improvements
by Raf Schietekat.
- Fix for crashes in enumerable_thread_specific when a contained
object is too big to be constructed on the stack, by Adrien Guinet.
------------------------------------------------------------------------
Intel TBB 4.2 Update 3
TBB_INTERFACE_VERSION == 7003
Changes (w.r.t. Intel TBB 4.2 Update 2):
- Added support for Microsoft* Visual Studio* 2013.
- Improved Microsoft* PPL-compatible form of parallel_for for better
support of auto-vectorization.
- Added a new example for cancellation and reset in the flow graph:
Kohonen self-organizing map (examples/graph/som).
- Various improvements in source code, tests, and makefiles.
Bugs fixed:
- Added the previously missing dynamic replacement of _aligned_msize().
- Fixed task_group::run_and_wait() to throw invalid_multiple_scheduling
exception if the specified task handle is already scheduled.
Open-source contributions integrated:
- A fix for ARM* processors by Steve Capper.
- Improvements in std::swap calls by Robert Maynard.
------------------------------------------------------------------------
Intel TBB 4.2 Update 2
TBB_INTERFACE_VERSION == 7002
Changes (w.r.t. Intel TBB 4.2 Update 1):
- Enabled C++11 features for Microsoft* Visual Studio* 2013 Preview.
- Added a test for compatibility of TBB containers with C++11
range-based for loop.
Changes affecting backward compatibility:
- Internal layout changed for class tbb::flow::limiter_node.
Preview Features:
- Added speculative_spin_rw_mutex, a read-write lock class which uses
Intel(R) Transactional Synchronization Extensions.
Bugs fixed:
- When building for Intel(R) Xeon Phi(tm) coprocessor, TBB programs
no longer require explicit linking with librt and libpthread.
Open-source contributions integrated:
- Fixes for ARM* processors by Steve Capper, Leif Lindholm
and Steven Noonan.
- Support for Clang on Linux by Raf Schietekat.
- Typo correction in scheduler.cpp by Julien Schueller.
------------------------------------------------------------------------
Intel TBB 4.2 Update 1
TBB_INTERFACE_VERSION == 7001
Changes (w.r.t. Intel TBB 4.2):
- Added project files for Microsoft* Visual Studio* 2010.
- Initial support for Microsoft* Visual Studio* 2013 Preview.
- Enabled C++11 features available in Intel(R) C++ Compiler 14.0.
- scalable_allocation_mode(TBBMALLOC_SET_SOFT_HEAP_LIMIT, <size>) can be
used to prompt tbbmalloc to release memory from its internal buffers
when the given limit is exceeded.
Preview Features:
- Class task_arena no longer requires linking with a preview library,
though it remains a community preview feature.
- The method task_arena::wait_until_empty() is removed.
- The method task_arena::current_slot() now returns -1 if
the task scheduler is not initialized in the thread.
Changes affecting backward compatibility:
- Because of changes in internal layout of graph nodes, the namespace
interface number of flow::graph has been incremented from 6 to 7.
Bugs fixed:
- Fixed a race in lazy initialization of task_arena.
- Fixed flow::graph::reset() to prevent situations where tasks would be
spawned in the process of resetting the graph to its initial state.
- Fixed decrement bug in limiter_node.
- Fixed a race in arc deletion in the flow graph.
Open-source contributions integrated:
- Improved support for IBM* Blue Gene* by Raf Schietekat.
------------------------------------------------------------------------
Intel TBB 4.2
TBB_INTERFACE_VERSION == 7000
Changes (w.r.t. Intel TBB 4.1 Update 4):
- Added speculative_spin_mutex, which uses Intel(R) Transactional
Synchronization Extensions when they are supported by hardware.
- Binary files linked with libc++ (the C++ standard library in Clang)
were added on OS X*.
- For OS X* exact exception propagation is supported with Clang;
it requires use of libc++ and corresponding Intel TBB binaries.
- Support for C++11 initializer lists in constructors and assignment
has been added to concurrent_hash_map, concurrent_unordered_set,
concurrent_unordered_multiset, concurrent_unordered_map,
concurrent_unordered_multimap.
- The memory allocator may now clean its per-thread memory caches
when it cannot get more memory.
- Added the scalable_allocation_command() function for on-demand
cleaning of internal memory caches.
- Reduced the time overhead for freeing memory objects smaller than ~8KB.
- Simplified linking with the debug library for applications that use
Intel TBB in code offloaded to Intel(R) Xeon Phi(tm) coprocessors.
See an example in
examples/GettingStarted/sub_string_finder/Makefile.
- Various improvements in source code, scripts and makefiles.
Changes affecting backward compatibility:
- tbb::flow::graph has been modified to spawn its tasks;
the old behaviour (task enqueuing) is deprecated. This change may
impact applications that expected a flow graph to make progress
without calling wait_for_all(), which is no longer guaranteed. See
the documentation for more details.
- Changed the return values of the scalable_allocation_mode() function.
Bugs fixed:
- Fixed a leak of parallel_reduce body objects when execution is
cancelled or an exception is thrown, as suggested by Darcy Harrison.
- Fixed a race in the task scheduler which can lower the effective
priority despite the existence of higher priority tasks.
- On Linux an error during destruction of the internal thread local
storage no longer results in an exception.
Open-source contributions integrated:
- Fixed task_group_context state propagation to unrelated context trees
by Raf Schietekat.
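The TBB_INTERFACE_VERSION values listed with each release can be used to
gate code on a minimum release. The sketch below uses plain constants
copied from this file in place of the real macro so that it compiles
without the TBB headers; in real code the comparison would normally be a
preprocessor check on TBB_INTERFACE_VERSION. The helper names are
hypothetical, and treating the value divided by 1000 as the major
interface number is an assumption based on how the listed values change
between 4.1 (6xxx) and 4.2 (7xxx).

```cpp
#include <cassert>

// Values copied from the release entries in this file.
const int kInterface_4_1_Update_4 = 6105;
const int kInterface_4_2          = 7000;
const int kInterface_4_2_Update_4 = 7004;

// Assumption: the major interface number is the value divided by 1000.
constexpr int interface_major(int version) { return version / 1000; }

// Hypothetical feature gate: floating-point settings in task_group_context
// are listed under 4.2 Update 4, whose interface version is 7004.
constexpr bool has_fp_settings_in_context(int version) {
    return version >= 7004;
}

// Small self-check showing the intended use.
void interface_version_self_check() {
    assert(interface_major(kInterface_4_1_Update_4) == 6);
    assert(interface_major(kInterface_4_2) == 7);
    assert(!has_fp_settings_in_context(kInterface_4_2));
    assert(has_fp_settings_in_context(kInterface_4_2_Update_4));
}
```

Comparing against the full four-digit value, rather than the major number
alone, is what distinguishes an update release from the base release.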
------------------------------------------------------------------------
Intel TBB 4.1 Update 4
TBB_INTERFACE_VERSION == 6105
Changes (w.r.t. Intel TBB 4.1 Update 3):
- Use /volatile:iso option with VS 2012 to disable extended
semantics for volatile variables.
- Various improvements in affinity_partitioner, scheduler,
tests, examples, makefiles.
- Class concurrent_priority_queue now supports initialization/assignment
via the C++11 initializer list feature (std::initializer_list<T>).
Bugs fixed:
- Fixed more possible stalls in concurrent invocations of
task_arena::execute(), especially waiting for enqueued tasks.
- Fixed requested number of workers for task_arena(P,0).
- Fixed interoperability with Intel(R) VTune(TM) Amplifier XE in
case of using task_arena::enqueue() from a terminating thread.
Open-source contributions integrated:
- Type fixes, cleanups, and code beautification by Raf Schietekat.
- Improvements in atomic operations for big endian platforms
by Raf Schietekat.
------------------------------------------------------------------------
Intel TBB 4.1 Update 3
TBB_INTERFACE_VERSION == 6103
Changes (w.r.t. Intel TBB 4.1 Update 2):
- Binary files for Android* applications were added to the Linux* OS
package.
- Binary files for Windows Store* applications were added to the
Windows* OS package.
- Exact exception propagation (exception_ptr) support on Linux OS is
now turned on by default for GCC 4.4 and higher.
- Stopped implicit use of large memory pages by tbbmalloc (Linux-only).
Now use of large pages must be explicitly enabled with
scalable_allocation_mode() function or TBB_MALLOC_USE_HUGE_PAGES
environment variable.
Community Preview Features:
- Extended class task_arena constructor and method initialize() to
allow some concurrency to be reserved strictly for application
threads.
- New methods terminate() and is_active() were added to class
task_arena.
Bugs fixed:
- Fixed initialization of hashing helper constant in the hash
containers.
- Fixed possible stalls in concurrent invocations of
task_arena::execute() when no worker thread is available to make
progress.
- Fixed incorrect calculation of hardware concurrency in the presence
of inactive processor groups, particularly on systems running
Windows* 8 and Windows* Server 2012.
Open-source contributions integrated:
- A fix for the GUI examples on OS X* systems by Raf Schietekat.
- Moved some power-of-2 calculations to functions to improve readability
by Raf Schietekat.
- C++11/Clang support improvements by arcata.
- ARM* platform isolation layer by Steve Capper, Leif Lindholm, Leo Lara
(ARM).
------------------------------------------------------------------------
Intel TBB 4.1 Update 2
TBB_INTERFACE_VERSION == 6102
Changes (w.r.t. Intel TBB 4.1 Update 1):
- Objects up to 128 MB are now cached by tbbmalloc. Previously
the threshold was 8 MB. Objects larger than 128 MB are still
processed by direct OS calls.
- concurrent_unordered_multiset and concurrent_unordered_multimap
have been added, based on Microsoft* PPL prototype.
- Ability to value-initialize a tbb::atomic<T> variable on construction
in C++11, with const expressions properly supported.
Community Preview Features:
- Added the ability to wait until all worker threads terminate.
This is necessary before calling fork() from an application.
Bugs fixed:
- Fixed data race in tbbmalloc that might lead to memory leaks
for large object allocations.
- Fixed task_arena::enqueue() to use task_group_context of target arena.
- Improved implementation of 64-bit atomics on ia32.