| 1 |
737 |
jeremybenn |
\input texinfo @c -*-texinfo-*-
|
| 2 |
|
|
|
| 3 |
|
|
@c %**start of header
|
| 4 |
|
|
@setfilename libitm.info
|
| 5 |
|
|
@settitle GNU libitm
|
| 6 |
|
|
@c %**end of header
|
| 7 |
|
|
|
| 8 |
|
|
|
| 9 |
|
|
@copying
|
| 10 |
|
|
Copyright @copyright{} 2011 Free Software Foundation, Inc.
|
| 11 |
|
|
|
| 12 |
|
|
Permission is granted to copy, distribute and/or modify this document
|
| 13 |
|
|
under the terms of the GNU Free Documentation License, Version 1.2 or
|
| 14 |
|
|
any later version published by the Free Software Foundation; with no
|
| 15 |
|
|
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
|
| 16 |
|
|
A copy of the license is included in the section entitled ``GNU
|
| 17 |
|
|
Free Documentation License''.
|
| 18 |
|
|
@end copying
|
| 19 |
|
|
|
| 20 |
|
|
@ifinfo
|
| 21 |
|
|
@dircategory GNU Libraries
|
| 22 |
|
|
@direntry
|
| 23 |
|
|
* libitm: (libitm). GNU Transactional Memory Library
|
| 24 |
|
|
@end direntry
|
| 25 |
|
|
|
| 26 |
|
|
This manual documents the GNU Transactional Memory Library.
|
| 27 |
|
|
|
| 28 |
|
|
@insertcopying
|
| 29 |
|
|
@end ifinfo
|
| 30 |
|
|
|
| 31 |
|
|
|
| 32 |
|
|
@setchapternewpage odd
|
| 33 |
|
|
|
| 34 |
|
|
@titlepage
|
| 35 |
|
|
@title The GNU Transactional Memory Library
|
| 36 |
|
|
@page
|
| 37 |
|
|
@vskip 0pt plus 1filll
|
| 38 |
|
|
@comment For the @value{version-GCC} Version*
|
| 39 |
|
|
@sp 1
|
| 40 |
|
|
@insertcopying
|
| 41 |
|
|
@end titlepage
|
| 42 |
|
|
|
| 43 |
|
|
@summarycontents
|
| 44 |
|
|
@contents
|
| 45 |
|
|
@page
|
| 46 |
|
|
|
| 47 |
|
|
|
| 48 |
|
|
@node Top
|
| 49 |
|
|
@top Introduction
|
| 50 |
|
|
@cindex Introduction
|
| 51 |
|
|
|
| 52 |
|
|
This manual documents the usage and internals of libitm, the GNU Transactional
|
| 53 |
|
|
Memory Library. It provides transaction support for accesses to a process'
|
| 54 |
|
|
memory, enabling easy-to-use synchronization of accesses to shared memory by
|
| 55 |
|
|
several threads.
|
| 56 |
|
|
|
| 57 |
|
|
|
| 58 |
|
|
@comment
|
| 59 |
|
|
@comment When you add a new menu item, please keep the right hand
|
| 60 |
|
|
@comment aligned to the same column. Do not use tabs. This provides
|
| 61 |
|
|
@comment better formatting.
|
| 62 |
|
|
@comment
|
| 63 |
|
|
@menu
|
| 64 |
|
|
* Enabling libitm:: How to enable libitm for your applications.
|
| 65 |
|
|
* C/C++ Language Constructs for TM::
|
| 66 |
|
|
Notes on the language-level interface supported
|
| 67 |
|
|
by gcc.
|
| 68 |
|
|
* The libitm ABI:: Notes on the external ABI provided by libitm.
|
| 69 |
|
|
* Internals:: Notes on libitm's internal synchronization.
|
| 70 |
|
|
* GNU Free Documentation License::
|
| 71 |
|
|
How you can copy and share this manual.
|
| 72 |
|
|
* Index:: Index of this documentation.
|
| 73 |
|
|
@end menu
|
| 74 |
|
|
|
| 75 |
|
|
|
| 76 |
|
|
@c ---------------------------------------------------------------------
|
| 77 |
|
|
@c Enabling libitm
|
| 78 |
|
|
@c ---------------------------------------------------------------------
|
| 79 |
|
|
|
| 80 |
|
|
@node Enabling libitm
|
| 81 |
|
|
@chapter Enabling libitm
|
| 82 |
|
|
|
| 83 |
|
|
To activate support for TM in C/C++, the compile-time flag @option{-fgnu-tm}
|
| 84 |
|
|
must be specified. This enables TM language-level constructs such as
|
| 85 |
|
|
transaction statements (@code{__transaction}, @pxref{C/C++ Language
|
| 86 |
|
|
Constructs for TM} for details).
|
| 87 |
|
|
|
| 88 |
|
|
@c ---------------------------------------------------------------------
|
| 89 |
|
|
@c C/C++ Language Constructs for TM
|
| 90 |
|
|
@c ---------------------------------------------------------------------
|
| 91 |
|
|
|
| 92 |
|
|
@node C/C++ Language Constructs for TM
|
| 93 |
|
|
@chapter C/C++ Language Constructs for TM
|
| 94 |
|
|
|
| 95 |
|
|
TODO: link to the C++ TM spec. a few examples. how gcc's support differs.
|
| 96 |
|
|
|
| 97 |
|
|
@c ---------------------------------------------------------------------
|
| 98 |
|
|
@c The libitm ABI
|
| 99 |
|
|
@c ---------------------------------------------------------------------
|
| 100 |
|
|
|
| 101 |
|
|
@node The libitm ABI
|
| 102 |
|
|
@chapter The libitm ABI
|
| 103 |
|
|
|
| 104 |
|
|
The ABI provided by libitm is basically equal to the Linux variant of Intel's
|
| 105 |
|
|
current TM ABI specification document (Revision 1.1, May 6 2009) but with the
|
| 106 |
|
|
differences listed in this chapter. It would be good if these changes would
|
| 107 |
|
|
eventually be merged into a future version of this specification. To ease
|
| 108 |
|
|
look-up, the following subsections mirror the structure of this specification.
|
| 109 |
|
|
|
| 110 |
|
|
@section [No changes] Objectives
|
| 111 |
|
|
@section [No changes] Non-objectives
|
| 112 |
|
|
|
| 113 |
|
|
@section Library design principles
|
| 114 |
|
|
@subsection [No changes] Calling conventions
|
| 115 |
|
|
@subsection [No changes] TM library algorithms
|
| 116 |
|
|
@subsection [No changes] Optimized load and store routines
|
| 117 |
|
|
@subsection [No changes] Aligned load and store routines
|
| 118 |
|
|
|
| 119 |
|
|
@subsection Data logging functions
|
| 120 |
|
|
|
| 121 |
|
|
The memory locations accessed with transactional loads and stores and the
|
| 122 |
|
|
memory locations whose values are logged must not overlap. This required
|
| 123 |
|
|
separation only extends to the scope of the execution of one transaction
|
| 124 |
|
|
including all the executions of all nested transactions.
|
| 125 |
|
|
|
| 126 |
|
|
The compiler must be consistent (within the scope of a single transaction)
|
| 127 |
|
|
about which memory locations are shared and which are not shared with other
|
| 128 |
|
|
threads (i.e., data must be accessed either transactionally or
|
| 129 |
|
|
nontransactionally). Otherwise, non-write-through TM algorithms would not work.
|
| 130 |
|
|
|
| 131 |
|
|
@subsection [No changes] Scatter/gather calls
|
| 132 |
|
|
@subsection [No changes] Serial and irrevocable mode
|
| 133 |
|
|
@subsection [No changes] Transaction descriptor
|
| 134 |
|
|
@subsection Store allocation
|
| 135 |
|
|
|
| 136 |
|
|
There is no @code{getTransaction} function.
|
| 137 |
|
|
|
| 138 |
|
|
@subsection [No changes] Naming conventions
|
| 139 |
|
|
|
| 140 |
|
|
@subsection Function pointer encryption
|
| 141 |
|
|
|
| 142 |
|
|
Currently, this is not implemented.
|
| 143 |
|
|
|
| 144 |
|
|
|
| 145 |
|
|
@section Types and macros list
|
| 146 |
|
|
|
| 147 |
|
|
@code{_ITM_codeProperties} has changed, @pxref{txn-code-properties,,Starting a
|
| 148 |
|
|
transaction}.
|
| 149 |
|
|
@code{_ITM_srcLocation} is not used.
|
| 150 |
|
|
|
| 151 |
|
|
|
| 152 |
|
|
@section Function list
|
| 153 |
|
|
|
| 154 |
|
|
@subsection Initialization and finalization functions
|
| 155 |
|
|
These functions are not part of the ABI.
|
| 156 |
|
|
|
| 157 |
|
|
@subsection [No changes] Version checking
|
| 158 |
|
|
@subsection [No changes] Error reporting
|
| 159 |
|
|
@subsection [No changes] inTransaction call
|
| 160 |
|
|
|
| 161 |
|
|
@subsection State manipulation functions
|
| 162 |
|
|
There is no @code{getTransaction} function. Transaction identifiers for
|
| 163 |
|
|
nested transactions will be ordered but not necessarily sequential (i.e., for
|
| 164 |
|
|
a nested transaction's identifier @var{IN} and its enclosing transaction's
|
| 165 |
|
|
identifier @var{IE}, it is guaranteed that @math{IN >= IE}).
|
| 166 |
|
|
|
| 167 |
|
|
@subsection [No changes] Source locations
|
| 168 |
|
|
|
| 169 |
|
|
@subsection Starting a transaction
|
| 170 |
|
|
|
| 171 |
|
|
@subsubsection Transaction code properties
|
| 172 |
|
|
|
| 173 |
|
|
@anchor{txn-code-properties}
|
| 174 |
|
|
The bit @code{hasNoXMMUpdate} is instead called @code{hasNoVectorUpdate}.
|
| 175 |
|
|
Iff it is set, vector register save/restore is not necessary for any target
|
| 176 |
|
|
machine.
|
| 177 |
|
|
|
| 178 |
|
|
The @code{hasNoFloatUpdate} bit (@code{0x0010}) is new. Iff it is set, floating
|
| 179 |
|
|
point register save/restore is not necessary for any target machine.
|
| 180 |
|
|
|
| 181 |
|
|
@code{undoLogCode} is not supported and a fatal runtime error will be raised
|
| 182 |
|
|
if this bit is set. It is not properly defined in the ABI why barriers
|
| 183 |
|
|
other than undo logging are not present; Are they not necessary (e.g., a
|
| 184 |
|
|
transaction operating purely on thread-local data) or have they been omitted by
|
| 185 |
|
|
the compiler because it thinks that some kind of global synchronization
|
| 186 |
|
|
(e.g., serial mode) might perform better? The specification suggests that the
|
| 187 |
|
|
latter might be the case, but the former seems to be more useful.
|
| 188 |
|
|
|
| 189 |
|
|
The @code{readOnly} bit (@code{0x4000}) is new. @strong{TODO} Lexical or dynamic
|
| 190 |
|
|
scope?
|
| 191 |
|
|
|
| 192 |
|
|
@code{hasNoRetry} is not supported. If this bit is not set, but
|
| 193 |
|
|
@code{hasNoAbort} is set, the library can assume that transaction
|
| 194 |
|
|
rollback will not be requested.
|
| 195 |
|
|
|
| 196 |
|
|
It would be useful if the absence of externally-triggered rollbacks would be
|
| 197 |
|
|
reported for the dynamic scope as well, not just for the lexical scope
|
| 198 |
|
|
(@code{hasNoAbort}). Without this, a library cannot exploit this together
|
| 199 |
|
|
with flat nesting.
|
| 200 |
|
|
|
| 201 |
|
|
@code{exceptionBlock} is not supported because exception blocks are not used.
|
| 202 |
|
|
|
| 203 |
|
|
@subsubsection [No changes] Windows exception state
|
| 204 |
|
|
@subsubsection [No changes] Other machine state
|
| 205 |
|
|
|
| 206 |
|
|
@subsubsection [No changes] Results from beginTransaction
|
| 207 |
|
|
|
| 208 |
|
|
@subsection Aborting a transaction
|
| 209 |
|
|
|
| 210 |
|
|
@code{_ITM_rollbackTransaction} is not supported. @code{_ITM_abortTransaction}
|
| 211 |
|
|
is supported but the abort reasons @code{exceptionBlockAbort},
|
| 212 |
|
|
@code{TMConflict}, and @code{userRetry} are not supported. There are no
|
| 213 |
|
|
exception blocks in general, so the related cases also do not have to be
|
| 214 |
|
|
considered. To encode @code{__transaction_cancel [[outer]]}, compilers must
|
| 215 |
|
|
set the new @code{outerAbort} bit (@code{0x10}) additionally to the
|
| 216 |
|
|
@code{userAbort} bit in the abort reason.
|
| 217 |
|
|
|
| 218 |
|
|
@subsection Committing a transaction
|
| 219 |
|
|
|
| 220 |
|
|
The exception handling (EH) scheme is different. The Intel ABI requires the
|
| 221 |
|
|
@code{_ITM_tryCommitTransaction} function that will return even when the
|
| 222 |
|
|
commit failed and will have to be matched with calls to either
|
| 223 |
|
|
@code{_ITM_abortTransaction} or @code{_ITM_commitTransaction}. In contrast,
|
| 224 |
|
|
gcc relies on transactional wrappers for the functions of the Exception
|
| 225 |
|
|
Handling ABI and on one additional commit function (shown below). This allows
|
| 226 |
|
|
the TM to keep track of EH internally and thus it does not have to embed the
|
| 227 |
|
|
cleanup of EH state into the existing EH code in the program.
|
| 228 |
|
|
@code{_ITM_tryCommitTransaction} is not supported.
|
| 229 |
|
|
@code{_ITM_commitTransactionToId} is also not supported because the
|
| 230 |
|
|
propagation of thrown exceptions will not bypass commits of nested
|
| 231 |
|
|
transactions.
|
| 232 |
|
|
|
| 233 |
|
|
@example
|
| 234 |
|
|
void _ITM_commitTransactionEH(void *exc_ptr) ITM_REGPARM;
|
| 235 |
|
|
void *_ITM_cxa_allocate_exception (size_t);
|
| 236 |
|
|
void _ITM_cxa_throw (void *obj, void *tinfo, void *dest);
|
| 237 |
|
|
void *_ITM_cxa_begin_catch (void *exc_ptr);
|
| 238 |
|
|
void _ITM_cxa_end_catch (void);
|
| 239 |
|
|
@end example
|
| 240 |
|
|
|
| 241 |
|
|
@code{_ITM_commitTransactionEH} must be called to commit a transaction if an
|
| 242 |
|
|
exception could be in flight at this position in the code. @code{exc_ptr} is
|
| 243 |
|
|
the current exception or zero if there is no current exception.
|
| 244 |
|
|
The @code{_ITM_cxa...} functions are transactional wrappers for the respective
|
| 245 |
|
|
@code{__cxa...} functions and must be called instead of these in transactional
|
| 246 |
|
|
code.
|
| 247 |
|
|
|
| 248 |
|
|
To support this EH scheme, libstdc++ needs to provide one additional function
|
| 249 |
|
|
(@code{_cxa_tm_cleanup}), which is used by the TM to clean up the exception
|
| 250 |
|
|
handling state while rolling back a transaction:
|
| 251 |
|
|
|
| 252 |
|
|
@example
|
| 253 |
|
|
void __cxa_tm_cleanup (void *unthrown_obj, void *cleanup_exc,
|
| 254 |
|
|
unsigned int caught_count);
|
| 255 |
|
|
@end example
|
| 256 |
|
|
|
| 257 |
|
|
@code{unthrown_obj} is non-null if the program called
|
| 258 |
|
|
@code{__cxa_allocate_exception} for this exception but did not yet called
|
| 259 |
|
|
@code{__cxa_throw} for it. @code{cleanup_exc} is non-null if the program is
|
| 260 |
|
|
currently processing a cleanup along an exception path but has not caught this
|
| 261 |
|
|
exception yet. @code{caught_count} is the nesting depth of
|
| 262 |
|
|
@code{__cxa_begin_catch} within the transaction (which can be counted by the TM
|
| 263 |
|
|
using @code{_ITM_cxa_begin_catch} and @code{_ITM_cxa_end_catch});
|
| 264 |
|
|
@code{__cxa_tm_cleanup} then performs rollback by essentially performing
|
| 265 |
|
|
@code{__cxa_end_catch} that many times.
|
| 266 |
|
|
|
| 267 |
|
|
|
| 268 |
|
|
|
| 269 |
|
|
@subsection Exception handling support
|
| 270 |
|
|
|
| 271 |
|
|
Currently, there is no support for functionality like
|
| 272 |
|
|
@code{__transaction_cancel throw} as described in the C++ TM specification.
|
| 273 |
|
|
Supporting this should be possible with the EH scheme explained previously
|
| 274 |
|
|
because via the transactional wrappers for the EH ABI, the TM is able to
|
| 275 |
|
|
observe and intercept EH.
|
| 276 |
|
|
|
| 277 |
|
|
|
| 278 |
|
|
@subsection [No changes] Transition to serial--irrevocable mode
|
| 279 |
|
|
@subsection [No changes] Data transfer functions
|
| 280 |
|
|
@subsection [No changes] Transactional memory copies
|
| 281 |
|
|
|
| 282 |
|
|
@subsection Transactional versions of memmove
|
| 283 |
|
|
|
| 284 |
|
|
If either the source or destination memory region is to be accessed
|
| 285 |
|
|
nontransactionally, then source and destination regions must not be
|
| 286 |
|
|
overlapping. The respective @code{_ITM_memmove} functions are still
|
| 287 |
|
|
available but a fatal runtime error will be raised if such regions do overlap.
|
| 288 |
|
|
To support this functionality, the ABI would have to specify how the
|
| 289 |
|
|
intersection of the regions has to be accessed (i.e., transactionally or
|
| 290 |
|
|
nontransactionally).
|
| 291 |
|
|
|
| 292 |
|
|
@subsection [No changes] Transactional versions of memset
|
| 293 |
|
|
@subsection [No changes] Logging functions
|
| 294 |
|
|
|
| 295 |
|
|
@subsection User-registered commit and undo actions
|
| 296 |
|
|
|
| 297 |
|
|
Commit actions will get executed in the same order in which the respective
|
| 298 |
|
|
calls to @code{_ITM_addUserCommitAction} happened. Only
|
| 299 |
|
|
@code{_ITM_noTransactionId} is allowed as value for the
|
| 300 |
|
|
@code{resumingTransactionId} argument. Commit actions get executed after
|
| 301 |
|
|
privatization safety has been ensured.
|
| 302 |
|
|
|
| 303 |
|
|
Undo actions will get executed in reverse order compared to the order in which
|
| 304 |
|
|
the respective calls to @code{_ITM_addUserUndoAction} happened. The ordering of
|
| 305 |
|
|
undo actions w.r.t. the roll-back of other actions (e.g., data transfers or
|
| 306 |
|
|
memory allocations) is undefined.
|
| 307 |
|
|
|
| 308 |
|
|
@code{_ITM_getThreadnum} is not supported currently because its only purpose
|
| 309 |
|
|
is to provide a thread ID that matches some assumed performance tuning output,
|
| 310 |
|
|
but this output is not part of the ABI nor further defined by it.
|
| 311 |
|
|
|
| 312 |
|
|
@code{_ITM_dropReferences} is not supported currently because its semantics and
|
| 313 |
|
|
the intention behind it is not entirely clear. The
|
| 314 |
|
|
specification suggests that this function is necessary because of certain
|
| 315 |
|
|
orderings of data transfer undos and the releasing of memory regions (i.e.,
|
| 316 |
|
|
privatization). However, this ordering is never defined, nor is the ordering of
|
| 317 |
|
|
dropping references w.r.t. other events.
|
| 318 |
|
|
|
| 319 |
|
|
@subsection [New] Transactional indirect calls
|
| 320 |
|
|
|
| 321 |
|
|
Indirect calls (i.e., calls through a function pointer) within transactions
|
| 322 |
|
|
should execute the transactional clone of the original function (i.e., a clone
|
| 323 |
|
|
of the original that has been fully instrumented to use the TM runtime), if
|
| 324 |
|
|
such a clone is available. The runtime provides two functions to
|
| 325 |
|
|
register/deregister clone tables:
|
| 326 |
|
|
|
| 327 |
|
|
@example
|
| 328 |
|
|
struct clone_entry
|
| 329 |
|
|
@{
|
| 330 |
|
|
void *orig, *clone;
|
| 331 |
|
|
@};
|
| 332 |
|
|
|
| 333 |
|
|
void _ITM_registerTMCloneTable (clone_entry *table, size_t entries);
|
| 334 |
|
|
void _ITM_deregisterTMCloneTable (clone_entry *table);
|
| 335 |
|
|
@end example
|
| 336 |
|
|
|
| 337 |
|
|
Registered tables must be writable by the TM runtime, and must be live
|
| 338 |
|
|
throughout the life-time of the TM runtime.
|
| 339 |
|
|
|
| 340 |
|
|
@strong{TODO} The intention was always to drop the registration functions
|
| 341 |
|
|
entirely, and create a new ELF Phdr describing the linker-sorted table. Much
|
| 342 |
|
|
like what currently happens for @code{PT_GNU_EH_FRAME}.
|
| 343 |
|
|
This work kept getting bogged down in how to represent the @var{N} different
|
| 344 |
|
|
code generation variants. We clearly needed at least two---SW and HW
|
| 345 |
|
|
transactional clones---but there was always a suggestion of more variants for
|
| 346 |
|
|
different TM assumptions/invariants.
|
| 347 |
|
|
|
| 348 |
|
|
The compiler can then use two TM runtime functions to perform indirect calls in
|
| 349 |
|
|
transactions:
|
| 350 |
|
|
@example
|
| 351 |
|
|
void *_ITM_getTMCloneOrIrrevocable (void *function) ITM_REGPARM;
|
| 352 |
|
|
void *_ITM_getTMCloneSafe (void *function) ITM_REGPARM;
|
| 353 |
|
|
@end example
|
| 354 |
|
|
|
| 355 |
|
|
If there is a registered clone for supplied function, both will return a
|
| 356 |
|
|
pointer to the clone. If not, the first runtime function will attempt to switch
|
| 357 |
|
|
to serial--irrevocable mode and return the original pointer, whereas the second
|
| 358 |
|
|
will raise a fatal runtime error.
|
| 359 |
|
|
|
| 360 |
|
|
@subsection [New] Transactional dynamic memory management
|
| 361 |
|
|
|
| 362 |
|
|
@example
|
| 363 |
|
|
void *_ITM_malloc (size_t)
|
| 364 |
|
|
__attribute__((__malloc__)) ITM_PURE;
|
| 365 |
|
|
void *_ITM_calloc (size_t, size_t)
|
| 366 |
|
|
__attribute__((__malloc__)) ITM_PURE;
|
| 367 |
|
|
void _ITM_free (void *) ITM_PURE;
|
| 368 |
|
|
@end example
|
| 369 |
|
|
|
| 370 |
|
|
These functions are essentially transactional wrappers for @code{malloc},
|
| 371 |
|
|
@code{calloc}, and @code{free}. Within transactions, the compiler should
|
| 372 |
|
|
replace calls to the original functions with calls to the wrapper functions.
|
| 373 |
|
|
|
| 374 |
|
|
|
| 375 |
|
|
@section [No changes] Future Enhancements to the ABI
|
| 376 |
|
|
|
| 377 |
|
|
@section Sample code
|
| 378 |
|
|
|
| 379 |
|
|
The code examples might not be correct w.r.t. the current version of the ABI,
|
| 380 |
|
|
especially everything related to exception handling.
|
| 381 |
|
|
|
| 382 |
|
|
|
| 383 |
|
|
@section [New] Memory model
|
| 384 |
|
|
|
| 385 |
|
|
The ABI should define a memory model and the ordering that is guaranteed for
|
| 386 |
|
|
data transfers and commit/undo actions, or at least refer to another memory
|
| 387 |
|
|
model that needs to be preserved. Without that, the compiler cannot ensure the
|
| 388 |
|
|
memory model specified on the level of the programming language (e.g., by the
|
| 389 |
|
|
C++ TM specification).
|
| 390 |
|
|
|
| 391 |
|
|
For example, if a transactional load is ordered before another load/store, then
|
| 392 |
|
|
the TM runtime must also ensure this ordering when accessing shared state. If
|
| 393 |
|
|
not, this might break the kind of publication safety used in the C++ TM
|
| 394 |
|
|
specification. Likewise, the TM runtime must ensure privatization safety.
|
| 395 |
|
|
|
| 396 |
|
|
|
| 397 |
|
|
|
| 398 |
|
|
@c ---------------------------------------------------------------------
|
| 399 |
|
|
@c Internals
|
| 400 |
|
|
@c ---------------------------------------------------------------------
|
| 401 |
|
|
|
| 402 |
|
|
@node Internals
|
| 403 |
|
|
@chapter Internals
|
| 404 |
|
|
|
| 405 |
|
|
@section TM methods and method groups
|
| 406 |
|
|
|
| 407 |
|
|
libitm supports several ways of synchronizing transactions with each other.
|
| 408 |
|
|
These TM methods (or TM algorithms) are implemented in the form of
|
| 409 |
|
|
subclasses of @code{abi_dispatch}, which provide methods for
|
| 410 |
|
|
transactional loads and stores as well as callbacks for rollback and commit.
|
| 411 |
|
|
All methods that are compatible with each other (i.e., that let concurrently
|
| 412 |
|
|
running transactions still synchronize correctly even if different methods
|
| 413 |
|
|
are used) belong to the same TM method group. Pointers to TM methods can be
|
| 414 |
|
|
obtained using the factory methods prefixed with @code{dispatch_} in
|
| 415 |
|
|
@file{libitm_i.h}. There are two special methods, @code{dispatch_serial} and
|
| 416 |
|
|
@code{dispatch_serialirr}, that are compatible with all methods because they
|
| 417 |
|
|
run transactions completely in serial mode.
|
| 418 |
|
|
|
| 419 |
|
|
@subsection TM method life cycle
|
| 420 |
|
|
|
| 421 |
|
|
The state of TM methods does not change after construction, but they do alter
|
| 422 |
|
|
the state of transactions that use this method. However, because
|
| 423 |
|
|
per-transaction data gets used by several methods, @code{gtm_thread} is
|
| 424 |
|
|
responsible for setting an initial state that is useful for all methods.
|
| 425 |
|
|
After that, methods are responsible for resetting/clearing this state on each
|
| 426 |
|
|
rollback or commit (of outermost transactions), so that the transaction
|
| 427 |
|
|
executed next is not affected by the previous transaction.
|
| 428 |
|
|
|
| 429 |
|
|
There is also global state associated with each method group, which is
|
| 430 |
|
|
initialized and shut down (@code{method_group::init()} and @code{fini()})
|
| 431 |
|
|
when switching between method groups (see @file{retry.cc}).
|
| 432 |
|
|
|
| 433 |
|
|
@subsection Selecting the default method
|
| 434 |
|
|
|
| 435 |
|
|
The default method that libitm uses for freshly started transactions (but
|
| 436 |
|
|
not necessarily for restarted transactions) can be set via an environment
|
| 437 |
|
|
variable (@env{ITM_DEFAULT_METHOD}), whose value should be equal to the name
|
| 438 |
|
|
of one of the factory methods returning abi_dispatch subclasses but without
|
| 439 |
|
|
the "dispatch_" prefix (e.g., "serialirr" instead of
|
| 440 |
|
|
@code{GTM::dispatch_serialirr()}).
|
| 441 |
|
|
|
| 442 |
|
|
Note that this environment variable is only a hint for libitm and might not
|
| 443 |
|
|
be supported in the future.
|
| 444 |
|
|
|
| 445 |
|
|
|
| 446 |
|
|
@section Nesting: flat vs. closed
|
| 447 |
|
|
|
| 448 |
|
|
We support two different kinds of nesting of transactions. In the case of
|
| 449 |
|
|
@emph{flat nesting}, the nesting structure is flattened and all nested
|
| 450 |
|
|
transactions are subsumed by the enclosing transaction. In contrast,
|
| 451 |
|
|
with @emph{closed nesting}, nested transactions that have not yet committed
|
| 452 |
|
|
can be rolled back separately from the enclosing transactions; when they
|
| 453 |
|
|
commit, they are subsumed by the enclosing transaction, and their effects
|
| 454 |
|
|
will be finally committed when the outermost transaction commits.
|
| 455 |
|
|
@emph{Open nesting} (where nested transactions can commit independently of the
|
| 456 |
|
|
enclosing transactions) are not supported.
|
| 457 |
|
|
|
| 458 |
|
|
Flat nesting is the default nesting mode, but closed nesting is supported and
|
| 459 |
|
|
used when transactions contain user-controlled aborts
|
| 460 |
|
|
(@code{__transaction_cancel} statements). We assume that user-controlled
|
| 461 |
|
|
aborts are rare in typical code and used mostly in exceptional situations.
|
| 462 |
|
|
Thus, it makes more sense to use flat nesting by default to avoid the
|
| 463 |
|
|
performance overhead of the additional checkpoints required for closed
|
| 464 |
|
|
nesting. User-controlled aborts will correctly abort the innermost enclosing
|
| 465 |
|
|
transaction, whereas the whole (i.e., outermost) transaction will be restarted
|
| 466 |
|
|
otherwise (e.g., when a transaction encounters data conflicts during
|
| 467 |
|
|
optimistic execution).
|
| 468 |
|
|
|
| 469 |
|
|
|
| 470 |
|
|
@section Locking conventions
|
| 471 |
|
|
|
| 472 |
|
|
This section documents the locking scheme and rules for all uses of locking
|
| 473 |
|
|
in libitm. We have to support serial(-irrevocable) mode, which is implemented
|
| 474 |
|
|
using a global lock as explained next (called the @emph{serial lock}). To
|
| 475 |
|
|
simplify the overall design, we use the same lock as catch-all locking
|
| 476 |
|
|
mechanism for other infrequent tasks such as (de)registering clone tables or
|
| 477 |
|
|
threads. Besides the serial lock, there are @emph{per-method-group locks} that
|
| 478 |
|
|
are managed by specific method groups (i.e., groups of similar TM concurrency
|
| 479 |
|
|
control algorithms), and lock-like constructs for quiescence-based operations
|
| 480 |
|
|
such as ensuring privatization safety.
|
| 481 |
|
|
|
| 482 |
|
|
Thus, the actions that participate in the libitm-internal locking are either
|
| 483 |
|
|
@emph{active transactions} that do not run in serial mode, @emph{serial
|
| 484 |
|
|
transactions} (which (are about to) run in serial mode), and management tasks
|
| 485 |
|
|
that do not execute within a transaction but have acquired the serial mode
|
| 486 |
|
|
like a serial transaction would do (e.g., to be able to register threads with
|
| 487 |
|
|
libitm). Transactions become active as soon as they have successfully used the
|
| 488 |
|
|
serial lock to announce this globally (@pxref{serial-lock-impl,,Serial lock
|
| 489 |
|
|
implementation}). Likewise, transactions become serial transactions as soon as
|
| 490 |
|
|
they have acquired the exclusive rights provided by the serial lock (i.e.,
|
| 491 |
|
|
serial mode, which also means that there are no other concurrent active or
|
| 492 |
|
|
serial transactions). Note that active transactions can become serial
|
| 493 |
|
|
transactions when they enter serial mode during the runtime of the
|
| 494 |
|
|
transaction.
|
| 495 |
|
|
|
| 496 |
|
|
@subsection State-to-lock mapping
|
| 497 |
|
|
|
| 498 |
|
|
Application data is protected by the serial lock if there is a serial
|
| 499 |
|
|
transaction and no concurrently running active transaction (i.e., non-serial).
|
| 500 |
|
|
Otherwise, application data is protected by the currently selected method
|
| 501 |
|
|
group, which might use per-method-group locks or other mechanisms. Also note
|
| 502 |
|
|
that application data that is about to be privatized might not be allowed to be
|
| 503 |
|
|
accessed by nontransactional code until privatization safety has been ensured;
|
| 504 |
|
|
the details of this are handled by the current method group.
|
| 505 |
|
|
|
| 506 |
|
|
libitm-internal state is either protected by the serial lock or accessed
|
| 507 |
|
|
through custom concurrent code. The latter applies to the public/shared part
|
| 508 |
|
|
of a transaction object and most typical method-group-specific state.
|
| 509 |
|
|
|
| 510 |
|
|
The former category (protected by the serial lock) includes:
|
| 511 |
|
|
@itemize @bullet
|
| 512 |
|
|
@item The list of active threads that have used transactions.
|
| 513 |
|
|
@item The tables that map functions to their transactional clones.
|
| 514 |
|
|
@item The current selection of which method group to use.
|
| 515 |
|
|
@item Some method-group-specific data, or invariants of this data. For example,
|
| 516 |
|
|
resetting a method group to its initial state is handled by switching to the
|
| 517 |
|
|
same method group, so the serial lock protects such resetting as well.
|
| 518 |
|
|
@end itemize
|
| 519 |
|
|
In general, such state is immutable whenever there exists an active
|
| 520 |
|
|
(non-serial) transaction. If there is no active transaction, a serial
|
| 521 |
|
|
transaction (or a thread that is not currently executing a transaction but has
|
| 522 |
|
|
acquired the serial lock) is allowed to modify this state (but must of course
|
| 523 |
|
|
be careful to not surprise the current method group's implementation with such
|
| 524 |
|
|
modifications).
|
| 525 |
|
|
|
| 526 |
|
|
@subsection Lock acquisition order
|
| 527 |
|
|
|
| 528 |
|
|
To prevent deadlocks, locks acquisition must happen in a globally agreed-upon
|
| 529 |
|
|
order. Note that this applies to other forms of blocking too, but does not
|
| 530 |
|
|
necessarily apply to lock acquisitions that do not block (e.g., trylock()
|
| 531 |
|
|
calls that do not get retried forever). Note that serial transactions are
|
| 532 |
|
|
never return back to active transactions until the transaction has committed.
|
| 533 |
|
|
Likewise, active transactions stay active until they have committed.
|
| 534 |
|
|
Per-method-group locks are typically also not released before commit.
|
| 535 |
|
|
|
| 536 |
|
|
Lock acquisition / blocking rules:
|
| 537 |
|
|
@itemize @bullet
|
| 538 |
|
|
|
| 539 |
|
|
@item Transactions must become active or serial before they are allowed to
|
| 540 |
|
|
use method-group-specific locks or blocking (i.e., the serial lock must be
|
| 541 |
|
|
acquired before those other locks, either in serial or nonserial mode).
|
| 542 |
|
|
|
| 543 |
|
|
@item Any number of threads that do not currently run active transactions can
|
| 544 |
|
|
block while trying to get the serial lock in exclusive mode. Note that active
|
| 545 |
|
|
transactions must not block when trying to upgrade to serial mode unless there
|
| 546 |
|
|
is no other transaction that is trying that (the latter is ensured by the
|
| 547 |
|
|
serial lock implementation.
|
| 548 |
|
|
|
| 549 |
|
|
@item Method groups must prevent deadlocks on their locks. In particular, they
|
| 550 |
|
|
must also be prepared for another active transaction that has acquired
|
| 551 |
|
|
method-group-specific locks but is blocked during an attempt to upgrade to
|
| 552 |
|
|
being a serial transaction. See below for details.
|
| 553 |
|
|
|
| 554 |
|
|
@item Serial transactions can acquire method-group-specific locks because there
|
| 555 |
|
|
will be no other active nor serial transaction.
|
| 556 |
|
|
|
| 557 |
|
|
@end itemize
|
| 558 |
|
|
|
| 559 |
|
|
There is no single rule for per-method-group blocking because this depends on
|
| 560 |
|
|
when a TM method might acquire locks. If no active transaction can upgrade to
|
| 561 |
|
|
being a serial transaction after it has acquired per-method-group locks (e.g.,
|
| 562 |
|
|
when those locks are only acquired during an attempt to commit), then the TM
|
| 563 |
|
|
method does not need to consider a potential deadlock due to serial mode.
|
| 564 |
|
|
|
| 565 |
|
|
If there can be upgrades to serial mode after the acquisition of
|
| 566 |
|
|
per-method-group locks, then TM methods need to avoid those deadlocks:
|
| 567 |
|
|
@itemize @bullet
|
| 568 |
|
|
@item When upgrading to a serial transaction, after acquiring exclusive rights
|
| 569 |
|
|
to the serial lock but before waiting for concurrent active transactions to
|
| 570 |
|
|
finish (@pxref{serial-lock-impl,,Serial lock implementation} for details),
|
| 571 |
|
|
we have to wake up all active transactions waiting on the upgrader's
|
| 572 |
|
|
per-method-group locks.
|
| 573 |
|
|
@item Active transactions blocking on per-method-group locks need to check the
|
| 574 |
|
|
serial lock and abort if there is a pending serial transaction.
|
| 575 |
|
|
@item Lost wake-ups have to be prevented (e.g., by changing a bit in each
|
| 576 |
|
|
per-method-group lock before doing the wake-up, and only blocking on this lock
|
| 577 |
|
|
using a futex if this bit is not group).
|
| 578 |
|
|
@end itemize
|
| 579 |
|
|
|
| 580 |
|
|
@strong{TODO}: Can reuse serial lock for gl-*? And if we can, does it make
|
| 581 |
|
|
sense to introduce further complexity in the serial lock? For gl-*, we can
|
| 582 |
|
|
really only avoid an abort if we do -wb and -vbv.
|
| 583 |
|
|
|
| 584 |
|
|
|
| 585 |
|
|
@subsection Serial lock implementation
|
| 586 |
|
|
@anchor{serial-lock-impl}
|
| 587 |
|
|
|
| 588 |
|
|
The serial lock implementation is optimized towards assuming that serial
|
| 589 |
|
|
transactions are infrequent and not the common case. However, the performance
|
| 590 |
|
|
of entering serial mode can matter because when only few transactions are run
|
| 591 |
|
|
concurrently or if there are few threads, then it can be efficient to run
|
| 592 |
|
|
transactions serially.
|
| 593 |
|
|
|
| 594 |
|
|
The serial lock is similar to a multi-reader-single-writer lock in that there
|
| 595 |
|
|
can be several active transactions but only one serial transaction. However,
|
| 596 |
|
|
we do want to avoid contention (in the lock implementation) between active
|
| 597 |
|
|
transactions, so we split up the reader side of the lock into per-transaction
|
| 598 |
|
|
flags that are true iff the transaction is active. The exclusive writer side
|
| 599 |
|
|
remains a shared single flag, which is acquired using a CAS, for example.
|
| 600 |
|
|
On the fast-path, the serial lock then works similar to Dekker's algorithm but
|
| 601 |
|
|
with several reader flags that a serial transaction would have to check.
|
| 602 |
|
|
A serial transaction thus requires a list of all threads with potentially
|
| 603 |
|
|
active transactions; we can use the serial lock itself to protect this list
|
| 604 |
|
|
(i.e., only threads that have acquired the serial lock can modify this list).
|
| 605 |
|
|
|
| 606 |
|
|
We want starvation-freedom for the serial lock to allow for using it to ensure
|
| 607 |
|
|
progress for potentially starved transactions (@pxref{progress-guarantees,,
|
| 608 |
|
|
Progress Guarantees} for details). However, this is currently not enforced by
|
| 609 |
|
|
the implementation of the serial lock.
|
| 610 |
|
|
|
| 611 |
|
|
Here is pseudo-code for the read/write fast paths of acquiring the serial
|
| 612 |
|
|
lock (read-to-write upgrade is similar to write_lock:
|
| 613 |
|
|
@example
|
| 614 |
|
|
// read_lock:
|
| 615 |
|
|
tx->shared_state |= active;
|
| 616 |
|
|
__sync_synchronize(); // or STLD membar, or C++0x seq-cst fence
|
| 617 |
|
|
while (!serial_lock.exclusive)
|
| 618 |
|
|
if (spinning_for_too_long) goto slowpath;
|
| 619 |
|
|
|
| 620 |
|
|
// write_lock:
|
| 621 |
|
|
if (CAS(&serial_lock.exclusive, 0, this) != 0)
|
| 622 |
|
|
goto slowpath; // writer-writer contention
|
| 623 |
|
|
// need a membar here, but CAS already has full membar semantics
|
| 624 |
|
|
bool need_blocking = false;
|
| 625 |
|
|
for (t: all txns)
|
| 626 |
|
|
@{
|
| 627 |
|
|
for (;t->shared_state & active;)
|
| 628 |
|
|
if (spinning_for_too_long) @{ need_blocking = true; break; @}
|
| 629 |
|
|
@}
|
| 630 |
|
|
if (need_blocking) goto slowpath;
|
| 631 |
|
|
@end example
|
| 632 |
|
|
|
| 633 |
|
|
Releasing a lock in this spin-lock version then just consists of resetting
|
| 634 |
|
|
@code{tx->shared_state} to inactive or clearing @code{serial_lock.exclusive}.
|
| 635 |
|
|
|
| 636 |
|
|
However, we can't rely on a pure spinlock because we need to get the OS
|
| 637 |
|
|
involved at some time (e.g., when there are more threads than CPUs to run on).
|
| 638 |
|
|
Therefore, the real implementation falls back to a blocking slow path, either
|
| 639 |
|
|
based on pthread mutexes or Linux futexes.
|
| 640 |
|
|
|
| 641 |
|
|
|
| 642 |
|
|
@subsection Reentrancy
|
| 643 |
|
|
|
| 644 |
|
|
libitm has to consider the following cases of reentrancy:
|
| 645 |
|
|
@itemize @bullet
|
| 646 |
|
|
|
| 647 |
|
|
@item Transaction calls unsafe code that starts a new transaction: The outer
|
| 648 |
|
|
transaction will become a serial transaction before executing unsafe code.
|
| 649 |
|
|
Therefore, nesting within serial transactions must work, even if the nested
|
| 650 |
|
|
transaction is called from within uninstrumented code.
|
| 651 |
|
|
|
| 652 |
|
|
@item Transaction calls either a transactional wrapper or safe code, which in
|
| 653 |
|
|
turn starts a new transaction: It is not yet defined in the specification
|
| 654 |
|
|
whether this is allowed. Thus, it is undefined whether libitm supports this.
|
| 655 |
|
|
|
| 656 |
|
|
@item Code that starts new transactions might be called from within any part
|
| 657 |
|
|
of libitm: This kind of reentrancy would likely be rather complex and can
|
| 658 |
|
|
probably be avoided. Therefore, it is not supported.
|
| 659 |
|
|
|
| 660 |
|
|
@end itemize
|
| 661 |
|
|
|
| 662 |
|
|
@subsection Privatization safety
|
| 663 |
|
|
|
| 664 |
|
|
Privatization safety is ensured by libitm using a quiescence-based approach.
|
| 665 |
|
|
Basically, a privatizing transaction waits until all concurrent active
|
| 666 |
|
|
transactions will either have finished (are not active anymore) or operate on
|
| 667 |
|
|
a sufficiently recent snapshot to not access the privatized data anymore. This
|
| 668 |
|
|
happens after the privatizing transaction has stopped being an active
|
| 669 |
|
|
transaction, so waiting for quiescence does not contribute to deadlocks.
|
| 670 |
|
|
|
| 671 |
|
|
In method groups that need to ensure publication safety explicitly, active
|
| 672 |
|
|
transactions maintain a flag or timestamp in the public/shared part of the
|
| 673 |
|
|
transaction descriptor. Before blocking, privatizers need to let the other
|
| 674 |
|
|
transactions know that they should wake up the privatizer.
|
| 675 |
|
|
|
| 676 |
|
|
@strong{TODO} Ho to implement the waiters? Should those flags be
|
| 677 |
|
|
per-transaction or at a central place? We want to avoid one wake/wait call
|
| 678 |
|
|
per active transactions, so we might want to use either a tree or combining
|
| 679 |
|
|
to reduce the syscall overhead, or rather spin for a long amount of time
|
| 680 |
|
|
instead of doing blocking. Also, it would be good if only the last transaction
|
| 681 |
|
|
that the privatizer waits for would do the wake-up.
|
| 682 |
|
|
|
| 683 |
|
|
@subsection Progress guarantees
|
| 684 |
|
|
@anchor{progress-guarantees}
|
| 685 |
|
|
|
| 686 |
|
|
Transactions that do not make progress when using the current TM method will
|
| 687 |
|
|
eventually try to execute in serial mode. Thus, the serial lock's progress
|
| 688 |
|
|
guarantees determine the progress guarantees of the whole TM. Obviously, we at
|
| 689 |
|
|
least need deadlock-freedom for the serial lock, but it would also be good to
|
| 690 |
|
|
provide starvation-freedom (informally, all threads will finish executing a
|
| 691 |
|
|
transaction eventually iff they get enough cycles).
|
| 692 |
|
|
|
| 693 |
|
|
However, the scheduling of transactions (e.g., thread scheduling by the OS)
|
| 694 |
|
|
also affects the handling of progress guarantees by the TM. First, the TM
|
| 695 |
|
|
can only guarantee deadlock-freedom if threads do not get stopped. Likewise,
|
| 696 |
|
|
low-priority threads can starve if they do not get scheduled when other
|
| 697 |
|
|
high-priority threads get those cycles instead.
|
| 698 |
|
|
|
| 699 |
|
|
If all threads get scheduled eventually, correct lock implementations will
|
| 700 |
|
|
provide deadlock-freedom, but might not provide starvation-freedom. We can
|
| 701 |
|
|
either enforce the latter in the TM's lock implementation, or assume that
|
| 702 |
|
|
the scheduling is sufficiently random to yield a probabilistic guarantee that
|
| 703 |
|
|
no thread will starve (because eventually, a transaction will encounter a
|
| 704 |
|
|
scheduling that will allow it to run). This can indeed work well in practice
|
| 705 |
|
|
but is not necessarily guaranteed to work (e.g., simple spin locks can be
|
| 706 |
|
|
pretty efficient).
|
| 707 |
|
|
|
| 708 |
|
|
Because enforcing stronger progress guarantees in the TM has a higher runtime
|
| 709 |
|
|
overhead, we focus on deadlock-freedom right now and assume that the threads
|
| 710 |
|
|
will get scheduled eventually by the OS (but don't consider threads with
|
| 711 |
|
|
different priorities). We should support starvation-freedom for serial
|
| 712 |
|
|
transactions in the future. Everything beyond that is highly related to proper
|
| 713 |
|
|
contention management across all of the TM (including with TM method to
|
| 714 |
|
|
choose), and is future work.
|
| 715 |
|
|
|
| 716 |
|
|
@strong{TODO} Handling thread priorities: We want to avoid priority inversion
|
| 717 |
|
|
but it's unclear how often that actually matters in practice. Workloads that
|
| 718 |
|
|
have threads with different priorities will likely also require lower latency
|
| 719 |
|
|
or higher throughput for high-priority threads. Therefore, it probably makes
|
| 720 |
|
|
not that much sense (except for eventual progress guarantees) to use
|
| 721 |
|
|
priority inheritance until the TM has priority-aware contention management.
|
| 722 |
|
|
|
| 723 |
|
|
|
| 724 |
|
|
@c ---------------------------------------------------------------------
|
| 725 |
|
|
@c GNU Free Documentation License
|
| 726 |
|
|
@c ---------------------------------------------------------------------
|
| 727 |
|
|
|
| 728 |
|
|
@include fdl.texi
|
| 729 |
|
|
|
| 730 |
|
|
@c ---------------------------------------------------------------------
|
| 731 |
|
|
@c Index
|
| 732 |
|
|
@c ---------------------------------------------------------------------
|
| 733 |
|
|
|
| 734 |
|
|
@node Index
|
| 735 |
|
|
@unnumbered Index
|
| 736 |
|
|
|
| 737 |
|
|
@printindex cp
|
| 738 |
|
|
|
| 739 |
|
|
@bye
|