.\" Copyright (c) 1985 Regents of the University of California.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 4. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" from: @(#)ieee.3 6.4 (Berkeley) 5/6/91
.\" $FreeBSD: src/lib/msun/man/ieee.3,v 1.25 2011/10/16 14:30:28 eadler Exp $
.\"
.Dd January 26, 2005
.Dt IEEE 3
.Os
.Sh NAME
.Nm ieee
.Nd IEEE standard 754 for floating-point arithmetic
.Sh DESCRIPTION
The IEEE Standard 754 for Binary Floating-Point Arithmetic
defines representations of floating-point numbers and abstract
properties of arithmetic operations relating to precision,
rounding, and exceptional cases, as described below.
.Ss IEEE STANDARD 754 Floating-Point Arithmetic
Radix: Binary.
.Pp
Overflow and underflow:
.Bd -ragged -offset indent -compact
Overflow goes by default to a signed \*(If.
Underflow is
.Em gradual .
.Ed
.Pp
Zero is represented ambiguously as +0 or \-0.
.Bd -ragged -offset indent -compact
Its sign transforms correctly through multiplication or
division, and is preserved by addition of zeros
with like signs; but, in the default rounding mode,
x\-x yields +0 for every finite x.
The only operations that reveal zero's
sign are division by zero and
.Fn copysign x \(+-0 .
In particular, comparison (x > y, x \(>= y, etc.)\&
cannot be affected by the sign of zero; but if
finite x = y then \*(If = 1/(x\-y) \(!= \-1/(y\-x) = \-\*(If.
.Ed
.Pp
Infinity is signed.
.Bd -ragged -offset indent -compact
It persists when added to itself
or to any finite number.
Its sign transforms
correctly through multiplication and division, and
(finite)/\(+-\*(If\0=\0\(+-0 and
(nonzero)/0 = \(+-\*(If.
But
\*(If\-\*(If, \*(If\(**0 and \*(If/\*(If
are, like 0/0 and sqrt(\-3),
invalid operations that produce \*(Na. ...
.Ed
.Pp
Reserved operands (\*(Nas):
.Bd -ragged -offset indent -compact
An \*(Na is
.Em ( N Ns ot Em a N Ns umber ) .
Some \*(Nas, called Signaling \*(Nas, trap any floating-point operation
performed upon them; they are used to mark missing
or uninitialized values, or nonexistent elements
of arrays.
The rest are Quiet \*(Nas; they are
the default results of Invalid Operations, and
propagate through subsequent arithmetic operations.
If x \(!= x then x is \*(Na; every other predicate
(x > y, x = y, x < y, ...) is FALSE if \*(Na is involved.
.Ed
.Pp
Rounding:
.Bd -ragged -offset indent -compact
Every algebraic operation (+, \-, \(**, /,
\(sr)
is rounded by default to within half an
.Em ulp ,
and when the rounding error is exactly half an
.Em ulp
then
the rounded value's least significant bit is zero.
(An
.Em ulp
is one
.Em U Ns nit
in the
.Em L Ns ast
.Em P Ns lace . )
This kind of rounding is usually the best kind,
sometimes provably so; for instance, for every
x = 1.0, 2.0, 3.0, 4.0, ..., 2.0**52, we find
(x/3.0)\(**3.0 == x and (x/10.0)\(**10.0 == x and ...
even though both the quotients and the products
have been rounded.
Only rounding like IEEE 754 can do that.
But no single kind of rounding can be
proved best for every circumstance, so IEEE 754
provides rounding towards zero or towards
+\*(If or towards \-\*(If
at the programmer's option.
.Ed
.Pp
Exceptions:
.Bd -ragged -offset indent -compact
IEEE 754 recognizes five kinds of floating-point exceptions,
listed below in declining order of probable importance.
.Bl -column -offset indent "Invalid Operation" "Gradual Underflow"
.It Em Exception Ta Em "Default Result"
.It Invalid Operation Ta \*(Na, or FALSE
.It Overflow Ta \(+-\*(If
.It Divide by Zero Ta \(+-\*(If
.It Underflow Ta Gradual Underflow
.It Inexact Ta Rounded value
.El
.Pp
NOTE: An Exception is not an Error unless handled
badly.
What makes a class of exceptions exceptional
is that no single default response can be satisfactory
in every instance.
On the other hand, if a default
response will serve most instances satisfactorily,
the unsatisfactory instances cannot justify aborting
computation every time the exception occurs.
.Ed
.Ss Data Formats
Single-precision:
.Bd -ragged -offset indent -compact
Type name:
.Vt float
.Pp
Wordsize: 32 bits.
.Pp
Precision: 24 significant bits,
roughly like 7 significant decimals.
.Bd -ragged -offset indent -compact
If x and x' are consecutive positive single-precision
numbers (they differ by 1
.Em ulp ) ,
then
.Bd -ragged -compact
5.9e\-08 < 0.5**24 < (x'\-x)/x \(<= 0.5**23 < 1.2e\-07.
.Ed
.Ed
.Pp
.Bl -column "XXX" -compact
.It Range: Ta Overflow threshold = 2.0**128 = 3.4e38
.It \& Ta Underflow threshold = 0.5**126 = 1.2e\-38
.El
.Bd -ragged -offset indent -compact
Underflowed results round to the nearest
integer multiple of 0.5**149 = 1.4e\-45.
.Ed
.Ed
.Pp
Double-precision:
.Bd -ragged -offset indent -compact
Type name:
.Vt double
.Bd -ragged -offset indent -compact
On some architectures,
.Vt long double
is the same as
.Vt double .
.Ed
.Pp
Wordsize: 64 bits.
.Pp
Precision: 53 significant bits,
roughly like 16 significant decimals.
.Bd -ragged -offset indent -compact
If x and x' are consecutive positive double-precision
numbers (they differ by 1
.Em ulp ) ,
then
.Bd -ragged -compact
1.1e\-16 < 0.5**53 < (x'\-x)/x \(<= 0.5**52 < 2.3e\-16.
.Ed
.Ed
.Pp
.Bl -column "XXX" -compact
.It Range: Ta Overflow threshold = 2.0**1024 = 1.8e308
.It \& Ta Underflow threshold = 0.5**1022 = 2.2e\-308
.El
.Bd -ragged -offset indent -compact
Underflowed results round to the nearest
integer multiple of 0.5**1074 = 4.9e\-324.
.Ed
.Ed
.Pp
Extended-precision:
.Bd -ragged -offset indent -compact
Type name:
.Vt long double
(when supported by the hardware)
.Pp
Wordsize: 96 bits.
.Pp
Precision: 64 significant bits,
roughly like 19 significant decimals.
.Bd -ragged -offset indent -compact
If x and x' are consecutive positive extended-precision
numbers (they differ by 1
.Em ulp ) ,
then
.Bd -ragged -compact
5.4e\-20 < 0.5**64 < (x'\-x)/x \(<= 0.5**63 < 1.1e\-19.
.Ed
.Ed
.Pp
.Bl -column "XXX" -compact
.It Range: Ta Overflow threshold = 2.0**16384 = 1.2e4932
.It \& Ta Underflow threshold = 0.5**16382 = 3.4e\-4932
.El
.Bd -ragged -offset indent -compact
Underflowed results round to the nearest
integer multiple of 0.5**16445 = 3.6e\-4951.
.Ed
.Ed
.Pp
Quad-extended-precision:
.Bd -ragged -offset indent -compact
Type name:
.Vt long double
(when supported by the hardware)
.Pp
Wordsize: 128 bits.
.Pp
Precision: 113 significant bits,
roughly like 34 significant decimals.
.Bd -ragged -offset indent -compact
If x and x' are consecutive positive quad-extended-precision
numbers (they differ by 1
.Em ulp ) ,
then
.Bd -ragged -compact
9.6e\-35 < 0.5**113 < (x'\-x)/x \(<= 0.5**112 < 2.0e\-34.
.Ed
.Ed
.Pp
.Bl -column "XXX" -compact
.It Range: Ta Overflow threshold = 2.0**16384 = 1.2e4932
.It \& Ta Underflow threshold = 0.5**16382 = 3.4e\-4932
.El
.Bd -ragged -offset indent -compact
Underflowed results round to the nearest
integer multiple of 0.5**16494 = 6.5e\-4966.
.Ed
.Ed
.Ss Additional Information Regarding Exceptions
.Pp
For each kind of floating-point exception, IEEE 754
provides a Flag that is raised each time its exception
is signaled, and stays raised until the program resets
it.
Programs may also test, save and restore a flag.
Thus, IEEE 754 provides three ways by which programs
may cope with exceptions for which the default result
might be unsatisfactory:
.Bl -enum
.It
Test for a condition that might cause an exception
later, and branch to avoid the exception.
.It
Test a flag to see whether an exception has occurred
since the program last reset its flag.
.It
Test a result to see whether it is a value that only
an exception could have produced.
.Pp
CAUTION: The only reliable ways to discover
whether Underflow has occurred are to test whether
products or quotients lie closer to zero than the
underflow threshold, or to test the Underflow
flag.
(Sums and differences cannot underflow in
IEEE 754; if x \(!= y then x\-y is correct to
full precision and certainly nonzero regardless of
how tiny it may be.)
Products and quotients that
underflow gradually can lose accuracy gradually
without vanishing, so comparing them with zero
(as one might on a VAX) will not reveal the loss.
Fortunately, if a gradually underflowed value is
destined to be added to something bigger than the
underflow threshold, as is almost always the case,
digits lost to gradual underflow will not be missed
because they would have been rounded off anyway.
So gradual underflows are usually
.Em provably
ignorable.
The same cannot be said of underflows flushed to 0.
.El
.Pp
At the option of an implementor conforming to IEEE 754,
other ways to cope with exceptions may be provided:
.Bl -enum
.It
ABORT.
This mechanism classifies an exception in
advance as an incident to be handled by means
traditionally associated with error-handling
statements like "ON ERROR GO TO ...".
Different
languages offer different forms of this statement,
but most share the following characteristics:
.Bl -dash
.It
No means is provided to substitute a value for
the offending operation's result and resume
computation from what may be the middle of an
expression.
An exceptional result is abandoned.
.It
In a subprogram that lacks an error-handling
statement, an exception causes the subprogram to
abort within whatever program called it, and so
on back up the chain of calling subprograms until
an error-handling statement is encountered or the
whole task is aborted and memory is dumped.
.El
.It
STOP.
This mechanism, requiring an interactive
debugging environment, is more for the programmer
than the program.
It classifies an exception in
advance as a symptom of a programmer's error; the
exception suspends execution as near as it can to
the offending operation so that the programmer can
look around to see how it happened.
Quite often
the first several exceptions turn out to be quite
unexceptionable, so the programmer ought ideally
to be able to resume execution after each one as if
execution had not been stopped.
.It
\&... Other ways lie beyond the scope of this document.
.El
.Pp
Ideally, each
elementary function should act as if it were indivisible, or
atomic, in the sense that ...
.Bl -enum
.It
No exception should be signaled that is not deserved by
the data supplied to that function.
.It
Any exception signaled should be identified with that
function rather than with one of its subroutines.
.It
The internal behavior of an atomic function should not
be disrupted when a calling program changes from
one to another of the five or so ways of handling
exceptions listed above, although the definition
of the function may be correlated intentionally
with exception handling.
.El
.Pp
The functions in
.Nm libm
are only approximately atomic.
They signal no inappropriate exception except possibly ...
.Bl -tag -width indent -offset indent -compact
.It Xo
Over/Underflow
.Xc
when a result, if properly computed, might have lain barely within range, and
.It Xo
Inexact in
.Fn cabs ,
.Fn cbrt ,
.Fn hypot ,
.Fn log10
and
.Fn pow
.Xc
when it happens to be exact, thanks to fortuitous cancellation of errors.
.El
Otherwise, ...
.Bl -tag -width indent -offset indent -compact
.It Xo
Invalid Operation is signaled only when
.Xc
any result but \*(Na would probably be misleading.
.It Xo
Overflow is signaled only when
.Xc
the exact result would be finite but beyond the overflow threshold.
.It Xo
Divide-by-Zero is signaled only when
.Xc
a function takes exactly infinite values at finite operands.
.It Xo
Underflow is signaled only when
.Xc
the exact result would be nonzero but tinier than the underflow threshold.
.It Xo
Inexact is signaled only when
.Xc
greater range or precision would be needed to represent the exact result.
.El
.Sh SEE ALSO
.Xr fenv 3 ,
.Xr ieee_test 3 ,
.Xr math 3
.Pp
An explanation of IEEE 754 and its proposed extension p854
was published in the IEEE magazine MICRO in August 1984 under
the title "A Proposed Radix- and Word-length-independent
Standard for Floating-point Arithmetic" by
.An "W. J. Cody"
et al.
The manuals for Pascal, C and BASIC on the Apple Macintosh
document the features of IEEE 754 pretty well.
Articles in the IEEE magazine COMPUTER vol.\& 14 no.\& 3 (Mar.\&
1981), and in the ACM SIGNUM Newsletter Special Issue of
Oct.\& 1979, may be helpful although they pertain to
superseded drafts of the standard.
.Sh STANDARDS
.St -ieee754