Merge #70

70: Clarify in the docs that `mul_add` is not always faster. r=cuviper a=frewsxcv

More info:

- https://github.com/rust-lang/rust/issues/49842
- https://github.com/rust-lang/rust/pull/50572

Co-authored-by: Corey Farwell <coreyf@rwell.org>
bors[bot] 7 years ago
parent
commit
15dc0e7127
3 changed files with 13 additions and 8 deletions
  1. src/float.rs (+4 -2)
  2. src/ops/mul_add.rs (+5 -4)
  3. src/real.rs (+4 -2)

+ 4 - 2
src/float.rs

@@ -1237,8 +1237,10 @@ pub trait Float
     fn is_sign_negative(self) -> bool;
 
     /// Fused multiply-add. Computes `(self * a) + b` with only one rounding
-    /// error. This produces a more accurate result with better performance than
-    /// a separate multiplication operation followed by an add.
+    /// error, yielding a more accurate result than an unfused multiply-add.
+    ///
+    /// Using `mul_add` can be more performant than an unfused multiply-add if
+    /// the target architecture has a dedicated `fma` CPU instruction.
     ///
     /// ```
     /// use num_traits::Float;

+ 5 - 4
src/ops/mul_add.rs

@@ -1,7 +1,8 @@
-/// The fused multiply-add operation.
-/// Computes (self * a) + b with only one rounding error.
-/// This produces a more accurate result with better performance
-/// than a separate multiplication operation followed by an add.
+/// Fused multiply-add. Computes `(self * a) + b` with only one rounding
+/// error, yielding a more accurate result than an unfused multiply-add.
+///
+/// Using `mul_add` can be more performant than an unfused multiply-add if
+/// the target architecture has a dedicated `fma` CPU instruction.
 ///
 /// Note that `A` and `B` are `Self` by default, but this is not mandatory.
 ///

+ 4 - 2
src/real.rs

@@ -215,8 +215,10 @@ pub trait Real
     fn is_sign_negative(self) -> bool;
 
     /// Fused multiply-add. Computes `(self * a) + b` with only one rounding
-    /// error. This produces a more accurate result with better performance than
-    /// a separate multiplication operation followed by an add.
+    /// error, yielding a more accurate result than an unfused multiply-add.
+    ///
+    /// Using `mul_add` can be more performant than an unfused multiply-add if
+    /// the target architecture has a dedicated `fma` CPU instruction.
     ///
     /// ```
     /// use num_traits::real::Real;
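
The accuracy claim in the new doc text ("only one rounding error") can be illustrated with a small sketch. This is not part of the commit: it uses the standard library's `f64::mul_add` rather than the `num_traits` traits so that it has no dependencies, and the function names `unfused`, `fused`, and `demo_inputs` are hypothetical, chosen only for this illustration. It says nothing about speed, which, as the revised docs note, depends on whether the target has a dedicated `fma` instruction.

```rust
/// Unfused multiply-add: the product `x * a` is rounded to `f64`
/// before `b` is added, so two roundings can occur.
pub fn unfused(x: f64, a: f64, b: f64) -> f64 {
    x * a + b
}

/// Fused multiply-add using the standard library's `f64::mul_add`:
/// the exact product is added to `b` and only the final sum is rounded.
pub fn fused(x: f64, a: f64, b: f64) -> f64 {
    x.mul_add(a, b)
}

/// Hypothetical inputs picked so that the two variants differ.
///
/// With `x = 1 + 2^-52`, the exact square is `1 + 2^-51 + 2^-104`.
/// Rounding the product to `f64` drops the `2^-104` term, so adding
/// `b = -(1 + 2^-51)` afterwards cancels everything. The fused form
/// keeps that term until the single final rounding.
pub fn demo_inputs() -> (f64, f64) {
    let x = 1.0 + f64::EPSILON; // 1 + 2^-52
    let b = -(1.0 + 2.0 * f64::EPSILON); // -(1 + 2^-51)
    (x, b)
}
```

With `let (x, b) = demo_inputs();`, `unfused(x, x, b)` evaluates to `0.0`, while `fused(x, x, b)` returns the residual `2^-104` (equal to `f64::EPSILON * f64::EPSILON`) that the intermediate rounding discarded.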