
Performance pass for ralloc: global-local allocator model, TLS allocator, and more

- Ralloc now has a global-local allocator model where the local allocator requests memory from the global allocator. This allows for...

- ...thread-local storage allocators. The local allocator is stored in a TLS static, bypassing the global mutex (a sketch of this model follows this list).

- Changes to OOM handling: the OOM handler is no longer allocator-specific, but instead global.

- Thread-local OOM handlers.

- Tweak canonicalization.

- Add the UniCell primitive, which allows for single-threaded mutexes.

- Tests and miscellaneous cleanup.
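For illustration, here is a hypothetical sketch of the global-local model using plain `std` primitives (recent enough that `Mutex::new` is `const`); each thread serves requests from a private cache and only takes the global lock on a miss. None of the names below are ralloc's actual API.

```rust
use std::cell::RefCell;
use std::sync::Mutex;

/// Spare chunks shared by all threads, guarded by a mutex.
static GLOBAL_POOL: Mutex<Vec<Box<[u8]>>> = Mutex::new(Vec::new());

thread_local! {
    /// The per-thread chunk cache; touching it takes no lock.
    static LOCAL_POOL: RefCell<Vec<Box<[u8]>>> = RefCell::new(Vec::new());
}

/// Serve a request from the local cache, falling back to the global pool.
fn local_alloc(size: usize) -> Box<[u8]> {
    LOCAL_POOL.with(|local| {
        let pos = local.borrow().iter().position(|c| c.len() >= size);
        if let Some(pos) = pos {
            // Fast path: reuse a cached chunk without locking.
            return local.borrow_mut().swap_remove(pos);
        }
        // Slow path: take the global lock and request a fresh chunk.
        GLOBAL_POOL
            .lock()
            .unwrap()
            .pop()
            .filter(|c| c.len() >= size)
            .unwrap_or_else(|| vec![0u8; size.max(4096)].into_boxed_slice())
    })
}

/// Return a chunk to the thread-local cache.
fn local_free(chunk: Box<[u8]>) {
    LOCAL_POOL.with(|local| local.borrow_mut().push(chunk));
}

fn main() {
    let chunk = local_alloc(128);
    assert!(chunk.len() >= 128);
    local_free(chunk);
}
```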
ticki 8 years ago
parent
commit
c8adc1ea2c
16 files changed, 755 insertions and 235 deletions
  1. README.md (+32 -21)
  2. TODO.md (+2 -0)
  3. shim/src/lib.rs (+76 -14)
  4. src/allocator.rs (+19 -16)
  5. src/block.rs (+7 -0)
  6. src/bookkeeper.rs (+208 -131)
  7. src/breaker.rs (+65 -0)
  8. src/brk.rs (+55 -0)
  9. src/cell.rs (+112 -0)
  10. src/fail.rs (+19 -2)
  11. src/lib.rs (+7 -46)
  12. src/prelude.rs (+1 -0)
  13. src/symbols.rs (+41 -0)
  14. src/sys.rs (+18 -3)
  15. src/tls.rs (+62 -0)
  16. src/vec.rs (+31 -2)

+ 32 - 21
README.md

@@ -57,6 +57,23 @@ fn main() {
 }
 ```
 
+### Thread-specific OOM handlers
+
+You can override the global OOM handler for your current thread. Enable the `thread_oom` feature, and then do:
+
+```rust
+extern crate ralloc;
+
+fn my_handler() -> ! {
+    println!("Oh no. Blame the Mexicans.");
+}
+
+fn main() {
+    ralloc::set_thread_oom_handler(my_handler);
+    // Do some stuff...
+}
+```
+
 ### Debug check: double free
 
 Ooh, this one is a cool one. `ralloc` detects various memory bugs when compiled
@@ -87,21 +104,20 @@ fn main() {
 ```rust
 extern crate ralloc;
 
-use std::mem;
+use std::{mem, thread};
 
 fn main() {
-    {
-        // We start by allocating some stuff.
-        let a = Box::new(500u32);
-        // We then leak `a`.
-        let b = mem::forget(a);
-    }
-    // The box is now leaked, and the destructor won't be called.
-
-    // To debug this we insert a memory leak check in the end of our programs.
-    // This will panic if a memory leak is found (and will be a NOOP without
-    // `debug_tools`).
-    ralloc::lock().debug_assert_no_leak();
+    thread::spawn(|| {
+        {
+            // We start by allocating some stuff.
+            let a = Box::new(500u32);
+            // We then leak `a`.
+            let b = mem::forget(a);
+        }
+        // The box is now leaked, and the destructor won't be called.
+
+        // When this thread exits, the program will panic.
+    });
 }
 ```
 
@@ -227,13 +243,8 @@ This is just one of many examples.
 ### Platform agnostic
 
 `ralloc` is platform independent. It depends on `ralloc_shim`, a minimal
-interface for platform dependent functions. The default implementation of
-`ralloc_shim` requires the following symbols:
-
-1. `sbrk`: For extending the data segment size.
-2. `sched_yield`: For the spinlock.
-3. `memcpy`, `memcmp`, `memset`: Core memory routines.
-4. `rust_begin_unwind`: For panicking.
+interface for platform-dependent functions. A default implementation of
+`ralloc_shim` is provided (supporting Mac OS, Linux, and BSD).
 
 ### Local allocators
 
@@ -268,7 +279,7 @@ fn main() {
 
 ### Logging
 
-If you enable the `log` feature, you get detailed locking of the allocator, e.g.
+If you enable the `log` feature, you get detailed logging of the allocator, e.g.
 
 ```
 |   : BRK'ing a block of size, 80, and alignment 8.            (at bookkeeper.rs:458)

+ 2 - 0
TODO.md

@@ -1,5 +1,7 @@
 - [x] Thread local allocator.
 - [x] Lock reuse
+- [ ] Freeze/unfreeze -- memcompression
+- [ ] Proper error messages in unwraps.
 - [ ] Checkpoints
 - [ ] Fast `calloc`
 - [ ] Microcaches.

+ 76 - 14
shim/src/lib.rs

@@ -1,21 +1,83 @@
-//! Symbols and externs that ralloc depends on.
+//! Symbols and externs that `ralloc` depends on.
+//!
+//! This crate provides implementations and imports of these for Linux, BSD, and Mac OS.
 
-#![crate_name="ralloc_shim"]
-#![crate_type="lib"]
-#![feature(lang_items)]
-#![warn(missing_docs)]
+#![feature(lang_items, linkage)]
 #![no_std]
+#![warn(missing_docs)]
+
+extern crate libc;
+
+pub use libc::sched_yield;
+
+extern {
+    /// Change the data segment. See `man sbrk`.
+    pub fn sbrk(n: libc::intptr_t) -> *const libc::c_void;
+}
+
+/// Thread destructors for Linux.
+#[cfg(target_os = "linux")]
+pub mod thread_destructor {
+    use libc;
+
+    extern {
+        #[linkage = "extern_weak"]
+        static __dso_handle: *mut u8;
+        #[linkage = "extern_weak"]
+        static __cxa_thread_atexit_impl: *const libc::c_void;
+    }
+
+    /// Does this platform support thread destructors?
+    ///
+    /// This will return true, if and only if `__cxa_thread_atexit_impl` is non-null.
+    #[inline]
+    pub fn is_supported() -> bool {
+        !__cxa_thread_atexit_impl.is_null()
+    }
 
-extern "C" {
-    /// Cooperatively gives up a timeslice to the OS scheduler.
-    pub fn sched_yield() -> isize;
+    /// Register a thread destructor.
+    ///
+    /// # Safety
+    ///
+    /// This is unsafe due to accepting (and dereferencing) raw pointers, as well as running an
+    /// arbitrary unsafe function.
+    ///
+    /// On older systems without the `__cxa_thread_atexit_impl` symbol, this is unsafe to call, and will
+    /// likely segfault.
+    // TODO: Due to rust-lang/rust#18804, make sure this is not generic!
+    pub unsafe fn register(t: *mut u8, dtor: unsafe extern fn(*mut u8)) {
+        use core::mem;
+
+        /// A thread destructor.
+        type Dtor = unsafe extern fn(dtor: unsafe extern fn(*mut u8), arg: *mut u8, dso_handle: *mut u8) -> libc::c_int;
+
+        mem::transmute::<*const libc::c_void, Dtor>(__cxa_thread_atexit_impl)(dtor, t, &__dso_handle as *const _ as *mut _);
+    }
+}
+
+/// Thread destructors for Mac OS.
+#[cfg(target_os = "macos")]
+pub mod thread_destructor {
+    use libc;
 
-    /// Increment data segment of this process by some, _n_, return a pointer to the new data segment
-    /// start.
+    /// Does this platform support thread destructors?
     ///
-    /// This uses the system call BRK as backend.
+    /// This will always return true.
+    #[inline]
+    pub fn is_supported() -> bool { true }
+
+    /// Register a thread destructor.
     ///
-    /// This is unsafe for multiple reasons. Most importantly, it can create an inconsistent state,
-    /// because it is not atomic. Thus, it can be used to create Undefined Behavior.
-    pub fn sbrk(n: isize) -> *mut u8;
+    /// # Safety
+    ///
+    /// This is unsafe due to accepting (and dereferencing) raw pointers, as well as running an
+    /// arbitrary unsafe function.
+    #[cfg(target_os = "macos")]
+    pub unsafe fn register(t: *mut u8, dtor: unsafe extern fn(*mut u8)) {
+        extern {
+            fn _tlv_atexit(dtor: unsafe extern fn(*mut u8), arg: *mut u8);
+        }
+
+        _tlv_atexit(dtor, t);
+    }
 }
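For context, the thread-destructor machinery above (`__cxa_thread_atexit_impl` on Linux, `_tlv_atexit` on Mac OS) is what lets a thread-local allocator hand its memory back when the thread dies. Below is a hypothetical `std`-based illustration of the behaviour this enables (std's `thread_local!` registers such destructors internally); the `LocalCache` type is purely illustrative.

```rust
use std::cell::RefCell;
use std::thread;

struct LocalCache(Vec<u8>);

impl Drop for LocalCache {
    fn drop(&mut self) {
        // In ralloc's case, this is the point where a thread-local pool
        // would be handed back to the global allocator.
        println!("thread exiting; releasing {} cached bytes", self.0.len());
    }
}

thread_local! {
    static CACHE: RefCell<LocalCache> = RefCell::new(LocalCache(Vec::new()));
}

fn main() {
    thread::spawn(|| {
        CACHE.with(|c| c.borrow_mut().0.extend_from_slice(&[1; 64]));
        // `LocalCache::drop` runs when this thread exits.
    })
    .join()
    .unwrap();
}
```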

+ 19 - 16
src/allocator.rs

@@ -4,16 +4,30 @@
 
 use prelude::*;
 
-use sync;
+use {sync, breaker};
 use bookkeeper::Bookkeeper;
 
 /// The global default allocator.
-static ALLOCATOR: sync::Mutex<Allocator> = sync::Mutex::new(Allocator::new());
+static GLOBAL_ALLOCATOR: sync::Mutex<Allocator<breaker::Sbrk>> = sync::Mutex::new(Allocator::new());
+tls! {
+    /// The thread-local allocator.
+    static ALLOCATOR: Option<UniCell<Allocator<breaker::Global>>> = None;
+}
 
-/// Lock the allocator.
+/// Get the allocator.
 #[inline]
-pub fn lock<'a>() -> sync::MutexGuard<'a, Allocator> {
-    ALLOCATOR.lock()
+pub fn get() -> Result<Allocator<breaker::Global>, ()> {
+    if ALLOCATOR.is_none() {
+        // Create the new allocator.
+        let mut alloc = Allocator::new();
+        // Attach the allocator to the current thread.
+        alloc.attach();
+
+        // To get mutable access, we wrap it in a `UniCell`.
+        ALLOCATOR = Some(UniCell::new(alloc));
+
+        &ALLOCATOR
+    }
 }
 
 /// An allocator.
@@ -90,15 +104,4 @@ impl Allocator {
             Err(())
         }
     }
-
-    /// Assert that no leaks are done.
-    ///
-    /// This should be run in the end of your program, after destructors have been run. It will then
-    /// panic if some item is not freed.
-    ///
-    /// In release mode, this is a NOOP.
-    pub fn debug_assert_no_leak(&self) {
-        #[cfg(feature = "debug_tools")]
-        self.inner.assert_no_leak();
-    }
 }
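The lazy first-use initialization that `get` is aiming for boils down to the pattern below. This is a hypothetical, `std`-based sketch with illustrative names (`Allocator`, `with_allocator`), not the crate's real control flow.

```rust
use std::cell::RefCell;

struct Allocator {
    pool: Vec<usize>,
}

impl Allocator {
    fn new() -> Allocator {
        Allocator { pool: Vec::new() }
    }
}

thread_local! {
    static ALLOCATOR: RefCell<Option<Allocator>> = RefCell::new(None);
}

fn with_allocator<R>(f: impl FnOnce(&mut Allocator) -> R) -> R {
    ALLOCATOR.with(|slot| {
        let mut slot = slot.borrow_mut();
        // First use initializes this thread's allocator; later calls skip this.
        let alloc = slot.get_or_insert_with(Allocator::new);
        f(alloc)
    })
}

fn main() {
    with_allocator(|a| a.pool.push(42));
    with_allocator(|a| assert_eq!(a.pool.len(), 1));
}
```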

+ 7 - 0
src/block.rs

@@ -267,6 +267,13 @@ impl fmt::Debug for Block {
     }
 }
 
+/// Make sure dropped blocks are empty.
+impl Drop for Block {
+    fn drop(&mut self) {
+        debug_assert!(self.is_empty(), "Dropping a non-empty block.");
+    }
+}
+
 #[cfg(test)]
 mod test {
     use prelude::*;

+ 208 - 131
src/bookkeeper.rs

@@ -2,41 +2,13 @@
 
 use prelude::*;
 
+use brk;
 use vec::Vec;
 
+use core::marker::PhantomData;
 use core::ops::Range;
 use core::{ptr, cmp, mem};
 
-/// Canonicalize a BRK request.
-///
-/// Syscalls can be expensive, which is why we would rather accquire more memory than necessary,
-/// than having many syscalls acquiring memory stubs. Memory stubs are small blocks of memory,
-/// which are essentially useless until merge with another block.
-///
-/// To avoid many syscalls and accumulating memory stubs, we BRK a little more memory than
-/// necessary. This function calculate the memory to be BRK'd based on the necessary memory.
-///
-/// The return value is always greater than or equals to the argument.
-#[inline]
-fn canonicalize_brk(min: usize) -> usize {
-    /// The BRK multiplier.
-    ///
-    /// The factor determining the linear dependence between the minimum segment, and the acquired
-    /// segment.
-    const BRK_MULTIPLIER: usize = 2;
-    /// The minimum size to be BRK'd.
-    const BRK_MIN: usize = 65536;
-    /// The maximal amount of _extra_ elements.
-    const BRK_MAX_EXTRA: usize = 4 * 65536;
-
-    let res = cmp::max(BRK_MIN, min.saturating_add(cmp::min(BRK_MULTIPLIER * min, BRK_MAX_EXTRA)));
-
-    // Make some handy assertions.
-    debug_assert!(res >= min, "Canonicalized BRK space is smaller than the one requested.");
-
-    res
-}
-
 /// The memory bookkeeper.
 ///
 /// This is the main component of ralloc. Its job is to keep track of the free blocks in a
@@ -48,7 +20,7 @@ fn canonicalize_brk(min: usize) -> usize {
 /// Only making use of only [`alloc`](#method.alloc), [`free`](#method.free),
 /// [`realloc`](#method.realloc) (and following their respective assumptions) guarantee that no
 /// buffer overrun, arithmetic overflow, panic, or otherwise unexpected crash will happen.
-pub struct Bookkeeper {
+pub struct Bookkeeper<B> {
     /// The internal block pool.
     ///
     /// # Guarantees
@@ -65,9 +37,14 @@ pub struct Bookkeeper {
     /// The number of bytes currently allocated.
     #[cfg(feature = "debug_tools")]
     allocated: usize,
+    /// The "breaker", i.e. the fresh allocator.
+    ///
+    /// This has the job of acquiring new memory through some external source (e.g. BRK or the
+    /// global allocator).
+    breaker: PhantomData<B>,
 }
 
-impl Bookkeeper {
+impl<B: Breaker> Bookkeeper<B> {
     /// Create a new, empty block pool.
     ///
     /// This will make no allocations or BRKs.
@@ -168,7 +145,7 @@ impl Bookkeeper {
             // There are many corner cases that make knowing where to insert it difficult
             // so we search instead.
             let bound = self.find_bound(&excessive);
-            self.free_ind(bound, excessive);
+            self.free_bound(bound, excessive);
 
             // Check consistency.
             self.check();
@@ -241,7 +218,7 @@ impl Bookkeeper {
         let bound = self.find_bound(&block);
 
         // Free the given block.
-        self.free_ind(bound, block);
+        self.free_bound(bound, block);
     }
 
     /// Reallocate memory.
@@ -285,7 +262,7 @@ impl Bookkeeper {
         // "Leave" the allocator.
         let block = self.enter(block);
         // Try to do an inplace reallocation.
-        match self.realloc_inplace_ind(ind, block, new_size) {
+        match self.realloc_inplace_bound(ind, block, new_size) {
             Ok(block) => self.leave(block),
             Err(block) => {
                 // Reallocation cannot be done inplace.
@@ -299,7 +276,7 @@ impl Bookkeeper {
                 // Free the old block.
                 // Allocation may have moved insertion so we search again.
                 let bound = self.find_bound(&block);
-                self.free_ind(bound, block);
+                self.free_bound(bound, block);
 
                 // Check consistency.
                 self.check();
@@ -322,7 +299,7 @@ impl Bookkeeper {
     ///
     /// This shouldn't be used when the index of insertion is known, since this performs an binary
     /// search to find the blocks index. When you know the index use
-    /// [`realloc_inplace_ind`](#method.realloc_inplace_ind.html).
+    /// [`realloc_inplace_bound`](#method.realloc_inplace_bound.html).
     #[inline]
     pub fn realloc_inplace(&mut self, block: Block, new_size: usize) -> Result<Block, Block> {
         // Logging.
@@ -332,7 +309,7 @@ impl Bookkeeper {
         let bound = self.find_bound(&block);
 
         // Go for it!
-        let res = self.realloc_inplace_ind(bound, block, new_size);
+        let res = self.realloc_inplace_bound(bound, block, new_size);
 
         // Check consistency.
         debug_assert!(res.as_ref().ok().map_or(true, |x| x.size() == new_size), "Requested space \
@@ -341,37 +318,10 @@ impl Bookkeeper {
         res
     }
 
-    /// Allocate _fresh_ space.
-    ///
-    /// "Fresh" means that the space is allocated through a BRK call to the kernel.
-    ///
-    /// The returned pointer is guaranteed to be aligned to `align`.
-    #[inline]
-    fn alloc_fresh(&mut self, size: usize, align: usize) -> Block {
-        // Logging.
-        log!(self.pool, "Fresh allocation of size {} with alignment {}.", size, align);
-
-        // To avoid shenanigans with unbounded recursion and other stuff, we pre-reserve the
-        // buffer.
-        self.reserve_more(2);
-
-        // BRK what you need.
-        let (alignment_block, res, excessive) = self.brk(size, align);
-
-        // Add it to the list. This will not change the order, since the pointer is higher than all
-        // the previous blocks.
-        self.double_push(alignment_block, excessive);
-
-        // Check consistency.
-        self.check();
-
-        res
-    }
-
-    /// Reallocate a block on a know index inplace.
+    /// Reallocate a block on a known index bound in place.
     ///
     /// See [`realloc_inplace`](#method.realloc_inplace.html) for more information.
-    fn realloc_inplace_ind(&mut self, ind: Range<usize>, mut block: Block, new_size: usize) -> Result<Block, Block> {
+    fn realloc_inplace_bound(&mut self, ind: Range<usize>, mut block: Block, new_size: usize) -> Result<Block, Block> {
         // Logging.
         log!(self.pool;ind, "Try inplace reallocating {:?} to size {}.", block, new_size);
 
@@ -386,7 +336,7 @@ impl Bookkeeper {
             // Split the block in two segments, the main segment and the excessive segment.
             let (block, excessive) = block.split(new_size);
             // Free the excessive segment.
-            self.free_ind(ind, excessive);
+            self.free_bound(ind, excessive);
 
             // Make some assertions to avoid dumb bugs.
             debug_assert!(block.size() == new_size, "Block wasn't shrinked properly.");
@@ -441,7 +391,7 @@ impl Bookkeeper {
     ///
     /// See [`free`](#method.free) for more information.
     #[inline]
-    fn free_ind(&mut self, ind: Range<usize>, mut block: Block) {
+    fn free_bound(&mut self, ind: Range<usize>, mut block: Block) {
         // Logging.
         log!(self.pool;ind, "Freeing {:?}.", block);
 
@@ -483,29 +433,25 @@ impl Bookkeeper {
         self.check();
     }
 
-    /// Extend the data segment.
-    #[inline]
-    fn brk(&self, size: usize, align: usize) -> (Block, Block, Block) {
+    /// Allocate _fresh_ space.
+    ///
+    /// "Fresh" means that the space is allocated through the breaker.
+    ///
+    /// The returned pointer is guaranteed to be aligned to `align`.
+    fn alloc_fresh(&mut self, size: usize, align: usize) -> Block {
         // Logging.
-        log!(self.pool;self.pool.len(), "BRK'ing a block of size, {}, and alignment {}.", size, align);
-
-        // Calculate the canonical size (extra space is allocated to limit the number of system calls).
-        let brk_size = canonicalize_brk(size).checked_add(align).expect("Alignment addition overflowed.");
-
-        // Use SBRK to allocate extra data segment. The alignment is used as precursor for our
-        // allocated block. This ensures that it is properly memory aligned to the requested value.
-        let (alignment_block, rest) = Block::brk(brk_size).align(align).unwrap();
+        log!(self.pool, "Fresh allocation of size {} with alignment {}.", size, align);
 
-        // Split the block to leave the excessive space.
-        let (res, excessive) = rest.split(size);
+        // Break it to me!
+        let res = B::alloc_fresh(size, align);
 
-        // Make some assertions.
-        debug_assert!(res.aligned_to(align), "Alignment failed.");
-        debug_assert!(res.size() + alignment_block.size() + excessive.size() == brk_size, "BRK memory leak");
+        // Check consistency.
+        self.check();
 
-        (alignment_block, res, excessive)
+        res
     }
 
+
     /// Push two blocks to the block pool.
     ///
     /// This will append the blocks to the end of the block pool (and merge if possible). Make sure
@@ -555,48 +501,6 @@ impl Bookkeeper {
         self.check();
     }
 
-    /// Reserve space for the block pool.
-    ///
-    /// This will ensure the capacity is at least `needed` greater than the current length,
-    /// potentially reallocating the block pool.
-    #[inline]
-    fn reserve_more(&mut self, needed: usize) {
-        // Logging.
-        log!(self.pool;self.pool.len(), "Reserving {} past {}, currently has capacity {}.", needed,
-             self.pool.len(), self.pool.capacity());
-
-        let needed = self.pool.len() + needed;
-        if needed > self.pool.capacity() {
-            // TODO allow BRK-free non-inplace reservations.
-            // TODO Enable inplace reallocation in this position.
-
-            // Reallocate the block pool.
-
-            // Make a fresh allocation.
-            let size = needed.saturating_add(
-                cmp::min(self.pool.capacity(), 200 + self.pool.capacity() / 2)
-                // We add:
-                + 1 // block for the alignment block.
-                + 1 // block for the freed vector.
-                + 1 // block for the excessive space.
-            ) * mem::size_of::<Block>();
-            let (alignment_block, alloc, excessive) = self.brk(size, mem::align_of::<Block>());
-
-            // Refill the pool.
-            let old = self.pool.refill(alloc);
-
-            // Double push the alignment block and the excessive space linearly (note that it is in
-            // fact in the end of the pool, due to BRK _extending_ the segment).
-            self.double_push(alignment_block, excessive);
-
-            // Free the old vector.
-            self.free(old);
-
-            // Check consistency.
-            self.check();
-        }
-    }
-
     /// Perform a binary search to find the appropriate place where the block can be insert or is
     /// located.
     ///
@@ -811,6 +715,24 @@ impl Bookkeeper {
         }
     }
 
+    /// Reserve space for the block pool.
+    ///
+    /// This will ensure the capacity is at least `needed` greater than the current length,
+    /// potentially reallocating the block pool.
+    fn reserve_more(&mut self, extra: usize) {
+        // Logging.
+        log!(self.pool;self.pool.len(), "Reserving {} past {}, currently has capacity {}.", extra,
+             self.pool.len(), self.pool.capacity());
+
+        let needed = self.pool.len() + extra;
+        if needed > self.pool.capacity() {
+            B::realloc_pool(self, needed);
+
+            // Check consistency.
+            self.check();
+        }
+    }
+
     /// Leave the allocator.
     ///
     /// A block should be "registered" through this function when it leaves the allocated (e.g., is
@@ -865,8 +787,8 @@ impl Bookkeeper {
                 let mut next = x;
                 for (n, i) in it {
                     // Check if sorted.
-                    assert!(next >= i, "The block pool is not sorted at index, {} ({:?} < {:?})", n, next,
-                            i);
+                    assert!(next >= i, "The block pool is not sorted at index, {} ({:?} < {:?}).",
+                            n, next, i);
                     // Make sure no blocks are adjacent.
                     assert!(!i.left_to(next) || i.is_empty(), "Adjacent blocks at index, {} ({:?} and \
                             {:?})", n, i, next);
@@ -887,13 +809,168 @@ impl Bookkeeper {
         }
     }
 
+    /// Attach this allocator to the current thread.
+    ///
+    /// This will make sure this allocator's data is freed to the global allocator when the thread exits.
+    pub unsafe fn attach(&mut self) {
+        fn dtor(ptr: *mut Bookkeeper) {
+            let alloc = *ptr;
+
+            // Lock the global allocator.
+            let global_alloc = allocator::GLOBAL_ALLOCATOR.lock();
+
+            // TODO, we know this is sorted, so we could abuse that fact to faster insertion in the
+            // global allocator.
+
+            // Free everything in the allocator.
+            while let Some(i) = alloc.pool.pop() {
+                global_alloc.free(i);
+            }
+
+            // Deallocate the vector itself.
+            global_alloc.free(Block::from(alloc.pool));
+
+            // Gotta' make sure no memleaks are here.
+            #[cfg(feature = "debug_tools")]
+            alloc.assert_no_leak();
+        }
+
+        sys::register_thread_destructor(self as *mut Bookkeeper, dtor).unwrap();
+    }
+
     /// Check for memory leaks.
     ///
     /// This will ake sure that all the allocated blocks have been freed.
     #[cfg(feature = "debug_tools")]
-    pub fn assert_no_leak(&self) {
+    fn assert_no_leak(&self) {
         assert!(self.allocated == self.pool.capacity() * mem::size_of::<Block>(), "Not all blocks \
                 freed. Total allocated space is {} ({} free blocks).", self.allocated,
                 self.pool.len());
     }
 }
+
+trait Breaker {
+    /// Allocate _fresh_ space.
+    ///
+    /// "Fresh" means that the space is allocated through the breaker.
+    ///
+    /// The returned pointer is guaranteed to be aligned to `align`.
+    fn alloc_fresh(bk: &mut Bookkeeper<Self>, size: usize, align: usize) -> Block;
+    /// Reallocate the block pool to some specified capacity.
+    fn realloc_pool(bk: &mut Bookkeeper<Self>, cap: usize);
+}
+
+/// SBRK fresh allocator.
+///
+/// This will extend the data segment whenever new memory is needed. Since this includes leaving
+/// userspace, this shouldn't be used when other allocators are available (i.e. the bookkeeper is
+/// local).
+struct Sbrk;
+
+impl Breaker for Sbrk {
+    #[inline]
+    fn alloc_fresh(bk: &mut Bookkeeper<Sbrk>, size: usize, align: usize) -> Block {
+        // Obtain what you need.
+        let (alignment_block, res, excessive) = brk::get(size, align);
+
+        // Add it to the list. This will not change the order, since the pointer is higher than all
+        // the previous blocks.
+        bk.double_push(alignment_block, excessive);
+
+        res
+    }
+
+    #[inline]
+    fn realloc_pool(bk: &mut Bookkeeper<Sbrk>, extra: usize) {
+        // TODO allow BRK-free non-inplace reservations.
+        // TODO Enable inplace reallocation in this position.
+
+        // Reallocate the block pool.
+
+        // Make a fresh allocation.
+        let size = (cap +
+            cmp::min(bk.pool.capacity(), 200 + bk.pool.capacity() / 2)
+            // We add:
+            + 1 // block for the alignment block.
+            + 1 // block for the freed vector.
+            + 1 // block for the excessive space.
+        ) * mem::size_of::<Block>();
+        let (alignment_block, alloc, excessive) = brk::get(size, mem::align_of::<Block>());
+
+        // Refill the pool.
+        let old = bk.pool.refill(alloc);
+
+        // Double push the alignment block and the excessive space linearly (note that it is in
+        // fact in the end of the pool, due to BRK _extending_ the segment).
+        bk.double_push(alignment_block, excessive);
+
+        // Free the old vector.
+        bk.free(old);
+    }
+}
+
+/// Allocate fresh memory from the global allocator.
+struct GlobalAllocator;
+
+impl Breaker for GlobalAllocator {
+    #[inline]
+    fn alloc_fresh(bk: &mut Bookkeeper<GlobalAllocator>, size: usize, align: usize) -> Block {
+        /// Canonicalize the requested space.
+        ///
+        /// We request excessive space to the upstream allocator to avoid repeated requests and
+        /// lock contentions.
+        #[inline]
+        fn canonicalize_space(min: usize) -> usize {
+            // TODO tweak this.
+
+            // To avoid having mega-allocations allocate way too much space, we
+            // have a maximal extra space limit.
+            if min > 8192 { min } else {
+                // To avoid paying for short-living or little-allocating threads, we have no minimum.
+                // Instead we multiply.
+                min * 4
+                // This won't overflow due to the condition of this branch.
+            }
+        }
+
+        // Get the block from the global allocator.
+        let (res, excessive) = allocator::GLOBAL_ALLOCATOR.lock()
+            .alloc(canonicalize_space(size), align)
+            .split(size);
+
+        // Free the excessive space to the current allocator. Note that you cannot simply push
+        // (which is the case for SBRK), due to the block not necessarily being above all the other
+        // blocks in the pool. For this reason, we let `free` handle the search and so on.
+        bk.free(excessive);
+
+        res
+    }
+
+    #[inline]
+    fn realloc_pool(bk: &mut Bookkeeper<GlobalAllocator>, extra: usize) {
+        // TODO allow BRK-free non-inplace reservations.
+        // TODO Enable inplace reallocation in this position.
+
+        // Reallocate the block pool.
+
+        // Make a fresh allocation.
+        let size = (cap +
+            cmp::min(bk.pool.capacity(), 200 + bk.pool.capacity() / 2)
+            // We add:
+            + 1 // block for the alignment block.
+            + 1 // block for the freed vector.
+            + 1 // block for the excessive space.
+        ) * mem::size_of::<Block>();
+        let (alignment_block, alloc, excessive) = brk::get(size, mem::align_of::<Block>());
+
+        // Refill the pool.
+        let old = bk.pool.refill(alloc);
+
+        // Double push the alignment block and the excessive space linearly (note that it is in
+        // fact in the end of the pool, due to BRK _extending_ the segment).
+        bk.double_push(alignment_block, excessive);
+
+        // Free the old vector.
+        bk.free(old);
+    }
+}
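The `Breaker` abstraction introduced above is essentially a type parameter that decides where fresh memory comes from. Here is a minimal, hypothetical sketch of the same shape, with ordinary `Vec`s standing in for raw blocks and `FromOs`/`FromGlobal` as stand-in names (not ralloc's types):

```rust
use std::marker::PhantomData;

/// Where fresh memory comes from.
trait Breaker {
    fn alloc_fresh(size: usize) -> Vec<u8>;
}

/// Stand-in for the SBRK path: go straight to the system.
struct FromOs;

impl Breaker for FromOs {
    fn alloc_fresh(size: usize) -> Vec<u8> {
        vec![0; size]
    }
}

/// Stand-in for the global-allocator path used by thread-local bookkeepers.
struct FromGlobal;

impl Breaker for FromGlobal {
    fn alloc_fresh(size: usize) -> Vec<u8> {
        // Over-allocate to amortize lock contention, as described above.
        vec![0; size.max(64) * 4]
    }
}

struct Bookkeeper<B: Breaker> {
    pool: Vec<Vec<u8>>,
    breaker: PhantomData<B>,
}

impl<B: Breaker> Bookkeeper<B> {
    fn alloc(&mut self, size: usize) -> Vec<u8> {
        // Try the free pool first; fall back to the breaker on a miss.
        let pos = self.pool.iter().position(|b| b.len() >= size);
        match pos {
            Some(pos) => self.pool.swap_remove(pos),
            None => B::alloc_fresh(size),
        }
    }
}

fn main() {
    let mut bk = Bookkeeper::<FromGlobal> { pool: Vec::new(), breaker: PhantomData };
    assert!(bk.alloc(100).len() >= 100);
}
```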

+ 65 - 0
src/breaker.rs

@@ -0,0 +1,65 @@
+use prelude::*;
+
+trait Breaker {
+    /// Allocate _fresh_ space.
+    ///
+    /// "Fresh" means that the space is allocated through the breaker.
+    ///
+    /// The returned pointer is guaranteed to be aligned to `align`.
+    fn alloc_fresh(pool: &mut Vec<Block>, size: usize, align: usize) -> Block;
+}
+
+/// Canonicalize a BRK request.
+///
+/// Syscalls can be expensive, which is why we would rather acquire more memory than necessary
+/// than have many syscalls acquiring memory stubs. Memory stubs are small blocks of memory,
+/// which are essentially useless until merged with another block.
+///
+/// To avoid many syscalls and accumulating memory stubs, we BRK a little more memory than
+/// necessary. This function calculates the memory to be BRK'd based on the necessary memory.
+///
+/// The return value is always greater than or equal to the argument.
+#[inline]
+fn canonicalize_brk(min: usize) -> usize {
+    /// The BRK multiplier.
+    ///
+    /// The factor determining the linear dependence between the minimum segment, and the acquired
+    /// segment.
+    const BRK_MULTIPLIER: usize = 2;
+    /// The minimum size to be BRK'd.
+    const BRK_MIN: usize = 1024;
+    /// The maximal amount of _extra_ elements.
+    const BRK_MAX_EXTRA: usize = 4 * 65536;
+
+    let res = cmp::max(BRK_MIN, min.saturating_add(cmp::min(BRK_MULTIPLIER * min, BRK_MAX_EXTRA)));
+
+    // Make some handy assertions.
+    debug_assert!(res >= min, "Canonicalized BRK space is smaller than the one requested.");
+
+    res
+}
+
+struct Sbrk;
+
+impl Breaker for Sbrk {
+    fn obtain(size: usize, align: usize) -> (Block, Block, Block) {
+        // Logging.
+        log!(self.pool;self.pool.len(), "Obtaining a block of size, {}, and alignment {}.", size, align);
+
+        // Calculate the canonical size (extra space is allocated to limit the number of system calls).
+        let brk_size = canonicalize_brk(size) + align;
+
+        // Use SBRK to allocate extra data segment. The alignment is used as precursor for our
+        // allocated block. This ensures that it is properly memory aligned to the requested value.
+        let (alignment_block, rest) = Block::brk(brk_size).align(align).unwrap();
+
+        // Split the block to leave the excessive space.
+        let (res, excessive) = rest.split(size);
+
+        // Make some assertions.
+        debug_assert!(res.aligned_to(align), "Alignment failed.");
+        debug_assert!(res.size() + alignment_block.size() + excessive.size() == brk_size, "BRK memory leak.");
+
+        (alignment_block, res, excessive)
+    }
+}

+ 55 - 0
src/brk.rs

@@ -0,0 +1,55 @@
+use prelude::*;
+
+/// Canonicalize a BRK request.
+///
+/// Syscalls can be expensive, which is why we would rather acquire more memory than necessary
+/// than have many syscalls acquiring memory stubs. Memory stubs are small blocks of memory,
+/// which are essentially useless until merged with another block.
+///
+/// To avoid many syscalls and accumulating memory stubs, we BRK a little more memory than
+/// necessary. This function calculates the memory to be BRK'd based on the necessary memory.
+///
+/// The return value is always greater than or equal to the argument.
+#[inline]
+fn canonicalize_space(min: usize) -> usize {
+    // TODO tweak this.
+    /// The BRK multiplier.
+    ///
+    /// The factor determining the linear dependence between the minimum segment, and the acquired
+    /// segment.
+    const BRK_MULTIPLIER: usize = 2;
+    /// The minimum size to be BRK'd.
+    const BRK_MIN: usize = 1024;
+    /// The maximal amount of _extra_ elements.
+    const BRK_MAX_EXTRA: usize = 4 * 65536;
+
+    let res = cmp::max(BRK_MIN, min + cmp::min(BRK_MULTIPLIER * min, BRK_MAX_EXTRA));
+
+    // Make some handy assertions.
+    debug_assert!(res >= min, "Canonicalized BRK space is smaller than the one requested.");
+
+    res
+}
+
+/// BRK new space.
+///
+/// The first block represents the aligner segment (that is the precursor aligning the middle
+/// block to `align`), the second one is the result and is of exactly size `size`. The last
+/// block is the excessive space.
+pub fn get(size: usize, align: usize) -> (Block, Block, Block) {
+    // Calculate the canonical size (extra space is allocated to limit the number of system calls).
+    let brk_size = canonicalize_space(size) + align;
+
+    // Use SBRK to allocate extra data segment. The alignment is used as precursor for our
+    // allocated block. This ensures that it is properly memory aligned to the requested value.
+    let (alignment_block, rest) = Block::brk(brk_size).align(align).unwrap();
+
+    // Split the block to leave the excessive space.
+    let (res, excessive) = rest.split(size);
+
+    // Make some assertions.
+    debug_assert!(res.aligned_to(align), "Alignment failed.");
+    debug_assert!(res.size() + alignment_block.size() + excessive.size() == brk_size, "BRK memory leak.");
+
+    (alignment_block, res, excessive)
+}
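To make the growth policy concrete, here is a standalone copy of the canonicalization formula with a few worked values (the constants mirror the ones in this file; as the TODO notes, they are still subject to tweaking):

```rust
use std::cmp;

const BRK_MULTIPLIER: usize = 2;
const BRK_MIN: usize = 1024;
const BRK_MAX_EXTRA: usize = 4 * 65536;

fn canonicalize_space(min: usize) -> usize {
    cmp::max(BRK_MIN, min + cmp::min(BRK_MULTIPLIER * min, BRK_MAX_EXTRA))
}

fn main() {
    // Small requests are rounded up to at least BRK_MIN.
    assert_eq!(canonicalize_space(100), 1024);
    // Mid-sized requests grow by the multiplier: 10_000 + 2 * 10_000.
    assert_eq!(canonicalize_space(10_000), 30_000);
    // Huge requests are only padded by the capped extra: 1_000_000 + 262_144.
    assert_eq!(canonicalize_space(1_000_000), 1_262_144);
}
```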

+ 112 - 0
src/cell.rs

@@ -0,0 +1,112 @@
+use core::cell::{UnsafeCell, Cell};
+use core::ops;
+
+/// An "uni-cell".
+///
+/// This is a mutually exclusive container, essentially acting as a single-threaded mutex.
+pub struct UniCell<T> {
+    /// The inner data.
+    inner: UnsafeCell<T>,
+    /// Is this data currently used?
+    used: Cell<bool>,
+}
+
+impl<T> UniCell<T> {
+    /// Create a new uni-cell with some inner data.
+    #[inline]
+    pub const fn new(data: T) -> UniCell<T> {
+        UniCell {
+            inner: UnsafeCell::new(data),
+            used: Cell::new(false),
+        }
+    }
+
+    /// Get a reference to the inner data.
+    ///
+    /// This will return `Err(())` if the data is currently in use.
+    #[inline]
+    pub fn get(&self) -> Result<Ref<T>, ()> {
+        if self.used.get() {
+            None
+        } else {
+            // Mark it as used.
+            self.used.set(true);
+
+            Some(Ref {
+                cell: self,
+            })
+        }
+    }
+
+    /// Get the inner and mark the cell used forever.
+    pub fn into_inner(&self) -> Option<T> {
+        if self.used.get() {
+            None
+        } else {
+            // Mark it as used forever.
+            self.used.set(true);
+
+            Some(ptr::read(self.inner.get()))
+        }
+    }
+}
+
+/// A reference to the inner value of a uni-cell.
+pub struct Ref<T> {
+    cell: UniCell<T>,
+}
+
+impl<T> ops::Deref for Ref<T> {
+    type Target = T;
+
+    #[inline]
+    fn deref(&self) -> &T {
+        &*self.cell.inner.get()
+    }
+}
+
+impl<T> ops::DerefMut for Ref<T> {
+    #[inline]
+    fn deref_mut(&mut self) -> &mut T {
+        &mut *self.cell.inner.get()
+    }
+}
+
+impl<T> Drop for Ref<T> {
+    #[inline]
+    fn drop(&mut self) {
+        self.cell.used.set(false);
+    }
+}
+
+#[cfg(test)]
+mod test {
+    use super::*;
+
+    #[test]
+    fn test_inner() {
+        assert_eq!(UniCell::new(101).get(), Ok(101));
+        assert_eq!(UniCell::new("heh").get(), Ok("heh"));
+    }
+
+    #[test]
+    fn test_double_get() {
+        let cell = UniCell::new(500);
+
+        assert_eq!(*cell.get().unwrap(), 500);
+
+        {
+            let tmp = cell.get();
+            assert!(cell.get().is_err());
+            {
+                let tmp = cell.get();
+                assert!(cell.get().is_err());
+            }
+            *tmp.unwrap() = 201;
+        }
+
+        assert_eq!(*cell.get().unwrap(), 201);
+        *cell.get().unwrap() = 100;
+        assert_eq!(*cell.get().unwrap(), 100);
+    }
+}
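As committed, `get` and `into_inner` still mix `Option` and `Result` (the signatures say `Result`, the bodies return `None`/`Some`), and `Ref` stores the cell by value. For reference, here is a compact, working sketch of the same single-threaded-mutex idea, simplified to a closure-based API rather than the in-tree guard type:

```rust
use core::cell::{Cell, UnsafeCell};

pub struct UniCell<T> {
    inner: UnsafeCell<T>,
    used: Cell<bool>,
}

impl<T> UniCell<T> {
    pub const fn new(data: T) -> UniCell<T> {
        UniCell { inner: UnsafeCell::new(data), used: Cell::new(false) }
    }

    /// Run `f` with exclusive access, or fail if the cell is already borrowed.
    pub fn with<R>(&self, f: impl FnOnce(&mut T) -> R) -> Result<R, ()> {
        if self.used.replace(true) {
            return Err(());
        }
        // Safe: `used` guarantees this is the only active borrow, and the type
        // is not `Sync`, so no other thread can reach it.
        let res = f(unsafe { &mut *self.inner.get() });
        self.used.set(false);
        Ok(res)
    }
}

fn main() {
    let cell = UniCell::new(500);
    cell.with(|x| *x += 1).unwrap();
    assert_eq!(cell.with(|x| *x).unwrap(), 501);
    // Re-entrant access is rejected instead of aliasing the data.
    cell.with(|_| cell.with(|_| ()).unwrap_err()).unwrap();
}
```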

+ 19 - 2
src/fail.rs

@@ -3,7 +3,12 @@
 use core::sync::atomic::{self, AtomicPtr};
 use core::{mem, intrinsics};
 
+/// The global OOM handler.
 static OOM_HANDLER: AtomicPtr<()> = AtomicPtr::new(default_oom_handler as *mut ());
+tls! {
+    /// The thread-local OOM handler.
+    static THREAD_OOM_HANDLER: Option<fn() -> !> = None;
+}
 
 /// The default OOM handler.
 ///
@@ -28,8 +33,14 @@ fn default_oom_handler() -> ! {
 /// The rule of thumb is that this should be called, if and only if unwinding (which allocates)
 /// will hit the same error.
 pub fn oom() -> ! {
-    unsafe {
-        (mem::transmute::<_, fn() -> !>(OOM_HANDLER.load(atomic::Ordering::SeqCst)))()
+    if let Some(handler) = THREAD_OOM_HANDLER.get().unwrap() {
+        // A thread-local OOM handler is set; use it.
+        handler();
+    } else {
+        unsafe {
+            // Transmute the atomic pointer to a function pointer and call it.
+            (mem::transmute::<_, fn() -> !>(OOM_HANDLER.load(atomic::Ordering::SeqCst)))()
+        }
     }
 }
 
@@ -40,3 +51,9 @@ pub fn oom() -> ! {
 pub fn set_oom_handler(handler: fn() -> !) {
     OOM_HANDLER.store(handler as *mut (), atomic::Ordering::SeqCst);
 }
+
+/// Override the OOM handler for the current thread.
+#[inline]
+pub fn set_thread_oom_handler(handler: fn() -> !) {
+    *THREAD_OOM_HANDLER.get().unwrap() = Some(handler);
+}
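A hypothetical, `std`-based illustration of the precedence this change establishes: the thread-local handler, if set, wins over the process-wide default (the atomic fn-pointer storage used by the real `OOM_HANDLER` is omitted for brevity):

```rust
use std::cell::Cell;

fn default_oom_handler() -> ! {
    panic!("out of memory");
}

thread_local! {
    static THREAD_OOM_HANDLER: Cell<Option<fn() -> !>> = Cell::new(None);
}

fn set_thread_oom_handler(handler: fn() -> !) {
    THREAD_OOM_HANDLER.with(|h| h.set(Some(handler)));
}

fn oom() -> ! {
    // The per-thread handler takes precedence; otherwise use the global default.
    match THREAD_OOM_HANDLER.with(|h| h.get()) {
        Some(handler) => handler(),
        None => default_oom_handler(),
    }
}

fn my_thread_handler() -> ! {
    panic!("this thread ran out of memory");
}

fn main() {
    std::thread::spawn(|| {
        set_thread_oom_handler(my_thread_handler);
        // A subsequent `oom()` on this thread would call `my_thread_handler`.
    })
    .join()
    .unwrap();
    // On the main thread, `oom()` would still reach `default_oom_handler`.
}
```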

+ 7 - 46
src/lib.rs

@@ -21,10 +21,16 @@
 mod write;
 #[macro_use]
 mod log;
+#[macro_use]
+mod tls;
+#[cfg(feature = "allocator")]
+mod symbols;
 
 mod allocator;
 mod block;
 mod bookkeeper;
+mod brk;
+mod cell;
 mod fail;
 mod leak;
 mod prelude;
@@ -34,50 +40,5 @@ mod sys;
 mod vec;
 
 pub use allocator::{lock, Allocator};
-pub use fail::set_oom_handler;
+pub use fail::{set_oom_handler, set_thread_oom_handler};
 pub use sys::sbrk;
-
-/// Rust allocation symbol.
-#[no_mangle]
-#[inline]
-#[cfg(feature = "allocator")]
-pub extern fn __rust_allocate(size: usize, align: usize) -> *mut u8 {
-    lock().alloc(size, align)
-}
-
-/// Rust deallocation symbol.
-#[no_mangle]
-#[inline]
-#[cfg(feature = "allocator")]
-pub unsafe extern fn __rust_deallocate(ptr: *mut u8, size: usize, _align: usize) {
-    lock().free(ptr, size);
-}
-
-/// Rust reallocation symbol.
-#[no_mangle]
-#[inline]
-#[cfg(feature = "allocator")]
-pub unsafe extern fn __rust_reallocate(ptr: *mut u8, old_size: usize, size: usize, align: usize) -> *mut u8 {
-    lock().realloc(ptr, old_size, size, align)
-}
-
-/// Rust reallocation inplace symbol.
-#[no_mangle]
-#[inline]
-#[cfg(feature = "allocator")]
-pub unsafe extern fn __rust_reallocate_inplace(ptr: *mut u8, old_size: usize, size: usize, _align: usize) -> usize {
-    if lock().realloc_inplace(ptr, old_size, size).is_ok() {
-        size
-    } else {
-        old_size
-    }
-}
-
-/// Get the usable size of the some number of bytes of allocated memory.
-#[no_mangle]
-#[inline]
-#[cfg(feature = "allocator")]
-pub extern fn __rust_usable_size(size: usize, _align: usize) -> usize {
-    // Yay! It matches exactly.
-    size
-}

+ 1 - 0
src/prelude.rs

@@ -1,5 +1,6 @@
 //! Frequently used imports.
 
 pub use block::Block;
+pub use cell::UniCell;
 pub use leak::Leak;
 pub use ptr::Pointer;

+ 41 - 0
src/symbols.rs

@@ -0,0 +1,41 @@
+//! Rust allocation symbols.
+
+/// Rust allocation symbol.
+#[no_mangle]
+#[inline]
+pub extern fn __rust_allocate(size: usize, align: usize) -> *mut u8 {
+    lock().alloc(size, align)
+}
+
+/// Rust deallocation symbol.
+#[no_mangle]
+#[inline]
+pub unsafe extern fn __rust_deallocate(ptr: *mut u8, size: usize, _align: usize) {
+    lock().free(ptr, size);
+}
+
+/// Rust reallocation symbol.
+#[no_mangle]
+#[inline]
+pub unsafe extern fn __rust_reallocate(ptr: *mut u8, old_size: usize, size: usize, align: usize) -> *mut u8 {
+    lock().realloc(ptr, old_size, size, align)
+}
+
+/// Rust reallocation inplace symbol.
+#[no_mangle]
+#[inline]
+pub unsafe extern fn __rust_reallocate_inplace(ptr: *mut u8, old_size: usize, size: usize, _align: usize) -> usize {
+    if lock().realloc_inplace(ptr, old_size, size).is_ok() {
+        size
+    } else {
+        old_size
+    }
+}
+
+/// Get the usable size of the some number of bytes of allocated memory.
+#[no_mangle]
+#[inline]
+pub extern fn __rust_usable_size(size: usize, _align: usize) -> usize {
+    // Yay! It matches exactly.
+    size
+}

+ 18 - 3
src/sys.rs

@@ -1,6 +1,6 @@
 //! System primitives.
 
-extern crate ralloc_shim;
+extern crate ralloc_shim as shim;
 
 #[cfg(not(feature = "unsafe_no_brk_lock"))]
 use sync;
@@ -25,7 +25,7 @@ pub unsafe fn sbrk(n: isize) -> Result<*mut u8, ()> {
     #[cfg(not(feature = "unsafe_no_brk_lock"))]
     let _guard = BRK_MUTEX.lock();
 
-    let brk = ralloc_shim::sbrk(n);
+    let brk = shim::sbrk(n);
     if brk as usize == !0 {
         Err(())
     } else {
@@ -35,7 +35,22 @@ pub unsafe fn sbrk(n: isize) -> Result<*mut u8, ()> {
 
 /// Cooperatively gives up a timeslice to the OS scheduler.
 pub fn yield_now() {
-    assert_eq!(unsafe { ralloc_shim::sched_yield() }, 0);
+    assert_eq!(unsafe { shim::sched_yield() }, 0);
+}
+
+/// Register a thread destructor.
+///
+/// This will add a thread destructor to _the current thread_, which will be executed when the
+/// thread exits.
+// TODO I haven't figured out a safe general solution yet. Libstd relies on devirtualization,
+// which, when missed, can make it quite expensive.
+pub fn register_thread_destructor<T>(primitive: *mut T, dtor: fn(*mut T)) -> Result<(), ()> {
+    if shim::thread_destructor::is_supported() {
+        shim::thread_destructor::register(primitive, dtor);
+        Ok(())
+    } else {
+        Err(())
+    }
 }
 
 #[cfg(test)]

+ 62 - 0
src/tls.rs

@@ -0,0 +1,62 @@
+use core::{ops, marker};
+
+/// Add `Sync` to an arbitrary type.
+///
+/// This primitive is used to get around the `Sync` requirement in `static`s (even thread local
+/// ones! see rust-lang/rust#35035). Due to breaking invariants, creating a value of such type is
+/// unsafe, and care must be taken upon usage.
+///
+/// In general, this should only be used when you know it won't be shared across threads (e.g. the
+/// value is stored in a thread local variable).
+pub struct Syncify<T>(T);
+
+impl<T> Syncify<T> {
+    /// Create a new `Syncify` wrapper.
+    ///
+    /// # Safety
+    ///
+    /// This is invariant-breaking and thus unsafe.
+    const unsafe fn new(inner: T) -> Syncify<T> {
+        Syncify(inner)
+    }
+}
+
+impl<T> ops::Deref for Syncify<T> {
+    type Target = T;
+
+    fn deref(&self) -> Syncify<T> {
+        &self.0
+    }
+}
+
+impl<T> ops::DerefMut for Syncify<T> {
+    fn deref_mut(&mut self) -> Syncify<T> {
+        &mut self.0
+        // If you read this, you are reading a note from a desperate programmer, who is really
+        // waiting for an upstream fix, cause holy shit. Why the heck would you have a `Sync`
+        // bound on thread-local variables? These are entirely single-threaded, and there is no
+        // reason for assuming anything else. Now that we're at it, has the world been destroyed
+        // yet?
+    }
+}
+
+unsafe impl<T> marker::Sync for Syncify<T> {}
+
+/// Declare a thread-local static variable.
+///
+/// TLS works by copying the initial data on every new thread creation. This allows access to a
+/// variable, which is only available for the current thread, meaning that there is no need for
+/// syncronization.
+///
+/// For this reason, in contrast to other `static`s in Rust, this need not be thread-safe, which is
+/// what this macro "fixes".
+macro_rules! tls {
+    (static $name:ident: $type:ty = $val:expr) => { tls!(#[] static $name: $type = $val) };
+    (#[$($attr:meta),*], static $name:ident: $type:ty = $val:expr) => {{
+        use tls::Syncify;
+
+        $(#[$attr])*
+        #[thread_local]
+        static $name: $type = unsafe { Syncify::new($val) };
+    }}
+}

+ 31 - 2
src/vec.rs

@@ -108,6 +108,25 @@ impl<T: Leak> Vec<T> {
         }
     }
 
+    /// Pop an element from the vector.
+    ///
+    /// If the vector is empty, `None` is returned.
+    #[inline]
+    pub fn pop(&mut self) -> Option<T> {
+        if self.len == 0 {
+            None
+        } else {
+            unsafe {
+                // Decrement the length.
+                self.len -= 1;
+
+                // We use `ptr::read` since the element is inaccessible due to the decrease in the
+                // length.
+                Some(ptr::read(self.get_unchecked(self.len)))
+            }
+        }
+    }
+
     /// Truncate this vector.
     ///
     /// This is O(1).
@@ -145,9 +164,9 @@ impl<T: Leak> From<Vec<T>> for Block {
 }
 
 impl<T: Leak> ops::Deref for Vec<T> {
-    #[inline]
     type Target = [T];
 
+    #[inline]
     fn deref(&self) -> &[T] {
         unsafe {
             slice::from_raw_parts(*self.ptr as *const _, self.len)
@@ -200,10 +219,20 @@ mod test {
         for _ in 0..14 {
             vec.push(b'_').unwrap();
         }
+        assert_eq!(vec.pop().unwrap(), b'_');
+        vec.push(b'@').unwrap();
+
 
         vec.push(b'!').unwrap_err();
 
-        assert_eq!(&*vec, b".aaaaaaaaaaaaaaabc______________");
+        assert_eq!(&*vec, b".aaaaaaaaaaaaaaabc_____________@");
         assert_eq!(vec.capacity(), 32);
+
+        for _ in 0..32 { vec.pop().unwrap(); }
+
+        assert!(vec.pop().is_none());
+        assert!(vec.pop().is_none());
+        assert!(vec.pop().is_none());
+        assert!(vec.pop().is_none());
     }
 }