morfizm (morfizm) wrote,

Storage explorations

Things I've learned recently (via experiments and by reading forums).

1. ZFS is an amazing thing in terms of data reliability (end-to-end checksumming, automatic error recovery, scrubbing). It's totally worthwhile for a home user once data reaches a couple of TBs, at which point bit rot and drive degradation become quite possible.

2. ZFS needs a lot of RAM. The recommended minimum is 6GB + 1GB per TB of storage in the array. E.g., 4 drives of 2T+2T+3T+3T in RaidZ make an array of 8T with 2T of redundancy (the extra 1T on each 3T disk is unused, because the minimum disk size is 2T). Usable space is 6T, but the RAM requirement is 6GB + 1GB*8 = 14GB.
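The arithmetic above can be sketched as a couple of small functions. This is just my restating of the rule of thumb from the text (6GB + 1GB per TB of raw storage, RaidZ1 layout from the example); the function names are mine, not from any ZFS tool.

```python
# Sketch of the RaidZ1 capacity math and the ZFS RAM rule of thumb.
# Disk sizes are the 2T+2T+3T+3T example from the text.

def raidz1_capacity_tb(disks_tb):
    """RaidZ1 uses each disk only up to the size of the smallest one;
    one disk's worth of space goes to parity."""
    n = len(disks_tb)
    smallest = min(disks_tb)
    raw = n * smallest              # total array size (data + parity)
    usable = (n - 1) * smallest     # one disk's worth lost to parity
    return raw, usable

def zfs_ram_gb(raw_tb, base_gb=6):
    """Recommended minimum RAM: base + 1GB per TB of raw array storage."""
    return base_gb + raw_tb

raw, usable = raidz1_capacity_tb([2, 2, 3, 3])
print(raw, usable, zfs_ram_gb(raw))  # 8 6 14
```

Matches the example: 8T raw, 6T usable, 14GB of RAM recommended.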

3. ZFS amplifies the problems of non-ECC RAM, so you really need ECC RAM in order to use it. ECC RAM requires CPU and motherboard support, which is absent from most desktop PC builds and present in server builds. ECC RAM costs only about 40% more than non-ECC.

4. Microservers such as the HP ProLiant N40L (CPU: AMD Turion II Neo N40L, 1.5GHz dual-core) perform poorly with ZFS. I've been testing one with sufficient RAM and an array where each of my disks can max out 1Gbit for sequential reads and writes (at least at the beginning of the drive, where I was testing). RaidZ configs of 3, 4 or 5 disks yielded 75% network utilization when writing and reading one big file, with no apparent bottleneck (CPUs at 60%). It could be the network card, motherboard or CPU, each adding a little delay here and there. With encryption enabled, network utilization was 25% (barely reaching 200-300 Mbps) and CPU usage was 100%. Not good. I guess it's not just RAM: a serious server box with a good CPU is required. Going to test with an Intel Xeon.

5. ZFS can use two types of cache: L2ARC for reads and ZIL for writes. Do you really need these caches in a home file server environment? My current guess is no, but I'll do some more testing. Details I know so far:

5.1. Both ARC (the underlying in-memory layer) and read-ahead are already enabled, given that you have sufficient RAM - a few GB over the minimum from above. Using an SSD for L2ARC may be worthwhile in a server environment with heavy apps like databases running straight against network storage, or with many users working on their files simultaneously, but I doubt it will give real benefit in a home use scenario.

5.2. In order to use L2ARC effectively, you need RAM for metadata structures, at roughly a 1:45 ratio of RAM to L2ARC size. E.g., a 2T L2ARC will require ~45GB of RAM, and a 120GB disk will require ~2.7GB of RAM.
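The 1:45 rule of thumb is easy to sanity-check. The actual overhead depends on ZFS record size (smaller records mean more headers per GB), so treat 1:45 as the approximation from the text, not an exact figure:

```python
# Rough sketch of the 1:45 L2ARC-to-RAM ratio mentioned above.
# The true per-record overhead varies with ZFS record size; 1:45 is
# only the rule of thumb from the text.

def l2arc_ram_gb(l2arc_gb, ratio=45):
    """RAM needed for L2ARC metadata at a given size:RAM ratio."""
    return l2arc_gb / ratio

print(round(l2arc_ram_gb(2048), 1))  # ~45.5 GB RAM for a 2TB L2ARC
print(round(l2arc_ram_gb(120), 1))   # ~2.7 GB RAM for a 120GB SSD
```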

5.3. ZIL can speed up small-file writes because of sync semantics: instead of waiting for all disks in the array to confirm a successful write, it's sufficient to get a quick ack from a fast NVRAM device, such as an SSD. However, it's not easy to get the right SSD (see the section on SSDs), and it may be better to use a battery-backed memory card, which sounds like a hassle to find and configure unless you really need it.

6. SSDs. Consumer-grade SSDs have many problems.

6.1. They introduce many new failure points in case of power loss, sometimes corrupting unrelated data or bricking the entire disk. All this can happen *after* the SSD acked a write (but hadn't flushed it yet), and some errors are possible even when no writes were in progress but the SSD was doing internal maintenance work. I saw reports of an SSD issue causing corruption across an entire array while the SSD was being used as a cache for a hardware RAID.

6.2. Some SSDs (very few models, actually) have capacitors that provide a few milliseconds of run-time to flush buffers in case of power failure. One of the best-known models is the Intel 320 Series, but on the flip side it was known for other issues with its controller.

6.3. Some SSDs are advertised as having ECC RAM on board, which implies there are potential issues with all the other SSDs that don't. I hope ZFS is resilient to whatever type of RAM the disk controller uses, but I didn't check.

6.4. SSDs may use single-level cell (SLC) or multi-level cell (MLC) flash - the difference is how many bits are stored per cell. SLC is faster and wears out 10x slower, but is more expensive. Consumer-grade SSDs are MLC.

6.5. SSD data cells have a limited life - they wear out after some number of write cycles. Continuous use as a cache exacerbates this problem. Some people configure 120GB SSDs to use only 2GB and rely on the SSD's internal wear-leveling technology to continuously remap those 2GB across the entire 120GB area, prolonging its life (a 60x improvement in this case). Note: this can't be configured from the GUI in FreeNAS; it wants to use the entire disk.

6.6. Modern SSDs often use compression to report higher data transfer rates in benchmarks and attract customers. Compression allows faster transfer rates as well as lower wear. It's not just tricking the benchmarks; it actually has some real benefits for consumers - e.g. faster boot times, because OS executables are often nicely compressible (2x or so). However, it won't help with the typical "file server" kind of data in a home scenario - such as pictures and videos - and it gives no benefit at all if you use encryption. ATTO uses highly compressible test data for its benchmark, therefore you can't trust ATTO even for sequential read/write throughput. Get better benchmarking software and/or test with real multimedia files and/or with compression.

It's extremely hard to shop for SSDs and find real benchmarks on incompressible data, as they're often not published, and many reviewers use tools like ATTO. (I'd been using it too, before I learned about the compression issue.)
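You can see the effect yourself without any SSD at all. The sketch below uses zlib to compare ATTO-style repetitive data against random bytes (a stand-in for photos, videos, or encrypted data, which look random to the drive's compressor); a compressing controller gets an enormous head start on the former and none on the latter.

```python
# Why compressible test data inflates benchmark numbers: repetitive
# "ATTO-style" data shrinks dramatically, random data barely at all.
import os
import zlib

compressible = b"0123456789abcdef" * 65536   # 1 MiB of repeating data
incompressible = os.urandom(1024 * 1024)     # 1 MiB of random data

for name, data in [("repeating", compressible), ("random", incompressible)]:
    ratio = len(data) / len(zlib.compress(data))
    print(f"{name}: {ratio:.1f}x compressible")
```

The repeating buffer compresses by orders of magnitude; the random one stays at roughly 1.0x, which is why benchmarks on incompressible data are the honest ones.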

7. FreeNAS - interesting bits of info.

7.1. FreeNAS ships a CD image for the installer and a USB flash image for booting directly. The USB flash image uses a little over 2GB even if you have a 32GB stick. Resizing it is quite a hassle: GParted doesn't work, and you need some careful manual dd, partitioning and boot-labeling magic. In the end, I guess it's not worth it, because you'll have to redo it every time you upgrade your FreeNAS image.

7.2. FreeNAS is really designed to boot from a flash drive. When installed onto a hard drive, (a) the drive becomes unusable as a data drive for an array, and (b) you may hit boot-loading issues, and no one really cares about fixing them because everybody uses flash. This means a great hassle if you're using an older server that has issues booting from flash, or doesn't have a USB 2.0 port. Sometimes updating the BIOS helps and sometimes it doesn't.

7.3. Normally FreeNAS stores only tiny bits of configuration on the flash drive. The standard way to allocate data storage for plugins is to create "jails", which live on your data drives. This can be done via the GUI.

7.4. FreeNAS has a GUI and shell/command-line access, but as general advice: don't use features that aren't present in the GUI, as you're going to face increasing levels of hassle maintaining and porting that configuration, plus potential conflicts with FreeNAS's assumptions.

7.5. FreeNAS doesn't let you partition your drives or use just a portion of a hard disk. Perhaps this is done for simplicity, but perhaps also to prevent inexperienced users from shooting themselves in the foot and doing silly things like sharing one drive between ZIL and L2ARC. (There are many reasons why that's silly.) Perhaps, for the same reasons, FreeNAS doesn't support configuring RAM drives.

8. Robocopy: NTFS vs ZFS. Robocopy will repeatedly copy the same files if their "last changed" datetime differs from "last modified" ("last changed" includes metadata changes such as permission changes). When a file is read back, its "last changed" timestamp is set equal to "last modified" and therefore differs from the original. I haven't fully tracked this down - whether it's a ZFS limitation or an incompatibility in robocopy - but it's quite annoying. Going to write a script that will just reset the "last changed" timestamp on the originals.
Tags: 1, devices, in english, software