20min Hands-on ZFS

ZFS is often called the last word in file systems.
It is a new approach to managing large pools of disks, originally developed by Sun.
It was later ported to FreeBSD, MacOS (10.5 only) and Linux.

This text shows some of the basic features of ZFS and demonstrates them hands-on by example.

Prerequisites

-> FreeBSD
-> Solaris
-> MacOS (only Userland)

In our example we use

SunOS openindiana 5.11 oi_151a5 i86pc i386 i86pc Solaris.

as an environment.

But most commands also work on the other systems.

Since we do all the work within a VM, our commands have the pattern:

Input VM:

command

Output VM:

result

Pool Creation

The first thing we need to know is how many disks are present in our environment.
There are several ways to get a basic disk listing. Under (Open)Solaris this can be done with:

Input VM:

format < /dev/null

Output VM:

AVAILABLE DISK SELECTIONS:
0. c4t0d0 
/pci@0,0/pci8086,2829@d/disk@0,0
1. c5t0d0 
/pci@0,0/pci1000,8000@16/sd@0,0
2. c5t1d0 
/pci@0,0/pci1000,8000@16/sd@1,0
3. c5t2d0 
/pci@0,0/pci1000,8000@16/sd@2,0
4. c5t3d0 
/pci@0,0/pci1000,8000@16/sd@3,0
5. c5t4d0 
/pci@0,0/pci1000,8000@16/sd@4,0
6. c5t5d0 
/pci@0,0/pci1000,8000@16/sd@5,0
7. c5t6d0 
/pci@0,0/pci1000,8000@16/sd@6,0
8. c5t7d0 
/pci@0,0/pci1000,8000@16/sd@7,0
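
For scripting it can be handy to reduce this listing to the bare disk names. The following is a minimal sketch, assuming the entries keep the "number, dot, disk name" shape shown above; list_disks is a hypothetical helper name, not part of the system.

```shell
# list_disks: read `format` output on stdin and print one disk name per line.
# Assumes entries look like "0. c4t0d0 ..." as in the listing above.
list_disks() {
    awk '$1 ~ /^[0-9]+\.$/ { print $2 }'
}

# Usage on the VM (not run here):
#   format < /dev/null | list_disks
```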

about Pools

With ZFS it is possible to create different kinds of pools on a given number of disks.
You can also create several pools within one system.

The following Pools are possible and most commonly used:

Type: JBOD
Info: Just a bunch of disks. In theory it is possible to create one pool for each disk in the system, although this is rarely done.
Performance: that of each disk
Capacity: that of each disk
Redundancy: no
Command: zpool create pool1 disk1
         zpool create pool2 disk2

Type: Stripe
Info: This is equivalent to RAID0, the data is distributed over all disks in the pool. If one disk fails, all the data is lost. But you can also stripe several redundant groups (e.g. two raidz groups) to have better redundancy.
Performance: very high
Capacity: N disks
Redundancy: no
Command: zpool create pool1 disk1 disk2

Type: Mirror
Info: This is equivalent to RAID1, the data is written to both disks in the pool. Restoring a pool (resilvering) copies the data from the remaining disk.
Performance: normal
Capacity: N-1 disks (one disk in a two-disk mirror)
Redundancy: +1
Command: zpool create pool1 mirror disk1 disk2

Type: Raidz
Info: This is equivalent to RAID5. One disk's worth of capacity holds parity data, distributed over all disks. Restoring a pool (resilvering) is less efficient, since the data needs to be rebuilt from the remaining disks.
Performance: high
Capacity: N-1 disks
Redundancy: +1
Command: zpool create pool1 raidz disk1 disk2 disk3

Type: Raidz2
Info: This is equivalent to RAID6. Two disks' worth of capacity holds parity data. Restoring a pool (resilvering) is less efficient, since the data needs to be rebuilt from the remaining disks and their parity data.
Performance: high
Capacity: N-2 disks
Redundancy: +2
Command: zpool create pool1 raidz2 disk1 disk2 disk3 disk4

Type: Raidz3
Info: There is no standard RAID equivalent for this one. Three disks' worth of capacity holds parity data.
Performance: high
Capacity: N-3 disks
Redundancy: +3
Command: zpool create pool1 raidz3 disk1 disk2 disk3 disk4 disk5
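
The capacity rules above can be written down as a small calculation. This is only a sketch under the assumption that all disks in a group have the same size; usable_disks is a hypothetical helper, and for a mirror it reports one disk regardless of how many copies there are.

```shell
# usable_disks KIND N: print how many disks' worth of capacity a single
# group of KIND built from N equally sized disks provides.
usable_disks() {
    kind=$1
    n=$2
    case $kind in
        stripe)       echo "$n" ;;
        mirror)       echo 1 ;;            # every disk holds the same data
        raidz|raidz1) echo $((n - 1)) ;;   # one disk's worth of parity
        raidz2)       echo $((n - 2)) ;;   # two disks' worth of parity
        raidz3)       echo $((n - 3)) ;;   # three disks' worth of parity
        *) echo "unknown pool type: $kind" >&2; return 1 ;;
    esac
}

usable_disks raidz 3    # prints 2
```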

You can also add hot-spares for a better fallback behaviour, SSDs for caching reads (cache) and writes (logs).
I also created a benchmark with various combinations.
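
As a rough sketch of how such helper devices are attached to an existing pool: the function below prints the `zpool add` commands rather than running them, unless DRY_RUN=0 is set. The names run and add_helpers are hypothetical, and which disk plays which role is an assumption for illustration only.

```shell
# Print the command by default; set DRY_RUN=0 to execute it for real.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# add_helpers POOL SPARE CACHE LOG: attach one device of each helper kind.
add_helpers() {
    run zpool add "$1" spare "$2"   # hot spare, takes over when a disk fails
    run zpool add "$1" cache "$3"   # read cache, typically an SSD
    run zpool add "$1" log "$4"     # separate log device for synchronous writes
}

# Usage (prints three commands, disk names are examples):
#   add_helpers tank c5t3d0 c5t6d0 c5t7d0
```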

create a basic Pool (raidz)

Input VM:

zpool create tank raidz c5t0d0 c5t1d0 c5t2d0
...
zpool status

Output VM:

  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0

errors: No known data errors

(Raid5)

You can already access the newly created pool:

Input VM:

ls -al /tank

Output VM:

 
...
total 4
drwxr-xr-x  2 root root  2 2012-10-23 22:02 .
drwxr-xr-x 25 root root 28 2012-10-23 22:02 ..

create a basic Pool (raidz) with one spare drive

Input VM:

zpool create tank raidz1 c5t0d0 c5t1d0 c5t2d0 spare c5t3d0
...
zpool status

Output VM:

  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
        spares
          c5t3d0    AVAIL   

errors: No known data errors

List the available layout

Input VM:

zpool list

Output VM:

NAME     SIZE  ALLOC   FREE  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
tank    1,46G   185K  1,46G         -     0%  1.00x  ONLINE  -

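*The 1,46G (roughly 1,5G) is the raw size of the three disks and does not reflect the space that is really available, because parity is stored in the same space. If you copy a 1G file to the pool it will use about 1,5G (1G data + 512M parity).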
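
The arithmetic behind this note can be sketched as follows. It is an approximation that ignores metadata, padding and compression; raw_usage_mb is a hypothetical helper name.

```shell
# raw_usage_mb DATA_MB DISKS PARITY: approximate raw space consumed when
# DATA_MB of data is written to a raidz group of DISKS disks with PARITY
# parity disks' worth of redundancy.
raw_usage_mb() {
    echo $(( $1 * $2 / ($2 - $3) ))
}

raw_usage_mb 1024 3 1    # prints 1536, i.e. 1G data + 512M parity
```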

create a striped pool

Input VM:

zpool create tank raidz1 c5t0d0 c5t1d0 c5t2d0 raidz1 c5t4d0 c5t5d0 c5t6d0
...
zpool status

Output VM:

  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
          raidz1-1  ONLINE       0     0     0
            c5t4d0  ONLINE       0     0     0
            c5t5d0  ONLINE       0     0     0
            c5t6d0  ONLINE       0     0     0

errors: No known data errors

(Raid50 = Raid5 + Raid5)
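
With more disks the command line gets long, so it may help to generate it. The sketch below only prints the command and assumes the number of disks is a multiple of the group size; striped_raidz_cmd is a hypothetical helper name.

```shell
# striped_raidz_cmd POOL GROUP_SIZE DISK...: print a `zpool create` command
# that puts every GROUP_SIZE disks into their own raidz1 group.
striped_raidz_cmd() {
    pool=$1
    size=$2
    shift 2
    cmd="zpool create $pool"
    i=0
    for disk in "$@"; do
        # Start a new raidz1 group every GROUP_SIZE disks
        if [ $((i % size)) -eq 0 ]; then
            cmd="$cmd raidz1"
        fi
        cmd="$cmd $disk"
        i=$((i + 1))
    done
    echo "$cmd"
}

striped_raidz_cmd tank 3 c5t0d0 c5t1d0 c5t2d0 c5t4d0 c5t5d0 c5t6d0
```

Review the printed command before running it, since `zpool create` takes over the listed disks.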

deal with disk failures

Input VM:

zpool create tank raidz1 c5t0d0 c5t1d0 c5t2d0 spare c5t3d0

Failure Handling

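To simulate a failing disk, overwrite part of one of the disk images with random data on the host. The offset and length used here are example values.

Input Host:

dd if=/dev/urandom of=1.vdi bs=1048576 seek=10 count=10 conv=notrunc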

Wait for ZFS to notice the errors, or trigger a check manually.

Input VM:

zpool scrub tank
...
zpool status

Output VM:

  pool: tank
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: resilvered 66K in 0h0m with 0 errors on Tue Oct 23 22:14:19 2012
config:

        NAME          STATE     READ WRITE CKSUM
        tank          DEGRADED     0     0     0
          raidz1-0    DEGRADED     0     0     0
            spare-0   DEGRADED     0     0     0
              c5t0d0  DEGRADED     0     0    64  too many errors
              c5t3d0  ONLINE       0     0     0
            c5t1d0    ONLINE       0     0     0
            c5t2d0    ONLINE       0     0     0
        spares
          c5t3d0      INUSE     currently in use

errors: No known data errors

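Depending on the cause, you can clear the error counters, detach the failed disk so that the spare takes its place permanently, or replace the failed disk with a new one:

Input VM:

zpool clear tank
...
zpool detach tank c5t0d0
...
zpool replace tank c5t0d0 c5t7d0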
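
To notice a degraded pool without reading the full status output, the health column of `zpool list` can be filtered. This is a minimal sketch; unhealthy_pools is a hypothetical helper that expects the tab-separated output of `zpool list -H -o name,health` on stdin.

```shell
# unhealthy_pools: read "name<TAB>health" lines on stdin and print the
# names of pools whose health is anything other than ONLINE.
unhealthy_pools() {
    awk -F '\t' '$2 != "ONLINE" { print $1 }'
}

# Usage on the VM (not run here):
#   zpool list -H -o name,health | unhealthy_pools
```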

Create File systems

Input VM:

zfs create tank/home
zfs create tank/home/user1
...
chown -R user:staff /tank/home/user1
...
zfs get all tank/home/user1
...
zfs set sharesmb=on tank/home/user1
...
zfs set quota=500M tank/home/user1
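
For more than one user these steps can be wrapped in a function. The following is a sketch only: it prints the commands unless DRY_RUN=0 is set, run and create_home are hypothetical names, and it sets the quota at creation time with `zfs create -o` instead of a separate `zfs set`.

```shell
# Print the command by default; set DRY_RUN=0 to execute it for real.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# create_home USER QUOTA: file system below tank/home with a quota,
# owned by USER and the group staff as in the example above.
create_home() {
    run zfs create -o "quota=$2" "tank/home/$1"
    run chown -R "$1:staff" "/tank/home/$1"
}

# Usage (user names are examples):
#   for u in user1 user2; do create_home "$u" 500M; done
```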

Now copy a file from MacOS into the SMB share.

Snapshot

Input VM:

zfs snapshot tank/home/user1@basic
...
zfs list
...
zfs list -t snapshot

Output VM:

NAME                              USED  AVAIL  REFER  MOUNTPOINT
rpool1/ROOT/openindiana@install  84,0M      -  1,55G  -
tank/home/user1@basic                0      -  42,6K  -

Input VM:

zfs snapshot -r tank/home@backup
...
zfs list -t snapshot

Output VM:

NAME                              USED  AVAIL  REFER  MOUNTPOINT
rpool1/ROOT/openindiana@install  84,0M      -  1,55G  -
tank/home@backup                     0      -  41,3K  -
tank/home/user1@basic                0      -  42,6K  -
tank/home/user1@backup               0      -  42,6K  -

Input VM:

zfs clone tank/home/user1@basic tank/home/user2

Output VM:

tank/home/user2          1,33K   894M  70,3M  /tank/home/user2

Restoring Snapshots

Delete the ZIP file in the SMB share, then roll back to the snapshot and check that the file is back.

Input VM:

ls -al /tank/home/user1
...
zfs rollback tank/home/user1@backup
...
ls -al /tank/home/user1
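
A rollback discards everything written after the snapshot. To bring back a single file instead, it can be copied out of the read-only .zfs/snapshot directory of the file system. A small sketch, where snapshot_path is a hypothetical helper and archive.zip is a made-up file name:

```shell
# snapshot_path MOUNTPOINT SNAPSHOT FILE: print where FILE can be found
# inside the read-only snapshot directory of a mounted file system.
snapshot_path() {
    echo "$1/.zfs/snapshot/$2/$3"
}

# Usage on the VM (not run here):
#   cp "$(snapshot_path /tank/home/user1 backup archive.zip)" /tank/home/user1/
```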

Resizing a Pool

Input VM:

zpool list
...
zpool replace tank c5t0d0 c5t4d0
zpool replace tank c5t1d0 c5t5d0
zpool replace tank c5t2d0 c5t6d0
...
zpool scrub tank
...
zpool list

Output VM:

NAME     SIZE  ALLOC   FREE  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
tank    1,46G   381K  1,46G     1,50G     0%  1.00x  ONLINE  -

Input VM:

zpool set autoexpand=on tank
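
The pool only grows once every disk of the raidz group has been replaced, because the group is limited by its smallest member. The sketch below illustrates this with made-up disk sizes in MB; raidz_raw_mb is a hypothetical helper.

```shell
# raidz_raw_mb SIZE_MB...: raw size of a raidz group, which is the size of
# its smallest disk multiplied by the number of disks.
raidz_raw_mb() {
    min=$1
    n=$#
    for s in "$@"; do
        if [ "$s" -lt "$min" ]; then
            min=$s
        fi
    done
    echo $((min * n))
}

raidz_raw_mb 512 512 512       # prints 1536
raidz_raw_mb 1024 1024 512     # prints 1536, one small disk is left
raidz_raw_mb 1024 1024 1024    # prints 3072
```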

Using ZFS for Backups

Bash-Script

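#!/bin/bash
# Mirror the home directory to the rsync module "user-backup" on the NAS
rsync -avz --progress --delete /Users/user root@nas.local::user-backup/
# Snapshot the backup file system, named after today's date
backupdate=$(date "+%Y-%m-%d")
ssh root@nas.local zfs snapshot "tank/backup@$backupdate"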
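
Daily snapshots accumulate over time, so at some point old ones have to go. The following sketch only selects candidates and deletes nothing; snapshots_to_prune is a hypothetical helper and the number of snapshots to keep is an assumption.

```shell
# snapshots_to_prune KEEP: read snapshot names on stdin, oldest first, and
# print all but the newest KEEP of them.
snapshots_to_prune() {
    awk -v keep="$1" '
        { name[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print name[i] }
    '
}

# Usage on the NAS (not run here), listing snapshots sorted by creation time:
#   zfs list -H -t snapshot -o name -s creation -r tank/backup | snapshots_to_prune 7
```

Each printed name could then be passed to `zfs destroy` after a manual review.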
