update 2019-05: XFS the way to go?

there are at least 3 distros i know of (Fedora/RedHat/CentOS, SuSE/OpenSuse and Trisquel (yes, Richard Stallman himself runs XFS!)) that use XFS as their default filesystem.

„Red Hat Enterprise Linux 7 began defaulting to an XFS file-system rather than EXT4 and it looks like they are continuing to invest in XFS and make continued use of it into the future, which isn’t too surprising considering they have been employing some XFS developers.“ (src: phoronix.com)

Red Hat is developing its own XFS-based filesystem management tool, codenamed „Stratis“.

Stratis is a Linux local storage management tool that aims to enable easy use of advanced storage features such as thin provisioning, snapshots, and pool-based management and monitoring.

After two years of development, Stratis 1.0 has stabilized its on-disk metadata format and command-line interface, and is ready for more widespread testing and evaluation by potential users. Stratis is implemented as a daemon –

stratisd

– as well as a command-line configuration tool called stratis, and works with Linux kernel versions 4.14 and up.

sounds interesting 🙂

i am testing Trisquel/XFS on two laptops and XFS on a CentOS virtual machine (actually, this very website you are reading is delivered to you from an XFS filesystem)

„XFS is awesome for home folders. XFS has great multi-thread performance and more consistent performance as it fills up.“ (src)

problem: XFS has the y2038 problem too!

While XFS has proven to be a pretty good, fast filesystem (so far i love it, it is really responsive) and development effort keeps going into it – i would still recommend the much slower ext3 (unless you want your NAS to double as a webserver (not recommended! DEDICATE IT TO BACKUP PURPOSES ONLY!)) with regular automated filesystem checks (reboot every weekend, Sat/Sun at 00:00) until 2030. why?

  1. because XFS is still under heavy development
    • i hope the y2038 problem for XFS is solved by 2030 – then you will have to migrate from ext3 to XFS. you will win a y2038-safe filesystem but lose undelete capability – so have a good backup (of backup?) strategy in place. sorry.
  2. because XFS has no extundelete

update: 2019 on y2038

The Y2038 Problem – cars and embedded systems

you can use ext3 until around 2030 – that still leaves you 8 years to migrate to ext4 or XFS, which (hopefully) have their y2038 problems solved by then.

( NFSv3, ext3, and XFS filesystems all have problems resulting from their use of 32-bit timestamps)
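To see where that 32-bit limit actually lies, here is a quick sketch using GNU date (the `-d @…` epoch syntax is GNU-specific):

```shell
# largest value a signed 32-bit Unix timestamp can hold
MAX32=$(( 2**31 - 1 ))
echo "$MAX32"
# -> 2147483647

# what moment in time that corresponds to (GNU date):
date -u -d "@$MAX32" "+%Y-%m-%d %H:%M:%S"
# -> 2038-01-19 03:14:07
```

one second later the counter wraps around to a negative number, which affected systems interpret as a date back in 1901.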

ext3 advantage over ext4: extundelete does not undelete well under ext4 – under ext3 it does 😀

I hope there is proper ext4 undelete available in 2030 X-D

SO I WOULD ALSO RECOMMEND YOU DO RAID1 (2x HDs or more (safer), SAME SIZE) OR RAID10 (4x, 6x or 8x HDs)

(which is RAID1 on „steroids“ (double or triple the speed) and is seen as one big drive by the OS.)

Also HIGHLY recommended:

  • setup Mail/SMS NOTIFICATION for drive failure so you can replace the faulty drive in time (before another drive fails)
  • add 1x SPARE drive – it „kicks in“ as soon as one drive fails and gives you more time to react.
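A minimal sketch of what such a setup could look like with mdadm (the mail address, device names and config path are assumptions – adapt them to your distro):

```shell
# /etc/mdadm/mdadm.conf (Debian/Ubuntu; Red Hat uses /etc/mdadm.conf):
#
#   MAILADDR admin@example.com    # hypothetical address for failure mails
#   ARRAY /dev/md0 metadata=1.2 UUID=...
#
# the distro usually runs "mdadm --monitor" as a service; to send a
# test alert by hand:
#   mdadm --monitor --scan --test --oneshot
#
# a hot spare is attached to an existing array like this:
#   mdadm --add /dev/md0 /dev/sdd1
```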

I think no matter whether in business or in private scenarios, RAID1 (e.g. 3x HDs, mirrored disks holding the same data – linux mdadm software RAID can do that and works reliably) should be used in combination with a simple, reliable filesystem like ext3 and regular automatic filesystem checks.

So if one disk fails – you still have two disks holding the same data – and you are not in a hurry to replace the broken disk (it can take 1-3 days to resync a RAID1 of 2TB disks).

In addition to this i would recommend backing your stuff up automatically via the internet. This is a difficult topic of its own, but it should be possible with rsync, which can transfer deltas even of larger files (instead of re-transmitting the whole file, it only transmits the blocks that have changed).

https://dwaves.org/2016/04/18/linux-automatic-filesystem-check-on-reboot-every-sunday/

MY RECOMMENDATION: WHO CARES IF THE NAS IS DOING AN AUTOMATIC REBOOT ON SUNDAY AT 3 O’CLOCK IN THE MORNING AND CHECKING 2-3TB OF EXT3 FILESYSTEM? NO ONE!

SO WHO CARES ABOUT THE FILESYSTEM CHECK DELAY ON BOOT, IF IN RETURN, YOU WILL HAVE A RELIABLE FILESYSTEM.

tune2fs -C 2 -c 1 /dev/sda1; # set current mount count above the max (1) -> check filesystem on every boot
tune2fs -c 10 -i 30 /dev/sda1; # or: check sda1 every 10 mounts or after 30 days, whichever comes first

RELIABILITY SHOULD BE THE TOP PRIORITY OF ANY FILESYSTEM.

SPEED CAN COME SECOND – UNLESS YOU DO NOT CARE ABOUT DATA LOSS. (temporary storage… but believe me… even temporary storage contains important files that users get very angry about if lost)

So what you SHOULD do is let your filesystem (no matter what system) be checked automatically and monthly/weekly.

some say:

„Home users can relax though. Home RAID is a bad idea: you are much better off with frequent disk-to-disk backups and an online backup like CrashPlan or Backblaze.“

i say… yes: Windows 7’s built-in RAID1 functionality failed me once (Win 7 Ultimate) but runs well on another machine.

The nasty thing: if it goes out of sync… it will not even notify you.

If the Win7-Software-Raid1 fails… it might be tricky to get it back working.

http://www.zdnet.com/article/why-raid-6-stops-working-in-2019/

ZFS

Zettabyte File System – this is what Oracle datacenters use. It was developed at Sun Microsystems (later acquired by Oracle), released as open source under the CDDL and later ported to Linux (ZFS on Linux / OpenZFS).

It is a 128-bit filesystem, developed by a team led by Jeff Bonwick.

It is more than just a file system – it includes the functionality of a logical volume manager (like LVM2), software-RAID capabilities and copy-on-write.

ZFS, according to this site, is even ~10% faster than hardware RAID.

Oracle is using it on their Solaris OS, which in turn runs on Sun’s SPARC systems.

So from database to hardware to filesystems – Sun and Oracle do it all.

I have no hands-on experience with ZFS, but it seems to be a new concept that is aimed at datacenters, and datacenters only 😀

You will need at least 2GB of RAM to run it properly… otherwise it might CRASH!? An SSD is recommended, as ZFS can do „hybrid-drive“ stuff: combining RAM, SSD and SATA into one big pool, using RAM (the ARC cache) and SSD (L2ARC read cache / ZIL log) for caching and SATA for storage.

You should have „real“ professional server hardware (ECC RAM) if you use ZFS:

„#1 Yes you should be using ECC ram for using ZFS.. It’s in all the literature.. You do risk your whole pool if you have corruption without ECC..

#2 yes.. Motherboard.. CPU and RAM should be ECC..“

https://en.wikipedia.org/wiki/ECC_memory

https://forums.freenas.org/index.php?threads/need-help-to-pick-which-file-system-to-use.16489/

QNAP NAS are „prosumer“ devices that have no ECC RAM, so ZFS is not an option here. (QNAP uses ext4 as default)

XFS

XFS has the y2038 problem as well.

XFS is a high performance journaling filesystem which originated on the SGI IRIX platform.

It is completely multi-threaded, can support large files and large filesystems, extended attributes,
variable block sizes, is extent based, and makes extensive use of Btrees (directories, extents, free space) to aid both performance and scalability.

Refer to the documentation at http://oss.sgi.com/projects/xfs/

comment by a reader: „UPS, battery-backed data caching, none of it matters. If you run XFS, it will burn you sooner or later. I have had 4 XFS systems in the last 2 years, every…single…one of them has failed.“

XFS was contributed to the Linux kernel by SGI and is one of the best filesystems for working with large volumes and large files.

XFS uses more RAM than other filesystems, but if you need to work with large files its performance there is well worth the penalty in memory usage.

XFS is not ill-suited for desktop or laptop use, but it really shines on a server that handles medium to large size files all day long. Like ext3, XFS is a fully journaled filesystem.

I have heard that using XFS is a good way to f*** up your system if the power goes out while it is doing stuff.

interesting that with XFS the maximum amount of writable bytes varies – while the amount of storable bytes of the other filesystems stays constant. could this be due to XFS’s large write cache? which also means: only use it with a UPS that properly shuts down your server in case of a power outage!

http://arstechnica.com/civis/viewtopic.php?t=1169535

CentOS 7 uses it for the root as well as the /home data partition:

[root@CentOS7 user]# hostnamectl
   Static hostname: CentOS7
         Icon name: computer-vm
           Chassis: vm
        Machine ID: ad6f3410bf2346ec97a6fdc05dc4a607
           Boot ID: 2b2ed2a7485d4a108a215758025b4089
    Virtualization: microsoft
  Operating System: CentOS Linux 7 (Core)
       CPE OS Name: cpe:/o:centos:centos:7
            Kernel: Linux 4.12.0cuztom
      Architecture: x86-64

[root@CentOS7 user]# lsblk -fs
NAME    FSTYPE      LABEL UUID                                   MOUNTPOINT
cl-swap swap              750b159f-f6b6-43be-8da7-db083f1c3b9f   [SWAP]
└─sda2  LVM2_member       iNu4Bl-mBja-408f-hWVL-XDA4-18AY-RUN0ca
  └─sda
sr0
cl-home xfs               1e7fca88-bcad-4703-8a91-e3c2ab10c0e0   /home
└─sda2  LVM2_member       iNu4Bl-mBja-408f-hWVL-XDA4-18AY-RUN0ca
  └─sda
fd0
cl-root xfs               1789d6f8-54d5-4c16-bbd6-8f70f8697dab   /
└─sda2  LVM2_member       iNu4Bl-mBja-408f-hWVL-XDA4-18AY-RUN0ca
  └─sda
sda1    xfs               ae3de842-733e-44d0-8f62-edee1291d87f   /boot
└─sda

ext3 <- still (2017) recommended for NAS, laptop, desktop

while ext3 is STILL (in 2018) a very reliable filesystem, this limitation makes me worry: near-time extinction due to its date-stamp limitation. this „geek’s millennium“ is expected to cause widespread disruption if not dealt with in a timely fashion. ext3 stores dates as Unix time using four bytes in the file header – 32 bits does not give enough scope to continue processing files beyond January 19, 2038.

Wikipedia has an animation showing how the Year 2038 bug resets the date – so i guess you should migrate to ext4 or BTRFS after 2030.

Source: Wikipedia

ext3 is the younger cousin of ext2.

It was designed to replace ext2 in most situations and shares much the same code-base, but adds journaling support.

In fact, ext3 and ext2 are so much alike that it is possible to convert one to the other on the fly without loss of data.

ext3 enjoys a lot of popularity for these reasons.

There are many tools available for recovering data from this filesystem in the event of catastrophic hardware failure as well.

ext3 is a good general purpose filesystem with journaling support, but fails to perform as well as other journaling filesystems in specific cases.

One pitfall of ext3 is that the filesystem must still go through an exhaustive fsck check every so often.

This is done when the filesystem is mounted, usually when the computer is booted, and causes an annoying delay.

So what you SHOULD do is let your filesystem (no matter what system) be checked automatically and monthly/weekly.

or better: on every boot via

touch /forcefsck

i never had any major issues with ext3. so stick with it 🙂

size limit to files and filesystem?

„Ext3 has some limits on max disk size, due to the chosen block size. With a block size of 4KiB the maximum file size is 2TiB and the max disk size 16TiB.“ (src)

blockdev --getbsz /dev/md0; # show blocksize of the filesystem/device

4096

coolness: You can grow partitions „on the fly“ with ext3 and gparted

i would nevertheless recommend:

  1. unmount
  2. backup
  3. gparted: grow
  4. fsck -y -v -f /dev/sda1
  5. reboot.

limits:

  • maximum amount of files:
    • The maximum number of inodes (and hence the maximum number of files and directories) is set when the file system is created.
      • If V is the volume size in bytes, then the default number of inodes is given by V/2^13 (or the number of blocks, whichever is less), and the minimum by V/2^23.
      • The default was deemed sufficient for most applications.
    • max number of subdirectories in one directory is fixed at 32000.
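Worked through in shell arithmetic for a 1 TiB volume (2^40 bytes), using the ratios above:

```shell
V=$(( 2**40 ))              # volume size: 1 TiB in bytes
echo $(( V / 2**13 ))       # default inode count: one inode per 8 KiB
# -> 134217728  (~134 million files/directories possible)
echo $(( V / 2**23 ))       # minimum inode count: one inode per 8 MiB
# -> 131072
```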

ext4

ext4 is the latest in the ext series of filesystems.

It was designed to build upon ext3 with new ideas on what filesystems should do.

While Slackware supports ext4, you should remember that this filesystem is still very new (particularly in file system terms) and is under heavy development.

If you require stability over performance, you may wish to use a different filesystem such as ext3.

With that said, ext4 does boast some major improvements over ext3 in the performance arena, but many people don’t yet trust it for stable use.

reiserfs

reiserfs is one of the oldest journaling filesystems for the Linux kernel and has been supported by Slackware for many years.

It is a very fast filesystem particularly well suited for storing, retrieving, and writing lots of small files.

Unfortunately there are few tools for recovering data should you experience a drive failure, and reiserfs partitions experience corruption more often than ext3.

JFS

IBM’s journaled file system technology, currently used in IBM enterprise servers, is designed for high-throughput server environments, key to running intranet and other high-performance e-business file servers.

JFS was contributed to the Linux kernel by IBM and is well known for its responsiveness even under extreme conditions.

It can span colossal volumes making it particularly well-suited for Network Attached Storage (NAS) devices.

JFS’s long history and thorough testing make it one of the most reliable journaling filesystems available for Linux.

btrfs

Btrfs is a new copy on write filesystem for Linux aimed at implementing advanced features while focusing on fault tolerance, repair and easy administration.

SUSE 12 uses it on the root partition, where it creates daily read-only snapshots of your operating system.

For the data partition they use XFS:

suse12:/home/user # hostnamectl
   Static hostname: suse12.domain
Transient hostname: suse12
         Icon name: computer-vm
           Chassis: vm
        Machine ID: fe6bf561a4d4f20df6176faf58fdd5b5
           Boot ID: 861d2df8a4f5405e82f6ec0c8e8d23b5
    Virtualization: microsoft
  Operating System: SUSE Linux Enterprise Server 12 SP2
       CPE OS Name: cpe:/o:suse:sles:12:sp2
            Kernel: Linux 4.4.21-69-default
      Architecture: x86-64

suse12:/home/user # lsblk -fs
NAME  FSTYPE LABEL UUID                                 MOUNTPOINT
fd0
sda1  swap         3da1ea6f-d0eb-436b-9937-1b3f5667914d [SWAP]
└─sda
sda2  btrfs        550d46ea-ad14-464c-b019-aa4ec5a10613 /var/log
└─sda
sda3  xfs          986b9f95-b3a1-441e-92a4-98b7a500166b /home
└─sda
sr0

f2fs

F2FS is a new filesystem for Linux aimed at NAND flash memory-based storage devices, such as SSD, eMMC, and SD cards. It is based on Log-structured File System (LFS).

source of inspiration: http://slackbook.org/beta/#id320403

should i raid1 my swap as well?

usually: NO!

do it like this:

1x SSD drive holds replaceable software (OS + swap)

1x RAID holds NOT-replaceable data (which is also backed up once a week to a USB drive that is then swapped for another one you carry home – so you ALWAYS have one complete backup set OUTSIDE of the office/business)

advantage of this combination of SSD + RAID = SPEED! 🙂

http://serverfault.com/questions/195839/where-should-my-swap-partition-s-live-when-using-software-raid1-performance-lv

Speed:

changes with use-case and filesystem-version… keep an eye on:

simple dd benchmark inside a virtual machine – https://dwaves.org/2017/07/07/gnu-linux-what-filesystem-can-store-more-bytes-storage-efficiency-harddisk-space-utilization-dd-benchmark-in-virtual-machine/

https://openbenchmarking.org/s/File-System <- a lot of data but not well structured… confusing.

http://www.phoronix.com/scan.php?page=article&item=linux_311_filesystems&num=3

http://www.linux-magazine.com/Issues/2014/165/Choose-a-Filesystem/(language)/eng-US

python-based many-small-files benchmarks: https://dwaves.org/2014/12/10/linux-harddisk-benchmark/

Links:

What’s the Best File System for My Linux Install?

https://dwaves.org/2017/07/06/gnu-linux-what-filesystems-does-my-kernel-support/

https://dwaves.org/2017/05/15/why-xfs/

Videos:

(in German) https://media.ccc.de/v/froscon2016-1821-a_short_history_of_linux_filesystems
