rsync and different UTF normalization in APFS vs HFS+

Ryan Schmidt ryandesign at macports.org
Fri Jul 3 11:00:30 UTC 2020


On Jul 3, 2020, at 04:53, Ces VLC wrote:

> Some of my disks are HFS+ and others are APFS. I've been using rsync for years, in order to sync some folders across all my disks. There were no problems until APFS was introduced into the game.
> 
> Now, with filenames that contain international (non-ASCII) characters, I often hit the problem of rsync deleting a file and then writing it again, just because the Unicode normalization is not the same on both disks. Other users have reported this too (see, for example: https://superuser.com/questions/1513326/rsync-from-mac-os-to-synology-with-btrfs-having-issues-with-file-and-directories).
> 
> For over a year I've been tolerating this because I considered it non-critical, but I feel I should fix it. However, I didn't find any posted solution that could address this in a convenient and proper way.
> 
> People suggest using the --iconv flag, but does this mean you need different iconv settings depending on whether the transfer is APFS->HFS+ or HFS+->APFS? If so, that would be a bit clumsy, IMHO (first detect the disk's filesystem, then choose the proper flags).
> 
> Isn't there a more convenient way of dealing with this, one that doesn't require checking the disk's filesystem before invoking rsync?

The issue I'm familiar with is that some strings of Unicode characters have more than one valid UTF-8 representation. (An accented character, for example, can be stored composed, as a single code point, or decomposed, as a base letter followed by a combining mark.) The designers of HFS+ picked one of those representations, a decomposed form, as the "correct" one and normalize filenames to that form when writing them to disk. HFS+ was unusual in that regard. Most Linux filesystems do not normalize; they accept whatever bytes the program gives them.

As a result, a file created on Linux and moved to an HFS+ Mac might end up with a different sequence of bytes in its filename, even though the characters are the same. (Linux also has a related problem: two or more files can coexist whose names are different byte sequences for the same characters.) The problem should not happen when moving a file from an HFS+ Mac to a Linux machine, since the Linux filesystem will accept the byte sequence that HFS+ used.
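To make the composed/decomposed distinction concrete, here is a small Python illustration using the standard `unicodedata` module. The filename "café" and the helper `describe` are just examples chosen for this sketch. NFC is the standard composed form and NFD the standard decomposed form; HFS+ stores names in a decomposed form close to NFD.

```python
import unicodedata

# "é" can be one precomposed code point (U+00E9) or "e" plus a combining acute accent (U+0301).
composed = "caf\u00e9"
decomposed = "cafe\u0301"


def describe(name: str) -> str:
    """Return the UTF-8 bytes of a filename as space-separated hex."""
    return name.encode("utf-8").hex(" ")


print(composed == decomposed)  # False: the code points differ
print(describe(composed))      # 63 61 66 c3 a9
print(describe(decomposed))    # 63 61 66 65 cc 81

# Normalizing both strings to the same form makes them compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```

Both strings display as "café", but their bytes differ, and the bytes are what a filesystem that does not normalize, or a tool that compares names byte-for-byte, actually sees.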

APFS changes things again, so you may now see similar problems when using HFS+ and APFS together, but I couldn't tell you under what conditions they would occur, how they would manifest, or what to do about them. APFS certainly seems more complicated, since its behavior can vary depending on which OS version created the APFS volume and whether the volume is case-sensitive or case-insensitive:
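As a rough diagnostic (not a fix, and not a feature of rsync), the Python sketch below lists names that differ byte-wise between two directories but become identical after NFC normalization. These are the names a byte-comparing sync would treat as "delete one file, create another." Because it normalizes both sides, it does not need to know which filesystem each disk uses. The function names `normalization_mismatches` and `compare_dirs` are hypothetical, made up for this example, and the directory check looks only at the top level, not recursively.

```python
import os
import unicodedata
from typing import Iterable


def normalization_mismatches(
    src_names: Iterable[str], dst_names: Iterable[str]
) -> list[tuple[str, str]]:
    """Return (src, dst) pairs of names that differ byte-wise but are equal after NFC.

    These are names that a tool comparing filenames byte-for-byte would
    treat as one deleted file plus one new file.
    """
    dst_list = list(dst_names)
    dst_exact = set(dst_list)
    # Map each destination name's NFC form to the first name that produced it.
    dst_by_key: dict[str, str] = {}
    for name in dst_list:
        dst_by_key.setdefault(unicodedata.normalize("NFC", name), name)

    pairs = []
    for name in src_names:
        if name in dst_exact:
            continue  # identical bytes on both sides: no mismatch
        match = dst_by_key.get(unicodedata.normalize("NFC", name))
        if match is not None:
            pairs.append((name, match))
    return sorted(pairs)


def compare_dirs(src_dir: str, dst_dir: str) -> list[tuple[str, str]]:
    """Check only the top level of two directories (hypothetical helper, not recursive)."""
    return normalization_mismatches(os.listdir(src_dir), os.listdir(dst_dir))
```

For example, `normalization_mismatches(["cafe\u0301.txt", "plain.txt"], ["caf\u00e9.txt", "plain.txt"])` returns the single pair `("cafe\u0301.txt", "caf\u00e9.txt")`. Normalizing to NFC rather than NFD is an arbitrary choice here; either form works as a comparison key as long as both sides use the same one.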

https://mjtsai.com/blog/2017/06/27/apfs-native-normalization/

Here's some information directly from Apple, though it is a "retired" document, so a newer version may be available:

https://developer.apple.com/library/archive/documentation/FileManagement/Conceptual/APFS_Guide/FAQ/FAQ.html