Using Git Special Characters

This article explains the pitfalls of using special characters in Git file names, specifically when cloning repositories between Linux and Windows. It assumes you are comfortable with both operating systems and with Git.

We will discuss why the different character encodings used for file names on Linux and Windows machines matter when you work with Git across these platforms.

Why do we need to know this?

If a Git repository is cloned between two file systems whose encodings do not match, the same file appears on each system under a different file name. Builds that refer to the file by name can then fail.

Git was originally designed for Linux and then ported to Windows. This has led to the assumption that Git works seamlessly between Windows and Linux. While this is the case most of the time, the different character encodings used on the two systems do cause issues.

Git treats file names and folder names as a sequence of bytes and does not attempt to compensate for the different file name encodings of operating systems. Where a name contains characters outside the ASCII character set (for example, accented Latin letters), managing repositories can become more difficult.
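One practical consequence is that it can help to know which names in a repository fall outside ASCII before sharing it across platforms. The following is a minimal sketch of such a check. The helper name non_ascii_names and the sample paths are hypothetical and are not part of Git.

```python
def non_ascii_names(paths):
    """Return the paths that contain at least one non-ASCII character.

    Hypothetical helper for illustration only; it is not a Git command.
    """
    return [p for p in paths if not p.isascii()]


# Sample paths (made up for this example).
names = ["README.txt", "sp\u00e9cial_fil\u00e9.txt", "src/main.c"]

# Only the accented name is reported.
print(non_ascii_names(names))
```

In a real repository the list of paths could come from the output of git ls-files, but how that output is decoded depends on your platform, so it is left out of this sketch.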

The following example demonstrates one manifestation of this problem. For this example we used an Ubuntu Linux machine with Git 1.7.0.4 and a Windows machine with msysgit 1.7.4.

  1. Create a Samba shared folder on Linux (e.g. my_shared_repos)
  2. Create a Git repository on the Linux machine under the shared folder (e.g. git_special_characters)
  3. On the Linux machine, add a file to the Git repository and call the file spécial_filé.txt.
  4. Log into the Windows machine and mount the Linux shared folder as a shared drive (e.g. Z:\)
  5. Using msysgit on a Windows command line, clone the repository (that you created in steps 2 and 3) to a suitable folder on your desktop.
  6. Change directory into the cloned repository and view its contents (i.e. using dir). You will see that the file name has changed: it is now displayed as spÃ©cial_filÃ©.txt
    1. It does not seem to matter where you clone the Git repository to, only what encoding your OS uses.
    2. If you used msysgit on Windows to clone the Git repository from the mounted Linux file system back to the same mounted file system (but in a different folder), the file name would still have changed.
  7. On the Windows machine, run the command git log -z. It will show you the raw bytes of the file name rather than an OS interpretation of that data.
    1. You will see that the file you created on Linux (spécial_filé.txt) is shown as sp<C3><A9>cial_fil<C3><A9>.txt.
    2. This is because the file name was stored on the Linux file system as a UTF-8 encoded name when you originally created the file. UTF-8 represents the é character with the two-byte hexadecimal sequence C3 A9. On Windows these two bytes are interpreted as two separate characters (Ã and © respectively), as they are in the Windows-1252 code page.
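The byte-level behaviour described in step 7 can be reproduced outside Git. The following is a minimal sketch that assumes the Windows side interprets the bytes as Windows-1252 (Python codec name cp1252). The function names are made up for illustration.

```python
def utf8_bytes(name):
    """Bytes a UTF-8 file system (the Linux machine here) stores for a name."""
    return name.encode("utf-8")


def as_seen_in_cp1252(name):
    """How the same bytes read when interpreted as Windows-1252.

    Assumption: the Windows tool decodes file name bytes as cp1252.
    """
    return utf8_bytes(name).decode("cp1252")


# The name created on Linux, with é written as the code point U+00E9.
linux_name = "sp\u00e9cial_fil\u00e9.txt"

print(utf8_bytes(linux_name))         # b'sp\xc3\xa9cial_fil\xc3\xa9.txt'
print(as_seen_in_cp1252(linux_name))  # spÃ©cial_filÃ©.txt
```

Each é becomes the byte pair C3 A9, and each of those two bytes is then shown as its own character.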

If you were to modify this file (spécial_filé.txt) in your local Windows clone using msysgit, commit it and push it back to the master (on the Linux share), the correct file would be updated.

In fact, even if a copy of the file is made on Windows, renamed by adding a _1 suffix (so that Windows displays it as spÃ©cial_filÃ©_1.txt), committed and pushed back to the master repository, it will still show correctly as spécial_filé_1.txt for anyone who clones the master repository on Linux.

This shows that Git treats the file name as a sequence of bytes and does not perform any encoding mapping between operating systems.
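The round trip can be modelled in a few lines. This is only a sketch of what happens to the bytes, not of msysgit's internals, and it assumes the Windows side reads and writes names as Windows-1252 (cp1252). The function name is hypothetical.

```python
def windows_round_trip(linux_name, suffix="_1"):
    """Model copying and renaming a file on Windows, then reading it on Linux.

    Assumptions: Linux stores names as UTF-8 and Windows uses cp1252.
    """
    stored = linux_name.encode("utf-8")       # bytes recorded by Git
    windows_view = stored.decode("cp1252")    # name as displayed on Windows
    # Rename on Windows: add the suffix before the extension.
    stem, dot, ext = windows_view.rpartition(".")
    windows_copy = stem + suffix + dot + ext
    pushed = windows_copy.encode("cp1252")    # bytes written back by Windows
    return windows_view, windows_copy, pushed.decode("utf-8")


seen, renamed, back_on_linux = windows_round_trip("sp\u00e9cial_fil\u00e9.txt")
print(seen)           # spÃ©cial_filÃ©.txt
print(renamed)        # spÃ©cial_filÃ©_1.txt
print(back_on_linux)  # spécial_filé_1.txt
```

Because the original bytes are never altered, only reinterpreted, the name reads correctly again on Linux. Note that a few byte values have no character assigned in Python's cp1252 codec, so this model would raise an error for names whose UTF-8 form contains them.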
