File Systems: Files, Directories (Folders) and Paths

The output of some command line commands related to file system usage in Linux systems. The image illustrates directories and files listed by the commands ls, mount and tree.

Image credits: Image created by the author using the program Spectacle.

Modernity Paradoxes

In general, and although still less than ideal, modern systems are easier to use than those from some decades ago. Accessibility and usability issues are still challenging and relevant -- especially for people with disabilities. However, the systems are usable by more people than in the past. In fact, the current generation is often called digital natives, as a reflection of the availability of digital technologies since their birth.

Internet (Web) systems have contributed to the ease of use, by promoting standard interfaces (that is, at least for a given system) among different devices and platforms. Web browsers keep improving, extending their features. Web systems become more complex and sophisticated, at times providing suitable alternatives to their traditional desktop counterparts. The term cloud has since become a synonym of computation and modernity -- even in an exaggerated way.

Nevertheless, simplicity has its price. The counterpart is that Web migration decreased the need for performing basic, everyday computing operations. This is positive in some ways. However, it is not entirely positive.

Perhaps paradoxically, despite people using more technologies nowadays, they also seem to know less about them. Traditional concepts were forgotten, due to disuse. What is a file? What is a folder or a directory?

Back to the Basics: Below the Cloud, the Disk

There is no cloud; everything is stored somewhere, in some machine. Whether the machine is yours is the question. This raises questions about security and privacy, although they are not the focus of this entry in particular.

At this moment, the concern is file systems. The basics. One can think of files as boxes that store digital data to use with a computer (or any other digital device). Files include:

Text documents;
Images;
Slideshows;
Video;
Music;
Computer software (programs, applications, or, simply, apps);
Computer program shortcuts (or to any other files or data).

The text in this page is stored in a file. The page, as a whole, is a set of files: it combines text files with images and with source code files for a Web browser (for instance, Mozilla Firefox, Google Chrome, Microsoft Edge and Microsoft Internet Explorer, Apple Safari, or textual browsers, such as Lynx and Links).

Files store bits. The emphasis is intentional. Technically, files do not store text, images, nor sounds. They only store bits. The word bit is a contraction of binary digit, for a bit can assume one of two values. By convention, these values are zero (0) and one (1). Thus, it could be said that computers store only sequences of values zero and one. Everything else is the result of binary coding.

It is not very practical to work with bits. For convenience, bytes are more commonly used than bits. A byte corresponds to eight (8) bits. For even more convenience, it is usual to use multiples of bytes, such as kilobytes, megabytes, gigabytes and terabytes.
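As a small illustration of the relation between text, bytes and bits, the following Python sketch prints the bits stored for a short piece of text. The text "abc" and the variable names are arbitrary choices for this example; it assumes the UTF-8 encoding.

```python
# Illustrative only: show the bits behind a short piece of text.
text = "abc"
data = text.encode("utf-8")  # the bytes that would actually be stored

# Format each byte as eight binary digits.
bits = " ".join(f"{byte:08b}" for byte in data)

print(bits)                   # 01100001 01100010 01100011
print(len(data), "bytes")     # 3 bytes
print(len(data) * 8, "bits")  # 24 bits
```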

In computers, disks store data in persistent (secondary) memory. Common options include hard disk drives (HDDs -- or simply hard disks, HDs) and solid state drives (SSDs). Disk storage is counted by multiples of bytes. Modern commercial disks can store gigabytes to terabytes of data.
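To get a feeling for these quantities, one possible way to query the capacity of a disk is the shutil module from Python's standard library. This is only a sketch: it inspects the disk that holds the current directory and uses the decimal definition of gigabyte.

```python
import shutil

# Query the disk that holds the current directory.
usage = shutil.disk_usage(".")

GB = 1000 ** 3  # one gigabyte, using the decimal definition
print(f"total: {usage.total / GB:.1f} GB")
print(f"used:  {usage.used / GB:.1f} GB")
print(f"free:  {usage.free / GB:.1f} GB")
```

The values depend on the machine, so no expected output is shown.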

To organize disks, one can create folders, also known as directories. Directory was the usual term used in the first file systems, when the command line dictated the navigation around files and folders. The term is still used for command line programs.

File Systems

File systems (or filesystems) are often organized with a data structure called a tree. Every tree starts from a root. In file systems, folders (directories) define subtrees. Files define leaves. Put simply, subtrees are trees; thus, directories can store files or other directories. Files, on the other hand, are terminal elements.
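Because every subtree is itself a tree, a directory can be traversed with a recursive subroutine. The following Python sketch illustrates the idea; render_tree and the file names are hypothetical, and the tree is created in a temporary directory so that nothing on the disk is changed.

```python
import tempfile
from pathlib import Path


def render_tree(directory: Path, indent: str = "") -> list[str]:
    """Return one line per entry, indenting children under their parent."""
    lines = []
    for entry in sorted(directory.iterdir(), key=lambda p: p.name):
        if entry.is_dir():
            # A directory is a subtree: list it, then recurse into it.
            lines.append(f"{indent}- {entry.name}/")
            lines.extend(render_tree(entry, indent + "  "))
        else:
            # A file is a leaf: there is nothing below it.
            lines.append(f"{indent}- {entry.name}")
    return lines


# Build a small, throwaway tree (hypothetical names) to traverse.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "Images").mkdir()
    (root / "Images" / "1.png").touch()
    (root / "a.txt").touch()
    tree_lines = render_tree(root)

print("\n".join(tree_lines))
# - Images/
#   - 1.png
# - a.txt
```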

In practice, files can store other files using archiving tools (that often can perform compression as well). tar, zip, rar, gz and bz are traditional examples of extensions for formats generated by archival or compression. For file systems, a file created by tools compatible with the previous formats represents a single file. In fact, without a compatible tool, it is not possible to access the original content stored within.
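As an illustration, Python's standard zipfile module is one such compatible tool. The sketch below builds a zip archive in memory (the names notes/a.txt and notes/b.md are made up for the example) and then reads it back.

```python
import io
import zipfile

# Create a zip archive in memory holding two small (hypothetical) files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("notes/a.txt", "first file")
    archive.writestr("notes/b.md", "second file")

# Saved to disk, `buffer` would be a single file; a zip-aware tool sees two.
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    content = archive.read("notes/a.txt").decode("utf-8")

print(names)    # ['notes/a.txt', 'notes/b.md']
print(content)  # first file
```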

Absolute Paths of Directories and Folders

Path is an important concept for using files when programming. A path is a sequence of directories that lead from an origin (a start) to a destination (an end). The origin is always a directory. The destination can be a subdirectory (or the origin itself) or a file stored in a subdirectory.

Every file has a path called absolute path, which serves as an address for the file in a file system. The absolute path is unique; each file and directory in a file system has its own distinct absolute path based on the names defined along the way to the destination.

An absolute path starts from the root of the file system and ends at the chosen destination (either a file or directory). On Windows operating systems, the root maps a disk unit using a pattern in the form "letter:\". For instance, the usual C:\ is normally used by the Windows install. User accounts on Windows have an absolute path in the form C:\Users\User Name -- for instance, C:\Users\Franco, for an account named Franco. For Windows installs with a single account, the name is the one chosen at install or during the first use. To refer to the currently active (logged in) user in a generic way, one can use the values stored in environment variables (for instance, USERPROFILE, accessed as %USERPROFILE%).

An example of the absolute path for a file placed in the desktop (which is a folder itself) can be something like C:\Users\Franco\Desktop\File name.txt. If the very same file was stored in a folder placed in the desktop called My Folder, the path would be C:\Users\Franco\Desktop\My Folder\File name.txt instead.

On Windows, each disk unit has its own letter, which represents its root. Values normally start with C:\, followed by D:\, E:\, and so on. In the past, A:\ and B:\ mapped floppy disk units. Moreover, it is possible to map an arbitrary folder to a custom letter, with a command line command called subst (manual entry; manuals are essential for software development).

In Unix-based systems, such as Linux, the root of the file system is called /. Unlike Windows, every disk unit starts from the root, in a directory chosen in a mounting process (which uses the command mount). An equivalent example for the home directory could be something like /home/user-name/ (for instance, /home/franco/). For convenience, a tilde (~) allows referencing the current user's home directory. For the previous entry, both ~/ and /home/franco/ would represent the same absolute path, provided that the current user account was called franco.

Continuing the example, the corresponding absolute paths to the files proposed for Windows on a Linux machine would be:

For File name.txt on desktop: /home/franco/Desktop/File name.txt, which is the same as ~/Desktop/File name.txt;
For File name.txt on a My Folder subdirectory of the desktop: /home/franco/Desktop/My Folder/File name.txt, or simply ~/Desktop/My Folder/File name.txt.
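The same absolute paths can be built programmatically. A minimal sketch, assuming Python's standard pathlib module: the "pure" path classes only manipulate text and never access the disk, so the example behaves the same way on any operating system.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The / operator appends one more component to a path.
windows_file = PureWindowsPath("C:/Users/Franco") / "Desktop" / "My Folder" / "File name.txt"
linux_file = PurePosixPath("/home/franco") / "Desktop" / "My Folder" / "File name.txt"

print(windows_file)         # C:\Users\Franco\Desktop\My Folder\File name.txt
print(linux_file)           # /home/franco/Desktop/My Folder/File name.txt

# The anchor is the root from which the absolute path starts.
print(windows_file.anchor)  # C:\
print(linux_file.anchor)    # /
print(windows_file.is_absolute(), linux_file.is_absolute())  # True True
```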

Overall, the main differences between the systems refer to whether they distinguish lower case from upper case, the name of the root, and the slashes' direction.

The first difference (which requires attention and care) regards differences of case. Windows does not distinguish cases. A file named abc.txt can also be called ABC.txt, abc.TXT, AbC.tXt or any other variation. Thus, one can say that Windows is not case-sensitive to file paths.

Conversely, Linux systems are case-sensitive for paths, and, therefore, they do distinguish lower and upper case letters. abc.txt, ABC.txt, abc.TXT, AbC.tXt are considered four different files, each one accessed by writing the corresponding path exactly as defined.
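Python's pathlib mirrors these conventions when comparing paths, which allows a quick experiment. This is only an illustration of the naming rules; the actual behavior on a real machine depends on the file system in use.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows-style paths compare without regard to case...
same_on_windows = PureWindowsPath("abc.txt") == PureWindowsPath("AbC.tXt")
# ...while Unix-style paths must match exactly.
same_on_linux = PurePosixPath("abc.txt") == PurePosixPath("AbC.tXt")

print(same_on_windows)  # True
print(same_on_linux)    # False
```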

Next, the choice of slashes is not always a problem nowadays, although it is still useful to know the differences. Backslashes on Windows are an inheritance from the days of DOS because, back then, forward slashes introduced (and still introduce) parameters for the command line interpreter of DOS (nowadays, on Windows, the command line interpreter cmd, also used for batch processing). Modern Windows versions usually accept forward slashes instead of backslashes, although there are exceptions. As a rule of thumb, in modern programming libraries, one can almost always use forward slashes. However, it is wise to check and test before assuming. Even better, abstract the implementation using a subroutine (such as a function or procedure).
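A possible shape for such a subroutine is sketched below in Python. The function join_path and its windows parameter are hypothetical names chosen for this example; the sketch delegates the choice of separator to the standard pathlib module.

```python
from pathlib import PurePosixPath, PureWindowsPath


def join_path(*parts: str, windows: bool = False) -> str:
    """Join path components with the separator of the chosen system."""
    path_class = PureWindowsPath if windows else PurePosixPath
    return str(path_class(*parts))


print(join_path("Desktop", "Code", "x.c"))                # Desktop/Code/x.c
print(join_path("Desktop", "Code", "x.c", windows=True))  # Desktop\Code\x.c
# Forward slashes given as input are converted for Windows-style paths.
print(join_path("Desktop/Code", "x.c", windows=True))     # Desktop\Code\x.c
```

The rest of a program can then call the subroutine instead of concatenating slashes by hand.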

Another difference is that names in Windows often have spaces, accents and other special characters. Names in Unix-like systems often avoid spaces, accents and other special characters for ease and convenience of command line usage. Instead, they use hyphens or underlines (underscores) as a substitution for spaces. For instance, instead of Name Surname, one can write Name-Surname, Name_Surname or NameSurname. Even more commonly (and also for convenience), words only use lowercase letters (name-surname).

Although the choices seem inconsequential, they prove to be wise for software development. There are many programming libraries that assume the use of the English language. In other words, special characters can be problematic. Spaces can also be problematic.

When programming, it is prudent to avoid unnecessary risks and maximize opportunities to avoid problems. For directories and internal files, a conscious choice of avoiding spaces, accents, cedillas and other special characters (that is, any character that is not part of the American Standard Code for Information Interchange -- ASCII -- encoding) can help to avoid problems and wastes of time. It is a good programming practice to create conventions for naming files and folders for paths.
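One way to enforce such a convention is to convert names automatically. The following Python sketch shows one possible rule (lowercase ASCII letters, digits, dots, hyphens and underscores only); the function to_safe_name and the rule itself are assumptions made for the example, not a standard.

```python
import re
import unicodedata


def to_safe_name(name: str) -> str:
    """Return a lowercase, ASCII-only name with hyphens instead of spaces."""
    # Split accented letters into base letter + accent, then drop non-ASCII.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace every run of characters outside [a-z0-9._-] with one hyphen.
    hyphenated = re.sub(r"[^a-z0-9._-]+", "-", ascii_only.lower())
    return hyphenated.strip("-")


print(to_safe_name("Name Surname"))             # name-surname
print(to_safe_name("Relat\u00f3rio Final.txt"))  # relatorio-final.txt
```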

Relative Paths of Directories and Folders

Besides absolute paths, files and folders can have relative paths. While absolute paths always start from the root of the file system, relative paths start from an arbitrarily chosen origin.

It is easier to understand relative paths with examples. Generically, one can think of file systems as:

Root
- Users Directory
  - Directory of A User
  - Directory of Another User
- Programs Directory
- Operating System Directory
- Other Directories of the Root

For Windows, one could instantiate the previous schema as:

C:\
- Users (or Documents and Settings for older Windows versions)
  - Ana
  - Franco
- Program Files
- Windows
- ...

For Linux, the schema would become:

/
- home
  - ana
  - franco
- bin
- sys

Now, one can consider an arbitrary user; for instance, franco. franco could have the following personal directory:

Franco / franco
- Desktop
  - a.txt
  - b.md
  - c.pdf
  - Images
    - 1.png
    - 2.jpg
    - 3.gif
  - Code
    - C
      - x.c
    - C++
      - y.cpp
    - Python
      - z.py

To map absolute file paths, it suffices to follow the sequence defined from the root until the desired file or directory. Some examples:

File b.md:
- Windows:
  - C:\Users\Franco\Desktop\b.md
  - %USERPROFILE%\Desktop\b.md
- Linux:
  - /home/franco/Desktop/b.md
  - ~/Desktop/b.md
Directory Images:
- Windows:
  - C:\Users\Franco\Desktop\Images\
  - %USERPROFILE%\Desktop\Images\
- Linux:
  - /home/franco/Desktop/Images/
  - ~/Desktop/Images/
File y.cpp:
- Windows:
  - C:\Users\Franco\Desktop\Code\C++\y.cpp
  - %USERPROFILE%\Desktop\Code\C++\y.cpp
- Linux:
  - /home/franco/Desktop/Code/C++/y.cpp
  - ~/Desktop/Code/C++/y.cpp

For relative paths, the values depend on the chosen origin. For instance, if one chose the origin C:\Users\Franco\ (/home/franco/), the examples would become:

File b.md:
- Windows:
  - .\Desktop\b.md
- Linux:
  - ./Desktop/b.md
Directory Images:
- Windows:
  - .\Desktop\Images\
- Linux:
  - ./Desktop/Images/
File y.cpp:
- Windows:
  - .\Desktop\Code\C++\y.cpp
- Linux:
  - ./Desktop/Code/C++/y.cpp

In relative paths, a dot (.) represents the current directory. At the start of an address, it substitutes the origin. Therefore, in the example, . corresponds to C:\Users\Franco\ on Windows and to /home/franco/ on Linux.
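The relationship between an origin, a relative path and an absolute path can be sketched with Python's pathlib. The example uses the Linux paths of the franco account; pure paths are used, so nothing is read from the disk.

```python
from pathlib import PurePosixPath

origin = PurePosixPath("/home/franco")
absolute = PurePosixPath("/home/franco/Desktop/Code/C++/y.cpp")

# From absolute to relative: strip the origin from the front.
relative = absolute.relative_to(origin)
print(relative)           # Desktop/Code/C++/y.cpp

# From relative back to absolute: join the origin and the relative path.
print(origin / relative)  # /home/franco/Desktop/Code/C++/y.cpp

# pathlib drops a leading "./", since it adds no information.
print(PurePosixPath("./Desktop/b.md"))  # Desktop/b.md
```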

It is possible to omit the starting dot. When omitted, the operating systems assume that the address is relative, starting from the current address. There is an exception, though: for files that are programs and that one wishes to run, it is necessary to prepend a dot slash (./, as in ./program-name) to avoid conflicts with values defined in environment variables (such as PATH). Exception aside, the following examples are equivalent to the previous ones:

File b.md:
- Windows:
  - Desktop\b.md
- Linux:
  - Desktop/b.md
Directory Images:
- Windows:
  - Desktop\Images\
- Linux:
  - Desktop/Images/
File y.cpp:
- Windows:
  - Desktop\Code\C++\y.cpp
- Linux:
  - Desktop/Code/C++/y.cpp

The address that the operating system assumes is called the working directory (or working dir) or current working directory (current working dir). On Linux, one can use the PWD environment variable to retrieve the value of the working directory (to use it, $PWD/rest/of/path/file.ext). On Windows, one can get the value using the cd command (current directory; cd also allows changing directories on Windows; on Unix-like systems, cd is called change directory and only allows for directory changes). To use the value on Windows, one should write %cd%\rest of\the path to\file.ext.
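Programming languages usually expose the working directory as well. A minimal sketch in Python, in which the relative path Desktop/b.md is only an example and does not need to exist:

```python
import os
from pathlib import Path

working_directory = Path.cwd()  # same information as os.getcwd()
print(working_directory)

# A relative path is interpreted starting from the working directory.
relative = Path("Desktop") / "b.md"
absolute = working_directory / relative
print(absolute)
print(absolute.is_absolute())  # True
```

The first two printed values depend on where the program is run.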

Relative addresses assume different values whenever the origin (which, now, can be called working directory) changes. For instance, if one assumes the origin as the directory C:\Users\Franco\Desktop\Code\ (Windows) or /home/franco/Desktop/Code/ (Unix), the results become:

File b.md:
- Windows:
  - .\..\b.md
  - ..\b.md
- Linux:
  - ./../b.md
  - ../b.md
Directory Images:
- Windows:
  - .\..\Images\
  - ..\Images\
- Linux:
  - ./../Images/
  - ../Images/
File y.cpp:
- Windows:
  - .\C++\y.cpp
  - C++\y.cpp
- Linux:
  - ./C++/y.cpp
  - C++/y.cpp

The previous examples provide the second special value for relative directories, which is represented by two dots in sequence (..). The two dots represent the parent directory of the result up to that point.

In the example, as . matches C:\Users\Franco\Desktop\Code\ (/home/franco/Desktop/Code/) at the start of the path, .. means C:\Users\Franco\Desktop\ on Windows and /home/franco/Desktop/ on Linux.

Finally, the values of . and .. are always resolved against the path built up to the point where they appear. In other words,

././././ corresponds to ./;
.././././../ corresponds to ../../;
./abc/./def/ghi/./ corresponds to abc/def/ghi/;
abc/def/ghi/.. corresponds to abc/def/;
abc/def/ghi/.././ corresponds to abc/def/;
abc/def/ghi/.././.. corresponds to abc/.
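These simplifications can be checked with posixpath.normpath, from Python's standard library. It works only on the text of the path (it does not look at the disk, so symbolic links are not considered) and it drops trailing slashes from the result.

```python
import posixpath

examples = [
    "././././",
    ".././././../",
    "./abc/./def/ghi/./",
    "abc/def/ghi/..",
    "abc/def/ghi/.././..",
]

# Map each path to its simplified form.
normalized = {path: posixpath.normpath(path) for path in examples}
for path, result in normalized.items():
    print(path, "->", result)
```

The results are ., ../.., abc/def/ghi, abc/def and abc, respectively.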

It is common to feel that absolute and relative paths are confusing and complex when one learns them. With some practice, their use becomes simpler. Therefore, it can be convenient to practice using paths with other values defined at the start of this section. Perhaps it can be even better to try with files and folders existing on your own computer.

When Should I Use an Absolute Path? When Should I Choose a Relative One?

It is a good programming practice to use relative paths whenever possible.

In particular, the use of relative paths is useful for programming and automation, because they allow working with files and folders in a generic way. An absolute path works only if the entire path is identical, from root to leaf (file or folder). Thus, if there is a user or machine name anywhere on the path, it also has to be the very same.

On the other hand, for a relative path to work, it is enough that the local structure of files and folders matches from the origin onwards.

When absolute paths are inevitable, one should abstract them using environment variables provided by the operating system -- or define one's own, documenting them.
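A possible way to apply this advice is sketched below in Python. MYAPP_DATA_DIR and the folder myapp-data are hypothetical names invented for the example; a real program would define and document its own.

```python
import os
from pathlib import Path


def data_directory() -> Path:
    """Return the base directory for the program's data files.

    MYAPP_DATA_DIR is a hypothetical variable name chosen for this sketch.
    """
    custom = os.environ.get("MYAPP_DATA_DIR")
    if custom:
        return Path(custom)
    # Fall back to a folder in the current user's home directory,
    # so no user name is ever written in the source code.
    return Path.home() / "myapp-data"


print(data_directory() / "settings.txt")
```

With this approach, the absolute part of the path lives in a single place, and every other path in the program can be written relative to it.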

Additional Information and Trivia

File Sizes: Bytes and Bibytes

Although memory sizes are naturally expressed as powers of two (for bits have only two possible values), marketing campaigns and advertising often use powers of ten (10) to describe storage quantities. To avoid imprecision when it is necessary, there exist binary units with names ending in bibyte (kibibyte, mebibyte, gibibyte and so on), which are always defined as powers of two.

Name	Acronym	Value	Power
bit	b	1 bit	2⁰
byte	B	8 bits	2³
kilobyte	kB	1000 bytes	1000¹
megabyte	MB	1000000 bytes	1000²
gigabyte	GB	1000000000 bytes	1000³
terabyte	TB	1000000000000 bytes	1000⁴
petabyte	PB	1000000000000000 bytes	1000⁵
exabyte	EB	1000000000000000000 bytes	1000⁶
zettabyte	ZB	1000000000000000000000 bytes	1000⁷
yottabyte	YB	1000000000000000000000000 bytes	1000⁸

Name	Acronym	Value	Power
bit	b	1 bit	2⁰
byte	B	8 bits	2³
kibibyte	KiB	1024 bytes	2¹⁰
mebibyte	MiB	1048576 bytes	2²⁰
gibibyte	GiB	1073741824 bytes	2³⁰
tebibyte	TiB	1099511627776 bytes	2⁴⁰
pebibyte	PiB	1125899906842624 bytes	2⁵⁰
exbibyte	EiB	1152921504606846976 bytes	2⁶⁰
zebibyte	ZiB	1180591620717411303424 bytes	2⁷⁰
yobibyte	YiB	1208925819614629174706176 bytes	2⁸⁰
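The difference between the two families of units explains why a disk sold as having 1 TB can be reported as having about 931 GiB. The following Python sketch formats a quantity of bytes in both ways; format_size is a hypothetical helper written for this example.

```python
def format_size(size_in_bytes: int, binary: bool = False) -> str:
    """Format a byte count using decimal (kB, MB) or binary (KiB, MiB) units."""
    base = 1024 if binary else 1000
    units = (["B", "KiB", "MiB", "GiB", "TiB"] if binary
             else ["B", "kB", "MB", "GB", "TB"])
    value = float(size_in_bytes)
    index = 0
    # Divide by the base until the value fits the current unit.
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1
    return f"{value:.2f} {units[index]}"


one_terabyte = 1000 ** 4
print(format_size(one_terabyte))               # 1.00 TB
print(format_size(one_terabyte, binary=True))  # 931.32 GiB
```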

Encoding, Decoding, Transcoding and Extensions

Nothing exists for a computer but zeros and ones. Characters, digits, images, sounds, videos, computer code -- and even the operating system -- are coded sequences of bits.

For instance, using the ASCII or the Unicode Transformation Format 8 (UTF-8) encoding, the character a has the binary representation 1100001₂, which is stored in eight bits of memory (one byte). The value is read as one, one, zero, zero, zero, zero, one and corresponds to the decimal value 97₁₀.

A possible way to find the value is by writing the following lines of code in an interpreter for the Python programming language:

print("a = ", ord("a"))      # 97
print("a = ", bin(ord("a"))) # 0b1100001

To run the code, call python (after installing it) in a command line interpreter, type one of the lines, press enter, then wait a few moments for the result. The interpreter will evaluate the first expression and output its result. The same applies to the second line. It is not necessary to write what is after the hash symbol; it is a comment, illustrating the expected output.

In a desktop browser, it is possible to find the value without installing any other program. To do this, one can open the developer tools provided by the browser (pressing F12 in Mozilla Firefox, Google Chrome and Microsoft Edge), switch to the tab named Console and write the following lines of JavaScript code (technically, charCodeAt() provides a UTF-16 value):

console.log("a".charCodeAt(0))             // 97
console.log("a".charCodeAt(0).toString(2)) // 1100001

The process of converting a value to a code is called encoding. The process of converting the code back to the original value is decoding. Finally, converting data from one encoding to another (decoding it and encoding it again with a different code) is called transcoding.
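The three operations can be tried in Python. In this sketch, transcoding is taken in its common sense of converting data from one encoding to another (here, from UTF-8 to UTF-16); the word used as example is arbitrary.

```python
text = "a\u00e7\u00e3o"  # "ação": four characters, two of them outside ASCII

utf8_bytes = text.encode("utf-8")  # encoding: text -> bytes
print(len(utf8_bytes))             # 6
print(utf8_bytes.decode("utf-8"))  # decoding: bytes -> text

# Transcoding: decode from one encoding, then encode with another one.
utf16_bytes = utf8_bytes.decode("utf-8").encode("utf-16-le")
print(len(utf16_bytes))                         # 8
print(utf16_bytes.decode("utf-16-le") == text)  # True
```

The same text results in different sequences of bytes, depending on the encoding.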

All letters in this page's text are encoded by their UTF-8 values. The browser interprets the value and draws a corresponding character on the screen (or a voice synthesizer reads the character aloud). This process is repeated to transform encoded binary character sequences into sequences of images to draw.

Images, sounds, documents, computer programs and all other files have their own encoding. For the computer, there is nothing besides zeros and ones. Everything is code. The computer only performs what the program commands it to do.

This raises the question: how can a computer know which program it should use to open and read a file?

There are three main ways:

A person chooses the program that the computer should use;
File extensions. Extensions are sequences of characters appended at the end of a file's name, usually after a dot (pattern: name-of-the-file.ext). For instance, .txt, .doc, .docx, .pdf, .jpg, .png, .gif, .avi, .mpeg, .mp3 and .ogg are popular extensions for documents, images, videos and audio. Extensions can have more than three characters (for instance, .franco or .FrAnCo-GaRcIa) or fewer (.f). It is also possible to create files without any extension at all (for instance, just name). A file without an extension is still of the type defined by the format of its data. Extensions are, therefore, optional.
Regardless, an extension serves as a heuristic (a tip) of what programs a computer can choose and use to open a file. If there exists a mapping between file extension and a program, the operating system can run that program to open the file with the provided extension.
Metadata stored in the header of a file. Many file formats reserve the first stored values to describe the contents of the file. This way, an operating system can read them to discover the format and choose a suitable program (if there exists a mapping) to open it.

A fourth way is to try to guess the format by analyzing the file's contents; this is not always possible. When the computer cannot open a file, it asks the user how she/he wants to open it.
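The use of headers and extensions can be sketched in Python. The signatures below are the well-known initial bytes of some popular formats; the extension table and the function guess_format are hypothetical and far simpler than what an operating system does.

```python
from pathlib import PurePath

# A few well-known signatures found in the first bytes of a file.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF-": "PDF document",
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
    b"PK\x03\x04": "ZIP archive",
}

# A tiny, hypothetical extension table used as a fallback.
EXTENSIONS = {".txt": "plain text", ".png": "PNG image", ".pdf": "PDF document"}


def guess_format(file_name: str, first_bytes: bytes) -> str:
    """Guess a format from the header; fall back to the extension."""
    for signature, description in SIGNATURES.items():
        if first_bytes.startswith(signature):
            return description
    extension = PurePath(file_name).suffix.lower()
    return EXTENSIONS.get(extension, "unknown")


print(guess_format("photo", b"\x89PNG\r\n\x1a\n..."))  # PNG image
print(guess_format("notes.TXT", b"hello"))             # plain text
print(guess_format("mystery", b"\x00\x01"))            # unknown
```

In the first call, the header is enough to identify the format, even though the name has no extension.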

If this subsection seems hazy at this time, there is nothing to worry about. This is trivia about how computers work. At this moment, it is more important to know how to use files and folders. The next entry presents the basics.
