File Systems: Files, Directories (Folders) and Paths
Image credits: Image created by the author using the program Spectacle.
Modernity Paradoxes
In general, and although still less than ideal, modern systems are easier to use than those from a few decades ago. Accessibility and usability issues remain challenging and relevant -- especially for people with disabilities. However, the systems are usable by more people than in the past. In fact, the current generation is often called digital natives, reflecting the availability of digital technologies since their birth.
Internet (Web) systems have contributed to the ease of use by promoting standard interfaces (that is, at least for a given system) across different devices and platforms. Web browsers keep improving and extending their features. Web systems have become more complex and sophisticated, at times providing suitable alternatives to their traditional desktop counterparts. The term cloud has since become a synonym for computation and modernity -- even in an exaggerated way.
Nevertheless, simplicity has its price. The migration to the Web decreased the need for performing basic, everyday computing operations. This is positive in some ways, but not entirely.
Perhaps paradoxically, despite people using more technologies nowadays, they also seem to know less about them. Traditional concepts were forgotten, due to disuse. What is a file? What is a folder or a directory?
Back to the Basics: Below the Cloud, the Disk
There is no cloud; everything is stored somewhere, in some machine. Whether the machine is yours is the question. This raises questions about security and privacy, although they are not the focus of this entry in particular.
At this moment, the concern is file systems. The basics. One can think of files as boxes that store digital data to use with a computer (or any other digital device). Files include:
- Text documents;
- Images;
- Slideshows;
- Video;
- Music;
- Computer software (programs, applications, or, simply, apps);
- Computer program shortcuts (or to any other files or data).
The text on this page is stored in a file. The page, as a whole, is a set of files: it combines text files with images and with source code files for a Web browser (for instance, Mozilla Firefox, Google Chrome, Microsoft Edge and Microsoft Internet Explorer, Apple Safari, or textual browsers, such as Lynx and Links).
Files store bits. The emphasis is intentional. Technically, files do not store text, images, nor sounds. They only store bits. The word bit is a contraction of binary digit, for a bit can assume one of two values. By convention, these values are zero (0) and one (1). Thus, it could be said that computers store only sequences of values zero and one. Everything else is the result of binary coding.
It is not very practical to work with bits. For convenience, bytes are more commonly used than bits. A byte corresponds to eight (8) bits. For even more convenience, it is usual to use multiples of bytes, such as kilobytes, megabytes, gigabytes and terabytes.
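As a minimal sketch of the idea that files only store bits, the following Python snippet encodes a short text as bytes and writes each byte as eight binary digits. The helper name `to_bits` is hypothetical, chosen only for this illustration.

```python
def to_bits(data: bytes) -> str:
    """Return the bits of each byte, eight digits per byte, separated by spaces."""
    return " ".join(format(byte, "08b") for byte in data)


text = "Hi"
encoded = text.encode("utf-8")  # The text becomes a sequence of bytes.

print(len(encoded), "bytes =", len(encoded) * 8, "bits")
print(to_bits(encoded))  # 01001000 01101001
```

Each group of eight digits is one byte; the two letters of the text therefore take sixteen bits.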
In computers, disks store data in persistent (secondary) memory. Common options include hard disk drives (HDDs -- or simply hard disks, HDs) and solid state drives (SSDs). Disk storage is counted in multiples of bytes. Modern commercial disks can store gigabytes to terabytes of data.
To organize disks, one can create folders, also known as directories. Directory was the usual term used in the first file systems, when the command line dictated the navigation around files and folders. The term is still used for command line programs.
File Systems
File systems (or filesystems) are often organized with a data structure called a tree. Every tree starts from a root. In file systems, folders (directories) define subtrees. Files define leaves. Put simply, subtrees are trees; thus, directories can store files or other directories. Files, on the other hand, are terminal elements.
In practice, files can store other files using archiving tools (which often can perform compression as well). `tar`, `zip`, `rar`, `gz` and `bz` are traditional examples of extensions for formats generated by archiving or compression. For file systems, a file created by tools compatible with the previous formats represents a single file. In fact, without a compatible tool, it is not possible to access the original content stored within.
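To illustrate, the sketch below uses Python's standard `zipfile` module to pack two entries into a single archive kept in memory, and then reads them back. The entry names and contents are made up for the example.

```python
import io
import zipfile

# Build a zip archive in memory; for a file system, it would be one file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("a.txt", "First file")
    archive.writestr("notes/b.md", "Second file")

# Reading the content back requires a tool that understands the format.
with zipfile.ZipFile(buffer, mode="r") as archive:
    names = archive.namelist()
    first = archive.read("a.txt").decode("utf-8")

print(names)  # ['a.txt', 'notes/b.md']
print(first)  # First file
```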
Absolute Paths of Directories and Folders
The path is an important concept for using files when programming. A path is a sequence of directories that leads from an origin (a start) to a destination (an end). The origin is always a directory. The destination can be a subdirectory (or the origin itself) or a file stored in a subdirectory.
Every file has a path called the absolute path, which serves as an address for the file in a file system. The absolute path is unique; each file and directory in a file system has its own distinct absolute path based on the names defined along the way to the destination.
An absolute path starts from the root of the file system and ends at the chosen destination (either a file or directory). On Windows operating systems, the root maps a disk unit using a pattern in the form "letter:\". For instance, the usual `C:\` is normally used by the Windows install. User accounts on Windows have an absolute path in the form `C:\Users\User Name` -- for instance, `C:\Users\Franco`, for an account named `Franco`. For Windows installs with a single account, the name is the one chosen at install or during the first use. To refer to the currently active (logged in) user in a generic way, one can use the values stored in environment variables (for instance, `USERPROFILE`, accessed as `%USERPROFILE%`).
An example of the absolute path for a file placed on the desktop (which is a folder itself) can be something like `C:\Users\Franco\Desktop\File name.txt`. If the very same file were stored in a folder placed on the desktop called `My Folder`, the path would be `C:\Users\Franco\Desktop\My Folder\File name.txt` instead.
On Windows, each disk unit has its own letter, which represents its root. Values normally start at `C:\`, followed by `D:\`, `E:\`, and so on. In the past, `A:\` and `B:\` mapped floppy disk units. Moreover, it is possible to map an arbitrary directory to a custom letter with a command line command called `subst` (manual entry; manuals are essential for software development).
In Unix-based systems, such as Linux, the root of the file system is called `/`. Unlike Windows, every disk unit starts from the root, in directories chosen in a mounting process (which uses the command `mount`). An equivalent example for the home directory could be something like `/home/user-name/` (for instance, `/home/franco/`). For convenience, a tilde (`~`) allows referencing the current user's home directory. For the previous entry, both `~/` and `/home/franco/` would represent the same absolute path, provided that the current user account was called `franco`.
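In a program, the home directory does not need to be written by hand. As a sketch, Python's standard `pathlib` module can retrieve it for the current user on both Windows and Linux; the file name below repeats the example from the text, and the output depends on the machine and account.

```python
from pathlib import Path

home = Path.home()  # For instance, /home/franco on Linux.
desktop_file = home / "Desktop" / "File name.txt"

print(home)
print(desktop_file)
print(desktop_file.is_absolute())
```

The `/` operator joins path components with the separator of the system on which the code runs.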
Continuing the example, the corresponding absolute paths on a Linux machine to the files proposed for Windows would be:
- For `File name.txt` on the desktop: `/home/franco/Desktop/File name.txt`, which is the same as `~/Desktop/File name.txt`;
- For `File name.txt` in a `My Folder` subdirectory of the desktop: `/home/franco/Desktop/My Folder/File name.txt`, or simply `~/Desktop/My Folder/File name.txt`.
Overall, the main differences between the systems refer to whether they distinguish lower case from upper case, the name of the root, and the slashes' direction.
The first difference (which requires attention and care) regards differences of case. Windows does not distinguish cases. A file named `abc.txt` can also be called `ABC.txt`, `abc.TXT`, `AbC.tXt` or any other variation. Thus, one can say that Windows is not case-sensitive to file paths.
Conversely, Linux systems are case-sensitive for paths and, therefore, they do distinguish lower and upper case letters. `abc.txt`, `ABC.txt`, `abc.TXT` and `AbC.tXt` are considered four different files, each one accessed by writing the corresponding path exactly as defined.
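The difference can be sketched with the pure path classes of Python's `pathlib`, which model the naming rules of each system without touching the disk. This only illustrates the conventions; the actual behavior on a machine depends on the file system in use.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows-style paths compare equal regardless of case.
windows_same = PureWindowsPath("abc.txt") == PureWindowsPath("ABC.TXT")

# POSIX-style (Linux) paths are compared exactly as written.
posix_same = PurePosixPath("abc.txt") == PurePosixPath("ABC.TXT")

print(windows_same)  # True
print(posix_same)    # False
```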
Next, the choice of slashes is not always a problem nowadays, although it is still useful to know the differences. Backslashes on Windows are a legacy of the DOS days because, back then, forward slashes introduced (and still introduce) parameters for the command line interpreter of DOS (nowadays, on Windows, the command line interpreter `cmd`, used for `batch` processing). Modern Windows versions usually accept forward slashes instead of backslashes, although there are exceptions. As a rule of thumb, in modern programming libraries, one can almost always use forward slashes. However, it is wise to check and test before assuming. Even better, abstract the implementation using a subroutine (such as a function or procedure).
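One possible way to abstract the separator is to delegate the work to a library. The sketch below wraps Python's `pathlib` in a hypothetical helper called `join_path` (the name and the `windows` parameter are assumptions made for this example), which builds a path with the separator of the target system.

```python
from pathlib import PurePosixPath, PureWindowsPath


def join_path(root, *parts, windows=False):
    """Join path components using the separator of the target system."""
    path_type = PureWindowsPath if windows else PurePosixPath
    return str(path_type(root, *parts))


print(join_path("C:/", "Users", "Franco", "Desktop", windows=True))
# C:\Users\Franco\Desktop
print(join_path("/", "home", "franco", "Desktop"))
# /home/franco/Desktop
```

In real programs, `pathlib.Path` already chooses the suitable flavor for the system on which the code runs; the explicit flag here only serves to show both results on any machine.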
The third difference is that names in Windows often have spaces, accents and other special characters. Names in Unix-like systems often avoid spaces, accents and other special characters for ease and convenience of command line usage. Instead, they use hyphens or underlines (underscores) as a substitute for spaces. For instance, instead of `Name Surname`, one can write `Name-Surname`, `Name_Surname` or `NameSurname`. Even more commonly (and also for convenience), words only use lowercase letters (`name-surname`).
Although the choices seem inconsequential, they prove to be wise for software development. There are many programming libraries that assume the use of the English language. In other words, special characters can be problematic. Spaces can also be problematic.
When programming, it is prudent to avoid unnecessary risks and maximize opportunities to avoid problems. For directories and internal files, a conscious choice of avoiding spaces, accents, cedillas and other special characters (that is, any character that is not part of the American Standard Code for Information Interchange -- ASCII -- encoding) can help to avoid problems and wasted time. It is a good programming practice to create conventions for naming files and folders for paths.
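As an illustration of such a convention, the following sketch converts an arbitrary name into lowercase ASCII with hyphens instead of spaces. The function name `safe_name` and the exact rules are assumptions for this example, not a standard; each project can define its own.

```python
import re
import unicodedata


def safe_name(name: str) -> str:
    """Convert a name to lowercase ASCII, using hyphens instead of spaces."""
    # Decompose accented characters (letter + accent), then drop non-ASCII.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace every run of characters other than letters, digits, dots
    # and underscores with a single hyphen.
    hyphenated = re.sub(r"[^A-Za-z0-9._]+", "-", ascii_only)
    return hyphenated.strip("-").lower()


print(safe_name("Name Surname"))         # name-surname
print(safe_name("Caf\u00e9 Menu.txt"))   # cafe-menu.txt
```

Characters without an ASCII decomposition are simply discarded by this approach, so it is wise to check the results before renaming real files.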
Relative Paths of Directories and Folders
Besides absolute paths, files and folders can have relative paths. While absolute paths always start from the root of the file system, relative paths start from an arbitrarily chosen origin.
It is easier to understand relative paths with examples. Generically, one can think of file systems as:
- Root
  - Users Directory
    - Directory of A User
    - Directory of Another User
  - Programs Directory
  - Operating System Directory
  - Other Directories of the Root
For Windows, one could instantiate the previous schema as:
- `C:\`
  - `Users` (or `Documents and Settings` for older Windows versions)
    - `Ana`
    - `Franco`
  - `Program Files`
  - `Windows`
  - ...
For Linux, the schema would become:
- `/`
  - `home`
    - `ana`
    - `franco`
  - `bin`
  - `sys`
Now, one can consider an arbitrary user; for instance, `franco`. `franco` could have the following personal directory:
- `Franco` (Windows) / `franco` (Linux)
  - `Desktop`
    - `a.txt`
    - `b.md`
    - `c.pdf`
    - `Images`
      - `1.png`
      - `2.jpg`
      - `3.gif`
    - `Code`
      - `C`
        - `x.c`
      - `C++`
        - `y.cpp`
      - `Python`
        - `z.py`
To map absolute file paths, it suffices to follow the sequence defined from the root until the desired file or directory. For instance:
- File `b.md`:
  - Windows:
    - `C:\Users\Franco\Desktop\b.md`
    - `%USERPROFILE%\Desktop\b.md`
  - Linux:
    - `/home/franco/Desktop/b.md`
    - `~/Desktop/b.md`
- Directory `Images`:
  - Windows:
    - `C:\Users\Franco\Desktop\Images\`
    - `%USERPROFILE%\Desktop\Images\`
  - Linux:
    - `/home/franco/Desktop/Images/`
    - `~/Desktop/Images/`
- File `y.cpp`:
  - Windows:
    - `C:\Users\Franco\Desktop\Code\C++\y.cpp`
    - `%USERPROFILE%\Desktop\Code\C++\y.cpp`
  - Linux:
    - `/home/franco/Desktop/Code/C++/y.cpp`
    - `~/Desktop/Code/C++/y.cpp`
For relative paths, the values depend on the chosen origin. For instance, if one chose the origin `C:\Users\Franco\` (`/home/franco/`), the examples would become:
- File `b.md`:
  - Windows: `.\Desktop\b.md`
  - Linux: `./Desktop/b.md`
- Directory `Images`:
  - Windows: `.\Desktop\Images\`
  - Linux: `./Desktop/Images/`
- File `y.cpp`:
  - Windows: `.\Desktop\Code\C++\y.cpp`
  - Linux: `./Desktop/Code/C++/y.cpp`
In relative paths, a dot (`.`) represents the current directory. At the start of an address, it substitutes the origin. Therefore, in the example, `.` corresponds to `C:\Users\Franco\` on Windows and to `/home/franco` on Linux.
It is possible to omit the starting dot. When omitted, the operating system assumes that the address is relative, starting from the current address. There is an exception, though: for files that are programs and that one wishes to run, it is necessary to prepend a dot slash (`./`, as in `./program-name`) to avoid conflicts with values defined in environment variables. Exception aside, the following examples are equivalent to the previous ones:
- File `b.md`:
  - Windows: `Desktop\b.md`
  - Linux: `Desktop/b.md`
- Directory `Images`:
  - Windows: `Desktop\Images\`
  - Linux: `Desktop/Images/`
- File `y.cpp`:
  - Windows: `Desktop\Code\C++\y.cpp`
  - Linux: `Desktop/Code/C++/y.cpp`
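The relative paths listed above can also be computed. As a sketch, the `relative_to()` method of Python's `pathlib` removes an origin from an absolute path, yielding the relative form without the starting dot. The pure path classes are used so that both the Windows and the Linux examples can be reproduced on any system.

```python
from pathlib import PurePosixPath, PureWindowsPath

linux_origin = PurePosixPath("/home/franco")
linux_file = PurePosixPath("/home/franco/Desktop/Code/C++/y.cpp")
print(linux_file.relative_to(linux_origin))
# Desktop/Code/C++/y.cpp

windows_origin = PureWindowsPath(r"C:\Users\Franco")
windows_file = PureWindowsPath(r"C:\Users\Franco\Desktop\Code\C++\y.cpp")
print(windows_file.relative_to(windows_origin))
# Desktop\Code\C++\y.cpp
```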
The address that the operating system assumes is called the working directory (or working dir) or current working directory (current working dir). On Linux, one can use the `PWD` environment variable to retrieve the value of the working directory (to use it, `$PWD/rest/of/path/file.ext`). On Windows, one can get the value using the `cd` command (current directory; `cd` also allows changing directories on Windows; on Unix-like systems, `cd` is called change directory and only allows for directory changes). To use the value on Windows, one should write `%cd%\rest of\the path to\file.ext`.
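Programming languages usually provide their own way to query the working directory. A minimal sketch in Python follows; the output depends on the directory from which the program is started, and the `Desktop/b.md` path is only the example used in the text (the file does not need to exist).

```python
import os
from pathlib import Path

working_directory = Path.cwd()  # Same location reported by os.getcwd().
print(working_directory)

# A relative path is interpreted starting from the working directory.
relative = Path("Desktop") / "b.md"
absolute = working_directory / relative
print(absolute)
```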
Relative addresses assume different values whenever the origin (which, now, can be called the working directory) changes. For instance, if one assumes the origin as the directory `C:\Users\Franco\Desktop\Code\` (Windows) or `/home/franco/Desktop/Code` (Unix), the results become:
- File `b.md`:
  - Windows:
    - `.\..\b.md`
    - `..\b.md`
  - Linux:
    - `./../b.md`
    - `../b.md`
- Directory `Images`:
  - Windows:
    - `.\..\Images\`
    - `..\Images\`
  - Linux:
    - `./../Images/`
    - `../Images/`
- File `y.cpp`:
  - Windows:
    - `.\C++\y.cpp`
    - `C++\y.cpp`
  - Linux:
    - `./C++/y.cpp`
    - `C++/y.cpp`
The previous examples introduce the second special value for relative paths, which is represented by two dots in sequence (`..`). The two dots represent the parent directory of the path resolved up to that point.
In the example, as `.` matches `C:\Users\Franco\Desktop\Code\` (`/home/franco/Desktop/Code`) at the start of the path, `..` means `C:\Users\Franco\Desktop\` on Windows and `/home/franco/Desktop/` on Linux.
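To check results like these, one option is the `relpath()` function of Python's standard `posixpath` module (the POSIX flavor of `os.path`), which computes the relative path from an origin to a destination, inserting `..` where needed. The sketch uses the Linux paths of the example; the paths do not need to exist.

```python
import posixpath

origin = "/home/franco/Desktop/Code"

print(posixpath.relpath("/home/franco/Desktop/b.md", start=origin))
# ../b.md
print(posixpath.relpath("/home/franco/Desktop/Images", start=origin))
# ../Images
print(posixpath.relpath("/home/franco/Desktop/Code/C++/y.cpp", start=origin))
# C++/y.cpp
```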
Finally, the values of `.` and `..` are always resolved relative to the path built up to the considered point. In other words:
- `././././` corresponds to `./`;
- `.././././../` corresponds to `../../`;
- `./abc/./def/ghi/./` corresponds to `abc/def/ghi/`;
- `abc/def/ghi/..` corresponds to `abc/def/`;
- `abc/def/ghi/.././` corresponds to `abc/def/`;
- `abc/def/ghi/.././..` corresponds to `abc/`.
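These simplifications can be verified with the `normpath()` function of Python's standard `posixpath` module, as sketched below. The function works purely on the text of the path (it does not consult the disk) and it drops the trailing slash, so `abc/def/` is reported as `abc/def` and `./` as `.`.

```python
import posixpath

examples = [
    "././././",
    ".././././../",
    "./abc/./def/ghi/./",
    "abc/def/ghi/..",
    "abc/def/ghi/.././",
    "abc/def/ghi/.././..",
]

# Print each path next to its simplified form.
for path in examples:
    print(path, "->", posixpath.normpath(path))
```

Because the simplification is textual, the result may point to a different place when a path contains symbolic links.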
It is common to feel that absolute and relative paths are confusing and complex when one first learns them. With some practice, their use becomes simpler. Therefore, it can be convenient to practice using paths with the other values defined at the start of this section. Perhaps it can be even better to try with files and folders existing on your own computer.
When Should I Use an Absolute Path? When Should I Choose a Relative One?
It is a good programming practice to use relative paths whenever possible.
In particular, the use of relative paths is useful for programming and automation, because they allow working with files and folders in a generic way. An absolute path works only if the entire path is identical, from root to leaf (file or folder). Thus, if there is a user or machine name anywhere on the path, it also has to be the very same.
On the other hand, for a relative path to work, it is enough that the local structure of files and folders matches from the origin onwards.
When absolute paths are inevitable, one should abstract them using environment variables provided by the operating system -- or define one's own, documenting them.
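A minimal sketch of this practice in Python follows. The variable name `MY_APP_DATA_DIR`, the function `data_directory` and the fallback directory `my-app-data` are all hypothetical, defined only for this example.

```python
import os
from pathlib import Path


def data_directory() -> Path:
    """Return the data directory, configurable by an environment variable."""
    # MY_APP_DATA_DIR is a hypothetical, application-defined variable.
    configured = os.environ.get("MY_APP_DATA_DIR")
    if configured:
        return Path(configured)
    # Fallback: a directory inside the current user's home directory.
    return Path.home() / "my-app-data"


print(data_directory())
```

This way, the absolute path lives in the configuration of each machine instead of in the source code.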
Additional Information and Trivia
File Sizes: Bytes and Bibytes
Although computer memory quantities are naturally expressed as powers of two (for bits have only two possible values), marketing campaigns and advertising often use powers of ten (10) to describe storage quantities. To avoid imprecision when necessary, there exist the binary units (here called bibytes, such as the kibibyte and the mebibyte), which are always defined as powers of two.
Name | Symbol | Value | Power |
---|---|---|---|
bit | b | 1 bit | 2^0 bits |
byte | B | 8 bits | 2^3 bits |
kilobyte | kB | 1000 bytes | 1000^1 bytes |
megabyte | MB | 1000000 bytes | 1000^2 bytes |
gigabyte | GB | 1000000000 bytes | 1000^3 bytes |
terabyte | TB | 1000000000000 bytes | 1000^4 bytes |
petabyte | PB | 1000000000000000 bytes | 1000^5 bytes |
exabyte | EB | 1000000000000000000 bytes | 1000^6 bytes |
zettabyte | ZB | 1000000000000000000000 bytes | 1000^7 bytes |
yottabyte | YB | 1000000000000000000000000 bytes | 1000^8 bytes |
Name | Symbol | Value | Power |
---|---|---|---|
bit | b | 1 bit | 2^0 bits |
byte | B | 8 bits | 2^3 bits |
kibibyte | KiB | 1024 bytes | 2^10 bytes |
mebibyte | MiB | 1048576 bytes | 2^20 bytes |
gibibyte | GiB | 1073741824 bytes | 2^30 bytes |
tebibyte | TiB | 1099511627776 bytes | 2^40 bytes |
pebibyte | PiB | 1125899906842624 bytes | 2^50 bytes |
exbibyte | EiB | 1152921504606846976 bytes | 2^60 bytes |
zebibyte | ZiB | 1180591620717411303424 bytes | 2^70 bytes |
yobibyte | YiB | 1208925819614629174706176 bytes | 2^80 bytes |
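The practical effect of the two conventions can be sketched in Python. The function `format_size` below is a hypothetical helper written for this example; it expresses the same number of bytes in decimal or in binary units.

```python
def format_size(size_in_bytes: int, binary: bool = False) -> str:
    """Format a byte count using decimal (kB, MB) or binary (KiB, MiB) units."""
    base = 1024 if binary else 1000
    units = (
        ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"]
        if binary
        else ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]
    )
    value = float(size_in_bytes)
    index = 0
    # Divide by the base until the value fits the current unit.
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1
    return f"{value:.2f} {units[index]}"


one_terabyte = 1000**4
print(format_size(one_terabyte))               # 1.00 TB
print(format_size(one_terabyte, binary=True))  # 931.32 GiB
```

This is why a disk advertised with one terabyte can appear with roughly 931 units in a program that counts with powers of two.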
Encoding, Decoding, Transcoding and Extensions
Nothing exists for a computer but zeros and ones. Characters, digits, images, sounds, videos, computer code -- and even the operating system -- are coded sequences of bits.
For instance, using the ASCII or the Unicode Transformation Format 8 (UTF-8) encoding, the character `a` has the binary representation `1100001` (base 2), which requires a minimum of eight bits of memory (one byte) for storage. The value is read as one, one, zero, zero, zero, zero, one and corresponds to the decimal value `97` (base 10).
A possible way to find the value is by writing the following lines of code in an interpreter for the Python programming language:
print("a = ", ord("a")) # 97
print("a = ", bin(ord("a"))) # 0b1100001
To run the code, call `python` (after installing it) in a command line interpreter, type one of the lines, press `enter`, then wait a few moments for the result. The interpreter will evaluate the first expression and output its result. The same applies to the second line. It is not necessary to write what is after the hash symbol; it is a comment, illustrating the expected output.
In a desktop browser, it is possible to find the value without installing any additional program. To do this, one can open the developer tools provided by the browser (pressing `F12` in Mozilla Firefox, Google Chrome and Microsoft Edge), switch to the tab named Console and write the following lines of JavaScript code (technically, `charCodeAt()` provides a UTF-16 value):
console.log("a".charCodeAt(0)) // 97
console.log("a".charCodeAt(0).toString(2)) // 1100001
The process of converting a value to a code is called encoding. The process of converting the code back to the original value is decoding. Finally, converting data from one encoding to another (usually by decoding it and encoding it again) is called transcoding.
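The three operations can be sketched with Python strings and bytes. The sample word is arbitrary, and the sketch assumes that transcoding means converting data from one encoding (here, UTF-8) to another (here, UTF-16).

```python
text = "a\u00e7\u00e3o"  # A four-letter word with two accented letters.

utf8_bytes = text.encode("utf-8")     # Encoding: text -> bytes.
decoded = utf8_bytes.decode("utf-8")  # Decoding: bytes -> text.

# Transcoding: decode from one encoding, then encode in another.
utf16_bytes = utf8_bytes.decode("utf-8").encode("utf-16-le")

print(len(utf8_bytes))   # 6
print(len(utf16_bytes))  # 8
print(decoded == text)   # True
```

The same four characters take a different number of bytes in each encoding, which shows that the stored bits depend on the code chosen.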
All letters in this page's text are encoded by their UTF-8 values. The browser interprets the value and draws a corresponding character on the screen (or a voice synthesizer reads the character aloud). This process is repeated to transform encoded binary character sequences into sequences of images to draw.
Images, sounds, documents, computer programs and all other files have their own encoding. For the computer, there is nothing besides zeros and ones. Everything is code. The computer only performs what the program commands it to do.
This raises the question: how can a computer know which program it should use to open and read a file?
There are three main ways:
- A person chooses the program that the computer should use;
- File extensions. Extensions are sequences of characters appended at the end of a file's name, usually after a dot (pattern: `name-of-the-file.ext`). For instance, `.txt`, `.doc`, `.docx`, `.pdf`, `.jpg`, `.png`, `.gif`, `.avi`, `.mpeg`, `.mp3` and `.ogg` are popular extensions for documents, images, videos and audio. Extensions can have more than three characters (for instance, `.franco` or `.FrAnCo-GaRcIa`) or fewer (`.f`). It is also possible to create files without any extension at all (for instance, `just name` or `name`). A file without an extension is still of the type defined by the format of its data. Extensions are, therefore, optional. Regardless, an extension serves as a heuristic (a tip) of what programs a computer can choose and use to open a file. If there exists a mapping between a file extension and a program, the operating system can run that program to open the file with the provided extension;
- Metadata stored in the header of a file. Many file formats reserve the first stored values to describe the contents of the file. This way, an operating system can read them to discover the format and choose a suitable program (if there exists a mapping) to open it.
A fourth way is to try to guess the format by analyzing the file's contents; this is not always possible. When the computer cannot open a file, it asks the user how she/he wants to open it.
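The extension and the header approaches can be sketched in Python as follows. The function names and the small table of signatures are assumptions for this illustration; it covers only a few well-known formats and is far from a complete detector.

```python
from pathlib import PurePath

# A few well-known signatures (the first bytes stored in a file).
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"\xff\xd8\xff": "jpeg",
}


def guess_by_extension(name: str) -> str:
    """Return the lowercase extension without the dot ('' if there is none)."""
    return PurePath(name).suffix.lower().lstrip(".")


def guess_by_header(data: bytes) -> str:
    """Return a format name when the data starts with a known signature."""
    for signature, format_name in SIGNATURES.items():
        if data.startswith(signature):
            return format_name
    return "unknown"


print(guess_by_extension("photo.PNG"))      # png
print(guess_by_header(b"%PDF-1.7 sample"))  # pdf
print(guess_by_header(b"plain text"))       # unknown
```

The extension is only a hint taken from the name, while the header check inspects the stored data itself; the two can disagree when a file is renamed.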
If this subsection seems hazy at this time, there is nothing to worry about. This is trivia about how computers work. At this moment, it is more important to know how to use files and folders. The next entry presents the basics.