Learn Programming: Data Types
Image credits: Image created by the author using the program Spectacle.
Requirements
In the introduction to development environments, I mentioned Python, Lua and JavaScript as good choices of programming languages for beginners. Later, I commented on GDScript as an option for people who want to program digital games or simulations. For the introductory programming activities, you will need at least a development environment configured for one of these languages.
If you wish to try programming without configuring an environment, you can use one of the online editors that I have created.
However, they do not provide all the features offered by the languages' interpreters. Thus, sooner or later, you will need to set up a development environment. If you need to configure one, you can refer to the following resources.
Thus, if you have an Integrated Development Environment (IDE), or a combination of a text editor and an interpreter, you are ready to start. The following examples assume that you know how to run code in your chosen language, as presented in the configuration pages.
If you want to use another language, the introduction provides links to configure development environments for the C, C++, Java, LISP, Prolog, and SQL (with SQLite) languages. In many languages, it suffices to follow the models from the experimentation section, adapting the syntax, commands and functions in the code blocks. C and C++ are exceptions, because they require pointers to access memory.
Encoded and Decoded Zeros and Ones
A section of File Systems: Files, Directories (Folders) and Paths mentioned data encoding. In particular, it noted that everything stored in a computer is composed of sequences of zeros and ones. Although the focus of that topic was the use and storage of files and directories, encoding also applies to code.
In other words, for you, this page is composed of words and images. For the computer, the page consists of code and data encoded as sequences of binary numbers.
One of the lowest-level ways of programming is working directly with binary sequences. The process is complicated, slow and error-prone.
Programming languages abstract data encoding as data types. Data types and operators allow writing and handling values or text in a simpler, more practical and more convenient way.
To illustrate the convenience provided by data types, consider an example. One of the simplest text encodings for computers is the American Standard Code for Information Interchange (ASCII). ASCII maps numbers to English characters and some control codes. To check these associations, you can refer to a listing called an ASCII table.
The following three representations encode the same word:
- The binary sequence 01000110 01110010 01100001 01101110 01100011 01101111 (base 2), encoded in ASCII. Each group of eight binary digits corresponds to a byte;
- The decimal sequence 70 114 97 110 99 111 (base 10), also encoded in ASCII;
- The sequence of characters F, r, a, n, c and o, which form a word.
Each digit grouping corresponds to one character.
Which of the three previous forms is the simplest and easiest way to write and read the word Franco?
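The same encoding and decoding can be sketched in Python, one of the languages used on this page. The sketch assumes the word contains only ASCII characters: ord() returns a character's code (for ASCII characters, the Unicode code point equals the ASCII code), format(code, "08b") writes it with eight binary digits, and chr() converts a code back into a character.

```python
word = "Franco"

# Decimal codes: ord() returns the numeric code of each character.
decimal_codes = [ord(character) for character in word]

# Binary codes: each code written with 8 binary digits (one byte).
binary_codes = [format(code, "08b") for code in decimal_codes]

# Decoding: chr() converts each code back into a character.
decoded = "".join(chr(code) for code in decimal_codes)

print(decimal_codes)           # [70, 114, 97, 110, 99, 111]
print(" ".join(binary_codes))  # 01000110 01110010 01100001 01101110 01100011 01101111
print(decoded)                 # Franco

# The ASCII encoder produces the same byte values.
print(list(word.encode("ascii")))  # [70, 114, 97, 110, 99, 111]
```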
For more examples, you can use the following button. It provides decimal and binary values for strings encoded in Unicode Transformation Format 16-bit (UTF-16), which is the encoding used by JavaScript strings.
Similar to ASCII tables, there are Unicode tables. One difference is that Unicode tables are significantly larger, because Unicode can encode more than a million characters (code points) instead of, at most, 128 or 256, as happens with ASCII and Extended ASCII, respectively.
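As a rough equivalent of that button, the sketch below lists UTF-16 code units in decimal and binary. The helper name utf16_code_units is hypothetical; it relies on Python's "utf-16-be" codec (big-endian, no byte order mark), where each code unit takes two bytes. Characters outside the Basic Multilingual Plane, such as emoji, need two code units (a surrogate pair).

```python
def utf16_code_units(text):
    """Hypothetical helper: return the UTF-16 code units of text as integers."""
    # "utf-16-be" encodes in big-endian order without a byte order mark (BOM).
    data = text.encode("utf-16-be")
    # Each UTF-16 code unit occupies 2 bytes.
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


for text in ["Olá", "😀"]:
    units = utf16_code_units(text)
    print(text, units, [format(unit, "016b") for unit in units])

# "á" fits in one code unit (225), but it is outside ASCII:
# "á".encode("ascii") would raise UnicodeEncodeError.
# The emoji needs two code units (a surrogate pair): [55357, 56832].
```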
Thus, it is more practical and convenient to use higher-level representations. To ease programming activities, programming languages provide data types.
Data Types
The lowest level at which to use data with computers is working with bits and bytes. Computers work with binary data, that is, sequences of bits. Programmers work with data types, which encode binary sequences in a more convenient way.
There are several classes of data types. Three of them are commonly used in programming: primitive types, composite types, and types that reference other types.
Primitive (or Basic, Built-In or Predefined) Types
Primitive types, also called basic, built-in, homogeneous, or predefined types, represent the basic units of data in programming.
Programming languages commonly offer four primitive types:
- Integer number;
- Real number (or floating point);
- Logic value (or boolean);
- Character (or literal).
Some languages also define a special value and/or type for the lack of a type (or an empty type).
Common names for this lack of type include void, NULL, null and nil.
These values are also often used to indicate errors (or the absence of errors), or empty or invalid values.
Integer Numbers
Integer numbers and real numbers encode numbers, as in Mathematics.
Integer numbers represent the set of integer numbers.
Examples of integer numbers include 0, -1, 1, 777 and -1234.
JavaScript:
console.log(1)
console.log(-123)
Python:
print(1)
print(-123)
Lua:
print(1)
print(-123)
GDScript:
extends Node

func _init():
    print(1)
    print(-123)
The integer number type is typically called int or integer.
However, there are programming languages with a single number type, which applies both to integer and to real numbers. This is the case, for instance, of JavaScript and older versions of Lua (up to 5.2). In the case of Lua, version 5.3 introduced an integer subtype for numbers.
Furthermore, it is important to know that there are maximum and minimum limits for numbers in programming languages. This happens because, although there are infinitely many numbers, the memory of a computer is finite. The quantity of different numbers that can be represented digitally depends on the quantity of memory used to encode them: for instance, 32 bits can represent at most $2^{32}$ (4,294,967,296) different values. To learn more about how computers can store big numbers, you can search for arbitrary-precision arithmetic (also known as big number or bignum arithmetic).
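Python's int is itself an example of arbitrary-precision arithmetic, so it does not overflow. To illustrate the limits of fixed-size integers, the sketch below uses a hypothetical helper, to_int32, that imitates a 32-bit two's-complement integer which wraps around on overflow (as, for instance, Java's int does; other languages may behave differently).

```python
# Python's int uses arbitrary precision: it grows as needed.
big = 2 ** 100
print(big)  # 1267650600228229401496703205376

# Limits of a fixed-size, 32-bit signed integer.
INT32_MIN = -(2 ** 31)   # -2147483648
INT32_MAX = 2 ** 31 - 1  # 2147483647


def to_int32(value):
    """Hypothetical helper: wrap value as a 32-bit two's-complement integer."""
    value &= 0xFFFFFFFF  # keep only the lowest 32 bits
    return value - 2 ** 32 if value >= 2 ** 31 else value


print(to_int32(INT32_MAX + 1))  # -2147483648: one past the maximum wraps to the minimum
```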
Real Numbers
Floating-point numbers approximate the set of real numbers. The term "approximation" is intentional, because binary encoding cannot represent every real number exactly: values such as 0.1 are stored as the nearest representable binary fraction. Integer types, in contrast, represent every value within their limits exactly.
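A classic way to observe this approximation is the sum 0.1 + 0.2. The sketch below shows it in Python and uses math.isclose() to compare with a tolerance; values that are exact binary fractions, such as 0.5 and 0.25, do not show the problem.

```python
import math

total = 0.1 + 0.2
print(total)                     # 0.30000000000000004
print(total == 0.3)              # False: neither 0.1 nor 0.2 is exact in binary
print(math.isclose(total, 0.3))  # True: compare with a tolerance instead
print(0.5 + 0.25 == 0.75)        # True: these values are exact binary fractions
```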
In programming, a dot (.) usually separates the integer and the decimal parts of a number, instead of a comma (,), because programming languages commonly assume English as the (human) language.
Thus, examples of real numbers are 0.0, -1.0, 777.777 and -1234.56.
Many languages also allow writing numbers in scientific notation or in the form of a multiplication by a power of 10.
For instance, 1.23456789e4 is equal to $1.23456789 \times 10^{4}$, that is, 12345.6789.
JavaScript:
console.log(1.0)
console.log(-123.4567890)
console.log(1.23456789e4)
Python:
print(1.0)
print(-123.4567890)
print(1.23456789e4)
Lua:
print(1.0)
print(-123.4567890)
print(1.23456789e4)
GDScript:
extends Node

func _init():
    print(1.0)
    print(-123.4567890)
    print(1.23456789e4)
The real number type is usually called real, float or double.
float and double differ in the precision of their floating-point representation: in most languages, float uses 32 bits (single precision) and double uses 64 bits (double precision).
Beginners should usually choose double when the choice is possible.
The increased precision (double precision) can help avoid surprises and common mistakes caused by imprecise representation and rounding.
Although rounding errors still exist with double, its doubled precision (hence the name) is more convenient for the common problems that beginners use to learn computer arithmetic.
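Python's float is a double on common platforms, but the standard struct module can round a value to single precision, which gives a rough idea of what a 32-bit float would store. The sketch below assumes IEEE 754 formats, where "d" packs 8 bytes and "f" packs 4 bytes.

```python
import struct

value = 0.1
# Round-trip through 64 bits (double precision): the value is unchanged.
as_double = struct.unpack("d", struct.pack("d", value))[0]
# Round-trip through 32 bits (single precision), like a C float.
as_single = struct.unpack("f", struct.pack("f", value))[0]

print(as_double)  # 0.1
print(as_single)  # 0.10000000149011612
print(abs(as_single - value) > abs(as_double - value))  # True: single precision is less accurate
```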
Real numbers are useful for problems in which a margin of error is acceptable. When precision is fundamental, you must use integer numbers and adopt an alternative strategy to represent the decimal part. To learn more about this, you can search for arbitrary-precision floating-point arithmetic.
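A minimal sketch of two alternatives, assuming the amounts represent money: storing integer cents (an illustrative convention, not the only strategy), and Python's standard decimal module, which provides decimal floating-point arithmetic with a configurable precision. Summing ten amounts of 0.1 shows the difference.

```python
from decimal import Decimal

# Ten amounts of 0.1 using binary floating point.
float_total = sum([0.1] * 10)
print(float_total)  # 0.9999999999999999

# The same amounts stored as integer cents (10 cents each).
cents_total = sum([10] * 10)
print(f"{cents_total // 100}.{cents_total % 100:02d}")  # 1.00

# Decimal values built from strings keep the decimal digits exactly.
decimal_total = sum([Decimal("0.1")] * 10)
print(decimal_total)  # 1.0
```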
Logic Values
Logic values correspond to the truth values True and False.
Thus, every logic value is either True or False; there are no other options.
The name of the value can vary depending on the programming language. In JavaScript, Lua and GDScript, the terms are written in English with lowercase characters. In Python, the terms are written in English with an uppercase first letter.
- JavaScript: true and false;
- Python: True and False;
- Lua: true and false;
- GDScript: true and false.
JavaScript:
console.log(true)
console.log(false)
Python:
print(True)
print(False)
Lua:
print(true)
print(false)
GDScript:
extends Node

func _init():
    print(true)
    print(false)
The logic value type is commonly called bool or boolean.
Characters
Characters comprise all symbols that can be used to write text, such as letters, symbols, digits, punctuation, spacing, and some special values for control and formatting.
In code, special values are usually written starting with a backslash; for instance, \n represents a line break.
Thus, a character can be, for instance, A, a, á, ., 1, a space ( ) or \n (line break).
One of the previous examples can be confusing: the character 1.
Instead of a number (1), it is a value encoded as a character.
To avoid confusion, programming languages adopt a special syntax for character values, such as single or double quotes.
Thus, the previous examples would be written as 'A', 'a', 'á', '.', '1', ' ' and '\n'.
This way, it is easier to distinguish the integer number 1 from the character '1'.
JavaScript:
console.log("a", " ", "b")
Python:
print("a", " ", "b")
Lua:
print("a", " ", "b")
GDScript:
extends Node

func _init():
    print("a", " ", "b")
The character type is often called char or character.
Composite Types
Primitive types, as their name suggests, are basic or primitive. Although they are sufficient for many cases, they can also serve as a basis to derive other data types.
In some cases, it is useful to create abstractions or representations that allow thinking and working with higher-level solutions. To do this, many programming languages define (or allow the definition of) composite types.
Although I normally use the term "composite type" as a synonym for records (or structs), the term also applies to groupings of values of primitive types. The most common are strings, arrays (vectors), sets, and unions. I prefer to think of groupings as sequences or data structures (data collections), though the term composite type is also applicable to the previous cases.
Strings
Strings allow storing and processing text. In the previous topic (Console (Terminal) Output), strings were used to write text as output.
When available, the string type is usually called string.
There is a special string called the empty string, which is often represented as "" (double quotes without content) or '' (single quotes without content).
Using strings can be as simple as using the primitive character type, though this is not true for every programming language. For beginners, it is a good idea to choose a language in which using strings is simple.
Some programming languages allow using the basic character type for character sequences, called strings. This is often the case for high level languages, such as JavaScript, Python, Lua and GDScript. For beginners, this is the best scenario, for it is the simplest to use.
JavaScript:
console.log("Olá, mundo!")
console.log("Nome: Franco\nSobrenome: Garcia")
Python:
print("Olá, mundo!")
print("Nome: Franco\nSobrenome: Garcia")
Lua:
print("Olá, mundo!")
print("Nome: Franco\nSobrenome: Garcia")
GDScript:
extends Node

func _init():
    print("Olá, mundo!")
    print("Nome: Franco\nSobrenome: Garcia")
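In Python, a single character is simply a string of length 1, which is why using characters and strings is equally simple in that language. The short sketch below (the names letter and greeting are only illustrative) checks lengths, the empty string, an escape sequence, and access to the first character.

```python
letter = "A"
greeting = "Olá, mundo!"

print(type(letter).__name__, len(letter))  # str 1: a "character" is a string of length 1
print(len(""))        # 0: the empty string
print(len("\n"))      # 1: the escape sequence is a single character
print(len(greeting))  # 11
print(greeting[0])    # O: the first character
```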
Other languages distinguish a single character from a string.
This can happen, for instance, in low- or medium-level languages, such as C and C++.
In C and C++, 'A' and "A" are different.
In the first case, with single quotes, the value is the uppercase letter A.
In the second case, with double quotes, the value is a string with two characters: an uppercase A and an implicit special character for string termination, represented as '\0' (the null character, whose integer value is 0).
This distinction is often a problem for programming beginners.
As C and C++ are not among the languages used in the examples, this discussion is outside the scope of this page.
Moreover, for now, it is enough to understand the types for integer numbers, real numbers, logic values and strings. The next composite types will be studied in their own topics; at this moment, they serve only as a curiosity.
Arrays or Vectors
Computers are excellent machines for repeating instructions. In many real-world problems, it is rare that a single element or sample is processed; often, there are many samples. It is common for problems to involve tens, hundreds, or even millions of similar samples.
Data structures allow storing values and accessing them using a key (an arbitrary value used for easy access). Arrays or vectors are among the simplest examples of data structures: values are stored sequentially and accessed by their position (index) in the array.
JavaScript:
console.log(["Red", "Green", "Blue", "Blue"])
Python:
print(["Red", "Green", "Blue", "Blue"])
Lua:
-- Strictly speaking, it is a table.
print({"Red", "Green", "Blue", "Blue"})
GDScript:
extends Node

func _init():
    print(["Red", "Green", "Blue", "Blue"])
Some programming languages provide predefined types for arrays. Others provide lists. Some provide both, while others provide neither.
The distinction is not important at this time; vectors will be studied at a more appropriate time.
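As a brief preview of access by position, the sketch below uses a Python list. Python, like JavaScript and GDScript, starts indexes at 0, whereas Lua tables used as arrays conventionally start at 1.

```python
colors = ["Red", "Green", "Blue", "Blue"]

print(colors[0])             # Red: the first position is index 0
print(colors[3])             # Blue
print(len(colors))           # 4: repeated values are allowed
print(colors.index("Blue"))  # 2: first index where "Blue" appears
```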
Sets
Sets are data structures in which values cannot be repeated and have no specific order. In languages that do not provide a type for sets, you can use an array without repeated entries, provided that you take care not to add duplicates.
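JavaScript:
console.log(new Set(["Red", "Green", "Blue"]))
Python:
print({"Red", "Green", "Blue"})
Lua:
-- Lua has no predefined set type; this table is an array without repeated entries.
print({"Red", "Green", "Blue"})
GDScript:
extends Node

func _init():
    # GDScript has no predefined set type; this is an array without repeated entries.
    print(["Red", "Green", "Blue"])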
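To see that a set discards repeated values, the sketch below builds a Python set from the four colors of the previous array example and tests membership. Since sets have no order, sorted() is used only to display the values in a predictable way.

```python
colors = set(["Red", "Green", "Blue", "Blue"])

print(len(colors))      # 3: the duplicate "Blue" was discarded
print("Red" in colors)  # True: membership test
print(sorted(colors))   # ['Blue', 'Green', 'Red']: sorted only for display
```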
Unions
As a curiosity, some programming languages (such as C and C++) allow the definition of unions. A union groups values of different types into the same memory region. For instance, a union can admit values of the integer, real and logic types. However, only one of the types can be used at a time. In the previous example, if you use a union with a real value, you cannot use it, at the same time, with an integer or logic value.
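Python has no unions of its own, but its standard ctypes module can declare C-compatible unions, which makes the shared memory visible. In the sketch below, the union IntOrFloat is hypothetical; it assumes a 32-bit IEEE 754 float, whose bit pattern for 1.0 read as an integer is 1065353216.

```python
import ctypes


class IntOrFloat(ctypes.Union):
    """Hypothetical union: a 32-bit int and a 32-bit float sharing the same bytes."""
    _fields_ = [("as_int", ctypes.c_int32), ("as_float", ctypes.c_float)]


value = IntOrFloat()
value.as_float = 1.0  # store a real value

print(ctypes.sizeof(IntOrFloat))  # 4: both fields occupy the same 4 bytes
print(value.as_int)               # 1065353216: the bits of 1.0 read as an integer
```

Reading as_int after writing as_float does not convert the number; it reinterprets the same bits, which is why only one of the types should be used at a time.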
There are also programming languages that provide a type called variant or tagged union, which stores a value together with its type.
For instance, Godot Engine provides the custom type Variant (a class) in C++ for compatibility with GDScript.
The Variant type provides a generic data type that can be used in C++.
Records or Objects
A record (or struct) is a data type that allows grouping other types into a single custom type. For instance, if you define a record that groups an integer number, a real number and a logic value, it is possible to store any combination of values for each of those types in the same record.
In Object-Oriented Programming (OOP), a class is a record that can combine data and code, in the form of methods (which are subroutines, like functions or procedures). An object is an instance of a class.
The creation of records depends on variables, which is the next topic to be studied. To avoid anticipating concepts, records, classes and objects will be addressed at a later time.
Types to Reference Other Types (References and Pointers)
Besides values, there are types that can abstract addresses in a computer's memory. Computer addresses are a metaphor based on real addresses. A real address allows finding a specific person or organization within a larger region. A computer address is, by analogy, a specific memory position where a value can be found.
References and pointers are typical implementations of types that reference other types. Their importance varies depending on the programming language.
In lower-level languages, such as C and C++, they are essential. In higher-level languages, such as Python, Lua and JavaScript, the language abstracts and simplifies the use of references to manipulate memory.
Once again, to avoid anticipating concepts, reference types will be discussed at a later time.
Typing
Typing refers to stylistic aspects and type verification in programming languages. In practice, some languages require explicit type definition for any values used in a program. Other languages do not; as a result, you may find type incompatibilities only when you run your program.
In general, the preferred term is type safety, which qualifies languages that provide resources to guarantee that types are used safely, regardless of whether types must be explicitly defined. Some programming languages cannot provide such safety because they offer resources such as coercion. Coercion is an automatic attempt by a compiler or interpreter to convert a value from one type to another, without the programmer's intervention. It does not always work as expected; JavaScript is a language in which it can be dangerous to depend on coercion without knowing the possible errors.
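For comparison: in JavaScript, "1" + 1 is coerced into the string "11". The sketch below shows that Python refuses this particular combination with a TypeError, so the programmer must convert explicitly; Python still converts implicitly between some numeric types, such as int to float.

```python
# Python does not coerce "1" + 1: mixing str and int raises TypeError.
try:
    result = "1" + 1
except TypeError as error:
    result = "TypeError: " + str(error)
print(result)

# Explicit conversions state the intent clearly.
as_text = "1" + str(1)    # "11": string concatenation
as_number = int("1") + 1  # 2: integer addition
print(as_text, as_number)

# Some implicit numeric conversions still happen.
print(1 + 2.5)   # 3.5: the int is converted to float
print(True + 1)  # 2: bool is a subtype of int
```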
Other typing terminology can be confusing.
There are languages, such as C, C++ and Java, that require the programmer to explicitly define the type of each value used in a program. Such languages are, at times, called strongly typed. This is also called static typing.
There are languages that try to infer types based on the values used. In general, these languages are called weakly typed. Depending on the case, they can also have dynamic typing.
Some languages, like Python, LISP, Rust and F#, combine aspects of various forms of typing, having a hybrid typing. For instance, a language can have dynamic typing, though the compiler or interpreter enforces strong typing at run-time (for interpreted languages) or compile-time (for compiled languages). Python is one of such cases, also providing something called duck typing, in which what matters is whether a value supports the operations used on it, rather than its declared type. In languages such as LISP, Rust and F#, it is often said that the languages perform type inference to identify the correct type from its use.
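The sketch below illustrates both ideas in Python. A name can refer to values of different types over time (dynamic typing), and a function accepts any value that supports the operation it uses (duck typing). The function count_items is hypothetical and only calls the built-in len().

```python
# Dynamic typing: the same name can refer to values of different types.
value = 42
print(type(value).__name__)  # int
value = "forty-two"
print(type(value).__name__)  # str


def count_items(container):
    """Hypothetical helper: works with any value that supports len()."""
    return len(container)


print(count_items("abc"))       # 3: a string
print(count_items(["a", "b"]))  # 2: a list
print(count_items({"x": 1}))    # 1: a dictionary
# count_items(42) would raise TypeError: an int has no len().
```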
Finally, there are languages without types. A classical example is assembly. The interpretation and definition of what a value represents in assembly is up to the developer of the system.
Identifying Types
Languages that do not enforce type definitions often provide resources to obtain the type of a value. The command or function varies according to the language. For primitive types:
- JavaScript: typeof() (documentation);
- Python: type() (documentation);
- Lua: type() (documentation);
- GDScript: typeof() (documentation).
JavaScript:
console.log(typeof(-1))
console.log(typeof(-1.0))
console.log(typeof(true))
console.log(typeof("string"))
Python:
print(type(-1))
print(type(-1.0))
print(type(True))
print(type("string"))
Lua:
print(type(-1))
print(type(-1.0))
print(type(true))
print(type("string"))
GDScript:
extends Node

func _init():
    print(typeof(-1))
    print(typeof(-1.0))
    print(typeof(true))
    print(typeof("string"))
Conversion of Primitive Types
There are situations in which it can be necessary to convert a value from one type to another. For instance:
- Convert an integer number to a real number (or vice-versa);
- Convert an integer number to a string (or vice-versa);
- Convert a real number to a string (or vice-versa);
- Convert an integer number to a logic value (or vice-versa);
- Convert a real number to a logic value (or vice-versa);
- Convert a string to a logic value (or vice-versa).
Among the previous cases, converting text (strings) to numbers is only possible when the strings actually represent a number. Otherwise, the conversion will fail.
At this moment, it is not possible to handle errors without introducing new concepts. In real programs, it is necessary to handle them to avoid problems (such as a program crash).
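As a preview only, the sketch below shows what a failed conversion looks like in Python: int() raises ValueError when the text does not represent an integer. The helper to_int_or_none is hypothetical and uses try/except, a concept that is covered properly later.

```python
def to_int_or_none(text):
    """Hypothetical helper: convert text to int, or return None if it fails."""
    try:
        return int(text)
    except ValueError:  # raised when text does not represent an integer
        return None


print(to_int_or_none("123"))   # 123
print(to_int_or_none("12.5"))  # None: int() rejects decimal strings
print(to_int_or_none("abc"))   # None
print(float("12.5"))           # 12.5: float() accepts decimal strings
```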
In languages without coercion, you must convert a value from one type to another when you wish to combine values of different types. In languages with coercion, this is optional. Personally, I would recommend converting explicitly whenever possible, unless the language provides good type safety.
Conventions to convert types can vary according to programming language.
- JavaScript:
  - Conversion to string: String() (documentation);
  - Conversion to integer or real number: Number() (documentation). It is also possible to use parseInt() (documentation) and parseFloat() (documentation);
  - Conversion to logic value: Boolean() (documentation).
- Python:
  - Conversion to string: str() (documentation);
  - Conversion to integer number: int() (documentation);
  - Conversion to real number: float() (documentation);
  - Conversion to logic value: bool() (documentation).
- Lua:
  - Conversion to string: tostring() (documentation);
  - Conversion to integer or real number: tonumber() (documentation);
  - Conversion to logic value: perform a comparison. The example compares strings with the empty string (and numbers with 0), treating them as false, for results similar to the other chosen languages. If you prefer, you can change the comparison to another value.
- GDScript:
  - Conversion to string: str() (documentation);
  - Conversion to integer number: int() (documentation);
  - Conversion to real number: float() (documentation);
  - Conversion to logic value: bool() (documentation).
JavaScript:
console.log(String(123))
console.log(String(123.456))
console.log(String(true))
console.log(Number("123"))
console.log(parseInt("123"))
console.log(Number("123.456"))
console.log(parseFloat("123.456"))
console.log(Number(true))
console.log(Boolean(1))
console.log(Boolean(1.0))
console.log(Boolean(0))
console.log(Boolean(0.0))
console.log(Boolean("true"))
console.log(Boolean("false")) // Warning: returns true!
console.log(Boolean("")) // Only the empty string returns false.
Python:
print(str(123))
print(str(123.456))
print(str(True))
print(int("123"))
print(int(123.456))
print(int(True))
print(float("123"))
print(float("123.456"))
print(float(123))
print(float(True))
print(bool(1))
print(bool(1.0))
print(bool(0))
print(bool(0.0))
print(bool("True"))
print(bool("False")) # Warning: returns True!
print(bool("")) # Only the empty string returns false.
Lua:
print(tostring(123))
print(tostring(123.456))
print(tostring(true))
print(tonumber("123"))
print(tonumber(123.456)) -- Returns 123.456: tonumber() does not truncate.
print(tonumber(true)) -- Warning: returns nil!
print(true and 1 or 0) -- Alternative to return 1 for the logic value true...
print(false and 1 or 0) -- ...and 0 for false.
print(tonumber("123"))
print(tonumber("123.456"))
print(tonumber(123))
print(tonumber(true)) -- Warning: returns nil!
print(true and 1.0 or 0.0)
print(false and 1.0 or 0.0)
print(1 ~= 0)
print(1.0 ~= 0.0)
print(0 ~= 0)
print(0.0 ~= 0.0)
print("true" ~= "")
print("false" ~= "") -- Warning: returns true!
print("" ~= "") -- Only the empty string returns false.
GDScript:
extends Node

func _init():
    print(str(123))
    print(str(123.456))
    print(str(true))
    print(int("123"))
    print(int(123.456))
    print(int(true))
    print(float("123"))
    print(float("123.456"))
    print(float(123))
    print(float(true))
    print(bool(1))
    print(bool(1.0))
    print(bool(0))
    print(bool(0.0))
    print(bool("true"))
    print(bool("false")) # Warning: returns true!
    print(bool("")) # Only the empty string returns false.
In programming languages without an integer type, how can you transform a real number into an integer one?
A simple way is to round it down, which, for non-negative numbers, is the same as truncating the decimal part.
This is possible in most programming languages using a function such as floor():
- JavaScript: Math.floor() (documentation);
- Python: math.floor() (documentation);
- Lua: math.floor() (documentation);
- GDScript: floor() (documentation).
JavaScript:
console.log(Math.floor(Number("123")))
console.log(Math.floor(Number("123.456")))
console.log(Math.floor(Number(true)))
Python:
import math

print(math.floor(float("123")))
print(math.floor(float(123.456)))
print(math.floor(float(True)))
Lua:
print(math.floor(tonumber("123")))
print(math.floor(tonumber(123.456)))
print(math.floor(true and 1.0 or 0.0))
GDScript:
extends Node

func _init():
    print(floor(float("123")))
    print(floor(float(123.456)))
    print(floor(float(true)))
In Python and GDScript, the truncation is only illustrative.
It is simpler and clearer to use int() to perform the conversion than to convert a number to real and then eliminate the decimal part (for non-negative numbers, both approaches give the same result).
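Rounding down and truncating only agree for non-negative numbers. In Python, math.floor() rounds toward negative infinity, while int() and math.trunc() drop the decimal part (rounding toward zero), as the sketch below shows for a positive and a negative value.

```python
import math

for value in [123.456, -123.456]:
    # floor() rounds down (toward negative infinity);
    # int() and math.trunc() drop the decimal part (toward zero).
    print(value, math.floor(value), int(value), math.trunc(value))
# 123.456 123 123 123
# -123.456 -124 -123 -123
```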
New Items for Your Inventory
Tools:
- ASCII table;
- Unicode table.
Skills:
- Choice of data types for representations and abstractions.
Concepts:
- Primitive data types;
- Composite data types;
- Integer numbers;
- Real numbers;
- Characters;
- Strings;
- Logic values.
Programming resources:
- Data types;
- Conversions between data types.
Practice
What primitive data types would you choose to represent the following data:
- Name;
- Name and surname;
- Day of birth;
- Month of birth;
- Year of birth;
- Date of birth;
- Day of the week;
- Nickname for a pet;
- Height in meters;
- Height in centimeters;
- Weight in kilograms;
- Time;
- Address (street, number, neighborhood, city, province, postal code);
- Telephone number;
- A yes or no answer;
- Volume of a bottle;
- The answer to confirm or cancel an action;
- The contents of a book;
- The state of a lamp (turned on or off);
- A shopping list;
- The temperature of an oven;
- The lyrics of a song;
- The score of a game level;
- The answer to a question with multiple choices;
- The degree of kinship between two people;
- The required quantity of flour to make bread dough;
- The maximum speed of a racing car;
- The font size to display text;
- An order state: paid or not;
- The state of an order's delivery: payment confirmed, order being prepared, order in transit, order delivered.
If needed, you can divide a dataset into smaller data fields and choose different types for each field. For instance, the date of birth can be split into three values: day, month and year. If necessary, each value can be of a different type. You could also include a time, if you wanted.
Some items can be represented with more than one primitive type. If you find such a situation, what are the alternatives that you could use?
Strictly speaking, you could choose the string type for all answers, though it is not always the most convenient. Why?
Next Steps
The choice of data types to represent the state of real-world objects and processes happens in every computer program. A program without data starts and finishes without intermediate operations. Any non-empty intermediate operation requires data.
Numbers, characters and the True and False values are the usual primitive data types in programming languages.
They are present from the simplest to the most complex programs: for instance, from Hello, world! to your operating system.
Are you able to observe your computer and identify some data types used by your favorite programs?
To transform data into information, you will need to manipulate it. To do this, you will need operations and a way to store results in memory, to avoid losing them and having to recalculate them every time they are needed.
The next topic introduces variables, which enable you to store results while a program is running.
- Introduction;
- Entry point and program structure;
- Output (for console or terminal);
- Data types;
- Variables and constants;
- Input (for console or terminal);
- Arithmetic and basic Mathematics;
- Relational operations and comparisons;
- Logic operations and Boolean Algebra;
- Conditional (or selection) structures;
- Subroutines: functions and procedures;
- Repetition structures (or loops);
- Arrays, collections and data structures;
- Records (structs);
- Files and serialization (marshalling);
- Libraries;
- Command line input;
- Bitwise operations;
- Tests and debugging.