A Theoretical Examination of the Abstraction (6/8/2018)

Ryan Fleury
Every software developer uses abstractions, whether by formatting data with a struct, writing a function, or reasoning about native processor instructions through a one-to-one textual representation (assembly). What is an abstraction, how do abstractions affect code, and are they useful?

To understand what an abstraction is in the context of software development, start from the observation that a programmer interacts with a program through source code; that is, a programmer does their job by writing sequences of characters that can be translated into some instruction or set of instructions for a machine. An abstraction, in this context, changes the sequences of characters a programmer can type, such that the interface the programmer reasons about changes fundamentally. This is usually done in pursuit of more efficient software development through code reuse; that is, in fact, why concepts like "structs" and "functions" exist: they were created to allow a programmer to specify more with less code.

A C programmer might deal with many abstractions, including (but not limited to) variables, functions, and structs.

A "variable" is a method by which a programmer can consider some block of physical memory as representing some discrete entity that persists for some period of time. Consider the following:

 Address | State of Physical Memory
------------------------------------
       0 | 00000000
       1 | 00000000
       2 | 00000000
       3 | 00000000
       4 | 00000000
       5 | 00000000
       6 | 00000000
       7 | 00000000
       8 | 00000000
       9 | 00000000
      10 | 00000000
      11 | 00000000
      12 | 00000000
      13 | 00000000
      14 | 00000000
      15 | 00000000


In C, when sizeof(int) == 4:

int x;

// Suppose that x refers to the memory from 
// address 0 to address 3 (occupying 4 addresses, 
// when each address references a byte). Now, 
// those 4 bytes can be referenced just using 'x', 
// and their value can be modified, with, say:

x = 5;


A "function" is a method by which a programmer can reuse a specific set of instructions. Consider the following:

int x, y, z, result;

// Some complicated math operation (don't
// pay too much attention to it, it's totally arbitrary)
x = 5;
y = 1;
z = 3;
result = x*y - x/z + y*47 + z*23 + x%z + ~z * ~x;

// This operation might need to happen more than once,
// but maybe with different values:

x = 2;
y = 17;
z = 43;
result = x*y - x/z + y*47 + z*23 + x%z + ~z * ~x;

x = 4;
y = 12;
z = 1;
result = x*y - x/z + y*47 + z*23 + x%z + ~z * ~x;

x = 1;
y = 2;
z = 3;
result = x*y - x/z + y*47 + z*23 + x%z + ~z * ~x;


Now suppose that the operation must change in order to meet the needs of the program. The programmer working on this bit of code cannot simply modify the operation in one place; they must modify four places in the code. In very large, complicated programs, this isn't maintainable, so a function is introduced:

int do_a_complicated_math_operation(int x, int y, int z) {
    return x*y - x/z + y*47 + z*23 + x%z + ~z * ~x;
}

// Somewhere else...

int result;

result = do_a_complicated_math_operation(5, 1, 3);
result = do_a_complicated_math_operation(2, 17, 43);
result = do_a_complicated_math_operation(4, 12, 1);
result = do_a_complicated_math_operation(1, 2, 3);


Now, if the math operation must be modified, the function can change, but the interface that one must interact with in order to retrieve the result of such an operation stays the same. The code must only change in one location.

A "struct" is a method by which a programmer can reuse a specific data format. Consider the following:

// Data for a person
const char *name;
int age;
int salary;
int number_of_children;
bool will_buy_the_melodist;
bool will_subscribe_to_ryan_on_youtube;

name = "Ryan Fleury";
age = 20;
salary = SOME_LOW_CONSTANT;
number_of_children = 0;
will_buy_the_melodist = false;
will_subscribe_to_ryan_on_youtube = false;
print_out_person_data(name, age, salary, number_of_children, will_buy_the_melodist, will_subscribe_to_ryan_on_youtube);

name = "Handmade Network Community Member";
age = 32;
salary = 75000;
number_of_children = 2;
will_buy_the_melodist = true;
will_subscribe_to_ryan_on_youtube = true;
print_out_person_data(name, age, salary, number_of_children, will_buy_the_melodist, will_subscribe_to_ryan_on_youtube);

// Imagine having to do this several times, and imagine modifying
// print_out_person_data or the data required for a "person" in
// the program.
//
// The same data is required in all places, so a struct can be 
// introduced. The print_out_person_data can be modified to
// reason about the struct.

typedef struct Person {
    const char *name;
    int age;
    int salary;
    int number_of_children; 
    bool will_buy_the_melodist;
    bool will_subscribe_to_ryan_on_youtube;
} Person;

Person person1 = {
    "Some Name",
    23,
    1000000,
    0,
    true,
    true
};

print_out_person_data(&person1);

// Now, imagine having to modify the data for 
// a person. It won't be nearly as laborious.


It follows from the above examples that, when an abstraction is introduced to code, the code can change in one of two ways.

The first possibility is that capability over some operation or memory format is lost. When the Person struct was introduced, all code that reasons about a 'Person' must necessarily store that data in a consistent format. When the function do_a_complicated_math_operation was introduced, the exact math operation could no longer be significantly restructured at the site where the function was called; thus, capability has been lost. In the most extreme case, one can imagine an extraordinarily complicated operation wrapped in a function that takes no arguments; at the sites that call such a function, there is almost no control over what takes place inside it.

The other possibility is that, if steps are taken to use abstractions while maintaining capability, the complexity of an operation or the specification of a memory format is not 'hidden', as many will claim; it is moved elsewhere. Imagine the opposite extreme of the aforementioned no-argument function: a function with infinitely many arguments, controlling every possible operation that could be performed within it.

The idea that abstractions have one of the two above effects on code rings true when very high-level, modern languages are examined; the amount of experience and expertise required to fully understand, say, JavaScript or Python is still extraordinarily high. The abstracted nature of these languages has done very little to reduce the complexity programmers deal with; the complexity has instead moved elsewhere (in order to maintain some level of capability).

Are abstractions useful, then? The answer should follow from the earlier examples: Yes. It is very much useful at times to reduce capability and complexity in favor of code reuse to promote productivity and reduce mental overhead; however, it must be noted by programmers that complexity cannot be reduced without losing capability. Even in the science-fiction utopia in which a computer perfectly interprets a sentence in a spoken language and performs the best possible operation to react, the capability is supposedly extremely high, so the complexity must have moved somewhere. The complexity has, in fact, embedded itself in the complexity of the spoken language and human communication; it's vital to understand that it has not disappeared.

This is all to make the point that it is impossible to maintain capability while also reducing complexity, and therefore the ivory tower of ever-increasing abstraction in pursuit of simplifying problems is ultimately useless. It is perhaps most useful, then, to concern oneself with the complexities of the most transferable and least arbitrary skills, so that the constraints under which one works are grounded in reality and utility rather than in the arbitrary decisions of another individual. Abstractions are undeniably useful, but the decision to introduce an abstraction should be a reasoned choice, not an assumed action.

In a language like C, it is true that the abstractions one works with are defined not by reality but by humans; however, they are structured around building a more useful tool for commanding a machine's hardware. The goal of the language and its abstractions is not to disregard the hardware and its complexities entirely, but to make the hardware more convenient for a programmer to command (as there are many operations a programmer repeats when programming at a lower level). In other, higher-level languages, the goal has shifted to abstracting away the hardware entirely, forcing the programmer to reason about nebulous ideas born in the minds of others rather than in physical reality. To program effectively in those languages, one must concern oneself with the complexities of the mental models from which those languages were formed. It follows that a deep understanding of such mental models is less transferable and more arbitrary.

In addition to the above implications of abstractions, there are also many performance implications, though this post has focused on the theoretical aspect of abstractions in particular.

I hope that this was helpful in promoting programmer reasoning about abstractions, why they should be used, and what sort of implications they can have on code.

Thanks for reading!