Acting as a Software Engineer

For a successful application of any industry standard and technology (like, OpenAPI Specification) there are generic assumptions pertaining to software engineering fundamentals. This blog tries to illuminate what we expect as a sort of minimal software engineering knowledge and culture as a precondition for successful application of new standards and technologies. We will illuminate the following pertinent software engineering practices in a pragmatic manner:

  • Holistic approach to software development and accepting the interconnectedness of various software engineering knowledge areas.

  • The importance of test-driven development.

  • The importance of documentation, especially active variants.

  • The necessity for deep knowledge and experience with a target programming language and its standard library.

  • The necessity for adequate knowledge and experience with a target distributed system (like, Kafka) to effectively develop solutions around it.

  • To admire and appreciate that quality is everyone's responsibility.

From a personal experience, I will share an idea on how to look at the technical lead role in an organization. Any Agile method (including scaled variants) promotes structuring work around small cross-functional squads. Usually each squad has a person in the role of a technical lead. In exceptional cases, when a squad covers multiple disparate domains, there might be multiple persons in this role, each covering a specific area of expertise. This role is central for disseminating knowledge among squad members and for maintaining inter squads communication channels. This last item is often a weak spot in many companies, as squads are frequently not aware of what others are doing, what sort of problems they are facing, and what are the additional needs pertaining to reusable assets developed by a squad. To connect all the dots the article will continue by depicting a concocted story of a squad trying to develop a dependable software solution.

I would like to emphasize the importance of instigating those personal traits in people that lead to self-motivated individuals eager to learn, be creative problem solvers, and share knowledge and ideas with others.

Securing a Consumer

To make our elaboration devoid of abstractions, suppose there is a task to securely invoke a shell command by a Kafka consumer to check available space on a local disk (for example, a consumer might store records in a local database). Furthermore, assume that this consumer is marked as a business critical component. Last, but not least, a consumer is fully obeying the best practices for secure deployment of Kafka published by Confluent. Some of the principles that a particular developer may hear are enrolled below (we will use Python 3 in this section, but the story applies to all languages):

  • Avoid security by obscurity.

  • Treat all users as potentially malicious (even protect against internal mischief).

  • Be careful with the subprocess module in Python when using the shell=True option.

  • Design for security from the very beginning, as integrated security is the best approach.

  • Don’t run your code with an elevated privilege level, unless necessary.

  • Reduce attack vectors by reducing complexity of your code.

The previous suggestions are indisputably valid, although not enough for developers to properly understand them. This is why there is a need for guidance around standards and a holistic viewpoint on software development. Figure 1 illustrates the interconnections between various standards and guidelines.

Figure 1. The area on the right is outside of scope of the Kafka security standard, although the consumer is a Kafka client. This is why other more general standards are equally important for a fully secure solution. The case study continues by focusing on the orange components.

Initial Version

The consumer expects to read a configuration file at startup for the name of a directory where a local database should write files. This configuration file isn’t particularly protected and many guys have access to it. The first version of the disk_usage_insecure.py module is shown below:

import subprocess


def disk_usage(folder_name):

cmd = 'du -sh ' + folder_name

print(f"Invoking the command: {cmd}")

disk_space = subprocess.run(cmd, stdout=subprocess.PIPE, shell=True)

return disk_space.stdout.decode("UTF-8")

Here is the sample run:

>>> print(disk_usage("."))

Invoking the command: du -sh .

16K .

What can go wrong here? Well, instead of a folder name, someone may provide the following string “.;rm -rf ~” as a folder name, and there will be more free space on the disk than anybody would like. Just for fun, here is a benign demonstration what may happen:

>>> print(disk_usage(folder_name=".;echo Test"))

Invoking the command: du -sh .;echo Test

20K .

Test

The Assumed Job of a Technical Lead

A technical lead should be, among other things, focused on interpreting best practices and preparing a proper environment for its squad. More specifically, she/he may develop supporting software, so that passive words would be turned into action. Instead of writing down dozens of rules, principles, and suggestions there should be reusable software artifacts addressing specific concerns of some standard. Let us see how this may look like in practice by pretending to play the role of a technical lead.

We start with clear goal statements:

  1. Improve security of a code base by offering easy mechanisms for developers to leverage.

  2. Give ready templates how to structure code, write proper tests, and document artifacts.

Our job boils down to implementing a decent decorator that may adorn secure functions and sanitize input. We will develop unit tests and proper documentation. So, let us start with our decorator (in Python 3 the Decorator pattern is essentially an idiom of the language) called sanitize, as shown below:

from functools import wraps

import shlex

def sanitize(arg_name: str, guarded: bool = True):

"""

Decorator which sanitizes the input string provided as an argument to

a decorated function.

Args:

arg_name: The name of the input argument to sanitize.

This argument must be passed as a keyword argument during

the original function call.

guarded: If `False`, then a mismatch between assumed and actual

argument names will be ignored. Otherwise, a target function will

not be called and `None` will be returned as a result.

Returns:

A shell-escaped version of the input string.

"""

def decorator(function):

@wraps(function)

def wrapper(*args, **kwargs):

if arg_name in kwargs:

kwargs[arg_name] = shlex.quote(kwargs[arg_name])

elif guarded:

return None

return function(*args, **kwargs)

return wrapper

return decorator

Notice the proper documentation that allows any user to find out information by typing help(sanitize). Next is the set of automated unit tests for our decorator (the name of the tests are self-explanatory):

from sanitize import sanitize

import unittest

class TestSanitize(unittest.TestCase):

@sanitize('user_input')

def _secure_action_global(cmd, user_input):

return user_input

def test_proper_wrapping(self):

self.assertEqual(TestSanitize._secure_action_global.__name__, '_secure_action_global')

def test_input_is_sanitized(self):

self.assertEqual(TestSanitize._secure_action_global('ls -l ', user_input='my_dir; cd ~'),

"'my_dir; cd ~'")

def test_input_is_not_found_in_non_guarded_mode_and_nothing_is_sanitized(self):

@sanitize('client_input', guarded=False)

def _secure_action_local(cmd, user_input):

return user_input

self.assertEqual(_secure_action_local('ls -l ', user_input='my_dir; cd ~'),

'my_dir; cd ~')

def test_input_is_not_found_in_guarded_mode_and_None_is_returned(self):

@sanitize('client_input')

def _secure_action_local(cmd, user_input):

return user_input

self.assertIsNone(_secure_action_local('ls -l ', user_input='my_dir; cd ~'))

if __name__ == '__main__':

unittest.main()

Now, everything is ready for a developer to enhance its initial version. Besides getting this decorator, our squad members also got guidance on how to craft code documentation and write good automated unit tests.

Secured Version

Thankfully to the handy decorator, here is the embellished version of the disk_usage function; only two extra lines are added to an existing code, one import statement and usage of our decorator:

from sanitize import sanitize

import subprocess

@sanitize("folder_name")

def disk_usage(folder_name):

cmd = 'du -sh ' + folder_name

print(f"Invoking the command: {cmd}")

disk_space = subprocess.run(cmd, stdout=subprocess.PIPE, shell=True)

return disk_space.stdout.decode("UTF-8")

It cannot be easier, right? Let see this decorator in action by trying to pass more than a folder name:

>>> print(disk_usage(folder_name=".;echo Test"))

Invoking the command: du -sh '.;echo Test'

du: .;echo Test: No such file or directory

As expected, the folder name is now quoted and interpreted as a pure string. Observe that proactively preventing problems is the best way to boost overall quality and combat incidents in production.

Conclusions

Being aware of software engineering fundamentals and applying a holistic approach to software development is a prerequisite for lowering production issues. Through an artificial story you have witnessed the importance of a technical lead role as well as how proactive protection mechanisms against quality problems are far more effective than tons of passive documentation.