Comparing Bash and Python for Linux scripting

Sh (short for shell) is the command interpreter mandated for UNIX-compatible systems by the POSIX standard. Its capabilities are limited, however, so more feature-rich interpreters such as Bash or Ksh are often used instead. Ksh is typically found on BSD-family operating systems, while Bash is the default on Linux. Command interpreters simplify small tasks related to working with processes and files. This article focuses on Linux operating systems, so the discussion will revolve around Bash.

Python, on the other hand, is a full-fledged interpreted programming language, often used for writing scripts or solving small application tasks. It is hard to imagine a modern UNIX-like system without both sh and Python, unless it is a device with a minimalist OS like a router. For example, in Ubuntu Oracular the python3 package cannot be removed: grub-common depends on it, grub2-common depends on grub-common, and grub-pc, the actual operating system bootloader, depends on grub2-common. Thus, Python 3 can confidently be used as a replacement for Bash when necessary.

When solving various tasks at the OS or file system level, the question may arise: which language, Bash or Python, is more advantageous to use in a particular case? The answer depends on the task at hand. Bash is advantageous when you need to quickly solve a simple task related to process management, file search, or modification. However, as the logic becomes more complex, Bash code can become cumbersome and difficult to read (although readability primarily depends on the programmer). Of course, you can break the code into scripts and functions, create sh-libraries, and connect them via the source command, but covering them with unit tests becomes challenging.

Preface

Who is this article for?

This article is for those who are interested in system administration, are familiar with one of the two languages, and want to understand the other. Or for those who want to learn about some features of Bash and Python that they might not have known before. Basic command-line skills and familiarity with programming fundamentals are required to understand the material.

For a complete picture, including code readability, the article will compare debugging capabilities, syntax, and various use cases. Similar examples in both languages will be provided. In Python code you may occasionally see a comma after the last element of an enumeration; this is not an error. Such trailing commas are considered good practice because, when a new element is added, the previously last element does not show up in diffs as modified.

The article will consider Bash version 4.0 or higher (associative arrays and mapfile, used in some examples, first appeared in 4.0) and Python version 3.7 or higher.

Debugging Scripts

Both languages are interpreted, meaning that during script execution, the interpreter knows a lot about the current execution state.

Debugging in Bash

Debugging via xtrace

Bash supports the xtrace option (-x), which can be set either in the command line when starting the interpreter or within the script itself:

#!/bin/bash

# Specify where to write logs, open the file for writing:
exec 3>/path/to/log/file
BASH_XTRACEFD=3  # which file descriptor to output debug information to

set -x # enable debugging
# ... code to debug ...
set +x # disable debugging

Such logs can also be written to the systemd journal if implementing a simple service:

#!/bin/bash

# Specify where to write logs:
exec 3> >(systemd-cat --priority=debug)
BASH_XTRACEFD=3  # which stream to output debug information to

set -x # enable debugging
# ... code to debug ...
set +x # disable debugging

Debugging in Bash will show which commands are being executed and with which arguments. If you need to get the current values of variables or the code of executed functions, you can do this with the set command without arguments. However, since the output of the command can be quite large, set is more suitable for manual debugging than for event logging.

Debugging via trap

Another debugging method is setting handlers for command execution using the trap command on the special DEBUG trap. The command being executed can be obtained through the built-in variable BASH_COMMAND. However, you cannot get its return code from this handler because the handler runs before the command itself.

trap 'echo "+ $BASH_COMMAND"' DEBUG

But it will be more useful to intercept errors and output the command and line number where the error occurred. To inherit this interception by functions, you also need to set the functrace option:

set -o functrace
trap 'echo "+ line $LINENO: $BASH_COMMAND -> $?"' ERR

# Test:
ls "$PWD"
ls unknown_file

Debugging in Python

Debugging via pdb

Python has rich debugging and logging tools. For debugging, Python has the pdb module. You can run a script with debugging enabled from the console, in which case the debug mode will be activated in exceptional situations:

python3 -m pdb my_script.py

Directly in the code, you can set breakpoints using the built-in breakpoint() function.

#!/usr/bin/python3

import os

breakpoint()
# Now you can try, for example, the source os command:
# (Pdb) source os

The language is object-oriented, and everything in it is an object. You can see the methods available for an object using the dir() function. For example, dir(1) will show the methods available for the integer object 1; one of them can then be called as (1).bit_length(). In many cases this helps answer questions without reading the documentation. In debug mode you can likewise use dir() to inspect objects and print() to get variable values.
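As a small illustration, dir() and a direct method call can be tried on a plain integer literal:

```python
# dir() lists the attributes of any object, including a plain integer
methods = dir(1)
print('bit_length' in methods)  # True: bit_length is one of int's methods

# Parentheses around the literal are needed so the dot is not parsed
# as a decimal point:
print((1).bit_length())  # 1: a single bit is enough to represent the value 1
```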

Logging via the logging module

Python provides the logging module, which allows you to log debug information with specified logging levels and log sources. In general, logging looks something like this:

import logging

logging.basicConfig(
    filename="myscript.log",
    level = logging.DEBUG, # output DEBUG, INFO, WARNING, ERROR, and CRITICAL levels
)

logger = logging.getLogger('MyApp')

logger.debug('Some debug information')
logger.error('Some error')

Comparison of Bash and Python Semantics

Variables and Data Types

Primitive Data Types

In Bash, all variables are strings, but string variables can also be used as numbers. For arithmetic calculations, the syntax $(( expression )) is used.

str_var="some_value"  # string, array of characters

int_var=1234  # string "1234", but can be used in calculations
int_var=$(( 1 + (int_var - 44) / 111 - 77 ))  # string: "-66"

In Python:

str_var = "some_value"  # class str
int_var = 1234  # class int
int_var = 1 + (int_var - 44) // 111 - 77  # -66, class int

Floating-point numbers are not supported in Bash. And this is logical, because if you need to use floating-point numbers in command-line scripts, you are clearly doing something at the wrong level or in the wrong programming language. However, floating-point numbers are supported in Ksh.
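In Python, by contrast, both float and integer arithmetic are available; note the two division operators (the examples above use //, floor division):

```python
print(7 / 2)      # 3.5: the / operator always returns a float
print(7 // 2)     # 3: the // operator performs floor division
print(0.1 + 0.2)  # 0.30000000000000004: usual binary floating-point caveats apply
```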

String Formatting

Both Bash and Python support variable substitution in formatted strings. In Bash, formatted strings are simply strings enclosed in double quotes, while in Python they are strings with the f prefix.

Both languages also support C-like style output of formatted strings. In Bash, this way you can even format floating-point numbers, although the language itself does not support them (the decimal separator is determined by the locale).

var1='Some string'
var2=0,5
echo "Variable 1: $var1, variable 2: $var2"
# Variable 1: Some string, variable 2: 0,5

# Without the current locale
LC_ALL=C printf 'String: %s, number: %d, floating-point number: %f.\n' \
        'str' '1234' '0.1'
# String: str, number: 1234, floating-point number: 0.100000.

# With the current locale
printf 'String: %s, number: %d, floating-point number: %f.\n' \
        'str' '1234' '0,1'
# String: str, number: 1234, floating-point number: 0,100000.

In Python:

var1 = 'Some string'
var2 = 0.5
print(f"Variable 1: {var1}, variable 2: {var2}")
# Variable 1: Some string, variable 2: 0.5

# Without the current locale:
print('String: %s, number: %d, floating-point number: %f.'
        % ('str', 1234, 0.1))
# String: str, number: 1234, floating-point number: 0.100000.

# With the current locale:
import locale
locale.setlocale(locale.LC_ALL, '')  # apply the current locale
print(locale.format_string('String: %s, number: %d, floating-point number: %f.',
        ('str', 1234, 0.1)))
# String: str, number: 1234, floating-point number: 0,100000.

You can notice a difference regarding the locale: in Python, %-formatting of strings ignores the locale. If you need to output values according to the locale, you must use the locale.format_string() function.

Arrays

In Bash, arrays are essentially text separated by spaces (by default). The syntax is very specific; for example, to copy an array (via @), you must enclose all its elements in quotes; otherwise, any spaces in the elements themselves will cause the element to be split into parts. But in general, working with arrays is similar in simple cases:

arr=( 'First item' 'Second item' 'Third item' )
echo "${arr[0]}" "${arr[1]}" "${arr[2]}"
arr_copy=( "${arr[@]}" )  # copying the array, quotes are mandatory
arr[0]=1
arr[1]=2
arr[2]=3
echo "${arr[@]}"
echo "${arr_copy[0]}" "${arr_copy[1]}" "${arr_copy[2]}"

In Python:

arr = [ 'First', 'Second', 'Third' ]
print(arr[0], arr[1], arr[2])
arr_copy = arr.copy()  # but you can also do it like in Bash: [ *arr ]
arr[0] = 1
arr[1] = 2
arr[2] = 3
print(*arr)
print(arr_copy[0], arr_copy[1], arr_copy[2])

The * operator in Python performs unpacking of lists, dictionaries, iterators, etc. That is, the elements of the array are listed as if they were separated by commas as arguments.
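A few more unpacking forms that may be useful in scripts (a small sketch):

```python
arr = ['First', 'Second', 'Third']

print(*arr)             # same as print('First', 'Second', 'Third')

first, *rest = arr      # unpacking also works in assignments
print(first, rest)      # First ['Second', 'Third']

copy = [*arr]           # shallow copy via unpacking, like arr.copy()
print(copy == arr, copy is arr)  # True False
```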

Associative Arrays

Bash also supports associative arrays (unlike Sh), but the capabilities for working with them are limited. In Python, associative arrays are called dictionaries, and the language provides very rich capabilities for working with them.

declare -A assoc_array=(
  [name1]='Value 1'
  [name2]='Value 2'
  [name3]='Value 3'
)

# Assigning a value by key:
assoc_array['name4']='Value 4'

# Element-wise access:
echo "${assoc_array['name1']}" \
        "${assoc_array['name2']}" \
        "${assoc_array['name3']}" \
        "${assoc_array['name4']}"

echo "${!assoc_array[@]}"  # output all keys
echo "${assoc_array[@]}"  # output all values

# Iterate over all elements
for key in "${!assoc_array[@]}"; do
    echo "Key: $key"
    echo "Value: ${assoc_array[$key]}"
done

In Python:

assoc_array = {
  'name1': 'Value 1',
  'name2': 'Value 2',
  'name3': 'Value 3',
}

# Assigning a value by key:
assoc_array['name4'] = 'Value 4'

# Element-wise access
print(
    assoc_array['name1'],
    assoc_array['name2'],
    assoc_array['name3'],
    assoc_array['name4']
)

print(*assoc_array)  # output all keys
print(*assoc_array.values())  # output all values

for key, value in assoc_array.items():
    print(f"Key: {key}")
    print(f"Value: {value}")

Module Importing

In Bash, there are no modules as such. But you can execute a script in the current interpreter using the source command. Essentially, this is analogous to importing modules, since all functions of the included script become available in the current interpreter’s namespace. In Python, there is full support for modules with the ability to import them. Moreover, the Python standard library contains a large number of modules for a wide variety of use cases. Essentially, what is implemented in Bash by third-party command-line utilities may be available in Python as standard library modules (and if not, you can install additional libraries).

In Bash:

# Include the file mylib.sh with some functions:
source mylib.sh

# Let's see the list of available functions (all of them):
declare -F

In Python:

# Import the module mylib.py or mylib.pyc:
import mylib

# Let's see the list of available objects in the mylib module:
print(dir(mylib))

Conditionals and Loops

Conditional Operator

In Bash, conditions work on two principles: either a command is provided as a condition, and its return code is checked, or built-in Bash double square or double round brackets are used. In the case of a return code, 0 is true (everything is fine), while in the case of double round brackets, it’s the opposite—the result of an arithmetic expression is checked, where 0 is false.

In Python, the approach standard for programming languages is used: False, 0, '', [], set(), and None are all treated as false. Non-empty, non-zero values are treated as true.
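This can be checked directly with bool():

```python
# All of these are "falsy" in Python:
for value in (False, 0, 0.0, '', [], set(), {}, None):
    print(bool(value))  # False each time

# Non-empty, non-zero values are "truthy":
for value in (True, 42, 'text', [0], {'k': 'v'}):
    print(bool(value))  # True each time
```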

In Bash:

if [[ "$PWD" == "$HOME" ]]; then
    echo 'Current directory: ~'
elif [[ "$PWD" == "$HOME"* ]]; then
    echo "Current directory: ~${PWD#"$HOME"}"
else
    echo "Current directory: $PWD"
fi

if (( UID < 1000 )); then
    echo "You are logged in as a system user. Please log in as yourself."
fi

In Python:

import os

curr_dir = os.environ['PWD']
home_dir = os.environ['HOME']

if curr_dir == home_dir:
    print('Current directory: ~')
elif curr_dir.startswith(home_dir):
    print('Current directory: ~' + curr_dir[len(home_dir):])
else:
    print(f"Current directory: {curr_dir}")

if os.getuid() < 1000:  # the UID variable is not exported, so use the system call
    print('You are logged in as a system user. Please log in as yourself.')

Loops

Both languages support for and while loops.

Loop with Element Iteration

In both languages, the for loop supports element iteration via the in operator. In Bash, elements of an array or elements of a string separated by separators recorded in the IFS variable (default: space, tab, and newline) are iterated. In Python, the in operator allows iterating over any iterable objects, such as lists, sets, tuples, and dictionaries, and is safer to work with.

In Bash:

# Recoding text files from CP1251 to UTF-8
for filename in *.txt; do
    tmp_file=`mktemp`
    iconv -f CP1251 -t UTF-8 "$filename" -o "$tmp_file"
    mv "$tmp_file" "$filename"
done

In Python:

import glob
from pathlib import Path

# Recoding text files from CP1251 to UTF-8
for filename in glob.glob('*.txt'):
    file = Path(filename)
    text = file.read_text(encoding='cp1251')
    file.write_text(text, encoding='utf8')
Loop with Counter

A loop with a counter in Bash looks unusual; the arithmetic form for (( initialization; condition; step )) is used.

In Bash:

# Get a list of all locally registered hosts:
mapfile -t lines < <(grep -P -v '(^\s*$|^\s*#)' /etc/hosts)

# Output the list with numbering:
for ((i = 0; i < ${#lines[@]}; i += 1)); do
    echo "$((i + 1)). ${lines[$i]}"
done

In Python:

from pathlib import Path
import re

def is_host_line(s):
    return not re.match(r'(^\s*$|^\s*#)', s)

lines = list(filter(is_host_line, Path('/etc/hosts').read_text().splitlines()))

for i in range(0, len(lines)):
    print(f"{i + 1}. {lines[i]}")

Functions

As in regular languages, Bash supports functions. In essence, functions in Bash are similar to separate scripts—they can also accept arguments like regular scripts, and they return a return code. But, unlike Python, they cannot return a result other than a return code. However, you can return text through the output stream.

In Bash:

some_function() {
    echo "Script: $0."
    echo "Function: $FUNCNAME."
    echo "Function arguments:"
    for arg in "$@"; do
        echo "$arg"
    done

    return 0
}
some_function One Two Three Four Five
echo $? # Return code

In Python:

import inspect

def some_function_is_ok(*args):
    try:  # If suddenly run from the interpreter
        script_name = __file__
    except NameError:
        script_name = ""
    print('Script: ' + script_name)
    print('Function: ' + inspect.getframeinfo(inspect.currentframe()).function)
    print('Function arguments:')
    print(*args, sep='\n')
    return True

result = some_function_is_ok('One', 'Two', 'Three', 'Four', 'Five')
print(result) # True

Input, Output, and Error Streams

The input stream is used to receive information by a process, while the output stream outputs information. Why streams and not regular variables? Because in streams, information can be processed as it arrives. Since information from the output stream can undergo further processing, error messages can break this information. Therefore, errors are output to a separate error stream. However, when running a command in interactive mode, these streams are mixed. Since these are streams, they can be redirected, for example, to a file. Or vice versa, read a file into the input stream. In Bash, the input stream has the number 0, the output stream—1, and the error stream—2. If the stream number is not specified in the redirection operator to a file, the output stream is redirected.
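The separation of the two output streams is easy to observe from Python; a small sketch that captures each stream into its own buffer:

```python
import sys
import io
import contextlib

out_buf, err_buf = io.StringIO(), io.StringIO()
with contextlib.redirect_stdout(out_buf), contextlib.redirect_stderr(err_buf):
    print('regular output')                  # goes to the output stream (fd 1)
    print('error message', file=sys.stderr)  # goes to the error stream (fd 2)

print('stdout got:', out_buf.getvalue().strip())
print('stderr got:', err_buf.getvalue().strip())
```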

Writing to a File

Writing to a file in Bash is done using the > operator, which redirects the command’s output to the specified file. In Python, you can write text files using the pathlib module or standard means—by opening a file via the open() function. The latter option is more complex but is well-known to programmers.

In Bash:

# Clear a text file by redirecting an empty string to it:
echo -n > some_text_file.txt

# Write a line to a file, overwriting it:
echo 'Line 1' > some_other_text_file.txt

# Append a line to a file:
echo 'Line 2' >> some_other_text_file.txt

In Python:

from pathlib import Path

# Overwrite the file with an empty string (make it empty):
Path('some_text_file.txt').write_text('')

# Overwrite the file with a line:
Path('some_other_text_file.txt').write_text('Line 1')

# Open the file for appending (a):
with open('some_other_text_file.txt', 'a') as fd:
    print('Line 2', file=fd)

Writing Multi-line Text to a File

For multi-line text in Bash there is a special heredoc format (an arbitrary label after <<; repeating the label at the start of a new line marks the end of the text), which allows redirecting arbitrary text to a command's input stream, from which it can be redirected to a file (and here you cannot do without the external cat command). Redirecting file contents to a process is much simpler.

In Bash:

# Redirect multi-line text to a file for appending:
cat <<EOF >> some_other_text_file.txt
Line 3
Line 4
Line 5
EOF

# Redirect file contents to the cat command:
cat < some_other_text_file.txt

In Python:

# Open the file for appending (a):
with open('some_other_text_file.txt', 'a') as fd:
    print("""Line 3
Line 4
Line 5""", file=fd)

# Open the file for reading (r):
with open('some_other_text_file.txt', 'r') as fd:
    # Output the file contents line by line:
    for line in fd:
        print(line, end='')  # each line already ends with \n
    # Or fd.read(), but then the entire file will be read into memory.

Reading from a File

In Bash, reading from a file is done via the < sign. In Python, you can read in the standard way via open(), or simply via Path(...).read_text():

In Bash:

cat < some_other_text_file.txt

In Python:

from pathlib import Path

print(Path('some_other_text_file.txt').read_text())

Stream Redirection

Streams can be redirected not only to a file or process but also to another stream.

In Bash:

error() {
    # Redirect the output stream (1) to the error stream (2).
    >&2 echo "$@"
}

error 'An error occurred.'

In Python:

import sys

print('An error occurred.', file=sys.stderr)

In simple cases, redirection to a file or from a file in Bash looks much clearer and simpler than writing to a file or reading from it in Python. However, in complex cases, Bash code will be less understandable and more difficult to analyze.

Executing External Commands

Running external commands in Python is more cumbersome than in Bash. There are, of course, the simple functions subprocess.getoutput() and subprocess.getstatusoutput(), but they give up Python's advantage of passing each argument as a separate list element.
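For completeness, a quick sketch of these two helpers (note that both run the command through a shell):

```python
import subprocess

# getoutput() returns just the text, with the trailing newline stripped:
output = subprocess.getoutput('echo hello')
print(output)  # hello

# getstatusoutput() also returns the exit code:
status, output = subprocess.getstatusoutput('echo hello')
print(status, output)  # 0 hello
```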

Getting Command Output

If you simply need to get text from a command and you are sure that it will always work, you can do it as follows:

In Bash:

cmd_path="`which ls`"  # backticks execute the command and return its output
echo "$cmd_path"  # output the command path

In Python:

import subprocess
cmd_path = subprocess.getoutput("which ls")  # the trailing newline is already stripped
print(cmd_path)  # output the path to the ls command

But getting command output via backticks in Bash will be incorrect if you need to get an array of lines. In Python, subprocess.getoutput() accepts a command line, not an array of arguments, which carries some risks when substituting values. And both options do not ignore the return code of the executed command.
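When a command line really must be built as a single string, the standard shlex module reduces the substitution risk by quoting each value:

```python
import shlex

filename = "name with spaces; echo pwned"
cmd = 'ls -l ' + shlex.quote(filename)
print(cmd)  # ls -l 'name with spaces; echo pwned'

# shlex.split() performs the reverse operation:
print(shlex.split(cmd))  # ['ls', '-l', 'name with spaces; echo pwned']
```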

Running a utility in Python to get some list into a variable will take much more code than in Bash, although the code in Python will be much clearer and simpler:

In Bash:

mapfile -t root_files < <(ls /)  # put the list of files from / into root_files
echo "${root_files[@]}"  # Output the list of files

In Python:

import subprocess
result = subprocess.run(
        ['ls', '/'],  # we are sure that such a command exists
        capture_output = True,  # get the command output
        text = True,  # interpret input and output as text
)
root_files = result.stdout.splitlines()  # get lines from the output
print(*root_files, sep='\n')  # output one file per line

Getting and Processing Return Codes

With full error handling, it becomes even more complicated, adding checks that complicate the code:

In Bash:

root_files="`ls /some/path`"  # Run the command in backticks
rc=$?  # save the return code before [[ ]] overwrites it
if [[ $rc != 0 ]]; then
    exit $rc
fi
echo "$root_files"  # Output the list of files

In Python:

import subprocess
import sys

result = subprocess.run(
        ['ls', '/some/path'],
        capture_output = True,  # get the command output
        text = True,  # interpret input and output as text
)  # note: run() raises FileNotFoundError if the command itself is missing
if result.returncode != 0:
    sys.exit(result.returncode)
root_files = result.stdout.splitlines()  # split into lines, no trailing empty line
print(*root_files, sep='\n')  # output one file per line

Executing a Command with Only Getting the Return Code

Executing a command with only getting the return code is slightly simpler:

In Bash:

any_command any_arg1 any_arg2
exit_code=$? # get the return code of the previous command
if [[ $exit_code != 0 ]]; then
    exit 1
fi

In Python:

import subprocess
import sys

result = subprocess.run(
    [
        'any_command',
        'any_arg1',
        'any_arg2',
    ],
)  # raises FileNotFoundError if the command does not exist
if result.returncode != 0:
    sys.exit(1)

Exceptions Instead of Handling Return Codes

But everything becomes even simpler if the script exit mode on any error is enabled. In Python, this approach is used by default; errors do not need to be checked manually; a function can throw an exception and crash the process.

In Bash:

set -o errexit  # crash on command errors
set -o pipefail  # the entire pipeline fails if there is an error inside the pipeline

critical_command any_arg1 any_arg2

In Python:

import subprocess

subprocess.run(
    [
        'critical_command',
        'any_arg1',
        'any_arg2',
    ],
    check = True, # throw an exception on a non-zero return code
)

In some cases, exceptions can be caught and handled. In Python, this is done via the try operator. In Bash, such catches are done via the usual if operator.

In Bash:

set -o errexit  # crash on command errors
set -o pipefail  # the entire pipeline fails if there is an error inside the pipeline

if any_command any_arg1 any_arg2; then
    do_something_else any_arg1 any_arg2
fi

In Python:

import subprocess

try:
    subprocess.run(
        [
            'critical_command',
            'any_arg1',
            'any_arg2',
        ],
        check = True,  # throw an exception on a non-zero return code
    )
except subprocess.CalledProcessError:
    subprocess.run(
        [
            'do_something_else',
            'any_arg1',
            'any_arg2',
        ],
        check = True,  # throw an exception on a non-zero return code
    )

In high-level languages, error handling via exceptions is preferred. The code becomes simpler and clearer, meaning there is less chance of making a mistake, and code review becomes cheaper. Although sometimes such checks look more cumbersome than a simple return code check. Whether to use this style of error handling largely depends on whether such exception checks will be frequent or will be in exceptional cases.

Building Pipelines

In Bash, pipelines are common practice, and the language itself has syntax for creating them. Since Python is not a command interpreter, building a pipeline there is somewhat more cumbersome and is done via the subprocess module.

In Bash:

ls | grep -v '\.txt$' | grep 'build'

In Python:

import subprocess

p1 = subprocess.Popen(
    ['ls'],
    stdout = subprocess.PIPE,  # to pass output to the next command
    text = True,
)

p2 = subprocess.Popen(
    [
        'grep',
        '-v',
        '\.txt$'
    ],
    stdin = p1.stdout,  # create a pipeline
    stdout = subprocess.PIPE,  # to pass output to the next command
    text = True,
)

p3 = subprocess.Popen(
    [
        'grep',
        'build',
    ],
    stdin = p2.stdout,  # create a pipeline
    stdout = subprocess.PIPE,  # already for reading from the current process
    text = True,
)

for line in p3.stdout:  # read line by line as data arrives
    print(line, end='')  # each line already ends with \n

Pipelines with Parallel Data Processing

In Bash, pipelines can be created both between commands and between commands and interpreter blocks. For example, you can redirect a pipeline to a line-by-line reading loop. In Python, processing data from a parallel process is also done by simple line-by-line reading from the process’s output stream.

In Bash:

# Get a list of files containing some text:
find . -name '*.txt' \
    | while read -r line; do  # sequentially get file paths
        if [[ "$line" == *'text'* ]]; then  # substring in string
            echo "$line"
        fi
    done

In Python:

import subprocess

p = subprocess.Popen(
    [
        'find',
        '.',
        '-name',
        '*.txt'
    ],
    stdout=subprocess.PIPE,
    text=True,
)

while True:
    line = p.stdout.readline().rstrip('\n')  # there is always \n at the end
    if not line:
        break
    if 'text' in line:  # substring in string
        print(line)

Parallel Process Execution with Waiting for Completion

In Bash, running a process in the background is supported at the language syntax level (the & operator), and you can run both individual commands in the background and parts of the interpreter (for example, functions or loops). But at this level of complexity, the code will often be simpler and clearer if it is written in Python, especially since the standard library provides capabilities that at the command interpreter level are implemented by third-party utilities that need to be considered as dependencies.

In Bash:

unalias -a  # in case someone copies directly into the terminal

get_size_by_url() {
    url="$1"
    # Get the file size from the Content-Length field of the response headers to a HEAD request
    curl --head --silent --location "$url" \
        | grep -i '^content-length:' \
        | tail -n 1 \
        | awk '{print $2}' \
        | tr -d '\r'  # strip the \r from HTTP line endings
}

download_range() {
    url="$1"
    start=$2
    end=$3
    output_file="$4"
    ((curr_size = end - start + 1))  # size of this chunk
    curl \
            --silent \
            --show-error \
            --range "$start-$end" \
            "$url" \
            --output - \
        | dd of="$output_file" seek="$start" oflag=seek_bytes conv=notrunc status=none
}

download_url() {
    url="$1"
    output_file="$2"

    ((file_size = $(get_size_by_url "$url")))
    # Allocate disk space for the file in advance:
    fallocate -l "$file_size" "$output_file"

    range_size=10485760  # 10 MiB
    # Divide into parts of up to 10 MiB:
    ((ranges_count = (file_size + range_size - 1) / range_size))
    declare -a pids  # We will save all process identifiers
    for ((i = 0; i < ranges_count; i += 1)); do
        ((start = i * range_size))
        ((end = (i + 1) * range_size - 1))
        if ((end >= file_size)); then
            ((end = file_size - 1))
        fi
        # Start downloading in the background:
        download_range "$url" $start $end "$output_file" &
        pids[$i]=$!  # remember the PID of the background process
    done

    wait "${pids[@]}"  # wait for the processes to complete
}

In Python:

import requests
from multiprocessing import Process
import os


def get_size_by_url(url):
    response = requests.head(url)
    return int(response.headers['Content-Length'])

def download_range(url, start, end, output_file):
    req = requests.get(
        url,
        headers = {'Range': f'bytes={start}-{end}'},
        stream = True,
    )
    req.raise_for_status()

    with open(output_file, 'r+b') as fd:
        fd.seek(start)
        for block in req.iter_content(4096):
            fd.write(block)

def download_url(url, output_file):
    file_size = get_size_by_url(url)
    range_size = 10485760  # 10 MiB
    ranges_count = (file_size + range_size - 1) // range_size

    with open(output_file, 'wb') as fd:
        # Allocate space for the file in advance:
        os.posix_fallocate(fd.fileno(), 0, file_size)

    processes = []
    for i in range(ranges_count):
        start = i * range_size
        end = start + range_size - 1
        if end >= file_size:
            end = file_size - 1

        # Prepare the process and run it in the background:
        process = Process(
            target = download_range,  # this function will work in the background
            args = (url, start, end, output_file),
        )
        process.start()
        processes.append(process)

    for process in processes:
        process.join()  # wait for each process to complete

Process Substitution

A separate topic worth mentioning is process substitution in Bash via the <(...) construct, since not everyone knows about it, but it makes life much easier. Sometimes you need to pass streams of information from other processes to commands, but the commands themselves can only accept file paths as input. You could redirect the output of processes to temporary files, but such code would be cumbersome. Therefore, Bash has support for process substitution. Essentially, a virtual file is created in the /dev/fd/ space, through which information is transmitted by passing the name of this file to the necessary command as a regular argument.

In Bash:

# Find common processes on two hosts:
comm -12 \
        <(ssh user1@host1 'ps -x --format cmd' | sort) \
        <(ssh user2@host2 'ps -x --format cmd' | sort)

In Python:

from subprocess import check_output

def get_common_lines(lines1, lines2):
    # Merge-style intersection of two sorted lists
    i, j = 0, 0
    common = []
    while i < len(lines1) and j < len(lines2):
        if lines1[i] < lines2[j]:
            i += 1
        elif lines1[i] > lines2[j]:
            j += 1
        else:
            common.append(lines1[i])
            i += 1
            j += 1
    return common

lines1 = check_output(
    ['ssh', 'user1@host1', 'ps -x --format cmd'],
    text = True,
).splitlines()
lines1.sort()

lines2 = check_output(
    ['ssh', 'user2@host2', 'ps -x --format cmd'],
    text = True,
).splitlines()
lines2.sort()

print(*get_common_lines(lines1, lines2), sep='\n')

Environment Variables

Working with Environment Variables

Environment variables allow passing information from parent processes to child processes. Bash has built-in support for environment variables at the language level, but there is no built-in associative array containing all of them; the full list is typically obtained via the external env command (or via declare -x).

In Bash:

# Assigning a value to an environment variable:
export SOME_ENV_VAR='Some value'

echo "$SOME_ENV_VAR"  # getting the value

env  # output the list of environment variables using an external command

In Python:

import os

# Assigning a value to an environment variable:
os.environ['SOME_ENV_VAR'] = 'Some value'

print(os.environ['SOME_ENV_VAR'])  # getting the value

print(os.environ)  # output the array of environment variables

Setting Values for Individual Processes

Environment variables are passed from the parent process to child processes. Sometimes you may need to change only one environment variable. Since Python is positioned as an application programming language, it will be somewhat more complicated to do this in Python, while in Bash, support for such variable setting is built-in:

In Bash:

# Set Russian localization for launched applications
export LANG='ru_RU.UTF-8'

LANG='C' ls --help  # but run this command with English localization

echo "LANG=$LANG"  # make sure the environment variables are not affected

In Python:

import os
import subprocess

# Assigning a value to an environment variable:
os.environ['LANG'] = 'ru_RU.UTF-8'

new_env = os.environ.copy()
new_env['LANG'] = 'C'  # change only one variable in the copy

subprocess.run(
    ['ls', '--help'],
    env = new_env,
)

print('LANG=' + os.environ['LANG'])  # make sure the environment variables are not affected

Executing Arbitrary Code

Executing arbitrary code is not required in everyday situations, but both languages have this capability. In Bash, it can be useful, for example, to return variables modified by a process or to return named execution results. In Python, there are two built-in functions: eval() and exec(). The analog of Bash's eval here is exec(), since it executes arbitrary statements rather than just evaluating expressions. Using eval() and exec() is very bad practice in Python, and they can always be replaced with something more suitable, unless you need to write your own command interpreter based on Python.

In Bash:

get_user_info()
{
    echo "user=`whoami`"
    echo "curr_dir=`pwd`"
}

eval $(get_user_info)  # execute the command output
echo "$user"
echo "$curr_dir"

In Python:

import getpass
import os

def user_info_code():
    return f"""
user="{getpass.getuser()}"  # very bad practice
curr_dir="{os.getcwd()}"  # please don't do this
"""

exec(user_info_code())
print(user)
print(curr_dir)

# But returning named values in general
# is better through classes, namedtuple, or dictionaries
from collections import namedtuple
import getpass
import os

UserInfo = namedtuple('UserInfo', ['user', 'curr_dir'])

def get_user_info():
    return UserInfo(getpass.getuser(), os.getcwd())
    return UserInfo(getpass.getuser(), os.getcwd())

info = get_user_info()
print(info.user)
print(info.curr_dir)

Working with the File System and Processes

Getting and Changing the Current Directory

Changing the current directory in the command line is usually required when doing something manually. But getting the current directory may be needed in scripts, for example, if the script or the program being launched does something with files in the current directory. For the same reason, you may need to change the current directory if you need to run another program that does something in it.

In Bash:

current_dir=`pwd`  # get the current directory
echo "$current_dir"

cd /some/path  # change to a directory

In Python:

import os

current_dir = os.getcwd()  # get the current directory
print(current_dir)

os.chdir('/some/path')  # change to a directory

Working with Signals

In Bash, kill is a built-in command, which is why man kill displays help for a completely different command, the external utility, whose arguments differ. Incidentally, sudo kill invokes the external kill utility, since sudo cannot run shell built-ins. But the Python code is still slightly clearer.

In Bash:

usr1_handler()
{
    echo "Received USR1 signal"
}


# Set a handler for the SIGUSR1 signal:
trap 'usr1_handler' USR1

# Send a signal to the current interpreter:
kill -USR1 $$  # $$ is the PID of the current shell
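In Python, the same idea can be sketched with the standard signal module, sending SIGUSR1 to the current process with os.kill:

```python
import os
import signal

def usr1_handler(signum, frame):
    # Handler receives the signal number and the current stack frame
    print('Received USR1 signal')

# Set a handler for the SIGUSR1 signal:
signal.signal(signal.SIGUSR1, usr1_handler)

# Send a signal to the current process:
os.kill(os.getpid(), signal.SIGUSR1)
```

Note that Python runs signal handlers on the main thread between bytecode instructions, so the handler fires almost immediately after os.kill returns.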

Compilation Capability

Bash by definition does not support compiling its scripts, which is perhaps why everything in it strives for minimalism in names. Python, although interpreted, can be compiled into platform-independent bytecode executed by the Python Virtual Machine (PVM). Executing such code can improve script performance. Usually, bytecode files have the .pyc extension.
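As an illustration, a script can be byte-compiled explicitly with the standard py_compile module; the throwaway hello.py file below is invented for the example:

```python
import os
import py_compile
import tempfile

# Create a throwaway script to compile (hypothetical example file):
script = os.path.join(tempfile.mkdtemp(), 'hello.py')
with open(script, 'w') as f:
    f.write("print('hello')\n")

# Compile it to bytecode; by default the .pyc file lands in __pycache__/:
pyc = py_compile.compile(script)
print(pyc)
```

In practice this is rarely done by hand: the interpreter caches bytecode automatically the first time a module is imported.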

Choosing a Language Depending on the Task

As a summary of the article, here are the main guidelines for choosing a language.

Bash is more advantageous to use in cases:

  • simple tasks that can be solved quickly with good knowledge of the language;
  • simple command-line scripts where work is done with processes, files, directories, or even hard drives and the file system;
  • if wrappers are created over other commands (starting a command interpreter can be faster than starting the Python interpreter);
  • if Python is not available in the system for some reason.

Python is more suitable for cases:

  • solving tasks related to text processing, mathematical calculations, or implementing non-trivial algorithms;
  • if Bash code would be difficult to read and understand;
  • if you need to cover the code with unit tests (the unittest module);
  • if you need to parse a large set of command-line parameters with a hierarchy of options between commands;
  • if you need to display graphical dialog boxes;
  • if script performance is critical (starting in Python may be slower, but executing code can be faster);
  • for creating constantly running services (systemd services).
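To illustrate the point about hierarchical command-line options, here is a minimal argparse sketch with subcommands; the svc program and its start/stop commands are invented for the example:

```python
import argparse

parser = argparse.ArgumentParser(prog='svc')
subparsers = parser.add_subparsers(dest='command', required=True)

# 'start' subcommand with its own options:
start = subparsers.add_parser('start', help='start the service')
start.add_argument('--port', type=int, default=8080)

# 'stop' subcommand with a different set of options:
stop = subparsers.add_parser('stop', help='stop the service')
stop.add_argument('--force', action='store_true')

args = parser.parse_args(['start', '--port', '9000'])
print(args.command, args.port)
```

Achieving the same hierarchy in Bash means hand-writing case statements and getopts loops per subcommand, which quickly becomes hard to maintain.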



2025-01-09 12:39:00
