Running automated checks on your code on commits

Posted on Tuesday January 25, 2022

Now that I have been working on this site and its different components¹, I was thinking about the quality of the code that runs this site. As I have ”exposed” earlier, this site is, more or less, just static HTML files that are enhanced with Javascript and that those files are served from a Git repository. Those Javascript components just read certain files and generate the HTML code from them. Therefore, I was just wondering if the Javascript code I have written is any good². How would I check that? I could write tests, but there is a light-weight alternative that can point out potential issues and thus help writing better code.

Static code analysis

I am obviously talking about static code analysis, also known as linting. Like I mentioned in the safe(r) Bash coding post, a linter can help you to write safer code. The linter(s) can spot some problematic code that might cause some unintentional side effects. Obviously, a linter cannot spot all issues (more on this later), but it is a good starting point. Also, running the linter often enough can “teach” you to write better code, even though it might feel wrong at first. But after a while, you start to spot the potential issues yourself and write the code in a “linter-approved” way.

Like I said above, running a linter is in no way a perfect shortcut to bug-free code. It cannot spot potential code flow issues because it just checks how the programming language features are applied. Also, as there might be numerous linters available for the language in question, some issues might not even be spotted by one alternative while another linter considers them as errors. For example, for Javascript there are 3 fairly popular linters: jslint, jshint, and eslint³. The first, jslint, is fairly superficial and will not spot all the potential issues. The second, jshint has a broader rule set than jslint, which means that Javascript code that passes jslint might not pass jshint. eslint is the most advanced linter of this group and it will fail code that passes the other two alternatives⁴.

Oftentimes a linter just applies checks on the code, but some of them can also check code styling. Of the linters listed above, eslint has this feature as a built-in, if requested by the eslint configuration. If code styling should be enforced, it will also fail the linter check. However, usually that feature comes together with an option to fix the issues. Moreover, it is often possible to configure how that linter should be run and what checks it would execute for each project separately.

Running linters in a Git pre-commit hook

The obvious question is, when these linters should be run? One could run them manually every now and then, but that would not enforce the check when writing new code. One option, which I would recommend, is to run the linters in a Git pre-commit hook. A pre-commit hook is a script that is run before the actual commit object is created to the repository. If the pre-commit hook exits with a non-zero value, the commit is aborted and it will not be created. Therefore, it a good candidate for running the linter checks so that only good code gets committed to the repository.

In practice, the Git repository should have an executable file .git/hooks/pre-commit in the Git repository that runs the desired linters and that exits with exit code 0 when they pass or with exit code 1 when they fail. For example, the Javascript hooks (see above) could be run with the following pre-commit hook:

#!/bin/bash

function runJavascriptLinter() {
	local linter="$1"
	shift
	local files="$*"
	local result
	"${linter}" "${files}"
	result="$?"
	if [ "${result}" -gt "0" ]; then
		exit 1
	fi
}

function main() {
	local files
	files=$(git ls-files | grep -E "\.js$")
	# Run jslint
	runJavascriptLinter "jslint" "${files}"
	# Run jshint
	runJavascriptLinter "jshint" "${files}"
	# Run eslint
	runJavascriptLinter "eslint" "${files}"
}

main

Should any of the checks fail, the script would exit with a non-zero code and the pre-commit check would fail.

That example runs the Javascript checks on all files currently in the repository (git ls-files) ending with “.js” (grep -E "\.js$")⁵. If you would like to check only those files that are being committed that have been either added or modified, change the command for the files variable of the main function to this:

files=$(git diff --staged HEAD --name-status | awk '/^(M|A)/ {print $2}' | grep -E "\.js$")

This command will check which files have been staged for the commit (git diff --staged HEAD --name-status) and filters those that are added or modified but not deleted (awk '/^(M|A)/ { print $2 }') before it picks only the Javascript files.

Running the linter checks on all Git repositories and for all programming languages you use

The example given above runs three different linters on Javascript code. How about a scenario where you are coding in multiple programming languages? How could one run linters on all of those? Does one need to write a separate pre-commit hook for every repository? Well, there is an easy solution: handling the linter runs as library functionality⁶ and using Git’s repository template functionality.

In Git, if you add lines

[init]
	templatedir = <template_directory>

to your ~/.gitconfig file, all new repository clones and inits will copy data from the directory defined by templatedir to the repository’s .git directory. Alternatively, you can define the template directory in the git clone command or in the git init command using a parameter --template=<template_directory>. If that directory contains a subdirectory hooks that contains the executable pre-commit script, it will be copied to that repository and it will be run when you execute a commit.

Using this approach together with library functions will allow you to run the linter checks on every commit on every repository. In practice, this would mean that the pre-commit hook defined in the template directory is like this:

#!/bin/bash

HOOK_FUNCTIONS_PATH="${HOME}/hook-functions"
# shellcheck source=<path_to_hook_functions>/git-files-for-linting.bash
source "${HOOK_FUNCTIONS_PATH}/git-files-for-linting.bash" # Defines function getGitFilesForLinting
# shellcheck source=<path_to_hook_functions>/linters.bash
source "${HOOK_FUNCTIONS_PATH}/linters.bash" # Defines function runLinters

function main() {
	local files
	files=$(getGitFilesForLinting)
	# Run linters
	runLinters "${files}"
}

main

In this example, the getGitFilesForLinting function will pick the files that the linters are to be run on (all files in the repository or the ones that got modified or added) while the runLinters function would run the desired linters on those files:

#!/bin/bash

function runJavascriptLinter() {
	local linter="$1"
	shift
	local files="$*"
	local result
	"${linter}" "${files}"
	result="$?"
	if [ "${result}" -gt "0" ]; then
		exit 1
	fi
}

function runJavascriptLinters() {
	local files
	files=$(echo "$*" | grep -E "\.js$")
	if [ -z "${files}" ]; then
		# No files to lint, return
		return 0
	fi
	# Run jslint
	runJavascriptLinter "jslint" "${files}"
	# Run jshint
	runJavascriptLinter "jshint" "${files}"
	# Run eslint
	runJavascriptLinter "eslint" "${files}"
}

# Define other linter functions here

function runLinters() {
	local files="$*"
	# Run Javascript linters
	runJavascriptLinters "${files}"
	# Add other linters here
}

One can obviously move those Javascript linting functions to a different library file, so that they can be modified independent of the linters.bash file. When these are stored separate from the ~/.git/templates directory and they are referred to as library files in the pre-commit script, one can modify and extend the checks without the need of copying the updated pre-commit hook to all repositories after the update. For example, one could add HTML and/or CSS linting to the flow after Javascript linter run, the script could install the needed linters if they are not installed yet, or one could add option to skip certain linters if defined in the Git configuration.

Test runs in the pre-commit stage? Are there any potential pitfalls?

You might have been asking, why one should not run tests on the pre-commit hook. That is a totally valid question, but the reason is faily simple: running a linter (or few of them) is relatively fast process whereas running a test suite can take a lot of time. Especially when you are dealing with tests on programming languages or platforms that are not quick to execute. For example, while a simple unit test written in Python might take a fraction of a second, running an Android UI test (on an emulator) might take several minutes to complete. Also, tests are commonly run after a push to the remote repository by a continuous integration (CI) system so that it does not consume the resources of the developer machine.

Therefore, it is recommended that the pre-commit check should be as lightweight as possible. Running a linter is relatively quick and when the checks are applied to only those files that got modified or added, the check will not make the commit process significantly slower than without it. One can, obviously, add unit test run to the pre-commit hook, but in that case I would recommend that it should be disabled by default and enabled only on per repository basis. In other words, there could be a Git configuration flag, for example, user.pre-commit-hook.runtests, that is false in your global Git configuration (that is, in your ~/.gitconfig) and that could be overridden in the repository-specific configuration (that is, in the .git/config file)⁷.

The pre-commit hook comes with caveats, though. As these pre-commit hooks are not tied to the repository itself, they are not shared among the different developers of that repository. They are personal, and only you manage and maintain them⁸. If the project wants to enforce certain linters, they should be an integral part of the repository and they should be then run as part of the CI process.

Also, one can skip running the pre-commit script by using the --no-verify option of the git commit command. But, as the pre-commit script is personal, it is your responsibility to know when you can use it. Furthermore, one can obviously overwrite the copied hook template in the repository’s configuration. Again, that is your responsibility to know what you are doing if you are willing to do that.

How about other version control systems?

The description above refers to Git everywhere. While Git is the de facto standard of source control management (SCM) systems, some repositories are managed by some other system. How does one implement a similar solution on those, like Mercurial or Subversion?

Mercurial is a SCM fairly similar to Git. It also has hooks, and one can run a pre-commit hook before the actual commit takes place in a similar fashion to Git. However, with Mercurial, there is no automatic hook template copying functionality. Instead, the hooks to be executed need be defined in the repository config separately (in the repository’s .hg/hgrc) instead of being enabled in a certain directory or one can define them globally (in ${HOME}/.hgrc). Like with Git, these rules are not part of the code stored in the repository, so the configuration is your personal⁹. With Mercurial, command hg files will return all the tracked files in the repository and command hg status | awk '/^(M|A)/ {print $2}' will return the files that are being modified or added in the upcoming commit.

Also Subversion has hooks, but whereas Git and Mercurial has “client-side” hooks, with Subversion they all are run on the server. That means that all Subversion repository users will have the same script executed on all their commits. There are 3 different commit-related hooks available on Subversion: start-commit, pre-commit, and post-commit. The first is executed when the commit transaction is created, the second is executed just before the commit transaction is promoted to the new revision, and the last is executed after the new revision is created. In the start-commit hook, there is no information what files have been changed by the commit, thus making it not suitable for this purpose. The pre-commit hook, on the other hand, can fetch information about the changed files (using svnlook changed --transaction <id> | awk '/^(U|A)/ {print $2}' where <id> is the ID of the transaction created at this stage), but as the script is run on the server before the new revision is promoted it cannot fetch the updated file contents yet. That can be done in the post-commit hook, but, as said, it will be executed after the new revision has been created¹⁰. Therefore, the commit will not be aborted as it has already been completed.

I managed to break one crucial component which I then obviously had to fix. You bet I felt like an idiot when I spotted that. ↩︎
Answer: It is not great, passable maybe… ↩︎
I am deliberately excluding prettier, which does more than just Javascript linting. ↩︎
Unless you are very proficient Javascript developer who has used eslint before. ↩︎
This could be achieved with git ls-files **/*.js as well, but there are reasons why I did it like this. Please continue reading. ↩︎
See my Bash programming post on the reason why one should use library functions. ↩︎
To set such a configuration parameter, one can either add a section [user "pre-commit-hook"] with a parameter runtests = <value> to the config file or use a command git config [--global] --add user.pre-commit-hook.runtests <value>. ↩︎
Unless there is a pre-commit hook defined on the Git installation. This might be a case where the development happens on a shared server resource. And yes, this might still be happening. ↩︎
Unless, like with Git, the Mercurial installation has the hook configuration defined. ↩︎
In other words, it acts like a CI check. ↩︎

Tags: Code quality, Static code analysis, Automatic commit checks