Do we invest too much in risk assessments and too little in security?

tl;dr: Don’t assess risks before you have basic security controls in place.

I recently came across a LinkedIn post from Karl Stefan Afradi linking to a letter to the editor in the Norwegian edition of Computerworld, criticizing our tendency to use risk assessments for all types of security decisions. The article can be found here: Risikostyring har blitt Keiserens nye klær (“Risk management has become the emperor’s new clothes”).

The article raises a few interesting and very valid points:

  • Modern regulatory frameworks are often risk based, expecting risk assessments to be used to design security concepts
  • Most organizations don’t have the maturity and competence available to do this in a good way
  • Some security needs are universal, and organizations should get the basic controls right before spending too much time on risk management

I agree that basic security controls should be implemented first. Risk management definitely has its place, but not at the expense of a good basic security posture. The UK NCSC Cyber Essentials scheme is a good place to start for getting the bare-bones controls in place, as I listed in Sick of Security Theater? Focus on These 5 Basics Before Anything Else. When all of that is in place, it is useful to add more security capabilities. Modern regulatory frameworks such as NIS2, or the Norwegian variant, “the Digital Security Act”, do include a focus on risk assessment, but also other key capabilities, such as a systematic approach to security management, a management system approved by top management, and incident response capabilities: Beyond the firewall – what modern cybersecurity requirements expect (LinkedIn Article).

So, what is a pragmatic approach that will work well for most organizations? I think a 3-step process can help build a strong security posture that fits the organization’s digital dependency level and maturity.

Basic security controls

Start with getting the key controls in place. This will significantly reduce the active attack surface, reduce the blast radius of an actual breach, and allow for easier detection and response. Do this before anything else.

  • Network security: divide the network into zones, and enforce control of data flows between them. This makes lateral movement harder, and can help shield important systems from exposure to attacks.
  • Patching and hardening: by keeping software up to date and removing features we do not need, we reduce the attack surface.
  • Endpoint security includes the use of anti-virus or EDR software, execution control and script blocking on endpoints. This makes it much harder for attackers to gain a foothold without being noticed, and to execute actions on compromised endpoints such as privilege escalation, data exfiltration or lateral movement techniques.
  • Access control is critical. Only people with a business need for access to data and IT systems should have access. Administrative privileges should be strictly controlled. Least privilege is a critical defense.
  • Asset management is the basis for protecting your digital estate: know what you have and what you have running on each endpoint. This way you know what to check if a critical vulnerability is found, and can also respond faster if a security incident is detected.

Managed capabilities

With the basics in place it is time to get serious about processes, competence and continuous improvement. Clarify who is responsible for what, describe processes for the most important workflows for security, and provide sufficient training. This should include incident response.

By describing and following up security work in a systematic way, you start to build maturity and can actually achieve continuous improvement. Think of it in terms of the plan-do-check-act cycle. Make these processes part of corporate governance, and build them out as maturity grows.

Some key procedures you may want to consider include:

  • Information security policy (overall goals, ownership)
  • Risk assessment procedure (methodology, when it should be done, how it should be documented)
  • Asset management
  • Access control
  • Backup management
  • End user security policy
  • Incident response plan
  • Handling of security deviations
  • Security standard and requirements for suppliers

Risk-based enhancements

After step 2 you have a solid security practice in place in the organization, including a way to perform security risk assessments. Performing good security risk assessments requires a good understanding of the threat landscape, the internal systems and security posture, and how technology and information systems support business processes.

The first step to reduce the risk to the organization’s core processes from security incidents is to know what those core processes are. Mapping out key processes and how technology supports them is therefore an important step. A practical approach to describing this at a high level is to use SIPOC – a table format for describing a business process in terms of Suppliers – Inputs – Process – Outputs – Customers. Here’s a good explanation from software vendor Asana.
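To make this concrete, here is a minimal sketch of how a SIPOC row could be captured as structured data. The process and all supplier and system names are illustrative assumptions for the example, not taken from a real assessment:

# Illustrative SIPOC record for a fictitious invoicing process.
# All supplier and system names are assumptions made for the example.
sipoc_invoicing = {
    "suppliers": ["ERP vendor (SaaS)", "Bank (payment API)"],
    "inputs": ["Approved timesheets", "Customer master data", "ERP system", "Bank integration"],
    "process": "Generate, approve and send invoices",
    "outputs": ["Invoices", "Accounting entries"],
    "customers": ["Clients", "Finance department"],
}

# The technical dependencies to carry into the risk assessment live in
# "inputs" and "suppliers" -- here the ERP SaaS and the bank API.
tech_dependencies = [
    i for i in sipoc_invoicing["inputs"]
    if "system" in i.lower() or "integration" in i.lower()
]
print(tech_dependencies)  # ['ERP system', 'Bank integration']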

When this is done, key technical and data dependencies are included in the “INPUTS” column. Key suppliers here should also include cloud and software vendors. This way we map out the key technical components required to operate a core process. From here we can start to assess the risk that security incidents pose to this process.

  • Threats: Who are the expected threat actors, and what are their expected modes of operation in terms of operational goals, tradecraft, etc.? Frameworks such as MITRE ATT&CK can help create a threat actor map.
  • Assets and vulnerabilities: Describe the data flows and assets supporting the process. Use this to assess potential vulnerabilities related to the use and management of the system, as well as the purely technical risks. This can include CVEs, but typically social engineering risks, logic flaws, supply-chain compromise and other less technical vulnerabilities are more important.

We need to evaluate the risk to the business process from the threats, vulnerabilities and assets at risk. One way to do this is to define “expected scenarios” and assess both the likelihood (low, medium, high) and the consequences of each scenario for the business process, as sketched below. Based on this we can define new security controls to further reduce the risk beyond the contribution from basic security controls.
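A minimal sketch of how such a scenario evaluation could be captured in code, assuming a simple three-level scale for both likelihood and consequence; the scenarios and ratings are made up for illustration:

# Simple scenario-based risk evaluation on a low/medium/high scale.
# The scenarios and ratings below are illustrative assumptions.
LEVELS = {"low": 1, "medium": 2, "high": 3}

scenarios = [
    {"name": "Ransomware via phishing e-mail", "likelihood": "medium", "consequence": "high"},
    {"name": "Supplier portal compromise", "likelihood": "low", "consequence": "high"},
    {"name": "Lost laptop with cached customer data", "likelihood": "high", "consequence": "low"},
]

for s in scenarios:
    score = LEVELS[s["likelihood"]] * LEVELS[s["consequence"]]
    # Scores of 6 or 9 would typically call for additional controls.
    print(f"{s['name']}: risk score {score}")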

Note that the risk treatment we design based on the risk assessment can include more than just technical controls. It can be alternative processes to reduce the impact of a breach, reduced financial burden through insurance policies, well-prepared incident response procedures, good communication with suppliers and customers, and so on. The key benefit of the risk assessment is in improving business resilience, not selecting which technical controls to use.

Do we invest too much in risk assessments then?

Many organizations don’t do risk assessments. That is a problem, but what makes it worse is that immature organizations also fail at the previous steps. They don’t implement basic security controls. They also don’t have clear roles and responsibilities, or procedures for managing security. For those organizations, investing in risk management should not be the top priority; getting the basics right should be.

For more mature organizations, the basics may be in place, but the understanding of how security posture weaknesses translate to business risk may be weak or non-existent. Those businesses would benefit from investing more in good-quality risk assessment. It is also a good vaccination against Shiny Object Syndrome – Security Edition (we need a new firewall and XDR and DLP and this and that and a next-gen dark AI blockchain-driven anomaly-based network immune system).

The Showdown: SAST vs. GitHub Copilot – who can find the most vulnerabilities?

Vibe coding is popular, but how does “vibe security” compare to throwing traditional SAST tools at your code? “Vibe security review” seems to be a valuable addition to the arsenal here, and performs better than both Sonarqube and Bandit!

Here’s an intentionally poorly programmed Python file (generated by Le Chat with instructions to create a vulnerable and poorly coded text adventure game):

import random
import os

class Player:
    def __init__(self, name):
        self.name = name
        self.hp = 100
        self.inventory = []

    def add_item(self, item):
        self.inventory.append(item)

def main():
    player_name = input("Enter your name: ")
    password = "s3Lsnqaj"
    os.system("echo " + player_name)
    player = Player(player_name)
    print(f"Welcome, {player_name}, to the Adventure Game!")

    rooms = {
        1: {"description": "You are in a dark room. There is a door to the north.", "exits": {"north": 2}},
        2: {"description": "You are in a room with a treasure chest. There are doors to the south and east.", "exits": {"south": 1, "east": 3}},
        3: {"description": "You are in a room with a sleeping dragon! There is a door to the west.", "exits": {"west": 2}},
    }

    current_room = 1

    while True:
        room = rooms[current_room]
        print(room["description"])

        if current_room == 3:
            action = input("Do you want to 'fight' the dragon or 'flee'? ").strip().lower()
            if action == "fight":
                if random.randint(0, 1):
                    print("You defeated the dragon and found the treasure! You win!")
                else:
                    print("The dragon defeated you. Game over!")
                break
            elif action == "flee":
                current_room = 2
                continue

        command = input("Enter a command (go [direction], get [item]): ").strip().lower()

        if command.startswith("go "):
            direction = command.split("go ")[1]
            if direction in room["exits"]:
                current_room = room["exits"][direction]
            else:
                print("You can't go that way.")
        elif command.startswith("get "):
            item = command.split("get ")[1]
            player.add_item(item)
            print(f"You picked up {item}.")
        else:
            print("Invalid command.")

if __name__ == "__main__":
    main()

The code is unnecessarily complex, and allows user input to be injected directly into a shell command. It also has a hardcoded “password” that is not really used for anything. Let’s check the security of this with 3 methods:

  1. Sonarqube Cloud (free version)
  2. Bandit (Python SAST tool)
  3. GitHub Copilot with GPT-4o

Sonarqube is run with the default ruleset “Sonar Way”, with no extra configuration effort. The scan detects “2 security hotspots” and some maintainability issues. A security hotspot is a potential weakness that the SAST tool can’t conclude on, and which requires manual review.

The first hotspot is the “password”:

Sonarqube also gives guidance on how to assess whether this is dangerous, and how to fix it, which is quite helpful. The other hotspot is “weak cryptography”: the game uses random.randint to decide the outcome of the dragon fight – not a (software) security issue. Dragons can be dangerous, though!

Now, let’s try Bandit.
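Bandit needs no configuration for a single file. A typical invocation, shown here wrapped in Python and assuming Bandit is installed (e.g. via pip), is equivalent to running bandit spaghetti.py on the command line:

# Run Bandit against the test file; equivalent to "bandit spaghetti.py".
# Bandit exits non-zero when it finds issues, so check=True is
# deliberately not used here.
import subprocess

result = subprocess.run(["bandit", "spaghetti.py"], capture_output=True, text=True)
print(result.stdout)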

Bandit did its job – and provides links to more information, including CWE references. Blue is “not good” and red is “bad”.

OK, now for GitHub Copilot.

The prompt, followed by the formatted report:

You are a professional software quality expert and security test engineer. Please review this code and provide an overview of security weaknesses with references to CWE's, and suggested fixes. Analyse the current file. Create the output as a security vulnerability report in markdown format. 

Security Vulnerability Report for spaghetti.py

This report provides an analysis of the security weaknesses in the provided spaghetti.py file. Each vulnerability is referenced with its corresponding Common Weakness Enumeration (CWE) and includes suggested fixes.


Vulnerabilities

1. Hardcoded Password

  • Description: The password is hardcoded in the file (password = "s3Lsnqaj"), making it vulnerable to exposure if the file is shared or accessed by unauthorized users.
  • CWE Reference: CWE-798: Use of Hard-coded Credentials
  • Impact: Hardcoded passwords can lead to unauthorized access to sensitive resources.
  • Suggested Fix:
    • Store the password securely using environment variables or a secrets management tool.
    • Example:
      import os
      password = os.getenv("GAME_PASSWORD")
      if not password:
          raise ValueError("Password not set in environment variables")
      

2. Command Injection

  • Description: The os.system call uses unsanitized user input (os.system("echo " + player_name)), which can lead to command injection if a malicious user provides crafted input.
  • CWE Reference: CWE-78: Improper Neutralization of Special Elements used in an OS Command (‘OS Command Injection’)
  • Impact: A malicious user could execute arbitrary system commands, potentially compromising the host system.
  • Suggested Fix:
    • Use safer alternatives like subprocess.run with proper input sanitization.
    • Example:
      import subprocess
      subprocess.run(["echo", player_name], check=True)
      

3. Unvalidated User Input

  • Description: User input (player_name, action, command) is not validated or sanitized, which can lead to unexpected behavior or vulnerabilities.
  • CWE Reference: CWE-20: Improper Input Validation
  • Impact: Unvalidated input can lead to crashes, security vulnerabilities, or incorrect program behavior.
  • Suggested Fix:
    • Validate and sanitize all user inputs.
    • Example:
      player_name = input("Enter your name: ").strip()
      if not player_name.isalnum() or len(player_name) > 50:
          print("Invalid name. Please enter a valid name.")
          return
      

4. Predictable Randomness

  • Description: The use of random.randint(0, 1) for critical game logic (e.g., dragon fight outcome) is predictable and not cryptographically secure.
  • CWE Reference: CWE-338: Use of Cryptographically Weak Pseudo-Random Number Generator (PRNG)
  • Impact: Predictable randomness can lead to unfair or exploitable outcomes in the game.
  • Suggested Fix:
    • Use a cryptographically secure random number generator if randomness is critical.
    • Example:
      import secrets
      if secrets.choice([True, False]):
          print("You defeated the dragon and found the treasure! You win!")
      else:
          print("The dragon defeated you. Game over!")
      

5. No Input Length Restriction

  • Description: The player_name input does not restrict the length, which could lead to buffer overflow or other issues in certain scenarios.
  • CWE Reference: CWE-1283: Improper Validation of Specified Quantity in Input
  • Impact: Excessively long input could cause performance issues or crashes.
  • Suggested Fix:
    • Limit the length of user input and validate it.
    • Example:
      player_name = input("Enter your name: ").strip()
      if len(player_name) > 50:
          print("Name too long. Please enter a shorter name.")
          return
      

6. No Error Handling

  • Description: The code lacks error handling for invalid commands, missing keys in dictionaries, or unexpected input.
  • CWE Reference: CWE-703: Improper Check or Handling of Exceptional Conditions
  • Impact: The program may crash or behave unpredictably when encountering errors.
  • Suggested Fix:
    • Add proper exception handling to ensure the program handles errors gracefully.
    • Example:
      try:
          direction = command.split("go ")[1]
          if direction in room["exits"]:
              current_room = room["exits"][direction]
          else:
              print("You can't go that way.")
      except KeyError:
          print("Invalid room configuration.")
      except Exception as e:
          print(f"An unexpected error occurred: {e}")
      

Summary

The code contains several vulnerabilities that could compromise its security and reliability. Addressing these issues will improve the robustness and safety of the application.

Recommendations

  1. Replace hardcoded sensitive information with secure storage methods.
  2. Avoid using os.system with unsanitized input; use safer alternatives like subprocess.run.
  3. Validate all user inputs and restrict their length to prevent unexpected behavior.
  4. Use proper error handling to ensure the program does not crash due to invalid input or unexpected conditions.
  5. Consider using cryptographically secure randomness for critical game logic if necessary.

By implementing these fixes, the code will be more secure and maintainable.

Summary

Using Copilot to do a security review and generate a report created the most thorough assessment, which is not surprising. Bandit did quite well and detected the genuinely dangerous injection vulnerability. Sonarqube provides a nice UI, but did not detect the one actually serious vulnerability here.

Avoiding risk by doing nothing: the European regulator’s unintended consequences

Mario Draghi’s recent report on European competitiveness summarized what has long been a favorite topic of meme creators on the Internet; we are killing our companies with regulations. In the foreword to the report, Draghi writes: “we claim to favour innovation, but we continue to add regulatory burdens onto European companies, which are especially costly for SMEs and self-defeating for those in the digital sectors”. In other words, the road to poverty is paved with good risk-averse intentions.

Perhaps one of the most challenging effects of heavy regulation is how it has changed the mindset of people. Some people end up seeing new regulations as the key driver of innovation.

Innovation Norway, a government agency that funds and supports innovation and startups in Norway, was running an ad earlier this year with the copy: “Regulation creates requirements, which creates demand, which creates opportunity for growth and increased competitiveness in fishery. Innovation Norway can help you stay ahead. Learn how. ”

Governments imposing regulations as a growth driver? That’s definitely absurd.

Self-governing markets or imposed compliance driven governance?

The Internet is full of memes about bottle caps, most of them portraying Europe as a desert for ideas, whereas the U.S. is a growth and innovation paradise.

While exaggerated, these memes may show a difference in how we view the world. If we trust the market to punish those who don’t act in the interest of society as a whole, we leave most risk trade-offs to be made by individuals and companies; but if we think that people won’t act in good ways without regulatory pressure, we make regulations for everything.

Regulatory pressure can drive practices in a positive direction, but it can also have serious side effects. One of those, if we take the regulatory focus too far, is that people become more concerned with compliance and auditing than with solving actual problems. In Europe, it seems that we have done so. This makes us largely unable to solve big problems requiring radical innovation, such as changing demographics (fewer taxpayers and more elderly people), climate change, and competition from regions with higher growth and a greater willingness to take investment risks.

A security perspective on the whole thing

The “regulation is great” attitude is also very much present in cybersecurity. In Europe, security talks aren’t really about vulnerability research, use of AI in offense or defense, using cloud technologies to build resilient self-defending systems, or how to make sure consumers appreciate our products are safe to use. We want all those things, but our conferences are about… regulations!

  • NIS-2: Are you ready for NIS-2? Beware of government fines. Act now! Probably the most common type of advertising in cybersecurity in Europe the last few years. The message is: “Buy our compliance solution to avoid fines from the regulator” – not how to actually build great security solutions.
  • AI Act: this act has generated almost as many memes as the bottle caps. The intention is to avoid AI risks and abuses, but at the same time it makes it less likely that Europe will be the preferred location for AI research and startups.
  • Cyber resilience act: hailed by many as the holy grail of security – with strict requirements for software of all sorts.

It doesn’t mean that there are no new technologies being developed here, or that people don’t do great things – but it shifts the focus of business away from the innovators and over to the regulators – also in security.

Solutions? Those are hard to find!

We are going to struggle to change our very risk-averse ways. But eventually we will be forced to, unless we are willing to accept a significantly reduced quality of life in Europe.

  1. I think we need to remove or reduce regulations and put more trust in individuals, companies and markets. That is going to be very difficult for us.
  2. Most likely we will also need to reduce taxes and transaction costs to encourage investment and growth.
  3. Reducing taxes will be hard to do; we have big welfare systems to fund. But if we don’t act, we will also not be able to fund the safety nets we like to have. We need to learn to prioritize more – and that’s perhaps the hardest challenge of all.

The solutions we need will require a shift in politics. If it happens, it will take time.

Can we do something about this in the private sector, to improve growth and innovation capacity? Perhaps the most immediate solution is to use AI to minimize the regulatory burden as much as possible – in other words, focus on improving our compliance work so much that we can find some time to also work on the real problems – solving slow productivity growth, improving healthcare and finding solutions to the climate crisis?

Does the AI Act make it illegal for European companies to use AI?

The AI Act does not make it illegal to use AI, but it does regulate many use cases. As EU acts typically do, it makes a lot of assessment, documentation and governance mandatory – at least for so-called “high-risk” use cases. The EU has published an official short summary here: https://artificialintelligenceact.eu/high-level-summary/.

The main points of the AI Act

  • The AI Act classifies AI systems based on risk. There are 4 levels: unacceptable (illegal use cases), high-risk (OK, but with a lot of paperwork and controls), limited risk (chatbots, be transparent), and minimal risk (unregulated, for example spam filters).
  • The AI Act has rules for companies using AI, but more rules for companies making AI systems. Your personal hobby use and development is not regulated.
  • General purpose AI systems (basically, systems capable of solving many tasks, such as AI agents able to execute commands via APIs) have requirements to provide documentation and instructions for use, respect copyright, and publish a summary of the content used for training. Open source: only the copyright and training data summary requirements apply, unless the system is “high-risk”. GPAI systems also need threat modeling, testing, incident reporting and reasonable security controls.

Banned AI systems

The unacceptable ones: these prohibitions are there to protect you against abuse – basically systems made for mass surveillance, social credit systems, predictive crime profiling of individuals, manipulation of people’s decisions, etc.

High-risk AI systems

Systems that are safety critical are considered high-risk, including a long list of systems under other EU legislation, such as important components in machinery, aircraft, cars and medical systems (Annex I of the AI Act has a long list). There is also an Annex III, listing particular high-risk systems, including the use of AI for employee management, immigration decisions and safety critical components in critical infrastructure. OK – it is quite important that we can trust all of this, so perhaps a bit of governance and oversight is not so bad? At the same time, the important cases are perhaps also the areas where we would expect to see the most benefit from using technology to make things better, more efficient, cheaper, and so on.

So, what are makers and users of high-risk AI systems required to do? Let’s begin with the makers. They need to:

  • Create a risk management system
  • Perform data governance, to make sure training and validation data sets are appropriate and of good quality
  • Create technical documentation to demonstrate compliance (this can be interpreted in many ways)
  • Design the system for “record keeping” to identify national level risks(?) and substantial modifications throughout the system’s lifecycle
  • Create instructions for use to downstream deployers
  • Design the system so that users can implement human oversight
  • Ensure acceptable levels of cybersecurity, robustness and accuracy
  • Establish a quality management system

Most of these requirements should be part of any serious software or product development.

Limited risk

For limited risk systems, the main requirement is to be transparent to the user that the system is using artificial intelligence. The transparency requirement is regulated in Article 50 of the AI Act. Content generated by AI systems must be marked as such, including deep-fakes. There is an exception for satirical or artistic content (to avoid making the art less enjoyable, but you still have to be honest about AI being part of the content), and also for “assistive editing functions”, like asking an LLM to help you edit a piece of text you wrote.

Risk management requirements for “high-risk” systems

The first requirement for developers of “high-risk” AI systems is to have a risk management system. The system must ensure that risk management activities follow the lifecycle of the AI system. The key requirements for this system:

  • Identify and analyze potential risks to health, safety or fundamental rights
  • Estimate and evaluate the risks
  • Adopt measures to manage the risks to acceptable levels, following the ALARP (as low as reasonably practicable) principle
  • The systems shall be tested to identify the best risk management methods
  • The developer must consider whether the AI system can have negative effects for people under the age of 18 years, or other vulnerable groups

In other words, the developer needs to perform risk assessments and follow up on these. Most companies are used to performing risk assessments, but in this case the term “fundamental rights” is perhaps less common, except in privacy assessments under the GDPR. The fundamental rights requirements are detailed in Article 27. The EU has a Charter of Fundamental Rights covering dignity, freedoms, equality, solidarity, citizens’ rights and justice. The AI Office will publish tools to simplify the fundamental rights assessment for AI system developers.

AI based glucose level regulation in diabetes patients (a fictitious example)

Consider an AI system used to optimize blood glucose regulation in patients with type 1 diabetes. The system works in a closed loop, and automatically adjusts continuous insulin injection using an insulin pump. The system measures blood glucose levels, but also senses activity level and environmental factors such as humidity, temperature and altitude. The system also uses image recognition, via a small camera, to detect what the patient is eating as early as possible, including interpreting menu items in a restaurant before the food is ordered. Using this system, the developer claims to completely remove the hassle of carbohydrate calculations and manual insulin adjustments, to reduce the amount of time the patient has a too high or too low glucose level, and to avoid the typical delayed insulin-glucose response in the body through feedforward mechanisms based on the predictive powers of the AI.

Can AI-based systems make it unnecessary for patients to look at their phone to keep treatment under control?

For a system like this, how could one approach the risk management requirements? Let’s first consider the risk categories and establish acceptance criteria.

Health and safety (for the patient):

  • Critical: Death or severe patient injuries: unacceptable
  • High severity: Serious symptoms related to errors in glucose level adjustment (such as hyperglycemia with very high glucose levels): should occur very rarely
  • Medium: Temporary hypoglycemia (low blood sugar levels) or hyperglycemia (increased blood sugar levels): acceptable if the frequency is lower than in manually regulated patients (e.g. once per month)
  • Low: annoyances, requiring patient to perform manual adjustments. Should occur less than weekly.

If we compile this into a risk matrix representation, we get:

| Severity | Weekly | Yearly | Decades |
|----------|--------|--------|---------|
| Critical | Unacceptable | Unacceptable | Unacceptable |
| High | Unacceptable | Unacceptable | ALARP |
| Medium | ALARP | Acceptable | Acceptable |
| Low | Acceptable | Acceptable | Acceptable |
Example risk acceptance matrix for health and safety effects due to adverse AI events
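As a sketch, the acceptance matrix can be encoded directly, so each assessed scenario can be checked against the criteria. The encoding below simply mirrors the table above:

# Risk acceptance lookup mirroring the matrix above.
# Rows are severity levels; columns are expected event frequencies.
ACCEPTANCE = {
    "critical": {"weekly": "unacceptable", "yearly": "unacceptable", "decades": "unacceptable"},
    "high": {"weekly": "unacceptable", "yearly": "unacceptable", "decades": "ALARP"},
    "medium": {"weekly": "ALARP", "yearly": "acceptable", "decades": "acceptable"},
    "low": {"weekly": "acceptable", "yearly": "acceptable", "decades": "acceptable"},
}

def evaluate(severity: str, frequency: str) -> str:
    """Return the acceptance verdict for an assessed scenario."""
    return ACCEPTANCE[severity][frequency]

# Example: temporary hypoglycemia (medium severity) expected about yearly.
print(evaluate("medium", "yearly"))  # acceptable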

Fundamental rights (for the patient and people in the vicinity of the patient): a fundamental rights assessment should be performed at the beginning of development, and updated with major feature or capability changes. Key questions:

  • Will use of the system reveal to others your health data?
  • Will the sensors in the system process data about others that they have not consented to, or where there is no legal basis for collecting the data?

We are not performing the fundamental rights assessment here, but if there are risks to fundamental rights, mitigations need to be put in place.

Let’s consider some risk factors related to patient safety. We can use the MIT AI Risk Repository as a starting point for selecting relevant checklist items to trigger identification of relevant risks. The taxonomy of AI risks has 7 main domains:

  1. Discrimination and toxicity
  2. Privacy and security
  3. Misinformation
  4. Malicious actors and misuse
  5. Human-computer interaction
  6. Socioeconomic and environmental harms
  7. AI system safety, failures and limitations

In our fictitious glucose regulation system, we consider primarily domain 7 (AI system safety, failures and limitations) and domain 2 (Privacy and security).

  • AI system safety failures and limitations (7)
    • AI possessing dangerous capabilities (7.2)
      • Self-proliferation: the AI system changes its operational confines, evades safeguards due to its own internal decisions
    • Lack of capability or robustness (7.3)
      • Lack of capability or skill: the quality of the decisions is not good enough
      • Out-of-distribution inputs: input data is outside the validity for the trained AI model
      • Oversights and undetected bugs: lack of safeguards to catch bugs or prevent unintended use
      • Unusual changes or perturbations in input data (low noise robustness)
    • Lack of transparency and interpretability (7.4)
      • Frustrate achievement of auditing: lack of compliance with relevant standards, cannot be assessed.
  • Privacy and security (2)
    • Compromise privacy by obtaining, leaking or correctly inferring personal data (2.1)
      • PII memorization: Models inadvertently memorizing or producing personal data present in training data
      • Prompt injection: Compromise of privacy by prompt based attacks on the AI model
    • AI system vulnerability exploitation (2.2)
      • Physical or network based attack: can lead to manipulation of model weights and system prompts
      • Toolchain and dependency vulnerabilities (vulnerabilities in software)

To assess the AI system for these risks, the process would follow typical risk management practices:

  • Describe the system and its context
  • Break down the system into parts or use cases
  • Assess each part or use case, as well as interactions between parts to identify hazards
  • Document each finding with cause, consequence and existing safeguards
  • Perform evaluation of probability and severity, compare with acceptance criteria
  • Identify mitigations

Let’s consider a particular risk for our glucose regulator:

RISK CATEGORY: (7.3) Lack of capability or skill.

  • Possible risk: the system makes the wrong decision about insulin injection rate due to lack of capabilities.
  • Possible causes: insufficient training data, insufficient testing.

Consequence: over time it can lead to frequent hypo- or hyperglycemia, causing long-term patient complications and injury.

Probability: would require testing or an assessment of the training and testing regime to determine the probability.

Suggested decision: provide extra safeguards based on blood glucose level measurements, and let the patient take over and adjust manually if the glucose regulation is detected as being outside expected performance bounds. Use this while performing testing to assess the reliability of the model’s inference, in order to eventually allow fully automatic regulation.
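As an illustration of the suggested safeguard, here is a minimal sketch of a bounds check that falls back to manual control when the measured glucose level drifts outside expected performance bounds. The range is an invented assumption for the example, not clinical guidance:

# Hypothetical safeguard: fall back to manual regulation when the
# measured glucose level drifts outside the expected performance bounds.
# The range below is an illustrative assumption, not clinical guidance.
SAFE_RANGE_MMOL_L = (3.9, 10.0)

def regulation_mode(glucose_mmol_l: float) -> str:
    low, high = SAFE_RANGE_MMOL_L
    if glucose_mmol_l < low or glucose_mmol_l > high:
        # Out of bounds: alert the patient and hand over manual control.
        return "manual"
    return "automatic"

print(regulation_mode(12.5))  # manual -- automatic regulation not trusted here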

Key take-aways

  1. The AI Act puts requirements on developers and users of AI systems.
  2. For high-risk systems, a robust risk management system must be put in place.
  3. AI risk is an active field of research. A good resource is the MIT AI Risk Repository taxonomy.

Further reading

AI Risk repository

AI Act Explorer

Engaging the Whole Workforce in Cybersecurity: A Guide for Security Managers

Cybersecurity requires everyone to contribute, but that is hard to achieve. In this post we look at how security managers can think like marketers to engage the management team, create strategic alignment that makes sense to others, and build alliances and mutual support with other business functions. To achieve great security results we need to value and build strong internal relationships.

A common problem

Do you run a cybersecurity program, but it feels like you are the only one who cares about it? Then you are unfortunately in a tough position, and your probability of success will be very low. To succeed with securing an organization’s critical processes, everyone must contribute. In order for that to happen, everybody must care.

Why don’t people care about cybersecurity? People are generally busy, and there are a million good causes seeking attention. Someone tasked with cybersecurity as their primary area of concern will naturally see it as one of the most important topics, but in gaining traction among the rest of the staff you are competing with climate change, profitability, growth, talent development, innovation projects, and many more things. To get people on your side, you will need to make it important for them; as a cybersecurity manager you will need to engage in internal marketing! In this blog post I will explore reasons different employee categories have for not engaging in cybersecurity work, and suggest steps that can be taken to change attitudes.

If cybersecurity is considered something only IT and tech people need to care about, almost like a guardian on the hill, you won’t be able to engage the whole workforce. (Picture: the castle in Vaduz seen from the town – an interesting place to visit)

Management is not interested in security

Whether you are a CISO not invited to the C-suite meetings where decisions are made, or the person responsible for IT security in the IT department, being left out of decisions, with lots of responsibility but few resources, is unfortunately a common situation. In companies where this is the case, one or more of the following attitudes are common in the management team:

  1. Cyber incidents won’t happen if we use well-known IT brands
  2. Cybersecurity does not contribute to the company’s mission, therefore we also don’t need to spend time on it
  3. Cybersecurity is invisible, therefore there is nothing I can do about it
  4. It won’t disrupt us, we have talented people in the organization who can handle any situation
  5. Cybersecurity is only a compliance issue, if we do the minimum necessary to pass the audit we will be OK

When this is the case, you have a tough marketing job to do. Jumping to talking about solutions and investment needs will probably not do much good here.

Homework: align security objectives with the company’s strategy

Before you can convince anyone else, you will need to know how security supports the strategy. Where is the company heading? What are the overall goals? How does digital fit into this? If you can’t answer this, it will be hard to talk to other management functions about priorities and what matters.

To get ahead with this work, a business impact assessment (BIA) is a very good tool. In a business impact assessment you identify how disruptive events will impact your most important business processes, and also what to do about it. For example, if your company is betting on high growth through partnerships with retailers, investigate the impact of digital events on those partnerships. For how to do a digital BIA, see this post: What is the true cost of a cyber attack?

Find allies and ambassadors in the management team

Not everybody cares equally about every topic. Some members of the management team you are trying to influence will be more receptive to your message than others. Getting one or two well-respected leaders on your side to help amplify your messaging can help immensely in getting the message across. To recruit supporters, prioritize being helpful, spending time with them, and helping them get ahead with their own work. Here are some things you can do:

  1. When they communicate about something they care about, comment on it and make your support visible to them. Mention how cybersecurity is either helped by their initiative or how cybersecurity can help their initiative
  2. Ask them for advice on things you are working on, in the context they are working in.
  3. Provide them with easy to use talking points that they can bring up to support cybersecurity in rooms where you are not present. Avoid jargon, make it interesting and easy to talk about.
  4. Invite them for a coffee break, a walk, or a lunch. Build that relationship.

Engage in visual storytelling

Set up an internal marketing campaign. This can be monthly newsletters, short internal videos, or in-person meetings. Keep the storytelling short, jargon-free and to the point. Use structure and visuals to support your stories – and try to get a single point across each time instead of bombarding people with too much information to handle. Make sure the story fits the audience in terms of appeal, language, and ability to use the information for something.

Contrast, for example, the way bleepingcomputer.com (a tech website) describes the faulty CrowdStrike update last week that crashed millions of computers and disrupted many businesses globally, with how the same events are portrayed by general news media (for example CNN):

Bleepingcomputer: technical details, jargon, workarounds for IT people.

CNN: no jargon, explaining what Crowdstrike is, focus on impact, comments about risks for IT consolidation.

Be more like CNN than Bleepingcomputer when talking to non-experts, and put it into your organization’s context. For example, the CrowdStrike event, which people are likely to have read about in general news, could be used to increase attention to software supply-chain security.

Make benefits from security investments clear

Nobody is really interested in looking at security dashboards, but having a few metrics to show how security efforts are actually supporting the business and paying off is a good idea.

  • Connect security posture to business impact and risk. Showcase how investments improve posture and reduce risk. Make it simple.
  • Use metrics that capture the dynamics of people, processes and technology. Make it clear that success depends on the organization, not only buying technology from well-known brands.
  • Distribute the results at the right time, and with relevant context.
  • Suggest a regular reporting cycle to top management. Align reporting with regulatory compliance and corporate governance processes so it doesn’t show up as “a new cybersecurity report”, but as an integrated part of management reporting.

It is going to take time. Be patient, and prioritize getting people on board and building relationships before you pile on facts. Be consistent and to the point in your messaging, and make yourself available for follow-ups. Make progress by making calls to action easy to agree to.

Other functional managers competing for attention are sabotaging cyber initiatives to further their own cause

You are competing internally with many other good causes, such as business growth, innovation, diversity initiatives, and efficiency-boosting IT projects. People who own those processes may see cybersecurity as something causing friction for their own initiatives, as well as something that competes for attention from the management team. If internal functional managers are fighting each other, it is certainly not good for the company.

Photo by Helena Lopes on Pexels.com: An informal chat over coffee may do more good for your security performance than yet another log source in your SIEM (software for detecting attacks by analysing logs from IT systems).

To avoid destructive conflict, help other functional managers succeed. Look for ways improvements in security can strengthen the goals of other functions. For example, a growth initiative depending heavily on digital technologies will also be more vulnerable to disruption from cyber attacks. Engaging with the manager of the growth initiative on making it more robust and less vulnerable is likely to bring you new friends and allies, as well as actually contributing to improved security for the organization. This can also be a powerful story to tell, together, to the management team.

A primary concern for process owners is often friction caused by security controls. If your security controls are making it harder for others to succeed, they won’t support security. There are some important steps to avoiding this situation:

  1. Understand the impact of security controls on the business process
  2. Build understanding for why we need barriers against unwanted events, such as hacking
  3. Prioritize balance between performance and security when a trade-off is necessary. Try to find good, low-friction controls.
  4. Make sure the “why security is important here” is understood by everyone who works with the process

This is definitely not something you can win without good relationships with people. You need to get the process owner on your side. Building good internal relationships is a critical activity for achieving good security. Hence, important tools for security improvement include:

  • Coffee breaks
  • Situational awareness
  • Productivity vs. security trade-offs

You will probably benefit from approaching process owners in a similar way to senior managers, but perhaps with a more hands-on approach focusing on the particular process, initiative or function.

Dealing with the internal adversary

If you have other functional managers competing with you for resources and downplaying the importance of security, you need to take action. The opposition may be open, or it may be more covert. Typically, sabotage will consist of a combination of some direct opposition, some microaggressions, and some undermining of your area when you are not around. If you suspect that you are meeting such opposition, make sure you understand the situation correctly before you take action against it.

The first step is thus to have a respectful but honest conversation with the person who sees you as their opponent. Try to find out what their actual goals are, and whether you have understood things correctly, instead of escalating to a more difficult situation. If you can find some common ground and agree to collaborate moving forward, you may be able to defuse the situation already here.

Photo by Gratisography on Pexels.com: Internal fighting over resources is natural, but can evolve into unhealthy conflict. Stop it before it does: your organization is working towards common goals.

If you cannot resolve the situation yourselves, try to agree to bring in someone else to help you sort things out. This can be your managers, or a trusted third party to mediate. Make sure you can agree to a path forward and focus on that.

If you see micro-aggressions, general bad behavior meant to make you less influential, or outright bullying, you should take rapid action. If such behaviors are allowed to manifest, they can jeopardize not only your own health and wellbeing, but that of others too, and will certainly not contribute to good results. Constructive conflict is good; bullying is not. This article from HBR explains the topic well, including strategies to stop the bad behavior: https://hbr.org/2022/11/how-bullying-manifests-at-work-and-how-to-stop-it. Dealing with bullying will require hard conversations and involving management early. The organization should put structures in place that don’t support such behaviors, as well as routines for handling transgressions when they move from acceptable conflict to unhealthy conflict.


Getting the organization on board with security

It is clear that relationships matter, also for security. It is also important to make the benefits of security investments visible, and to ensure that a common situational awareness can be maintained, so that everyone pulls in the same direction. When done right, there is no conflict between the goals of different functional areas and the goals of security; you are contributing to the same strategic vision for your organization.

To succeed you need backing from top management. This may not come naturally, or for free. Think like a marketer and build demand for security in your organization. Be a security salesperson and build relationships with key decision makers. Make sure you have allies in rooms where you are not present. This is easier said than done, and requires continued effort.

Underpinning all of this is situational awareness. Your job is really to create the situational awareness that allows security to be integrated into corporate governance, business process design and daily operations. And to allow that to happen, you need to win over the hearts and minds of your colleagues. Before people understand why security matters, they won’t care about how security is achieved. To paraphrase Simon Sinek: start with the why.

Simon Sinek’s Ted talk from 2009: Start with why

What is the true cost of a cyber attack?

All businesses depend on digital tools, making them all vulnerable to cyber attacks to a greater or lesser degree. We regularly read about cyber attacks in the media, and figures for the cost of the average data breach are reported in various publications – ranging from small and insignificant to billions. If you operate a 5-person UX design consultancy, an average cost based on Fortune 500 company incidents is obviously not very useful.

The true cost is the combination of impact across multiple categories. First, the immediate costs include lost current business and direct handling costs. Long-term costs include lost future business, liability costs, need for extra marketing to counteract market loss of trust, as well as follow-on technology and process improvement costs due to identified security gaps. The actual cost depends on the readiness to handle the event, including backup procedures and training to use them.

Let’s consider the UX consultancy Clickbait & Co, and help them think about the potential cost of cyber attacks with the aid of a BIA. The founder and CEO Maisie “Click” Maven had been listening to a podcast about cyber disruption while doing her 5am morning run along the river, and called the CTO into her office early in the morning. The CTO, Dr. Bartholomew Glitchwright, was a technical wizard who knew more about human-machine interaction than was good for him. Maisie told him: “I am worried about cyber attacks. How much could it cost us, and what are the worst-case things we need to plan for?”. Dr. Glitch, who usually had an answer to everything, said “I don’t know. But let’s find out – let’s do a BIA”.

Maisie’s run along the river, while meditative, is also causing new worries through the impact of podcasts

Dr. Glitch’s BIA approach

Dr. Glitch is always in favor of systematic approaches, and doing BIAs was no exception. He liked to follow a 7-step process:

  1. Identify the value creating business processes
  2. Describe the business processes in flowcharts
  3. For each flowchart, annotate with digital dependencies in terms of applications, data flows, users and suppliers
  4. Create a “super-flow” connecting the business processes together to map out dependencies between them, from sales lead to customer cash flow. This can be done at the end too, but it is important for assessing the cross-process impact of cyber events.
  5. Consider digital events with business process impact:
    • Confidentiality breaches: data leaks, data theft
    • Integrity breaches: data manipulation
    • Availability breaches: unavailability of digital tools, users, data
  6. Assess the impact, starting with direct impact. For each digital event, assess business process impact in terms of downtime (day, week, month). Mark the most likely duration of disruption.
  7. Evaluate the total cyber disruption cost (TCDC) – see the calculation sketch after this list – including:
    • Immediate costs: lost current business, recovery costs
    • Longer-term costs: lost future business, increased marketing spend, technology investment needs, legal fees
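A minimal sketch of the TCDC arithmetic, using the figures from the O365 ransomware case assessed in the table further down (amounts in NOK):

# Total cyber disruption cost (TCDC) = immediate costs + longer-term costs.
# The figures below are the O365 ransomware case from the assessment
# table later in this post, in NOK.
def tcdc(immediate: dict, longer_term: dict) -> int:
    return sum(immediate.values()) + sum(longer_term.values())

immediate = {"lost_current_business": 400_000, "recovery_consultants": 400_000}
longer_term = {
    "marketing": 150_000,
    "lost_future_business": 2_000_000,
    "cyber_improvements": 100_000,
}
print(tcdc(immediate, longer_term))  # 3 050 000 NOK, roughly 3 MNOK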

Dr. Glitch and Maisie got coffees and got to work. They decided to focus on the main processes of the UX firm:

  • Sales
  • Digital design
  • Invoicing and accounting

Sales

They created simple flow charts for the 3 processes, starting with sales. Sales in the firm were mostly handled by the two of them. They had two main sources of business: requests for proposals from customers coming in to their e-mail inbox, and direct sales outreach by phone and e-mail. The outline of the process looks as follows:

Now they started annotating the flowchart with the digital dependencies.

They had identified several digital dependencies here:

  • Hubspot CRM: used for all CRM activity, lead capture, tracking deals, etc
  • Office 365: create sales decks, e-mail, video meetings
  • uxscan.py: internally developed tool to identify poor practice on web pages, used for identifying prospects to contact, and also in their own QA work
  • Digisign: a digital signature service used to sign contracts before work is started

As for users, they identified their own personal user accounts. Dr. Glitch had set up SSO for Hubspot and Digisign, so there was only one personal account to care about. The Python script was run on their own laptops, no user account required. There were a few integrations: between Office 365 and Hubspot, and between WordPress and the Hubspot lead capture form (not using an API here, just a simple iframe embed).

Dr. Glitch had made a shortlist of events to consider in the cyber BIA:

  • Ransomware targeting Sharepoint/Office365
  • Hacking of user accounts, followed by data theft (Hubspot, O365)
  • Manipulation of uxscan.py code
  • DDoS of Digisign, Hubspot, Office365
  • Hacking of user accounts, followed by issuing fake contracts or bids (signed with Digisign)
  • Data breach of personal data (Hubspot)

Then, together, they assessed the impact of each event on the shortlist in a table. The average deal value for Clickbait & Co is NOK 400 000.

| Digital asset | Worst-case impact | Immediate cost | Long-term cost | Total cost |
|---|---|---|---|---|
| O365 | Data leak and encryption (ransomware) | 1 week downtime: 1 lost deal (base case); 2 week downtime: 2 lost deals. Recovery consultants: 200 hours x 2 000 NOK/hr = 400 000 | Marketing campaign to reduce brand damage: NOK 150k; lost business: 5 deals = 2 MNOK; legal fees: none, assuming no GDPR liability; cyber improvements: 100 000 | Immediate (800k) + long-term (150k + 2M + 100k) = 3 050 000, roughly 3 MNOK |
| Hubspot | Theft of customer list and deal sizes by a competitor. Duration may be short or ongoing, but the disruptive effect can be long-term | No immediate business impact | Future lost business: 30% of bids in the first year, 20 deals x 400k = 8 MNOK; possible GDPR fine: 500 kNOK | 8.5 MNOK |
| Digisign | DDoS: cannot sign digitally, resort to manual process | No immediate impact, reduced efficiency | No long-term impact, reduced efficiency | 0 MNOK |
| WordPress website | Unavailability – no leads collected | Lost business: assume 1 lost customer for a week of downtime; direct cost: up to 50k to reestablish the website after a destructive attack | Loss of trust, leading to 1 lost future deal | 850 kNOK |

Business impact from cyber events disrupting the sales process

From this quick high-level assessment they decide that a few mitigating activities are in order for the sales process; they need to improve the security of the O365 environment. This will likely include buying a more expensive O365 license with more security features, and setting up a solid backup solution, so it will carry some cost.

For the Hubspot case the impact is high, but they are unsure whether the security is good enough. They decide to do a risk assessment of the Hubspot case, to see if anything needs to change. Maisie also decides to do a weekly export of ongoing deals, to make sure an event making Hubspot unavailable can’t stop them from bidding on jobs in the short term.

For the Digisign case, they agree that this is a “nice-to-have” in terms of availability. They discussed the case of an attacker creating fake offers from Clickbait & Co and signing them with Digisign, but agree that this is far-fetched and not worth worrying about.

The BIA is a very useful tool to decide where you need to dig more into risk assessments and continuity planning – that is the primary value, not the cost of the worst-case impact itself.

Dr. Glitch.

Some thoughts on BIAs for information processing events

Looking at the business impact of cyber attacks on the sales process, we see that some events are expected to cause long-term damage to the business without upsetting the internal workings of the process (information theft, data leaks). This is different from what we would find in BIAs focusing on aspects other than information processing, but it does not make handling the event less important.

For events that lead to immediate disruption of the process, we can use traditional metrics such as recovery time objective (RTO) and recovery point objective (RPO). The former is the target for when the system should be back up and functioning again, and the latter is about how much data loss you accept: it dictates the maximum window of data loss that is acceptable in an event requiring recovery. For example, an RTO of 2 days and an RPO of 4 hours mean the system must be restored within 2 days, with at most the last 4 hours of data lost.
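A small sketch of the RPO logic: with periodic backups, the worst-case data loss equals the backup interval, so the interval must not exceed the RPO (the values used are illustrative):

# With periodic backups, the worst-case data loss equals the backup
# interval, so the interval must not exceed the RPO.
def meets_rpo(backup_interval_hours: float, rpo_hours: float) -> bool:
    return backup_interval_hours <= rpo_hours

# With the 4-hour RPO from the summary table below, nightly backups
# would not be sufficient:
print(meets_rpo(backup_interval_hours=24, rpo_hours=4))  # False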

Summarizing the findings from Maisie’s and Dr. Glitch’s business impact assessment, we can create the following table:

| Process | Events | Impact | Recovery targets | Immediate action |
|---|---|---|---|---|
| Sales | Ransomware attack | Downtime and data leak. Cost 3 MNOK | RTO: 2 days; RPO: 4 hours | Gap assessment of security practices for O365 and backup |
| Sales | Data theft from Hubspot by competitor | Long-term business loss, possible GDPR fine. Cost 8.5 MNOK | No process disruption. Mitigation requires marketing and communication efforts, future improvements, possibly certification/audits | Risk assessment |

Dimensioning BIA events presented per process.

Finally, let’s summarize the process. The purpose is, for each process, to find the dimensioning disruptive events and decide what the next step should be. The next step could be one of the following:

  1. Do nothing (if the expected impact is low)
  2. Do improvements (if it is obviously a problem and clear improvements are known)
  3. Perform a risk assessment (if the uncertainty about the events is too high to move to improvements directly)

In other words: look at each process alone, identify the impact of disruptive events, and plan next steps. After this is done for all processes, review how the processes depend on each other, to see if disrupting one process will impact another. If this is the case, that process should be given higher priority in continuity planning and risk management.
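The decision logic in the three-step list above can be sketched as a simple function; the impact and uncertainty labels are illustrative:

# Sketch of the per-process next-step decision after a BIA.
# The impact and uncertainty labels are illustrative assumptions.
def next_step(expected_impact: str, uncertainty: str) -> str:
    if expected_impact == "low":
        return "do nothing"
    if uncertainty == "high":
        return "perform a risk assessment"
    return "do improvements"

print(next_step("high", "high"))  # perform a risk assessment
print(next_step("high", "low"))   # do improvements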


Zero-Day OT Nightmare? How Zero-Trust Can Stop APT attacks

It was a crisp summer Monday, and Alex, the maintenance engineer at Pulp Friction Paper Company, arrived with his coffee, ready to tackle the day. He reveled in the production regularity achieved thanks to his recently implemented smart maintenance program. This program used machine learning to anticipate condition degradation in machinery, a significant improvement over the facility’s previous reliance on traditional periodic maintenance or the ineffective risk-based approaches.

Alex, a veteran at Pulp Friction, had witnessed the past struggles. Previously, paper products were frequently rejected due to inconsistencies in humidity control, uneven drying, or even mechanical ruptures. He was a firm believer in leveraging modern technology, specifically AI, to optimize factory operations. While not a cybersecurity expert, his awareness wasn’t limited to just using technology. He’d read about the concerning OT (Operational Technology) attacks in Denmark last year, highlighting the inherent risks of interconnected systems.

As a seasoned maintenance professional, Alex understood the importance of anticipating breakdowns for effective mitigation. He empathized with the security team’s constant vigilance against zero-day attacks – those unpredictable, catastrophic failures that could turn a smooth operation into a major incident overnight.

Doctors analysing micro blood samples on advanced chromatographic paper from Pulp Friction Paper Mills

Dark clouds over Pulp Friction

A digital phantom stalked the web. “The Harvester,” a notorious APT (Advanced Persistent Threat) group known for targeting high-value assets, had Pulp Friction Paper Company in their crosshairs. Their prize? Not paper, but a revolutionary innovation: medical diagnostic paper. Pulp Friction had recently begun producing these specialized sheets, embedded with advanced materials, for use in chromatographic tests. This cutting-edge technology promised rapid diagnosis of a multitude of diseases from mere microliter blood samples, a potential game-changer in the medical field. Unbeknownst to Alex, a gaping zero-day vulnerability resided within the facility’s industrial control system (ICS) software. If exploited, The Harvester could wreak havoc, disrupting production of these life-saving diagnostic tools and potentially delaying critical medical care for countless individuals. The stakes had just been raised. Could Alex, with his limited cybersecurity awareness, and the current defenses, thwart this invisible threat and ensure the smooth flow of this vital medical technology?

A wave of unease washed over Alex as he stared at the malfunctioning control panel. The usually predictable hum of the paper production line had been replaced by a cacophony of alarms and erratic readings. Panic gnawed at him as vital indicators for the chromatographic test paper production process lurched erratically. This wasn’t a typical equipment malfunction – it felt deliberate, almost malicious.

Just then, a memory flickered in Alex’s mind. Sarah, the friendly and highly skilled network security specialist he occasionally consulted with, had been pushing for a new security system called “zero-trust.” While Alex appreciated Sarah’s expertise, he hadn’t quite understood the nuances of the system or its potential benefits. He’d brushed it off as an extra layer of complexity for an already demanding job.

Now, regret gnawed at him alongside the growing sense of dread. Grabbing his phone, Alex dialed Sarah’s number, his voice laced with a tremor as he blurted out, “Sarah, something’s terribly wrong with the ICS system! The readings are all messed up, and I don’t know what’s happening!” The urgency in his voice was impossible to miss, and Sarah, sensing the dire situation, promised to be there as soon as possible. With a heavy heart, Alex hung up, the echo of his own ignorance a stark reminder of the consequences he might have inadvertently unleashed by ignoring the recommendations on network security improvements.

The Harvester: a technical intermezzo

Diagram made with mermaid.live

The Harvester is capable of zero-day research and exploit development. In this attack they are targeting companies that use advanced technologies to supply healthcare providers – and many of those companies use innovative maintenance systems.

They first find a web-exposed server used by the AI-driven maintenance system. The system is Internet exposed due to the frequent need for access by multiple vendors. By exploiting the vulnerability there, they gain root access to the underlying Linux operating system. The Harvester, like many other threat actors, then installs a web shell for convenient persistent access, and continues using conventional techniques. Reaching the engineering workstation, the attacker is able to reprogram PLCs and disable safety features. Having achieved this, the system is no longer a highly reliable production system for diagnostic test paper: it is a bleeping mess that spills water, breaks paper lines and is difficult to restore.

They then pose as Pulp Friction employees, leaking CCTV footage of the mess on the factory floor showing panicking employees running around, and post on social media claiming that Pulp Friction never cared about reliability or security, that money was the only goal without any regard for patient safety, and that this company should never be allowed to supply anything to hospitals or care providers!

What it took to get back to business

Sarah arrived at Pulp Friction, a whirlwind of focused energy. Immediately, she connected with Alex and reviewed the abnormal system behavior. Her sharp eyes landed on the internet access logs for the smart maintenance system – a system Alex had mentioned implementing. Bingo! This web-exposed system, likely the initial point of entry, was wide open to the internet. Without hesitation, Sarah instructed the IT team to isolate and disable the internet access for the maintenance system – a crucial first step in stemming the bleeding.

“The only thing necessary for the triumph of evil is for good men to do nothing.”

Edmund Burke

Cybersecurity meaning: “Don’t be a sitting duck the day the zero-day is discovered.”

Next, she initiated the full incident response protocol, securing compromised systems, isolating affected network segments, and reaching out to both the Pulp Friction IT team and external forensics experts. The following 48 hours were a blur – a symphony of collaboration. Sarah led the incident response, directing forensics on evidence collection and containment, while the IT team worked feverishly to restore services and patch vulnerabilities.

Exhausted but resolute, Sarah and Alex presented their findings to the CEO. Witnessing the team’s dedication and the potential consequences, the CEO asked Sarah to put together a plan for comprehensive security improvements that could finally put Pulp Friction on the path to a more robust defense.

With the immediate crisis averted, Sarah knew a stronger defense was needed. She turned to Alex, his eyes reflecting a newfound appreciation for cybersecurity. “Remember zero-trust, Alex? The system I’ve been recommending?” Alex nodded, his earlier skepticism replaced by a desire to understand.

“Think of it like guarding a high-security building,” Sarah began. “No one gets in automatically, not even the janitor. Everyone, from the CEO to the maintenance crew, has to show proper ID and get verified every time they enter.”

Alex’s eyes lit up. “So, even if someone snuck in through a hidden door (like the zero-day), they wouldn’t have access to everything?”

“Exactly!” Sarah confirmed. “Zero-trust constantly checks everyone’s access, isolating any compromised systems. Imagine the attacker getting stuck in the janitor’s closet, unable to reach the control room.”

Alex leaned back, a relieved smile spreading across his face. “So, with zero-trust, even if they got in through that maintenance system, they wouldn’t be able to mess with the paper production?”

“Precisely,” Sarah said. “Zero-trust would limit their access to the compromised system itself, preventing them from reaching critical control systems or causing widespread disruption.”

With the analogy clicking, Alex was on board. Together, Sarah and Alex presented the zero-trust solution to the CEO, emphasizing not only the recent attack but also the potential future savings and improved operational efficiency. Impressed by their teamwork and Sarah’s clear explanation, the CEO readily approved the implementation of zero-trust and segmentation within the OT network.

Pulp Friction, once vulnerable, was now on the path to a fortress-like defense. The zero-day vulnerability might have been a wake-up call, but with Sarah’s expertise and Alex’s newfound understanding, they had turned a potential disaster into a catalyst for a much stronger security posture. As production hummed back to life, creating the life-saving diagnostic paper, a sense of quiet satisfaction settled in. They couldn’t erase the attack, but they had ensured it wouldn’t happen again.

How Alex and Sarah collaborated to achieve zero-trust benefits in the OT network

Zero-trust in the IT world relies a lot on identity and endpoint security posture. Both those concepts can be hard to implement in an OT system. This does not mean that zero-trust concepts have no place in industrial control systems, it just means that we have to play within the constraints of the system.

  • Network segregation is critical. Upgrading from old firewalls to modern firewalls with strong security features is a big win.
  • Use smaller security zones than what has traditionally been accepted.
  • For Windows systems in the factory, on Layer 3 and the DMZ (3.5) in the Purdue model, we are primarily dealing with IT systems. Apply strong identity controls, and make systems patchable. The excuse is often that systems cannot be patched because no downtime is allowed, but virtualization and modern resilient architectures allow for workload management and zero-downtime patching. We just need to plan for it!
  • For any systems with weak security features, compensate with improved observability.
  • Finally, don’t expose things to the Internet. Secure your edge devices, and use DMZs, VPNs, and privileged access management (PAM) systems with temporary credentials.
  • Don’t run things as root/administrator. You almost never need to (see the sketch after this list).
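
To make the last point concrete, here is a minimal sketch of what running an application as a dedicated non-privileged user can look like on a Linux host managed with systemd. The service name, account and paths are hypothetical stand-ins, not a specific product:

# /etc/systemd/system/maintenance-agent.service (hypothetical service)
[Unit]
Description=Smart maintenance data collector

[Service]
# Run as a dedicated non-privileged account instead of root
User=maintsvc
Group=maintsvc
# Prevent the service from gaining new privileges via setuid binaries
NoNewPrivileges=true
# Mount /usr, /boot and /etc read-only for this service
ProtectSystem=full
ExecStart=/opt/maintenance/agent

[Install]
WantedBy=multi-user.target

Had the web-exposed maintenance application been confined like this, the initial exploit would have yielded a low-privilege foothold instead of root access.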

In a system designed like this, the maintenance server would not be Internet exposed. The Harvester would have to jump through a lot of hoops to land their exploit on it. Assuming the threat actor manages to do that through social engineering and multiple hops of lateral movement, it would still be very difficult to move on from there:

  • The application isn’t running as root anymore – only non-privileged access
  • The server is likely placed in its own information management zone, receiving data through a proxy or some push/pull data historian system. Lateral movement will be blocked on the firewall, or at least require a hard-to-configure bypass.
  • The engineering workstations are isolated and not network reachable without a change request for firewall rules. Getting to the place the settings and logic can be changed gets difficult.
  • The PLCs are configured to not be remotely programmable without a physical change (like a physical key controlling the update mode).

Using Sarah’s plan, the next time The Harvester comes along, the bad guy is turned away at the door, or gets locked into the janitor’s closet. The diagnostic paper is getting shipped.

Key take-aways:

  1. Exposing critical systems directly on the Internet is not a good idea, unless it is meant to be a web service engineered for that type of hostile environment
  2. Zero-trust in OT systems is possible, and is a good strategy to defend against zero-days.
  3. “Defenders must be right all the time, hackers only need to get lucky once” is a lie – if you implement good security architecture. Lucky once = locked into the janitor’s closet.

The security sweet spot: avoid destroying your profitability with excessive security controls

Excessive security controls, applied when the organization isn’t ready, cause friction and destroy value. Learn to identify your organization’s security sweet spot and avoid making the security team the most unpopular group in your company.

Many cybersecurity professionals are good at security but bad at supporting their organizations. When security takes priority over the mission of the organization, your security team may be just as bad for business as the adversary. Security paranoia will often lead to symptoms such as:

  • Security controls introducing so much friction that people can’t get much done. The best employees give up and become disengaged or leave.
  • Mentioning IT to people makes them angry. The IT department in general, and security team in particular, is hated by everyone.
  • IT security policies are full of threats of disciplinary actions, including reporting employees to the police and firing them.

Security, when done wrong, can be quite toxic. When security aligns with the culture and mission of the organization, it creates value. When it is abrasive and misaligned, it destroys value. Paranoia is destructive.

An illustrative graph showing that the more security you add, the better it is, until it isn’t.

The minimum on the green line in the graph is perhaps the sweet spot for how much security to apply. The difficulty is in finding that sweet spot. It is also not a fixed point on the scale – it slides. As the maturity of the organization develops, the sweet spot will move to the right on the graph. Higher maturity in the organization will allow you to tighten security without destroying value through friction, inefficiencies and misalignment.

As the organization’s workflows and competence matures, it can benefit from tightening security

If you want to kick-start security improvements at work, consider e-mailing this article to your manager with your own take on what your organization’s security sweet spot is

Finding your sweet spot and translating it into security controls

Finding the sweet spot can be challenging. You want to challenge the organization, and help it grow its security maturity, without causing value destruction and disengagement. To achieve this, it is helpful to think about 3 dimensions in your security strategy:

  1. Business process risk
  2. Lean process flow with minimal waste
  3. Capacity for change

If you want to be profitable, keep an engaged workforce, and maintain a high level of security, you need a good understanding of cyber risk, established digital work processes that do not get in the way of the organization’s goals, and a motivated workforce that welcomes change. If you are starting in a completely different place, tightening security can easily destroy more value than it protects.

Understanding your business process cyber risk is necessary so that you can prioritize what needs to be protected. There are many methods available to assess risks or threats to a system. The result is a list of risks, with a description of possible causes and consequences, an evaluation of likelihood and severity, and suggested security controls or mitigations to reduce the risk. No matter what process you use to create the risk overview, you will need to:

  • Describe the system you are studying and what about it is important to protect
  • Identify events that can occur and disturb the system
  • Evaluate the risk contribution from each of these events
  • Find risk treatments

If the risk to your business value from cyber attacks is very high, it would indicate a need for tighter security. If the risk is not too worrying, less security tightness may be appropriate.

The next step is about your workflows. Do you have work processes with low friction? Securing a cumbersome process is really difficult. Before you apply more security controls, focus on simplifying and optimizing the processes such that they become lean, reliable and joyful to work with. Get rid of the waste! If you are far from being lean and streamlined, be careful about how much security you apply.

The final point is the capacity for change. If the workforce is not too strained, has a clear understanding of the strategic goals, and feels rewarded for contributing to the organization’s mission, the capacity for change will typically be high. You can then introduce more security measures without destroying value or causing a lot of frustration. If this is not in place, establishing it is a prerequisite for going deep on security measures.

To summarize – make sure you have efficient value creation processes and enough capacity for change before you apply a lot of security improvements. If your organization sees a high risk from cyber attacks, but has low process efficiency and limited capacity for change, it would be a good approach to apply basic security controls, and focus on improving the efficiency and capacity for change before doing further tightening. That does mean operating with higher risk than desired for some time, but there is no way to rush change in an organization that is not ready for it.

Security growth through continuous improvement is the way.

Like what you read? Remember to subscribe – and share the article with colleagues and friends!

The balancing act

Consider an engineering company that provides engineering services for the energy sector. They are worried about cyber attacks delaying their projects, which could cause big financial problems. The company is stretched, with most engineers routinely working 60-hour weeks. The workflows are highly dependent on the knowledge of individuals, and not much is documented or standardized. The key IT systems they have are Windows PCs and Office 365, as well as CAD software.

The CEO has engaged a security consulting company to review the cybersecurity posture of the firm. The consulting report shows that the company is not very robust against cyber attacks and that security awareness is low. The cyber risk level is high.

The CEO, herself an experienced mechanical engineer, introduces a security improvement program that will require heavy standardization and the introduction of new administrative software and processes, and that will limit the engineers’ personal freedom in their choice of working methods. She meets massive opposition, and one of the most senior and well-respected engineering managers says: “This is a distraction – we have never seen a cyber attack before. We already work a lot of overtime, and cannot afford to spend time on other things than our core business, which is engineering.” The other lead engineers support this view.

The CEO calls the consultants up again, and explains that she will have difficulties introducing a lot of changes, especially in the middle of a big project for one of the key customers. She asks what the most important security measures would be, and gets a list of key measures that should be implemented, such as least-privilege access, multifactor authentication and patching. The CEO then makes a plan to roll out MFA first, and then to focus on working with the engineers to improve the workflows and reduce “waste”. With a step-by-step approach, they have seen some security wins, and after 12 months the organization is in a much healthier state:

  • Engineers no longer log on to their PC’s as administrators for daily work
  • MFA is used everywhere, and 90% of logons are now SSO through Entra ID
  • They have documented, standardized and optimized some of the work processes that they do often. This has freed up a lot of time, 60-hour weeks are no longer the norm.
  • The CEO has renewed focus on strategic growth for the company, and everyone knows what the mission is, and what they are trying to achieve. Staff motivation is much higher than before.

Thanks to the CEO’s good understanding of the organization, and helpful input from the security consultants, the actual security posture is vastly improved, even with few new security controls implemented. The sweet spot has taken a giant leap to the right on the attacker-paranoia graph, and the firm is set to start its maturity growth journey for improved cybersecurity.

The key take-aways

  • Don’t apply more security tightness than the organization can take. That will be destructive.
  • Assess the security needs and capacity by evaluating risk, business process efficiency and capacity for change
  • Prioritize based on risk and capacity, improve continuously instead of trying to take unsustainable leaps in security maturity

Secure Multihomed Devices in Control Networks: Managing Risks and Enhancing Resilience

In control networks, where ensuring constant communication and reliable operation is critical, devices are frequently configured to be multihomed. This means they possess connections to multiple separate networks. This approach is favored over traditional routing methods where traffic is passed between networks. The advantage lies in the redundancy and potential performance boost multihoming offers. If one connection malfunctions, the device can seamlessly switch to another, maintaining vital communication within the control network. Additionally, multihoming allows for the possibility of utilizing different networks for specific traffic types, potentially optimizing overall control network performance.
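
On a Linux host, a quick way to check whether a device is multihomed is simply to list its interfaces and addresses; a device with addresses on two separate subnets is a candidate bridge between networks. A hypothetical, abbreviated example:

ip -br addr show
lo     UNKNOWN  127.0.0.1/8
eth0   UP       10.0.1.20/24
eth1   UP       10.0.2.30/24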

While multihoming offers redundancy and performance benefits in control networks, it introduces security risks if the connected networks are meant to be entirely separate. Here’s why:

  1. Bridging Separate Networks: A multihomed device acts like a bridge between the networks it’s connected to. If these networks should be isolated for security reasons (e.g., a control signal network and a configuration network), the multihomed device can unintentionally create a pathway for unauthorized access. A malicious actor on one network could potentially exploit vulnerabilities on the device to gain access to the otherwise isolated network.
  2. Policy Bypass: Firewalls and other security measures are typically implemented at network borders to control traffic flow. With a multihomed device, traffic can potentially bypass these security controls altogether. This is because the device itself can become a point of entry, allowing unauthorized traffic or data to flow between the networks, even if the network firewalls have proper rules in place.
  3. Increased Attack Surface: Each additional connection point represents a potential vulnerability. With a multihomed device, attackers have more opportunities to exploit weaknesses in the device’s security or configuration to infiltrate one or both networks.

Bypassing firewalls: an example

Consider a system with two networks, where traffic between them is routed through a firewall. Network B is considered critical for real-time operations and carries primarily control system protocols such as Modbus; this traffic is not encrypted. Network A is primarily used for configuring systems and reprogramming controllers, and most of its traffic is encrypted. Remote access is accepted into Network A, but not into Network B.

On the firewall, all traffic between A and B is blocked during normal operation. When a controller in Network B needs to be updated, a temporary firewall rule allowing the traffic is added.
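
What such a temporary rule looks like depends on the firewall product. As a hedged sketch, on a Linux-based firewall running nftables (assuming an existing table inet filter with a forward chain, hypothetical addresses, and Modbus TCP on port 502), it could be along these lines:

# Temporarily allow the engineering host (10.0.1.20) to reach the controller (10.0.2.50)
sudo nft add rule inet filter forward ip saddr 10.0.1.20 ip daddr 10.0.2.50 tcp dport 502 accept
# When the update is done: list rule handles, then delete the temporary rule
sudo nft -a list chain inet filter forward
sudo nft delete rule inet filter forward handle 42

The weak point of this routine is forgetting the last step – temporary rules have a tendency to become permanent.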

Computer 2 is multi-homed and can be used to bypass the firewall

Along comes the adversary, who manages to use remote access to compromise Computer 1 and take over a local administrator account. The attacker then moves laterally to Computer 2 over the Network A interface, securing an SSH shell on Computer 2. From this shell, the attacker has access to the control network over the second network interface, and runs a network scan from Computer 2 to identify the devices in Network B. From there, the attacker is able to manipulate devices and network traffic to cause physical disruption, and the plant shuts down.

What are the options?

Your options to reduce the risk from multihomed devices may be limited, but keeping things as in the example above is definitely risky.

  • The ideal solution: Remove any multi-homed setups, and route all traffic through the firewall. This way you have full control of what traffic is allowed. It may not be possible if the added latency is too high, but that is a rare constraint.
  • The micro-segmented solution: Keep the network interfaces, but add stateless firewalls on each network card to limit the traffic. The multi-homed device then becomes its own network segment. Using this to implement a default-deny policy will greatly improve the security of the solution.
  • Device hardening: This should be done for all the solutions, but can also be a solution in its own right. Keep the multi-homed behavior in place, but harden the device so that taking it over becomes really difficult. Disable all unused services, run all applications with minimal privileges, and use the host-based firewall to limit the traffic allowed (both ingress and egress). A sketch of what this can look like follows this list.
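
As a sketch of the last two options, a default-deny host firewall on the multi-homed device itself could look like this with nftables (interface names, subnets and ports are hypothetical):

# Default-deny all inbound traffic to the device
sudo nft add table inet mh
sudo nft add chain inet mh input '{ type filter hook input priority 0; policy drop; }'
sudo nft add rule inet mh input ct state established,related accept
# Network A interface: allow only SSH from the management subnet
sudo nft add rule inet mh input iifname eth0 ip saddr 10.0.1.0/24 tcp dport 22 accept
# Network B interface: allow only the Modbus traffic the device actually needs
sudo nft add rule inet mh input iifname eth1 ip saddr 10.0.2.0/24 tcp dport 502 accept
# A matching default-deny output chain should be added the same way to limit egress

With rules like these, an attacker who lands on the device can no longer use it as a convenient bridge; only the explicitly allowed flows pass.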

AI-Powered Threat Hunting on Linux Servers: Honeypot Experiment and Privilege Escalation

Microsoft has received a lot of attention for its Copilot for Security. Some of it good, some of it bad. Irrespective of that, using AI to help with finding evil in the stack of logs is tempting.

Let’s try to do that. We set up a Linux server exposed to the Internet, with SSH open and password authentication enabled. We create a user called admin with the password donaldtrump2024. We also make the /etc/passwd file writable for all users, but turn on file access logging for this file. Then we wait.

Creating a very insecure Linux server

We set up a spot instance on Google’s compute engine service. This instance is running Debian and is exposed to the Internet. We did the following things:

  1. Created a new standard user with username admin
  2. Set a password for the user to donaldtrump2024
  3. Allowed ssh login with password
  4. Set chmod 777 on /etc/passwd
  5. Installed auditd
  6. Set auditd to monitor for changes to /etc/passwd

Now it should be easy to both get access to the box using a brute-force attack, and then to elevate privileges.
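
For reference, the setup boils down to something like the following commands on Debian (a reconstruction – the exact commands used may have differed slightly):

# Run as root on the Debian instance
useradd -m -s /bin/bash admin                   # 1: create the user
echo 'admin:donaldtrump2024' | chpasswd         # 2: set the weak password
# 3: allow password logins over SSH
sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication yes/' /etc/ssh/sshd_config
systemctl restart ssh
chmod 777 /etc/passwd                           # 4: make /etc/passwd world-writable
apt install -y auditd                           # 5: install auditd
# 6: audit writes and attribute changes to /etc/passwd, tagged with a searchable key
auditctl -w /etc/passwd -p wa -k user-modify-passwd

The key name here matches the one we will search for with ausearch later.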

The VM was set up as a spot instance since I am greedy, and it was relatively quickly shut down. That might have been a mistake; I will restart it and see if we can keep it running for a while.

Command line hunting

Since this is a honeypot, we are just waiting for someone to guess the right password. Perhaps donaldtrump2024 is not yet in enough password lists for this to go very fast, but we can run our own attack if the real criminals don’t succeed.

To find successful logins we can use the wtmp file with the utility command “last”. After being up for 30 minutes on the Internet, there has been only one attempt at logging in with the user ‘admin’. It is interesting to see which usernames the attackers are trying:

sudo lastb -w |cut -d ' '  -f1 | grep -wv btmp | grep -v '^$' | sort | uniq -c | sort -rn | head -n 5

This gives us the following top 5 list of attempted usernames:

  • root (20)
  • ubuntu (6)
  • xu (4)
  • vps (4)
  • steam (4)

In all, 53 different usernames were attempted across 101 failed login attempts. Still no success for our admin user.

To sum it up – to look at successful login attempts, you can use the command “last”. Any user can use this one. To look at failed login attempts, you can use the command “lastb”, but you need sudo rights for that.
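
In command form:

last -w         # successful logins (from wtmp); any user can run this
sudo lastb -w   # failed logins (from btmp); requires root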

We expect the bad guys to look for privilege escalation opportunities if they breach our server. The writable passwd file should be pretty easy to find. Since we have set up auditd to track accesses to this file, we should be able to quickly find such activity simply by using the utility ausearch with the parameters

ausearch -k <keyword>

Simulating the attack

Since we don’t want to wait for a real attacker to finally find our pot of honey, we’ll need to do the dirty deeds ourselves. We will try to log in repeatedly with the wrong password, and then log in with the right password over SSH. Once in, we will locate /etc/passwd, check its permissions, and then edit it to become root. Then we will see how we can discover that this happened. First, the attack:
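
A minimal way to script the brute-force part is a loop over candidate passwords with sshpass (a hypothetical reconstruction – the actual run is shown in the screenshot below, and the server address here is a placeholder from the documentation range):

#!/bin/bash
HOST=203.0.113.10   # placeholder for the honeypot's public IP
for pw in 123456 password admin letmein donaldtrump2024; do
    if sshpass -p "$pw" ssh -o StrictHostKeyChecking=no admin@"$HOST" true 2>/dev/null; then
        echo "Valid password found: $pw"
        break
    fi
done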

Screenshot of simulated brute-force attack

Then we check if we have sudo rights with sudo -l. We don’t. Next we check the permissions on /etc/passwd… and… bingo!

ls -la /etc/passwd
-rwxrwxrwx 1 root root 1450 Jun  6 17:02 /etc/passwd

The /etc/passwd file is writable for all! We change the last line in the file, setting the UID and GID fields to 0 so that the account becomes equivalent to root:

admin:x:0:0:root:/root:/bin/sh

After logging out, we cannot get back in again! This is likely because the root user is blocked from logging in with ssh, which is a very good configuration. (We fix this by logging in with my own account, setting /etc/passwd back to its original state, and then doing the attack again.) OK, we are back to having edited /etc/passwd πŸ™‚

We now escalate by using su to log in as ourselves after editing the /etc/passwd file, avoiding the login block in sshd_config.
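
Reconstructed, the escalation looks roughly like this (the actual session is in the screenshot below):

$ su admin     # /etc/passwd now maps admin to uid 0
Password:
# whoami
root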

“root i am” screenshot

OK, we are now root on the server. Mission accomplished.

Hunting the bad guys with AI

Let’s ask Gemini (the free version) to find the evil in the logs. First we get the lastb and last logs, and ask Gemini to identify successful brute-force attacks:

Here's a log of failed logons on a linux server:
admin    ssh:notty    194.169.175.36   Thu Jun  6 18:14 - 18:14  (00:00)
ant      ssh:notty    193.32.162.38    Thu Jun  6 18:14 - 18:14  (00:00)
ant      ssh:notty    193.32.162.38    Thu Jun  6 18:14 - 18:14  (00:00)
visitor  ssh:notty    209.38.20.190    Thu Jun  6 18:13 - 18:13  (00:00)
visitor  ssh:notty    209.38.20.190    Thu Jun  6 18:13 - 18:13  (00:00)
admin    pts/1                         Thu Jun  6 18:12 - 18:12  (00:00)
admin    ssh:notty    88.216.90.202    Thu Jun  6 18:10 - 18:10  (00:00)
admin    ssh:notty    88.216.90.202    Thu Jun  6 18:10 - 18:10  (00:00)
ansibleu ssh:notty    193.32.162.38    Thu Jun  6 18:07 - 18:07  (00:00)
ansibleu ssh:notty    193.32.162.38    Thu Jun  6 18:07 - 18:07  (00:00)
ubnt     ssh:notty    209.38.20.190    Thu Jun  6 18:06 - 18:06  (00:00)
ubnt     ssh:notty    209.38.20.190    Thu Jun  6 18:06 - 18:06  (00:00)
admin    ssh:notty    88.216.90.202    Thu Jun  6 18:05 - 18:05  (00:00)
admin    ssh:notty    88.216.90.202    Thu Jun  6 18:05 - 18:05  (00:00)
user     ssh:notty    85.209.11.27     Thu Jun  6 18:04 - 18:04  (00:00)
user     ssh:notty    85.209.11.27     Thu Jun  6 18:04 - 18:04  (00:00)
ibmeng   ssh:notty    193.32.162.38    Thu Jun  6 18:01 - 18:01  (00:00)
ibmeng   ssh:notty    193.32.162.38    Thu Jun  6 18:01 - 18:01  (00:00)
root     ssh:notty    209.38.20.190    Thu Jun  6 17:59 - 17:59  (00:00)
admin    ssh:notty    88.216.90.202    Thu Jun  6 17:59 - 17:59  (00:00)
admin    ssh:notty    88.216.90.202    Thu Jun  6 17:59 - 17:59  (00:00)
admin    ssh:notty    88.216.90.202    Thu Jun  6 17:59 - 17:59  (00:00)
admin    ssh:notty    88.216.90.202    Thu Jun  6 17:59 - 17:59  (00:00)
admin    ssh:notty    88.216.90.202    Thu Jun  6 17:59 - 17:59  (00:00)
admin    ssh:notty    193.32.162.38    Thu Jun  6 17:54 - 17:54  (00:00)
masud02  ssh:notty    209.38.20.190    Thu Jun  6 17:53 - 17:53  (00:00)
masud02  ssh:notty    209.38.20.190    Thu Jun  6 17:52 - 17:52  (00:00)
admin    ssh:notty    193.32.162.38    Thu Jun  6 17:48 - 17:48  (00:00)
auser    ssh:notty    209.38.20.190    Thu Jun  6 17:46 - 17:46  (00:00)
auser    ssh:notty    209.38.20.190    Thu Jun  6 17:46 - 17:46  (00:00)
radio    ssh:notty    193.32.162.38    Thu Jun  6 17:41 - 17:41  (00:00)
radio    ssh:notty    193.32.162.38    Thu Jun  6 17:41 - 17:41  (00:00)
root     ssh:notty    209.38.20.190    Thu Jun  6 17:39 - 17:39  (00:00)
root     ssh:notty    193.32.162.38    Thu Jun  6 17:35 - 17:35  (00:00)

Here's a list of successful ones: 
cyberhak pts/2        35.235.241.16    Thu Jun  6 18:16   still logged in
admin    pts/1        88.216.90.202    Thu Jun  6 18:10    gone - no logout
cyberhak pts/0        35.235.244.33    Thu Jun  6 18:07   still logged in
admin    pts/3        88.216.90.202    Thu Jun  6 17:59 - 18:05  (00:05)
cyberhak pts/0        35.235.244.32    Thu Jun  6 17:52 - 18:05  (00:12)
cyberhak pts/1        35.235.244.32    Thu Jun  6 17:39 - 18:01  (00:21)
cyberhak pts/0        35.235.244.34    Thu Jun  6 17:32 - 17:48  (00:15)
reboot   system boot  6.1.0-21-cloud-a Thu Jun  6 17:31   still running
cyberhak pts/5        35.235.244.33    Thu Jun  6 17:23 - 17:26  (00:03)
cyberhak pts/3        35.235.244.32    Thu Jun  6 17:19 - 17:26  (00:07)
cyberhak pts/2        35.235.241.16    Thu Jun  6 17:14 - 17:26  (00:11)
cyberhak pts/1        35.235.244.34    Thu Jun  6 17:07 - 17:24  (00:16)
cyberhak pts/0        35.235.244.32    Thu Jun  6 17:01 - 17:16  (00:14)
reboot   system boot  6.1.0-21-cloud-a Thu Jun  6 16:53 - 17:26  (00:33)

Can you find successful brute-force logins? For which user was this? Did the IP addresses vary during the attack? How long did the attack last before the attacker succeeded?

Gemini successfully identifies the attack, although it conflates the different attackers:

Yes, there are successful brute-force logins in this log. Here’s a breakdown:

  • User: The successful logins were for the user “admin“.
  • IP Addresses: The attacker used multiple IP addresses during the attack. These include 35.235.241.16, 35.235.244.32, and 35.235.244.33. This suggests a coordinated attack or the use of a botnet.
  • Attack Duration: It’s difficult to pinpoint the exact start time of the attack, but there are failed attempts for “admin” as early as 17:35 on Thu Jun 6. The first successful login for “admin” appears at 17:52 on Thu Jun 6. This means the attack likely lasted for at least 17 minutes.

Here are the lines in the log that show successful logins:

  • cyberhak pts/2 35.235.241.16 Thu Jun 6 18:16 still logged in
  • cyberhak pts/0 35.235.244.33 Thu Jun 6 18:07 still logged in
  • admin pts/3 88.216.90.202 Thu Jun 6 17:59 - 18:05 (00:05) (This user “admin” successfully logged in at some point between 17:59 and 18:05)

Even though the logs don’t show the exact method used for the brute-force attack, the repeated failed attempts with various usernames suggest an attempt to crack passwords.

Looking for privilege escalation

OK, we have a likely breach from the user “admin”. Did the user try to escalate privileges?

The first thing an attacker would do to achieve privilege escalation is to look for vulnerabilities to exploit. Let’s run an enumeration script first to see if we find anything exploitable, to get a hunting hypothesis in place: https://github.com/peass-ng/PEASS-ng/tree/master/linPEAS.
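
The quick-start from the PEASS-ng README is to fetch the latest release script and pipe it straight to a shell:

curl -L https://github.com/peass-ng/PEASS-ng/releases/latest/download/linpeas.sh | sh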

Piping the linpeas script to bash to find privesc vulns

And we find that /etc/passwd is writable – when linpeas marks something with a yellow background, it claims a 95% chance that it can be exploited to escalate privileges. It is an easy thing to check, so the hacker probably found it too!

The easiest way to check now is of course just to read the file! Doing that, we see that there is indeed an entry giving the user “admin” root privileges:

cyberhakon@instance-20240606-164919:~$ tail /etc/passwd
messagebus:x:102:104::/nonexistent:/usr/sbin/nologin
systemd-network:x:998:998:systemd Network Management:/:/usr/sbin/nologin
systemd-timesync:x:997:997:systemd Time Synchronization:/:/usr/sbin/nologin
systemd-resolve:x:996:996:systemd Resolver:/:/usr/sbin/nologin
tcpdump:x:103:109::/nonexistent:/usr/sbin/nologin
sshd:x:104:65534::/run/sshd:/usr/sbin/nologin
polkitd:x:995:995:polkit:/nonexistent:/usr/sbin/nologin
..
admin:x:0:0:root:/root:/bin/sh

Let’s now provide the audit log to Gemini, and ask it to analyse it.

The logs don’t conclusively show that the user “admin” edited the “/etc/passwd” file. Here’s what we can see:

  • User “cyberhakon” with root privileges (UID=0) opens the “/etc/passwd” file with vim (audit type: SYSCALL, msg: audit(1717870666.722:125)).
  • There’s a following syscall fchmod (audit type: SYSCALL, msg: audit(1717870666.730:126)) which suggests a permission change attempt, but it doesn’t confirm a modification to the file itself.

However, there are some interesting points:

  • User “cyberhakon” logs in with root privileges twice using su and sudo before this event.
  • User “admin” then logs in successfully via SSH.

It’s a possibility that “cyberhakon” might have intended to edit the “/etc/passwd” file to grant “admin” escalated privileges, but due to missing logs or them being outside the timeframe provided, we cannot be certain.

Here’s what you can do to investigate further:

  • Check the access logs of the FTP server or SSH server to see if there were any uploads or modifications to the “/etc/passwd” file around the timeframe.
  • Analyze the changes made to the “/etc/passwd” file. You can use tools like passwd -S <username> to view password information or check timestamps of the file itself.
  • Look for any additional logs related to user privilege changes. This might involve looking at authorization logs or security software logs.

Remember, these are just suggestions, and the best course of action might depend on your specific system and security policies.

OK, so from the file alone, Gemini does not conclude anything directly, but it clearly understands what to look for and gives hints for further investigation. Let’s first check when passwd was last modified:

cyberhakon@instance-20240606-164919:~$ stat /etc/passwd
  File: /etc/passwd
  Size: 1443            Blocks: 8          IO Block: 4096   regular file
Device: 8,1     Inode: 396540      Links: 1
Access: (0777/-rwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2024-06-08 18:33:44.169031086 +0000
Modify: 2024-06-08 18:33:30.355890892 +0000
Change: 2024-06-08 18:33:30.355890892 +0000
 Birth: 2024-06-06 17:02:03.169890922 +0000

From the stat command we see that the file was last modified at 18:33:30. Let’s see if admin was logged in then. Using the “last” command, we find that “admin” logged in at 18:21 and is still logged on when this is checked (at 18:59).

Since we have also configured audit logging, we can search for the key we set for write attempts to /etc/passwd. We find that /etc/passwd was modified at 18:33 using vim, by a user with uid 1005, starting from the working directory /home/admin. In other words, it is highly likely that the user “admin” escalated privileges by editing /etc/passwd at 18:33.
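
Concretely, the search uses the key configured when we set up auditd, and returns the event below:

sudo ausearch -k user-modify-passwd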

time->Sat Jun  8 18:33:30 2024
type=PROCTITLE msg=audit(1717871610.355:157): proctitle=76696D002F6574632F706173737764
type=PATH msg=audit(1717871610.355:157): item=1 name="/etc/passwd" inode=396540 dev=08:01 mode=0100777 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=PATH msg=audit(1717871610.355:157): item=0 name="/etc/" inode=393343 dev=08:01 mode=040755 ouid=0 ogid=0 rdev=00:00 nametype=PARENT cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
type=CWD msg=audit(1717871610.355:157): cwd="/home/admin"
type=SYSCALL msg=audit(1717871610.355:157): arch=c000003e syscall=257 success=yes exit=3 a0=ffffff9c a1=56004f7d4150 a2=41 a3=1ff items=2 ppid=932 pid=14377 auid=1005 uid=1005 gid=1005 euid=1005 suid=1005 fsuid=1005 egid=1005 sgid=1005 fsgid=1005 tty=pts1 ses=4 comm="vim" exe="/usr/bin/vim.basic" subj=unconfined key="user-modify-passwd"

We can then conclude that:

  • Chatbots are helpful for threat hunting and provide reasonable suggestions
  • They may not find everything on their own
  • It is possible to build agents that can automate some of the forensic groundwork in threat hunting using AI – that may be a topic for a future post πŸ™‚