Things I learned from starting – and shutting down – a company

In 2016 I worked as a business development manager at Lloyd’s Register’s consulting unit in Norway. We were building up a new service within industrial cybersecurity and had a few good people on the team. We had great plans, but then the downturn in the oil and gas sector started to cause problems for us. The order books were close to empty and the company started offering severance packages. We lost two key resources for our cybersecurity project, and internal funding for “future growth” was hard to obtain in that economic climate. That was the birth of the company Cybehave.

Starting a company with little sense of direction

We started Cybehave as a development project where we wanted to automate cybersecurity risk assessments, to make such services available to smaller companies. We got seed funding from Innovation Norway – called “markedsavklaringsstøtte” (Norwegian for “market validation grant”) – about NOK 85,000. We also got a free workplace for a while at a startup incubator while establishing a “minimum viable product”. A key problem was that we didn’t really know what a viable product would be, or who the customers were. We were searching for pilot customers, looking at small and medium-sized businesses. All our real-world sales experience, however, was from LR. We were used to working with global energy companies, government agencies and international manufacturing organizations. The contacts we had, and the typical way to initiate conversations in that space, were irrelevant in the SMB space. So we were to a large degree guessing what those SMBs would need in terms of security, and we had problems agreeing among ourselves on what exactly our value proposition was. While doing this, our laid-off PhD-level expert in risk management was building a minimum viable product by coding up a web application in Django/Python.

Without a clear understanding of your market, it is hard to know what to focus on.

We did focus groups, where we invited companies from many sectors. We got little useful feedback. We visited a lot of companies, trying to convince them that they needed cybersecurity risk management and awareness training. They were not particularly interested, and our message was perhaps not very clear either.

Before you invest a lot of time (and money) in your product, know who the customer is and what problem you are solving for them. If you don’t know, spend your time searching for a problem worth solving instead of searching for a customer who has the problem you have imagined must be important to others.

Without money, life is hard

Still without customers, we wanted to sell our great approach to human-centric cybersecurity. We were thinking that “we don’t have customers because we don’t have money for marketing”. Because of this, we wanted to bring an investor on board. One of the co-founders focused a lot on this, but finding an investor who is interested in a company without customers, cash flow or a very clear value proposition was difficult, for some reason. Here’s what we learned:

  • Local angel investors want a lot for equity without contributing much money. They have limited networks and understanding of B2B markets.
  • Pitching at start-up events requires a really good story. B2C stories tend to win over B2B stories, at least if your story isn’t particularly exciting.
  • Financial estimates have very little value in the early phase. They are mostly baseless guesstimates, sprinkled with wishful thinking.
  • Professional investors give a lot of very useful feedback. Talk to investment funds even if you are not in a place where you would be a good investment. You learn how they think and what they are looking for: clarity of benefit provided, growth potential, intellectual property rights, and capabilities of the management team/founders.

In the end, we did not get any external investment. The story was a bit too vague to compete with B2C pitches for small-scale investors – or, for the offers we did get, we were too greedy to say “yes” and give away that much ownership – and we were too early to be interesting to equity funds.

We went to the government again – Innovation Norway. They granted us a “commercialisation grant” of NOK 450,000. We received the first pay-out, 50% of the grant, in early 2019. That process was not without effort, but having a better story to tell, a better plan, and a working prototype to demonstrate part of what we wanted to do was enough to get that money. And it was a nice grant because we did not have to give away equity – although the amount of money was not anywhere close to what we needed for sufficient growth. Because of this, and our not-so-successful attempts at getting investors on board, we switched to the strategy of funding growth the old-fashioned way: through positive cash flow.

Because the company was not making money, and we did not have any serious funding in place, nobody was working on the project full-time. We all had day jobs, and demanding day jobs at that: building up a security team at a global IT company, leading a department at a regional hospital. This further hampered product development.

You need a realistic funding plan from the beginning. Think through what you want from an external investor (money alone, or also network, operational experience and hands-on support) and how much of the equity (and control) you are willing to part with.

We did not want to make money by selling consulting hours. We wanted to build a scalable alternative. To provide cash flow to the company, however, we decided to start doing some consulting. Doing that on top of a day job that had to be followed up did not leave much time for building those scalable services!

Create a realistic plan for input resources, whether time or money. Working a full-time day job while bringing in money for development through consulting on the side is not a sustainable model.

Administration requires work too

It is easy to focus on the customer, the big ideas, and developing software (more about that later). But if you don’t keep up with administrative needs, there will be problems.

Accounting is important. There are many software companies selling “do-it-yourself” accounting solutions. Unless you enjoy accounting and actually know what you are doing, avoid the DIY solutions. It is hard to know which account to use for a certain expense, or which services bought from abroad must be reported for VAT. You could spend time learning all of this, but unless that is your core business or you enjoy the details of accounting, get help. Our top three accounting tips:

  • Engage an accountant.
  • Set up integrations between your bank accounts and your accounting system.
  • Use the accounting data to keep track of your company’s finances. Set up dashboards or reports that make sense to you. As a bare minimum you should get monthly statements on cash flow, liquidity and expenses in key categories (e.g. cloud computing, travel, salary).

In addition to accounting you will need to report regularly to the government. In Norway you will have to create a VAT tax report every other month. Failing to report on time will cause trouble – or fines from the tax authorities. This job is definitely best left to an accountant again! The same goes for the annual accounts and shareholder registry if your company is a limited company with shares.

Get an accountant, and set up bank integration solutions and automation as early as you can. This will free up a lot of time and worry so you can focus on building your company.

A successful product: PrivacyBox

In 2018 my full-time day job was at Sportradar. There I met the newly hired data protection officer, who was trying to get this multinational company in shape for the GDPR. Together we created an internal tool for a personal data inventory solution. We also saw that there were a lot of challenges related to managing requests from data subjects. The most common solution was to publish an e-mail address on the privacy policy page where people could submit requests for access to data, deletion or other rights they wanted to exercise under the GDPR or other policies. We agreed to take my colleague from Sportradar on as a shareholder in Cybehave and to develop a good solution for handling privacy rights. The counterpart at Sportradar was the head of legal, to avoid conflicts of interest. Sportradar would be a pilot customer, with free access the first months (before the product was actually very usable) as long as we got feedback on the software. Then they would get a discount for some time before the price went up to the market price.

This gave us a very different situation from the security awareness and risk solution: someone with an actual use for the product who could tell us what they needed. I developed most of the first version of this software myself, as a prototype. We got a lot of great features in, and the customer was happy with the product. It was in use by Sportradar globally across all their brands from 2019 to 31 December 2021. They had to switch vendor because Cybehave is being dissolved, but they were happy with the solution.

  • Have a pilot customer before you write any type of code
  • The pilot customer should have a clear need to satisfy and opinions on how the system should work
  • The pilot customer should have sufficient volume of work to be done in the software that you get real-world experience and feedback

In addition to the help we got from the clear feedback from the pilot customer, we quickly learned a few other things:

  • Create great end-user documentation that tells users how to accomplish tasks.
  • For “one-time users” such as data subjects making requests, make filling in the form as quick and easy as possible
  • Solutions that filter spam are important when publishing forms online on pages with high-volume traffic. An e-mail with a confirmation link is a simple and effective solution for this (a minimal sketch follows this list).
  • Application logging is extremely important for troubleshooting and customer support requests
  • Be prepared to answer customer support requests quickly. Keeping the customer happy means making sure they can get their work done, even when the software solution has a bug or is missing a feature
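
To make the confirmation-link idea from the list above concrete, here is a minimal sketch of how a request form could hold a submission as pending until the e-mail address is confirmed. It is illustrative only, not how PrivacyBox was implemented: the function names, the in-memory store and the example URL are all hypothetical, and a real Django application would persist the token on a model and send the link through its mail framework.

```python
import secrets
from datetime import datetime, timedelta, timezone

# Hypothetical in-memory store; a real application would persist this.
PENDING_REQUESTS = {}
TOKEN_TTL = timedelta(hours=24)


def create_pending_request(email: str, request_text: str) -> str:
    """Store a data subject request as pending and return a confirmation token."""
    token = secrets.token_urlsafe(32)  # unguessable, URL-safe token
    PENDING_REQUESTS[token] = {
        "email": email,
        "request": request_text,
        "expires": datetime.now(timezone.utc) + TOKEN_TTL,
    }
    # The application would now e-mail a link such as
    # https://example.com/privacy/confirm?token=<token> to the address given.
    return token


def confirm_request(token: str) -> bool:
    """Accept the request only if the token exists and has not expired."""
    entry = PENDING_REQUESTS.pop(token, None)
    if entry is None or entry["expires"] < datetime.now(timezone.utc):
        return False
    enqueue_for_processing(entry)  # only confirmed requests reach the case queue
    return True


def enqueue_for_processing(entry: dict) -> None:
    print(f"New confirmed request from {entry['email']}")
```

Spam bots rarely complete the confirmation step, so only requests from reachable mailboxes end up in the case-handling queue.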

Work closely with a pilot customer to create a product that actually solves a problem. Remember that documentation, logging and support are essential parts of the service offering!

Don’t develop software alone

Cybehave was a company without full-time employees. In fact, most of the time it did not have any employees at all. In the beginning, the first prototype of a SaaS product was created by the colleague who was let go from LR. She was a brilliant risk analyst and great at scientific computing, but that does not make you a software engineer. The majority of the code, however, was written by me. The other two co-founders were non-technical and did not write code. I am not sure I am brilliant at anything, but I am also not a software engineer by education. I did, however, learn a great deal from the Cybehave project, as well as from working at Sportradar. Key take-aways for the next time:

  1. Don’t write software alone. It is too much work and too easy to make serious mistakes leading to vulnerabilities and nasty bugs.
  2. Spend more time thinking about architecture and design patterns than actually writing code.
  3. Iterate. When your new feature works, it is not done. Work on it until it becomes good – think about and measure performance, reliability, user experience. And most of all: get outside feedback on how well it works – it will all be easy to you, since you created it!
  4. Test. Because of the lack of formal software engineering background and a focus on “creating the things” as a one-man show, not much testing was done when writing Cybehave’s software. Testing is extremely important for both performance and security.
  5. Don’t create features the customer does not need. All software needs to be maintained; the less code you have, the less interest you will pay on the technical debt.

For PrivacyBox we sometimes needed to improve features or add new ones. At one point, we decided to hire a freelancer to do some improvements. That freelancer was a professional software engineer who was not necessarily cheap per hour, but created high-quality code, improved architecture and provided very helpful feedback on technical details. If your team does not have the competence needed and you cannot afford to hire someone, contract with good freelancers for specific tasks and make sure to work closely with them.

Automation and Git hygiene provide a lot of value

You should not make software development a solo project, and testing is important. If you are a non-technical founder but your company makes software, make sure to talk to your technical team about how to ensure good quality of the software you produce. Even with a small team, or with freelancers on board for specific features, you will gain a lot by setting up automated tests and build pipelines. This will reduce the number of bugs and provide help to build better software.

  • Set up at least three branches in Git: development, test, production
  • Push often to development to make sure you do not lose work
  • Use feature branches that will merge to development
  • Merging to the test branch should automatically run important tests. Those should include static analysis and software component analysis as a minimum. You should also have unit tests and integration tests running in a software test suite. If tests fail, you should not be able to merge the branch into production (see the sketch after this list).
  • When you merge to production, your pipeline should automatically push the changes to the production servers. Most likely you will be running your software on public cloud infrastructure. Public cloud providers will typically have good documentation available for how to set up CI/CD pipelines.
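
To illustrate the kind of checks such a pipeline can run, here is a minimal sketch of a script a CI job could execute on merges to the test branch. The tool choices are assumptions on my part, not what Cybehave used: flake8 for static analysis, bandit for security linting, pip-audit for software component analysis and pytest for the test suite; swap in whatever your stack provides.

```python
#!/usr/bin/env python3
"""Minimal pre-merge gate: exit non-zero if any check fails, so CI blocks the merge."""
import subprocess
import sys

# Assumed tools: flake8 (static analysis), bandit (security linting),
# pip-audit (dependency/component analysis), pytest (unit and integration tests).
CHECKS = [
    ["flake8", "src"],
    ["bandit", "-r", "src"],
    ["pip-audit"],
    ["pytest", "-q"],
]


def main() -> int:
    failed = []
    for cmd in CHECKS:
        print(f"Running: {' '.join(cmd)}")
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd[0])
    if failed:
        print(f"Checks failed: {', '.join(failed)}")
        return 1
    print("All checks passed")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```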

Application security bare minimum practices

Nothing will erode your customer’s trust as fast as a compromised software solution. Security is business critical, not only to you but also to your customer. Because of this, you should make sure that the software you create follows some key practices.

  1. Ensure identity and authorization management is properly implemented. Use single sign-on solutions for B2B interactions when possible. If you implement your own authentication and authorization system, make sure passwords are strong enough, hashed and salted, and that multifactor authentication is available and possibly required (a minimal sketch of points 1, 2 and 4 follows this list).
  2. Log all security events and create alerts for unexpected events. Important events include authentication, password change, privilege escalation (if multiple authorization levels exist), user creation, unauthorized access/transaction attempts, all privileged access/transactions. In addition, there may be context specific events that are important to track, such as data deletion, data sharing, etc.
  3. Ensure input validation is applied to all user-generated input. This also applies to responses from third-party APIs.
  4. Make sure there are no secrets in your code. Secrets should be injected at run-time and be possible to rotate.
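
Below is a minimal sketch of what points 1, 2 and 4 can look like in Python using only the standard library: salted password hashing with scrypt, a security-event log entry, and a secret injected from the environment at run-time. It is illustrative, not a complete authentication system, and names such as SMTP_PASSWORD are hypothetical.

```python
import hashlib
import hmac
import logging
import os
import secrets

logging.basicConfig(level=logging.INFO)
security_log = logging.getLogger("security")


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); store both, never the plain-text password."""
    salt = secrets.token_bytes(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    ok = hmac.compare_digest(digest, expected)  # constant-time comparison
    # Point 2: log the authentication event, whether it succeeds or fails.
    security_log.info("authentication %s", "success" if ok else "failure")
    return ok


# Point 4: secrets are injected at run-time, never hard-coded in the repository.
SMTP_PASSWORD = os.environ.get("SMTP_PASSWORD")  # hypothetical variable name

if __name__ == "__main__":
    salt, stored = hash_password("correct horse battery staple")
    assert verify_password("correct horse battery staple", salt, stored)
    assert not verify_password("wrong password", salt, stored)
```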

Follow good software engineering practices from the start. If you don’t, you will accumulate a lot of technical debt, which means there will be so much maintenance to do that you will never catch up.

Lessons from shutting it down

So, Cybehave came to an end. Closing down a software company means shutting down a lot of services. It would have been much easier to do this if we had had an inventory of all the online services and software solutions we were using. When starting to shut down our operations, we had to create this inventory. Here are some categories of services we were using:

  • Transactional mail providers
  • Cloud services (we were running on Google Cloud IaaS and PaaS solutions)
  • Office/collaboration software
  • CRM and marketing solutions
  • Github organization with private and public repositories
  • Accounting software
  • Mobile apps from banks, etc.

Just keeping track of the online accounts and services used in a spreadsheet would have been a great help. I noticed that we had accounts with many SaaS providers that we were not using; we had simply tried them out and left the accounts active when abandoning them. With a cloud software inventory and a practice of shutting down unused accounts, we would not only have made it easier to shut down the company, we would also have reduced our attack surface.

Shutting down a company also means reporting it to the authorities. We got good help from our accountant in doing this, which takes away the uncertainty about what is required.

Telling our customers has also been important, of course. This should be done in good time, so the customers can transfer data and systems to new solutions if your products are being discontinued. We see that this requires a bit of support time and extra engineering effort to create good data transfer solutions. Factoring in the time to do this is important so that no bridges are burned and contractual obligations are met.

If you are shutting down your company, set aside enough time for technical shutdowns, mandatory reporting, and most importantly, taking care of your business relationships.

Postludium

Cybehave has been a great journey, and the company was actually profitable most years. Most of the cash flow came from consulting, where we have had the privilege of helping software companies, healthcare authorities, municipalities and construction companies. As much as I enjoy creating software, working directly with customers to create value and improve security is the real motivator. Today I am back in a great company where I can do this every day – with real positive impact.

Talking to fund managers and potential customers that never resulted in investments or sales has also been interesting. Start-up life is full of contrasts: one day we were sitting in a meeting with the top management of a multinational engineering company, and the next day we were meeting a potential customer whose 6 employees shared one office filled with cardboard boxes and laptops on the floor. It is rewarding, but also tiring. And without sufficient financial muscle, the impact you want to make will remain a dream.

Although leaving a project you have put thousands of hours into will inevitably make you feel a bit melancholic, the future is bright, exciting and fast-paced!

Happy new year – cybersecurity is still the main focus for 2022. Working to keep the lights on, hospitals running and supply chains safe from hackers.

My new project is working for DNV Cybersecurity, where we are building the world’s best industrial cybersecurity provider within DNV’s Accelerator. DNV’s purpose is to safeguard life, property and the environment – which is very close to my heart. DNV Cybersecurity has recently joined forces with Applied Risk, and this is only the beginning. I am therefore looking forward to making an impact together with great colleagues at DNV, where fast growth will allow us to bring the best security solutions for the real world to more customers around the world, defending hospitals, the power grid, shipping, food supply chains and the energy markets from hackers also in 2022.

Happy new year to all of you!

Vendor Security Management: how to decide if tech is safe (enough) to use

tl;dr: Miessler is right. We need to focus on our own risk exposure, not vendor security questionnaires

If you want to make a cybersecurity expert shiver, utter the words “supply chain vulnerabilities”. Everything we do today depends on a complex mixture of systems, companies, technologies and individuals. Any part of that chain of interconnected parts can be the dreaded weakest link. If hackers can find that weak link, the whole house of cards comes crumbling down. Managing cyber supply chain risk is challenging, to say the least.

Most companies that have implemented a vendor cybersecurity risk process will make decisions based on a questionnaire sent to the vendor during selection. In addition, audit reports for recognized standards such as ISO 27001 or SOC 2 may be shared by the vendor and used to assess the risk. Is this process effective at stopping cyberattacks through third parties? That is at least up for debate.

Daniel Miessler recently wrote a blog post titled It’s time for vendor security 2.0, where he argues that the current approach is not effective, and that we need to change the way we manage vendor risks. Considering how many cybersecurity questionnaires Equifax, British Airways and Codecov must have filled in before being breached, it is not hard to agree with @danielmiessler about this. What he argues in his blog is: 

  1. Cybersecurity reputation services (rating companies, etc.) mostly operate like the mob, and security questionnaires are mostly security theater. None of this will save you from cyber armageddon.
  2. Stay away from companies that seem extremely immature in terms of security
  3. Assume the vendor is breached
  4. Focus more on risk assessment under the assumption that the vendor is breached than on questionable questionnaires. Build threat models and mitigation plans, and make those risks visible.

Will Miessler’s security 2.0 improve things?

Let’s pick at the 4 numbered points above one by one. 

Are rating companies mobsters? 

There are many cybersecurity rating companies out there. They see themselves as the Moody’s or S&P of cybersecurity. The way they operate is to pull in “open source information about cybersecurity posture” of companies. They also say that they enrich this information with other data that only they have access to (that is, they buy data from marketing information brokers and perform data exchange with insurance companies). Then they correlate this information in more or less sound statistical ways (combined with a good dose of something called expert judgment – or guessing, as we can also call it) with known data breaches and create a security score. Then they claim that using companies with a bad score is dangerous, and that companies with a good score are much safer.

This is definitely not an exact science, but it does seem reasonable to assume that companies that show a lot of poor practice, such as a lack of patching or botnet-infected computers pinging out to sinkholes, have worse security management than similar companies that do not have these indicators. Personally, I think a service like this can help sort the terrible ones from the reasonably OK ones.

Then, are they acting as mobsters? Are they telling you “we know about all these vulnerabilities; if you don’t pay us, we will tell your customers”? Not exactly. They are telling everyone willing to pay for access to their data these things, but they are not telling you about it unless you pay them. That is not exactly in line with accepted standards of “responsible disclosure”. At the same time, their findings are often quite basic, and anyone bothering to look could find the same things (such as support for old ciphers on TLS or web servers leaking use of an old PHP version). Bottom line, I think their business model is acceptable and that the service can provide efficiency gains for a risk assessment process. I agree with Miessler that trusting this to be a linear scale of cyber goodness is naive at best, but I do think companies with a very poor security rating would be riskier to use than those with good ratings.

Some security vendors have a business model that resemble extortion rackets of a 1930’s mobster. But even mobsters can be useful at times.

Verdict – usefulness: rating services can provide a welcome substitute for, or addition to, slower ways of assessing security posture. An added benefit is the ability to see how things develop over time. Small changes are likely to be of little significance, but a steady improvement of the security rating over time is a good sign. These services can be quite costly, so it is worth thinking about how much money you want to throw at them.

Verdict – are they mobsters? They are not mobsters but they are also not your best friends. 

Are security questionnaires just security theater? 

According to Miessler, you should slim down your security questionnaires to two questions: 

  1. “When was the last time you were breached (what happened, why, and how did you adjust)?”
  2. “Do you have security leadership and a security program?”

The purpose of these questions is to judge if they have a reasonable approach to security. It is easy for people to lie on detailed but generic security forms, and they provide little value. To discover if a company is a metaphorical “axe murderer” the two questions above are enough, argues Miessler. He may have a point. Take for example a typical security questionnaire favorite: “does your company use firewalls to safeguard computers from online attacks?” Everyone will answer “yes”. Does that change our knowledge about their likelihood of being hacked? Not one bit. 

Of course, lying on a short questionnaire with Miessler’s 2 questions is not more difficult than lying on a long and detailed questionnaire. Most companies would not admit anything on a questionnaire like this that is not already publicly known. It is like flying to the US a few years ago, where they made you fill out an immigration questionnaire with questions like “are you a terrorist?” and “have you been a guard at a Nazi concentration camp during WWII?”. It is thus a good question whether we can just scrap the whole questionnaire. If the vendor you are considering is a software firm, at least if it is a “Software as a Service” or another type of cloud service provider, they are likely to have some generic information about security on their web page. Looking that up will usually be just as informative as any answers to the questions above.

Verdict: Security questionnaires are mostly useless – here I agree with Miessler. I think you can even drop the minimalist axe murderer detection variant, as people who lie on long forms probably lie on short forms too. Perhaps a good middle ground is to first check the website of the vendor for a reasonable security program description, and if you don’t see anything, then you can ask the two questions above as a substitute. 

Stay away from extremely bad practice

Staying away from companies with extremely bad practice is a good idea. Sometimes this is hard to do because the business needs a certain service and all potential providers are horrible at security. But if you have a choice between someone with obviously terrible security habits and someone with a less worrying security posture, this is clearly good advice. Good ways to check for red flags include:

  • Create a user account and check password policies, reset, etc. Many companies allow you to create free trial accounts, which is good for evaluating security practices as well. 
  • Check if the applications are using outdated practices, poor configuration etc. 
  • Run sslscan to check if they are vulnerable to very old crypto vulnerabilities. This is a good indicator that patching isn’t exactly a priority.

Verdict: obviously a good idea.

Assume the vendor is breached and create a risk assessment

This turns the focus to your own assets and risk exposure. Assuming the vendor is breached is obviously a realistic start. Focusing on how that affects the business and what you can do about it makes the vendor risk assessment about business risk, instead of technical details that feel irrelevant.

Miessler recommends: 

  • Understand how the external service integrates into the business
  • Figure out what can go wrong
  • Decide what you can do to mitigate that risk

This is actionable and practical. The first part here is very important, and to a large degree determines how much effort is worth putting into the vendor assessment. If the vendor will be used for a very limited purpose that does not involve critical data or systems, a breach would probably not have any severe consequences. Accepting that risk without doing much about it seems reasonable.

On the other hand, what if the vendor is a customer relationship management provider (CRM), that will integrate with your company’s e-commerce solution, payment portal, online banking and accounting systems? A breach of that system could obviously have severe consequences for the company in terms of cost, reputation and legal liabilities. In such a case, modeling what could happen, how one can reduce the risk and assessing whether the residual risk is acceptable would be the next steps.

Shared responsibility – not only in the cloud

Cloud providers talk a lot about the shared responsibility model (AWS version). The responsibility for security of software and data in the cloud is shared between the cloud provider and the cloud customer. They have documentation on what they will take care of, as well as what you as a customer need to secure yourself. For the work that is your responsibility, the cloud provider will typically give you lots of advice on good practices. This is a reasonable model for managing security across organizational interfaces – and one we should adopt with other business relationships too. 

The most mature software vendors will already work like this: they have descriptions of their own security practices that you can read, and they have advice on how you should set up integrations to stay secure. The less mature ones will lack both the transparency and the guidance.

This does not necessarily mean you should stay away from them (unless they are very bad or using them would increase the risk in unacceptable ways). It means you should work with them to find good risk mitigations across organizational interfaces. Some of the work has to be done by them, some by you. Bringing the shared responsibility for security into contracts across your entire value chain will help grow security maturity in the market as a whole, and benefit everyone. 

Questionnaires are mostly useless – but transparency and shared responsibility are not.

In Miessler’s vendor security 2.0 post there is a question about what vendor security 3.0 will look like. I think that is when we have transparency and shared responsibility established across our entire value chain. Reaching this cybersecurity Nirvana of resilience will be a long journey – but every journey starts with a first step. That first step is to turn the focus on how you integrate with vendors and how you manage the risk of this integration – and that is a step we can take today. 

How conversations help us grow

We don’t develop alone. As a colleague, and as a leader, there are many ways you can contribute to the growth of others. I would like to share some thoughts on how to create an environment where professionals can thrive, together.

Imagine for a moment that you are having a one-to-one conversation with one of your team members. You ask the person: “Can you describe a situation where you feel you performed really well at work?”. Perhaps there is no answer, so you will need to follow up with a few nudges. For example, you say that you perform best when you have a clear goal, and when you know why you have this goal. Then you may ask – do you feel the same? They are probably going to agree that this sounds quite good. This could be a conversation starter about what the ideal state of work is – when do we get to be the best versions of ourselves at work?

Humans interact through language. Good conversations at work are essential for fostering growth.

Here’s a list of some plausible factors that people could come up with:

  • We have a clear vision of what we are trying to achieve, together
  • There is room for my opinions to be heard and valued
  • I can use my competence and personal strengths to drive results that are valued by others
  • The work itself is interesting and challenges me to learn
  • We have the necessary time and resources to build fundamental knowledge and skills
  • I get clear feedback and support from my manager
  • We all make an effort to contribute to the success of others
  • Our team enjoys good work-life balance
  • We have realistic career development opportunities (vertical and horizontal)
  • Ambition is welcome

Your list may look different, but variations around purpose, autonomy and community are typically ingredients of most people’s ideal working environment. Caring about what that means for each individual is the essence of professional empathy. If your job as a leader is to facilitate results through others, how can you do that?

Humans are good at spotting flaws. Engineers and analysts are perhaps the most skilled of all at this. This is why it is so easy for us to start with a problem when we want to achieve improvement. I think it is better to start by focusing on personal strengths. If you perform work every day where you feel you are not developing, or that your competence is not needed for the type of work being done, it is no wonder if you feel disengaged after a while. The best way to find out if someone’s strengths match the work they do is to ask them. Have a conversation about strengths, and how to best use those strengths in the work we do, as a starting point. That sets a much more positive tone and helps build a sense of having value in the work community, as opposed to the more typical approach of focusing on a gap assessment of a skills matrix.

Professional development is key to the motivation of any professional. Without it, engagement dies. If the organization has no training budget and going to conferences is riddled with bureaucracy and layers upon layers of approval requests, this is likely to hurt employee retention more than factors such as low compensation or a high workload. Training is valuable to each individual, but of course it brings benefits to the organization too. We all know this. Don’t accept a situation where people cannot get training. It is not fair to the employee, and it is not sustainable for the company.

Learning does not only happen in training courses. We should aim to learn every day, as individuals and as organizations. A lot of people have never thought about all the opportunities to learn that exist as part of the work they do every day. As a manager you can improve the effect of learning from doing the work by making it more explicit. For example, during the investigation of a particular security incident, analysts learn about new TTPs, as well as how to detect and stop them. Or, when creating a new policy, discussing with stakeholders and collecting feedback is a great opportunity to learn about the perspectives of different stakeholders. Common to both cases is that this learning is very often wasted. It remains in short-term memory only and can often only be retrieved again by relearning it the next time a need for this knowledge exists. This is why we need to be explicit about the expectation to learn on the job.

Everyone should have some time every week to reflect on what has been learned, and what it means for them in the future, as well as for the team and organization as a whole. If we set aside a fixed number of hours for “skills development”, encouraging employees to spend some of that time reflecting on what they have learned on the job over the last week is an example of good management. Don’t mandate how people reflect on or document what they have learned, but sharing ideas on how to do it is a good idea. Some like to write a work journal. Some prefer blogging, some would rather create proof-of-concept code. Most people have never thought about doing this, or what they prefer, so encourage experimentation.

Some things that people learn on the job are mostly improving individual competencies. But some things are worth sharing, and it is good to challenge existing practices when they are suboptimal. This is how we move forward. Those practices can be policies and guidelines, they can be habits, or they can be ways of using technology. Encourage sharing where sharing is due. Encourage challenging the status quo and improving the way things are done. Continuous improvement is not a result of a management standard or policy, it is the result of culture. We need to make it happen. As a leader you should visibly share knowledge, visibly challenge practices, and encourage others to do so too. When people see that you are doing it, and not only talking about it, the message becomes much more powerful. A good place to start inviting such contributions is to take a page from lean management and ask: “what is something we spend time on today that we could stop doing without any harm to the organization or our department?”

Of course, our hypothetical bullet point list of a great working environment that will help us perform at our best is not only about learning and training. Another important aspect here is relationships at work. This is what we can think of as “work community”. A leader is a catalyst for work community; not necessarily the driver of it, but the leader helps the organization choose healthy pathways to build community. From our bullet points, the desire to have room for opinions to be heard and valued packs a lot into one sentence. What has to be in place for us to have such a situation? We definitely need a certain level of psychological safety, so that people don’t feel threatened by ridicule or by being ignored when they raise their voice. We can achieve a sense of psychological safety when we can trust that our surroundings have our best interest in mind. The people we surround ourselves with want us to succeed. At the same time, we must accept disagreement and honesty. We should not expect any idea to be accepted at face value; we should expect, even demand, that every idea is challenged. But it should be challenged constructively, respectfully, and without any implication of us thinking less of the person bringing the idea to the table. Bringing a bad idea to the table is infinitely better than not bringing any ideas to the table. A culture of silence is the place where creativity goes to die. So, what can you do to foster this ideal state where people love to contribute and really feel that their contributions mean something to the department, and to the organization?

One thing you can do to instill trust, is to be vulnerable. Put yourself at risk by sharing your ideas with your team and ask them for feedback. Not the type of feedback often given to managers, such as “OK” or “looks good to me”. Ask for concrete feedback on “what do you like about this suggestion?”, “what do you dislike about it?”, “why do you think so?”, “how can we improve it?”. Let people see that you don’t have all the answers. If the case you are trying to improve is difficult, let people know you think it is difficult. Taking away the notion that you have to know everything is helpful for reducing imposter syndrome.

Empathy is key to trust. We cannot expect to have the same kind of relationship with everyone on the team, or to reduce relationship management to a bullet point list, but we can seek to have valuable and trusting relationships with everyone on the team. To build healthy relationships that foster trust, investing time in working together and in having conversations about both work and life itself is time well spent. Listen actively in conversations, and care about the ambitions and wants of the other person, as well as those of the organization. Active listening is a skill worth practicing every day.

Another thing you can do is to think about how you balance relationships versus results.

What have you done lately to support the personal ambitions and career plans of your team members? For example, if one of your team members has a personal dream of publishing a novel, how would you think about that in terms of your manager-employee relationship? Is it irrelevant to work, should you discourage such ambitious personal plans for fear of their thoughts being spent on non-work-related projects, or should you support it and help them balance those ambitions with responsibilities and ambitions at work? I know what I think is the best choice, but your view may be different. It is worth thinking about.

And that brings me to the end of this post: thinking. Leadership is difficult. People are complex, and there are so many things that influence how we behave and think. This is why leaders also need support structures. You will have doubts, and you will have seemingly intractable judgments to make. Having a mentor is helpful – someone who can empathize with you as a leader, someone who knows how to ask good questions and help you reason. Supporting each other in the leadership team is essential; share your management practices, your doubts, and how that difficult conversation went (while respecting the privacy of your team members, as appropriate). If you want to develop as a leader, I highly recommend finding a good mentor. Good mentors elevate your thinking.

A letter to the manager

This is a letter to all managers out there. If you are being paid to manage other people, this one is for you.

Leadership is like baking. It has a lot of ingredients and care means more than measurements.

I bet there is friction in your team. There is friction in all teams, and some of it is healthy. But when it turns into a chronic condition – relentless, abrasive, never taking a break – then you have a problem. And it may very well be that you and your organization are at fault for creating this unhealthy and unproductive environment. For many workers, work no longer feels inspiring and rewarding. Instead, colleagues feel tired, and many feel disengaged at work. This is a big problem. Disengagement is the arch enemy of excellence. And we would all like to be considered centers of excellence, wouldn’t we?

Perhaps there is a narrow focus on performance management through reporting and key performance indicators. This approach resonates well with most engineers and accountants; what is measured gets managed. There is no doubt that we need to measure performance. How else would we know if we are moving in the right direction? And perhaps that is the core of the disengagement problem. Because who knows what future state we are trying to move towards? If there is no shared and compelling vision, it is hard for people to know what matters, and what is just noise.

Performance management is a double-edged sword. It has downsides that managers need to be aware of and watch closely, to avoid the negative effects of management overtaking the good effects. A very strong focus on key performance indicators tends to bring out side effects such as a lack of involvement and tunnel vision, and can also exacerbate short-termism. All of this together tends to create disengagement, which again drives the real key performance indicators in the wrong direction. Successful managers know how to balance focus on results and relationships. Managing based on measurements alone will tip the balance of focus heavily towards results over relationships, but without healthy relationships we cannot reliably drive results over time.

Let us first consider how measurements can help us drive results in a complex system such as a big organization, and then return to how we tie achievement to key management practices.

About measurements

Measurements are critical. But how do we know if what we measure, and the results we infer from our KPI’s, indicate progress? Managing an organization is an optimization problem. To know whether we succeed or not, we need to know what we are aiming for. In mathematical optimization this is called the objective function – a mathematical function that we seek to minimize, typically under a set of constraints. In management, we typically rely on a vision statement to guide our actions. The KPI’s we live and manage by should have a clear connection to that vision. Without this connection, it is hard to tell whether a change in a KPI is good or bad, or whether such a change is important or merely a weak improvement of the whole system. To make these connections, we need to apply systems thinking. Systems thinking means an approach where we look at the internal and external interactions of a system and try to understand how our actions push this system from one state to another. Is that new state taking us closer to our desired state, as described in our vision?

Let us go back to our mathematical optimization problem as an analogy of what we are trying to do. Let’s say we have a mathematical model describing “the system”. This model describes the interactions internally in the system, how the system responds to external events that we have no control over, and the actions we take on purpose to drive our system towards that optimal state where an objective function is minimized. This is a very difficult problem: how can we make the best decisions about the inputs we can control (let’s call them u) to optimize the state of a system when there is considerable uncertainty from signals we cannot control (let’s call them d)?

In most cases we are also not able to observe every state of the system. There are features of our complex system we cannot see. In some cases, we may infer what they are, but very often we have limited observability of the internal state. This is also true of organizations and management; there will always be internal factors we have no way of observing.

When we make decisions about what to do next, we need to rely on things we can see. These are measurement variables, y. This information can be used to drive our system towards our ideal state, but not all information is equally important. Sometimes two different measurements can also give us essentially the same information. Mathematically speaking, we say that the measurements are highly correlated. This means that when solving our mathematical optimization problem, it is not arbitrary which measurement variables we use to drive our decisions. We should carefully select measurements that give us the best ability to approach our optimal state, that is, to minimize our objective function. This is the same for the management of an organization; we should pick the KPI’s that will help us the most in moving in the direction of our vision.
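
Written out schematically, the analogy above is the following kind of optimization problem, using the u, d and y already introduced; the functions J, f and g are placeholders for the objective, the system dynamics and the measurement model, not anything defined in this post:

```latex
\begin{aligned}
\min_{u}\quad & J(x, u) && \text{objective function: the ``vision'' we optimize for} \\
\text{s.t.}\quad & x_{k+1} = f(x_k, u_k, d_k) && \text{internal states } x \text{, driven by our actions } u \text{ and disturbances } d \\
& y_k = g(x_k) && \text{measurements } y \text{: the KPI's we can actually observe}
\end{aligned}
```

We only observe y, we only control u, and d is outside our control; the whole KPI selection question is about choosing a y that tells us whether J is actually improving.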

The actions we take can be viewed as inputs to our system, whether they are variables in a mathematical optimization problem, or actions and tasks to focus on in an organization. Say we have decided on some key performance indicators we would like to drive to some target values. We need to choose our actions for doing this. We will typically have many candidates for actions to take, but not all of them are equally effective. We have two decision problems to solve: which knob should I turn, and what value should I set it to? We also have another issue to keep in mind. While turning a certain knob may drive a property of our system in the desired direction as measured by one specific KPI, what if it makes the situation worse as measured by another KPI? Our optimization problem is much more difficult to solve if there is significant interaction between the internal states we change through our inputs. We should thus aim to decouple the input-output structure of our system. We would like to use inputs (actions) that do not cause conflicting outcomes as measured by different outputs (i.e., our KPI’s). This is not always possible, but we should be aware of the possibility of conflicting interactions and strive for more decoupling in the measurements we use.

So, if we now can agree that it is important to carefully select KPI’s, do we have any heuristics or rules that can help us do that? Luckily, we do. This has been extensively studied both from a mathematical point of view, and from a management theory point of view. It is a good thing that the general conclusions from different research areas do align well with each other.

  • Select KPI’s that are tightly coupled to the objective function so that a change in the KPI would indicate a change in the closeness to our ideal state
  • Select KPI’s that have optimums that are close to invariant under noise and disturbances. This means that if we have small errors in the measurement of our KPI, or external conditions change slightly, we are still operating close to the ideal point of operation.
  • Select KPI’s that are not strongly correlated with each other, as together they would not provide more information about the internal state of the system than one alone would (a small correlation check is sketched after this list)
  • Do not select more KPI’s than you have inputs to manipulate. This is because we cannot independently change more outputs than we have inputs available.
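
The third heuristic, avoiding strongly correlated KPI's, is easy to check numerically if you have some historical data. A minimal sketch follows; the KPI names and numbers are made up purely for illustration:

```python
import numpy as np

# Hypothetical monthly KPI history; each list is one KPI over six months.
kpis = {
    "deployment_frequency": [4, 5, 6, 6, 7, 8],
    "lead_time_days":       [20, 18, 15, 14, 12, 10],
    "open_vulnerabilities": [30, 28, 29, 27, 26, 27],
}

names = list(kpis)
corr = np.corrcoef(np.array([kpis[n] for n in names]))  # pairwise Pearson correlation

for i in range(len(names)):
    for j in range(i + 1, len(names)):
        note = "  <- largely redundant, consider dropping one" if abs(corr[i, j]) > 0.9 else ""
        print(f"{names[i]} vs {names[j]}: r = {corr[i, j]:+.2f}{note}")
```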

If we pull this knowledge into the context of managing an organization, we can make some immediate observations. First, it will be very hard to select good KPI’s unless we know where we are heading. We need a clear vision for the organization. This is our objective function. Let us try to define a few possible “visions” to see how they would affect our KPI selection problem.

  1. Our vision is to make the CTO happy with the technology department
  2. Our vision is to enable the organization to provide services our customers love
  3. Our vision is to replace all humans in the company with robots maintained by others

These examples are of course contrived but they are made to illustrate that what we want to achieve will heavily influence what we measure, and how we work towards that ideal state. Let us take the first suggestion – our vision is to make the CTO happy with the technology department. Perhaps the deeper motivation for such a vision could be to secure bonuses for ourselves and our friends, or because we are uncertain about management’s ability to see value in what we do so we would like to keep the CTO happy for the sake of our own job security. Of course, none of these are admirable motives but let us pretend this is the case for a moment and see how we would seek to optimize that problem.

The CTO is happy when:

  • We do not ask questions but execute desires from top management quickly
  • We report numbers that make the CTO look good to other executives
  • We buy products and services from vendors the CTO has a tight relationship with

Our KPI’s should then focus on speed of implementation, and on reporting progress through measurements that are easy to move a lot but do not necessarily create competitive advantage for the company. Perhaps a KPI should also be the number of the CTO’s LinkedIn contacts associated with each vendor we choose. Obviously, this would be absurd. We are optimizing for the wrong objective function! We see that this type of opportunism is not only suboptimal, it is bordering on corruption.

If, on the other hand, we want to maximize our customer’s love of the services delivered by our organization, we would likely select other KPI’s. When would customers like our products more than those from our competitors?

  • Our products do not have a lot of vulnerabilities and can be trusted
  • Our products are reliable and exceed the expectations the customers have
  • Our risk mitigations are designed to stop harm to our customers
  • Our marketing messages make our customers feel good about our offerings
  • Our products and services are easy to use

Say that this is what we believe underpins making the vision of “most loved supplier” a reality. What should we measure to help drive results? We need to make sure our products are trustworthy and reliable – so using quality and security metrics makes sense. We need to make sure our products exceed expectations, meaning we need to watch closely the feedback from customers and the market. We need to make our products very easy to use – measuring user behavior to see if actual use of our products matches what we intended would be an important part of making up the full picture.

A lot of this cannot be achieved internally by one department or division alone. We need to sell this approach to the entire organization, from top management to marketing and sales, to engineering. Our sphere of influence needs to expand to make our vision reality. Selling does not necessarily come naturally to our team members, so focusing on driving activity before driving results can be a reasonable approach. One way to do this is to look at time spent working with other units, to make sure we do not fall into the internal-focus trap. So where the manager obsessed with output-based KPI’s would see internal socialization as wasted time, the more relationship-aware manager understands that this underpins the creation of business value.

Further, as we expect our team members to “sell our vision” to the organization, people will need support, not just performance push. We will get back to that.

The point of this is: we should not try to measure everything possible. We need to prioritize, and track KPI’s that align closely with our vision for the future. And to do that, we must first define that vision clearly. It must be shared by everyone, understood, and felt to be “right”. To be effective it must align with our values, and it must align with the values of the organization. In that set of values, we find innovation and agility. A practice that causes dissonance between the values we identify with and our daily work leads to frustration. That has unfortunately become very common, and perhaps it has gotten even worse after COVID due to less strategic focus and involvement?

Creating excellence through people

Leadership is about creating results through others. We cannot do that through a one-sided focus on “productivity”. It does not matter if you do a lot of things if they are not the right things to do, or if the things we do are not done very well. A top-down management approach will often lead us into doing things without putting our hearts into it, and without considering whether they are the right things to do, as long as the measured numbers and reports are produced. That is an illusion of effectiveness.

An approach to leadership that seeks to balance organizational performance and human development is “situational leadership”. The term stems from academic work done in the 1970s and has developed significantly since, but the main take-aways are:

  • Not every situation is most effectively managed with the same style of leadership
  • For long-term organizational performance we need to balance our focus on tasks and relationships

According to this leadership theory, a good leader develops “the competence and commitment of their people so they’re self-motivated rather than dependent on others for direction and guidance”.

It should be clear that an over-focus on task performance runs counter to this principle and can easily lead to micromanagement. Micromanagement is warranted when competence is very low but enthusiasm to learn is high, but in knowledge organizations primarily employing university graduates this is rarely the situation at hand. Micromanagement in knowledge organizations is counterproductive.

So what should a good leader do?

Ken Blanchard is one of the originators of situational leadership theory, and he has written many books in a semi-fictional style. His most well-known book, from the 1980s, is a quick read called “The One-Minute Manager”. It is still a good read about management, for learning about motivation and driving human excellence. In this book he introduces the concept of the serving leader, with the acronym SERVE serving as a reminder of key management practices. The practices are summarized as follows:

  • See the future
  • Engage and develop others
  • Reinvent continuously
  • Value results and relationships
  • Embody the values of the organization

See the future: develop a compelling shared vision of the future

This is the precursor to strategy. How can we plan what actions to take if the direction is unclear? How can we expect people to pull in the same direction if they have no shared model of what an ideal future looks like? Therefore, creating a vision needs to be a collaborative experience. It is also necessary that the responsibility for articulating a vision for a business unit lies clearly with the top leader of that unit.

A good vision, whether for a team or an organization, should consider the core values of the organization. The values say something about what the organization sees as important, valuable, worth striving for. All organizations have values, whether articulated or not. If they are not articulated, or they are simply “dormant” (somebody defined them, but they are not widely known or reflected upon), they provide no guidance. Start with the values.

An effective vision sets a clear direction. It describes a future ideal state, somewhere we want to go. That state must be compelling to the team, and something everyone agrees that we would like to achieve.

Having a compelling and shared vision makes everything easier. Prioritizing what is important becomes easier. Motivating both oneself and others becomes much easier. Seeing whether the fruit of our work moves us closer to where we want to be becomes easier. It is a common saying that visibility is important, and a shared vision provides exactly that kind of visibility.

Engage and develop others

To accomplish something great together we need to learn, as an organization and as individuals. Leaders must support the development of people, and of good practice. How do we develop people so that they find work rewarding, and improve their competence in a way that also supports the organization in reaching its goals? The first thing we need to do is to acknowledge that development and optimization require time, trust, acknowledgement, support, and effort.

Excellence does not come from task performance alone, although much can be learned “on the job” as well. A good approach to competence management requires the ability to think about systems. An individual alone is complex, a system. A team adds more complexity, not to speak of a large organization, or our entire market. Even society as a whole is relevant to our development. We need to consider systemic effects if we are going to effectively engage and develop others. That means that we must consider if our result focus is interfering with our ability to drive positive development. We need to align our performance management efforts with our competence goals.

Human performance requires motivation. A large part of “engage and develop others” is thus related to motivational leadership. Research in competence management has taught us about many factors that contribute to the motivation of people at work. Key influencing factors are:

  • Task motivation: a desire to solve the problem at hand, intrinsic motivation for the work itself. This is a state we should strive for.
  • Confidence in own competence: the individual’s self-esteem as it relates to competence and knowledge at work and in a group
  • Perceived autonomy: ability and acceptance of independent influence and decision making
  • Perceived use of own competence: that the work to be done requires the skills and abilities of each person to be actively used
  • Clear expectations: a clear understanding of what is expected of output, behaviors and social interaction from colleagues, leaders, and other relationships
  • Time and resources for competence development and training
  • A culture of excellence: where everyone expects the best of everyone, and provides support to achieve that
  • Usefulness of the work – a desire to help the wider organization achieve its goals (again pointing back to the vision)

Leaders play a crucial role in optimizing the environment around the factors above. This can be done through organizational design (who we hire), how we work together, how we select and work on tasks, how we coach and support one another, how we share our own knowledge, and how we provide feedback to each other.

This is very hard to do unless we trust each other and know each other more personally than what particular job skills we have or what we can read from a CV. The only way to foster that trust is to care deeply about other people, to care about their success in terms of what is important to them, as well as to care about their value and contributions to the social group at work as a whole.

"Culture eats strategy for breakfast" is an old saying, and it holds a lot of truth.

Reinvent continuously

We will not achieve our vision in a vacuum. We are exposed to both internal and external competitive pressures: competition for resources and for relevance, and market forces that decide whether our desired future state is still the right goalpost to aim for. To move successfully towards our ideal future, even when it is clouded by uncertainty, we must innovate. Without innovation, competitive pressures will crush us from the outside, and our internal performance will dwindle as motivation erodes and our goal becomes unachievable. Hence, innovation must be on every leader's agenda.

To reinvent you need to learn. Therefore, every leader should make it a practice to learn new things. Not only about the topic of the work, such as information security, or about leadership itself. Leaders should learn about the things that matter to society, to the supply chain, to the organization, and to individuals. A lot of this learning can come from fiction, from cultural experiences and from hobbies. It is through the way we interact with the world that we learn to understand it. That means that to drive effective innovation, we should not be workaholics. Systems thinking requires system understanding, and that understanding cannot come entirely from an inside perspective.

Innovation means change. We do something new, and we take risks. Innovation means doing things we don't know will work. If we want others to innovate, to drive practice forward, we need leaders who are brave. Failure must be expected, perhaps even celebrated if we learn from it. Failure will always be seen as risky by people in an organization, due to the perceived expectation of being successful, efficient and productive. It is therefore important for leaders to show willingness to take risks, try new things, and fail in a transparent way, so that others can see when things do not go the way we want.

There are many ways to reinvent or innovate. It can happen at the individual level, as a group in a natural, non-directed way, or as a managed project. It is also important to make innovations visible, no matter what type of innovation we are talking about.

Reinvention can be about processes, and it can be about technologies and products. We should always work to improve our processes and ways of working. This means that people must be able to voice their opinions, as well as to experiment. If we try new ways of doing things and challenge each other's thinking along the way, we improve the odds of success. To make this a reality, we must create a culture where people speak their minds, and where those who make decisions actually consider the suggestions and concerns raised. Involvement only works when it is authentic. Experimentation takes time: if someone wants to try something new, discuss and agree on how much "extra time" is OK to spend on experimentation to drive things forward. Maximize time spent on creativity and on efforts to create and test, and make evaluation easy. Innovation work is where agile shines: working software over extensive documentation, or demonstration by "doing" over extensive KPI's.

Value relationships and results

Results matter. But it is through our relationships we create our best results. Relationships drive improvement, innovation, motivation, and quality.

As a leader, take time to build strong relationships with others. Not only with your own leaders or with your direct reports. Those are important, but so are other people: those who use the work produced by your unit, and those who need to support your unit in creating results. For example, an information security team often needs help from the IT helpdesk to handle security incidents. If you as a leader have a strong relationship with the leader of the helpdesk team, and with some of its key members, their willingness to help and make a real effort when the security team needs it will be much higher. The same goes for the relationships between your team members and people in the adjacent teams they interact with. Value your people's efforts to build relationships within the unit, in the organization, and even externally, even if their day-to-day work does not involve external contact with vendors or customers. Every employee is a brand ambassador, and a strong brand drives results across the whole organization, even in business support functions.

As a leader, you should encourage and support people's efforts to build relationships. You can provide arenas such as cross-functional knowledge sharing or break activities. You can think strategically about how you engage with other units through the work you do, and choose ways of working that make it easier to build relationships with other people. Those relationships create trust, and trust is the parent of collaboration. This way, relationships help us drive performance. They create results.

Valuing results is also very important. This often comes more naturally to an organization driven by measurements and reporting. Acknowledging results helps us improve motivation, triggers ideas for improvement, and creates a need for more collaboration. In this way, a focus on results also creates a need for relationship management.

  • Celebrate all wins – big and small
  • When things go wrong – appreciate what can be learned. That is a result too.
  • Evaluate results based on outcome, expectation, handling of challenges and effort.
  • We should value the way a result was achieved as much as the result itself.

Embody the values of the organization

Authenticity is key to trust. The actions of an organization's leaders are very visible to their direct reports, but also to others. A leader who acts in a way that does not harmonize with the organization's values does not support achieving the vision.

Inauthenticity drives mistrust. Nobody is willing to go beyond the bare minimum to follow a leader who acts as if he or she does not actually believe in the vision or in the agreed values. This boils down to "walk the way you talk". If you talk about agility but opt for micromanagement, you create dissonance. If you say you want to empower people to innovate but discourage taking risks, little innovation will occur. Authenticity matters. This means not only behaving in accordance with the values of the organization on the surface, but actively working to bring the system forward, just as you expect others to.

Do you want people to innovate? Then you must innovate. Do you want people to share your vision? Then you must invite participation in its creation and how to articulate it. Do you want people to learn and develop? Then you must learn and develop. There is no better way to portray authenticity than letting people see the things you do. Actions reinforce words.

Embodying the values of the organization is not only about the actions you take, but also about the expectations you set. If we want to build excellence, we should not tolerate long-term underperformance. More importantly, we should not tolerate systematic behaviors that go contrary to our values. When underperformance manifests itself, or behaviors that contradict our vision and our stated values show up repeatedly, we must act.

In a culture where tasks are valued above relationships, and measurements count more than progress, underperformance is often met with punishment: no bonus, lower salary adjustments, or firing the individual. While such measures have their place, they should not be the starting point of improvement. When people act differently than we would expect given the vision and our defined values, we must ask ourselves what the cause of this behavior is. For a leader, the first question should be: "is there something in the way I lead that would make people believe those undesired behaviors are tolerated, or even encouraged?". Sometimes our actions have unintended consequences when interpreted by others.

The next question we should ask is whether misaligned incentives are driving the behaviors we see. Do we reward results in a way that practically forces people to take shortcuts, or to take actions we do not actually want, just to make the measurements hit target? This type of opportunism will often manifest itself when motivation is entirely extrinsic, and there is a mismatch between the interests of the agent (the employee) and the principal (the leader, or the organization).

If we want to identify the cause of the performance slip, or of the non-productive behaviors, we can only do so through dialog. You as a leader must have a conversation with the person displaying these behaviors. This is a great opportunity for situational leadership: what approach is appropriate and effective in the current situation? Is it a directive style, where you tell the other person what to do? Is it a coaching and participating style, where you support self-reflection to enable the desired change? Warnings and disciplinary actions are an extreme variant of the directive leadership style, and when the gap between actual and expected behavioral standards is large, they can be necessary. We are then usually talking about serious violations of norms or of the code of conduct. Most often this is not the case, and a very directive approach can be counterproductive, especially if there is not already a high level of trust in the relationship between you and the person you are trying to help change his or her ways.

The conclusion is that leadership is complex, and more about people than about measurements. Using the SERVE principles as a guideline for how you think about leadership can be very helpful, as they help you balance your focus between driving results and building the strong relationships that underpin those results.

Who supports the leader?

Being a leader can feel very lonely. That is not a good situation and is completely unnecessary. Leaders need support structures. Sometimes you will need to think about complex dilemmas, involving people you care about. Leaders must often make trade-offs between conflicting goals, desires and needs. To do this effectively we need support from those around us. The organization should provide some of that support, through leadership training, mentorship, management systems and through contact with other managers. Your own line manager should be available for discussing such issues. It can also be a very good idea to have a strong mentor to help you reflect on challenging situations.

You should pull necessary support from many sources. Leaders often try to portray themselves as someone with the answer to every question. They often keep the dilemmas hidden and deliver directives for execution. This can easily lead to micromanagement and suboptimal solutions. In many cases you can share the dilemma and have your people help sort out what should be done next instead of presenting them with a directive to execute. Remember – people have been hired for their talents, not as cogs in a wheel.

Another source of support is your friends and family. That support does not have to be “task related”. Simply taking time to have a good life and feel appreciated will make you a better leader. That helps you create results, both on your own, and through others.

Value work-life balance for yourself, and others. Long-term growth depends on it.

The take-away

  • It is your job to make sure there is a compelling vision articulated, shared by everyone
  • Hire the right people and support their development – professionally and as individuals
  • Improve things every day – innovation applies to processes, products and who we involve
  • Appreciate and support relationships at work, and make networking part of what you do
  • Live by the values you and your organization believe in. Be authentic, and build trust.
  • Take care of your mental and physical health – and help others do the same. This is work-life balance in practice.

Application security in Django projects

This is a quick blog post on how to remove some typical vulnerabilities in Django projects.

Even a coffee shop visitor registration app needs to take app security into account

The key aspects we are looking at are:

  • Threat modeling: thinking through what attackers could do
  • Secrets management with dotenv
  • Writing unit tests based on a threat model
  • Checking your dependencies with safety
  • Running static analysis with bandit

Threat model

The app we will use as an example here is a visitor registration app to help restaurants and bars with COVID-19 tracing. The app has the following key users:

  • SaaS administrator: access to full administration of the app for multiple customers (restaurants and bars)
  • Location administrator: access to visitor lists for an individual location
  • Personal user: register visits at participating locations, view their own visit history and control privacy settings
  • Unregistered user: register visits at participating locations, persistent browser session lasting 14 days

The source code for this app is available here: https://github.com/hakdo/besokslogg/.

We use the keywords of the STRIDE method to come up with quick attack scenarios and testable controls. Note that most of these controls will only be testable with custom tests for the application logic.

Attack type | Scenario | Testable controls
Spoofing | Attacker guesses password | Password strength requirement by Django (OK – framework code). Lockout after 10 wrong consecutive attempts (need to implement in own code) (UNIT).
Tampering | – | –
Repudiation | Attacker can claim not to have downloaded CSV file of all user visits | CSV export generates a log that is not readable by the application user (UNIT).
Information disclosure | Attacker abuses lack of access control to gain access to visitor list. Attacker steals cookie with MitM attack. Attacker steals cookie in XSS attack. | Test that visitor lists cannot be accessed from the view without being logged in as the dedicated service account (UNIT). Test that cookies are set with the Secure flag (UNIT or STATIC). Test that cookies are set with the HttpOnly flag (UNIT or STATIC). Test that there are no unsafe injections in templates (STATIC).
Denial of service | Attacker finds a parameter injection that crashes the application | Check that invalid parameters lead to a handled exception (cookies, form inputs, URL parameters).
Elevation of privilege | Attacker gains SaaS administrator access through phishing | Check that SaaS administrator login requires a safe MFA pattern (UNIT or MANUAL).

Simple threat model for contact tracing app

Secrets management

Django projects get a lot of their settings from a settings.py file. By default this file includes sensitive information, such as the SECRET_KEY used to generate session cookies and sign web tokens, email configurations and so on. Obviously we don't want to leak this information. Using python-dotenv is a practical way to deal with this. This package allows you to put your secrets in a .env file as environment variables, and then read them into settings with os.getenv('name_of_variable'). This way the settings.py file will not contain any secrets. Remember to add the .env file to .gitignore to avoid pushing it to a repository. In addition, you should use different values for all secrets in your development and production environments.

import os

from dotenv import load_dotenv

# Read variables from the .env file into the process environment
load_dotenv()

SECRET_KEY = os.environ.get('SECRET_KEY')

In the code snippet above, we see that SECRET_KEY is no longer exposed. Use the same technique for email server configuration and other sensitive data.
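For illustration, a minimal .env file matching the snippet above could look like the following. Only SECRET_KEY is referenced in the code above; the email variables are just example names, and all values are placeholders:

SECRET_KEY=replace-with-a-long-random-string
EMAIL_HOST_USER=postmaster@example.com
EMAIL_HOST_PASSWORD=replace-me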

When deploying to production, you need to set the environment variables in that environment in a suitable and secure manner. You should avoid storing configurations in files on the server.

Unit tests

As we saw in the threat model, the typical way to fix a security issue is very similar to the typical way you would fix a bug.

  1. Identify the problem
  2. Identify a control that solves the problem
  3. Define a test case
  4. Implement the test and develop the control

In the visitor registration app, an issue we want to avoid is leaking visitor lists for a location. A control that avoids this is an authorisation check in the view that shows the visitor list. Here’s that code.

@login_required()
def visitorlist(request):
    alertmsg = ''
    try:
        thislocation = Location.objects.filter(service_account = request.user)[0]
        if thislocation:
            visits = Visit.objects.filter(location = thislocation).order_by('-arrival')
            chkdate = request.GET.get("chkdate", "")
            if chkdate:
                mydate = datetime.datetime.strptime(chkdate, "%Y-%m-%d")
                endtime = mydate + datetime.timedelta(days=1)
                visits = Visit.objects.filter(location = thislocation, arrival__gte=mydate, arrival__lte=endtime).order_by('-arrival')
                alertmsg = "Viser besøkende for " + mydate.strftime("%d.%m.%Y")
            return render(request, 'visitor/visitorlist.html', {'visits': visits, 'alertmsg': alertmsg})
    except:
        print('Visitor list failed - wrong service account or no service account')
        return redirect('logout')

Here we see that we first require the user to be logged in to visit this view, and then on line 5 we check whether we have a location where the currently logged-in user is registered as a service account. A service account in this app is what we called a "location administrator" in the role descriptions at the beginning of this blog post. It seems our code already implements the required security control, but to prove that, and to make sure we detect it if someone changes the code, we need to write a unit test.

We have written a test class where three users are created in the test setup.

class VisitorListAuthorizationTest(TestCase):

    def setUp(self):
        # Create three users
        user1= User.objects.create(username="user1", password="donkeykong2016")
        user2= User.objects.create(username="user2", password="donkeykong2017")
        user3= User.objects.create(username="user3", password="donkeykong2018")
        user1.save()
        user2.save()

        # Create two locations with assigned service accounts
        location1 = Location.objects.create(service_account=user1)
        location2 = Location.objects.create(service_account=user2)
        location1.save()
        location2.save()
    
    def test_return_code_for_user3_on_visitorlist_is_301(self):
        # Authenticate as user 3
        self.client.login(username='user3', password='donkeykong2018')
        response = self.client.get('/visitorlist/')
        self.assertTrue(response.status_code == 301)
    
    def test_redirect_url_for_user3_on_visitorlist_is_login(self):
        # Authenticate as user 3
        self.client.login(username='user3', password='donkeykong2018')
        response = self.client.get('/visitorlist/', follow=True)
        self.assertRedirects(response, '/login/?next=/visitorlist/', 301)
    
    def test_http_response_is_200_on_user1_get_visitorlist(self):
        self.client.login(username='user1', password='donkeykong2016')
        response = self.client.get('/visitorlist/', follow=True)
        self.assertEqual(response.status_code, 200)

Here we are testing that user3 (which is not assigned as “service account” for any location) will be redirected when visiting the /visitorlist/ url.

We are also testing that the security functionality does not break the user story success for the authorized user, user1, who is assigned as service account for location1.

Here we have checked that the wrong user cannot access the URL without getting redirected, and that it works for the allowed user. If someone changes the logic so that the ownership check is skipped, this test will break. If on the other hand, someone changes the URL configuration so that /visitorlist/ no longer points to this view, the test may or may not break. So being careful about changing the inputs required in tests is important.
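The other controls from the threat model can be validated in a similar way. As a rough sketch, the cookie flag controls could be checked like this, assuming the project's settings enable the Secure and HttpOnly flags for the session cookie:

from django.conf import settings
from django.test import TestCase


class CookieFlagTest(TestCase):
    def test_session_cookie_has_secure_flag(self):
        # The Secure flag ensures the session cookie is only sent over HTTPS
        self.assertTrue(settings.SESSION_COOKIE_SECURE)

    def test_session_cookie_has_httponly_flag(self):
        # The HttpOnly flag keeps the session cookie out of reach of JavaScript
        self.assertTrue(settings.SESSION_COOKIE_HTTPONLY)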

Vulnerable open source libraries

According to companies selling scanner solutions for open source libraries, using vulnerable and outdated libraries is one of the most common security problems. It is definitely easy to pick up vulnerabilities this way, and as dependencies can be hard to trace manually, having a tool to do so is good. For Python, the package safety is a good open source alternative to commercial tools. It checks your dependencies against a vulnerability database run by pyup.io, which is updated every month for free users; you can pay to get updates faster. If you have a high-stakes app, it may pay off to go with a commercial option or to write your own dependency checker.

Running it is as easy as

safety check -r requirements.txt

This will check the dependencies in requirements.txt for known vulnerabilities and give a simple output in the terminal. It can be built into CI/CD pipelines too, as it can export vulnerabilities in multiple formats and also give exit status that can be used in automation.
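A minimal sketch of how this could look in a CI script, relying only on the exit status (output flags vary between safety versions, so check the help text for your version):

# Fail the build if safety finds known vulnerabilities
safety check -r requirements.txt || {
    echo "Vulnerable dependencies found, failing the build"
    exit 1
}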

Static analysis with bandit

Static analysis can check for known anti-patterns in code. A popular choice for looking for vulnerabilities in Python code is bandit. It will test for hardcoded passwords, weak crypto and many other things. It will not catch business logic flaws or architectural bad choices but it is a good help for avoiding pitfalls. Make sure you avoid scanning your virtual environment and tests, unless you want a very long report. Scanning your current project is simple:

bandit -r .

To avoid scanning certain paths, create a .bandit file with defined excludes:

[bandit]
exclude: ./venv/,./*/tests.py

This file will exclude the virtual environment in /venv and all files called “tests.py” in all subfolders of the project directory.

A false positive

Bandit doesn't know the context of the methods and patterns you use. One of its rules checks whether you are using the module random in your code. This module is part of the Python standard library, but it is not cryptographically secure. In other words, creating hashing functions or generating certificates based on random numbers from it is a bad idea, as cryptanalysis could enable realistic attacks on the output of such generators. Using the random module for non-security purposes, on the other hand, is convenient and unproblematic. Our visitor log app does this, and then bandit tells us we did something naughty:

Test results:
        Issue: [B311:blacklist] Standard pseudo-random generators are not suitable for security/cryptographic purposes.
        Severity: Low   Confidence: High
        Location: ./visitor/views.py:59
        More Info: https://bandit.readthedocs.io/en/latest/blacklists/blacklist_calls.html#b311-random
     58          alco = ''
     59          valcode =  ''.join(random.choice(string.ascii_lowercase) for i in range(6)) 
     60          errmsg = ''    
 

What we see here is the check for cryptographically insecure random numbers. We are using it just to verify that a user knows what they are doing when deleting their own account: the app generates a 6-letter random code that the user has to repeat in a text box to delete their account and all associated data. This is not security critical. We can then add a comment # nosec to the line in question, and the scanner will not report on this finding again.
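In this case, the line from the bandit report above simply gets the suppression comment appended:

valcode = ''.join(random.choice(string.ascii_lowercase) for i in range(6))  # nosec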

Things we will not catch

A static analyser will give you false positives, and there will be dangerous patterns it does not have tests for. There will be things in our threat model we have overlooked, and therefore missing security controls and test requirements. Open source libraries can have vulnerabilities that are not yet in the database used by our scanner tool. Such libraries can also in themselves be malicious by design, or because they have been compromised, and our checks will not catch that. Perhaps fuzzing could, but not always. In spite of this, simple tools like writing a few unit tests, and running some scanners, can remove a lot of weaknesses in an application. Building these into a solid CI/CD pipeline will take you a long way towards “secure by default”.

Commercial VPN’s: the Twitter security awareness flamewar edition

A lot of people worry about information security, and perhaps rightly so. We are steadily plagued by ransomware, data breaches, phishing attacks and password stealers; being reminded of good security habits regularly is generally a good thing. Normally, this does not result in angry people. Except on the Internet of course, and perhaps in particular on Twitter, the platform made for online rage.

Being angry on the Internet: does a VPN help?

Here’s a recent Tweet from infosec awareness blogger John Opdenakker (you can read his blog here https://johnopdenakker.com):

If you click this one you will get some responses, including some harsh ones:

And another one. Felt like an attack, perhaps it was an attack?

So far the disagreement is not quite clear, just that some people obviously think VPN’s are of little use for privacy and security (and I tend to agree). There are of course nicer ways of stating such opinions. I even tried to meddle, hopefully in a somewhat less tense voice. Or maybe not?

This didn't really end too well, I guess this was the end of it (not directed at me but at @desdotdev).

This is not a very good way to discuss something. My 2 cents here, beyond “be nice to each other”, was really just a link to this quite good argument why commercial VPN’s are mostly not very useful (except if you want to bypass geoblocking or hide your ip from the websites you visit):

A link to a more sound discussion of the VPN debacle

Risks and VPN marketing

For a good writeup on VPN's not making you secure, I suggest you read the gist above. Of course, everything depends on context, and in particular on your threat model. If you fear that evil hackers are sitting on your open WiFi network looking at all your web traffic to non-HTTPS sites, sure, a VPN will protect you. But most sites use HTTPS, and if it is a bank or something similar, they will also use HSTS (which makes sure the initial connection is safe too). So what are the typical risks for the coffee-shop-visiting, internet-browsing person?

  • Email: malware and phishing emails trying to trick you into sharing too much information or installing malware
  • Magecart infected online shopping venues
  • Shoulder surfers reading your love letters from the chair behind you
  • Someone stealing your phone or laptop while you are trying to fetch that cortado
  • Online bullying threatening your mental health while discussing security awareness on Twitter
  • Secret Chinese agents spying on your dance moves on TikTok

Does a VPN help here? No, it doesn't. It encrypts the traffic between your computer and a computer controlled by the VPN company. Such companies typically register in countries with little oversight. Usually the argument is "to avoid having to deliver any data to law enforcement" and besides, "we don't keep logs of anything". Just by coincidence, the same countries tend to be tax havens that allow you to hide corporate ownership structures as well. Very handy. So, instead of trusting your ISP, you set up a tunnel to a computer entirely controlled by a company owned by someone you don't know, in a jurisdiction that allows them to operate without much oversight, where they promise not to log anything. I am not sure this is a win for privacy or security. And it doesn't help against China watching your TikTok videos or a Magecart gang stealing your credit card information on your favourite online store.

One of the more popular VPN providers is ExpressVPN. They provide a 10-step security test, which asks mostly useful questions about security habits (although telling random web pages your preferred messaging app, search engine and browser may not be the best idea) – and it also asks you “do you use a VPN”. If you answer “no” – here’s their security advice for you:

ExpressVPN marketing: do you use a VPN?

It is true that it will make it hard to snoop on you on an open wireless network. But this is not in most people's threat models – not really. The big problems are usually those in the bullet point list above. ExpressVPN is perhaps one of the least scare-mongering VPN sellers, and even they try to use security and privacy anxiety to push you into buying their product. The arguments about getting around geoblocking and hiding your IP from the websites you visit are OK – if you have a need to do that. Most people don't.

When VPN’s tell you to buy their service to stay safe online, they are addressing a very narrow online risk driver – that is negligible in most people’s threat models.

So what should I do when browsing at a coffee shop?

If you worry about the network itself, a VPN may be a solution to that, provided you trust the VPN itself. You could run your own VPN with a cloud provider if you want to and like to do technical stuff. Or, you could just use your phone to connect to the internet if you have a reasonable data plan. I would rather trust a regulated cell provider than an unregulated anonymous corporation in the Caribbean.

Email, viruses and such: be careful with links and attachments, run endpoint security and keep your computer fully up to date. This takes you a long way, and a VPN does not help at all!

Magecart: this one can be hard to spot, use a credit card when shopping online, and check your statements carefully every month. If your bank provides a virtual card with one-time credit card numbers that is even better. Does a VPN help? No.

Theft of phones, laptops and coffee mugs? Keep an eye on your stuff. Does a VPN help? Nope.

Online bullying? Harder to fight this one but don’t let them get to you. Perhaps John is onto something here? If you feel harassed, use the block button 🙂

Secret Chinese agents on TikTok? No solution there, except not showing your dance moves on TikTok. Don’t overshare. Does a VPN help? Probably not.

Protecting the web with a solid content security policy

We have been used to securing web pages with security headers to fend off cross-site scripting attacks, clickjacking attacks and data theft. Many of these headers are now being deprecated, and browsers may no longer respect them. Instead, we should be using content security policies to reduce the risk to our web content and its users.

Protect your web resources and your users with Content Security Policy headers!

CSP's are supported by all modern browsers, and they also allow reporting of policy violations, which can aid in detecting hacking attempts.
Mozilla Developer Network has great documentation on the use of CSP’s: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Policy.

CSP by example

We want to make it even easier to understand how CSP’s can be used, so we have made some demonstrations for the most common directives we should be using. Let us first start with setting the following header:

Content-Security-Policy: default-src 'self';

We have created a simple Flask application to demonstrate this. Here’s the view function:

A simple view function setting a CSP header.
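The original screenshot is not reproduced here, but a minimal sketch of such a Flask view function could look like the following (the route and template names are illustrative assumptions):

from flask import Flask, make_response, render_template

app = Flask(__name__)

@app.route('/')
def index():
    # Render the template and attach a restrictive CSP header to the response
    response = make_response(render_template('index.html'))
    response.headers['Content-Security-Policy'] = "default-src 'self';"
    return response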

Here we are rendering a template "index.html", and we have set the default-src directive of the CSP to 'self'. This is a "fallback" directive that applies when you do not specify more specific directives for key resources. Here's what this does to JavaScript, styles and media when other directives are missing:

  • Blocks inline JavaScript (that is, anything inside <script> tags, onclick=… handlers on buttons, etc.) and JavaScript coming from other domains.
  • Blocks media resources from other domains, including images
  • Blocks stylesheets from external domains, as well as inline style tags (unless explicitly allowed)

Blocking untrusted scripts: XSS

Of course, you can set the default-src to allow those things, and many sites do, but then the protection provided by the directive will be much weaker. A lot of legacy web pages have mixed HTML and JavaScript in <script> tags or inline event handlers. Such sites often set default-src 'self' 'unsafe-inline'; to allow this behaviour, but then it will not help protect against common injection attacks. Consider first the difference between no CSP, and the following CSP:

Content-Security-Policy: default-src 'self';

We have implemented this in a route in our Python web app:

Adding the header will help stop XSS attacks.
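The route was shown as a screenshot in the original post; a sketch of a deliberately vulnerable route that still sets the CSP header could look like this (an inline template string stands in for the Jinja template file used in the real demo):

from flask import Flask, make_response, render_template_string

app = Flask(__name__)

@app.route('/xss/safe/<payload>')
def xss_safe(payload):
    # Deliberately vulnerable: the 'safe' filter disables autoescaping of the payload
    html = render_template_string(
        "<html><body><p>{{ payload|safe }}</p></body></html>",
        payload=payload
    )
    response = make_response(html)
    # The CSP header is what stops an injected inline script from executing
    response.headers['Content-Security-Policy'] = "default-src 'self';"
    return response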

Let us first try the URL /xss/safe/hello: the result is injected into the HTML through the Jinja template. The template uses the "safe" filter, so the output is not escaped in any way.

Showing that a URL parameter is reflected on the page. This may be XSS vulnerable (it is).

We see here that the word "hello" is reflected on the page. Trying with a typical cross-site scripting payload, such as <script>alert('XSS')</script>, shows us that this page is vulnerable (which we know, since there is no sanitation):

No alert box: the CSP directive blocks it!

We did not get an alert box here, saying “XSS”. The application itself is vulnerable, but the browser stopped the event from happening due to our Content-Security-Policy with the default-src directive set to self, and no script-src directive allowing unsafe inline scripts. Opening the dev tools in Safari shows us a bunch of error messages in the console:

Error messages in the browser console (open dev tools to find this).

The first message shows that the lack of nonce or unsafe-inline blocked execution. This is done by the web browser (Safari).

Further, we see that Safari activates its internal XSS auditor and detects the payload. This is not related to CSP's; it is internal Safari behavior: Safari activates its XSS auditor unless an X-XSS-Protection header explicitly disables XSS protection. This is Safari-specific and should not be assumed to be a default. The X-XSS-Protection header has been used in Internet Explorer, Chrome and Safari, but it is currently being deprecated. Edge has removed its XSS Auditor, and Firefox has not implemented this header. Use Content Security Policies instead.

What if I need to allow inline scripts?

The correct way to allow inline JavaScript is to include the nonce directive (nonce = number used once) or use a hash of the inline script. These values should then rather be placed in the script-src directive than in the default-src one. For more details on how to do this, see https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Security-Policy/script-src#Unsafe_inline_script.

Let’s do an example of an unsafe inline script in our template, using a nonce to allow the inline script. Here’s our code:

Example code showing use of nonce.
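Again, the code was shown as a screenshot; a sketch of a view that generates a fresh nonce per request and passes it to both the header and the template could look like this (route and template names are assumptions):

import secrets

from flask import Flask, make_response, render_template

app = Flask(__name__)

@app.route('/nonce-demo')
def nonce_demo():
    # Generate an unguessable nonce for this response only
    nonce = secrets.token_urlsafe(32)
    csp = "default-src 'self'; script-src 'self' 'nonce-{}';".format(nonce)
    response = make_response(render_template('nonce_demo.html', nonce=nonce))
    response.headers['Content-Security-Policy'] = csp
    return response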

Remember to make the nonce unguessable by using a long random value, and make sure to regenerate it each time the CSP is sent to the client – if not, you are not providing much security protection.

Nonces are only good if they can't be guessed and if they are truly used only once.

Here we have one script with a nonce included, and one that does not have it included. The nonce’d script will create an alert box, and the script without the nonce tries to set the inner HTML of the paragraph with id “blocked” to “Hello there”. The alert box will be created but the update of the “blocked” paragraph will be blocked by the CSP.

Here’s the HTML template:

A template with two inline scripts. One with an inserted nonce value, one without. Which one will run?
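The template was also shown as a screenshot; based on the description above, it could look roughly like this (illustrative only):

<!-- nonce_demo.html (illustrative) -->
<p id="blocked">Waiting...</p>
<script nonce="{{ nonce }}">
  // Allowed: this script carries the nonce from the CSP header
  alert('Hello from the allowed script');
</script>
<script>
  // Blocked by the CSP: no nonce on this script tag
  document.getElementById('blocked').innerHTML = 'Hello there';
</script>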

The result is as expected:

Only the nonce’d script will run 🙂

Conclusion: Use CSP’s for protecting against cross-site scripting (XSS) – but keep sanitising as well: defence in depth.

What about clickjacking?

A good explanation of clickjacking and how to defend against it is available from Portswigger: https://portswigger.net/web-security/clickjacking.

Here's a demo of how clickjacking can work, using two "hot" domains of today: who.int and zoom.us (the latter is not vulnerable to clickjacking).

Demo of Clickjacking!

Here's how to stop that from happening: add the frame-ancestors directive, and whitelist the domains that should be allowed to iframe your web page.

Content-Security-Policy: default-src 'self'; frame-ancestors 'self' youtube.com;

Summary

Protecting against common client-side attacks such as XSS and clickjacking can be done using the Content Security Policy header. This should be part of a defense in depth strategy but it is an effective addition to your security controls. As with all controls that can block content, make sure you test thoroughly before you push it to production!

Is COVID-19 killing your motivation?

The COVID-19 pandemic is taking its toll on all of us. One thing is staying at home, another is the thought of all the things that can go wrong. The virus is very infectious and is likely to kill a lot of people over the next year. The actions we take, and need to take, to curb the damage of the spreading illness are taking away freedoms we take for granted. No more travel, no parties, not even a beer with coworkers. For many of us, even work is gone. No wonder motivation is taking a hit! How can we deal with this situation, collectively and individually, to make the best out of a difficult situation?

When news are mostly about counting our dead, it can be easy to lose faith in humanity

The virus is not only a risk to our health; it is also a risk to our financial well-being and the social fabric of our lives. The actions taken to limit the spread of the virus and the load on our healthcare systems are taking their toll on our social lives, and perhaps also on our mental health. It is probably a good idea to think through how important aspects of life will be affected, what you can do to minimize the risk, and what you should prepare to do if bad consequences do materialize.

Topic | Risks | Things to do
Finance | Job loss; real estate value loss | Minimize expenses and build a buffer of money. Ask the bank for deferral of principal payments. Plan to negotiate if the collateral for your mortgage is no longer accepted due to a real estate market collapse.
Physical health | Infected by COVID-19 | Supplies in storage at home in case of isolation. Space to isolate to avoid infecting other family members.
Mental health | Feeling of isolation; depression | Avoid "crazy news cycles" and negative feedback on social media. Talk to friends regularly, not just coworkers. Get fresh air and some exercise every day. Have a contact ready for telemedicine, e.g. check if your insurance company offers this.
Work | Loss of visibility; degradation of quality; collaboration problems | Set up daily video calls with closest team members. Make results visible in digital channels. Practice active listening.

Example individual risk assessment for COVID-19 life impact

News, social media and fake news

The news cycle is a negative spiral of death counts, stock market crashes and experts preaching the end of the world. While it is useful, and important, to know what the situation is in order to make reasonable decisions, it is not useful to watch negative news around the clock. It is probably a good idea to batch your news intake during a crisis, for example to a morning and an afternoon update.

Social media tend to paint an even worse picture, taking the news cycle and twisting it into something more extreme. My Twitter feed is now full of people arguing we should go for full communism and introduce death penalties for people allowing children to play outside. It is OK to watch stuff like that for a short while as entertainment, but it can easily turn into a force of negative influence, and perhaps it would be better to take a break from it. Use filters to stay away from hashtags that bring you down without bringing anything useful.

DevSecOps: Embedded security in agile development

The way we write, deploy and maintain software has changed greatly over the years, from waterfall to agile, from monoliths to microservices, from the basement server room to the cloud. Yet, many organizations haven’t changed their security engineering practices – leading to vulnerabilities, data breaches and lots of unpleasantness. This blog post is a summary of my thoughts on how security should be integrated from user story through coding and testing and up and away into the cyber clouds. I’ve developed my thinking around this as my work in the area has moved from industrial control systems and safety critical software to cloud native applications in the “internet economy”.

What is the source of a vulnerability?

At the outset of this discussion, let’s clarify two common terms, as they are used by me. In very unacademic terms:

  • Vulnerability: a flaw in the way a system is designed and operated that allows an adversary to perform actions the system owner did not intend to make available.
  • Threat: an action performed on an asset in the system by an adversary in order to achieve an outcome that he or she is not supposed to be able to achieve.

The primary objective of security engineering is to stop adversaries from being able to achieve their evil deeds. Most often, evilness is possible because of system flaws. How these flaws end up in the system, is important to understand when we want to make life harder for the adversary. Vulnerabilities are flaws, but not all flaws are vulnerabilities. Fortunately, quality management helps reduce defects whether they can be exploited by evil hackers or not. Let’s look at three types of vulnerabilities we should work to abolish:

  • Bugs: coding errors, implementation flaws. The design and architecture is sound, but the implementation is not. A typical example of this is a SQL injection vulnerability in a web app.
  • Design flaws: errors in architecture and how the system is planned to work. A flawed plan that is implemented perfectly can be very vulnerable. A typical example of this is a broken authorization scheme.
  • Operational flaws: the system makes it hard for users to do things correctly, making it easier to trick privileged users to perform actions they should not. An example would be a confusing permission system, where an adversary uses social engineering of customer support to gain privilege escalation.

Security touchpoints in a DevOps lifecycle

Traditionally there has been a lot of discussion on a secure development lifecycle. But our concern is removing vulnerabilities from the system as a whole, so we should follow the system from infancy through operations. The following touchpoints do not make up a blueprint, it is an overview of security aspects in different system phases.

  • Dev and test environment:
    • Dev environment helpers
    • Pipeline security automation
    • CI/CD security configuration
    • Metrics and build acceptance
    • Rigor vs agility
  • User roles and stories
    • Rights management
  • Architecture: data flow diagram
    • Threat modeling
    • Mitigation planning
    • Validation requirements
  • Sprint planning
    • User story reviews
    • Threat model refinement
    • Security validation testing
  • Coding
    • Secure coding practices
    • Logging for detection
    • Abuse case injection
  • Pipeline security testing
    • Dependency checks
    • Static analysis
    • Mitigation testing
      • Unit and integration testing
      • Detectability
    • Dynamic analysis
    • Build configuration auditing
  • Security debt management
    • Vulnerability prioritization
    • Workload planning
    • Compatibility blockers
  • Runtime monitoring
    • Feedback from ops
    • Production vulnerability identification
    • Hot fixes are normal
    • Incident response feedback

Dev environment aspects

If an adversary takes control of the development environment, he or she can likely inject malicious code in a project. Securing that environment becomes important. The first principle should be: do not use production data, configurations or servers in development. Make sure those are properly separated.

The developer workstation should also be properly hardened, as should any cloud accounts used during development, such as Github, or a cloud based build pipeline. Two-factor auth, patching, no working on admin accounts, encrypt network traffic.

The CI/CD pipeline should be configured securely. No hard-coded secrets, limit who can access them. Control who can change the build config.

During early phases of a project it is tempting to be relaxed with testing, dependency vulnerabilities and so on. This can quickly turn into technical debt – first in one service, then in many, and at the end there is no way to refinance your security debt at lower interest rates. Technical debt compounds like credit card debt – so manage it carefully from the beginning. To help with this, create acceptable build thresholds, and a policy on lifetime of accepted poor metrics. Take metrics from testing tools and let them guide: complexity, code coverage, number of vulnerabilities with CVSS above X, etc. Don’t select too many KPI’s, but don’t allow the ones you track to slip.

One could argue that strict policies and acceptance criteria will hurt agility and slow a project down. Truth is that lack of rigor will come back to bite us, but at the same time too much will indeed slow us down or even turn our agility into a stale bureaucracy. Finding the right balance is important, and this should be informed by context. A system processing large amounts of sensitive personal information requires more formalism and governance than a system where a breach would have less severe consequences. One size does not fit all.

User roles and stories

Most systems have different types of users with different needs – and different access rights. Hackers love developers who don't plan in terms of user roles and stories – the things each user would need to do with the system – because lack of planning often leads to much more liberal permissions "just in case". User roles and stories should thus be a primary security tool. Consider a simple app for approval of travel expenses in a company. This app has two primary user types:

  • Travelling salesmen who need reimbursements
  • Bosses who will approve or reject reimbursement claims

In addition to this, someone must be able to add and remove users, grant access to the right travelling salesmen for a given boss, etc. In other words, the system also needs an administrator.

Let’s take the travelling salesman and look at “user stories” that this role would generate:

  • I need to enter my expenses into a report
  • I need to attach documentation such as receipts to this report
  • I need to be able to send the report to the boss for approval
  • I want to see the approval status of my expense report
  • I need to receive a notification if my report is not approved
  • I need to be able to correct any mistakes based on the rejection

Based on this, it is clear that the permissions of the “travelling salesman” role only needs to give write access to some operations, for data relating to this specific user, and needs read rights on the status of the approval. This goes directly into our authorization concept for the app, and already here generates testable security annotations:

  • A travelling salesman should not be able to read the expense report of another travelling salesman
  • A travelling salesman should not be able to approve expense reports, including his own

These negative unit tests could already go into the design as “security annotations” for the user stories.
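As a rough sketch of how one of these annotations could later become a test, something like the following could work (the ExpenseReport model, the URL pattern and the expected status codes are hypothetical, for illustration only):

from django.contrib.auth.models import User
from django.test import TestCase

from expenses.models import ExpenseReport  # hypothetical app and model


class ExpenseReportAuthorizationTest(TestCase):
    def setUp(self):
        # Two travelling salesmen, with one report belonging to the first of them
        self.alice = User.objects.create_user(username="alice", password="test-pass-1")
        self.bob = User.objects.create_user(username="bob", password="test-pass-2")
        self.alice_report = ExpenseReport.objects.create(owner=self.alice, total=120.50)

    def test_salesman_cannot_read_another_salesmans_report(self):
        # Bob tries to open Alice's report and should be denied
        self.client.login(username="bob", password="test-pass-2")
        response = self.client.get(f"/reports/{self.alice_report.id}/")
        self.assertIn(response.status_code, (403, 404))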

In addition to user stories, we have abusers and abuse stories. This is about the types of adversaries, and what they would like to do that we don't want them to be able to achieve. Let's take as an example a hacker hired by a competitor to perform industrial espionage. We have the adversary role "industrial espionage". Here are some abuse cases we can define that relate to the motivation of this actor rather than to technical vulnerabilities:

  • I want to access all travel reports to map where the sales personnel of the firm are going to see clients
  • I want to see the financial data approved to gauge the size of their travel budget, which would give me information on the size of their operation
  • I’d like to find names of people from their clients they have taken out to dinner, so we know who they are talking to at potential client companies
  • I'd like to get user names and personal data that allow me to gauge whether some of the employees could be recruited as insiders or poached to come work for us instead

How is this hypothetical information useful for someone designing an app for expense reporting? By knowing the motivations of the adversaries, we can better gauge how likely it is that a certain type of vulnerability will actually be exploited. Remember: vulnerabilities are not the same as threats – and we have limited resources, so the vulnerabilities that would help attackers achieve their goals are more important to remove than those that cannot easily help the adversary.

Vulnerabilities are not the same as threats – and we have limited resources, so the vulnerabilities that would help attackers achieve their goals are more important to remove than those that cannot easily help the adversary.

Architecture and data flow diagrams

Coming back to the sources of vulnerabilities, we want to avoid vulnerabilities of three kinds: software bugs, software design flaws, and flaws in operating procedures. Bugs are implementation errors, and the way we try to avoid them is by managing competence, workload and stress levels, and by using automated security testing such as static analysis and similar tools. Experience from software reliability engineering shows that about 50% of software flaws are implementation errors – the rest are design flaws: designs and architectures that do not implement the intentions of the designer. Static analysis cannot help us here, because there may be no coding errors such as lack of exception handling or lack of input validation – it is the concept itself that is wrong; for example giving a user role too many privileges, or allowing a component to talk to a component it shouldn't have access to.

A good tool for identification of such design flaws is threat modeling based on a data flow diagram. Make a diagram of the software data flow, break it down into components at a reasonable level, and consider how an adversary could attack each component and what the impact could be. By going through an exercise like this, you will likely identify potential vulnerabilities and weaknesses that you need to handle. The mitigations you introduce may be various security controls, such as blocking internet access for a server that only needs to be available on the internal network. The next question then is: how do you validate that your controls work? Do you order a penetration test from a consulting company? That could work, but it doesn't scale very well; you want this to work in your pipeline. The primary tools to turn to are unit and integration testing.

We will not discuss the techniques for threat modeling in detail in this post, but there are several methods that can be applied. Keep it practical, and don't dive too deep into the details – it is better to start with a higher-level view of things, and refine it as the design matures.

Often a STRIDE-like approach is a good start, and for the worst case scenarios it can be worthwhile diving into more detail with attack trees. An attack tree is a fault tree applied to adversarial modeling.

After the key threats have been identified, it is time to plan how to deal with the risk. We should apply the defense-in-depth principle, and remember that a single security control is usually not enough to stop all attacks – because we do not know what all the possible attack patterns are. When we have come up with mitigations for the threats we worry about, we need to validate that they actually work. This validation should happen at the lowest possible level – unit tests, integration tests. It is a good idea for the developer to run his or her own tests, but these validations definitely must live in the build pipeline.

Let’s consider a two-factor authentication flow using SMS-based two-factor authentication. This is the authentication for an application used by politicians, and there are skilled threat actors who would like to gain access to individual accounts.

A simple data flow diagram for a 2FA flow

Here's how the authentication process works:

  • User connects to the domain and gets a single-page application loaded in the browser, with a login form asking for username and password
  • The user enters credentials, that are sent as a post request to the API server, which validates it with stored credentials (hashed in a safe way) in a database. The API server only accepts requests from the right domain, and the DB server is not internet accessible.
  • When the correct credentials have been added, the SPA updates with a 2fa challenge, and the API server sends a post request to a third-party SMS gateway, which sends the token to the user’s cell phone.
  • The user enters the code, and if valid, is authenticated. A JWT is returned to the browser and stored in localstorage.

Let’s put on the dark hat and consider how we can take over this process.

  1. SIM card swapping combined with a phishing email to capture the credentials
  2. SIM card swapping combined with keylogger malware for password capture
  3. Phishing capturing both password and the second factor from a spoofed login page, and reusing credentials immediately
  4. Create an evil browser extension and trick the user to install it using social engineering. Use the browser extension to steal the token.
  5. Compromise a dependency used by the application’s frontend, to allow man-in-the-browser attacks that can steal the JWT after login.
  6. Compromise a dependency used in the API to give direct access to the API server and the database
  7. Compromise the 3rd party SMS gateway to capture credentials, use password captured with phishing or some other technique
  8. Exploit a vulnerability in the API to bypass authentication, either in a dependency or in the code itself.

As we see, the threat is the adversary getting access to a user account. There are many attack patterns that could be used, and only one of them involves only the code written in the application. If we are going to start planning mitigations here, we could first get rid of the first two problems by not using SMS for two-factor authentication, but rather relying on an authenticator app such as Google Authenticator. Test: no requests to the SMS gateway.
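
If the SMS gateway client lives behind a module of our own, this is straightforward to check in the pipeline. A minimal sketch, assuming Jest and supertest, with a hypothetical smsGateway module exposing a sendToken function – the names and paths are made up for illustration:

jest.mock('../src/smsGateway');                  // hypothetical wrapper around the SMS API
const smsGateway = require('../src/smsGateway');
const request = require('supertest');
const app = require('../src/app');               // the Express app under test

test('login flow never calls the SMS gateway', async () => {
  await request(app)
    .post('/login')
    .send({ username: 'alice', password: 'correct horse battery staple' });

  // With authenticator-app based 2FA there should be no SMS traffic at all
  expect(smsGateway.sendToken).not.toHaveBeenCalled();
});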

Phishing: avoid direct POST requests from a phishing domain to the API server by only allowing CORS requests from our own domain. Send a verification email when a login is detected from an unknown machine. Tests: check that CORS requests from other domains fail, and check that an email is sent when a login from a new machine occurs.
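
A minimal sketch of the CORS part, assuming an Express API and the cors middleware – the domain is a placeholder. Keep in mind that CORS is enforced by the browser, so it stops scripts on a phishing page from calling our API, not requests sent from an attacker’s own server:

const express = require('express');
const cors = require('cors');

const app = express();

// Only our own frontend origin is allowed to make cross-origin requests
app.use(cors({
  origin: 'https://app.example.com',   // placeholder domain
  credentials: true
}));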

Browser extensions: capture metadata/fingerprint data and detect token reuse across multiple machines. Test: same token in different browsers/machines should lead to detection and logout.
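
One way to sketch this, assuming the jsonwebtoken library and an Express API: bake a coarse client fingerprint into the token as a claim, and reject the token if it later shows up from a different client. User agent plus IP is a crude fingerprint that will give false positives behind proxies and mobile networks, so treat this purely as an illustration of the idea:

var jwt = require('jsonwebtoken');
var crypto = require('crypto');

// Coarse client fingerprint: user agent + IP address
function fingerprint(req) {
  return crypto.createHash('sha256')
    .update(req.headers['user-agent'] + '|' + req.ip)
    .digest('hex');
}

// Issue the JWT with the fingerprint baked in as a claim
function issueToken(user, req) {
  return jwt.sign({ sub: user.uname, fp: fingerprint(req) },
                  process.env.JWT_SECRET, { expiresIn: '1h' });
}

// Middleware: reject tokens that are reused from a different client
function requireAuth(req, res, next) {
  try {
    var token = req.headers.authorization.split(' ')[1];
    var claims = jwt.verify(token, process.env.JWT_SECRET);
    if (claims.fp !== fingerprint(req)) {
      return res.status(401).send('Token used from an unrecognized client');
    }
    req.user = claims.sub;
    next();
  } catch (err) {
    res.status(401).send('Invalid token');
  }
}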

Compromised dependencies are a particularly difficult attack vector to deal with, as the vulnerability is typically unknown – in practice a zero-day. For token theft, the metadata-based detection above is still valid. In addition, it is good practice to have a process for acceptance of third-party libraries that goes beyond checking for “known vulnerabilities”. Compromise of the third-party SMS gateway is also difficult to deal with inside the software project and should be covered by a supply chain risk management program – although in our case that particular problem is solved by removing the third party altogether.

Exploiting a vulnerability in the app’s API: perform static analysis and dependency analysis to minimize known vulnerabilities. Test: no high-risk findings from static analysis or dependency checks.

We see that in spite of having many risk-reducing controls in place, we do not cover everything we know about – and there are guaranteed to be attack vectors in use that we do not know about.

Sprint planning – keeping the threat model alive

Sometimes “secure development” methodologies receive criticism for “being slow”. Too much analysis, the sprint stops, productivity drops. This is obviously not good, so the question is rather: how can we make security a natural part of the sprint? One answer, at least a partial one, is to have a threat model based on the overall architecture. When it is time for sprint planning, there are three essential questions that should be revisited:

  • The user stories or story points we are addressing: do they introduce threats or points of attack not already accounted for?
  • Is the threat model we created still representative of what we are planning to implement? Take a look at the data flow diagram and see if anything has changed – if it has, evaluate whether the threat model needs to be updated too.
  • Finally: for the threats relevant to the issues in the sprint backlog, do we have validation for the planned security controls?

Simply discussing these three questions will often be enough to see if there are more “known unknowns” that we need to take care of, and will allow us to update the backlog and test plan with the appropriate annotations and issues.

Coding: the mother of bugs after the design flaws have been agreed upon

The main purpose of the threat modeling discussed above is to uncover “design flaws”. While writing code, it is perfectly possible to implement a flawed plan in a flawless manner – that is why we should invest real effort in creating a plan that makes sense. The other half of vulnerabilities are bugs – coding errors. As long as people, and not some very smart AI, are still writing the code, errors in code will be related to human factors – or human error, as it is popularly called. That term often points the finger of blame at a single individual (the developer), but since none of us work in a vacuum, there are many factors that influence these bugs. Let us try to classify these errors (leaning heavily on human factors research) – broadly there are three classes of human error:

  • Slips: errors made due to lack of attention, a mishap. Think of this like a typo; you know how to spell a word but you make a small mistake, perhaps because your mind is elsewhere or because the keyboard you are typing on is unfamiliar.
  • Competence gaps: you don’t really know how to do the thing you are trying to do, and this lack of knowledge and practice leads you to make the wrong choice. Think of an inexperienced vehicle driver on a slippery road in the dark of the night.
  • Malicious error injection: an insider writes bad code on purpose to hurt the company – for example because he or she is being blackmailed.

Let’s leave the evil programmer aside and focus on how to minimize bugs that are created due to other factors. Starting with “slips” – which factors would influence us to make such errors? Here are some:

  • Not enough practice to make the action to take “natural”
  • High levels of stress
  • Lack of sleep
  • Task overload: too many things going on at once
  • Outside disturbances (noise, people talking to you about other things)

It is not obvious that the typical open office plan favored by IT firms is the optimal layout for programmers. Workload management, work-life balance and physical working environment are important factors for avoiding such “random bugs” – and therefore also important for the security of your software.

These are mostly “trying to do the right thing but doing it wrong” types of error. Let’s now turn to the competence side of the equation. Developers have often been trained in complex problem solving – but not necessarily in protecting software from abuse. Secure coding practices, such as how to avoid SQL injection, why you need output escaping, and similar practical application security knowledge, are often not gained by studying computer science. It is also likely that a self-taught developer will have skipped over such challenges, as the natural focus is on “solving the problem at hand”. This is why a secure coding practice must be deliberately created within an organization, with training and resources provided to teams to make it work. A good baseline should include the following (a small example of the SQL injection case follows the list):

  • How to protect against OWASP Top 10 type vulnerabilities
  • Secrets management: how to protect secrets in development and production
  • Detectability of cyber threats: application logging practices
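
To make the first point concrete, here is the classic SQL injection example as it could look in a Node code review, assuming the node-postgres (pg) driver – any database driver with parameterized queries works the same way:

const { Pool } = require('pg');
const pool = new Pool();

// Vulnerable: user input is concatenated straight into the SQL string,
// so a username like  ' OR '1'='1  changes the meaning of the query
async function findUserUnsafe(username) {
  return pool.query("SELECT * FROM users WHERE uname = '" + username + "'");
}

// Safer: the value is sent as a parameter and never interpreted as SQL
async function findUser(username) {
  return pool.query('SELECT * FROM users WHERE uname = $1', [username]);
}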

An organization with a plan for this, and with training to make sure everyone is on the same page, stands a much better chance of avoiding the “competence gap” type of errors.

Security testing in the build pipeline

OK, so you have planned your software, created a threat model, and committed code. The CI/CD build pipeline triggers. What’s there to stop bad code from reaching your production environment? Let’s consider the potential locations of exploitable bugs in our product:

  • My code
  • The libraries used in that code
  • The environment where my software runs (typically a container in today’s world)

Obviously, if we are trying to push something with known critical errors in any of those locations to production, our pipeline should not accept it. Starting with our own code, a standard test that can uncover many bugs is static analysis. Depending on the rules you use, this can be a very good security control, but it has limitations. Typically it will find a hardcoded password written as

var password = 'very_secret_password';

but unless the tool is a little bit smart, it may not find the same password hiding behind an innocent variable name:

var tempstring = 'something_that_may_be_just_a_string';

and yet it may throw an alert on

var password = getsecret();

just because the word “password” is in there. So choosing the right rules, and tuning them, is important to make this work. Static analysis should be a minimum test to always include.

The next part is our dependencies. Using libraries with known vulnerabilities is a common problem that makes life easy for the adversary. This is why you should always scan the code for external libraries and check whether there are known vulnerabilities in them. Commercial vendors often refer to such tools as “software component analysis”. The primary function is to list all dependencies, check them against databases of known vulnerabilities, create alerts accordingly – and break the build when a threshold is exceeded.
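
For a Node project, the simplest gate is to run npm audit with a threshold (for example npm audit --audit-level=high) and let its exit code break the build. If you want more control over the policy, a rough sketch of a gate script could look like the one below – it relies on the JSON output of npm audit and is an illustration, not a replacement for a proper software component analysis tool:

// audit-gate.js: fail the build if npm audit reports high or critical findings
const { execSync } = require('child_process');

let report;
try {
  report = JSON.parse(execSync('npm audit --json', { encoding: 'utf8' }));
} catch (err) {
  // npm audit exits non-zero when it finds vulnerabilities; the JSON is still on stdout
  report = JSON.parse(err.stdout);
}

const counts = (report.metadata && report.metadata.vulnerabilities) || {};
const blocking = (counts.high || 0) + (counts.critical || 0);

if (blocking > 0) {
  console.error('Build blocked: ' + blocking + ' high/critical vulnerabilities');
  process.exit(1);
}
console.log('Dependency check passed');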

The environment we run on should also be secure. When building a container image, make sure it does not contain known vulnerabilities. Using a scanner tool for this is also a good idea.

While static analysis is primarily a build step, testing for known vulnerabilities – whether in code libraries or in the environment – should be done regularly, so that vulnerabilities discovered after the code is deployed do not remain in production over time. Regularly checking the inventory of dependencies against a database of known vulnerabilities is an effective control for this type of risk.

If a library or a dependency in the environment has been injected with malicious code in the supply chain, a simple scan will not identify it. Supply chain risk management is required to keep this type of threat under control, and there are no known trustworthy methods for automatically identifying maliciously injected code in third-party dependencies in the pipeline. One principle that should be followed with respect to this type of threat, however, is minimization of the attack surface. Avoid very deep dependency trees – like an NPM project with 25,000 dependencies made by 21,000 different contributors. Trusting 21,000 strangers in your project can be a hard sell.

Another test that should preferably be part of the pipeline is dynamic testing, where actual attack payloads are sent against injection points. This will typically uncover different vulnerabilities than static analysis does, and is thus a good addition. Note that active scanning can take down infrastructure or cause unforeseen errors, so it is a good idea to run it against a staging/test environment rather than production infrastructure.

Finally, we have the tests that validate the mitigations identified during threat modeling. Unit tests and integration tests for security controls should be added to the pipeline.
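
As an illustration, here are two such pipeline tests for mitigations from the 2FA example above, assuming Jest and supertest and a hypothetical /api/profile endpoint on the Express app:

const request = require('supertest');
const app = require('../src/app');   // the Express app under test

test('API rejects requests without a valid token', async () => {
  const res = await request(app).get('/api/profile');
  expect(res.status).toBe(401);
});

test('CORS is not granted to unknown origins', async () => {
  const res = await request(app)
    .get('/api/profile')
    .set('Origin', 'https://evil.example.net');
  expect(res.headers['access-control-allow-origin']).not.toBe('https://evil.example.net');
});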

Modern environments are usually defined in YAML files (or other types of config files), not by technicians drawing cables. The benefit of this is that the configuration can easily be tested. It is therefore a good idea to create acceptance tests for your Dockerfiles, Helm charts and other configuration files, to prevent an insider from deliberately weakening them – or anyone from setting things up to be vulnerable by mistake.
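
Such acceptance tests do not need heavy tooling. A minimal sketch, assuming Jest and a Dockerfile in the repository root – the “no root user” rule is just one example of a policy you might enforce:

const fs = require('fs');

test('Dockerfile does not run the application as root', () => {
  const dockerfile = fs.readFileSync('Dockerfile', 'utf8');
  const userLines = dockerfile
    .split('\n')
    .filter(line => line.trim().toUpperCase().startsWith('USER '));

  expect(userLines.length).toBeGreaterThan(0);              // a USER instruction exists...
  expect(userLines.pop()).not.toMatch(/\buser\s+root\b/i);  // ...and the last one is not root
});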

Security debt has a high interest rate

Technical debt is a curious beast: if you fail to address it, it will compound and likely ruin your project. The worst kind is security debt: whereas not fixing performance issues, not removing dead code and so on compounds like a credit card from your bank, leaving vulnerabilities in the code compounds like interest on money you borrowed from Raymond Reddington. Manage your debt, or you will go out of business on the back of a ransomware campaign followed by a GDPR fine and some interesting media coverage…

You need to plan time to pay off your technical debt, and in particular your security debt.

Say you want to spend a certain percentage of your sprint time on fixing technical debt – how do you choose which issues to take? I suggest a simple prioritization system (a toy example of putting it into practice follows the list):

  • Exposed before internal
  • Easy to exploit before hard
  • High impact before low impact
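
As a toy illustration of such a scheme – the weights are arbitrary and the vulnerability objects are made up:

// Sort known vulnerabilities so exposed, easy-to-exploit, high-impact issues come first
const score = v =>
  (v.exposed ? 4 : 0) + (v.easyToExploit ? 2 : 0) + (v.highImpact ? 1 : 0);

const backlog = [
  { id: 'VULN-12', exposed: false, easyToExploit: true,  highImpact: true  },
  { id: 'VULN-7',  exposed: true,  easyToExploit: true,  highImpact: false },
  { id: 'VULN-3',  exposed: true,  easyToExploit: false, highImpact: true  },
];

backlog.sort((a, b) => score(b) - score(a));
console.log(backlog.map(v => v.id));  // [ 'VULN-7', 'VULN-3', 'VULN-12' ]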

But no matter what method you use to prioritize, the most important thing is that you work on getting rid of known vulnerabilities as part of “business as usual” – to avoid going bankrupt from overwhelming technical debt, or being hacked.

Sometimes the action needed to close a security hole creates other problems, like installing an update that is not compatible with your code. When this is the case, you may need to spend more resources on it than on a “normal” vulnerability, because you need to rewrite code – and that refactoring may also require you to update your threat model and risk mitigations.

Operations: your code on the battle field

In production your code is exposed to its users, and in part it may also be exposed to the internet as a whole. Dealing with feedback from this jungle should be seen as a key part of your vulnerability management program.

First of all, you will get logs and feedback from operations, whether performance related, bug reports or security incidents. It is important that you feed this into your issue management system and deal with it in your sprints. Sometimes you may even have a critical situation requiring you to push a “hotfix” – a change to the code as fast as possible. The good thing about a solid pipeline is that your hotfix will still go through basic security testing. Hopefully, your agile security process and your CI/CD pipeline are now working so well in symbiosis that they don’t slow the hotfix down. In other words: the “hotfix” you are pushing is just a code commit like all the others – you are pushing to production several times a day, so why would this one be any different?

Another aspect is feedback from incident response. There are two levels of incident response feedback that we should consider:

  1. Incident containment/eradication leading to hotfixes.
  2. Security improvements from the lessons learned stage of incident response

The first part we have already considered. The second part could be improvements to detection, better logging, and so on. These should go into the product backlog and be handled during normal sprints. Don’t let lessons learned end up as a PowerPoint presented to a manager – a real lesson learned ends up as a change in your code, your environment, your documentation, or in the incident response procedures themselves.

Key takeaways

This was a long post, here are the key practices to take away from it!

  • Remember that vulnerabilities come from poor operational practices, flaws in design/architecture, and from bugs (implementation errors). Linting only helps with bugs.
  • Use threat modeling to identify operational and design weaknesses.
  • All errors are human errors. A good working environment helps reduce vulnerabilities (see performance shaping factors).
  • Validate mitigations using unit tests and integration tests.
  • Test your code in your pipeline.
  • Pay off technical debt religiously.

Two-factor auth for your Node project, without tears

  • How 2fa works with TOTP
  • Code
  • Adding bells and whistles
  • How secure is OTP really?

How 2fa works with TOTP

TOTP is short for time-based one-time password. This is a commonly used method for a second factor in two-factor authentication. The normal login flow for the user is then: first log in with username and password. Then you are presented with a second login form, where you have to enter a one-time password. Typically you will have an app like Google Authenticator or Microsoft Authenticator that will provide you with a one-time code that you can enter. If you have the right code, you will be authenticated and you gain access.

How does the web service know what the right one-time code is?

Consider a web application using two-factor authentication. The user has Google Authenticator on his or her phone to provide one-time passwords – but how does the app know what to compare with?

That comes from the setup: there is a pre-shared secret that is used to generate the tokens based on the current time. The tokens are valid for a limited time (typically 30, 60 or 120 seconds). The time here is “Unix time” – the number of seconds since midnight, 1 January 1970. TOTP is a special case of a counter-based token created with the HOTP algorithm (HOTP = HMAC-based one-time password). Both are described in lengthy detail in RFC documents; the one for TOTP is RFC 6238. The main point is: both the token generator (the phone) and the validator (the web server) need to know the current Unix time, and they need the pre-shared secret. The token can then be calculated as a function of the time and this secret.
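
In code, this is a one-liner. A minimal illustration using the speakeasy library (which we will also use for the full example below) and a made-up base32 secret:

const speakeasy = require('speakeasy');

// The pre-shared secret, established once during setup (base32-encoded)
const secret = 'JBSWY3DPEHPK3PXP';

// token = f(secret, current time); both the phone and the server can
// compute this independently, as long as their clocks roughly agree
const token = speakeasy.totp({ secret: secret, encoding: 'base32' });
console.log(token);   // e.g. '492039' – changes every 30 seconds by default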

A one-time password used for two-factor authentication is a function of the current time and a pre-shared key.

Details in RFC 6238
The basics of multi-factor authentication: something you know + something you have

How do I get this into my app?

Thanks to open source libraries, creating a 2fa flow for your app is not hard. Here’s an example made on Glitch for a NodeJS app: https://aerial-reward.glitch.me/login.

The source code for the example is available here: https://glitch.com/edit/#!/aerial-reward. We will go through the main steps in the code to make it easy to understand what we are doing.

Step 1: Choose a library for 2FA.

Open source libraries are great – but they also come with a risk. They may contain vulnerabilities or backdoors. Doing some due diligence up front is probably a good idea. In this case we chose speakeasy because it is popular and well-documented, and running npm audit does not show any vulnerabilities for the library although it hasn’t been updated in 4 years.

Step 2: Activate MFA using a QR code for the user

We assume you have created a user database, and that you have implemented username and password based login (in a safe way). Now for the MFA part – how can we share the pre-shared secret with the device used to generate the token? This is what we use the QR code for. The user logs in and is directed to a profile page where “Activate MFA” is an option. Clicking this link shows a QR code that can be scanned with an authenticator app, which shares the pre-shared key with the app. Hence: the QR code is sensitive data, so it should only be available when setting up the app, and not be stored permanently. The user must also be authenticated (with username and password) in order to see the QR code.

In our example app, here’s the route for activating MFA after logging in.

app.get('/mfa/activate', (req, res) => {
  if (req.session.isauth) {
    var secret = speakeasy.generateSecret({name: 'Aerial Reward Demo'})
    req.session.mfasecret_temp = secret.base32;
    QRCode.toDataURL(secret.otpauth_url, function(err, data_url) {
      if (err) {
        res.render('profile', {uname: req.session.username, mfa: req.session.mfa, qrcode: '', msg: 'Could not get MFA QR code.', showqr: true})
      } else {
        console.log(data_url);
        // Display this data URL to the user in an <img> tag
        res.render('profile', {uname: req.session.username, mfa: req.session.mfa, qrcode: data_url, msg: 'Please scan with your authenticator app', showqr: true}) 
      }
    })
  } else {
    res.redirect('/login')
  }
})

What this does is the following:

  • Check that the user is authenticated using a session variable (set on login)
  • Create a temporary secret and store as a session variable, using the speakeasy library. This is our pre-shared key. We won’t store it in the user profile before having verified that the setup worked, to avoid locking out the user.
  • Generate a QR code with the secret. To do this you need a QR code library; we used the qrcode package, which seems to do the job OK. The speakeasy library generates an otpauth_url that can be used in the QR code. This otpauth_url contains the pre-shared secret.
  • Finally we render a template (the profile page for the user) and supply the QR code data URL to it (res.render).

For rendering this to the end user we are using a Pug template.

html
  head
    title Login
    link(rel="stylesheet" href="/style.css")
  body
    a(href="/logout") Log out
    br
    h1 Profile for #{uname}
    p MFA: #{mfa}
    unless mfa
      br
      a(href="/mfa/activate") Activate multi-factor authentication
    if showqr
      p= msg
      img(src=qrcode)
      p  When you have added the code to your app, verify that it works here to activate.
      a(href="/mfa/verify") VERIFY MFA CODE
    if mfa
      img(src="https://media.giphy.com/media/81xwEHX23zhvy/giphy.gif")
      p Security is important. Thank you for using MFA!

The QR code is shown in the profile when the right route is used and the user is not already using MFA. This presents the user with a QR code to scan, and then he or she will need to enter a correct OTP code to verify that the setup works. Then we will save the TOTP secret in the user profile.

How it looks for the user

The profile page for the user “safecontrols” with the QR code embedding the secret.

Scanning the QR code with an authenticator app (there are many to choose from; FreeOTP from Red Hat is a good alternative) gives you OTP tokens. Now the user needs to verify the setup by entering the OTP. Clicking the link “VERIFY MFA CODE” brings up the challenge, and entering the code proves that you have your phone. During setup, a successful verification stores the secret “permanently” in your user profile.

How do I verify the token then?

We created a route to verify OTPs. The behavior depends on whether MFA has already been set up or not.

app.post('/mfa/verify', (req, res) => {
  // Check that the user is authenticated
  var otp = req.body.otp
  if (req.session.isauth && req.session.mfasecret_temp) {
    // OK, move on to verify 2fa activation
    var verified = speakeasy.totp.verifyDelta({
      secret: req.session.mfasecret_temp,
      encoding: 'base32',
      token: otp,
      window: 6
    })
    console.log('verified', verified)
    console.log(req.session.mfasecret_temp)
    console.log(otp)
    if (verified) {
      db.get('users').find({uname: req.session.username}).assign({mfasecret: req.session.mfasecret_temp}).write()
      req.session.mfa = true
      req.session.mfarequired = true
      res.redirect('/profile')
    } else {
      console.log('OTP verification failed during activation')
      res.redirect('/profile')
    }
  } else if (req.session.mfarequired) {
    // OK, normal verification
    console.log('MFA is required for user ', req.session.username)
    var verified = speakeasy.totp.verifyDelta({
      secret: req.session.mfasecret,
      encoding: 'base32',
      token: otp,
      window: 6
    })
    console.log(verified)
    if (verified) {
      req.session.mfa = true
      res.redirect('/profile')  
    } else {
      // we are pretty harsh, thrown out after one try
      req.session.destroy(() => {
        res.redirect('/login')
      })
    }
  } else {
    // Not a valid 2fa challenge situation
    console.log('User is not properly authenticated')
    res.redirect('/')
  }
})

The first path is for the situation where MFA has not yet been set up (the activation step). Here we check that the user is authenticated and that there is a temporary secret stored in a session variable. This is the case when the user clicks the “VERIFY…” link on the profile page after scanning the QR code, so this session variable will not be available in other situations.

The second path checks if there is a session variable mfarequired set to true. This happens when the user authenticates, if an MFA secret has been stored in the user profile.

The verification itself is done with the speakeasy library functions. Note that you can use speakeasy.totp.verify (returns a Boolean) or speakeasy.totp.verifyDelta (returns a time delta). The former did not work for some reason, whereas the Delta version did, which is the only reason for this choice in this app.

How secure is this then?

Nothing is unhackable, and this is no exception to that rule. The security of the OTP flow depends on your settings, as well as other defense mechanisms. How can hackers bypass this?

  • Stealing tokens (man-in-the-middle or stealing the phone)
  • Phishing with fast use of tokens
  • Brute-forcing codes has been reported as a possible attack on OTPs, but this depends on the configuration

These are real attacks that can happen, so how to protect against them?

  • Always use https. Never transfer tokens over insecure connections. This protects against man-in-the-middle.
  • Phishing: this is more difficult – if someone obtains your password and a valid token and uses them on the real page before the token expires, they will get in. Using metadata to calculate a risk score can help: sign-in from a new device requires confirmation by clicking a link sent by email, force a password reset after x failed logins, and so on. None of that is implemented here. That being said, OTP-based 2FA protects against most phishing attacks – but if you are a high-value target for organized crime or professional spies, you should probably think about more secure patterns. Alternatives include push notifications or hardware tokens that avoid typing something into a form.
  • Brute force: trying many OTPs until you get one right is possible if the “window” for when a code is considered valid is too wide and you are not logged out after one or more wrong codes. In the code above the window parameter is set to 6, which is very long and potentially insecure, but the user is logged out if the OTP challenge fails, so brute force is still not feasible in practice. A sketch of a stricter configuration follows below.
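
Here is what a stricter verification route could look like, reusing the lowdb-style storage from the example app. The window is reduced to 1, and a failure counter locks the account after three wrong codes – the field names failedOtpAttempts and locked are assumptions made for this illustration:

app.post('/mfa/verify', (req, res) => {
  var user = db.get('users').find({uname: req.session.username}).value()
  var verified = speakeasy.totp.verifyDelta({
    secret: user.mfasecret,
    encoding: 'base32',
    token: req.body.otp,
    window: 1   // only the current token and its immediate neighbours
  })

  if (!verified) {
    var failures = (user.failedOtpAttempts || 0) + 1
    db.get('users').find({uname: req.session.username})
      .assign({failedOtpAttempts: failures, locked: failures >= 3}).write()
    // thrown out after a failed attempt, as in the original example
    return req.session.destroy(() => res.redirect('/login'))
  }

  db.get('users').find({uname: req.session.username}).assign({failedOtpAttempts: 0}).write()
  req.session.mfa = true
  res.redirect('/profile')
})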