CCSK Domain 2: Governance and Enterprise Risk Management

Governance and risk management principles remain the same, but the risk picture and the available controls change in the cloud. In particular we need to take into account the following:

  • Cloud risk trade-offs and tools
  • Effects of service and deployment models
  • Risk management in the cloud
  • Tools of cloud governance

A key aspect to remember when deploying services or data to the cloud is that even if security controls are delegated to a third party, the responsibility for corporate governance cannot be delegated; it remains with the cloud consumer organization.

Cloud providers aim to streamline and standardize their offerings as much as possible to achieve economies of scale. This differs from a dedicated third-party provider, where contractual terms can often be negotiated. Governance frameworks should therefore not treat cloud providers the same way as dedicated service providers, where custom governance structures can be agreed on.

Responsibilities and mechanisms for governance are regulated in the contract. If a governance need is not described in the contract, there is a governance gap. This does not mean that the provider should be excluded outright, but it does mean that the consumer should consider how that governance gap can be closed.

Moving to the cloud transfers a lot of the governance and risk management from technical controls to contractual controls.

Cloud governance tools

The key tools of governance in the cloud are contracts, assessments and reporting.

Contracts are the primary tool for extending governance to a third party such as a cloud provider. For public clouds this typically means the provider's terms and conditions. Contracts guarantee a given service level and also describe requirements for governance support, such as audits.

Supplier assessments are important governance tools, especially during provider selection. Regular assessments can reveal whether changes to the cloud provider's offerings have altered the governance situation, in particular with regard to any governance gaps.

Compliance reporting includes audit reports. It may also include automatically generated compliance data in a dashboard, such as patch status for software or some other defined KPI. Audit reports may be internal, but most often they are produced by an accredited third party. Common compliance frameworks include ISO 27017, ISO 38500 and COBIT.

Risk management

Enterprise risk management (ERM) in the cloud is based on the shared responsibility model. The provider takes responsibility for certain risk controls, whereas the consumer is responsible for others. Where the split lies depends on the service model.

The division of responsibilities should be clearly regulated in the contract. Lack of such regulation can lead to hidden implementation gaps, leaving services vulnerable to abuse.

Service models

IaaS mostly resembles traditional IT, as most controls remain under the direct management of the cloud consumer. Policies and controls thus largely remain under the consumer's control too. The one primary change is the orchestration/management plane, and managing the risk of the management plane becomes a core governance and risk management activity – responsibilities essentially move from on-prem activities to the management plane.

SaaS providers vary greatly in competence and in the tools offered for compliance management. It is often possible to negotiate custom contracts with smaller SaaS providers, whereas the more mature or bigger players have more standardized contracts but also more tools appropriate to the governance needs of the enterprise. The SaaS model can be less transparent than desired, and establishing an acceptable contract is important in order to maintain good control over governance and risk management.

Public cloud providers often allow for less negotiation than private cloud. Hybrid and community governance can easily become complicated because the opinions of several parties will have to be weighed against each other.

Risk trade-offs

Using cloud services typically means placing more trust in third parties and having less direct access to security controls. Whether this increases or decreases the overall risk level depends on the threat model, as well as on political risk.

The key issue is that governance is changed from internal policy and auditing to contracts and audit reports; it is a less hands-on approach and can result in lower transparency and trust in the governance model.

CSA recommendations

  • Identify the shared responsibilities. Use accepted standards to build a cloud governance framework.
  • Understand and manage how contracts affect risk and governance. Consider alternative controls if a contract leaves governance gaps and cannot be changed.
  • Develop a process with criteria for provider selection. Re-assessments should be regular, and preferably automated.
  • Align risks to risk tolerances per asset as different assets may have different tolerance levels.

#2cents

Let us start with the contract side: most cloud deployments will be in a public cloud, and our ability to negotiate custom contracts will be very limited, or non-existent. What we have to play with are the control options in the management plane.

The first thing we should perhaps take note of is not really cloud related. We need a regulatory compliance matrix to make sure our governance framework and risk management processes actually help us achieve compliance and acceptable risk levels. One practical way to set up a regulatory compliance matrix is to map applicable regulations and governance requirements to the governance tools we have at our disposal, to see if the tools can help achieve compliance.

For example, mapping regulatory sources to contractual impact, supplier assessments, audits and configuration management:

  • GDPR: contractual impact – data processing agreement, security requirements; supplier assessments – GDPR compliance; audits – data processing activities; configuration management – data retention, backups, discoverability, encryption
  • Customer SLA: contractual impact – SLA guarantees; audits – uptime reporting
  • ISO 27001: contractual impact – certifications; audits – audit reports for certifications; configuration management – extension of company policies to the management plane

Based on the regulatory compliance matrix, a more detailed governance matrix can be developed based on applicable guidance. Then governance and risk management gaps can be identified, and closing plans created.
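
As a sketch of how the gap identification can be supported, the matrix can be kept as simple structured data and checked automatically. The snippet below assumes Python; the regulations, tool categories and mappings are illustrative examples only, not a complete or authoritative matrix.

# Sketch: represent the regulatory compliance matrix as data and flag
# governance gaps (requirements with no governance tool mapped to them).
# All entries are illustrative examples.
compliance_matrix = {
    "GDPR": {
        "contract": ["data processing agreement"],
        "supplier assessment": ["GDPR compliance", "security requirements"],
        "audit": ["data processing activities"],
        "configuration management": ["data retention", "backups", "encryption"],
    },
    "Customer SLA": {
        "contract": ["SLA guarantees"],
        "audit": ["uptime reporting"],
    },
    "ISO 27001": {
        "contract": ["certifications"],
        "audit": [],  # nothing mapped yet -> governance gap
    },
}

def find_governance_gaps(matrix):
    """Return (regulation, tool) pairs where nothing is mapped."""
    gaps = []
    for regulation, tools in matrix.items():
        for tool, controls in tools.items():
            if not controls:
                gaps.append((regulation, tool))
    return gaps

for regulation, tool in find_governance_gaps(compliance_matrix):
    print(f"Governance gap: {regulation} has nothing mapped under '{tool}'")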

Traditionally, cloud deployments have been seen as higher risk than on-premise deployments due to less hands-on risk controls. For many organizations, however, using cloud services with proper monitoring will lead to better security, because their on-premise tools often have insufficient security controls and logging. There are thus situations where a shift from hands-on to contractual controls is a good thing for security. One could probably claim that this is the case for most cloud consumers.

One aspect that is critical to security is incident response planning. To some degree the ability to do incident response on cloud deployments depends on configurations set in the management plane, especially the use of logging and alerting functionality. It should also be clarified up front where the shared responsibility model puts the responsibility for performing incident response actions throughout all phases (preparation, identification, containment, eradication, recovery and lessons learned).
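
As one concrete example of such preparation, the sketch below checks that management plane audit logging is actually enabled before an incident happens. It assumes an AWS account, valid credentials and the boto3 library; the trail name is hypothetical.

# Sketch: verify, as part of incident response preparation, that management
# plane audit logging (AWS CloudTrail in this example) is enabled.
# Assumes boto3 and valid AWS credentials; the trail name is hypothetical.
import boto3

def cloudtrail_logging_enabled(trail_name: str) -> bool:
    client = boto3.client("cloudtrail")
    status = client.get_trail_status(Name=trail_name)
    return status.get("IsLogging", False)

trail = "management-events-trail"  # hypothetical trail name
if cloudtrail_logging_enabled(trail):
    print(f"Trail '{trail}' is logging - responders will have data to work with.")
else:
    print(f"WARNING: trail '{trail}' is not logging management plane events.")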

The best way to take cloud into account in risk management and governance is to make sure policies, procedures and standards cover cloud, and that cloud is not seen as an “add-on” to on-premise services. Only integrated governance systems will achieve transparency and managed regulatory compliance.

CCSK Domain 1: Cloud Computing Concepts and Architecture

Recently I participated in a one-day class on the contents required for the “Certificate of Cloud Security Knowledge”, held by Peter HJ van Eijk in Trondheim as part of the conference Sikkerhet og Sårbarhet 2019 (Norwegian for: Security and Vulnerability 2019). The one-day workshop was interesting and the instructor was good at creating interactive discussions – making it much better than the typical PowerPoint overdose of commercial professional training sessions. There is a certification exam that I have not yet taken, and I decided I should document my notes on my blog; perhaps others can find some use for them too.

The CCSK exam closely follows a document made by the Cloud Security Alliance (CSA) called “CSA Security Guidance for Critical Areas of Focus in Cloud Computing v4.0” – a document you can download for free from the CSA webpage. They also lean on ENISA’s “Cloud Computing Risk Assessment”, which is also a free download.

Cloud computing isn’t about who owns the compute resources (someone else’s computer) – it is about providing scale and cost benefits through rapid elasticity, self-service, shared resource pools and a shared security responsibility model.

The way I’ll do these blog posts is that I’ll first share my notes, and then give a quick comment on what the whole thing means from my point of view (which may not really be that relevant to the CCSK exam if you came here for a shortcut to that).

Introduction to D1 (Cloud Concepts and Architecture)

Domain 1 contains 4 sections:  

  • Defining cloud computing 
  • The cloud logical model 
  • Cloud conceptual, architectural and reference model 
  • Cloud security and compliance scope, responsibilities and models 

NIST definition of cloud computing: a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction.

A Cloud User is the person or organization requesting computational resources. The Cloud Provider is the person or organization offering the resources. 

Key techniques to create a cloud:  

  • Abstraction: we abstract resources from the underlying infrastructure to create resource pools  
  • Orchestration: coordination of delivering resources out of the pool on demand.  

Clouds are multitenant by nature. Consumers are segregated and isolated but share resource pools.  

Cloud computing models 

The CSA's foundational model of cloud computing is the NIST model. A more in-depth reference model is taken from ISO/IEC. The guidance talks mostly about the NIST model and doesn't dive into the ISO/IEC model, which is probably sufficient for most definition needs.

Cloud computing has 5 characteristics:

  1. Shared resource pool (compute resources in a pool that consumers can pull from)
  2. Rapid elasticity (can scale up and down quickly)
  3. Broad network access
  4. On-demand self-service (management plane, APIs)
  5. Measured service (pay-as-you-go)

Cloud computing has 3 service models:

  • Software as a Service (SaaS): like Cybehave or Salesforce
  • Platform as a Service (PaaS): like WordPress or AWS Elastic Beanstalk
  • Infrastructure as a Service (IaaS): like VMs running in Google Cloud

Cloud computing has 4 deployment models:

  • Public Cloud: pool shared by anyone
  • Private Cloud: pool shared within an organization
  • Hybrid Cloud: connection between two clouds, commonly used when an on-prem datacenter connects to a public cloud
  • Community Cloud: pool shared by a community, for example insurance companies that have formed some form of consortium

Models for discussing cloud security

The CSA document discusses multiple model types in a somewhat incoherent manner. The types of models it mentions can be categorized as follows:

  • Conceptual models: descriptions to explain concepts, such as the logical model from CSA.
  • Controls models: like CCM 
  • Reference architectures: templates for implementing security controls 
  • Design patterns: solutions to particular problems 

The document also outlines a simple cloud security process model:

  • Identify security and compliance requirements, and existing controls 
  • Select provider, service and deployment models 
  • Define the architecture 
  • Assess the security controls 
  • Identify control gaps 
  • Design and implement controls to fill gaps 
  • Manage changes over time 

The CSA logical model

This model explains 4 “layers” of a cloud environment and introduces some “funny words”:

  • Infrastructure: the core components in computing infrastructure, such as servers, storage and networks 
  • Metastructure: protocols and mechanisms providing connections between infrastructure and the other layers 
  • Infostructure: The data and information (database records, file storage, etc) 
  • Applistructure: The applications deployed in the cloud and the underlying applications used to build them.

The key difference between traditional IT and cloud is the metastructure. Cloud metastructure contains the management plane components.  

Another key feature of cloud is that each layer tends to double. For example, infrastructure is managed by the cloud provider, but the cloud consumer will establish a virtual infrastructure that also needs to be managed (at least in the case of IaaS).

Cloud security scope and responsibilities 

The responsibility for security domains maps to the access the different stakeholders have to each layer in the architecture stack.  

  • SaaS: the cloud provider is responsible for perimeter, logging and application security, and the consumer may only have access to provision users and manage entitlements
  • PaaS: the provider is typically responsible for platform security and the consumer is responsible for the security of the solutions deployed on the platform. Configuring the offered security features is often left to the consumer.  
  • IaaS: cloud provider is responsible for hypervisors, host OS, hardware and facilities, consumer for guest OS and up in the stack.  

The shared responsibility model leaves us with two focus areas:

  • Cloud providers should clearly document internal security management and security controls available to consumers.  
  • Consumers should create a responsibility matrix to make sure controls are followed up by one of the parties 

Two compliance tools from the CSA are recommended for mapping security controls:

  • The Consensus Assessment Initiative Questionnaire (CAIQ) 
  • The Cloud Controls Matrix (CCM) 

#2cents

This domain is introductory and provides some terminology for discussing cloud computing. The key aspects from a risk management point of view are:

  • Cloud creates new risks that need to be managed, especially as it introduces more companies involved in maintaining security of the full stack compared to a fully in-house managed stack. Requirements, contracts and audits become important tools.
  • The NIST model is more or less universally used in cloud discussions in practice. The service models are known to most IT practitioners, at least on the operations side.
  • The CSA guidance correctly designates the “metastructure” as the new kid on the block. The practical incarnation of this is APIs and console access (e.g. gcloud at the API level and Google Cloud Console at the “management plane” level); see the sketch after this list. From a security point of view this means that maintaining the security of local control libraries becomes very important, as well as identity and access management for the control plane in general.
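
To make the management plane point concrete, here is a minimal sketch of touching the metastructure through an API rather than the console. It assumes the google-api-python-client library and application default credentials; the project and zone names are hypothetical.

# Sketch: the management plane ("metastructure") reached through an API.
# Assumes google-api-python-client and application default credentials;
# project and zone names are hypothetical.
from googleapiclient import discovery

def list_instances(project: str, zone: str):
    compute = discovery.build("compute", "v1")
    response = compute.instances().list(project=project, zone=zone).execute()
    return response.get("items", [])

for instance in list_instances("my-demo-project", "europe-north1-a"):
    print(instance["name"], instance["status"])

Whoever can run calls like this (or the IAM equivalents) effectively controls the virtual infrastructure, which is why identity and access management for the management plane matters so much.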

In addition to the “who does what” problem that can occur with a shared security model, the self-service and fast-scaling properties of cloud computing often lead to “new and shiny” being pushed faster than security is aware of. An often overlooked part of “pushing security left” is that we also need to push both knowledge and accountability together with the ability to access the management plane (or parts of it through APIs or the cloud management console).

How to reduce cybersecurity risks for stores, shops and small businesses

Crime in general is moving online, and with that the digital risks for all businesses are increasing, including for traditional physical stores – as well as eCommerce sites. This blog post is a quick summary of some risks that are growing quickly and what shop owners can do to better control them.

Top 10 Cybersecurity Risks

The following risks are faced by most organizations. For many stores selling physical goods these would be devastating today as they rely more and more on digital services.

How secure is your shop when you include the digital arena? Do you put your customers at risk?
  1. Point of sale malware leading to stolen credit cards
  2. Supply chain disruptions due to cybersecurity incidents
  3. Ransomware on computers used to manage and run stores
  4. Physical system manipulation through sensors and IoT, e.g. an adversary turning off the cooling in a grocery store’s refrigerators
  5. Website hacks
  6. Hacking of customers’ mobile devices due to insecure wireless networks
  7. Intrusion into systems via insecure networks
  8. Unavailability of critical digital services due to cyber incidents (e.g. SaaS systems needed to operate the business)
  9. Lack of IT competence to help respond to incidents
  10. Compromised e-mail accounts and social media accounts used to run the business

Securing the shop

Shop owners have long been used to securing their stores against physical theft – using alarms, guards and locks. Here are 7 things all shop owners can do to also secure their businesses against cybersecurity events:

1 – Use only up-to-date IT equipment and software.

Outdated software can be exploited by malware. Keeping software up to date drastically reduces the risk of infection. If you have equipment that cannot be upgraded because it is too old, you should get rid of it. The rest should receive updates as quickly as possible when they are made available, preferably automatically.

2 – Create a security awareness program for employees.

No business is stronger than its weakest link – and that is true for security too. By teaching employees good cybersecurity habits the risk of an employee downloading a dangerous attachment or accepting a shady excuse for weird behavior from a criminal will be much lower. A combination of on-site discussions and e-learning that can be consumed on mobile devices can be effective for delivering this.

3 – Use the guest network only for guests.

Many stores, coffee shops and other businesses offer free wifi for their customers. Make sure you avoid connecting critical equipment to this network as vulnerabilities can be exposed. Things I’ve seen on networks like this include thermostats, cash registers and printers. Use a separate network for those important things, and do not let outsiders onto that network.

4 – Secure your website like your front door.

Businesses will usually have a web site, quite often with some form of sales and marketing integration – but even if you don’t have anything more than a pretty static web page, you should take care of its security. If it is down you lose a few customers; if it is hacked and customers are tricked out of their credit card data, they will blame your shop, not the firm you bought the web design from. Make sure you require web designers to maintain and keep your site up to date, and that they follow best practices for web security. You should also consider running a security test of the web page at regular intervals; a simple starting point is sketched below.
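
If you want a very simple starting point for such a recurring test, the sketch below checks a handful of common security headers on your site. It assumes Python with the requests library; the URL is just an example.

# Sketch: a small recurring check of common security headers on a website.
# Assumes the "requests" library; the URL is an example.
import requests

EXPECTED_HEADERS = [
    "Strict-Transport-Security",
    "X-Content-Type-Options",
    "X-Frame-Options",
    "Content-Security-Policy",
]

def check_security_headers(url: str) -> None:
    response = requests.get(url, timeout=10)
    for header in EXPECTED_HEADERS:
        present = header in response.headers
        print(f"{header}: {'OK' if present else 'MISSING'}")

check_security_headers("https://www.example.com")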

5 – Prepare for times of trouble.

You should prepare for bad things to happen and have a plan in place for dealing with it. The basis for creating an incident response plan is a risk assessment that lists the potential threat scenarios. This will also help you come up with security measures that will make those scenarios less likely to occur.

6 – Create backups and test them!

The best medicine against losing data is having a recent backup and knowing how to restore your system. Make sure all critical data are backed up regularly. If you are using cloud software for critical functions such as customer relationship management (CRM) or accounting, check with your vendor what backup options they have. Ideally your backups should be stored in a location that does not depend on the same infrastructure as the software itself. For example – if Google runs your software, you can store your backups with Microsoft.

7 – Minimize the danger of hacked accounts.

The most common way a company gets hacked is through a compromised account. This very often happens because of phishing or password reuse. Phishing is the use of e-mails to trick users into giving up their passwords – for example by sending them to a fake login page that is controlled by the hacker. Three things you can do that will reduce this risk by 99% are:

  • Tell everyone to use a password manager and ask them to use very long and complex passwords. They will no longer need to remember the passwords themselves, so this will not be a problem. Examples of such software include 1Password and LastPass.
  • Enforce two-factor authentication wherever possible (2FA for short). 2FA is the use of a second factor in addition to your password, such as a code generated on your mobile, in order to log in.
  • Give everyone training on detection of social engineering scams as part of your awareness training program.

All of this may seem like quite a lot of work – but when it becomes a habit it will make your team more efficient, and will significantly reduce the cybersecurity threats for both you and your customers.

If you need tools for awareness training, risk management or just someone to talk to about security – take a look at the offerings from Cybehave – intelligent cloud software for better security.

Running an automated security audit using Burp Professional

Reading about hacking in the news can make it seem like anyone can just point a tool at any website and completely take it over. This is not really the case, as hacking, whether automated or manual, requires vulnerabilities.

A well-known tool for security professionals working with web applications is Burp from Portswigger. This is an excellent tool, and it comes in multiple editions: the free community edition, which is a nice proxy you can use to study HTTP requests and responses (and some other things); the professional edition, aimed at pentesting; and the enterprise edition, which is more for DevOps automation. In this little test we’ll take the Burp Professional tool and run it using only default settings against a target application I made last year. The app is a simple app for posting things on the internet, and was just a small project I did to learn how to use some of the AWS tools for deployment and monitoring. You find it in all its glory at https://www.woodscreaming.com.

Just entering the URL http://www.woodscreaming.com and launching Burp to attack the application first runs a crawl and audit of the unauthenticated routes it can find (it basically clicks all the links it can find). Burp then registers a user and starts probing the authenticated routes, including posting those weird numerical posts.

Woodscreaming.com: note the weird numerical posts. These are telltale signs of automated security testing with random input generation.

What scanners like Burp are usually good at finding are obvious misconfigurations such as missing security headers, missing flags on cookies and so on. It did find some of these things on the woodscreaming.com page – but not many.

Waiting for security scanners can seem like it takes forever. Burp estimated some 25.000 days remaining after a while with the minimal http://www.woodscreaming.com page.

After running for a while, Burp estimated that the remaining scan time was something like 25.000 days. I don’t know why this is the case (I have not seen this in other applications), but since a user can generate new URL paths simply by posting new content, a linear time estimate may easily diverge – a wild guess at what was going on. Because of this we just stopped the scan after some time, as it was unlikely to discover new vulnerabilities.

The underlying application is a traditional server-driven MVC application running Django. Burp works well with applications like this, and the default setup works better than it typically does for the single page applications (SPAs) that many web applications are today.

So, what did Burp find? Burp assigns a criticality to the vulnerabilities it finds. There were no “High” criticality vulns, but it reported some “Medium” ones.

Missing “Secure” flag on session cookies?

Burp reports 2 cookies that seem to be session cookies and that are missing the Secure flag. This means that these cookies would also be sent if the application were accessed over an insecure connection (http instead of https), making a man-in-the-middle able to steal the session or perform a cross-site request forgery (CSRF) attack. This is a real find, but the actual exposure is limited because the app is only served over https. It should nevertheless be fixed.

A side note on this: the cookies are set by the Django framework in its default state, with no configuration changes made. Hence, this is likely to be the case on many other Django sites as well.
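
For reference, a minimal sketch of the Django settings that would address this finding is shown below. These are standard settings documented by Django, but verify them against the Django version you are running.

# settings.py - sketch of hardening cookie behaviour for a site served only
# over HTTPS. Standard Django settings; check them against your Django version.
SESSION_COOKIE_SECURE = True  # only send the session cookie over HTTPS
CSRF_COOKIE_SECURE = True     # only send the CSRF cookie over HTTPS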

If we go to the “Low” category, there are several issues reported. These are typically harder to exploit, and will also be less likely to cause major breaches in terms of confidentiality, integrity and availability:

  • Client-side HTTP parameter pollution (reflected)
  • CSRF cookie without HTTPOnly flag set
  • Password field with autocomplete enabled
  • Strict transport security not enforced

The first one is perhaps the most interesting one.

HTTP parameter pollution: dangerous or not?

In this case the URL parameter reflected in an anchor tag’s href attribute is not interpreted by the application and thus cannot lead to bad things – but it could have been the case that GET parameters were interpreted in the backend, making it possible to have a person perform an unintended action in a request forgery attack. In our case, however, we say as the jargon file directs us: “It is not a bug, it is a feature!”

So what about the “password field with autocomplete enabled”? This must be one of the most common alerts from auditing software today. This can lead to unintended disclosure of passwords and should be avoided. You’ll find the same on many well-known web pages – but that does not mean we shouldn’t try to avoid it. We’ll put it on the “fix list”.
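
A sketch of how the remaining low-severity findings could be handled in a Django project follows. The settings are standard Django options; the login form is a hypothetical example of disabling password autocomplete.

# settings.py - sketch for the low-severity findings
CSRF_COOKIE_HTTPONLY = True     # CSRF cookie not readable from JavaScript
SECURE_HSTS_SECONDS = 31536000  # enforce strict transport security for a year
SECURE_HSTS_INCLUDE_SUBDOMAINS = True

# forms.py - hypothetical login form disabling password autocomplete
from django import forms

class LoginForm(forms.Form):
    username = forms.CharField()
    password = forms.CharField(
        widget=forms.PasswordInput(attrs={"autocomplete": "off"})
    )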

Are automated tests useful?

Automated tests are useful but they are not the same as a full penetration test. They are good for:

  1. Basic configuration checks. This can typically be done entirely passively, no attack payloads needed.
  2. Identifying vulnerabilities. You will not find all of them, and you will get some false positives, but this is useful.
  3. Learning about vulnerabilities: Burp has very good documentation and good explanations for the vulnerabilities it finds.

If you add a few manual checks to the automated setup – in particular giving it a site map before starting a scan and testing inputs with fuzzing (which can also be done using Burp) – you can get a relatively thorough security test done with a single tool.

Defending against OSINT in reconnaissance?

Hackers, whether they are cyber criminals trying to trick you into clicking a ransomware download link, or whether they are nation state intelligence operatives planning to gain access to your infrastructure, can improve their odds massively through proper target reconnaissance prior to any form of offensive engagement. Learn how you can review your footprint and make your organization harder to hack.

https://cybehave.no

Cybehave has an interesting post on OSINT and footprinting, and what approach companies can take to reduce the risk from this type of attack surface mapping: https://cybehave.no/2019/03/05/digital-footprint-how-can-you-defend-against-osint/ (disclaimer: written by me and I own 25% of this company).

tl;dr – straight to the to-do list

  • Don’t publish information that has no business benefit and would make you more vulnerable
  • Patch your vulnerabilities – both on the people and tech levels
  • Build a friendly environment for your people. Don’t let them struggle with issues alone.
  • Prepare for the worst (you can still hope for the best)

Storing seeds for multifactor authentication tokens

When setting up an application to use two-factor authentication, for example with Google Authenticator, each user will have a unique seed value for the authenticator. The identity server requires knowledge of the seed to verify the token – meaning you will have to store and retrieve it somehow. This means that if an attacker gets access to the storage solution that links OTP secret seeds to user IDs (e.g. usernames), the protocol is broken. So, trying to think up some options for securing the secrets: we cannot hash and salt the seed, because that breaks the OTP authentication flow. We are hence left with encrypting the seed before storing it.

The most practical approach seems to be symmetric crypto; the question is what to use as the encryption key. Here are some approaches I’ve seen people discuss that all seem bad:

  • User password: if you can phish the password, then you can also generate the OTP provided you know which algorithm/library is used
  • A static application secret: should be safe provided that secret is never leaked but using a static secret means that if it is compromised, all users are compromised. Still better than the user password, though. 
  • Using non-static user-level metadata to create a unique key for each user that is not vulnerable to phishing or guessing. Such metadata is typically visible to admins, though.

With the last approach, the login flow would look something like this:

  1. Get username/password
  2. Verify username/password
  3. Get OTP seed (encrypted)
  4. Get metadata and reconstruct encryption key
  5. Verify OTP
  6. Authenticate user and store timestamp and other auth metadata
  7. Construct new encryption key
  8. Encrypt seed
  9. Store in database

The question is what metadata to use. We need the following properties to be true:

  • Not possible for a third party to guess, even if we reveal which metadata is used
  • Not possible for an administrator with access to the account to reconstruct
  • Not possible to phish or obtain through social engineering or client-side attacks

There are many possibilities but here is one possible solution that would satisfy all the above requirements:

Key = Password (Not available to admins) + Timestamp for last login (not guessable/phishable)
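
A minimal sketch of this scheme is shown below, assuming the pyotp and cryptography libraries; all names, parameters and values are illustrative only. Note that because the last-login timestamp changes after each successful login, the seed has to be re-encrypted with a new key as part of the flow described above.

# Sketch: encrypt a user's TOTP seed with a key derived from the user's
# password and last-login timestamp. Assumes the "cryptography" and "pyotp"
# libraries; names and values are illustrative only.
import base64
import os

import pyotp
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

def derive_key(password: str, last_login: str, salt: bytes) -> bytes:
    """Derive a Fernet key from the password plus the last-login timestamp."""
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=600_000)
    return base64.urlsafe_b64encode(kdf.derive((password + last_login).encode()))

def encrypt_seed(seed: str, key: bytes) -> bytes:
    return Fernet(key).encrypt(seed.encode())

def decrypt_seed(token: bytes, key: bytes) -> str:
    return Fernet(key).decrypt(token).decode()

# Enrollment (storage of salt, ciphertext and timestamps is left out):
salt = os.urandom(16)
seed = pyotp.random_base32()  # the per-user OTP seed
stored = encrypt_seed(seed, derive_key("correct horse", "2019-05-01T10:00:00Z", salt))

# At next login: reconstruct the key, decrypt the seed and verify the OTP code.
key = derive_key("correct horse", "2019-05-01T10:00:00Z", salt)
totp = pyotp.TOTP(decrypt_seed(stored, key))
print(totp.verify(totp.now()))  # True when the submitted code matches

# After a successful login the timestamp changes, so re-encrypt with a new key.
new_key = derive_key("correct horse", "2019-05-14T08:30:00Z", salt)
stored = encrypt_seed(decrypt_seed(stored, key), new_key)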

Deploying Django to app engine with Python 3.7 runtime – fails because it can’t find pip?

Update 30 April 2019: Problem is back. This time it tries to upgrade to pip 19.1, but the app engine instance is stuck on 19.0.3. Adding pip==19.0.3 in the requirements.txt file saves the deployment.

Update 1 April 2019: now the deploy fails with the same message as described in this post, when the PIP version is specified in the requirements.txt file. Removing the specific pip version line from the requirements file fixes this. I have not seen any change notice or similar from Google on this.

I had an interesting error that took quite some time to hunt down today. These are basically some notes on what caused it and how I tracked it down. I have an app that is deployed to Google App Engine standard. It is running Django and using the Python 3.7 runtime – and it had been working quite well for some time. Then yesterday, when I was going to deploy an update (actually just adding some CSS tweaks), it failed with a cryptic error message. Running “gcloud app deploy” led to the following error message:

File upload done.
Updating service [default]…failed.
ERROR: (gcloud.app.deploy) Error Response: [9] Cloud build a33ff087-0f47-4f18-8654-********* status: FAILURE.
Error ID: B212CE0B.
Error type: InternalError.
Error message: pip_install_from_wheels had stderr output:
/env/bin/python3.7: No module named pip
error: pip_install_from_wheels returned code: 1.

This is weird: this is a normal Python project using a requirements.txt file for its dependencies. The file was generated using pip freeze and should not contain anything weird (it doesn’t). Searching the wisdom of the internet reveals that nobody else seems to have this problem, and it had only occurred since yesterday. My hunch was that Google had changed something in the GAE environment that broke the build. Searching the net gives us these options:

  • The requirements.txt file has weird encoding and contains Chinese signs/letters? That was not it.
  • This is because you need to install some special packages for using Python 3… was also not the case, and it would have been weird for that to change over the last few days…
  • You need to manually install pip to make it work – which may be the case sometimes but without SSH access to the instance this isn’t obvious how to do.
The trick is often looking at the right logs….

So, being clueless, I turned to looking for the right logs to figure out what was going on. Not being an expert on the GAE environment led to some hunting in the web console until I found “Cloud Build”, which sounded promising. That was the right place to be – GAE uses a build process in the cloud to first build the application and then a Docker image to push to the internal Google Cloud Docker repository. Hunting through the build log finds this little piece of gold:

Step #1 - "builder": INFO pip_install_from_wheels took 0 seconds
Step #1 - "builder": INFO starting: pip_install_from_wheels
Step #1 - "builder": INFO pip_install_from_wheels /env/bin/python3.7 -m pip install --no-deps --prefix /tmp/tmp9Y3aD7/env /tmp/tmppuvw4s/wheel/pip-19.0.1-py2.py3-none-any.whl --disable-pip-version-check
Step #1 - "builder": INFO `pip_install_from_wheels` stdout:
Step #1 - "builder": Processing /tmp/tmppuvw4s/wheel/pip-19.0.1-py2.py3-none-any.whl
Step #1 - "builder": Installing collected packages: pip
Step #1 - "builder": Found existing installation: pip 18.1
Step #1 - "builder": Uninstalling pip-18.1:
Step #1 - "builder": Successfully uninstalled pip-18.1
Step #1 - "builder": Successfully installed pip-19.0.1
Step #1 - "builder": Failed. No module /env/python/pip

Before the weird error we see that it is trying to uninstall pip-18.1, then install pip-19.0.1 (a more recent version), and afterwards it can’t find pip, so the build process fails. This has not been configured by me and is probably Google configuring automatic upgrades of some packages during the build – and here it breaks the workflow.

Fixing it

The temporary fix was simple: adding “pip==18.1” to the requirements.txt file allowed the build process to run, and the app deployed nicely.

What did we learn from this?

  • API tools give only partial error messages, making debugging hard.
  • Automated upgrade configs are good but can cause things to break in unforeseen ways.
  • Finding the right logs is the key to fixing weird problems.