I had an interesting error that took quite some time to hunt down today. These are basically some notes on what caused it and how I tracked it down. I have an app that is deployed to Google App Engine standard. It is running Django and using the Python 3.7 runtime – and it was working quite well for some time. Then yesterday I was going to deploy an update (actually just adding some CSS tweaks), it failed, with a cryptic error message. Running “gcloud app deploy” lead to the following error message:
File upload done.
Updating service [default]…failed.
ERROR: (gcloud.app.deploy) Error Response:  Cloud build a33ff087-0f47-4f18-8654-********* status: FAILURE.
Error ID: B212CE0B.
Error type: InternalError.
pip_install_from_wheelshad stderr output:
/env/bin/python3.7: No module named pip
pip_install_from_wheelsreturned code: 1.
This is weird: this is a normal Python project using a requirements.txt file for its dependencies. The file was generated using pip freeze, and should not contain anything weird (it doesn’t). Searching the wisdom of the internet reveals that nobody else seems to have this problem, and it only occurred since yesterday. My hunch was that they’ve changed something on the GAE environment that broke something. Searching the net gives us these options:
- The requirements.txt file has weird encoding and contains Chinese signs/letters? That was not it.
- This is because you need to install some special packages for using Python3.. was also not the case and would have been weird changing since a few days ago…
- You need to manually install pip to make it work – which may be the case sometimes but without SSH access to the instance this isn’t obvious how to do.
So, being clueless I turned to looking for the right logs to figure out what is going on. Not being an expert on the GAE environment led to some hunting in the web console until I found “Cloud build”, which sounded promising. That was the right place to be – GAE uses a build process in the cloud to first build the application, and then a Docker image to push to the internal Google Cloud Docker repository. Hunting the build log for this finds this little piece of gold:
Step #1 - "builder": INFO pip_install_from_wheels took 0 seconds
Step #1 - "builder": INFO starting: pip_install_from_wheels
Step #1 - "builder": INFO pip_install_from_wheels /env/bin/python3.7 -m pip install --no-deps --prefix /tmp/tmp9Y3aD7/env /tmp/tmppuvw4s/wheel/pip-19.0.1-py2.py3-none-any.whl --disable-pip-version-check
Step #1 - "builder": INFO `pip_install_from_wheels` stdout:
Step #1 - "builder": Processing /tmp/tmppuvw4s/wheel/pip-19.0.1-py2.py3-none-any.whl
Step #1 - "builder": Installing collected packages: pip
Step #1 - "builder": Found existing installation: pip 18.1
Step #1 - "builder": Uninstalling pip-18.1:
Step #1 - "builder": Successfully uninstalled pip-18.1
Step #1 - "builder": Successfully installed pip-19.0.1
Step #1 - "builder": Failed. No module /env/python/pip
Before the weird error we see it is trying to uninstall pip-18.1, then install pip-19.0.1 (a more recent version), and then it can’t find pip afterwards and the build process fails. This has not been configured by me and is probably Google configuring automatic upgrades of some packages during build – and here it breaks the workflow.
The temporary fix was simple. Adding “pip==18.1” to the requirements.txt file, allowed the build process to run and it deployed nicely.
What did we learn from this?
- API tools give only partial error messages, making debugging hard.
- Automated upgrade configs are good but can cause things to break in unforeseen ways.
- Finding the right logs is the key to fixing weird problems