Making Django, Elastic Beanstalk and AWS RDS play well together

A couple of days ago I decided I should learn a bit more hands-on AWS stuff. So I created a free tier AWS account and looked around. I decided I'd take a common use case: deploy a web application to Elastic Beanstalk and add a domain and SSL.

Setting up tools

Step 1: reading documentation. AWS has a lot of documentation, and it is mostly written in a friendly manner with easy-to-follow instructions. Based on the documentation I opted for using the command line Elastic Beanstalk tool. To use this you need Python and pip. You can install it with the command

pip install awsebcli --upgrade

If you run into a permissions problem doing this, you can throw in a "--user" flag at the end of that command. This installs the tool you need to create and manage EB environments from your command line. Since it is a Python utility it works on Windows as well as Mac and Linux. Installing it did not pose any hiccups. You can read more about setting this tool up and updating your system path here: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/eb-cli3-install.html.

Before using it you need to set it up. Issue the command

eb init

This will give you a prompt asking for a number of things, like region to set up in, etc.

Learning point: if you want to set up a database, such as MySQL, inside the EB environment, you should use the database option when issuing the next command. Anyway, to set up your environment, use

eb create

(see https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/eb3-create.html for the full documentation)

If you want a database in your environment, add the --db flag with the desired options; you cannot create the database in the EB Console (the web-based interface) afterwards, at least not for the micro instances allowed in the free tier. According to someone on Stack Overflow this is a bug in AWS that you can wait for them to fix, or you can use the command line option (supposedly that works, but it is not what I did).

If you create a database in your EB environment, your DB will be terminated too if you terminate that environment. You may not want that, so you can consider setting up an external database and connecting to it outside of EB. That is what I did, and there’s more about that a little further down this post.

Creating a Django app

To have something to deploy I created a Django app. This is an asocial network; you can post things with links and hashtags but you can’t follow other users or anything like that. It has user management and uses the default Django admin system and authentication system (session based). I called it woodscreaming and you can view it here: woodscreaming.com.

Setting up a virtual environment

First, to avoid mixing things up, and to be able to create a requirements file that actually works, create a virtual environment. For this I like to use the tool virtualenv (works on all platforms, can be installed with pip if you don't have it):

virtualenv --python=python venv

“venv” is the name of your virtual environment. Everything you install when the environment is active will be contained in that environment, and you have all dependencies under control (think of it like a semi-container-solution). To activate the environment on Linux/Mac:

source venv/bin/activate

On Windows:

venv\Scripts\activate

When you have all the dependencies your app needs in place, run

pip freeze > requirements.txt

This creates a requirements.txt file that EB will use to install your app in the cloud.

Adding EB configuration files to the Django project

To make things work, you also need to add some EB specific configuration files to your Django project. Create a folder named .ebextensions in your project’s root folder. In this folder you will need to add a django.config file with the following contents:

option_settings:
  aws:elasticbeanstalk:container:python:
    WSGIPath: projectname/wsgi.py

Of course you need to change the word projectname into the name of your project. This tells EB where to find your wsgi.py file, the entry point the web server uses to run your application (WSGI is the standard Python interface between web servers and web applications).
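
For reference, the wsgi.py file that django-admin startproject generates looks roughly like this (with projectname replaced by your project's name):

import os

from django.core.wsgi import get_wsgi_application

# Point Django at the settings module before the application is loaded
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'projectname.settings')

application = get_wsgi_application()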

You should also tell EB to run migrations to get your data models to work with your database. Adding a file (I called it db-migrate.config) to the .ebextensions folder fixes this. Here’s what you need to add to that file:

container_commands:
  01_migrate:
    command: "django-admin.py migrate"
    leader_only: true
option_settings:
  aws:elasticbeanstalk:application:environment:
    DJANGO_SETTINGS_MODULE: discproject.settings

Replace discproject with your own project's settings module here. You will also have a folder called .elasticbeanstalk in the project; the command line client populates it with a YAML file called config.yml that tells EB what resources are needed (you don't need to edit this file yourself).

That’s it to begin with – some changes need to be made when adding an RDS database and setting up http to https forwarding.

Deploying to EB

Deploying to EB is very easy: deactivate your virtual environment by issuing the command "deactivate", and then run

eb deploy

EB now zips your source, uploads it to AWS, installs it, and provisions the resources defined in your config.yml file. It takes a while, and then it is done. You can then see your web app online by issuing the command

eb open

The app will get its own URL automatically, of the format “something.aws-region.elasticbeanstalk.com”. It does not get an SSL certificate (https) automatically – you will need to set up a custom domain for that (more about that later). Anyway, opening it up shows the web app running in the cloud and I am able to use it.

Dev database vs prod database

By default django-admin.py sets up a project that uses an SQLite database: a single-file SQL database that is popular for persistent storage in mobile apps and embedded applications. When you deploy, your development environment's database is deployed too, and with each redeploy you will overwrite it. SQLite is also not great for concurrent operations, and obviously overwriting all user data on each deploy is not going to work. There are ways around this if you want to stick to SQLite, but that is normally not the best solution for a web app database (although it is great for development).

Next we look at how we can create a database in the cloud and use that with our production environment, while using the SQLite one in local development.

Adding an RDS database

Attempt 1: Using the EB Console

In the EB console (the web interface), if you go to “Configuration”, there is a card for “Database” and an option to “modify”. There you can set up your desired database instance and select apply. The problem is… it doesn’t work for some reason. The deployment fails due to some permission error. I’m sure it is possible to fix but I didn’t bother fiddling enough with it to do that. And as mentioned above; if you terminate the environment you will also terminate the database.

Attempt 2: Setting up an RDS database external to EB

This worked. Basically following AWS documentation on how to set it up was quick and easy:

  • Go to RDS, create a new instance. Select the type of database engine, EC2 instance type etc.
  • Select db name, username, password (remember to write those down – I use secure notes in LastPass for things like this). Set the DB instance to be “public” to allow queries from outside your VPC to reach it.
  • Add the RDS security group to your EB EC2 instance. This is important – if you do not do this, it is not possible to query the database from EB.

To add that security group in EB you need to go to the EB console, head to configuration and then select the card for instances. Select “modify” and then head to the security groups table – add the RDS one (it is automatically generated and named something like rds-default-1) and click apply. Because the database is external to EB you also need to add environment variables for the connection. To do this, head to the console again and select “modify” on the software card. Add the following environment variables:

  • RDS_DB_NAME
  • RDS_HOSTNAME
  • RDS_PORT
  • RDS_USERNAME
  • RDS_PASSWORD

The values are found in your RDS instance overview (head to the RDS console, select your instance, and you find the variables a bit down on the page). Now, you also need to tell your Python app to read and use these. Add this to your Django settings file:

import os  # at the top of settings.py (BASE_DIR is already defined in the default settings file)

if 'RDS_HOSTNAME' in os.environ:
    # Running on Elastic Beanstalk: use the external RDS MySQL instance
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.mysql',
            'NAME': os.environ['RDS_DB_NAME'],
            'USER': os.environ['RDS_USERNAME'],
            'PASSWORD': os.environ['RDS_PASSWORD'],
            'HOST': os.environ['RDS_HOSTNAME'],
            'PORT': os.environ['RDS_PORT'],
        }
    }
else:
    # Local development: fall back to the SQLite database
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.sqlite3',
            'NAME': os.path.join(BASE_DIR, 'dbp.sqlite3'),
        }
    }

After doing this the EB environment health showed as "green" and all good, but my web app did not load and the log showed database connection errors. The solution to that was: read the docs. You also need to add a rule to the RDS instance's security group that allows inbound connections from your environment's security group. Details here: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/AWSHowTo.RDS.html. After doing this, it works!

Adding a Django superuser to the RDS database

You could SSH into the EC2 instance running the Django app and use the manage.py utility, but that kind of defeats the point of having a PaaS that is supposed to let you configure things without SSH-ing into everything.

To add a Django superuser you should instead add a custom Django management command to your project and run it from a container command. Here's a good description of how to do that: https://github.com/codingforentrepreneurs/Guides/blob/master/all/elastic_beanstalk_django.md. You can add the command entry to your db-migrate.config file in the .ebextensions folder.
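
As a rough sketch of what such a management command could look like (the command name createsu, the app path, and the environment variable names below are my own assumptions, so adapt them to your project and to the guide linked above), you could add a file like yourapp/management/commands/createsu.py:

# yourapp/management/commands/createsu.py (hypothetical example)
# Creates a superuser only if it does not already exist, so the command is
# safe to run on every deployment as an EB container command.
import os

from django.contrib.auth import get_user_model
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = "Create a default superuser if one does not already exist"

    def handle(self, *args, **options):
        User = get_user_model()
        username = os.environ.get("DJANGO_SU_NAME", "admin")
        if User.objects.filter(username=username).exists():
            self.stdout.write("Superuser already exists, skipping.")
            return
        User.objects.create_superuser(
            username=username,
            email=os.environ.get("DJANGO_SU_EMAIL", "admin@example.com"),
            password=os.environ["DJANGO_SU_PASSWORD"],
        )
        self.stdout.write("Superuser created.")

You would then add a second entry under container_commands in db-migrate.config (for example 02_createsu running "django-admin.py createsu" with leader_only: true) and set the corresponding environment variables in the EB console, just like the RDS ones.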

Configuring DNS with Route 53

Now, having the default URL is no fun, and you can't add SSL on that one. So we need to set up DNS. I chose to buy a domain name from Amazon and then set up DNS with Route 53. Setting that up for an EB environment is super easy: you create an A record as an alias to your EB environment URL.

Adding an SSL certificate that terminates on the load balancer

Now that we have a working domain name, and we've set up the DNS records we need, we can add an SSL certificate. The easiest way to provision the certificate is to use Amazon's certificate management service (ACM). You provision one for your domain, and you verify ownership by adding a CNAME record to your DNS hosted zone in Route 53.

The next thing you need to do to make things work is add the certificate to your Elastic Beanstalk environment. Depending on your threat model and your needs, you can choose the simple route of terminating https on the load balancer (good enough for most cases), or you can set up AWS to also use secure protocols for internal traffic (behind the load balancer). I chose to terminate https on the load balancer.

The AWS docs explain how to do this by adding a secure listener on the load balancer: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/configuring-https-elb.html.

Forwarding http to https

There are several ways to forward http traffic to https. The easiest is to set up forwarding on the Apache server. Since we are not using SSH to fiddle with the server directly, we do this by adding a configuration file to our .ebextensions folder in the Django project and then redeploying. Adding a file https.config with the following contents does the job:

files:
    "/etc/httpd/conf.d/ssl_rewrite.conf":
        mode: "000644"
        owner: root
        group: root
        content: |
            RewriteEngine On
            <If "-n '%{HTTP:X-Forwarded-Proto}' && %{HTTP:X-Forwarded-Proto} != 'https'">
            RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R,L]
            </If>
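
On the Django side it can also be useful to tell the app that it sits behind an SSL-terminating load balancer, so that request.is_secure() reflects the original protocol. These are standard Django settings, but whether you need them depends on your app, so treat the snippet below as an optional sketch for the production settings file:

# Trust the X-Forwarded-Proto header set by the load balancer, so Django
# treats requests that arrived over https as secure.
SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTO', 'https')

# Once everything is served over https, it also makes sense to mark the
# session and CSRF cookies as secure.
SESSION_COOKIE_SECURE = True
CSRF_COOKIE_SECURE = True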

Summary

This post is a walk-through of getting the essentials done to use Elastic Beanstalk to serve a web application:

  • Creating an environment and deploying an app
  • Using config files to manage server processes and configuration
  • Setting up an external RDS database and connecting to it using environment variables
  • Configuring a custom domain name and setting up DNS
  • Adding SSL termination on the load balancer
  • Adding an http to https rewrite rule to Apache on the web server using a config file

 

How to manage risk and security when outsourcing development

Are you planning to offer a SaaS product, perhaps combined with a mobile app or two? Many companies operating in this space will outsource development, often because they don’t have the right in-house capacity or competence. In many cases the outsourcing adventure ends in tears. Let’s first look at some common pitfalls before diving into what you can do to steer the outsourced flagship clear of the roughest seas.

Common outsourcing pitfalls

I’ve written about project follow-up before, and whether you are building an oil rig or getting someone to write an app for you, the typical “outsourcing pitfalls” remain the same:

  • Weak follow-up
  • Lack of documentation requirements
  • Informal or no testing
  • No competence to ask the right questions
  • No planning of the operations phase
  • Lack of privacy in design

Weak follow-up: without regular follow-up, the service provider's sense of commitment can get lost. It also increases the chances of misunderstandings by orders of magnitude. Even if I write a specification that is wonderfully clear to me, it may be interpreted differently by the service provider. With little communication along the way, there is a good chance the deliverable will not be as expected, even if the supplier claims all requirements have been met.

Another big cost of not having a close follow-up process is lost opportunities in the form of improvements or additional features that could be super useful. If the developer gets a brilliant idea but has no one to approve it, it may never even be presented to you as the project owner. So focus on follow-up; otherwise you are not getting the full return on your outsourcing investment.

Lack of documentation requirements: Many outsourcing projects follow a common pattern: the project owner writes a specification and gets a product made and delivered. The outsourcing supplier is then often out of the picture: work done and paid for, you now own the product. The plan is perhaps to maintain the code yourself, or to hire an IT team with your own developers to do that. But… there is no documentation! How was the architecture set up, and why? What do the different functions do? How does it all work? Getting to grips with all of that without proper documentation is hard. Really hard. Hence, putting documentation requirements into your contracts and specifications is a good investment: it helps you avoid future misunderstandings and a lot of wasted time trying to figure out how everything works.

Informal or no testing: No testing plan? No factory acceptance test (FAT)? No testing documentation? Then how do you determine if the product meets its quality goals – in terms of performance, security, user experience? The supplier may have fulfilled all requirements – because testing was basically left up to them, and they chose a very informal approach that only focuses on functional testing, not performance, security, user experience or even accessibility. It is a good idea to include testing as part of the contract and requirements. It does not need to be prescriptive – the requirement may be for the supplier to develop a test plan for approval, and with a rationale for the chosen testing strategy. This is perhaps the best way forward for many buyers.

No competence to ask the right questions: One reason the points mentioned so far get overlooked may be that the buying organization does not have the in-house competence to ask the right questions. The right medicine is not necessarily to send your startup's CEO to a coding bootcamp, or for a company primarily focused on operations to hire its own in-house development team, but leaving the supplier with all the know-how puts you in a very vulnerable position, almost irrespective of the legal protections in your contract. It is often money well spent to hire a consultant to help follow up the process, ideally from the start, so you avoid both specification and contract pitfalls as well as the most common plague of outsourcing projects: weak follow-up.

No planning of operations: If you are paying someone to create a SaaS product for you, have you thought about how to put this product into operation? Important things are often left out of the discussion with the outsourcing provider, even though their decisions have a very big impact on your future operations. Have you included the following aspects in your discussions with the dev team?

  • Application logs: what should be logged, in what format, and where should it go?
  • How will you deploy the applications? How will you manage redundancy and content delivery?
  • Security in operations: how will you update the apps when security demands it, for example when security holes become known in the dependencies/libraries you use? Do you even know what the dependencies are?
  • Support: how should your applications be supported? Who picks up the phone or answers that chat message? What information will be available from the app itself to help the helpdesk worker assist the customer?

Lack of privacy in design: The GDPR requires privacy to be built in. This means following principles such as data minimization, using pseudonymization or anonymization where this is required or makes sense, and having the means to detect data breaches that may threaten the confidentiality and integrity (and in some cases availability) of personal information. Very often in outsourcing projects this does not happen. Including privacy in the requirements and follow-up discussions is thus not only a good idea but essential to get privacy by design and by default in place. This also points back to the competence part: perhaps you need to strengthen not only your tech know-how during project follow-up but also your privacy and legal management?

A simple framework for successful follow-up of outsourcing projects

The good news is that it is easy to give your outsourcing project much better chances of success. And it is all really down to common sense.

Activities in three phases for improving your outsourcing management skills

Preparation

First, during preparation you will make a description of the product and the desired outcomes of the outsourcing project. Here you have a lot to gain from putting in more requirements than the purely functional ones: think about documentation, security, testing and operations-related aspects, and include them in your requirements list.

Then, think about the risk in this specification. What can go wrong? Cause delays? Malfunction? Be misunderstood? Review your specification with the risk hat on – and bring in the right competence to help you make that process worthwhile. Find the weaknesses, and then improve.

Decide how you want to follow up the vendor. Do you want to opt for e-mailed status reports once per week? The number of times that has worked for project follow-up is zero. Make sure you talk regularly. The more often you interact with the supplier, the better the effect on quality, loyalty, and priorities. Stay on your supplier's top priority list; if you don't, your product will not be the thing they are thinking about when they come to the office in the morning. Things you can do to get better project follow-up:

  • Regular meetings – in person if you are in the same location, but also on video works well.
  • Use a chat tool such as Slack, Microsoft Teams or similar for daily discussions. Keep it informal. Be approachable. That makes everything much better.
  • Always focus on being helpful. Avoid getting into power struggles, or a very top-down approach. It kills motivation, and makes people avoid telling you about their best ideas. You want those ideas.

Competence. That is the hardest piece of the puzzle. Take a hard look at your own competence, and the competence you have available, before deciding you are good to go. This determines whether you should get a consultant or hire someone to help follow up the outsourcing project. For outsourcing of development work, rate your organization's competence within the following areas:

  • Project management (budgets, schedule, communications, project risk governance, etc)
  • Security: do you know enough to understand what cyber threats you need to worry about during dev, and during ops? Can you ask the right questions to make sure your dev team follows good practice and makes the attack surface as small as it should be?
  • Code development: do you understand development, both on the organizational and code level? Can you ask the right questions to make sure good practice is followed, risks are flagged and priorities are set right?
  • Operations: do you have the skills to follow up deployment, preparations for production logging, availability planning, etc?
  • User experience: do you have the right people to verify designs and user experiences with respect to usability, accessibility?
  • Privacy: do you understand how to ensure privacy laws are followed, and that the implementation of data protection measures will be seen as acceptable by both data protection authorities and the users?

For areas where you are weak, consider getting a consultant to help. Often you can find a generalist who can help in more than one area, but it may be hard to cover them all. It is also OK to have some weaknesses in the organization, but you are much better off being aware of them than running blind in those areas. The majority of the follow-up would require competence in project management and code development (including basic security), so that needs to be your top priority to cover well.

Work follow-up

Now we are going to assume you are well-prepared – having put down good requirements, planned on a follow-up structure and that you more or less have covered the relevant competence areas. Here are some hints for putting things into practice:

  • Regular follow-up: make sure you have formal follow-up meetings even if you communicate regularly on chat or similar tools. Take minutes of the meetings and share them with everyone. Make sure you are the one writing the minutes; don't hand the supplier the power to set priorities, that is your job. Every meeting should be called with an agenda so people can be well prepared. Here are topics that should be covered in these meetings:
    • Progress: how does it look with respect to schedule, cost and quality?
    • Ideas and suggestions: any useful suggestions or good ideas? If someone has a great idea, write down the concept and follow up in a separate meeting.
    • Problems: any big issues found? Things done to fix problems?
    • Risks: any foreseeable issues? Delays? Security? Problems? Organizational issues?
  • Project risk assessment: keep a risk register and update it after follow-up meetings. If big issues pop up, make plans for correcting them, and ask the supplier to help plan mitigations. This really helps!
  • Knowledge build-up: you are going to take over an application. There is a lot to be learned from the dev process, and this know-how often vanishes with project delivery. Make sure to write down this knowledge, especially from problems that have been solved. A wiki, a blog, or similar formats can work well for this; just make sure it is searchable.
  • Auditing is important for all. It builds quality. I’ve written about good auditing practices before, just in the context of safety, but the same points are still valid for general projects too: Why functional safety audits are useful.

Take-over

  • Make sure to have a factory acceptance test. Make a test plan. This plan should include everything you need to see working before you agree to take the product over:
    • Functions working as they should
    • Performance: is it fast enough?
    • Security: demonstrate that included security functions are working
    • Usability and accessibility: good standards followed? Design principles adhered to?
  • Initial support: the initial phase is when you will discover the most problems – or rather, your users will discover them. Having a plan for support from the beginning is therefore essential. Someone needs to pick up the phone or answer that chat message – and when they can’t, there must be somewhere to escalate to, preferably a developer who can check if there is something wrong with the code or the set-up. This is why you should probably pay the outsourcing supplier to provide support in the initial weeks or months before you have everything in place in-house; they know the product best after making it for you.
  • Knowledge transfer: the developers know the most about your application. Make sure they help you understand how everything works. During the take-over phase make sure you ask all questions you have, that you have them demo how things are done, take advantage of any support contracts to extend your knowledge base.

This is not a guarantee for success – but your odds will be much better if you plan and execute follow-up in a good manner. This is one way that works well in practice – for all sorts of buyer-supplier relationship follow-up. Here the context was software – but you may use the same thinking around ships, board games or architectural drawings for that matter. Good luck with your outsourcing project!

Comments? They are very welcome, or hit me up on Twitter @sjefersuper!

 

Why “secure iframes” on http sites are bad for security

Earlier this year it was reported that half of the web is now served over SSL (Wired.com). Still, quite a number of sites are trying to keep things in http, and to serve secure content in embedded parts of the site. There are two approaches to this:

  • A form embedded in an iframe served over https (not terrible but still a bad idea)
  • A form that loads over http and submits over https (this is terrible)

A form that loads on an http page and submits to an https endpoint is security-wise meaningless, because a man-in-the-middle attacker on the http page can snoop on (or tamper with) the data entered into the form before it is ever submitted. The security added by https on the submission is therefore lost.

Users are slowly but surely being trained to look for this padlock symbol and the “https” protocol when interacting with web pages and applications. 

The "secure iframe" is slightly better, because the form is served over https and a man-in-the-middle cannot easily read its contents. This is aided by iframe sandboxing in modern browsers (Chrome has some documentation on this), although older browsers may not be as secure because they lack the sandboxing feature. Client-side restrictions can, however, be manipulated.

One of the big problems with security is lack of awareness about security risks. To counter this, browsers today indicate that login forms, payment forms, etc. on http sites are insecure. If you load your iframe over https on an http site, the browser will still warn the user (although the actual content is not submitted insecurely). This counteracts the learned (positive) behavior of looking for a green padlock symbol and the https protocol. Two potential bad effects:

  • Users start to ignore the unison cry of “only submit data when you see the green padlock” – which will be great for phishing agents and other scammers. This may be “good for business” in the short run, but it certainly is bad for society as a whole, and for your business in the long run.
  • Users will not trust your login form because it looks insecure and they choose not to trust your site – which is good for the internet and bad for your business.

Takeaways from this:

  • Serve all pages that interact with users in any form over https
  • Do not use mixed content in the same page. Just don’t do it.
  • While you are at it: don’t support weak ciphers and vulnerable crypto. That is also bad for karma, and good for criminals.