How machine learning can help spread fake news

During the 2016 American election campaign, fake news was the new big thing, with Russia accused of orchestrating an intelligence campaign to influence the outcome of the presidential election. Regardless of what Russia did or did not do, spreading a message efficiently requires both that traditional media pick it up to grant it credibility, and that people share it on social media platforms for maximum coverage. Machine learning can play many roles in this, and we will look at an obvious use case that works much the same way as recommendations on Netflix or Amazon – feature-based labelling.

Any “news” article will have several features. Examples of features are:

  • Language style (using a readability metric)
  • Length of article (word count)
  • Use of celebrities (none, light, medium, heavy)
  • Visual intensity (none, light, medium, heavy)
  • Shock factor (none, light, medium, heavy)

Let us say we consider a news article successful if it receives more than 100k shares on Facebook, or if it is quoted on CNN. So, our news articles can be SUCCESSFUL or NOT SUCCESSFUL depending on these criteria.

One simple but often effective way to use machine learning to understand what makes an article successful is to train a classifier on existing data. Say we have a collection of 200 news articles, and that we can check whether each was successful or not (they are labelled). This is our training set. Based on that, we can use statistics to find out which features help us predict which label to apply to a new data point. If we boil this down to two features (language style and word count), we can create a scatter plot of the articles and inspect the training data visually, seeking to learn how we can exploit these features to make our fake news spread.

What we learn from simply looking at the plot is that the article should be fairly short and of intermediate readability (somewhere between 60 and 80 on the Flesch index, corresponding to articles that can be read by high school graduates).
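As a concrete illustration, here is a rough sketch of the Flesch reading-ease score mentioned above. The formula itself is the standard one; the syllable counter is a crude heuristic of my own, and real readability tools do considerably better:

```python
import re

def count_syllables(word):
    # Crude heuristic: count groups of consecutive vowels,
    # subtract one for a silent trailing 'e', minimum one syllable.
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_reading_ease(text):
    # Flesch reading ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))
```

A short sentence of one-syllable words scores near the top of the scale, while dense, polysyllabic prose drops towards zero.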

Using a classification algorithm such as naïve Bayes, we can generate a decision surface from our data.

Everything that falls into the red region will be predicted as successful. If we give up the ability to plot the features in a single scatter plot, we can feed the algorithm our full feature set, letting it figure out more factors we should care about when creating our fake news campaign.
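To make the approach concrete, here is a minimal Gaussian naive Bayes sketch in plain Python. The feature values and the S/N (successful/not successful) labels are invented for illustration, just like the randomly generated data in this post:

```python
from math import exp, pi, sqrt
from statistics import mean, variance

def fit(samples):
    """Estimate per-class priors and per-feature (mean, variance) from labelled data.
    samples: list of (features, label), features being a tuple of floats."""
    model = {}
    labels = {label for _, label in samples}
    for label in labels:
        rows = [f for f, l in samples if l == label]
        cols = list(zip(*rows))  # transpose to per-feature columns
        model[label] = {
            "prior": len(rows) / len(samples),
            "stats": [(mean(c), variance(c)) for c in cols],
        }
    return model

def gaussian(x, mu, var):
    # Gaussian probability density for one feature value.
    return exp(-((x - mu) ** 2) / (2 * var)) / sqrt(2 * pi * var)

def predict(model, features):
    # Naive Bayes: pick the class maximizing prior * product of feature likelihoods.
    best_label, best_score = None, float("-inf")
    for label, params in model.items():
        score = params["prior"]
        for x, (mu, var) in zip(features, params["stats"]):
            score *= gaussian(x, mu, var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training set: ((flesch_score, word_count), label)
train = [
    ((65, 350), "S"), ((72, 420), "S"), ((68, 380), "S"), ((75, 300), "S"),
    ((40, 1500), "N"), ((35, 1600), "N"), ((45, 1400), "N"), ((50, 1700), "N"),
]
model = fit(train)
prediction = predict(model, (70, 400))  # a short, readable article
```

In practice you would use a library implementation (e.g. scikit-learn's `GaussianNB`), but the mechanics are the same: the decision surface falls out of the per-class feature distributions.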

This shows that the same methods that drive recommendation engines can also be used to learn how best to influence people – useful both in marketing and in trying to “rig elections”. By the way, this simple labelling of data using classifiers is one arm of machine learning, known as supervised learning. The data set used in this post was randomly generated – so it didn’t really teach you how to create effective fake news articles – but it did show you how you could find out.

40 tracking cookies from 2 news sites: this is why you need a VPN

You have probably (hopefully) been told that open wifi is insecure, and that you should use a virtual private network to encrypt and protect your traffic. Most people don’t do this, perhaps because it seems hard to do.

Opera Software now offers a free VPN. It is built into the desktop browser and available as a standalone app on smartphones. It also comes with the ability to block tracking cookies – cookies that track the pages you visit on the web, for commercial purposes (or so they claim). An old but nice non-technical write-up on tracking cookies can be found at geek.com. The difference from back then is that big data and AI have amplified trackers’ ability to spy on you and analyze your online life.

How many trackers are you exposed to by visiting high traffic news sites? Here’s what Opera VPN reported after visiting CNN.com and Bloomberg.com without clicking a single link on those pages. 

40 trackers? I have no interest in feeding ad networks with my online habits. I suggest you activate the VPN and cookie filters on your mobile in addition to your desktop – also when browsing on secure networks!

Make sure security does not stop your people from getting stuff done!

Cybersecurity is on the list of many organizations’ top priorities nowadays. Obviously, protecting the confidentiality, integrity and availability of business data is a crucial part of any modern enterprise’s risk management activities. However, in many cases, security measures make simple things difficult and hard things even harder. When this happens, users tend to find workarounds, often involving private cloud services, private devices, or sneakernets. If this is the case at your company, you should rethink your approach to security.

Feel locked out by your security policy? Are you prevented from doing your job by the IT department?

What can organizations do to maintain security while still allowing people to get their work done?

Security measures need a sound basis in the threats you are trying to avoid. This means you should have at least a basic grasp of what kinds of threats you are dealing with, and which measures will be effective against them. Here is a six-step list for how to achieve that.

  1. Perform a cyber security threat identification to list all threats and sort them as “unacceptable” or “acceptable” based on both impact and credibility.
  2. Deal with the threats by designing countermeasures; these can be technology, awareness training and response capabilities.
  3. Educate your users on the threats and why it is important to avoid letting adversaries in.
  4. The principle of least privilege is sound – but it should not be interpreted as “no access unless proven beyond doubt that access is needed”. It means access shall only be given when it is meaningful for that user to have it, and where this increases the attack surface, ensure the user is educated to understand what that means.
  5. Do not overuse content filtering. That is the same as inviting sneakernets you have no control over.
  6. Never forget that technology is there to help people get stuff done, not to prevent them from doing anything. If a user needs to do something (e.g. download and test software from the internet), work with the user to find safe ways to do it instead of being an obstacle.
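As a sketch of step 1, a threat register could be sorted programmatically. The threats, the 1–5 scales and the threshold below are all hypothetical and would come from your own assessment:

```python
# Hypothetical threat register: (name, impact 1-5, credibility 1-5).
threats = [
    ("Phishing against finance staff", 4, 5),
    ("Ransomware via email attachment", 5, 4),
    ("Physical theft of servers", 5, 1),
    ("Defacement of public website", 2, 3),
]

def classify(impact, credibility, threshold=12):
    # Simple product score; the threshold is an assumption, tune it
    # to your organization's risk appetite.
    return "unacceptable" if impact * credibility >= threshold else "acceptable"

register = {name: classify(i, c) for name, i, c in threats}
```

The point is not the arithmetic but the discipline: every threat gets an explicit impact and credibility judgment, and the sorting rule is written down rather than decided ad hoc.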

Do SCADA vulnerabilities matter?

Sometimes we talk to people responsible for operating distributed control systems, which are often linked to remote access solutions for a variety of reasons. Still, the same people often do not understand that vulnerabilities are still being found in mature systems, and they often fail to take the typically simple actions needed to safeguard them.

For example, a new vulnerability was recently discovered for the Siemens Simatic CP 343-1 family. Siemens has published a description of the vulnerability, together with a firmware update to fix the problem: see Siemens.com for details.

So, are there any CP 343’s facing the internet? A quick trip to Shodan shows that, yes, indeed, there are lots of them. Everywhere, more or less.


Now, if you had a look at the Siemens site, you saw that the patch was available from the release date of the vulnerability, 27 November 2015. What, then, is the average time to deploy patches in a control system environment? There are no Patch Tuesdays. In practice, such systems are patched somewhere between monthly and never, with a bias towards never. That gives the bad guys plenty of opportunity to exploit your systems before a patch is deployed.

This simple example reinforces that we should stick to the basics:

  • Know the threat landscape and your barriers
  • Use architectures that protect your vulnerable systems
  • Do not use remote access where it is not needed
  • Reward good security behaviors and sanction bad attitudes among employees
  • Create a risk mitigation plan based on the threat landscape – and stick to it in practice too

 

Clever phishing attempt from Nigerian scammers

This phishing e-mail landed in my work mailbox last week. It was interesting because it was very professional, and it was not obvious that it wasn’t the real thing. Here’s a snapshot of the e-mail itself:

Further, the PDF file was reasonably well formed:

Indicators that triggered the notion of a scam:

a) I do not expect any shipment from DHL.

b) The address is a real DHL UK address, but the copyright holder is DHL International GmbH, which is not the correct entity even for Germany.

c) The PDF file was produced with a free converter tool rather than a professional publishing tool, and the logo is low-resolution raster graphics (not visible unless enlarged).

d) The link “Here” leads to a non-DHL domain (odrillncm dot com) registered in 2015 to a user in Lagos, Nigeria (found by a whois registry lookup).

Some quality signs:

a) The address and phone numbers for DHL in the UK are authentic.

b) Good grammar and spelling, and correct use of straplines, DHL corporate identity, etc.

c) The name of the rep, “David Blair”, belongs to a semi-known British TV producer and is a common name, making it hard to verify authenticity by googling or searching LinkedIn/Facebook, etc.

The cues used to identify the scam are probably beyond the “average office PC user” level, and this is most likely an identity theft attempt. This sort of phishing is also used to target ICS environments, which makes this case interesting in that respect.
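Cue d) above – checking where a link actually leads – can be partially automated. Here is a rough sketch that compares a link’s registered domain against the claimed sender’s domain; the last-two-labels heuristic is an assumption that fails for multi-part TLDs such as .co.uk, where real tools use the public suffix list:

```python
from urllib.parse import urlparse

def registered_domain(host):
    # Naive heuristic: keep the last two DNS labels.
    # Does NOT handle multi-part TLDs like .co.uk correctly.
    return ".".join(host.lower().split(".")[-2:])

def link_matches_sender(link_url, claimed_domain):
    # True if the link's host belongs to the domain the mail claims to be from.
    host = urlparse(link_url).hostname or ""
    return registered_domain(host) == registered_domain(claimed_domain)
```

Applied to this mail, a link to odrillncm dot com claiming to be from dhl.com fails the check immediately, without any manual whois lookup.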

Securing your control systems – what are your priorities?

Information security focuses on three aspects of safeguarding data in our systems (CIA):

  • Confidentiality: data should only be visible to those who have been granted access to them
  • Integrity: data should not be altered by people not authorized to do so
  • Availability: data should be available to the people and systems that need them, when they are needed

In traditional IT, security thinking has been dominated by confidentiality. This is in most cases justified; the data itself is the valuable asset (think credit card information, medical records, police records, accounting, business plans, etc.). In control systems, the real value lies in the physical process governed by the control system assets, so availability is extremely important, as is integrity. Confidentiality, on the other hand, may be less important.

Many organizations plan their security management based on traditional IT priorities, and apply these priorities in the control system domain as well. This way, there may be a misalignment between the real priorities of the organization and where the money and resources are spent.

Dr. Eric Cole, a renowned security expert, recommends asking senior management for these priorities, and then comparing them with actual security expenditure from the last year – if there is misalignment between “what’s important” and “what’s done”, it is time to take action. Have you thought through whether your organization is spending its money where it is most needed to safeguard what is truly critical?

5 days of ICS & SCADA security in Amsterdam

This week I attended SANS ICS Security Amsterdam, taking one of their courses on security for industrial control systems. This has been a fantastic opportunity to learn new things, reinforce known concepts at a deeper level, and to network and meet with a wide range of people with interests in this field. I was surprised to see people from law enforcement, national security, industry, consultancies and vendors coming together at one event. Information security has been in the news a lot lately, and is seen as a big part of the risk picture in almost every region and every industry.

Any security training and seminar needs its own T-shirt. And soft drinks.

Two things in particular have been very interesting to see when it comes to the stuff presented by the SANS course instructors:

  1. Security researchers continue to find basic vulnerabilities in new product lines from major vendors (the big companies we’ve all heard about, I’m not going to shame anyone)
  2. A lot of control systems are still facing the internet, are directly accessible with no or very weak security, and attacks are prevalent as found in honeypot research experiments

Basically, this confirms the notion that “the situation is bad and we need to do something about it”. People said this after Stuxnet, and they are still saying it. My impression from working with various clients is that industry is aware of the risks that exist “out there”, but is only doing something to control those risks to varying degrees. Too many still believe that “we will never be compromised as long as we have a firewall”. Relating to this, one might ask: what are the “basic vulnerabilities”, and how do we work around them?

Many control system components today run on commodity OSes, or are connected to servers running MS Windows or Linux, e.g. used to display HMIs. In many modern systems these HMIs are developed as web apps (running on local servers) for portability, ease of access, etc. This means that many of the vulnerabilities found in regular IT and on the web also apply to control systems. However, these risks are worse in the control system world: these systems need to run all the time and therefore often cannot be patched, and should someone break in, they could cause real physical damage (think crashing cars, blowing up an oil rig or destroying a melting furnace). Some of the top vulnerabilities we are exposed to are buffer overflows (yes, still – lots of stuff runs on old systems), SQL injection and cross-site scripting (web interfaces…). So, if we cannot patch, what can we do about this?
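To see why SQL injection is so dangerous for web-based HMIs, here is a minimal demonstration using an in-memory SQLite database; the table, data and payload are made up for illustration, but the mechanism is exactly what an attacker exploits against an unpatched web interface:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE operators (name TEXT, role TEXT)")
db.execute("INSERT INTO operators VALUES ('alice', 'engineer'), ('bob', 'admin')")

user_input = "' OR '1'='1"  # classic injection payload

# Vulnerable: string concatenation lets the payload rewrite the WHERE clause,
# so the query matches every row instead of a single name.
vulnerable = db.execute(
    "SELECT name FROM operators WHERE name = '" + user_input + "'"
).fetchall()

# Safe: a parameterized query treats the payload as a literal string value,
# which matches nothing.
safe = db.execute(
    "SELECT name FROM operators WHERE name = ?", (user_input,)
).fetchall()
```

Even when the underlying system cannot be patched, this class of bug can often be mitigated at the application layer by using parameterized queries everywhere user input touches a database.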

First of all, perform a risk and vulnerability assessment, taking both possible scenarios and their credibility into account. Make sure to establish a good baseline security policy and use it for managing these issues – there is lots of guidance available, often sector specific. If you cannot patch, focus on what you can do: ensure everyone involved in purchasing, maintaining, producing and using control systems is aware of the risks and of which behaviors are good and which are bad. This means that security awareness must be built into the organizational culture.

On the technical side, especially with lots of legacy systems running, make sure the network architecture is reasonable and safe – avoid having critical assets directly facing the internet (do a Shodan search and you will find that many asset owners are not following good practice here). The architecture must weigh risks against business needs – a full lockdown may be the safest way to go, but it may also stop core business functions from working.

Further, should a breach occur, make sure you have the organizational and technical capabilities to deal with that. Plan and train on incident response – and remember you are not alone. Get help from vendors both in managing the assets during normal operations, and during a crisis situation. Including incident response in service agreements may thus be a good idea.

This was a quick summary of topics we’ve looked at during training, and discussed over beers in Amsterdam. The training by SANS has been excellent, and I’m looking forward to bringing reinforced and new insights back to the office on Monday.

Profiling of hackers presented at ESREL 2015

Yesterday my LR colleague Anders presented our work on aggressor profiling for use in security analysis at the European risk and reliability conference in Zürich. The approach attracted a lot of interest, also from people not working with security. One of the big challenges in integrating security assessments into existing risk management frameworks is how to work with the notion of probability or likelihood for infosec risks. Basically, we don’t know how to quantify the probability of a given scenario in a reasonable manner – so how can we then risk assess it and treat it rationally?

The approach presented looks at exactly this. A typical risk management process involves risk identification, analysis and evaluation of consequences and likelihoods, planning of mitigations, and follow-up/stakeholder involvement. Working with clients, we have found that people find it much easier to identify the potential consequences of different scenarios than to assess their credibility. The approach to assessing credibility is centered around two actors:

  1. Who is the victim of the crime?
  2. Who is the aggressor?

Given a certain victim, with its financial standing, relationships to other organizations, geopolitical factors, etc., we can form an opinion about who would have a motivation to attack the asset. Possible categories of such attackers may be:

  • Script kiddies
  • Hacktivists
  • Other corporations
  • Nation states
  • Terrorists
  • Rogue internals

Each of these stereotypes has different traits and triggers shaping the credibility of an attack from them. This relates to motivation or intent, their resources and stamina, their skill sets, and the cost-benefit ratio as seen from the bad guy’s perspective. Giving scores to these different traits and triggers can help establish an opinion of how credible a threat is.
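A minimal sketch of such scoring might look like this; the profiles, the 1–5 trait scores and the unweighted average are all assumptions for illustration, and a real assessment would weight the traits per scenario:

```python
# Hypothetical 1-5 scores per attacker profile for the traits discussed:
# intent, resources, stamina, skill, and perceived benefit vs. cost.
profiles = {
    "script kiddie": {"intent": 2, "resources": 1, "stamina": 1, "skill": 2, "benefit": 2},
    "hacktivist":    {"intent": 4, "resources": 2, "stamina": 3, "skill": 3, "benefit": 3},
    "nation state":  {"intent": 4, "resources": 5, "stamina": 5, "skill": 5, "benefit": 4},
}

def credibility(profile):
    # Unweighted average of trait scores as a crude credibility indicator.
    return sum(profile.values()) / len(profile)
```

Even this crude version makes the reasoning explicit and repeatable: two analysts can debate individual trait scores instead of arguing over a single gut-feeling number.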

An interesting effect in security is that the likelihood of a threat scenario is not necessarily decoupled from the consequence of the scenario; the motivation of the perpetrator may be reinforced by the potential gains of great damage. This should be kept in mind during considerations of intent and cost-benefit.

Forming structured opinions about this allows us to sort threat scenarios not only by consequence, but also by credibility. That fits into standard risk management frameworks. Somewhat simplified, we can make a matrix to sort the different threat scenarios into “acceptable”, “should be looked at” and “unacceptable”.
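Such a matrix can be sketched as a simple lookup; the cell values below are hypothetical and should reflect your own risk acceptance criteria:

```python
# Hypothetical 3x3 risk matrix: (credibility, consequence) -> verdict.
MATRIX = {
    ("low", "low"): "acceptable",
    ("low", "medium"): "acceptable",
    ("low", "high"): "should be looked at",
    ("medium", "low"): "acceptable",
    ("medium", "medium"): "should be looked at",
    ("medium", "high"): "unacceptable",
    ("high", "low"): "should be looked at",
    ("high", "medium"): "unacceptable",
    ("high", "high"): "unacceptable",
}

def sort_scenarios(scenarios):
    # scenarios: list of (name, credibility, consequence) tuples.
    return {name: MATRIX[(cred, cons)] for name, cred, cons in scenarios}
```

The verdicts then drive prioritization: “unacceptable” scenarios get mitigations planned first, “should be looked at” scenarios get further analysis, and “acceptable” ones are documented and monitored.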

Managing information security should be a natural part of your risk management system

Anyone not hiding in a hole in the ground for the last five years must have noticed how much the media write about cyber threats. The media picture aligns well with the impression I get when talking to clients: business risks for production-centered firms are no longer dominated only by random accidents, but also by potential cyber threats. This means that such threats need to be managed.

Most companies already have a risk management system in place, typically along the lines of ISO 31000. A risk management process must consist of a cycle of activities:

  • Identify risks
  • Evaluate risks up against acceptance criteria for severity and probability
  • Plan mitigating actions
  • Involve internal and external stakeholders along the way

Exactly the same process can be used to deal with information security risks. A good methodology for managing such risks should take into account who the threat actors are in the context of the business. Why would they want to attack? Do they have a lot of resources and know-how? What is the cost-benefit ratio for the bad guys? Understanding these human factors will arm you with a reasonable way to rank the credibility of various attack scenarios. This helps fit cyber risks into the typical risk management process: resources should be spent where they bring the largest risk reduction. Typically, highly credible attack scenarios with terrible consequences should be dealt with first, less likely or less severe scenarios second, and scenarios that are extremely unlikely or have little or no impact on the business can perhaps be accepted. This sort of risk ranking is well known to risk management professionals. A few ways in which assessing the credibility of attack scenarios differs from assessing more random events:

  • A bad guy may be motivated by worse consequences; the probability of an attack is thus not decoupled from the consequence
  • It is hard to use “probabilities” or “frequencies” in a meaningful sense – a qualitative approach may be just as useful (sometimes true also for random risks!)

Mitigations should thus be planned according to the risk reduction needed and the effect of each mitigation approach. This is exactly what we do in other risk management settings. Communication with stakeholders is equally necessary in this context, if not more so. Risk owners, equipment suppliers, users, other supply chain partners – they can all be affected. And in our connected world, increasingly so as we move to smarter production systems, cross-infection may be possible across various domains and interfaces. The communication tools well known to risk managers are therefore equally important when managing risks to production-critical information systems.

Managing infosec requires a keen eye for the context of the business, and adaptability to new realities is a key success factor

The bottom line is: don’t outsource responsibility for risk management to your IT services provider – integrate cyber risk management into your existing risk management process. That is the only way to be in control of your own environment.

Is technical competence king in detecting phishing?

Human factors researchers have taken an interest in cyber security. This is good, because we need to think about most attacks in terms of both technology and psychology, on both sides of the fence. Phishing emails are the most common initial attack strategy in targeted attacks. It is therefore important to enable your people to recognize such deception.

 

Understanding the difference between gold and trash is the main way to avoid phishing
 
A recent paper in the August issue of Human Factors by Proctor and Chen discusses decision making in the detection of phishing. A key factor found by the researchers is that a mismatch between the cues in a phishing email and the recipient’s expectations is crucial to detecting an attempt. Such cues are typically technology related: strange URLs, errors in corporate identity, slight misuse of terminology. It may thus be questioned whether awareness training by itself is an effective mitigation – people need to know their domains well, and know what to expect of URLs and technology solutions in emails and on websites.