Oxford Students for Life

Promoting a culture of life in the University and beyond

Month: December, 2016

Statistical Fallacies in the Abortion Debate: Part 2/2

This blog post is the second in a short three-part series on using statistics in the pro-life debate. This week we will continue looking at some common statistical fallacies people make around the abortion debate and how you can avoid making them in a debate, friendly discussion, argument on the internet or some other kind of conversation. Next week there will be a post giving you some tips about what to do instead.

Last week we discussed some of the problems with using small samples and extreme cases in the abortion debate. This week we are going to consider biased samples, false causality and push polls.

Using biased samples

This is a fairly simple fallacy to understand: if you cite a statistic about abortion, you need to be careful that the demographics sampled reflect the population as a whole. For example, when polling people on abortion, it is important to check that their political leanings/gender/ethnicity/religious beliefs (or lack of them)/age etc reflect the population as a whole.

One example of sampling bias is the polling for the 2015 UK general election. The polls under-sampled Conservative voters, which is widely regarded as the main reason why they proved badly wrong. Through random chance alone, it is not unusual for a poll of around 400 people to be 4 or 5 percentage points out, but anything more than this is often due to sampling biases.
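To give a rough sense of how much a poll can be out through chance alone, here is a minimal sketch of the standard margin-of-error calculation. It assumes a simple random sample and a 95% confidence level, which real polls only approximate; the function name `margin_of_error` is purely illustrative.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a simple random sample.

    proportion=0.5 is the worst case; z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# Sampling error shrinks only slowly as the sample grows.
for n in (400, 1000, 4000):
    print(f"n = {n}: +/- {100 * margin_of_error(n):.1f} percentage points")
```

For a sample of 400 this gives a worst-case margin of just under 5 percentage points. Note that this only measures random sampling error: a biased sample can be much further out, however large it is.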

Almost any poll apart from a census will have some small measure of bias, but beware of polls or studies with high degrees of demographic or other sampling biases. The above statements on biased samples are probably very obvious, but it can be very easy to make these mistakes if you aren’t careful!

A practical way that this can happen is if you only read studies on a specific abortion topic which help support a pro-life view without checking the literature to see if the results hold in other similar studies. In such a case there is a danger that you might have a biased sample of studies, when what you really want to use is the collection of results from all the relevant studies (provided that there are no significant flaws).

False causality

False causality is one of the most common statistical fallacies that people make, and so it needs to be discussed in some detail. The first thing you need to understand is correlation. Two quantities are positively correlated if, as one increases, the other tends to increase linearly with it, and they are negatively correlated if, as one increases, the other tends to decrease linearly.1

[Image: correlation]

These data sets all illustrate the concept of correlation. This concept is also strongly related to linear regression. Image from here.

An obvious example of correlation in the abortion debate is poverty and abortion rates. It is well known (see here for just one of many examples) that there is a reasonably strong positive correlation between abortion rates and poverty. However, a very common mistake is to claim that because two quantities are correlated, one of them must cause the other! This is not always true, since in many cases both quantities may instead be determined by an underlying quantity known as a confounding variable, or perhaps by multiple confounders. It may not even be the case that any sort of causal link exists at all!2

A good example of this is the maternal death rate from abortion and the legality of abortion (both in the US). It is very commonly claimed that making abortion illegal will make it very unsafe. While it is true that reported maternal abortion deaths in the US did decrease after Roe v. Wade (1973), this does not show that legalisation caused the decrease. Why? Because if you look at the data since 1940, you can see that abortion-related deaths had already been declining for decades, most likely due to increased access to antibiotics.

When claiming that abortion causes x or is caused by y, you therefore need to make sure that you consider the possibility of false causality first.
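The confounder idea can be sketched with a small simulation. Everything here is made up for illustration: `a` and `b` are two invented quantities that each depend on a hidden third quantity plus independent noise, and neither has any effect on the other.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate(n=5000, seed=1):
    """Generate a hidden confounder and two quantities driven by it."""
    rng = random.Random(seed)
    confounder = [rng.gauss(0, 1) for _ in range(n)]
    # a and b each equal the confounder plus their own independent noise.
    a = [c + rng.gauss(0, 1) for c in confounder]
    b = [c + rng.gauss(0, 1) for c in confounder]
    return confounder, a, b

confounder, a, b = simulate()
print(f"correlation between a and b: {pearson(a, b):.2f}")

# Subtract the confounder's contribution (known exactly in this toy setup).
resid_a = [x - c for x, c in zip(a, confounder)]
resid_b = [y - c for y, c in zip(b, confounder)]
print(f"after removing the confounder: {pearson(resid_a, resid_b):.2f}")
```

The two quantities come out clearly correlated, yet the correlation all but vanishes once the confounder is accounted for. In real data the confounder's contribution is not known exactly and has to be estimated, which is a large part of why establishing causation is hard.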

Using polls with loaded data

The final fallacy to discuss is the use of polls with data deliberately designed to mislead. Hopefully nobody reading this wants to do this on purpose, although if you do, have you ever considered running for political office?

Joking aside, what we need to discuss is known as push polling. A push poll is one conducted with the purpose of asking loaded questions, typically with the intention of convincing people to vote or think in a certain way. The definition can vary slightly depending on who you ask, since some users of the term insist that push polls refer only to attempts to trick people into thinking that they are being polled, without the results actually being collected and published. One example from the US political context was a push poll used by George W. Bush against John McCain in which voters were asked the following:

“John McCain calls the campaign finance system corrupt, but as chairman of the Senate Commerce Committee, he raises money and travels on the private jets of corporations with legislative proposals before his committee. In view of this, are you much more likely to vote for him, somewhat more likely to vote for him, somewhat more likely to vote against him or much more likely to vote against him?”

A similar issue to push polling is somewhat subtler, but can still have some major implications: changing the phrasing of options available in a poll slightly can alter the results significantly. For example, consider the following three versions of an online poll on voting reform in Canada.3

a. Do you agree that Canada should update its voting method for federal elections to proportional representation?

b. Should Canada eliminate first-past-the-post elections and replace them with proportional representation?

c. Should Canada change the method it elects members of parliament from first-past-the-post to proportional representation?

The percentages of yes votes in these polls were 58.3%, 47.1% and 45.8% respectively, even though the underlying question was the same each time! So when citing polls or other data in the abortion debate, check the wording of the question and try to make sure that it’s neutral.

Summary 

Hopefully the above will have helped you to understand some common statistical errors to avoid. Here is a quick recapitulation of the most important points to take away, from the least serious to the most serious fallacies. Remember, these are not just things to avoid yourself in the pro-life debate, but fallacies you may be able to find in pro-choicers’ use of statistics.

5) Extreme cases can be very misleading if used carelessly.

4) Small samples must be treated with caution and the greater the p-value, the more sceptical you should be.

3) Biased, unrepresentative samples should be treated with caution.

2) Don’t confuse correlation with causation.

1) Polling results can be easily influenced by the wording of a question.

Next week we will look at how to use statistics in the abortion debate effectively.

If there are any questions about anything we’ve discussed or about pro-life issues generally, please leave a comment below and we’ll try to respond quickly.

Dane Rogers is a third year DPhil student in the Department of Statistics based at Merton College, currently working on Chinese Restaurants and Lévy processes.

Footnotes

1 It is necessary to specify that the relationship is linear, because there may be other ways in which various quantities can be related. For example, there might be a cubic polynomial, exponential or logarithmic relationship, among many others.

2 For examples of bizarre correlations, see here.

3 Note that online polls are usually very unreliable and influenced by sampling bias. Since these polls are only being compared against each other, this doesn’t matter for the purposes of this argument: what we are looking at is the relative difference between them.

Statistical Fallacies in the Abortion Debate: Part 1/2

This blog post is the first in a short three-part series on using statistics in the pro-life debate. This week we will look at some common statistical fallacies people make when discussing abortion and how you can avoid making them in a debate, friendly discussion, argument on the internet or some other kind of conversation. Next week we will continue to discuss fallacies, followed by a blog post explaining what to do instead.

Today we are going to be discussing an element of the pro-life debate that often gets overlooked by pro-lifers: fallacies involving statistics. Many of you may look at the image below and think that statistics are terrifying and too difficult for ordinary pro-lifers to use, but hopefully this post will convince you that it is easy to argue persuasively and accurately without needing to know anything particularly advanced.

[Image: cubic-regression]

Although cubic polynomial regression really is as bad as it sounds if statistics isn’t something you deal with a lot. Image via Wikipedia.

Here are several fallacies that you can easily avoid making in a debate without needing to study statistics (although there is no harm in doing this). We will start with the least egregious errors and finish with the worst.

Using extreme cases to make a point

One fallacy of which both pro-life and pro-choice people are often guilty is trying to argue a position on abortion based purely on extreme cases without explaining why the argument also works in general. To give a specific example, it is very common to see pro-lifers argue implicitly that we should ban all abortions because of some extreme cases, such as abortions for minor birth defects like cleft lip and palate. The problem is that while such cases are highly troubling, they are a tiny proportion of all abortions, accounting for about 157 out of 922460 abortions from 2006-2010, or roughly 0.017%.1 A much more common variation of this fallacy is to cite cases of very late-term abortions regularly. However, most abortions (89% or more) occur during the first trimester, with 52.5% happening at or before 6 weeks from conception.

[Image: 6-week-foetus-2]

At six weeks, the pre-born baby has reached around this level of development. Remember, an image speaks a thousand words. Image via PMC Canada.

A further example of this fallacy, which many of you will have encountered before, is for people to argue that abortion should be legal in general and then to fall back on the case of rape when asked to justify the general statement. How to respond to this in a graceful way needs a whole blog post of its own, and you should never be anything other than compassionate when discussing this topic. It is worth noting, though, that this can be a fallacious pro-choice argument if it isn’t suitably qualified, given that abortions due to rape account for around 0.3% of all abortions in the US.2

That said, these arguments do not always amount to fallacies if you are careful when using them. In the third post of this series we will explain how to use these sorts of extreme cases correctly and honestly without misleading people.

Using small samples

Another common mistake to watch out for is the use of overly small samples underlying abortion statistics. This might not seem like an immediate issue, but it can lead to problems where seemingly strong results turn out not to be as significant as they first appear. To explain why this is a problem, we need to briefly discuss a pair of concepts called the null hypothesis and the p-value.

A null hypothesis is a default position that you wish to test against some evidence, typically that there is no underlying effect. If your data are strong enough, you reject it in favour of an alternative hypothesis. This idea underlies much of modern scientific practice. The null hypothesis is not something that you can prove per se: you can only find that the evidence is, or is not, strong enough to reject it.

The p-value of a result is the probability of seeing data at least as extreme as what was actually observed, assuming that the null hypothesis (that there is no underlying effect) is true. Loosely speaking, a small p-value means that the data would be surprising if only chance were at work. Typically, a result is not considered statistically significant unless p < 5%, with results such as p < 1% or p < 0.5% being considered much stronger.
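As a rough illustration of why sample size matters here, the sketch below computes an exact one-sided p-value for two hypothetical surveys in which 60% of respondents answer yes, tested against a null hypothesis of a 50/50 split. The survey numbers are invented for the example, and `binomial_p_value` is just an illustrative helper name.

```python
from math import comb

def binomial_p_value(successes, trials, null_prob=0.5):
    """One-sided p-value: P(X >= successes) for X ~ Binomial(trials, null_prob)."""
    return sum(
        comb(trials, k) * null_prob**k * (1 - null_prob) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Same observed proportion (60% yes), very different sample sizes.
small = binomial_p_value(12, 20)
large = binomial_p_value(120, 200)
print(f"12 yes out of 20:   p = {small:.3f}")
print(f"120 yes out of 200: p = {large:.4f}")
```

With 20 respondents the p-value is about 0.25, so a 60% result is quite compatible with an even split. With 200 respondents the same 60% gives a p-value well below 1%.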

One common mistake is to assume that if a result has p > 5% then it is nonsense, and that if p < 5% then it’s really good evidence. This is another mistake that you can easily make if you are careless. Rather, think of p as a measure of how sceptical you should be of a result: the smaller p is, the stronger the evidence against the null hypothesis. For a fuller discussion of abuses of p-values, see here.

How does this connect to sample sizes? The larger your sample, the less extreme your data needs to be relative to your null hypothesis in order to get a result that might be considered significant. Conversely, a small sample needs a much more extreme result before it counts for anything. Furthermore, if you run a lot of studies, there is a good chance that at least one of them will show a significant result by chance alone. Citing a single study by itself is something of which one has to be wary, particularly when the sample size is small. Always give priority to literature reviews and meta-analyses.
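The point about running a lot of studies can be made concrete with a short calculation. This is a sketch that assumes the studies are independent, that there is no real effect, and that each uses the usual 5% threshold; the function name is illustrative.

```python
def chance_of_false_positive(num_studies, alpha=0.05):
    """Probability that at least one of num_studies independent studies
    comes out significant at level alpha when there is no real effect."""
    return 1 - (1 - alpha) ** num_studies

for k in (1, 5, 14, 20):
    print(f"{k} studies: {100 * chance_of_false_positive(k):.0f}% chance of at least one 'significant' result")
```

Under these assumptions, 14 studies are already enough to make a spurious significant result more likely than not, which is why a single favourable study should not be cited in isolation.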

One example which may invite controversy from the pro-life side is the abortion-breast cancer link (which is discussed at length here). If the studies with large samples suggest there is not a link whereas those with small samples do, that is going to make many people highly sceptical of the existence of such a link, including pro-lifers! Therefore, it is best not to use this argument unless you have convincing data from large studies.

Next week we will continue discussing statistical fallacies in the abortion debate, talking about biased samples, false causality and push polls.

If there are any questions about anything we have discussed or about pro-life issues generally, please leave a comment below and we will try to respond quickly.

Dane Rogers is a third year DPhil student in the Department of Statistics based at Merton College, currently working on Chinese Restaurants and Lévy processes.

Footnotes

1 It is worth noting that official statistics suggest that the number of abortions due to cleft lip and palate from 2006-2010 was actually 14, but that only reinforces the point made if true.

2 Note that there are issues with the quality and accuracy of the data, so there is quite a bit of uncertainty around the true value here.