Is the National Front – a far right party – benefiting from low voter turnout? That question made headlines after the 2014 French local elections. The answer is not an easy one, as standard tools (correlation, regression) give biased answers. Once endogeneity is correctly taken into account, the positive association between low voter turnout and the score of the FN is much stronger than a simple correlation analysis would suggest.
After the 2014 French local elections, numerous articles on the link between low voter turnout and score of the National Front have been published. You can find here and here (both in French) two typical examples. All the analyses made at the time are based on correlations. Is the score of the National Front higher where voter turnout is lower? What is more efficient, and simpler, than a correlation to answer that question?
All the more so as geo-visualisation tools make the statistician’s life easy. By comparing data at a very granular level, they give a reassuring impression of analytical depth.
These calculations, and the conclusions drawn from them, are however wrong. The correlation between the score of the National Front and voter turnout is a biased calculation. How could a simple correlation be biased? Because of a crucial statistical issue: endogeneity.
Those interested may read our article on endogeneity: the issue is explained in detail and a few examples are given. In this article, we estimate a model linking the score of the far right and voter turnout, while taking into account the endogeneity in the data.
Another weakness of the analyses mentioned above is that they don’t take into account the heterogeneity across areas. Our model takes into account a dozen characteristics of these areas. This allows us to uncover, beyond electoral results, the impact of each variable, other things being equal.
Readers who are not interested in the details of the modeling and the data may jump directly to section 5 for the detailed results of the estimation. The results presented in this version differ slightly from an earlier version, because we added to the model the variable measuring the intensity of religious practice. Most of the time, the difference between the estimated coefficients across the two versions is immaterial.
The model
At this stage, we did not build a structural model to evaluate the link between voter turnout and score of the National Front. The reduced model we estimate links the score of the National Front and assimilated parties, as a percentage of valid votes, with:
– The percentage of voters who did not vote, as a percentage of registered voters,
– And various characteristics of the concerned areas.
The model is linear:
Score of the FN = a + b*Percentage of non-voters + c*characteristic 1 + d*characteristic 2 + … + u
We tested 4 different specifications, with logarithmic transformations for the explained variable and for the percentage of non-voters: untransformed score of the FN vs untransformed percentage of non-voters, logarithm of the FN’s score vs untransformed percentage of non-voters…
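The four specifications can be enumerated mechanically. The sketch below is only an illustration: the names `TRANSFORMS`, `specifications` and `apply_spec`, and the sample observation, are assumptions made for the example, not the code used for the estimation.

```python
import math
from itertools import product

# Each variable is either left untransformed or replaced by its logarithm.
TRANSFORMS = {"level": lambda v: v, "log": math.log}

def specifications():
    """Return the 4 (score transform, non-voter transform) pairs."""
    return list(product(TRANSFORMS, repeat=2))

def apply_spec(spec, fn_score, non_voters):
    """Transform one observation according to a specification."""
    score_tf, non_voters_tf = spec
    return TRANSFORMS[score_tf](fn_score), TRANSFORMS[non_voters_tf](non_voters)

specs = specifications()
# Hypothetical observation: FN score of 15% and 40% non-voters.
example = apply_spec(("log", "level"), 15.0, 40.0)
```

Each pair names the transformation applied to the explained variable first and to the percentage of non-voters second.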
We also segmented the model across the size of towns: voter turnout is very different according to the size of the town (from 30% non-voters in the first decile to 43% in the last one) and the model is significantly different according to the size of the town.
Data on votes
We use the data at town level from the 2012 elections to Parliament. Only the data of the first round are used. These data can be freely downloaded from www.data.gouv.fr.
One observation in the model is defined by crossing a town with a constituency. A town spread across several constituencies therefore appears once for each of these constituencies.
We put together the votes for the National Front and those for candidates labelled as far right in the data file. The share of the FN is overwhelming : 3 528 663 votes for that party, 49 499 for other candidates.
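To make "overwhelming" concrete, the share of the FN in the combined far-right vote can be computed from the two figures above. This is plain arithmetic on the numbers quoted in the text; the variable names are illustrative.

```python
fn_votes = 3_528_663   # votes for the National Front (figure from the text)
other_votes = 49_499   # votes for other far-right candidates (figure from the text)

total = fn_votes + other_votes
fn_share = fn_votes / total

print(f"Total far-right votes: {total}")
print(f"FN share of that total: {fn_share:.1%}")
```

The FN accounts for well over 98% of the combined total.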
The estimation is done for all départements of metropolitan France, and for Martinique, Guadeloupe and la Réunion. For other constituencies, (Guyane, Mayotte, Nouvelle Calédonie, Polynésie, Saint Martin/Saint Barthélémy, Saint Pierre et Miquelon, Wallis et Futuna, French nationals living abroad), we don’t have the necessary areas’ characteristics.
Data on towns
Incomes
We used the exhaustive RFLM database of the Treasury, which has been available since 2000. On the INSEE website, one can also find incomes at the local level for the years 1998 to 2010 (Source: DGFiP, fichier « Impôt sur le revenu des personnes physiques »). But these data are less homogeneous (breaks in 1999 and 2006) and are not calculated by consuming unit.
Data are available at the town level for metropolitan France, Guadeloupe, Martinique and La Réunion (except for the RFLM data), including at the arrondissement level for Paris, Lyon and Marseille. As these arrondissements don’t match constituencies exactly, we had to spread the data across the constituencies concerned:
– Data on the number of households or of consuming units are spread according to the number of constituencies to which the arrondissement belongs (thus, divided by 2 if the arrondissement belongs to 2 constituencies, which is the majority of cases).
– Medians or means were allocated to each constituency concerned and we did the average of all indicators relevant to one constituency.
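The two allocation rules above can be sketched as follows. This is a minimal illustration under assumptions: the function names and the sample figures are hypothetical, and the average of the indicators is taken to be a simple unweighted mean, which the text does not specify.

```python
from statistics import mean

def spread_count(count, n_constituencies):
    """Split a count (households, consuming units) equally across constituencies."""
    return count / n_constituencies

def constituency_indicator(values):
    """Average the medians or means allocated to one constituency (unweighted, by assumption)."""
    return mean(values)

# Hypothetical arrondissement with 12 000 households, straddling 2 constituencies.
households_per_constituency = spread_count(12_000, 2)

# Hypothetical constituency covering two arrondissements with these median incomes.
median_income = constituency_indicator([21_000.0, 25_000.0])
```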
Nominal data were calculated using the evolution of the CPI.
Finally, there are some missing data in the file, either for all the years between 1993 and 1999, or after 2012, or partially when the rules of statistical secrecy don’t allow the data to be published:
– For the years that are fully missing, the data were estimated by applying to the available local data the evolution of regional GDP: GDP in value for the global data on income, and GDP per head for the data by consuming unit.
– For the partially missing data, by using:
o Either the ratios (number of registered voters divided by number of households or consuming units, household income divided by number of households or consuming units) of the same local area for the closest available year,
o Or, if the local area ratios were still missing, the ratios of the canton,
o Or those of the département.
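The fallback order described above (local area, then canton, then département) can be sketched as follows. The function names and the numbers are hypothetical; the ratio is taken to be registered voters per household, as defined in the list.

```python
def first_available(*ratios):
    """Return the first ratio that is not missing (None), or None if all are missing."""
    for ratio in ratios:
        if ratio is not None:
            return ratio
    return None

def estimate_households(registered_voters, local_ratio, canton_ratio, departement_ratio):
    """Estimate households from registered voters and a voters-per-household ratio."""
    ratio = first_available(local_ratio, canton_ratio, departement_ratio)
    if ratio is None:
        return None
    return registered_voters / ratio

# Hypothetical town: local ratio missing, canton ratio of 1.5 voters per household.
estimate = estimate_households(750, None, 1.5, 1.7)
```

Here the canton ratio is used because the local one is missing, and the département ratio is never consulted.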
Unemployment
The number of unemployed registered with the official job agency (DEFM) is available at town level, from 2001 to 2011, at its December value, for metropolitan France, Guadeloupe, Martinique and La Réunion.
As for the incomes, missing values were estimated in two different ways:
– Either, for partially missing data, using the ratio of DEFM to registered voters at the level of the canton, the zone d’emploi or the département,
– Or, for the fully missing years, by applying the departmental evolution of ILO unemployment to the available local data. We are aware there is a discrepancy between the DEFM and ILO definitions of unemployment but, unless we are mistaken, this seemed to be the best solution with the available data.
Indicators were calculated at the level of the town, the canton and the zone d’emploi, in order to compare their performance in the estimation.
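Applying the evolution of an aggregate series to a local value amounts to a simple rescaling. The sketch below illustrates the idea under assumptions: the function name and all figures are hypothetical, and the same logic would apply to incomes scaled by regional GDP.

```python
def apply_evolution(local_value_ref, aggregate_ref, aggregate_target):
    """Scale a local value by the evolution of an aggregate series.

    local_value_ref:  local value observed in the reference year
    aggregate_ref:    departmental (or regional) value in the reference year
    aggregate_target: departmental (or regional) value in the missing year
    """
    return local_value_ref * aggregate_target / aggregate_ref

# Hypothetical town: 120 registered job seekers in the reference year,
# departmental ILO unemployment rate of 8.0% then and 10.0% in the missing year.
estimated_job_seekers = apply_evolution(120, 8.0, 10.0)
```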
Census data
The other data we use come from the census. The latest detailed data files at local level are from 2011. Again, they can be freely downloaded from the INSEE website. The following variables are available:
– Population structure by age, sex, socio-professional category and education,
– Housing structure: houses, flats, secondary residences, empty housing, with their construction year,
– Percentage of employed and self employed, working full time or not,
– Percentage of immigrants,
– Population structure by distance home-work and means of transportation,
– Population structure by type of household (2 adults working, 1 adult working, only one adult in the household, …).
The data for the Paris, Lyon and Marseille arrondissements were allocated to the constituencies of the respective arrondissements, and we calculated the average of the percentages.
Town segmentation and logarithmic transformation of the explained variable and percentage of non-voters.
In a first segmentation, we split the towns into 5 groups, with the following thresholds: 700, 2100, 7000 and 21000 registered voters. The other tested segmentations were calculated starting from these thresholds, by subtracting 10, 20, …, 500 from the first, 30, 60, …, 1500 from the second, and so on. This gives 50 different segmentations.
The model is estimated for these 50 segmentations and the 4 possible patterns of logarithmic transformation for the score for the National Front and the percentage of non-voters. We retained the model minimising the sum of the squared residuals on all observations.
The final model is based on :
– The non-transformed score of the National Front and the non-transformed percentage of non-voters,
– A town segmentation with the following thresholds in terms of registered voters: 370, 1100, 3700, 11100.
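Assigning a town to one of the five size groups is a simple threshold lookup. The sketch below uses the final thresholds quoted above; the function name is illustrative, and the treatment of a town sitting exactly on a threshold (placed in the upper group here) is an assumption, since the text does not say.

```python
from bisect import bisect_right

# Thresholds of the final model, in registered voters (from the text).
THRESHOLDS = [370, 1100, 3700, 11100]

def town_segment(registered_voters):
    """Return the size group of a town, from 0 (smallest) to 4 (largest)."""
    return bisect_right(THRESHOLDS, registered_voters)

segments = [town_segment(n) for n in (200, 370, 1500, 5000, 50000)]
```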
The other variables in the model
The variables entering the model, except the percentage of non-voters, are the following. All are calculated as percentages on the number of relevant units inside the town:
– Sex and age distribution (less than 24, between 25 and 54, more than 55)
– Socio-professional category of individuals
o Self employed, entrepreneur, farmer
o Manager in public service company,
o Manager in non public service company,
o Employee in public service company,
o Employee in non-public service company,
o Worker,
o Retired.
– Sex, age, diploma distribution (less than 24, between 24 and 64, more than 65 – no diploma, diploma before baccalauréat, baccalauréat or above),
– Type of household :
o Man or woman living alone,
o Family with children and one adult,
o Family where both adults are working,
o Family where only the man is working,
o Family where only the woman is working, or where neither of the two adults is working,
o Several individuals, no family.
– Immigrants proportion,
– Type of housing :
o House,
o Flat,
o Secondary residence,
o Unoccupied housing.
– Median income by consuming unit,
– Unemployment rate of the employment zone (zone d’emploi),
– Distribution of home-work distance and means of transportation (work and live in the same town, live in a rural town and work in another one/live in an urban town and work in another town of the same urban unit, live in an urban town and work outside the urban unit of residence – public transport, others),
– Distribution of professional status and full/partial time employment:
o Full time employed with indefinite length contract,
o Full time employed, but with some other contract,
o Partial time employed,
o Full time self employed,
o Partial time self employed.
– Dummy variable for the towns in “sous-préfectures” belonging to the last quartile of the distribution of the density of male members of the clergy. This is to test whether an intense Catholic religious practice acts as a bulwark against the National Front, as some qualitative sociological studies tend to show.
We also put the département of residence in the model, coded as 0/1 variables.
One variable was tested and finally not used: the complaints recorded by the police and gendarmerie. Those data are only available at the level of the département and their small variance leads to unstable models. Those interested can read our article “Prévision de l’abstention”, where voter turnout is modelled: we use this variable there, but it does not enter significantly in the model.
The model was estimated both with Ordinary Least Squares (OLS) and Two-Stage Least Squares (2SLS). Only the 2SLS estimators are valid, as the endogeneity bias is then eliminated. We still report the two sets of estimates, so that the reader can assess the diagnostic error a wrong estimation strategy would lead to.
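The difference between the two estimators can be illustrated on simulated data. The sketch below is not the model of this article: it uses one explanatory variable, one instrument and an invented data-generating process (all names and numbers are assumptions), in which an unobserved factor raises x and lowers y. With a single instrument, the 2SLS slope reduces to cov(z, y) / cov(z, x).

```python
import random

def cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

random.seed(0)
n = 5000
TRUE_SLOPE = 0.5  # effect of x on y in this simulated world

z = [random.gauss(0, 1) for _ in range(n)]  # instrument
e = [random.gauss(0, 1) for _ in range(n)]  # unobserved factor
# x is endogenous: it depends on the unobserved factor e.
x = [zi + ei + random.gauss(0, 1) for zi, ei in zip(z, e)]
# e raises x but lowers y, so OLS under-estimates the slope.
y = [TRUE_SLOPE * xi - ei + random.gauss(0, 1) for xi, ei in zip(x, e)]

ols_slope = cov(x, y) / cov(x, x)
iv_slope = cov(z, y) / cov(z, x)  # 2SLS with a single instrument

print(f"OLS: {ols_slope:.3f}  2SLS: {iv_slope:.3f}  true: {TRUE_SLOPE}")
```

In this toy setting the OLS slope falls well short of the true value while the instrumental estimate stays close to it, which is the same direction of bias as the one reported for the percentage of non-voters.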
The 5 models by town size were first estimated separately. We then tested all possible equalities between the coefficients corresponding to the same variable across the 5 models. Only the results for the final model are shown below.
In total, we have 36 769 observations and 238 variables. The OLS R² is 0,47.
Two statistical tests were conducted to test our assumptions on endogeneity:
– The Sargan test, or test of over identifying restrictions. This allows us to test whether the two instrumental variables we are using would give homogeneous results if we were using them separately. It is implemented by simultaneously testing whether all coefficients in the regression of the Hausman augmented regression’s residuals on the endogenous variables (and an intercept) are equal to 0. The null hypothesis cannot be rejected by a large margin.
– The Hausman test, or exogeneity test. This allows us to test whether the non-voter percentage is indeed endogenous in our model. It is implemented by testing whether the OLS and 2SLS estimated coefficients are significantly different. Another way to implement it is to use an augmented regression. The model is estimated with OLS, with two more variables: the residuals from the regression of the endogenous variables on the instruments (we have two endogenous variables, as the coefficient for the non-voters percentage is not the same in the small towns as in the others). The test consists in checking whether the coefficients of these residuals in the regression are significantly different from 0, which is the case. The percentage of non-voters is thus indeed endogenous.
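The augmented-regression version of the Hausman test can be sketched on simulated data. This is a simplified illustration, not the estimation of this article: it assumes one endogenous variable and one instrument (the article has two endogenous variables), and the data-generating process and all names are invented for the example.

```python
import math
import random

def demean(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

random.seed(1)
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]  # instrument
e = [random.gauss(0, 1) for _ in range(n)]  # unobserved factor
x = [zi + ei + random.gauss(0, 1) for zi, ei in zip(z, e)]  # endogenous regressor
y = [0.5 * xi - ei + random.gauss(0, 1) for xi, ei in zip(x, e)]

# Demeaning every series is equivalent to including an intercept.
zc, xc, yc = demean(z), demean(x), demean(y)

# First stage: regress x on z and keep the residuals.
pi_hat = dot(zc, xc) / dot(zc, zc)
r = [xi - pi_hat * zi for xi, zi in zip(xc, zc)]

# Augmented regression: y on x and the first-stage residuals,
# solved through the 2x2 normal equations.
sxx, srr, sxr = dot(xc, xc), dot(r, r), dot(xc, r)
sxy, sry = dot(xc, yc), dot(r, yc)
det = sxx * srr - sxr ** 2
beta = (srr * sxy - sxr * sry) / det   # coefficient of x
delta = (sxx * sry - sxr * sxy) / det  # coefficient of the residuals

# t-statistic of delta: a large absolute value signals endogeneity.
resid = [yi - beta * xi - delta * ri for yi, xi, ri in zip(yc, xc, r)]
sigma2 = dot(resid, resid) / (n - 3)
t_delta = delta / math.sqrt(sigma2 * sxx / det)
```

In this simulation the residual term is strongly significant, so exogeneity of x would be rejected, and the coefficient of x in the augmented regression coincides with the instrumental-variable estimate.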
The tables below display the estimated coefficients, with the standard error in brackets. Coefficients that are significantly different from 0 are highlighted in bold for 2SLS.
Percentage of non voters
| | Without taking into account endogeneity | Taking into account endogeneity |
| Percentage of non voters – towns with less than 370 registered voters | 0,074 (0,005) | 0,422 (0,023) |
| Percentage of non voters – towns with more than 370 registered voters | 0,194 (0,009) | 0,514 (0,021) |
The percentage of non-voters is one of the most significant variables in the model. The positive association between low voter turnout and the score of the FN is quite clear: 1% more non-voters translates into about 0,5% more votes for the FN.
As can be seen, forgetting to take into account endogeneity would lead the analyst to largely under-estimate the positive effect of low turnout on the score of the National Front.
Unemployment rate
| | Without taking into account endogeneity | Taking into account endogeneity |
| Unemployment rate | 0,981 (0,041) | 0,926 (0,043) |
The unemployment rate is also a very significant variable, at the same level as the percentage of non-voters. The impact is the same whatever the size of the town: 1% more unemployment leads to 0,93% more votes for the FN.
Median income by consuming unit
| | Without taking into account endogeneity | Taking into account endogeneity |
| Median income by consuming unit – towns with less than 370 registered voters | 0,172 (0,065) | -0,589 (0,086) |
| Median income by consuming unit – towns between 370 and 1100 registered voters | -1,421 (0,166) | -1,878 (0,210) |
| Median income by consuming unit – towns between 1100 and 3700 registered voters | -1,122 (0,213) | -1,788 (0,253) |
| Median income by consuming unit – towns with more than 3700 registered voters | -1,170 (0,192) | -1,918 (0,235) |
The median income by consuming unit is also very significant, even if a bit less so than the two previous variables. When the median income increases, the score of the FN decreases. The effect is larger in bigger towns. In the towns with more than 3700 registered voters, a 1000 euro increase in median income decreases the score of the FN by 0,19%.
Here again, not taking into account endogeneity would lead the analyst to a serious diagnostic error: one could conclude that, in smaller towns, median income and score of the FN are positively associated.
Religious practice
There are no exhaustive data on religious practice at the town level. We had to use an indirect way to assess its possible impact on the National Front score.
The number of members of clergy living in a town is available from the census data. This is however not the right indicator. Indeed, there are twice as many towns as there are priests. Also, the intensity of religious practice is a cultural characteristic that spreads well beyond the limits of a town.
After several trials on possible aggregation levels, we calculated the percentage of members of the clergy at the level of the sous-préfecture (or arrondissement), differentiating men and women. Then, we created dummy variables corresponding to the quartiles of these percentages distributions. The only significant coefficient corresponds to the upper quartile of the distribution of priests (thus, men).
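Building a dummy variable for the upper quartile of a distribution can be sketched as follows. The function name and the sample shares are hypothetical, and the exact quartile convention used for the article is not stated; `statistics.quantiles` with its default method is only one possible choice.

```python
from statistics import quantiles

def upper_quartile_dummy(shares):
    """Return 1 for areas whose share lies above the third quartile, else 0."""
    q3 = quantiles(shares, n=4)[2]
    return [1 if s > q3 else 0 for s in shares]

# Hypothetical shares of male clergy (per 10 000 inhabitants) in 8 sous-préfectures.
clergy_share = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
dummy = upper_quartile_dummy(clergy_share)
```

Every town then inherits the dummy of the sous-préfecture it belongs to.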
| | Without taking into account endogeneity | Taking into account endogeneity |
| Towns belonging to the 25% of sous-préfectures where the number of priests is highest | -0,011 (0,001) | -0,010 (0,001) |
In the towns concerned, the National Front gets 1% fewer votes, other things being equal. The fact that this is only true for the percentage of male members of the clergy tends to indicate that the effect is a specifically Catholic one.
Proportion of immigrants
| | Without taking into account endogeneity | Taking into account endogeneity |
| Proportion of immigrants | -0,050 (0,013) | -0,089 (0,014) |
The proportion of immigrants in the town has a significant negative impact on the FN score. 1% more immigrants decreases the score of the FN by 0,09%.
This result might seem strange to some readers. It is nonetheless very robust: we tested several specifications, and the result is very stable. Let us recall that our model controls for numerous other variables, including the département. All other things being equal, being more in contact with immigrants decreases the temptation to vote for the National Front.
Here again, not taking into account endogeneity would lead us to under-estimate the impact of that variable.
Type of housing
| | Without taking into account endogeneity | Taking into account endogeneity |
| Flat | -0,054 (0,006) | -0,059 (0,006) |
| House built before 1946 | 0 | 0 |
| House built between 1947 and 1990 | 0,000 (0,004) | -0,003 (0,004) |
| House built after 1991 | 0,033 (0,005) | 0,039 (0,005) |
| Secondary residences | -0,017 (0,004) | -0,024 (0,004) |
| Empty housing | 0,018 (0,008) | 0,010 (0,009) |
| Occasional housing | -0,052 (0,030) | -0,047 (0,032) |
For all variables describing the town population structure, the model requires choosing a reference category. For housing type, the reference category is the percentage of households living in a house built before 1946.
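A reference category is needed because the shares of a breakdown sum to 100%: once all but one are known, the last one is determined, so it cannot be included alongside an intercept. The sketch below illustrates this with hypothetical shares and illustrative category names.

```python
# Hypothetical housing shares (in %) for one town; they sum to 100 by construction.
shares = {
    "house_before_1946": 30.0,
    "house_1947_1990": 25.0,
    "house_after_1991": 15.0,
    "flat": 20.0,
    "secondary_residence": 6.0,
    "empty": 3.0,
    "occasional": 1.0,
}
REFERENCE = "house_before_1946"

# The reference share is fully determined by the others, so it carries no
# independent information and is left out of the regression.
implied_reference = 100.0 - sum(v for k, v in shares.items() if k != REFERENCE)

regressors = {k: v for k, v in shares.items() if k != REFERENCE}
```

Each remaining coefficient is then read relative to the reference: the coefficient of a category measures the effect of moving one percentage point of households from the reference category to that category.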
The most significant effect comes from the percentage of households living in a flat. If there is a 1% increase in that percentage, the FN score decreases by 0,06%. Let us recall again that we are reasoning other things being equal here, and that the model controls for income: other things being equal, living closer to one’s neighbours decreases the FN vote.
The percentage of recently built houses has a slight positive effect on the FN vote. It is the reverse for secondary residences.
Diploma
| | Without taking into account endogeneity | Taking into account endogeneity |
| Men | | |
| Less than 24 – no diploma | -0,050 (0,062) | -0,122 (0,066) |
| Less than 24 – diploma before baccalauréat | 0,155 (0,040) | 0,050 (0,043) |
| Less than 24 – diploma after baccalauréat | -0,175 (0,037) | -0,136 (0,040) |
| 25 to 64 – no diploma | 0,031 (0,017) | 0,006 (0,018) |
| 25 to 64 – diploma before baccalauréat | 0 | 0 |
| 25 to 64 – diploma after baccalauréat | -0,155 (0,012) | -0,113 (0,013) |
| More than 65 – no diploma – towns with less than 370 registered voters | -0,050 (0,024) | -0,014 (0,025) |
| More than 65 – no diploma – towns with more than 370 registered voters | -0,204 (0,049) | -0,164 (0,052) |
| More than 65 – diploma before baccalauréat | -0,047 (0,019) | -0,001 (0,021) |
| More than 65 – diploma after baccalauréat | -0,052 (0,027) | 0,021 (0,029) |
| Women | | |
| Less than 24 – no diploma – towns with more than 370 registered voters | -0,077 (0,080) | -0,152 (0,085) |
| Less than 24 – diploma before baccalauréat | 0,074 (0,051) | 0,009 (0,054) |
| Less than 24 – diploma after baccalauréat | -0,065 (0,040) | -0,054 (0,043) |
| 25 to 64 – no diploma – towns with less than 370 registered voters | 0,030 (0,045) | -0,017 (0,048) |
| 25 to 64 – no diploma – towns between 370 and 1100 registered voters | -0,071 (0,054) | -0,184 (0,059) |
| 25 to 64 – no diploma – towns with more than 1100 registered voters | -0,088 (0,058) | -0,286 (0,065) |
| 25 to 64 – diploma before baccalauréat | -0,012 (0,043) | -0,035 (0,046) |
| 25 to 64 – diploma after baccalauréat | -0,170 (0,042) | -0,143 (0,045) |
| More than 65 – no diploma | -0,119 (0,044) | -0,143 (0,047) |
| More than 65 – diploma before baccalauréat – towns with less than 370 registered voters | -0,067 (0,044) | -0,069 (0,047) |
| More than 65 – diploma before baccalauréat – towns between 370 and 1100 registered voters | -0,159 (0,052) | -0,185 (0,055) |
| More than 65 – diploma before baccalauréat – towns with more than 1100 registered voters | -0,103 (0,056) | -0,113 (0,059) |
| More than 65 – diploma after baccalauréat – towns with less than 370 registered voters | -0,063 (0,048) | -0,039 (0,051) |
| More than 65 – diploma after baccalauréat – towns between 370 and 1100 registered voters | -0,149 (0,062) | -0,185 (0,066) |
| More than 65 – diploma after baccalauréat – towns with more than 1100 registered voters | -0,149 (0,070) | -0,181 (0,075) |
The reference category is the percentage of men aged 25 to 64 with a diploma before baccalauréat. All significant coefficients are negative. Thus, over-representation of the categories highlighted in bold decreases the score of the National Front.
Type of household
| | Without taking into account endogeneity | Taking into account endogeneity |
| Family with two active adults | 0,013 (0,006) | 0,028 (0,006) |
| Family where only the man is active | 0,044 (0,007) | 0,053 (0,008) |
| Family with two inactive adults – towns with less than 370 registered voters | 0,007 (0,006) | 0,021 (0,006) |
| Family with two inactive adults – towns with more than 370 registered voters | 0,039 (0,011) | 0,060 (0,011) |
| One adult with children – town with less than 11100 registered voters | -0,006 (0,007) | -0,001 (0,008) |
| One adult with children – town with more than 11100 registered voters | -0,231 (0,075) | -0,249 (0,079) |
| Individual living alone | 0 | 0 |
| Other cases | 0,004 (0,010) | 0,008 (0,010) |
The reference category is the percentage of people living alone.
The percentage of households where the man is working and the woman isn’t has a significant positive effect on the FN vote: an increase of 1% leads to a 0,05% increase of the FN vote. Same thing for the families with two inactive adults.
In the largest towns, an increased percentage of families with one adult (and children) decreases the score of the National Front: about -0,25% for a 1% increase in that proportion.
Socio-professional category
| | Without taking into account endogeneity | Taking into account endogeneity |
| Self employed, entrepreneur, farmer | -0,016 (0,006) | -0,006 (0,007) |
| Manager in public service company | -0,043 (0,010) | -0,041 (0,011) |
| Manager in non public service company | -0,007 (0,009) | -0,008 (0,009) |
| Employee in public service company – towns with less than 11100 registered voters | -0,007 (0,006) | -0,001 (0,006) |
| Employee in public service company – towns with more than 11100 registered voters | 0,207 (0,099) | 0,215 (0,105) |
| Employee in non-public service company | 0,016 (0,006) | 0,010 (0,006) |
| Worker | 0,022 (0,005) | 0,014 (0,005) |
| Retired | 0 | 0 |
The retired category is the reference one.
An increase in the number of public service managers decreases the score of the FN. An increase in the number of workers increases it, as would an increase in the number of public sector employees in large towns.
Professional status and partial/full time work
| | Without taking into account endogeneity | Taking into account endogeneity |
| Full time indefinite contract | 0 | 0 |
| Other full time employed – towns with less than 11100 registered voters | 0,003 (0,005) | -0,004 (0,005) |
| Other full time employed – towns with more than 11100 registered voters | 0,063 (0,090) | 0,041 (0,096) |
| Partial time employed – towns with less than 370 registered voters | -0,008 (0,004) | -0,005 (0,004) |
| Partial time employed – towns with more than 370 registered voters | -0,060 (0,010) | -0,063 (0,011) |
| Full time self employed | 0,007 (0,005) | 0,003 (0,005) |
| Partial time self employed | -0,020 (0,010) | -0,030 (0,010) |
The reference category is the population of people with indefinite full time contracts.
An increase in the number of people not working full time decreases the FN score.
Work-home distance and means of transportation
| | Without taking into account endogeneity | Taking into account endogeneity |
| Live and work in the same town – public transport | -0,032 (0,024) | -0,054 (0,026) |
| Live and work in the same town – other transport – town with less than 370 registered voters | -0,015 (0,003) | -0,014 (0,003) |
| Live and work in the same town – other transport – town with more than 370 registered voters | -0,042 (0,004) | -0,039 (0,005) |
| Different work and home town, but same urban unit – public transport – towns with less than 370 registered voters | -0,004 (0,010) | 0,002 (0,010) |
| Different work and home town, but same urban unit – public transport – towns between 370 and 1100 registered voters | 0,085 (0,019) | 0,081 (0,020) |
| Different work and home town, but same urban unit – public transport – towns between 1100 and 3700 registered voters | -0,014 (0,026) | -0,017 (0,027) |
| Different work and home town, but same urban unit – public transport – towns with more than 3700 registered voters | -0,129 (0,016) | -0,127 (0,017) |
| Different work and home town, but same urban unit – other transport | 0 | 0 |
| Work outside of the residence urban unit – public transport | -0,018 (0,018) | -0,030 (0,019) |
| Work outside of the residence urban unit – other transport | 0,003 (0,002) | 0,001 (0,002) |
The reference category is the population of people who don’t work in the town where they live, but work in the same urban unit and don’t use public transport.
The impact is unambiguous: when more people work near their home, the score of the FN decreases. When the work location is not in the same town as the residence, but not too far away, the impact on the FN score depends on the means of transportation between work and home. In larger towns, using more public transport will decrease the score of the National Front. It is the reverse in smaller ones.
Départements
We only show here the 5 largest coefficients (FN leaning départements).
| | Without taking into account endogeneity | Taking into account endogeneity |
| Pyrénées Orientales – towns with less than 11100 registered voters | 0,056 (0,031) | 0,058 (0,033) |
| Gard – towns between 1100 and 3700 registered voters | 0,071 (0,007) | 0,082 (0,007) |
| Vaucluse – all towns except between 3700 and 11100 registered voters | 0,070 (0,006) | 0,082 (0,006) |
| Gard – towns with more than 3700 registered voters | 0,071 (0,011) | 0,085 (0,012) |
| Vaucluse – towns between 3700 and 11100 registered voters | 0,103 (0,009) | 0,116 (0,009) |
The coefficients can be read directly as vote percentages for the FN. In Vaucluse, the score of the FN is around 10% above what it should be, taking into account the other characteristics of towns in that département.
A model such as the one presented in this article always has some weaknesses. Here are a few. Which one seems the most important?
Regression
The reader is invited to refer to the endogeneity article.