Quality of life and voting behaviour

In light of the first round of the presidential election in France (23 April 2017), we analyzed the correspondence between the quality of life characterisation of French departments and the election results. The relationship between quality of life and voting behaviour (votes for Macron and votes for Le Pen, respectively) is most pronounced in the 10 top-ranked departments and the 10 departments ranked lowest on the quality of life index.

Download and read the full document : quality_of_life_and_voting_behaviour


Mapping the results of the French presidential election

In this post we document the workflow (using R) to produce a choropleth map and an isarithmic map representing the results of the presidential election in France (2nd round, 7 May 2017).

  1. Choropleth map of the results of the presidential election

First, we download a shapefile of the municipalities in France (data source : https://www.data.gouv.fr/fr/datasets/decoupage-administratif-communal-francais-issu-d-openstreetmap, January 2017). We use this shapefile to map the results of the second round of the presidential election (data source : https://public.opendatasoft.com/explore/dataset/election-presidentielle-2017-tour-2).

library(rgdal)  # provides readOGR()
communes <- readOGR(".", "communes-20170112")
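
Once the shapefile is loaded, the core step of a choropleth map is to bin the mapped variable into classes and give each class a colour. The following is a minimal sketch in base R, not the workflow of the full post: the vector `share_macron`, the class breaks and the colours are all hypothetical values chosen for illustration.

```r
# Hypothetical vote shares (%) for five municipalities; illustrative values only.
share_macron <- c(48.2, 61.5, 72.9, 55.0, 83.4)

# Class breaks and one colour per class (both are assumptions for this sketch).
breaks     <- c(0, 50, 60, 70, 80, 100)
class_cols <- c("#f7fbff", "#c6dbef", "#6baed6", "#2171b5", "#08306b")

# cut() assigns each value to a class; include.lowest keeps 0 in the first class.
classes <- cut(share_macron, breaks = breaks, include.lowest = TRUE)
fill    <- class_cols[as.integer(classes)]

# With the shapefile loaded as `communes` and the shares in the same row order,
# the map itself could then be drawn with: plot(communes, col = fill, border = NA)
print(data.frame(share_macron, classes, fill))
```

The choice of breaks largely determines how the map reads, so it is worth trying several classifications (equal intervals, quantiles) before settling on one.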



Text mining of State of the Union Addresses of Barack Obama, 2009-2016. Part 1 : Sentiment analysis with R package “syuzhet”

In this post we use the basic functions of the Syuzhet package to perform sentiment analysis on the text of the State of the Union Addresses of president Barack Obama in the period 2009-2016.

Project Gutenberg makes e-texts available, and through the corresponding R library we use the UTF-8 file to perform sentiment analysis on the SOTU texts. The texts are tokenized by concatenating their lines into chunks of 10. The syuzhet package in R uses the NRC emotion lexicon, and the get_nrc_sentiment function returns a data frame in which each row represents a token. The columns include eight emotions (anger, fear, anticipation, trust, surprise, sadness, joy and disgust) as well as two sentiments (negative and positive).
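
The tokenization step described above can be sketched as follows. This is a minimal illustration, assuming the text is already available as a character vector of lines; `speech_lines` is a hypothetical stand-in and not the SOTU corpus. The chunking uses base R only, and the call to get_nrc_sentiment runs only if the syuzhet package is installed.

```r
# Toy stand-in for the lines of a speech (hypothetical text, not the SOTU corpus).
speech_lines <- sprintf("This is line %d of the address.", 1:25)

# Group line indices into consecutive blocks of 10: lines 1-10, 11-20, 21-25.
chunk_id <- ceiling(seq_along(speech_lines) / 10)

# Concatenate the lines of each block into a single token (one string per chunk).
tokens <- unname(vapply(split(speech_lines, chunk_id), paste,
                        character(1), collapse = " "))

length(tokens)  # 3 chunks for 25 lines

# Score each chunk with the NRC lexicon, if syuzhet is available.
if (requireNamespace("syuzhet", quietly = TRUE)) {
  nrc <- syuzhet::get_nrc_sentiment(tokens)
  print(nrc)  # one row per chunk; emotion and sentiment counts in the columns
}
```

A chunk size of 10 lines follows the text; smaller chunks give a finer but noisier picture of how sentiment evolves over a speech.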

Download and read the full document : syuzhet_sotu


Assessing spatial heterogeneity in crime prediction. Using geographically weighted regression to explore local patterns in crime prediction in Belgian municipalities

Processes and characteristics of urban areas at the human-environment interface (e.g. social stratification, segregation, urban poverty) depend on a diverse set of socio-demographic, economic and environmental factors. Due to the heterogeneity of urban areas, it can be assumed that the strength and direction of the influence of these factors vary over space.

Special properties of geospatial data are spatial autocorrelation and spatial heterogeneity  (nonstationarity).  Spatial autocorrelation implies a spatial association between an attribute value at a particular location and attribute values at other locations close by.  Spatial heterogeneity describes systematic spatial variation of attribute values across space.  These spatial effects must be taken into account when modeling  spatial relationships in a regression model.
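
Spatial autocorrelation is commonly summarised with Moran's I. That statistic is not named in this excerpt, so the sketch below is an illustration of the concept rather than the method of the full document. It assumes a hypothetical arrangement of six locations along a line, where adjacent locations count as neighbours.

```r
# Moran's I for values x and a spatial weights matrix w
# (w[i, j] > 0 when locations i and j are neighbours).
morans_i <- function(x, w) {
  z <- x - mean(x)                      # deviations from the mean
  n <- length(x)
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Hypothetical layout: six locations in a row, neighbours are adjacent locations.
n <- 6
w <- matrix(0, n, n)
w[abs(row(w) - col(w)) == 1] <- 1

clustered   <- c(1, 2, 3, 4, 5, 6)      # similar values sit next to each other
alternating <- c(1, 6, 1, 6, 1, 6)      # neighbours are dissimilar

morans_i(clustered, w)    # positive value: positive spatial autocorrelation
morans_i(alternating, w)  # negative value: negative spatial autocorrelation
```

Values near zero would indicate no spatial association between a location and its neighbours.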

Traditionally, global statistical regression approaches are applied to study the influence of explanatory variables on a target variable. These approaches emphasize similarities across space. In the following analysis, this global or “one-size-fits-all” approach is juxtaposed against spatial autocorrelation and nonstationarity. In particular, we explore a global non-spatial regression model and both a global and a local spatial regression model of the relationship between crime rates and indicators of socio-economic disadvantage and neighborhood demographic context in Belgian municipalities in the period 2008-2012.

An exploration of the spatial patterns of crime is warranted.  The causal processes driving crime may vary over space, that is, predictor variables may operate differently in different locations.  This may be especially relevant in policy studies where there is growing recognition that understanding the context of crime – the where and when of criminal events – is key to understanding how crime can be controlled and prevented.  Crime studies that highlight local variations – local contexts of crime – will likely have more relevance to real-world policy applications.  Empirically, if these variations in causal processes do exist and are not accounted for, the statistical model will be inaccurate.

Estimations provided by a global model might be inadequate in capturing spatially varying relationships, as global statistics are only describing average relations between the dependent variable and the considered explanatory variables.  With increasing spatial variation of local observations, the reliability of global model estimates decreases.

There may also be spatial dependencies, in which attribute values in one location depend on the values of those attributes in neighboring locations.

The assumption of spatial heterogeneity is suggested by the fact that criminality and its determinants may be distributed unevenly across space. Another source of spatial heterogeneity is the dynamics between population and location. That is, cultural differences and differences in attitudes and behaviour across locations may alter how people react to various contextual variables. Given the potential for spatial heterogeneity, it would be naïve to assume that the spatial processes between criminality and its determinants are stationary (or universal) and can be captured by a conventional “global” model.

Following Tobler’s first law of geography, which states that “everything is related to everything else, but near things are more related than distant things”, GWR has to be calibrated so that observations near observation i have more influence on the estimation of the parameters than data located further away from i.

GWR takes advantage of spatial dependence in the data. Spatial dependence implies that data available in locations near the focal location are more informative about the relationship between the independent and the dependent variables in the focal location. When evaluating estimates for a focal location, GWR gives more weight to data from closer locations than to data from more distant locations. It is assumed that the relative weight of the contributing locations decays at an empirically determined rate as the distance from the focal location increases.
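
The distance-decay weighting can be illustrated with a small simulation in base R. This is a sketch under stated assumptions, not the model of the full document: the data are simulated, the locations lie on a line, the kernel is Gaussian, and the bandwidth `h` is fixed by hand, whereas in practice it would be determined empirically (for instance by cross-validation).

```r
set.seed(1)

# Hypothetical data: 200 locations on a line; the effect of x grows from west to east.
n     <- 200
coord <- runif(n, 0, 10)                 # location of each observation
x     <- rnorm(n)
beta  <- 0.5 + 0.2 * coord               # true local slope varies over space
y     <- 1 + beta * x + rnorm(n, sd = 0.1)

# Gaussian distance-decay kernel with bandwidth h (kernel and h are assumptions).
gauss_weights <- function(d, h) exp(-0.5 * (d / h)^2)

# Local fit at a focal location: weighted least squares with kernel weights,
# so nearby observations influence the estimate more than distant ones.
local_slope <- function(focal, h = 1) {
  w <- gauss_weights(abs(coord - focal), h)
  unname(coef(lm(y ~ x, weights = w))["x"])
}

global_slope <- unname(coef(lm(y ~ x))["x"])  # one slope for the whole study area
west <- local_slope(1)                        # focal location near the western end
east <- local_slope(9)                        # focal location near the eastern end
c(global = global_slope, west = west, east = east)
```

The global model returns a single average slope, while the local fits recover the lower effect in the west and the higher effect in the east. A smaller bandwidth makes the estimates more local but also more variable.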

Download and read the full document : assessing_spatial_heterogeneity

Visualisations of commuting in Belgium

Insight into the dynamics between home and work is important for policy on housing and employment and the relationship between the two. This raises the question of whether people settle where the chance of finding a suitable job is greatest (housing follows work) or whether companies instead locate where a large labour potential is available (work follows housing).

In this contribution we analyse this topic using open data on the relationship between place of residence and place of work in Belgium. Specifically, we rely on data on commuting by the active labour force between the municipalities of Belgium. The matrix of commuting flows between place of residence and place of work for the 589 Belgian municipalities was used to visualise these movements, using open source software (R). Based on these data, commuting flows are visualised successively as (1) choropleth maps, (2) a cartogram, (3) a line diagram and (4) a network representation.

visualisaties woon-werkverkeer België

Big data, open data, small data

The current business environment is shaped by a transition towards globalization and a restructuring of the economic order. The pace of technological change, which allows instant connectivity, and the resulting era of ubiquitous computing represent a new normal.

In the knowledge economy, activity is based on highly networked interactions.  The amount of digital collaboration is increasing among people, things and their interactions.  Through the Internet of People and the Internet of Things, networking is expanding not only in person-to-person interactions, but also in person-to-machine and machine-to-machine interactions.

The digital revolution of recent decades creates a new economy of data. For businesses to become responsive to market conditions, it is necessary to look at the whole ecosystem and access data from disparate sources, inside and outside the corporate firewall. What has been termed “big data” refers to the capacity to access and analyze large data streams. Alongside big data, the open data movement, which strives to make data freely available, is gaining acceptance. Open data emerges as a public resource for disseminating datasets that are made openly available (in many cases through APIs).

The exponential growth of data and the increased reliance on insights derived from data for decision-making cause a shift in the analytic landscape. Data analysis is more than an IT function; it is about people and business decisions. Therefore, the emphasis of analytics is on designing solutions that focus on answering the business questions of the end user. Users want seamless access to information to support decision-making in day-to-day activities. In both their personal and professional lives, Web-savvy users have adopted the principles of interactive computing and have come to demand customizable tools with high responsiveness. Analytics, and the insights it delivers, evolves towards a self-service model, with business users producing their own reports interactively and performing analytics on demand.

Open data disrupts the traditional workflow from data selection to final product creation and dissemination that is controlled by proprietary technologies.  As businesses need smaller and flexible applications targeted at a specific problem, open data becomes a part of the data mix.  Changing business environments make the idea of providing access to a limited number of specific datasets rapidly outdated.  Therefore, open data becomes an important driver for “small data” : specialized datasets that provide targeted information to address a specific question or problem.  The real challenge is not to manage big data but to access and manage a distributed ecosystem of small data.

Spatial data analysis with R : Voronoi diagrams

Voronoi diagrams represent a way to visualise spatial data. A Voronoi diagram partitions the plane into polygons, one for each of a set of pre-defined points, such that every location within a polygon is closer to that polygon's point than to any other pre-defined point. Because the polygons tend to be small where the points lie close together, Voronoi diagrams can be used to analyse point density.

In this guide the subject of Voronoi diagrams is documented by way of examples and the corresponding R code. The data to reproduce the examples is available on my GitHub repo.
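
The nearest-point property can be illustrated without any spatial package by assigning the cells of a fine grid to their closest pre-defined point. This is a rough approximation for illustration only, not the code of the full guide; the seed points are hypothetical, and a dedicated package would compute the exact polygons.

```r
set.seed(42)

# Hypothetical seed points in the unit square: a dense cluster plus two isolated seeds.
seeds <- rbind(
  cbind(runif(8, 0.0, 0.2), runif(8, 0.0, 0.2)),  # dense cluster, bottom-left
  c(0.8, 0.3),
  c(0.5, 0.9)
)

# Fine regular grid of locations covering the unit square (100 x 100 cell centres).
g    <- (seq_len(100) - 0.5) / 100
grid <- as.matrix(expand.grid(x = g, y = g))

# Squared distance from every grid location to every seed (rows: grid, cols: seeds).
d2 <- outer(grid[, 1], seeds[, 1], "-")^2 + outer(grid[, 2], seeds[, 2], "-")^2

# Each grid location belongs to the Voronoi cell of its nearest seed.
cell <- max.col(-d2, ties.method = "first")

# Approximate cell areas: share of grid locations per seed (the unit square has area 1).
area <- tabulate(cell, nbins = nrow(seeds)) / nrow(grid)
round(area, 3)
```

The seeds in the dense cluster end up with much smaller cells on average than the two isolated seeds, which is why cell area can serve as an inverse indicator of point density.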

Download and read the full document : voronoi_diagrams_with_R