Tuesday, October 4, 2016

#PredictiveCOL - Forecasting Colombia's peace plebiscite (final update)

For sure, this is the most exciting forecast I have ever done. On one hand, I am Colombian, and I really want to live in a peaceful country that is a better place to raise my children. On the other hand, I am very serious when it comes to forecasting.

Maybe you have read this blog before and noticed that I love predictions because, as a data scientist, you can apply your own set of statistical methodologies to voting intention and turnout. If you think about it carefully, everything counts when forecasting this kind of process. For example, you may track weekly people's perception of security, how much the president is disliked, how many minutes newscasts devote to the peace process, how many articles or opinion columns are written, and how many tweets about FARC are sent per week. Everything counts and everything matters. Let us call these contextual variables.

Of course, polls and pollsters are also important. They are the only proxy for voting intention. However, polls get old and should be updated with new data. Some pollsters are also less reliable than others (I really do not believe everything some pollsters claim), and sample size matters because a larger sample reduces the sampling error.
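
To illustrate why sample size matters, here is a minimal sketch of the approximate 95% margin of error for a proportion. It assumes simple random sampling, which real polls rarely follow exactly, and the function name and sample sizes are illustrative rather than taken from the polls used in this forecast.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p estimated from
    n respondents, assuming simple random sampling (an assumption;
    real polls usually carry design effects that widen this)."""
    return z * math.sqrt(p * (1.0 - p) / n)

# Hypothetical sample sizes: the error shrinks as n grows.
for n in (500, 1000, 2000, 4000):
    print(n, round(100 * margin_of_error(0.5, n), 2), "percentage points")
```

Quadrupling the sample size only halves the margin of error, which is one reason why combining several polls is attractive.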

Remember that this plebiscite offers Colombians two options. The first is Yes, meaning "Yes, I approve the agreement between FARC and the government." The second is No, meaning "No, I do not support that agreement." With that in mind, let me introduce my (Bayesian) predictions for the Colombian plebiscite. First, here is the trend that the two options have shown over time:
As you can see, we have adjusted the data to remove undecided voters. Note that we extracted the signal (bold lines) from the noise generated by the polls. The following graph shows the less robust prediction, based only on what the polls have estimated.
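
As a rough illustration of these two steps, the sketch below rescales Yes and No shares so they sum to one (dropping undecided voters) and then smooths the resulting series. The poll figures are hypothetical, and simple exponential smoothing is only a stand-in: the post does not say which signal-extraction method was actually used.

```python
def drop_undecided(yes, no):
    """Rescale Yes/No shares so they sum to 1, ignoring undecided voters."""
    total = yes + no
    return yes / total, no / total

def exponential_smooth(values, alpha=0.3):
    """Simple exponential smoothing, used here as a stand-in for the
    signal extraction (the actual method is not stated in the post)."""
    smoothed = [values[0]]
    for v in values[1:]:
        smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical polls as (yes share, no share); the remainder is undecided.
polls = [(0.55, 0.30), (0.50, 0.35), (0.58, 0.32), (0.54, 0.34)]
yes_share = [drop_undecided(y, n)[0] for y, n in polls]
trend = exponential_smooth(yes_share)
print([round(t, 3) for t in trend])
```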

Next comes a more reasonable forecast. It uses prior information (the 2014 presidential and legislative elections) and tries to explain the poll outcomes by modeling the extracted signal (see the trends in the previous graph) with contextual variables. Once the model is fitted, we use a Bayesian setup that combines the prior information with the estimated response from this model. The following chart shows this forecast.
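
One simple way to picture how a prior and a model estimate can be combined is the conjugate normal-normal update sketched below. This is an assumption made for illustration: the post does not specify the form of the prior or the likelihood, and all numbers are hypothetical.

```python
def normal_update(prior_mean, prior_sd, est_mean, est_sd):
    """Combine a normal prior with a normal estimate by precision weighting.
    Returns the posterior mean and standard deviation."""
    prior_prec = 1.0 / prior_sd ** 2
    est_prec = 1.0 / est_sd ** 2
    post_var = 1.0 / (prior_prec + est_prec)
    post_mean = post_var * (prior_prec * prior_mean + est_prec * est_mean)
    return post_mean, post_var ** 0.5

# Hypothetical inputs: a prior Yes share informed by past elections and
# a model-based estimate of the Yes share from the poll signal.
post_mean, post_sd = normal_update(0.50, 0.05, 0.60, 0.03)
print(round(post_mean, 4), round(post_sd, 4))
```

The posterior mean lands between the prior and the estimate, closer to whichever is more precise, and the posterior is narrower than either input.
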
OK, it is clear that the Colombian people will support this peace process. However, the result is valid if and only if Yes receives more than 4.4 million votes. Using a similar Bayesian methodology, the next graph shows the predicted turnout.
We also used small area modeling to forecast the response of every Colombian department (the equivalent of a state in the US). The following map shows that the majority of departments support the agreement between FARC and the government, although some of them will vote No. Dark areas do not support the deal, while light-gray areas do.
Finally, the posterior probability that Yes defeats No is 98.8%.
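
A probability of this kind can be read off a posterior distribution for the Yes share. The sketch below assumes a normal posterior with hypothetical parameters; it is not the model behind the figure above and is not meant to reproduce it. It computes P(Yes share > 0.5) in closed form and checks the result by simulation.

```python
import math
import random

def prob_yes_wins(post_mean, post_sd):
    """P(Yes share > 0.5) under an assumed normal posterior."""
    z = (post_mean - 0.5) / post_sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_yes_wins_mc(post_mean, post_sd, draws=100_000, seed=1):
    """Monte Carlo version: the fraction of posterior draws above 0.5."""
    rng = random.Random(seed)
    wins = sum(rng.gauss(post_mean, post_sd) > 0.5 for _ in range(draws))
    return wins / draws

# Hypothetical posterior for the Yes share.
print(prob_yes_wins(0.55, 0.03), prob_yes_wins_mc(0.55, 0.03))
```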

14 comments:

  1. This comment has been removed by the author.

    1. Oops, I deleted it by accident. Can you share the code?

  2. Replies
    1. I hope I am wrong and peace wins in every department!

  3. Congratulations, Andrés. Your prediction would be impeccable were it not that the sample for that poll was taken over the weekend of September 16 to 18, so it does not include the statements by Obama and Ban Ki-moon, Santos's remarks at the UN, or what I consider most important: the signing ceremony itself and the worldwide publicity that will follow it. Caballero will release another poll on the last day the law allows, but I think the sample will have to be collected by tomorrow, Sunday, at the latest, to leave time to process it. Keep this up; it is very interesting, at least for me, as I am passionate about numbers and statistics.

    1. Thanks, Adelina! I will update this prediction with all the polls as they come in.

  4. I hope you are not wrong and that the end of the conflict with the FARC wins in every department. Everyone go vote. Thanks.

  5. Hi! How interesting! I have my own predictions. Would you like to combine my data with your methods?

  6. Nice exercise, Andrés. I have two questions for you:

    1) It's not clear to me which "contextual variables" you chose for the Bayesian forecast. Can you explain them?

    2) Also, can you explain the relation between the results of both 2014 elections (parliamentary and presidential) and the outcome obtained with the forecast?


    1. Fulio,

      1) Contextual variables are everywhere. The most important decision is to choose variables that vary weekly.

      2) They are closely related because both the parliamentary and presidential elections (back in 2014) featured Álvaro Uribe, who does not support the agreement.

      Best, AG

  7. It is incredible that Caquetá, Meta, and Putumayo (which have been hit so hard by this conflict) are departments that do not support the agreements... I understand the case of Antioquia and Caldas (the Uribismo factor). I really like that last chart; it would be great if the legend showed 1: Yes, 0: No. Thanks for sharing it. DeysiC

  8. Hi Andrés, I am interested to know how you analyze your predictions and those of the traditional pollsters today, now that we know the result. What went wrong? Why by such a wide margin? Was it a methodological error? A lack of data?
