

Here is a list of projects in which I participated.


Creation of Sodata, a Big Data platform that collects and visualizes data from social networks such as Facebook, Twitter and Instagram.

Partners: 1000Mercis, ENS Cachan.

  • Missions:
    • Construct graphs from crawled social network data (Facebook, Twitter and Instagram).
    • Compute network structures for the constructed graphs.
    • Integrate the visualization into the Sodata platform.
  • Software and Languages:
    • MongoDB, Pig, JavaScript, Django (Python), Shell scripts, Gephi, Spark.

Curated Media

Project funded by the French Ministry for the Economy and Finance. It contributes to Radarly, a social media intelligence software developed by Linkfluence.

Partners: Linkfluence, Intercloud, CEA list, Webedia.

  • Missions:
    • Test and evaluation of Radarly.
    • Collect data from the web using Radarly.
    • Data filtering and preprocessing.
    • Data analysis (currently working on cascade segmentation).
  • Software and Languages: R
    • Using my favorite R IDE: RStudio.
    • Most-used libraries: data.table, dplyr, reshape2, ggplot2 and plotly, plus knitr, pander and rmarkdown to build beautiful reports and work summaries (html_notebooks).

Open Food System

A research project funded by SEB on the digital transformation of cooking and the intelligent kitchen. There are 20 partners on this project.

  • Missions:
    • Graph analysis of Facebook group data in order to analyse the detected social roles.
  • Software and Languages:
    • Gephi, MongoDB, interactivevis (beautiful interactive visualisation of network data).


Project funded by the French Ministry for the Economy and Finance. The project aims to develop a system for optimized and intelligent machining operations.

Partners: Spring Technologies, ENS Cachan, Snecma, Datakit, CADLM.

  • Missions:
    • Apply data mining techniques to machining (smart manufacturing).
    • Shape detection.
    • Techniques used: regression, neural networks, k-means, Gaussian mixtures, variational Gaussian mixtures.
  • Software and Languages:
    • MATLAB


Coclico is an ANR research project that aims to study and propose an innovative, generic method for the multi-scale analysis of large volumes of spatio-temporal data that are supplied continuously and vary widely in quality.

Collaborative clustering of large scale spatio-temporal data.

  • Keywords:
    • Fuzzy Clustering
    • SOM
    • GTM
    • Variational Inference
    • Variational Bayesian GTM

My PhD

The research outlined in my PhD thesis concerns the development of collaborative clustering approaches based on topological methods, such as self-organizing maps (SOM), generative topographic mappings (GTM) and variational Bayesian GTM (VBGTM). The fundamental concept of collaborative clustering is that the clustering algorithms operate locally on individual data sets, but collaborate by exchanging information about their findings. The strength of collaboration, or confidence, is specified by a parameter called the coefficient of collaboration. My thesis proposes to learn it automatically during the collaboration phase. Two data scenarios are treated in the thesis, referred to as vertical and horizontal collaboration. Vertical collaboration occurs when the data sets contain different objects described by the same patterns. Horizontal collaboration occurs when the data sets contain the same objects described by different patterns.
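
As a small illustration of the two scenarios only (not code from the thesis), the sketch below labels a pair of data sets as vertical or horizontal from their object identifiers and pattern names. The function name `collaboration_scenario` and its arguments are hypothetical, and the sketch assumes that each data set can be summarized by those two collections.

```python
def collaboration_scenario(objects_a, patterns_a, objects_b, patterns_b):
    """Label a pair of data sets by collaboration scenario.

    Hypothetical helper for illustration: each data set is summarized
    by its object identifiers and the names of its patterns.
    """
    same_objects = set(objects_a) == set(objects_b)
    same_patterns = set(patterns_a) == set(patterns_b)

    # Vertical: different objects, same patterns.
    if same_patterns and not same_objects:
        return "vertical"
    # Horizontal: same objects, different patterns.
    if same_objects and not same_patterns:
        return "horizontal"
    # Any other combination is outside the two scenarios described above.
    return "neither"


# Two sites observing different objects with the same patterns.
print(collaboration_scenario(["o1", "o2"], ["x", "y"],
                             ["o3", "o4"], ["x", "y"]))

# Two sites observing the same objects with different patterns.
print(collaboration_scenario(["o1", "o2"], ["x", "y"],
                             ["o1", "o2"], ["z", "w"]))
```

The check is deliberately strict (exact set equality); partially overlapping objects or patterns fall into the "neither" case in this sketch.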

  • Keywords:
    • Machine Learning
    • Collaborative Clustering
    • Self-Organizing Maps (SOM)
    • Generative Topographic Mappings (GTM)
    • Variational Inference
    • Variational Bayesian Generative Topographic Mappings (VBGTM)