Spark Scala SW engineer
Who are we?
We are FX-Conseil, a private services company and subsidiary of the Ecole polytechnique, specializing in partnerships with research centers. We support the emergence and development of technological transformations, products and services by drawing on the expertise of research centers, in particular that of the Ecole polytechnique. We make it simple and easy for big companies, SMEs and start-ups to benefit from our partnerships with research laboratories.
We are starting a new research partnership with a world leader in the electronic health-care database industry. This is a project with huge potential, involving the development of big-data techniques and machine learning algorithms for health care. The database records every health-related transaction for millions of people.
We are looking for a Software Engineer to work on this new project: she/he will develop the machine learning pipeline applied to this electronic health record database (from one of the world's main actors in this industry).
Our team is composed of several enthusiastic researchers (international machine learning researchers from the Ecole polytechnique) and data-driven software engineers / data scientists.
What we want to do
We want to use a simple stack based on an HDFS/Spark cluster. We are also used to developing homemade machine learning algorithms in C++/Python. The main workflow will look like this:
- Cleaning: from a production SQL database, we format the data in a large denormalized table;
- Featuring: we compute a lot (a lot) of features based on this table to create a large matrix;
- Learning: we feed this preprocessed data to homemade machine learning algorithms.
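To give a flavor of the three steps above, here is a minimal Spark sketch. Everything in it is illustrative: the table names, the columns (`patient_id`, `event_type`, `age`) and the count-per-event-type featurization are hypothetical assumptions, not our actual schema or feature set.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object PipelineSketch {
  // Cleaning + Featuring: denormalize events against patients, then
  // pivot into a wide per-patient matrix (event counts per event type).
  def featurize(events: DataFrame, patients: DataFrame): DataFrame =
    events
      .join(patients, Seq("patient_id")) // Cleaning: one large denormalized table
      .groupBy(col("patient_id"))
      .pivot("event_type")               // Featuring: one column per event type
      .agg(count(lit(1)))
      .na.fill(0)                        // absent event types become 0, not null

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ehr-pipeline-sketch")
      .master("local[*]") // assumption: local run; on the cluster this is set by spark-submit
      .getOrCreate()
    import spark.implicits._

    // Tiny in-memory stand-in for the production SQL extract.
    val events = Seq((1, "consultation"), (1, "prescription"), (2, "consultation"))
      .toDF("patient_id", "event_type")
    val patients = Seq((1, 64), (2, 31)).toDF("patient_id", "age")

    val features = featurize(events, patients)
    features.show()

    // Learning: the matrix would then be persisted (e.g. as Parquet)
    // and fed to the homemade C++/Python learning algorithms.
    spark.stop()
  }
}
```

In production the two input DataFrames would of course come from the denormalized SQL export rather than in-memory sequences, and the featurization would be far richer than simple counts.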
Ansible is one example of a tool we could use to deploy the whole stack.
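A deployment playbook for such a stack might start like this. This is purely a hypothetical fragment: the host group, package and Spark version are illustrative assumptions, not our actual setup.

```yaml
# Hypothetical Ansible playbook fragment for provisioning Spark workers.
- hosts: spark_workers        # assumed inventory group
  become: true
  tasks:
    - name: Install Java (required by HDFS and Spark)
      apt:
        name: openjdk-8-jdk   # assumed JDK package
        state: present

    - name: Unpack a Spark distribution under /opt
      unarchive:
        src: https://archive.apache.org/dist/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.7.tgz
        dest: /opt
        remote_src: true
```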
What will you do?
- You develop a lot of stuff in Spark and try to improve our performance using any kind of kick-ass technology, all of that using a slick Git workflow.
- You walk across the campus and cross paths with students, teachers and horses.
- You can play with a cluster to get better performance (or just to learn). You can also use large machines to run your code.
- You can use the sports facilities on campus if you want, including the swimming pool, skydiving or golf.
- When something is unclear about the data, or when you feel the need to, you go and spend the day at our partner's office, located in a brand-new comfy tower in La Defense. The data will be there, but there will be secure remote access so you can work from FX Conseil.
- You are also part of every decision regarding the evolution of our architecture (the numerous NoSQL debates, for instance).
- You also keep an eye out for new libraries, releases, papers, conferences: literally anything linked to our field of interest.
Who are you?
- Curious is your middle name. GitHub is your Facebook.
- You know why you're using Eclipse instead of IntelliJ, or the other way around.
- You spent several years studying computer science and more time playing with data-oriented tools. You acquired good programming skills and know about clean code, tests and versioning.
- You've seen things. Mostly weird implementations and hairy publications. You've been applying all of this knowledge for some years in a data-centric startup or group.
- You dream in Scala or Python (and you don't talk about nightmares).
- Also, you know the machine learning ecosystem and everything revolving around distributed computing.
- You can survive on Unix for at least a month without water or food, and you've picked your side between emacs and vi.
You are either:
- A killer in Scala;
- A hardcore Spark developer;
- Fluent in Python.
What do we offer?
- Work in a highly challenging intellectual environment;
- Benefit from the campus infrastructure, which means that you could practice horse riding, golf and water polo, unfortunately not at the same time;
- Use a lot of different technologies and try any that looks appealing to you.