In November, Servian held their first New Zealand Datathon, partnering with Dataiku and hosting members of TechWomen NZ and Women in Data Science. As it was the first Datathon, they wanted to understand some of the challenges and successes so that they could cater for a wider audience and a broader range of skill sets next time around.
The Datathon was held at Servian NZ’s main offices in Auckland and Wellington over one and a half days. On the first day, the hosts and the Technical Team welcomed the participants, outlined the agenda and structure of the Datathon, and explained the data and problem statement. Participants were divided into teams of three and worked together to produce a result that could be judged at the end of the second day. Everyone was well fed and watered, and seemed to enjoy themselves very much – as you can see in the photos! On the second day of the Datathon, the real work started. The teams were flat out trying to work a tricky dataset into something they could start to build predictions around. At the end of the day, the teams presented their findings and were judged against the criteria.
Why a Datathon?
Servian decided to run a Datathon to see how easily Dataiku, one of the world’s leading Enterprise AI and machine learning platforms, could be used to approach a data science challenge. Servian use DSS (Dataiku Data Science Studio) for preparing and analysing data, and they were keen to see how quickly novices could be taught to use it.
They were excited to see how groups would use the tool in different ways, and when the teams delivered their results, they were pleased to see that each team had taken its own distinct approach. One group spent time working through the analytics capabilities to find correlations and interactions in the data; another spent a long time working on data hygiene and structure. Seeing how easily the teams were able to take these varied approaches reinforced why Servian are partnered with Dataiku, and why they enjoy using the DSS tool.
Data and Problem Statement
Participants in the Datathon worked on the West Nile Virus Kaggle use case, which challenged them to find a more accurate method of predicting where outbreaks of the West Nile virus in mosquitoes were likely to occur. This information would help the City of Chicago and the Chicago Department of Public Health (CDPH) to allocate resources efficiently and effectively towards preventing transmission of this potentially deadly virus.
A week prior to the competition, participants were trained on Dataiku DSS, the tool used in the Datathon. Using the platform’s data wrangling, machine learning and dashboard capabilities, the teams were able to efficiently prepare the weather, location, testing and spraying data provided. This enabled them to quickly prototype and iterate through various ML models, and to build dashboards and reports that consolidated their findings and insights on the problem statement.
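For readers curious what this kind of data preparation involves, here is a minimal Python sketch. It does not show what the teams built in Dataiku DSS. The rows, field names (`date`, `trap`, `virus_present`, `avg_temp_f`) and values are all invented for illustration and are not taken from the Kaggle dataset. It joins mosquito test results to a weather reading for the same date, then works out the share of positive tests per day.

```python
from collections import defaultdict

# Hypothetical rows: field names and values are invented for illustration.
trap_tests = [
    {"date": "2013-07-01", "trap": "T001", "virus_present": 0},
    {"date": "2013-07-01", "trap": "T002", "virus_present": 1},
    {"date": "2013-07-02", "trap": "T001", "virus_present": 1},
]
weather = [
    {"date": "2013-07-01", "avg_temp_f": 74},
    {"date": "2013-07-02", "avg_temp_f": 81},
]


def join_on_date(tests, weather_rows):
    """Left-join each trap test with the weather reading for the same date."""
    weather_by_date = {row["date"]: row for row in weather_rows}
    joined = []
    for test in tests:
        reading = weather_by_date.get(test["date"], {})
        # A test with no matching weather row gets None for the temperature.
        joined.append({**test, "avg_temp_f": reading.get("avg_temp_f")})
    return joined


def positive_rate_by_date(rows):
    """Return the fraction of tests that were positive on each date."""
    totals = defaultdict(lambda: [0, 0])  # date -> [positives, tests]
    for row in rows:
        totals[row["date"]][0] += row["virus_present"]
        totals[row["date"]][1] += 1
    return {date: pos / count for date, (pos, count) in totals.items()}


features = join_on_date(trap_tests, weather)
rates = positive_rate_by_date(features)
```

A joined table like `features` is the sort of input a prediction model would be trained on. In a visual tool such as DSS the same join is set up without writing code.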
Judging – what were the judges looking for?
All teams were given a rating of 1-10 (with 10 being the highest) across these five key criteria:
- Technology – How well are you using the Dataiku DSS platform?
- Potential – How easily could this be developed into something bigger?
- Innovation – How did you approach the problem differently?
- Presentation – How well presented were your results?
- Teamwork – How well did your team organise itself?
Each of the five judges provided their ratings independently. The judges then had a huddle to discuss what they had observed over the two days of the competition, specifically how each of the teams collaborated and came up with its solution.
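The article does not say how the individual ratings were combined into a final result. As a rough sketch only, assuming a simple unweighted average (an assumption, not Servian's actual method), the ratings could be rolled up as follows. The function name `team_score` and the example ratings are hypothetical.

```python
from statistics import mean

# The five criteria listed above.
CRITERIA = ("technology", "potential", "innovation", "presentation", "teamwork")


def team_score(judge_ratings):
    """Average each criterion across judges, then average the criteria.

    Assumption: every criterion and every judge carries equal weight.
    """
    for ratings in judge_ratings:
        for criterion in CRITERIA:
            if not 1 <= ratings[criterion] <= 10:
                raise ValueError(f"{criterion} rating must be between 1 and 10")
    per_criterion = {
        criterion: mean(ratings[criterion] for ratings in judge_ratings)
        for criterion in CRITERIA
    }
    return per_criterion, mean(per_criterion.values())


# Hypothetical ratings from two judges for one team (invented numbers).
example_ratings = [
    {"technology": 8, "potential": 7, "innovation": 9, "presentation": 6, "teamwork": 8},
    {"technology": 6, "potential": 7, "innovation": 7, "presentation": 8, "teamwork": 10},
]
per_criterion, overall = team_score(example_ratings)
```

Because the judges also discussed their observations together, the final decision may not have been purely numerical. A weighted scheme would be an equally plausible variation.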
Winning feedback:
Dataiku DSS is collaborative data science software for teams of data scientists, data analysts, and engineers to explore, prototype, build and deliver their own data products more efficiently.
Servian created random groups of three in Auckland and Wellington and gave us all a problem statement/definition on day one, followed by group technical work and presentation sessions on day two.
All of us were quite new to the Dataiku platform, but we managed to use our skills, knowledge and experience effectively to distribute responsibilities and manage the mission collaboratively. I used my functional consulting experience to help set up an efficient process and to make use of Dataiku’s modelling and visualisation capabilities. Another member helped with statistical modelling of the data using her computer science knowledge, and another used her knowledge of economics to tell the story.
Our team, consisting of Victoria Huang, Olga Obalashova and myself, participated in the Datathon competition and won. Thanks so much to the team from Servian.
Dr Fahimeh Zaeri (LinkedIn page)