Junior Software Engineer (Web Scraping)

Job description

We are looking for talented profiles to help build and maintain the distributed data collection system that is at the heart of our business. You'll be on the front lines, facing massive (but interesting!) challenges as we try to scrape all available retail data.


We are a data-driven company that collects and processes more than 600 GB of raw data (HTML) daily. We leverage big data technologies such as Serverless and Spark on AWS EMR to crunch this volume of data and make it queryable.

 

In this role, you will ensure that our data collection engine, which consists of distributed web crawlers, is state of the art and ahead of our competition. You will ensure that we can scrape any webshop, regardless of the ban detection that has been put in place. It will also be important that proper monitoring tools are in place. We are currently scraping 60 sites and your goal is to at least triple that without losing completeness or quality.

Responsibilities

  • Developing & maintaining web crawlers using Python & JavaScript
  • Designing & developing internal tools and frameworks
  • Designing & developing tools to automate testing & QA
  • Building & maintaining distributed data collection systems on AWS
  • Guaranteeing data integrity and quality by extending our logging, monitoring and outlier detection systems

About the stack

  • Our distributed system is built on top of Amazon Web Services and uses Serverless architectures where possible, with Python & JavaScript being the main programming languages used.
  • As Daltix scales from 50+ websites to 200+ websites (which it scrapes multiple times per day!), it has to invest in orchestration technologies such as Kubernetes, as well as logging & monitoring solutions, to keep an overview at scale.

Requirements

Required

  • Broad knowledge of computer engineering (e.g. through a CS master's degree, home projects, ...).
  • Strong programming experience with one or more languages: Python (preferred), JavaScript, C#, Java, Scala, C/C++, ...
  • Good understanding of REST APIs, databases and SQL.
  • Linux command-line experience.
  • Highly proficient in spoken and written English.
  • You are passionate about software engineering.

Preferably you also have:

  • Strong experience with both Python & JavaScript.
  • Experience working on top of Amazon Web Services.
  • Excellent coding skills and deep knowledge of web technologies.
  • Excellent problem-solving capabilities and a critical mindset.
  • You get energy from working in a highly complex and challenging startup environment with a high-tech product.

About Daltix

Daltix is a young company from Ghent (BE), with offices in Boom (BE) and Lisbon (PT), bringing real-time insights to the world of retail. We have developed a set of tools to gather, process and analyze e-commerce data from webshops. Every day, we collect prices, promotions and assortment data from a myriad of e-commerce channels. We turn this data into actionable insights for the right people at the right time by extracting high-level insights, introducing more structure with AI techniques, and enriching the information with alternative data sources. Retailers and suppliers use these insights to help them in their market positioning.