Data Engineer

Fully Remote IT

28 Sep 2022

Fully Remote

IT

Engineering

Full Time

1

3 years

We are hiring for a fully remote Data Engineer role. This is a full-time position, and the successful candidate must be located within the United States.


Job Description

We are seeking someone with a technical background and experience moving data between systems, preferably in a professional ETL environment. Responsibilities primarily cover building new pipeline components and maintaining existing ones across a complex technology stack that spans a variety of languages and frameworks.

The engineering team is responsible for data health and quality at every step of the pipeline, from initial ingestion through deployment and visualization. As a result, debugging can require a deep dive into several interfacing pieces of software, and on any given day a team member can expect to work on multiple components that perform very different functions. The ideal candidate must be comfortable moving between systems. Just as importantly, they must be curious and hard-working, because we expect them to tackle these problems using their own experience and independent research.

Specifically, the Data Engineer will be tasked with the following:

  • Manage, modify, and maintain our proprietary software for storing and transforming data from a wide variety of sources and delivery methods.
  • Design and build new components that scale to efficiently ingest, normalize, and process data from a growing number of different sources.
  • Run distributed computing jobs using Spark and Elastic MapReduce to prepare and transform terabytes of time-series and event data for modeling.
  • Integrate external APIs into current products and use their data to streamline and add value to existing offerings.
  • Assist and service the Data Science department through tasks such as building datasets, reducing

Qualifications & Skills

  • Bachelor’s degree in Mathematics, Computer Science, or a related field.
  • 2+ years of experience with Python 3 and its data science libraries, including scikit-learn and pandas.
  • Strong proficiency in at least one other programming language, and experience with shell scripting, especially Bash.
  • Proficiency with multiple SQL dialects, especially PostgreSQL, including knowledge of under-the-hood concepts such as indexing and query-plan analysis.
  • Experience extracting data from, and pushing data to, a variety of sources including relational and non-relational databases, RESTful APIs, flat files, FTP servers, and distributed file systems.
  • Excellent communication skills, especially when explaining difficult technical concepts to people in non-technical roles.
  • Strong analytical skills, especially when working with multiple large datasets.
  • Experience with “XaaS” cloud services. We are an AWS shop but will consider candidates with comparable experience on other cloud platforms.

Preferred Qualifications

  • Proficiency in JavaScript in both front-end (React) and back-end (Node.js) contexts, especially for middleware APIs.
  • Experience displaying data with Tableau, d3.js, or other data visualization tools.
  • Experience working with decision trees, clustering algorithms, and other supervised and unsupervised machine learning techniques.