Thanks for your interest in the ML Data Engineer – Healthcare Data Curation & Cleaning (1 Year Fixed Term) position.
Unfortunately, this position has been closed.
DESIRED QUALIFICATIONS:
● 3+ years of experience in software development and data engineering with a strong focus on data cleaning, transformation, and creation.
● Proficiency in Python and experience with data processing libraries (e.g., Pandas, Polars, NumPy).
● Hands-on experience in building and maintaining automated data pipelines for large-scale data processing.
● Familiarity with machine learning frameworks (e.g., PyTorch, JAX, scikit-learn) as applied to data quality and augmentation tasks.
● Expertise in working with healthcare data, including familiarity with the OMOP Common Data Model (OMOP CDM).
● Strong experience in a Linux environment and comfort with UNIX command-line tools.
● Proven ability to work collaboratively in multidisciplinary teams and communicate technical concepts effectively.
PREFERRED QUALIFICATIONS:
● Experience with cloud platforms (e.g., GCP, AWS, or Azure) and distributed computing frameworks.
● Proficiency with version control systems (e.g., Git) and containerization tools (e.g., Docker).
● Familiarity with healthcare data standards and regulatory requirements.
EDUCATION & EXPERIENCE (REQUIRED):
Bachelor’s degree in a scientific or analytic field and five years of relevant experience, or a combination of education and relevant experience.
KNOWLEDGE, SKILLS AND ABILITIES (REQUIRED):
• Knowledge of key data structures, algorithms, and techniques pertinent to systems that support high-volume, high-velocity, or high-variety datasets (including data mining, machine learning, NLP, data retrieval).
• Experience with relational, NoSQL, or NewSQL database systems and data modeling, structured and unstructured.
• Experience in parallel and distributed data processing techniques and platforms (MPI, Map/Reduce, Batch).
• Experience with scripting languages and with debugging them; experience with high-performance/systems languages and techniques.
• Knowledge of benchmark software development and programmable fields/systems; ability to analyze systems and data pipelines and propose solutions that leverage emerging technologies.
• Ability to use and integrate security controls for web applications, mobile platforms, and backend systems.
• Experience deploying reliable data systems and data quality management.
• Ability to research, evaluate, architect, and deploy new tools, frameworks, and patterns to build scalable Big Data platforms.
• Ability to document use cases, solutions and recommendations.
• Demonstrated excellence in written and verbal communication skills.
CERTIFICATIONS & LICENSES:
None
PHYSICAL REQUIREMENTS*:
• Frequently sit, grasp lightly, use fine manipulation and perform desk-based computer tasks, lift, carry, push, and pull objects that weigh up to ten pounds.
• Occasionally sit, use a telephone or write by hand.
• Rarely kneel, crawl, climb, twist, bend, stoop, squat, reach or work above shoulders, sort, file paperwork or parts, operate foot and hand controls.
* - Consistent with its obligations under the law, the University will provide reasonable accommodation to any employee with a disability who requires accommodation to perform the essential functions of his or her job.