Our client, a leading pharmaceutical company, is hiring a Data Scientist on a contract basis.
Work Location:
Princeton, NJ – 50% on site
Knowledge/Skills Required/Education:
- Ph.D. in quantitative sciences/engineering (computer science, mathematics, statistics, or engineering).
- 5 years of relevant professional experience with a proven track record in machine learning and data science – experience in drug discovery machine learning is desirable but not required.
- Strong knowledge of one or more programming languages used for machine learning (e.g., Python (preferred), R, MATLAB, C/C++).
- Experience utilizing molecular features of small molecules in machine learning models.
- Experience with the use and application of Bayesian statistics and simulation methods in generating probabilistic outcomes.
- Ability to extract information from databases using a variety of software packages (e.g., Oracle SQL Developer).
- Ability to build and maintain databases aligned with enterprise solutions is desirable but not required.
- Strong analytical and problem-solving skills to understand technical business problems and implement solutions.
- Ability to work effectively on matrixed teams to collaboratively solve challenging problems, and to work independently with minimal resources.
- Good interpersonal, communication, writing, and organizational skills.
- Strong preference for on-site presence to enable colocation with data science team.
Responsibilities:
- Write Python scripts to enable rapid cleaning and analysis of medium- and high-throughput datasets.
- Utilize machine learning (ML) approaches to generate small-molecule features.
- Utilize Bayesian statistical approaches to estimate uncertainties in assay datasets, based on the ML outputs above.
- Write and document programming code (Python preferred) to facilitate data preparation and cleaning, model development, and evaluation.
- Produce high-quality scripts, documentation, and a processing pipeline by the end of 2023.
- Create a deployable version of the processing pipeline for near-term use as a stand-alone application and, ultimately, for integration with the enterprise suite.