Our client, a leading utility company, is hiring a Post-Doctoral Research Scientist on a contract basis.
Summit, NJ (Hybrid Working Model)
- Applies supervised and unsupervised methods, learning from large volumes of unlabeled data, to derive insights from unstructured text
- Ensures life-cycle management of code through version control and associated repositories.
- Develops high-quality analytical and statistical models, insights, patterns, and visualizations that can be used to improve decision making in manufacturing operations.
- Responsible for documenting all technical work, both within and outside of formal document management systems
- Independently develops code and analytical models to automate data transformation and analysis
- Performs data engineering, preprocessing, exploratory data analysis, and model development by interacting with a variety of databases
- Responsible for ingestion, integration and delivery of data across multiple platforms
- Works to maintain and uphold data integrity and clean data principles
- Communicates with team members regularly to provide updates and collaborate on deliverables.
- Displays a high level of teamwork and collaboration both within and across functions
- PhD in a quantitative area of study (Computer Science preferred) with knowledge of deep learning methods for NLP
- Strong background and demonstrable experience in Natural Language Processing and Computational Linguistics are required
- Proficient in writing and developing analytical and machine learning models using Python modules including pandas, NumPy, scikit-learn, and TensorFlow; experience developing and implementing MLOps pipelines
- Experience building analytical and statistical models to answer key business questions
- Experience using git via the command line
- Strong understanding of core statistical concepts to solve real world problems
- Intermediate to advanced SQL proficiency (3+ years' post-academia experience as an independent contributor designing and delivering data solutions)
- Experience interacting with various data warehouses and large-scale, complex datasets using ETL and BI tools and platforms.
- Self-motivated to identify and propose novel methodologies that will drive increased efficiency
- Demonstrated expert knowledge of machine learning and rule-based systems as applied to computational linguistics and natural language processing, as well as of developing and executing annotation tasks with teams of experts
- Proficiency in mathematics with the skill to translate complex mathematical algorithms into usable computational methods
- Experience with data mining and analysis techniques across disparate data sources
- Experience working in LINUX/UNIX environments
- Experience interacting with PostgreSQL, Oracle, Cloudera Impala, Okera, or similar databases
- Experience with JupyterLab, Anaconda, and RStudio
- Intermediate proficiency with Python
- Experience developing visualizations using a variety of libraries (Plotly, Matplotlib, Seaborn)
- Experience working within Domino Data Lab projects
- Technical knowledge of performance tuning and query optimization across large data sets.
- Experience with data cataloguing and enablement through APIs
- Experience with a variety of programming languages (C++, Java, HTML/CSS)
- Exposure to bioprocess engineering/cell therapy data
- Knowledge of GxP requirements (preferably related to data and code management)
- Dashboard development experience (Tableau, Spotfire, Dash)
- Experience working in the pharmaceutical industry