Data Scientist:
Our direct client, a fast-growing fintech firm, is seeking a Data Scientist. The AI/ML team is developing cutting-edge solutions to establish a unique competitive edge for the company. The Data Scientist will play a key role in developing AI models and engineering the firm’s core machine learning and AI products. This individual will work in a collaborative environment across the machine learning, product management, data engineering, and software engineering teams. The ideal candidate will be passionate about leveraging machine learning techniques to drive innovation and will have a strong background in researching and developing AI models.
This position is based in New York City, with 4 days per week onsite expected. The base salary is in the $140-160K range, DOE, plus bonus and stock options.
Responsibilities
- Build and integrate AI/ML/DS tools and workflows to address business needs and increase business efficiency.
- Support the design, development, training, and deployment of AI/ML models and engineering solutions to solve business problems through a full development and production cycle in the FinTech domain.
- Build and leverage new and existing tools for Large Language Model (LLM), Natural Language Processing (NLP), Optical Character Recognition (OCR), and intelligent document processing tasks.
- Evaluate and compare the performance of different AI/ML algorithms and models.
- Contribute to the improvement of Machine Learning Operations (MLOps) pipelines and procedures to ensure efficiency, scalability, and maintainability.
- Ensure the reliability, robustness, and scalability of machine learning models in production environments.
- Collaborate with cross-functional teams, including machine learning engineers, product managers, and full stack engineers, to deliver scalable machine learning solutions.
Qualifications
- 4+ years of experience as a hands-on data scientist or AI/ML engineer in AI/ML/DS fields
- Advanced degree in a relevant field such as AI, ML, Data Science, mathematics, or computer science
- Experience building ML and AI models and systems in a production environment in at least one of Generative AI/LLM or NLP applications
- Experience working with LLMs, such as GPT-4, Llama 3, Mistral, and other commercial or open-source models, in a production environment
- Knowledge of NLP techniques, including text data preprocessing (tokenization, stemming, text normalization, etc.) and information extraction (summarization, question answering, etc.)
- Proficiency in Python and libraries/frameworks such as TensorFlow, PyTorch, spaCy, and scikit-learn
- Strong knowledge of machine learning algorithms and statistical techniques, their limitations, and implementation challenges
- Experience with cloud platforms and distributed computing environments, such as AWS, Google Cloud, or Azure
- Direct contributions to experiments, including designing experimental details, writing reusable code, running evaluations, and organizing results
- Strong problem-solving skills and the ability to work independently and collaboratively in a fast-paced, agile environment
- Strong communication skills and the ability to articulate technical concepts effectively to both technical and non-technical audiences
- Experience with data visualization tools and techniques to effectively communicate and present findings
- Publication record as a lead author or essential contributor at top venues such as CHI, NeurIPS, UIST, ICML, ICLR, ACL, EMNLP, CVPR, AAAI, and/or ICAPS
- Portfolio of personal projects on GitHub, Bitbucket, Google Colab, Kaggle, etc.
- Understanding of regulatory and compliance requirements in the financial industry and their implications for machine learning applications
- Experience with software development best practices, including source control (Git), CI/CD pipelines, testing, and documentation
- Familiarity with database integration principles and practices, including SQL and NoSQL databases and data warehouse solutions, such as Snowflake
- Experience with data transformation tools, such as dbt, and orchestration tools, such as Airflow