Remote Data Analyst - AI & Threat Analytics Team
Job Overview
We are in search of a skilled and dedicated Data Analyst to join our innovative AI & Threat Analytics team. This remote role provides a unique opportunity to play a pivotal part in enhancing our autofill classification models through effective management, optimization, and analysis of diverse datasets. Candidates residing in the El Dorado Hills, CA, or Chicago, IL, metro areas may also have the option for a hybrid work arrangement.
Key Responsibilities
Manage the complete data lifecycle, including collection, cleaning, and preprocessing for HTML-centric datasets utilized in machine learning applications.
Leverage web analysis tools to extract and structure data from DOM environments, facilitating model training and validation.
Collaborate effectively with machine learning engineers to support feature engineering initiatives and create training datasets tailored to model specifications.
Generate and enhance synthetic datasets employing large language models (LLMs) to improve the balance and accessibility of training data.
Conduct data analysis utilizing dimensionality reduction techniques (such as t-SNE, PCA, and UMAP) to assess feature efficacy and optimize dataset integrity.
Automate data workflows to enhance the efficiency of data processing, manipulation, and transformation tasks.
Document data workflows, processes, and methodologies comprehensively to ensure data lineage, replicability, and scalability.
Establish validation protocols and data quality systems to maintain consistency and reliability across all datasets.
Required Skills
Proficiency in Python for data manipulation and analysis, including the use of libraries like Pandas and NumPy, as well as for workflow automation.
Extensive experience with web analysis tools (e.g., Selenium, BeautifulSoup) and a solid grasp of HTML and DOM structures for data extraction and preprocessing.
Knowledge of natural language processing (NLP) methods such as tokenization, stop word removal, and lemmatization for preparing text data.
Experience in generating synthetic datasets and utilizing LLMs to support machine learning data requirements.
Strong problem-solving capabilities and a detail-oriented approach to ensuring data quality and governance.
Familiarity with cloud platforms (AWS, GCP, Azure) for data storage and processing.
Qualifications
A minimum of 2 years of professional experience as a Data Analyst, ideally in a cybersecurity or machine learning context.
Excellent collaboration skills, particularly with machine learning engineers and other technical teams.
A Bachelor's degree in Data Science, Statistics, Computer Science, or a related discipline, or equivalent experience.
Given the role's interaction with GovCloud, all applicants must be classified as a US Person.
Career Growth Opportunities
This position provides substantial opportunities for professional development, enabling you to collaborate with leading machine learning engineers and enhance your expertise in data analysis and machine learning methodologies.
Company Culture and Values
We are committed to fostering a diverse and inclusive workplace that encourages collaboration and innovation, where every employee's contributions are valued.
Compensation and Benefits
Comprehensive medical, dental, and vision insurance (including domestic partnership coverage).
Employer-paid life insurance and supplemental life insurance options for employees and their families.
Voluntary short-term and long-term disability insurance.
401(k) plan options, including both Roth and traditional plans.
Generous paid time off (PTO) policy that acknowledges your dedication and tenure, with provisions for paid bereavement and jury duty leave.
Competitive annual bonuses.
Employment Type: Full-Time