A modular ecosystem under this. namespace.
Updated Dec 19, 2025 - Rust
🩺 Machine learning diabetes prediction model using a Support Vector Machine (SVM) classifier. Analyzes 8 medical features (glucose, BMI, age, etc.) from the Pima Indians Diabetes dataset to predict diabetes risk with 75-80% accuracy. Built with Python, scikit-learn, and pandas. Includes data preprocessing, model training, and a prediction system.
Example code accompanying the Sternberg concept cell data release for Kyzar et al. (2024)
A digital transformation of cyber assessment and authorization data with a relational schema
Unifying Biotic Interactions Data: Terminology, Data Analysis, Standardization, and Proposal of a Data Schema for Plant-Pollinator Interactions
Feature Engineering with Python
Prepare and check data to comply with Darwin Core Standard in R
A Python-based data cleaning project that streamlines QuickBooks invoice data for analysis, paving the way for improved insights into sales, pricing, and inventory management.
A new package processes textual descriptions of drone designs to extract structured summaries of their operational capabilities. It focuses on identifying and categorizing key features such as locomot
Building a modern data warehouse with SQL Server, including ETL Processes, Data Modeling and Analytics
Highlighting expertise in data migration, data normalization and standardization, this project demonstrates successful data transfer from Snowflake to Databricks. It emphasizes optimized data flow and enhanced accessibility through standardization, showcasing a commitment to ethical data practices.
Hi folks! During my internship at KultureHire, I completed an end-to-end data analytics project. I created executive and functional dashboards using pivot tables, conducted a thorough analysis, and provided actionable recommendations. I'm excited to share my work and the insights I discovered.
This project is about cleaning and preparing a global layoffs dataset for analysis, focusing on handling null values, correcting data types, and ensuring data integrity for more accurate insights.
csv-managed is a Rust command-line utility for high‑performance exploration and transformation of CSV data at scale, emphasizing streaming, typed operations, and reproducible workflows via schema and index files.
vuln-structure is a package that extracts vulnerability details from raw text and outputs standardized, structured data for security teams.
🌟 Data Cleaning and Processing 🌟 Handled missing values, removed duplicates, standardized salary formats, and treated outliers for consistency. Refining the dataset revealed trends in company performance, job roles, and salary distributions. This project highlights the power of data preprocessing as the backbone of reliable analytics.
The call center provided a messy dataset of customers. The objective was to clean, standardize, and remove duplicates to create an accurate, organized contact list. I used Pandas to load, explore, clean, and export the data, delivering a refined list ready for effective customer outreach.
CDIS data standardization with SAS and R
Tutorial code for performing PCA (with mathematical explanation) on breast cancer features computed from digitized images of fine needle aspirate (FNA) of a breast mass. Center the data, calculate correlation matrix, compute principal components, visualize and interpret results.
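The PCA steps named in the description above (center the data, calculate the correlation matrix, compute principal components) can be sketched in plain Python. This is a minimal illustration, not the tutorial's actual code: the function names, the feature names, and the toy values are all hypothetical, and power iteration stands in for a full eigendecomposition, so it recovers only the first principal component.

```python
import math
import random
import statistics

def standardize(columns):
    """Center each column and scale it to unit sample standard deviation."""
    result = []
    for col in columns:
        mean = statistics.fmean(col)
        std = statistics.stdev(col)  # sample standard deviation (n - 1)
        result.append([(x - mean) / std for x in col])
    return result

def correlation_matrix(z_columns):
    """Correlation matrix computed from already-standardized columns."""
    n = len(z_columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in z_columns]
            for ci in z_columns]

def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def first_component(matrix, iterations=200, seed=0):
    """Power iteration: dominant eigenvector and eigenvalue of a symmetric matrix.

    A seeded random start makes it very unlikely that the start vector is
    orthogonal to the dominant eigenvector, which would stall the iteration.
    """
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in matrix]
    for _ in range(iterations):
        nxt = mat_vec(matrix, vec)
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Rayleigh quotient of a unit vector gives the matching eigenvalue.
    eigenvalue = sum(a * b for a, b in zip(vec, mat_vec(matrix, vec)))
    return vec, eigenvalue

# Toy values for three hypothetical features (NOT taken from the real dataset).
features = [
    [1.0, 2.0, 3.0, 4.0, 5.0],   # "radius"
    [2.0, 4.1, 5.9, 8.2, 9.8],   # "texture"
    [9.0, 7.5, 6.2, 3.9, 2.1],   # "smoothness"
]

z = standardize(features)
corr = correlation_matrix(z)
pc1, eigenvalue = first_component(corr)

# The trace of a correlation matrix equals the number of features, so the
# eigenvalue divided by that count is the share of variance explained.
explained = eigenvalue / len(features)
print("first principal component:", pc1)
print("explained variance ratio:", explained)
```

Working from the correlation matrix rather than the covariance matrix means every feature is standardized first, so features measured on large scales do not dominate the components.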