Training: access to training data and testing (holdout) data. Was there sampling of any kind applied to create this dataset? Are we introducing any data leaks? Production: access to batches or real-time streams of ML content from various sources. How can we trust that this stream only has data that is consistent with what we have historically seen? Assumption: all of our incoming data is machine learning related (no spam). Reality: we would need a filter to remove spam content that's not ML related. Reason: to simplify our ML task, we will assume all the data is ML content. Our task. Labeling. Our task. Labels: categories of machine learning (for simplification, we've restricted the label space to the following tags: natural-language-processing, computer-vision, mlops and other). Features: text features (title and description) that describe the content. Assumption: Content can only belong ...
Posts
Machine Learning Product Design
Background. Set the scene for what we're trying to do through a user-centric approach: users: profile/persona of our users; goals: our users' main goals; pains: obstacles preventing our users from achieving their goals. Value proposition. Propose the value we can create through a product-centric approach: product: what needs to be built to help our users reach their goals? alleviates: how will the product reduce pains? advantages: how will the product create gains? Objectives. Break down the product into key objectives that we want to focus on. Solution. Describe the solution required to meet our objectives, including its: core features: key features that will be developed; integration: how the product will integrate with other services; alternatives: alternative solutions that we should consider; constraints: limitations that we need to be aware of; out-of-scope: features that we will not be developing for now. Feasibility. How feasible is our solution and do we...
Transforming Patterns Into Insights: From Observation to Action
After uncovering patterns during the initial phase of data exploration, the real value emerges in the next step: transforming those patterns into meaningful insights. These insights are not just summaries; they should tell a story that guides stakeholders toward action and strategy. Whether you’re analyzing customer behavior, operational efficiency, or market dynamics, the journey from “What do I see?” to “What should we do?” is the most important leap in data analytics. Why This Step Matters. Finding a trend is valuable, but without interpretation, it’s just noise. For example: • Observing that weekend sales drop significantly is interesting. • Understanding that this happens because of a lack of promotional activity on weekends leads to insight. • Acting on this by launching weekend campaigns leads to business impact. Use Questions to Dig Deeper. There’s no one-size-fits-all framework for insight development, but asking the right q...
Finding Hidden Patterns in Data: A Journey Through Exploration
When working with data, one of the most powerful skills you can develop is the ability to recognize meaningful patterns: not just the obvious ones, but the subtle structures that emerge when you slice, group, and visualize information from different perspectives. In this article, we’ll walk through a realistic example using a slightly complex dataset, ask guiding questions, and demonstrate how to discover insights that may not be immediately visible. The Dataset: Customer Product Interaction Logs. Let’s say you work at an e-commerce company, and you’re analyzing a dataset containing: • Customer ID • Product Category • Session Duration (mins) • Items Viewed • Items Purchased • Total Spend • Date • Device Type (Desktop, Mobile, Tablet) • Region. Your Goal: “Identify patterns in user behavior that could inform marketing campaigns and website personalization strategies.” Step 1: Basic Statistical Analysis. Befo...
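As a rough illustration of the "slice and group" idea in the excerpt above, the sketch below groups session rows by device type and computes a few summary figures. The rows, the field names (`device`, `items_viewed`, `items_purchased`, `total_spend`), and the `summarize_by` helper are all made up for this example; they only mirror the column list in the excerpt and are not the article's actual data or method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows mirroring a few of the listed columns (illustrative only).
sessions = [
    {"device": "Mobile", "items_viewed": 10, "items_purchased": 1, "total_spend": 20.0},
    {"device": "Mobile", "items_viewed": 10, "items_purchased": 1, "total_spend": 30.0},
    {"device": "Desktop", "items_viewed": 5, "items_purchased": 2, "total_spend": 80.0},
]


def summarize_by(rows, key):
    """Group rows by `key` and compute simple per-group statistics."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    summary = {}
    for name, members in groups.items():
        viewed = sum(r["items_viewed"] for r in members)
        purchased = sum(r["items_purchased"] for r in members)
        summary[name] = {
            "sessions": len(members),
            "avg_spend": mean(r["total_spend"] for r in members),
            # Share of viewed items that were purchased; 0.0 if nothing was viewed.
            "conversion": purchased / viewed if viewed else 0.0,
        }
    return summary


by_device = summarize_by(sessions, "device")
for device, stats in by_device.items():
    print(device, stats)
```

Passing a different key (for example a region field, if the rows carried one) would slice the same data from another perspective, which is the kind of regrouping the excerpt describes.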
IS TIKTOK NOW OPEN-SOURCE IN THE USA?
In a recent blog post titled “Rebuilding TikTok in America,” Perplexity has proposed acquiring TikTok’s U.S. operations with the intent to revamp its algorithm to better align with American privacy standards. Their plan includes open-sourcing the algorithm to enhance transparency and trust among users. Perplexity asserts that their technical expertise positions them uniquely to undertake this transformation without leading to monopolistic control in the short-form video market. This proposal emerges amidst a backdrop of potential bans and legal challenges facing TikTok in the U.S., primarily due to national security concerns over its Chinese parent company, ByteDance. Other entities, including Amazon, have also expressed interest in acquiring TikTok’s U.S. assets. Additionally, figures like Kevin O’Leary and Frank McCourt have shown interest in purchasing TikTok’s U.S. operations, each presenting their own visions for restructuring the platform to prioritize user privacy ...
TEXT DATA AND THEIR CORRELATIONS
Correlation analysis helps uncover relationships between variables. When comparing two pieces of text to each other, we aren’t looking for a single “correlation coefficient” in the statistical sense, but rather a similarity measure. These measures tell us how closely related or similar two texts are. First, a short look at a related case. Correlation of Text with Numerical Variables: text data is unstructured and non-numeric, so measuring a correlation that involves text requires converting the text into numeric form first. There are a couple of approaches to this, such as feature extraction or deriving a numeric score for each text. Either one gives you two numerical variables whose correlation can be calculated with metrics such as Kendall's tau, Pearson's correlation coefficient, or Spearman's rank correlation coefficient. Feature Extraction (Bag-of-Words/TF‑IDF): represent each text (document, sentence, etc.) as a vector of features such as word counts or TF‑IDF scores, each uni...
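The numeric-score route mentioned in the excerpt above can be sketched as follows. This is a minimal illustration, assuming word count as the derived score; the `reviews` and `ratings` values are invented, and the helper names (`word_count`, `pearson`, `ranks`, `spearman`) are hypothetical. Pearson and Spearman are written out in plain Python so the arithmetic is visible; in practice a statistics library would normally be used instead.

```python
from math import sqrt


def word_count(text):
    """Derive a simple numeric score from text: the number of whitespace-separated words."""
    return len(text.split())


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Invented example data: review texts and a numeric rating for each.
reviews = [
    "bad",
    "not great overall",
    "works fine for everyday use",
    "really enjoyed it and would happily buy this again",
]
ratings = [1, 2, 4, 5]

lengths = [word_count(r) for r in reviews]
print("Pearson:", pearson(lengths, ratings))
print("Spearman:", spearman(lengths, ratings))
```

Word count is only one possible score; a sentiment score or any other per-text number could be substituted without changing the correlation step.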
ACCESS ALL YOUR DATA AT ONE DESTINATION - DATA FABRIC
What is Data Fabric? Data exploration and extraction is the most time-consuming step when forming a research question, when companies want to analyze their data, or even when building large transformer frameworks for tasks across diverse domains. Our data usually comes in various formats (structured, semi-structured, or unstructured), and sometimes real-time extraction is needed. Data Fabric is intended as a solution to these data problems. Data Fabric is a new-generation data automation software that provides an abstraction layer over all the data. Whether the data is stored physically, comes from multi-cloud sources, or is hybrid in nature, it is brought together in one place by pulling its metadata into the abstraction layer, which lets the software immediately access any form or part of the data the organization needs. This helps streamline the data-related tasks of discovery, transformation, integration, and preparation, allowing the organization to make better use of its resources. Why U...