2/26/25

Discovering Machine Learning - A Primer Guide

Overview

Machine Learning can seem like a complex and mysterious field. This presentation demystifies its core concepts, providing a primer on key ideas like supervised and unsupervised learning, along with practical examples that illustrate their real-world applications. We'll also explore a GitHub repository with code examples to help you further your understanding and experimentation.

#BuildwithAI Series

Discovering Machine Learning - A Primer Guide

  • Follow this GitHub repo during the presentation: (Please star and follow the project for updates.)

👉 https://github.com/ozkary/machine-learning-engineering

YouTube Video

Video Agenda

Agenda:

  1. What is Machine Learning?

    • Definition and core concepts
  2. Why is Machine Learning Important?

    • Key applications and benefits
  3. Types of Machine Learning

    • Supervised Learning
    • Examples: Classification & Regression
    • Unsupervised Learning
    • Examples: Clustering & Dimensionality Reduction
  4. Problem Types

    • Regression: Predicting continuous values
    • Classification: Predicting categorical outcomes
  5. Model Development Process

    • Understand the Problem
    • Exploratory Data Analysis (EDA)
    • Data Preprocessing
    • Feature Engineering
    • Data Splitting
    • Model Selection
    • Training & Evaluation

Presentation

What is Machine Learning?

ML is a subset of AI that focuses on enabling computers to learn and improve their performance on a specific task without being explicitly programmed. In essence, it is about learning patterns from data to make predictions or decisions.

Core Concepts

  • Learn from data
  • Improve performance with more training data
  • Main goal is to make predictions and decisions on new data
  • Learn the relationship between data and outcomes to define the model
  • New data + the model produces a predicted outcome

Discovering Machine Learning - A Primer Guide: What is Machine Learning?

Why is Machine Learning Important?

ML impacts how computers solve problems. Traditional systems rely on pre-defined rules programmed by humans. This approach struggles with complexity and doesn't adapt to new information. In contrast, ML enables computers to learn directly from data, similar to how humans learn.

  • Coding Rules
def heart_disease_risk_rule_based(age, overweight, diabetic):
    """
    Assesses heart disease risk based on a set of predefined rules.

    Args:
        age: Age of the individual (int).
        overweight: True if overweight, False otherwise (bool).
        diabetic: True if diabetic, False otherwise (bool).

    Returns:
        "High Risk", "Moderate Risk", or "Low Risk" (str).
    """
    if age > 50 and overweight and diabetic:
        return "High Risk"
    elif age > 60 and (overweight or diabetic):
        return "High Risk"
    elif age > 40 and overweight and not diabetic:
        return "Moderate Risk"
    else:
        return "Low Risk"
  • Learning from data
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score


# data holds the labeled records with Age, Overweight, Diabetic and Heart Disease columns,
# defined earlier in the notebook
df = pd.DataFrame(data)
# Prepare the data
X = df[['Age', 'Overweight', 'Diabetic']]  # Features
y = df['Heart Disease']  # Target

# Split data into training and testing sets
# X holds the features, y holds the target value
# test_size=0.2 uses 20% of the data for testing and 80% for training
# random_state=42 is the seed for reproducible shuffling

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Random Forest classifier
model = RandomForestClassifier()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of the model : {accuracy}")

# 70% - 80%: Often considered a reasonable starting point for many classification problems.
# 80% - 90%: Good performance for many applications.
# 90% - 95%: Very good performance. Often challenging to achieve, but possible for well-behaved problems with good data.
# > 95%: Excellent performance, potentially approaching the limits of what's possible for the problem. Be careful of overfitting if you're achieving very high accuracy.
# 100%: Usually a sign of overfitting.

👉 Jupyter Notebook

Types of ML Models - Supervised Learning

Examples

  • Regression: Predicting a continuous value (e.g., house prices, stock prices).
  • Classification: Predicting a category or class label (e.g., cat/dog/bird, disease/no disease).
  • Model Examples: Linear Regression, Logistic Regression, Decision Trees, Random Forest.
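
As a minimal sketch of supervised learning with scikit-learn (the tiny dataset below is made up for illustration only), a regression model learns from labeled examples and then predicts a continuous value for new data:

import numpy as np
from sklearn.linear_model import LinearRegression

# Labeled data: house size in square feet (feature) and price (continuous target)
X = np.array([[850], [900], [1200], [1500], [1800]])
y = np.array([100_000, 110_000, 150_000, 185_000, 220_000])

# Fit a linear regression model to learn the size -> price relationship
model = LinearRegression()
model.fit(X, y)

# Predict the price of a new, unseen house
predicted_price = model.predict([[1400]])
print(f"Predicted price: {predicted_price[0]:,.0f}")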

Discovering Machine Learning - A Primer Guide: Supervised Learning

Types of ML Models - Unsupervised Learning

Examples

  • Clustering: Grouping similar data points together (e.g., group patients by symptoms, age groups)

  • Association: Discovering relationships or associations between items (e.g., symptom association)

"Patients who report 'Fever' and 'Cough' are also frequently reporting 'Headache' or 'Muscle Aches'."

  • Model Examples: Clustering (k-means), association (Frequent Pattern Growth)
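
As a minimal sketch of unsupervised learning with scikit-learn (the data points are made up for illustration only), k-means groups unlabeled data by similarity without being told what the groups mean:

import numpy as np
from sklearn.cluster import KMeans

# Unlabeled data: patient age and number of reported symptoms
X = np.array([[25, 1], [30, 2], [28, 1],
              [65, 5], [70, 6], [68, 4]])

# Group the patients into two clusters based on similarity
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
labels = kmeans.fit_predict(X)

print(labels)                    # cluster assignment for each patient
print(kmeans.cluster_centers_)   # center of each cluster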

Discovering Machine Learning - A Primer Guide: Unsupervised Learning

Supervised Learning - Common Problem Types

Regression and Classification are the two main problem types to solve. With Regression, we predict a continuous target variable like price or cost. With Classification, we predict a discrete target like a group or a yes/no class.

Problem Types

  1. Regression:

    • In regression, the target variable is continuous and represents a quantity or a number.
    • Example: Predicting house prices, temperature predictions, stock prices.
  2. Classification:

    • In classification, the target variable is discrete and represents a category or a class.
    • Example: spam vs. non-spam emails, predicting heart disease Y/N.

Discovering Machine Learning - A Primer Guide: Regression and Classification

ML Model Development Process - MLOps

Developing a new ML model involves understanding the core problem, then using a data engineering process to gather, explore, and prepare the data. We then move to the ML process to split the data, select the algorithm, train and evaluate the model.

  • Development Process
    • Understand the problem
    • Exploratory Data Analysis (EDA)
    • Data Preprocessing
    • Feature Engineering
    • Data Splitting
    • Model Selection
    • Model Training
    • Model Evaluation & Tuning
    • Deployment

MLOps is the operational process for managing the training, evaluation, and deployment of your models.

Discovering Machine Learning - A Primer Guide: MLOps process
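
Before the full Vehicle MSRP example linked below, here is a minimal sketch of the splitting, preprocessing, training, and evaluation steps using scikit-learn. The tiny dataset is made up for illustration only and is not from the repository:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Illustrative data: engine size and horsepower as features, MSRP as the target
df = pd.DataFrame({
    'engine_size': [1.6, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 2.2],
    'horsepower':  [120, 150, 190, 240, 280, 320, 420, 160],
    'msrp':        [18000, 23000, 27000, 35000, 42000, 55000, 75000, 24000],
})

# Data splitting: hold out 25% of the rows for evaluation
X = df[['engine_size', 'horsepower']]
y = df['msrp']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Data preprocessing + model selection combined in a single pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', LinearRegression()),
])

# Training and evaluation
pipeline.fit(X_train, y_train)
rmse = mean_squared_error(y_test, pipeline.predict(X_test)) ** 0.5
print(f"RMSE: {rmse:,.0f}")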

👉 Vehicle MSRP Regression

Machine Learning Summary

Machine learning (ML) enables computers to learn patterns from data and make predictions or decisions without explicit programming, unlike rule-based systems. ML models improve as they process more data.

  • Supervised Learning:

    • Learns from labeled data (input-output pairs)
    • Regression: Predicts continuous values (e.g., house prices)
    • Classification: Predicts categories (e.g., heart disease y/n)
  • Unsupervised Learning:

    • Learns from unlabeled data (only inputs) and discovers patterns and structures.
    • Clustering: Grouping similar data points (e.g., group patients by symptoms, age groups)
    • Association: Discovering relationships between items (e.g., symptoms association)

While we've explored foundational areas, numerous other exciting topics exist, such as neural networks, natural language processing, computer vision, and large language models (LLMs). Visit the repository for more exploration.

Thanks for reading! 😊 If you enjoyed this post and would like to stay updated with our latest content, don’t forget to follow us. Join our community and be the first to know about new articles, exclusive insights, and more!

Leave comments on this post or contact me at:

👍 Originally published by ozkary.com

1/22/25

Smart Charts: Powered by AI to enhance Data Understanding

Overview

This presentation explores how Generative AI, particularly Large Language Models (LLMs), can empower engineers with deeper data understanding. We'll delve into creating complex charts using Python and demonstrate how LLMs can analyze these visualizations, identify trends, and suggest actionable insights. Learn how to effectively utilize LLMs through prompt engineering with natural language and discover how this technology can save you valuable time and effort.

#BuildwithAI Series

Smart Charts: Powered by AI to enhance Data Understanding

  • Follow this GitHub repo during the presentation: (Give it a star and follow the project)

👉 https://github.com/ozkary/ai-engineering

YouTube Video

Video Agenda

Agenda:

  1. Introduction to LLMs and their Role in Data Analysis and Training

    • What are LLMs, and how do they work?
    • LLMs in the context of data analysis and visualization.
  2. Prompt Engineering - Guiding the LLM

    • Crafting effective prompts for chart analysis.
    • Providing context within the prompt (chart type, data).
  3. Tokens - The Building Blocks

    • Understanding the concept of tokens in LLMs.
    • How token limits impact prompt design and model performance.
  4. Let AI Help with Data Insights - Real Use Case

    • Creating complex charts using Python libraries.
    • Write Prompts for Chart Analysis
    • Utilizing an LLM to analyze the generated charts.
    • Demonstrating how LLMs can identify trends, anomalies, and potential areas for improvement.
  5. Live Demo - Create complex charts using python and ask AI to help you with the analysis

    • Live coding demonstration of creating a complex chart and using an LLM to analyze it.

Why Attend?

  • Discover how to leverage LLMs to gain deeper insights from your data visualizations.
  • Learn practical techniques for crafting effective prompts to guide LLM analysis.
  • Enhance your data analysis skills with the power of AI.

Presentation

What are LLM Models - Not Skynet

A Large Language Model (LLM) is a class of Generative AI model designed to understand prompts and questions and to generate human-like text based on large amounts of training data. LLMs are built upon Foundation Models, which focus on language understanding.

Common Tasks

  • Text and Code Generation: LLMs can generate code and data analysis text based on specific prompts

  • Natural Language Processing (NLP): Understand and generate human language, sentiment analysis, translation

  • Text Summarization: LLMs can condense lengthy pieces of text into concise summaries

  • Question Answering: LLMs can access and process information from various sources to answer questions, making a great fit for chatbots

Smart Charts with AI: What are LLMs?

Training LLM Models - Secret Sauce

Models are trained using a combination of machine learning and deep learning. Massive datasets of text are collected, cleaned, and fed into complex neural networks with multiple layers. These networks iteratively learn by analyzing patterns in the data, allowing them to map inputs like chart data to desired outputs, such as chart analysis.

Training Process:

  • Data Collection: Sources from books, articles, code repositories, and online conversations

  • Preprocessing: Data is cleaned and formatted so the ML algorithms can process it effectively

  • Model Training: The neural network architecture is trained on the data. The network adjusts its internal parameters to learn how to map input data (user stories) to desired outputs (code snippets)

  • Fine-tuning: Fine-tune models for specific tasks like code generation, by training the model on relevant data (e.g., specific programming languages, coding conventions).

Smart Charts with AI: Neural-Network

Transformer Architecture - Not Autobots

Transformer is a neural network architecture that excels at processing long sequences of text by analyzing relationships between words, no matter how far apart they are. This allows LLMs to understand complex language patterns and generate human-like text.

Components

  • Encoder: Processes the input (user story) using multiple encoder layers with a self-attention mechanism to analyze the relationships between words

  • Decoder: Uses the encoded information and its own attention mechanism to generate the output text (like code), ensuring it aligns with the input text.

  • Attention Mechanism: Enables the model to effectively focus on the most important information for the task at hand, leading to improved NLP and generation capabilities.

Smart Charts with AI: Transformers encoder decoder attention mechanism

👉 Read: Attention is all you need by Google, 2017
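
For reference, the core operation in that paper is scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V. Here is a minimal NumPy sketch with toy shapes, not an actual LLM implementation:

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # similarity between tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax: attention weights
    return weights @ V                                 # weighted sum of the values

# Toy example: 3 tokens, 4-dimensional embeddings
rng = np.random.default_rng(42)
Q = K = V = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(Q, K, V))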

Fine-Tuning for Specific Domain

Fine-tuning is the process of specializing a pre-trained model for a specific domain, such as data analysis with your own process information.

Process:

  • Start from a source model: the knowledge and parameters gained from pretraining on a large dataset
  • Enhance the model's performance by retraining the source model on a smaller, domain-specific dataset
  • Use the resulting target model for the final integration

Smart Charts with AI: Fine-tuning a model

Tokens - The Building Blocks of Language Models

Large language models work by dissecting text into a sequence of tokens. These tokens act as the building blocks, allowing the model to grasp the essence, structure, and connections within the text.

Details

  • Tokens can be individual words, punctuation marks, or even smaller sub-word units, depending on the specific LLM architecture.
  • The length of a word can influence the number of tokens it generates.
  • Similar to how Lego bricks come in various shapes and sizes, tokens can vary depending on the model's design.
  • Tokens are also the unit used to measure usage and cost.

👉 Think of tokens as Lego blocks
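
As a quick illustration, here is how a prompt is tokenized with the tiktoken library (assumed installed; OpenAI models use it, while other LLMs use different tokenizers):

import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")   # tokenizer used by several OpenAI models

text = "Interpret this chart, data series, limits and action items to take?"
tokens = encoding.encode(text)

print(f"Token count: {len(tokens)}")   # token count drives context limits and cost
print(tokens[:5])                      # first few token ids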

Prompt Engineering - What is it?

Prompt engineering is the process of designing and optimizing prompts to better utilize LLMs. Well-crafted prompts help AI models better understand the context and generate more accurate responses.

Features

  • Clarity and Specificity: Effective prompts are clear, concise, and specific about the task or desired response

  • Task Framing: Provide background information, specifying the desired output format (e.g., code, email, poem), or outlining specific requirements

  • Examples and Counter-Examples: Including relevant examples and counterexamples within the prompt can further guide the LLM

  • Instructional Language: Use clear and concise instructions to improve the LLM's understanding of what information to generate

Smart Charts with AI: Data Analysis Prompt
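
As an illustrative example of these features applied to chart analysis (the wording is hypothetical, not a template from the repository):

# Illustrative prompt that applies clarity, task framing, and instructional language
chart_prompt = """
You are analyzing a control chart of Curvature measurements.
Chart type: line chart with upper and lower control limits.
X-axis: sample number. Y-axis: curvature value.
Data series: daily curvature readings for the last 30 days.

Task: Interpret the chart. Identify the overall trend, any points outside
the control limits, and suggest two concrete action items.
Respond with a short bulleted summary.
"""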

Chart Analysis with AI

By combining existing charts with AI-driven analysis, we can unlock deeper insights, automate interpretation, and empower users to make more informed decisions.

Data Analysis Flow:

  • Chart Data: Identify the key data points from the chart (e.g., x-axis values, y-axis values, data labels, limits).

  • Chart Prompt: Present this information in a concise, human-readable format, such as:

"We are looking at a control chart measuring Curvature data points…"

  • Analysis Prompt: Provide details about what you would like to learn from the data:

"Interpret this chart, data series, limits and action items to take?"

👉 LLM-generated analysis is not perfect; if the prompt is not detailed enough, hallucinations may occur

Smart Charts with AI: Data Analysis Chart
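
A minimal sketch of this flow end to end, assuming the openai Python client with an API key in the environment (the model name is a placeholder, and the chart details echo the example above):

# Chart analysis flow sketch: chart data -> chart prompt -> analysis prompt -> LLM
from openai import OpenAI

client = OpenAI()   # reads OPENAI_API_KEY from the environment

chart_prompt = (
    "We are looking at a control chart measuring Curvature data points. "
    "X-axis: sample number. Y-axis: curvature value. "
    "Upper and lower control limits are included in the data series."
)
analysis_prompt = "Interpret this chart, data series, limits and action items to take?"

response = client.chat.completions.create(
    model="gpt-4o-mini",   # placeholder model name
    messages=[
        {"role": "system", "content": "You are a data analyst reviewing chart data."},
        {"role": "user", "content": f"{chart_prompt}\n\n{analysis_prompt}"},
    ],
)
print(response.choices[0].message.content)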

Smart Charts Powered by AI - Summary

LLMs empower developers, data engineers/analysts, and scientists by enhancing data understanding through AI-driven chart analysis. To ensure accurate and insightful analysis, crafting detailed prompts is crucial.

  • Provide Chart Context:

    • Chart Type: (e.g., line chart, bar chart, pie chart, scatter plot)
    • Chart Title
    • Data Series
    • Data Ranges/Limits: (e.g., Time period, Upper/Lower Limits)
  • Provide Guiding Questions:

    • What is the overall trend of the data?
    • Are there any significant peaks or dips?
    • Are there any outliers or anomalies?
    • What are the key takeaways from this chart?
    • What actions, if any, should be considered?

👉 By framing the prompt with contextual and guiding questions, you effectively "train" the model to analyze the chart in a more human-like and insightful manner.

Thanks for reading! 😊 If you enjoyed this post and would like to stay updated with our latest content, don’t forget to follow us. Join our community and be the first to know about new articles, exclusive insights, and more!

Leave comments on this post or contact me at:

👍 Originally published by ozkary.com