CS420 – Data Science

Data Science is a multidisciplinary field that combines programming, statistical analysis, and domain expertise to extract meaningful insights from structured and unstructured data. This course begins with the foundations of data collection and preprocessing, teaching students how to clean, normalize, and transform raw datasets using tools like Pandas, NumPy, and OpenRefine. Students are introduced to exploratory data analysis (EDA) through visualization libraries such as Matplotlib, Seaborn, and Plotly, which help identify trends, patterns, and anomalies.

The course progresses into machine learning, where students learn the difference between supervised and unsupervised learning. Algorithms like linear regression, decision trees, K-nearest neighbors, K-means clustering, and Naïve Bayes are explored, with emphasis on model training, testing, and validation. Students also work with scikit-learn and are briefly introduced to deep learning using frameworks such as TensorFlow or PyTorch.

Other key components include Big Data processing using Spark, cloud-based data handling (Google Colab, AWS S3), and database querying (SQL/NoSQL). The course culminates in a capstone project where students build a complete data pipeline, from ingestion to prediction, showcasing real-world applications in finance, health, business, or social media.
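To make the train/test workflow described above concrete, here is a minimal sketch of K-nearest neighbors classification with a held-out test set. It is written with the Python standard library only so each step is visible; it is not taken from the course materials, and in practice scikit-learn would likely be used instead. The toy dataset and the helper names (train_test_split, knn_predict, accuracy) are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, point, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test rows whose predicted label matches the true label."""
    correct = sum(knn_predict(train, features, k) == label
                  for features, label in test)
    return correct / len(test)


# Hypothetical toy dataset: two well-separated clusters of 2D points.
data = [((x, y), "low") for x in range(0, 4) for y in range(0, 4)]
data += [((x, y), "high") for x in range(10, 14) for y in range(10, 14)]

train, test = train_test_split(data)
print(f"train={len(train)} test={len(test)} "
      f"accuracy={accuracy(train, test):.2f}")
```

The model is scored only on rows it never saw during training, which is the point of keeping a separate test set. Because the two clusters in this toy data are far apart, every test point is classified correctly; real datasets are rarely this clean.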

CS420 – Data Science

This course provides an introduction to data science concepts, including data processing, analysis, visualization, and machine learning. Students learn to use Python and tools like Pandas, Matplotlib, and scikit-learn to explore real-world datasets and extract actionable insights.

  • Data cleaning and preprocessing techniques

  • Statistical analysis and hypothesis testing

  • Exploratory Data Analysis (EDA)

  • Supervised and unsupervised machine learning

  • Model evaluation and deployment basics

  • Python libraries such as NumPy, Pandas, and Seaborn

  • Jupyter Notebook usage for projects

  • Working on real-world datasets

  • Project-based learning and reporting
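
As a small illustration of the data cleaning and preprocessing topic listed above, the sketch below fills missing values with the median and then rescales a column to the range 0 to 1 (min-max normalization). It uses only the Python standard library; the column of ages and the function names are hypothetical, and in coursework Pandas would typically handle these steps.

```python
import statistics


def clean_column(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Hypothetical raw column with two missing readings.
raw_ages = [20, None, 30, 40, None, 60]

cleaned = clean_column(raw_ages)
normalized = min_max_normalize(cleaned)

print("cleaned:", cleaned)
print("mean:", round(statistics.mean(cleaned), 2))
print("normalized:", normalized)
```

Median imputation is only one of several reasonable choices; dropping incomplete rows or imputing with the mean are common alternatives, and the right one depends on the dataset.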