STUDENT PERFORMANCE ANALYSIS

INTERNSHIP PROJECT

This project analyzes student exam performance using Python, Pandas, and data visualization techniques.

INTRODUCTION

In this project, we analyze student performance data to understand patterns in academic scores across math, reading, and writing.

The goals are to:

* Calculate average scores.
* Identify top-performing students.
* Explore factors affecting performance.
* Visualize trends using graphs.

Tools Used:

* Python
* Pandas
* Matplotlib
* Seaborn

In [1]:

from google.colab import files
uploaded = files.upload()

Saving StudentsPerformance.csv.csv to StudentsPerformance.csv.csv

IMPORT LIBRARIES

In [2]:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
print("Environment Ready !")

Environment Ready !

LOAD DATASET

We load the dataset into a Pandas DataFrame.

In [19]:

df = pd.read_csv("StudentsPerformance.csv.csv")
# Show first 5 rows
df.head()

Out[19]:

	gender	race/ethnicity	parental level of education	lunch	test preparation course	math score	reading score	writing score
0	female	group B	bachelor's degree	standard	none	72	72	74
1	female	group C	some college	standard	completed	69	90	88
2	female	group B	master's degree	standard	none	90	95	93
3	male	group A	associate's degree	free/reduced	none	47	57	44
4	male	group C	some college	standard	none	76	78	75

DATA CLEANING

Check for missing values, incorrect data or errors.

In [22]:

df.isnull().sum()

Out[22]:

	0
gender	0
race/ethnicity	0
parental level of education	0
lunch	0
test preparation course	0
math score	0
reading score	0
writing score	0

dtype: int64

In [6]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
 #   Column                       Non-Null Count  Dtype
---  ------                       --------------  -----
 0   gender                       1000 non-null   object
 1   race/ethnicity               1000 non-null   object
 2   parental level of education  1000 non-null   object
 3   lunch                        1000 non-null   object
 4   test preparation course      1000 non-null   object
 5   math score                   1000 non-null   int64
 6   reading score                1000 non-null   int64
 7   writing score                1000 non-null   int64
dtypes: int64(3), object(5)
memory usage: 62.6+ KB

DATA VALIDATION

In [7]:

(df[['math score','reading score','writing score']]<0).sum()

Out[7]:

	0
math score	0
reading score	0
writing score	0

dtype: int64
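The check above only looks for negative scores. A minimal sketch of a slightly broader validation follows. It assumes that valid scores lie between 0 and 100 inclusive (an assumption suggested by the data, not stated in the dataset) and it also counts duplicate rows. The `sample` frame is a stand-in built from the first five rows shown earlier; in the notebook the same expressions would be applied to `df`.

```python
import pandas as pd

# Stand-in data: the first five rows of the score columns shown in Out[19].
# In the notebook, replace `sample` with the full DataFrame `df`.
sample = pd.DataFrame({
    "math score": [72, 69, 90, 47, 76],
    "reading score": [72, 90, 95, 57, 78],
    "writing score": [74, 88, 93, 44, 75],
})

score_cols = ["math score", "reading score", "writing score"]

# Assumption: a valid score is between 0 and 100 inclusive.
# Count, per column, how many values fall outside that range.
out_of_range = ((sample[score_cols] < 0) | (sample[score_cols] > 100)).sum()
print(out_of_range)

# Fully duplicated rows are another common data-quality issue.
n_duplicates = int(sample.duplicated().sum())
print("Duplicate rows:", n_duplicates)
```

Combining both bounds in one boolean mask keeps the result in the same per-column shape as the original check, so the two outputs are easy to compare.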

In [23]:

avg_math = df["math score"].mean()
avg_read = df["reading score"].mean()
avg_write = df["writing score"].mean()

averages = {"Math Average": avg_math,"Reading Average":avg_read,"Writing Average":avg_write}
averages

Out[23]:

{'Math Average': np.float64(66.089),
 'Reading Average': np.float64(69.169),
 'Writing Average': np.float64(68.054)}

In [9]:

df["total_score"] = df["math score"]+df["reading score"]+df["writing score"]
top_students = df.sort_values(by="total_score",ascending=False).head(5)
top_students

Out[9]:

	gender	race/ethnicity	parental level of education	lunch	test preparation course	math score	reading score	writing score	total_score
916	male	group E	bachelor's degree	standard	completed	100	100	100	300
962	female	group E	associate's degree	standard	none	100	100	100	300
458	female	group E	bachelor's degree	standard	none	100	100	100	300
114	female	group E	bachelor's degree	standard	completed	99	100	100	299
712	female	group D	some college	standard	none	98	100	99	297
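Three students are tied at 300 in the table above, so a fixed cut-off such as `head(5)` can drop students who share the last qualifying score. One possible alternative, sketched here under the assumption that tied students should all be kept, is `DataFrame.nlargest` with `keep="all"`. The `sample` frame below holds only the totals and row labels from the table above.

```python
import pandas as pd

# Totals and original row labels copied from the top-5 table above.
# In the notebook, the same call would be made on the full `df`.
sample = pd.DataFrame(
    {"total_score": [300, 300, 300, 299, 297]},
    index=[916, 962, 458, 114, 712],
)

# keep="all" retains every row tied with the n-th largest value,
# so the result may contain more than n rows.
top_with_ties = sample.nlargest(2, "total_score", keep="all")
print(top_with_ties)
```

Here `n=2` is chosen only to make the tie visible on five rows: the second-largest value is 300, and every row with that total is returned.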

DATA VISUALIZATIONS

In [10]:

plt.figure()
sns.histplot(df["math score"],kde=True)
plt.title("Distribution of Math Scores")
plt.show()

In [11]:

gender_avg = df.groupby("gender")[["math score","reading score","writing score"]].mean()
gender_avg.plot(kind="bar")
plt.title("Average Scores by Gender")
plt.show()

In [13]:

parent_avg = df.groupby("parental level of education")["total_score"].mean().sort_values()
parent_avg.plot(kind="barh")
plt.title("Parental Education vs Student Performance")
plt.xlabel("Average Total Score")
plt.show()

In [15]:

race_avg = df.groupby("race/ethnicity")[["math score","reading score","writing score"]].mean()
race_avg.plot(kind="bar")
plt.title("Race/Ethnicity vs Average Scores")
plt.show()

In [17]:

sns.heatmap(df[["math score","reading score","writing score"]].corr(),annot=True)
plt.title("Correlation Between Subjects")
plt.show()

DATA ANALYSIS INSIGHTS

1. Students perform similarly in reading and writing.
2. Math scores show more variation compared to the other subjects.
3. Female students generally score higher in reading and writing.
4. Group E students show higher overall performance.
5. A strong correlation exists between reading and writing scores.
6. Students scoring high in one subject tend to score high in the others.
7. The gender gap is minimal in math but visible in reading and writing.
8. Some students perform exceptionally across all subjects.
9. Students whose parents have higher education tend to score better.
10. Educational background plays an important role in performance.
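Several of the insights above (score variation, the gender gap, and correlation between subjects) are read off the charts. They can also be checked numerically. The following is a minimal sketch, not the notebook's original method. It uses a five-row `sample` taken from the first rows of the dataset shown earlier, so its numbers are not representative; running the same expressions on the full `df` would give the figures that matter.

```python
import pandas as pd

# Stand-in data: the first five rows shown in Out[19].
# In the notebook, replace `sample` with the full DataFrame `df`.
sample = pd.DataFrame({
    "gender": ["female", "female", "female", "male", "male"],
    "math score": [72, 69, 90, 47, 76],
    "reading score": [72, 90, 95, 57, 78],
    "writing score": [74, 88, 93, 44, 75],
})

score_cols = ["math score", "reading score", "writing score"]

# Standard deviation per subject: a larger value means more spread.
spread = sample[score_cols].std()
print(spread)

# Mean score per gender, and the female-minus-male gap per subject.
gender_means = sample.groupby("gender")[score_cols].mean()
gender_gap = gender_means.loc["female"] - gender_means.loc["male"]
print(gender_gap)

# Pairwise Pearson correlation between the three subjects.
corr = sample[score_cols].corr()
print(corr)
```

Reporting the standard deviations and the per-subject gap next to the charts makes each insight traceable to a specific number.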

CONCLUSION

This project analysed student performance across math, reading, and writing. We found that parental education, gender, and ethnicity are associated with differences in performance. Reading and writing scores are strongly correlated, while math scores vary more. Overall, the dataset provides meaningful insights into academic trends.