STUDENT PERFORMANCE ANALYSIS



INTERNSHIP PROJECT



This project analyzes student exam performance using Python, Pandas, and data visualization techniques.
 

INTRODUCTION



In this project, we analyze student performance data to understand patterns in academic scores across three subjects: math, reading, and writing.

The goals are to:

* Calculate average scores.
* Identify top-performing students.
* Explore factors affecting performance.
* Visualize trends using graphs.

Tools Used:

* Python
* Pandas
* Matplotlib
* Seaborn

In [1]:
from google.colab import files
uploaded = files.upload()
Saving StudentsPerformance.csv.csv to StudentsPerformance.csv.csv
 

IMPORT LIBRARIES

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
print("Environment ready!")
Environment ready!
 

LOAD DATASET



We load the dataset into a Pandas DataFrame.
In [19]:
df = pd.read_csv("StudentsPerformance.csv.csv")
# Show first 5 rows
df.head()
Out[19]:
  | gender | race/ethnicity | parental level of education | lunch        | test preparation course | math score | reading score | writing score
0 | female | group B        | bachelor's degree           | standard     | none                    | 72         | 72            | 74
1 | female | group C        | some college                | standard     | completed               | 69         | 90            | 88
2 | female | group B        | master's degree             | standard     | none                    | 90         | 95            | 93
3 | male   | group A        | associate's degree          | free/reduced | none                    | 47         | 57            | 44
4 | male   | group C        | some college                | standard     | none                    | 76         | 78            | 75
 

DATA CLEANING



We check the data for missing values, incorrect entries, and other errors.
In [22]:
df.isnull().sum()
Out[22]:
gender 0
race/ethnicity 0
parental level of education 0
lunch 0
test preparation course 0
math score 0
reading score 0
writing score 0

In [6]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
 #   Column                       Non-Null Count  Dtype
---  ------                       --------------  -----
 0   gender                       1000 non-null   object
 1   race/ethnicity               1000 non-null   object
 2   parental level of education  1000 non-null   object
 3   lunch                        1000 non-null   object
 4   test preparation course      1000 non-null   object
 5   math score                   1000 non-null   int64
 6   reading score                1000 non-null   int64
 7   writing score                1000 non-null   int64
dtypes: int64(3), object(5)
memory usage: 62.6+ KB
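Missing values are only one kind of error. A minimal sketch of two further checks: counting exact duplicate rows, and listing the distinct labels in each non-numeric column so that inconsistent spellings (for example "Female" versus "female") would stand out. The helper `cleaning_report` and the toy rows are hypothetical, for illustration only; they are not taken from StudentsPerformance.csv.

```python
import pandas as pd


def cleaning_report(df: pd.DataFrame) -> dict:
    """Hypothetical helper: count duplicate rows and list labels of non-numeric columns."""
    report = {"duplicate_rows": int(df.duplicated().sum())}
    for col in df.columns:
        # Only text/categorical columns are listed; numeric score columns are skipped.
        if not pd.api.types.is_numeric_dtype(df[col]):
            report[col] = sorted(df[col].unique())
    return report


# Hypothetical toy rows: row 3 duplicates row 1, and "Female" is a stray spelling.
toy = pd.DataFrame({
    "gender": ["female", "male", "Female", "male"],
    "lunch": ["standard", "free/reduced", "standard", "free/reduced"],
    "math score": [72, 47, 90, 47],
})
print(cleaning_report(toy))
```

Applied to the notebook's data, the call would be `cleaning_report(df)`.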
 

DATA VALIDATION

In [7]:
(df[['math score','reading score','writing score']]<0).sum()
Out[7]:
math score 0
reading score 0
writing score 0
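The cell above only rules out negative scores. Assuming scores are on a 0 to 100 scale, a minimal sketch that flags values outside that range in either direction; the helper `out_of_range_counts` and the toy rows are hypothetical.

```python
import pandas as pd

SCORE_COLS = ["math score", "reading score", "writing score"]


def out_of_range_counts(df: pd.DataFrame, low: int = 0, high: int = 100) -> pd.Series:
    """Hypothetical helper: per score column, count values below `low` or above `high`."""
    scores = df[SCORE_COLS]
    # Element-wise boolean mask; True marks an invalid score. sum() counts per column.
    return ((scores < low) | (scores > high)).sum()


# Hypothetical toy rows with one value too low and one too high.
toy = pd.DataFrame({
    "math score": [72, -5, 100],
    "reading score": [72, 90, 101],
    "writing score": [74, 88, 93],
})
print(out_of_range_counts(toy))
```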

SCORE ANALYSIS

In [23]:
avg_math = df["math score"].mean()
avg_read = df["reading score"].mean()
avg_write = df["writing score"].mean()

averages = {"Math Average": avg_math, "Reading Average": avg_read, "Writing Average": avg_write}
averages
Out[23]:
{'Math Average': np.float64(66.089),
 'Reading Average': np.float64(69.169),
 'Writing Average': np.float64(68.054)}
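The three averages can also be computed in one call. This sketch uses `DataFrame.agg` to report the mean and the standard deviation side by side; the standard deviation is the statistic to compare before claiming that one subject's scores vary more than another's. The helper name `score_summary` and the toy scores are hypothetical.

```python
import pandas as pd

SCORE_COLS = ["math score", "reading score", "writing score"]


def score_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: mean and sample standard deviation of each score column."""
    # A list of function names yields one row per statistic and one column per subject.
    return df[SCORE_COLS].agg(["mean", "std"]).round(2)


# Hypothetical toy scores: equal means, different spreads.
toy = pd.DataFrame({
    "math score": [50, 70, 90],
    "reading score": [65, 70, 75],
    "writing score": [60, 70, 80],
})
print(score_summary(toy))
```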
In [9]:
df["total_score"] = df["math score"]+df["reading score"]+df["writing score"]
top_students = df.sort_values(by="total_score",ascending=False).head(5)
top_students
Out[9]:
    | gender | race/ethnicity | parental level of education | lunch    | test preparation course | math score | reading score | writing score | total_score
916 | male   | group E        | bachelor's degree           | standard | completed               | 100        | 100           | 100           | 300
962 | female | group E        | associate's degree          | standard | none                    | 100        | 100           | 100           | 300
458 | female | group E        | bachelor's degree           | standard | none                    | 100        | 100           | 100           | 300
114 | female | group E        | bachelor's degree           | standard | completed               | 99         | 100           | 100           | 299
712 | female | group D        | some college                | standard | none                    | 98         | 100           | 99            | 297
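When several students share a total, as three do at 300 here, a plain `head(5)` cut decides ties by row order and can drop a student tied with the last place kept. A sketch using `DataFrame.nlargest` with `keep="all"`, which keeps every row tied at the cutoff; the helper `top_students` and the toy scores are hypothetical.

```python
import pandas as pd


def top_students(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Hypothetical helper: the n highest total scores, keeping all rows tied at the cutoff."""
    scored = df.assign(
        total_score=df["math score"] + df["reading score"] + df["writing score"]
    )
    # keep="all" may return more than n rows when there is a tie at the n-th place.
    return scored.nlargest(n, "total_score", keep="all")


# Hypothetical toy scores: totals are 300, 280, 280, 195.
toy = pd.DataFrame({
    "math score": [100, 90, 90, 60],
    "reading score": [100, 95, 95, 70],
    "writing score": [100, 95, 95, 65],
})
print(top_students(toy, n=2))  # three rows, because two students tie for second place
```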
 

DATA VISUALIZATIONS

In [10]:
plt.figure()
sns.histplot(df["math score"],kde=True)
plt.title("Distribution of Math Scores")
plt.show()
[Figure: Distribution of Math Scores]
In [11]:
gender_avg = df.groupby("gender")[["math score","reading score","writing score"]].mean()
gender_avg.plot(kind="bar")
plt.title("Average Scores by Gender")
plt.show()
[Figure: Average Scores by Gender]
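The bar chart shows the gender gap visually. A sketch that reports its size per subject as the difference between group means, assuming the labels are exactly "female" and "male" as in the sample rows; the helper `gender_gap` and the toy data are hypothetical.

```python
import pandas as pd

SCORE_COLS = ["math score", "reading score", "writing score"]


def gender_gap(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: female mean minus male mean per subject (positive favors female)."""
    means = df.groupby("gender")[SCORE_COLS].mean()
    return means.loc["female"] - means.loc["male"]


# Hypothetical toy data, not the real dataset.
toy = pd.DataFrame({
    "gender": ["female", "female", "male", "male"],
    "math score": [60, 70, 68, 66],
    "reading score": [80, 76, 70, 68],
    "writing score": [78, 80, 66, 68],
})
print(gender_gap(toy))
```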
In [13]:
parent_avg = df.groupby("parental level of education")["total_score"].mean().sort_values()
parent_avg.plot(kind="barh")
plt.title("Parental Education vs Student Performance")
plt.xlabel("Average Total Score")
plt.show()
[Figure: Parental Education vs Student Performance]
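Sorting by the mean orders the bars by score, which makes it harder to see whether scores rise with the level of education. A sketch that fixes the order of the levels instead; the ranking in `EDU_ORDER` is an assumption about how the six labels compare, and the helper and toy rows are hypothetical.

```python
import pandas as pd

# Assumed ranking of the education labels, lowest to highest.
EDU_ORDER = [
    "some high school",
    "high school",
    "some college",
    "associate's degree",
    "bachelor's degree",
    "master's degree",
]


def mean_total_by_education(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: mean total score per level, in EDU_ORDER rather than score order."""
    means = df.groupby("parental level of education")["total_score"].mean()
    # reindex() imposes the fixed order; levels absent from df come back as NaN.
    return means.reindex(EDU_ORDER)


# Hypothetical toy rows covering three of the six levels.
toy = pd.DataFrame({
    "parental level of education": ["master's degree", "high school", "high school", "some college"],
    "total_score": [240, 180, 200, 210],
})
print(mean_total_by_education(toy))
```

The ordered result can be plotted with `.plot(kind="barh")` exactly as in the cell above.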
In [15]:
race_avg = df.groupby("race/ethnicity")[["math score","reading score","writing score"]].mean()
race_avg.plot(kind="bar")
plt.title("Race/Ethnicity vs Average Scores")
plt.show()
[Figure: Race/Ethnicity vs Average Scores]
In [17]:
sns.heatmap(df[["math score","reading score","writing score"]].corr(),annot=True)
plt.title("Correlation Between Subjects")
plt.show()
[Figure: Correlation Between Subjects]
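The heatmap shows every pairwise correlation at once. A sketch that turns the same `corr()` matrix (Pearson, the pandas default) into a ranked list of subject pairs, so the strongest relationship can be reported as a number; the helper `ranked_correlations` and the toy scores are hypothetical.

```python
import itertools

import pandas as pd

SCORE_COLS = ["math score", "reading score", "writing score"]


def ranked_correlations(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: Pearson correlation for each pair of subjects, strongest first."""
    corr = df[SCORE_COLS].corr()
    # Each unordered pair appears once; the diagonal (self-correlation) is skipped.
    pairs = {(a, b): corr.loc[a, b] for a, b in itertools.combinations(SCORE_COLS, 2)}
    return pd.Series(pairs).sort_values(ascending=False)


# Hypothetical toy scores: writing tracks reading exactly, math only partly.
toy = pd.DataFrame({
    "math score": [70, 60, 90, 80],
    "reading score": [60, 70, 80, 90],
    "writing score": [62, 72, 82, 92],
})
print(ranked_correlations(toy))
```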
 

DATA ANALYSIS INSIGHTS

 


1. Students perform similarly in reading and writing.
2. Math scores vary more than reading and writing scores.
3. Female students, on average, score higher in reading and writing.
4. Students in group E show higher overall performance than the other groups.
5. Reading and writing scores are strongly correlated.
6. Students who score high in one subject tend to score high in the others.
7. The gender gap is small in math but visible in reading and writing.
8. Some students perform exceptionally well across all subjects.
9. Students whose parents have more education tend to score higher.
10. Parental education background is associated with differences in performance.

 

CONCLUSION



This project analyzed student performance across math, reading, and writing. We found that parental education, gender, and ethnicity are associated with differences in performance. Reading and writing scores are strongly correlated, while math scores vary more. Overall, the dataset provides meaningful insights into academic trends.