Hacks
import pandas as pd
ace_attorney_games = pd.Series(["Phoenix Wright: Ace Attorney","Justice for All","Trials and Tribulations","Apollo Justice: Ace Attorney","Ace Attorney Trilogy","Professor Layton vs. Phoenix Wright","Dual Destinies","The Great Ace Attorney: Adventures","The Great Ace Attorney Chronicles"])
year_released = pd.Series([2001,2002,2004,2007,2012,2012,2013,2015,2021])
IMdb_rating = pd.Series([8.3,8,8.8,7.8,6.5,7.7,7.9,8,8.7]) # this is out of 10
pd.DataFrame({"Ace Attorney Game": ace_attorney_games, "Year Released": year_released, "Average Rating (out of 10)": IMdb_rating})
import pandas as pd
import matplotlib.pyplot as plt
# Create DataFrame with scores and number of students
scores = [5, 4, 3, 2, 1]
students = [15322, 28249, 41931, 26799, 22350]
df = pd.DataFrame({'Scores': scores, 'Students': students})
# Create stem plot
fig, ax = plt.subplots()
ax.stem(df['Scores'], df['Students'], linefmt='C0-', markerfmt='C0o', basefmt='C0-')
ax.set_xlabel('Score')
ax.set_ylabel('Number of Students')
ax.set_title('Scoring Distribution for the 2022 AP CSP Exam')
ax.set_xticks(range(1, 6))
# Show plot
plt.show()
Questions (0.9)
- What are the two primary data structures in pandas and how do they differ?
The two primary data structures in pandas are the Series and the DataFrame. A Series is one-dimensional (a single labeled column of values), while a DataFrame is two-dimensional (a table of rows and columns, where each column is a Series).
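A small sketch of the difference, using made-up values (the names `ratings` and `games` are illustrative only):

```python
import pandas as pd

# A Series is a single labeled column of values.
ratings = pd.Series([8.3, 8.0, 8.8], name="rating")

# A DataFrame is a table: several columns that share one row index.
games = pd.DataFrame({"year": [2001, 2002, 2004], "rating": ratings})

print(ratings.ndim, ratings.shape)  # dimensions and shape of the Series
print(games.ndim, games.shape)      # dimensions and shape of the DataFrame

# Selecting one column of a DataFrame gives back a Series.
print(type(games["rating"]).__name__)
```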
- How do you read a CSV file into a pandas DataFrame?
You read a CSV file into a pandas DataFrame using the command pd.read_csv (of course, you need to import pandas as pd in order for this to work).
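As a minimal sketch, the CSV text below is made up and wrapped in `io.StringIO` so the example runs without a file on disk; with a real file you would pass its path (for example a hypothetical `"data.csv"`) to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "name,age\nAlice,30\nBob,17\n"

# pd.read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```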
- How do you select a single column from a pandas DataFrame?
You select a single column from a pandas DataFrame using the command df['column name here'], which returns that column as a Series.
- How do you filter rows in a pandas DataFrame based on a condition?
df_filtered = df[df['Age'] >= 18]  # replace 18 with any integer threshold
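A runnable sketch of row filtering with a boolean mask; the DataFrame contents and the `min_age` threshold are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ann", "Ben", "Cy"], "Age": [15, 18, 42]})
min_age = 18  # hypothetical threshold

# The comparison builds a boolean Series; indexing with it keeps the True rows.
adults = df[df["Age"] >= min_age]

# Combine conditions with & (and) or | (or); each condition needs parentheses.
young_adults = df[(df["Age"] >= min_age) & (df["Age"] < 30)]

print(adults)
print(young_adults)
```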
- How do you group rows in a pandas DataFrame by a particular column?
df.groupby('category')['column name here'].count()
- How do you aggregate data in a pandas DataFrame using functions like sum and mean?
df.groupby('category')['column name here'].mean()  # use .sum() instead of .mean() for totals
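Grouping and aggregating can be sketched together on a tiny made-up table (the column names `category` and `value` are placeholders):

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["a", "b", "a", "b", "a"],
    "value": [1, 2, 3, 4, 5],
})

# One result per group: how many rows, and the average value.
counts = df.groupby("category")["value"].count()
means = df.groupby("category")["value"].mean()

# agg() applies several aggregation functions in one call.
summary = df.groupby("category")["value"].agg(["sum", "mean"])
print(summary)
```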
- How do you handle missing values in a pandas DataFrame?
You can handle missing values in a pandas DataFrame with methods such as fillna(), replace(), and interpolate(), which replace NaN values with alternative values, or with dropna(), which removes the rows or columns that contain them.
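A short sketch of these options on a made-up Series with two gaps:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

missing_count = s.isna().sum()  # how many values are missing
filled = s.fillna(0)            # replace NaN with a constant
interp = s.interpolate()        # linear interpolation between neighbors
dropped = s.dropna()            # discard the missing entries entirely

print(missing_count)
print(interp.tolist())
```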
- How do you merge two pandas DataFrames together?
merged_df = pd.merge(df1, df2, on='shared column name here')
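To make the merge concrete, here is a sketch with two made-up tables that share a hypothetical `game_id` column. It also shows how the `how` parameter changes which rows survive.

```python
import pandas as pd

games = pd.DataFrame({"game_id": [1, 2, 3], "title": ["A", "B", "C"]})
ratings = pd.DataFrame({"game_id": [1, 2, 4], "rating": [8.3, 8.0, 7.7]})

# Default is an inner join: only keys present in both tables are kept.
inner = pd.merge(games, ratings, on="game_id")

# A left join keeps every row of the first table; unmatched ratings become NaN.
left = pd.merge(games, ratings, on="game_id", how="left")

print(inner)
print(left)
```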
- How do you export a pandas DataFrame to a CSV file?
df.to_csv('data.csv', index=False)
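A sketch of the export, writing to an in-memory buffer rather than a real file so that it runs anywhere; the DataFrame contents are made up. Passing `index=False` leaves the row index out of the output.

```python
import io
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Ben"], "age": [15, 18]})

# to_csv accepts a path or a file-like object.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the same columns and values.
round_trip = pd.read_csv(io.StringIO(csv_text))
```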
- What is the difference between a Series and a DataFrame in Pandas?
A Series is a single column of data with an index, while a DataFrame is a table made of several columns (each one a Series) that share the same index.
Data Analysis / Predictive Analysis (0.9)
- How can Numpy and Pandas be used to preprocess data for predictive analysis?
Numpy and Pandas can be used to preprocess data for predictive analysis in several ways. Numpy can be used to manipulate numerical data, while Pandas can be used to work with structured data. Some examples of preprocessing steps include data cleaning, missing value imputation, scaling/normalization, and feature selection/extraction.
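Two of those steps, missing value imputation and scaling, might look like the sketch below. The data are invented, and mean imputation and min-max scaling are just one possible choice among many.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "height_cm": [150.0, np.nan, 170.0, 190.0],
    "weight_kg": [50.0, 60.0, np.nan, 90.0],
})

# Imputation: fill each missing value with its column mean.
clean = raw.fillna(raw.mean())

# Min-max scaling with pandas: every column ends up in the range [0, 1].
scaled = (clean - clean.min()) / (clean.max() - clean.min())

# Standardization with NumPy: zero mean and unit variance per column.
values = clean.to_numpy()
standardized = (values - values.mean(axis=0)) / values.std(axis=0)

print(scaled)
```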
- What machine learning algorithms can be used for predictive analysis, and how do they differ?
There are various machine learning algorithms that can be used for predictive analysis, including linear regression, logistic regression, decision trees, random forests, support vector machines, and neural networks. These algorithms differ in terms of their complexity, interpretability, accuracy, and ability to handle different types of data and tasks.
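As a rough illustration of the simplest algorithm in that list, linear regression can be sketched with NumPy alone. The data below are made up, and `np.polyfit` is used here as a stand-in for a full machine learning library.

```python
import numpy as np

# Made-up training data that follows y = 2x + 1 exactly.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Fit a degree-1 polynomial (a straight line) by least squares.
slope, intercept = np.polyfit(x, y, 1)

# Predict the target for an unseen input.
x_new = 10.0
prediction = slope * x_new + intercept
print(slope, intercept, prediction)
```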
- Can you discuss some real-world applications of predictive analysis in different industries?
Predictive analysis has many real-world applications across different industries, including healthcare, finance, marketing, and manufacturing. For example, predictive models can be used to predict patient outcomes, detect fraud, personalize marketing campaigns, and optimize production processes.
- Can you explain the role of feature engineering in predictive analysis, and how it can improve model accuracy?
Feature engineering involves creating new features from the existing ones that can improve the accuracy of predictive models. This can be done by transforming, combining, or selecting features based on their relevance to the target variable. Good feature engineering can make a big difference in model performance, especially when the data is complex or high-dimensional.
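A minimal sketch of combining and transforming features with pandas. The columns (`height_m`, `weight_kg`, `signup`) and the derived features are hypothetical examples, not taken from any particular dataset.

```python
import pandas as pd

df = pd.DataFrame({
    "height_m": [1.5, 1.8],
    "weight_kg": [45.0, 81.0],
    "signup": pd.to_datetime(["2023-01-15", "2023-06-30"]),
})

# Combine two existing features into one that may relate better to the target.
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2

# Extract a simpler feature from a timestamp.
df["signup_month"] = df["signup"].dt.month

print(df[["bmi", "signup_month"]])
```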
- How can machine learning models be deployed in real-time applications for predictive analysis?
Machine learning models can be deployed in real-time applications using various techniques, such as microservices, APIs, or containerization. The choice of deployment method depends on the requirements of the application, such as scalability, latency, and security.
- Can you discuss some limitations of Numpy and Pandas, and when it might be necessary to use other data analysis tools?
While Numpy and Pandas are powerful data analysis tools, they have some limitations: both work in memory on a single machine, so datasets larger than the available RAM are hard to handle, and neither is designed for distributed computation or for training large models. In such cases, it may be necessary to use other tools, such as Apache Spark, Dask, or TensorFlow.
- How can predictive analysis be used to improve decision-making and optimize business processes?
Predictive analysis can be used to improve decision-making and optimize business processes in many ways. For example, it can help identify patterns and trends in customer behavior, detect anomalies and outliers in financial data, forecast demand for products, and optimize supply chain operations. By providing actionable insights and predictions, predictive analysis can help businesses make better decisions and achieve their goals more effectively.
Numpy (0.9)
(P.S. wget the TensorFlow file to see the lesson; it's the first half of the TensorFlow file.)
wget https://raw.githubusercontent.com/KKcbal/amongus/master/_notebooks/2023-04-03-TenserFlow.ipynb
For your hacks, use matplotlib and numpy to slice this image to display Waldo. Also find and display one other numpy function and blog about what it is used for.
from skimage import io
from matplotlib import pyplot as plt
waldo = io.imread('../images/waldo.png')
type(waldo)
plt.imshow(waldo)
print("HERE'S WALDO!")
plt.imshow(waldo[120:255, 335:420])
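The crop above works because an image loaded this way is a NumPy array indexed as [rows, columns]. A sketch with a synthetic array standing in for the image (the 300 x 500 size is an assumption, not the real dimensions of waldo.png):

```python
import numpy as np

# Synthetic stand-in for an RGB image: 300 rows x 500 columns x 3 channels.
image = np.zeros((300, 500, 3), dtype=np.uint8)

# First slice picks rows (vertical range), second picks columns (horizontal range).
crop = image[120:255, 335:420]
print(crop.shape)  # (135, 85, 3)

# Basic slicing returns a view, so no pixel data is copied.
print(np.shares_memory(crop, image))
```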
Other numpy functions that I have heard about include the trigonometric functions, such as numpy.sin, numpy.cos, and numpy.tan, all shown below. Each one takes a number (an angle in radians) and returns the corresponding trigonometric value. The code below lets the user enter any float and displays its sine, cosine, and tangent, along with the inverse functions numpy.arcsin, numpy.arccos, and numpy.arctan.
import numpy as np
# np.sin, np.cos, and np.tan interpret the input as an angle in radians
num = float(input("What number would you like sine, cosine, tangent, sine inverse, cosine inverse, and tangent inverse of? "))
print(f'User input: {num}')
# np.arcsin and np.arccos return nan (with a warning) for inputs outside [-1, 1]
print(f'sin: {np.sin(num)}, cos: {np.cos(num)}, tan: {np.tan(num)}, sin-1: {np.arcsin(num)}, cos-1: {np.arccos(num)}, tan-1: {np.arctan(num)}')
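Two details of these functions are easy to miss, so here is a small sketch with fixed inputs instead of user input: angles are in radians (np.deg2rad converts from degrees), and the inverse sine is only defined on [-1, 1].

```python
import numpy as np

# Convert degrees to radians before calling the trigonometric functions.
angle_deg = 90.0
angle_rad = np.deg2rad(angle_deg)
sine = np.sin(angle_rad)
print(sine)

# Outside [-1, 1], np.arcsin returns nan; errstate silences the warning here.
with np.errstate(invalid="ignore"):
    out_of_domain = np.arcsin(2.0)
print(np.isnan(out_of_domain))
```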
import tensorflow as tf
import numpy as np
# Read x,y pairs from the user and split them 80/20 into training and testing data
data = input("Enter data as a comma-separated list of x and y values (e.g. 1,2;3,4;5,6): ")
data = np.array([list(map(float, row.split(','))) for row in data.split(';')])
x_train, y_train = data[:int(len(data)*0.8), 0], data[:int(len(data)*0.8), 1]
x_test, y_test = data[int(len(data)*0.8):, 0], data[int(len(data)*0.8):, 1]
# Define the model
model = tf.keras.Sequential([
tf.keras.layers.Dense(units=1, input_shape=[1])
])
# Compile the model
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.1), loss='mean_squared_error')
# Train the model
history = model.fit(x_train, y_train, epochs=100, verbose=0)
# Evaluate the model on training and testing data
train_loss = model.evaluate(x_train, y_train, verbose=0)
test_loss = model.evaluate(x_test, y_test, verbose=0)
# Print the training and testing loss (mean squared error)
print('Training loss: {:.4f}'.format(train_loss))
print('Testing loss: {:.4f}'.format(test_loss))