Pandas (1/0.9 if submitted by 04/27 11:59)

  • make your own data using your brain, Google, or ChatGPT; it should look different from mine.
  • modify my code or write your own
  • output your data as something other than a bar graph.
  • answer the questions below, the more explained the better.
import pandas as pd

ace_attorney_games = pd.Series(["Phoenix Wright: Ace Attorney","Justice for All","Trials and Tribulations","Apollo Justice: Ace Attorney","Ace Attorney Trilogy","Professor Layton vs. Phoenix Wright","Dual Destinies","The Great Ace Attorney: Adventures","The Great Ace Attorney Chronicles"])

year_released = pd.Series([2001,2002,2004,2007,2012,2012,2013,2015,2021])

IMdb_rating = pd.Series([8.3,8,8.8,7.8,6.5,7.7,7.9,8,8.7]) # this is out of 10

pd.DataFrame({"Ace Attorney Game": ace_attorney_games, "Year Released": year_released, "Average Rating (out of 10)": IMdb_rating})
   Ace Attorney Game                    Year Released  Average Rating (out of 10)
0  Phoenix Wright: Ace Attorney         2001           8.3
1  Justice for All                      2002           8.0
2  Trials and Tribulations              2004           8.8
3  Apollo Justice: Ace Attorney         2007           7.8
4  Ace Attorney Trilogy                 2012           6.5
5  Professor Layton vs. Phoenix Wright  2012           7.7
6  Dual Destinies                       2013           7.9
7  The Great Ace Attorney: Adventures   2015           8.0
8  The Great Ace Attorney Chronicles    2021           8.7
import pandas as pd
import matplotlib.pyplot as plt

# Create DataFrame with scores and number of students
scores = [5, 4, 3, 2, 1]
students = [15322, 28249, 41931, 26799, 22350]
df = pd.DataFrame({'Scores': scores, 'Students': students})

# Create stem plot
fig, ax = plt.subplots()
ax.stem(df['Scores'], df['Students'], linefmt='C0-', markerfmt='C0o', basefmt='C0-')
ax.set_xlabel('Score')
ax.set_ylabel('Number of Students')
ax.set_title('Scoring Distribution for the 2022 AP CSP Exam')

ax.set_xticks(range(1, 6))


# Show plot
plt.show()

Questions (0.9)

  • What are the two primary data structures in pandas and how do they differ?

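The two primary data structures in pandas are the Series and the DataFrame. A Series is one-dimensional: a single column of values with an index. A DataFrame is two-dimensional: a table of rows and columns, where each column is a Series and all columns share the same index.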
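
As a small illustration of that difference, here is a sketch with made-up labels and values (the names ratings and games are only for this example):

```python
import pandas as pd

# A Series: one-dimensional values plus an index (labels are hypothetical).
ratings = pd.Series([8.3, 8.0, 8.8], index=["game_a", "game_b", "game_c"], name="rating")

# A DataFrame: two-dimensional, several columns sharing one row index.
games = pd.DataFrame(
    {"year": [2001, 2002, 2004], "rating": [8.3, 8.0, 8.8]},
    index=["game_a", "game_b", "game_c"],
)

print(ratings.ndim, ratings.shape)  # number of dimensions and shape of the Series
print(games.ndim, games.shape)      # number of dimensions and shape of the DataFrame

# Selecting one column of a DataFrame gives back a Series.
print(type(games["rating"]))
```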

  • How do you read a CSV file into a pandas DataFrame?

You read a CSV file into a pandas DataFrame with pd.read_csv, for example df = pd.read_csv('data.csv') (of course, you need to import pandas as pd for this to work).
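
A minimal sketch of reading CSV data. To keep it runnable without a file on disk, the CSV text is held in a string and wrapped in io.StringIO; with a real file you would pass its path instead. The column names are made up.

```python
import io
import pandas as pd

# Hypothetical CSV content; a real call would look like pd.read_csv("some_file.csv").
csv_text = "name,age\nAda,36\nLinus,28\n"

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```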

  • How do you select a single column from a pandas DataFrame?

You select a single column from a pandas DataFrame using df['column name here'], which returns that column as a Series.
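
A short sketch with a made-up DataFrame, showing that single brackets return a Series while double brackets return a one-column DataFrame:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"name": ["Ada", "Linus", "Grace"], "age": [36, 28, 45]})

ages = df["age"]          # single brackets: a Series
ages_frame = df[["age"]]  # double brackets: a DataFrame with one column

print(type(ages))
print(type(ages_frame))
```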

  • How do you filter rows in a pandas DataFrame based on a condition?

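You filter rows by putting a boolean condition inside the brackets, for example df_filtered = df[df['Age'] >= 18], where 18 stands in for any integer threshold.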
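
A sketch of boolean filtering on made-up data. Combining conditions needs & or | with parentheses around each condition:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"Name": ["A", "B", "C", "D"], "Age": [15, 22, 18, 40]})

# Keep rows where the condition is True.
adults = df[df["Age"] >= 18]

# Two conditions combined with & (each wrapped in parentheses).
young_adults = df[(df["Age"] >= 18) & (df["Age"] < 30)]

print(adults)
print(young_adults)
```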

  • How do you group rows in a pandas DataFrame by a particular column?

df.groupby('category')['column name here'].count()

  • How do you aggregate data in a pandas DataFrame using functions like sum and mean?

df.groupby('category')['column name here'].mean() (or .sum(), .max(), and so on)
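
The grouping and aggregation answers above can be sketched together on a made-up DataFrame (the category and value columns are hypothetical):

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"category": ["a", "b", "a", "b", "a"], "value": [1, 2, 3, 4, 5]})

# Count the rows in each group.
counts = df.groupby("category")["value"].count()

# Mean of each group.
means = df.groupby("category")["value"].mean()

# Several aggregations at once with agg().
summary = df.groupby("category")["value"].agg(["sum", "mean"])

print(counts)
print(means)
print(summary)
```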

  • How do you handle missing values in a pandas DataFrame?

You can handle missing values in a pandas DataFrame with methods such as fillna(), replace(), and interpolate(), which replace NaN values with alternative values, or with dropna(), which removes the rows or columns that contain them. isna() shows where values are missing.
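
A sketch of these methods on a made-up Series with two missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical data with two gaps.
s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

missing_count = s.isna().sum()  # how many values are missing
filled = s.fillna(0)            # replace NaN with a constant
interpolated = s.interpolate()  # linear interpolation between neighbours
dropped = s.dropna()            # remove the missing entries

print(missing_count)
print(filled.tolist())
print(interpolated.tolist())
print(dropped.tolist())
```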

  • How do you merge two pandas DataFrames together?

merged_df = pd.merge(df1, df2, on='shared column name')
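
A sketch of merging two made-up DataFrames on a shared game_id column (all names are hypothetical). The default is an inner join; how="left" keeps every row of the first DataFrame:

```python
import pandas as pd

# Hypothetical data for illustration.
games = pd.DataFrame({"game_id": [1, 2, 3], "title": ["A", "B", "C"]})
ratings = pd.DataFrame({"game_id": [1, 2, 4], "rating": [8.3, 8.0, 7.7]})

# Inner join: only game_id values present in both DataFrames.
inner = pd.merge(games, ratings, on="game_id")

# Left join: all rows from games, NaN where no rating matches.
left = pd.merge(games, ratings, on="game_id", how="left")

print(inner)
print(left)
```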

  • How do you export a pandas DataFrame to a CSV file?

df.to_csv('data.csv', index=False)
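
A sketch that writes a made-up DataFrame to a temporary folder and reads it back, to show that index=False leaves the row index out of the file. The temporary path is only there to keep the example self-contained.

```python
import os
import tempfile
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"name": ["Ada", "Linus"], "age": [36, 28]})

# Write to a temporary folder so no file is left in the working directory.
path = os.path.join(tempfile.mkdtemp(), "data.csv")
df.to_csv(path, index=False)  # index=False: do not write the row index as a column

# Read it back to check what was written.
round_trip = pd.read_csv(path)
print(round_trip)
```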

  • What is the difference between a Series and a DataFrame in Pandas?

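A Series holds a single column of data along with its index, while a DataFrame holds multiple columns that share one index, so it has both row and column labels. Selecting one column from a DataFrame returns a Series.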

Data Analysis / Predictive Analysis (0.9)

  • How can Numpy and Pandas be used to preprocess data for predictive analysis?

Numpy and Pandas can be used to preprocess data for predictive analysis in several ways. Numpy can be used to manipulate numerical data, while Pandas can be used to work with structured data. Some examples of preprocessing steps include data cleaning, missing value imputation, scaling/normalization, and feature selection/extraction.
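
As a rough sketch of two of those steps, the example below fills missing values with each column's mean using Pandas and then standardizes the columns with Numpy. The data and column names are made up, and mean imputation is just one possible choice.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with missing entries.
raw = pd.DataFrame({
    "height_cm": [150.0, np.nan, 170.0, 180.0],
    "weight_kg": [50.0, 60.0, np.nan, 80.0],
})

# Missing value imputation: replace NaN with the column mean.
clean = raw.fillna(raw.mean())

# Scaling: subtract each column's mean and divide by its standard deviation.
values = clean.to_numpy()
scaled = (values - values.mean(axis=0)) / values.std(axis=0)

print(clean)
print(scaled)
```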

  • What machine learning algorithms can be used for predictive analysis, and how do they differ?

There are various machine learning algorithms that can be used for predictive analysis, including linear regression, logistic regression, decision trees, random forests, support vector machines, and neural networks. These algorithms differ in terms of their complexity, interpretability, accuracy, and ability to handle different types of data and tasks.
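
As a minimal sketch of the simplest algorithm in that list, linear regression, the example below fits a straight line by least squares with np.polyfit. The data points are made up and lie exactly on y = 2x + 1, so this only shows the mechanics and not a realistic model.

```python
import numpy as np

# Hypothetical training data on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Fit a degree-1 polynomial (a line) by least squares.
slope, intercept = np.polyfit(x, y, deg=1)

# Use the fitted line to predict the value at a new point.
prediction = slope * 5.0 + intercept

print(slope, intercept, prediction)
```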

  • Can you discuss some real-world applications of predictive analysis in different industries?

Predictive analysis has many real-world applications across different industries, including healthcare, finance, marketing, and manufacturing. For example, predictive models can be used to predict patient outcomes, detect fraud, personalize marketing campaigns, and optimize production processes.

  • Can you explain the role of feature engineering in predictive analysis, and how it can improve model accuracy?

Feature engineering involves creating new features from the existing ones that can improve the accuracy of predictive models. This can be done by transforming, combining, or selecting features based on their relevance to the target variable. Good feature engineering can make a big difference in model performance, especially when the data is complex or high-dimensional.
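
A small sketch of creating new features from existing ones: combining two columns into a ratio and extracting the month from a date. The columns and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "height_m": [1.6, 1.8],
    "weight_kg": [64.0, 81.0],
    "signup": pd.to_datetime(["2023-01-15", "2023-04-30"]),
})

# Combine two features into one (body mass index).
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2

# Transform a date into a simpler numeric feature.
df["signup_month"] = df["signup"].dt.month

print(df)
```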

  • How can machine learning models be deployed in real-time applications for predictive analysis?

Machine learning models can be deployed in real-time applications using various techniques, such as microservices, APIs, or containerization. The choice of deployment method depends on the requirements of the application, such as scalability, latency, and security.

  • Can you discuss some limitations of Numpy and Pandas, and when it might be necessary to use other data analysis tools?

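While Numpy and Pandas are powerful data analysis tools, they have some limitations: both work in memory on a single machine, so datasets larger than the available RAM are hard to handle, Numpy arrays are best suited to uniform numeric data, and neither library is designed for distributed computing or for training deep learning models. In such cases, it may be necessary to use other tools, such as Apache Spark or Dask for large or distributed datasets, or TensorFlow for deep learning.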
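
Before switching tools, one option Pandas itself offers for data that does not fit in memory is reading a file in chunks with the chunksize parameter of pd.read_csv. The sketch below uses a tiny in-memory CSV as a stand-in for a large file.

```python
import io
import pandas as pd

# Hypothetical stand-in for a large CSV file: one column with the values 0..9.
csv_text = "value\n" + "\n".join(str(i) for i in range(10))

# Process the data four rows at a time instead of loading it all at once.
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += chunk["value"].sum()

print(total)
```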

  • How can predictive analysis be used to improve decision-making and optimize business processes?

Predictive analysis can be used to improve decision-making and optimize business processes in many ways. For example, it can help identify patterns and trends in customer behavior, detect anomalies and outliers in financial data, forecast demand for products, and optimize supply chain operations. By providing actionable insights and predictions, predictive analysis can help businesses make better decisions and achieve their goals more effectively.

Numpy (0.9)

(P.S. wget the TensorFlow file to see the lesson; it's the first half of the TensorFlow file.)
wget https://raw.githubusercontent.com/KKcbal/amongus/master/_notebooks/2023-04-03-TenserFlow.ipynb

For your hacks, use matplotlib and numpy to slice this image to display Waldo. Also find and display one other numpy function and blog about what it is used for.

from skimage import io
from matplotlib import pyplot as plt
waldo = io.imread('../images/waldo.png')
type(waldo)
plt.imshow(waldo)
<matplotlib.image.AxesImage at 0x7f9c7adc2e80>
print("HERE'S WALDO!")
plt.imshow(waldo[120:255, 335:420])
HERE'S WALDO!
<matplotlib.image.AxesImage at 0x7f9c7ad8bb50>
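
The slice above works because an image loaded this way is a numpy array indexed as [rows, columns]. A sketch with a synthetic array standing in for the image (the 300 x 500 size is an assumption, not the real size of waldo.png):

```python
import numpy as np

# Hypothetical stand-in for the image: 300 rows, 500 columns, 3 colour channels.
image = np.zeros((300, 500, 3), dtype=np.uint8)

# Paint the same region used in the crop above white, so it can be checked.
image[120:255, 335:420] = 255

# Rows 120 to 254 and columns 335 to 419; the colour channels are kept whole.
crop = image[120:255, 335:420]

print(crop.shape)
```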

Other numpy functions that I have heard about include the trigonometric functions, such as numpy.sin, numpy.cos, and numpy.tan, along with their inverses numpy.arcsin, numpy.arccos, and numpy.arctan, all shown below. Each takes a number (an angle in radians for sin, cos, and tan) and returns the corresponding value. The code below lets the user enter any float and displays its sine, cosine, tangent, and the three inverse values. Note that arcsin and arccos are only defined for inputs between -1 and 1, so numpy returns nan outside that range.

import numpy as np
num = float(input("What number would you like sine, cosine, tangent, sine inverse, cosine inverse, and tangent inverse of?"))
print(f'User input: {num}')
print(f'sin:{np.sin(num)}, cos:{np.cos(num)}, tan: {np.tan(num)}, sin-1: {np.arcsin(num)}, cos-1: {np.arccos(num)}, tan-1: {np.arctan(num)}')
User input: 0.5
sin:0.479425538604203, cos:0.8775825618903728, tan: 0.5463024898437905, sin-1: 0.5235987755982989, cos-1: 1.0471975511965979, tan-1: 0.4636476090008061
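
These functions also work on whole arrays at once, and they expect angles in radians. A short sketch that converts made-up angles from degrees with np.deg2rad first:

```python
import numpy as np

# Hypothetical angles in degrees.
angles_deg = np.array([0.0, 30.0, 90.0])

# Convert to radians, since np.sin works in radians.
angles_rad = np.deg2rad(angles_deg)

# The function is applied to every element of the array.
sines = np.sin(angles_rad)

# Inverse functions return radians; convert back to degrees to read them.
angle_back = np.rad2deg(np.arcsin(0.5))

print(sines)
print(angle_back)
```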

TensorFlow (attempted these, but kept getting an error)

import tensorflow as tf
import numpy as np

# Read x,y pairs from the user and split them 80/20 into training and testing sets
data = input("Enter data as a comma-separated list of x and y values (e.g. 1,2;3,4;5,6): ")
data = np.array([list(map(float, row.split(','))) for row in data.split(';')])
x_train, y_train = data[:int(len(data)*0.8), 0], data[:int(len(data)*0.8), 1]
x_test, y_test = data[int(len(data)*0.8):, 0], data[int(len(data)*0.8):, 1]

# Define the model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(units=1, input_shape=[1])
])

# Compile the model
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.1), loss='mean_squared_error')

# Train the model
history = model.fit(x_train, y_train, epochs=100, verbose=0)

# Evaluate the model on training and testing data
train_loss = model.evaluate(x_train, y_train, verbose=0)
test_loss = model.evaluate(x_test, y_test, verbose=0)

# Print the training and testing loss
print('Training loss: {:.4f}'.format(train_loss))
print('Testing loss: {:.4f}'.format(test_loss))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/home/emaad/vscode/emaad-fastpages/_notebooks/2023-04-30-Data-Analysis-Hacks.ipynb Cell 14 in <cell line: 1>()
----> 1 import tensorflow as tf
      2 import numpy as np
      4 # Generate random data for training and testing

File ~/anaconda3/lib/python3.9/site-packages/tensorflow/__init__.py:37, in <module>
     34 import sys as _sys
     35 import typing as _typing
---> 37 from tensorflow.python.tools import module_util as _module_util
     38 from tensorflow.python.util.lazy_loader import LazyLoader as _LazyLoader
     40 # Make sure code inside the TensorFlow codebase can use tf2.enabled() at import.

File ~/anaconda3/lib/python3.9/site-packages/tensorflow/python/__init__.py:42, in <module>
     37 from tensorflow.python.eager import context
     39 # pylint: enable=wildcard-import
     40 
     41 # Bring in subpackages.
---> 42 from tensorflow.python import data
     43 from tensorflow.python import distribute
     44 # from tensorflow.python import keras

File ~/anaconda3/lib/python3.9/site-packages/tensorflow/python/data/__init__.py:21, in <module>
     15 """`tf.data.Dataset` API for input pipelines.
     16 
     17 See [Importing Data](https://tensorflow.org/guide/data) for an overview.
     18 """
     20 # pylint: disable=unused-import
---> 21 from tensorflow.python.data import experimental
     22 from tensorflow.python.data.ops.dataset_ops import AUTOTUNE
     23 from tensorflow.python.data.ops.dataset_ops import Dataset

File ~/anaconda3/lib/python3.9/site-packages/tensorflow/python/data/experimental/__init__.py:97, in <module>
     15 """Experimental API for building input pipelines.
     16 
     17 This module contains experimental `Dataset` sources and transformations that can
   (...)
     93 @@UNKNOWN_CARDINALITY
     94 """
     96 # pylint: disable=unused-import
---> 97 from tensorflow.python.data.experimental import service
     98 from tensorflow.python.data.experimental.ops.batching import dense_to_ragged_batch
     99 from tensorflow.python.data.experimental.ops.batching import dense_to_sparse_batch

File ~/anaconda3/lib/python3.9/site-packages/tensorflow/python/data/experimental/service/__init__.py:419, in <module>
      1 # Copyright 2020 The TensorFlow Authors. All Rights Reserved.
      2 #
      3 # Licensed under the Apache License, Version 2.0 (the "License");
   (...)
     13 # limitations under the License.
     14 # ==============================================================================
     15 """API for using the tf.data service.
     16 
     17 This module contains:
   (...)
    416   job of ParameterServerStrategy).
    417 """
--> 419 from tensorflow.python.data.experimental.ops.data_service_ops import distribute
    420 from tensorflow.python.data.experimental.ops.data_service_ops import from_dataset_id
    421 from tensorflow.python.data.experimental.ops.data_service_ops import register_dataset

File ~/anaconda3/lib/python3.9/site-packages/tensorflow/python/data/experimental/ops/data_service_ops.py:22, in <module>
     20 from tensorflow.core.protobuf import data_service_pb2
     21 from tensorflow.python import tf2
---> 22 from tensorflow.python.data.experimental.ops import compression_ops
     23 from tensorflow.python.data.experimental.service import _pywrap_server_lib
     24 from tensorflow.python.data.experimental.service import _pywrap_utils

File ~/anaconda3/lib/python3.9/site-packages/tensorflow/python/data/experimental/ops/compression_ops.py:16, in <module>
      1 # Copyright 2020 The TensorFlow Authors. All Rights Reserved.
      2 #
      3 # Licensed under the Apache License, Version 2.0 (the "License");
   (...)
     13 # limitations under the License.
     14 # ==============================================================================
     15 """Ops for compressing and uncompressing dataset elements."""
---> 16 from tensorflow.python.data.util import structure
     17 from tensorflow.python.ops import gen_experimental_dataset_ops as ged_ops
     20 def compress(element):

File ~/anaconda3/lib/python3.9/site-packages/tensorflow/python/data/util/structure.py:22, in <module>
     18 import itertools
     20 import wrapt
---> 22 from tensorflow.python.data.util import nest
     23 from tensorflow.python.framework import composite_tensor
     24 from tensorflow.python.framework import ops

File ~/anaconda3/lib/python3.9/site-packages/tensorflow/python/data/util/nest.py:34, in <module>
      1 # Copyright 2017 The TensorFlow Authors. All Rights Reserved.
      2 #
      3 # Licensed under the Apache License, Version 2.0 (the "License");
   (...)
     13 # limitations under the License.
     14 # ==============================================================================
     16 """## Functions for working with arbitrarily nested sequences of elements.
     17 
     18 NOTE(mrry): This fork of the `tensorflow.python.util.nest` module
   (...)
     31    arrays.
     32 """
---> 34 from tensorflow.python.framework import sparse_tensor as _sparse_tensor
     35 from tensorflow.python.util import _pywrap_utils
     36 from tensorflow.python.util import nest

File ~/anaconda3/lib/python3.9/site-packages/tensorflow/python/framework/sparse_tensor.py:25, in <module>
     23 from tensorflow.python import tf2
     24 from tensorflow.python.framework import composite_tensor
---> 25 from tensorflow.python.framework import constant_op
     26 from tensorflow.python.framework import dtypes
     27 from tensorflow.python.framework import ops

File ~/anaconda3/lib/python3.9/site-packages/tensorflow/python/framework/constant_op.py:25, in <module>
     23 from tensorflow.core.framework import types_pb2
     24 from tensorflow.python.eager import context
---> 25 from tensorflow.python.eager import execute
     26 from tensorflow.python.framework import dtypes
     27 from tensorflow.python.framework import op_callbacks

File ~/anaconda3/lib/python3.9/site-packages/tensorflow/python/eager/execute.py:21, in <module>
     19 from tensorflow.python import pywrap_tfe
     20 from tensorflow.python.eager import core
---> 21 from tensorflow.python.framework import dtypes
     22 from tensorflow.python.framework import ops
     23 from tensorflow.python.framework import tensor_shape

File ~/anaconda3/lib/python3.9/site-packages/tensorflow/python/framework/dtypes.py:37, in <module>
     34 from tensorflow.core.function import trace_type
     35 from tensorflow.tools.docs import doc_controls
---> 37 _np_bfloat16 = _pywrap_bfloat16.TF_bfloat16_type()
     38 _np_float8_e4m3fn = _pywrap_float8.TF_float8_e4m3fn_type()
     39 _np_float8_e5m2 = _pywrap_float8.TF_float8_e5m2_type()

TypeError: Unable to convert function return value to a Python type! The signature was
	() -> handle
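
The traceback shows the failure happening inside import tensorflow itself, before any of the model code runs. This particular TypeError is often reported when the installed numpy version does not match what the installed TensorFlow build expects, but that is an assumption about this environment and not something the traceback confirms. A first diagnostic step could be to print both versions without importing TensorFlow. The helper name installed_version is made up for this sketch.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Look up versions from package metadata, which avoids the failing import.
for name in ("numpy", "tensorflow"):
    print(name, installed_version(name))
```

If the versions turn out to be mismatched, upgrading numpy or reinstalling TensorFlow in a fresh environment would be the usual things to try.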