### Python Basics

A #python-script is a text file with the `.py` extension.
- It contains a list of Python commands, as they would be typed in the terminal.

### Python Lists

A Python #list is defined as follows: `[a,b,c]`. Lists can contain lists!

```python
family = [['Andrew', 'Son'],
          ['Anthony', 'Dad'],
          ['Judith', 'Mom'],
          ['Raisha', 'Bae']]
```

**Indexing** in Python is #zero-indexing, which means it starts at 0.
- `list[0]` returns the first element of the list
- `list[-1]` returns the last element of the list
- You can use +/- numbers to index forward/backward

**Slicing** lists in Python creates subsets of a list.
- `list[start:end]`
    - start: inclusive
    - end: exclusive
- You do not have to provide a start or an end index

#### Manipulating Lists

Changing list elements using **Indexing** and **Slicing**

```python
# x stands in for any list (avoid naming a variable `list`, it shadows the built-in)
x = [0, 0, 0, 0, 0]
# Indexing
x[0] = 1
# Slicing (the replacement can be longer or shorter than the slice)
x[0:3] = [1, 2, 3, 4]
```

Adding/Removing Elements

```python
x = [1, 2, 3, 4, 5]
# Adding (each expression returns a new list)
[1] + [1, 2, 3, 4, 5]  # Adding at the start
[1, 2, 3, 4, 5] + [1]  # Adding at the end
# Removing (changes x in place)
del x[0]    # Indexing
del x[0:3]  # Slicing
```

Referencing and instantiating variables in Python works as follows.

![[Screenshot 2024-06-11 at 5.32.45 PM.png]]

Copying lists without referencing the old list

```python
x = [1, 2, 3]
# Copying x into y
y = list(x)
# OR
y = x[:]
```

### Functions and Packages

A **function** is a piece of reusable code!

```python
# Some built-in Python functions
x_list = [3, 1, 2]
max(x_list)     # Gives the maximum value in a list
type(x_list)    # Gives the type
round(1.68, 1)  # Rounds to 1 decimal place: 1.7
help(round)     # Provides documentation!
len(x_list)     # Provides the number of elements in a list
sorted(x_list)  # Returns a new, sorted list
```

A **method** is a function that belongs to an **object**.
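To make the function/method distinction concrete, here is a minimal sketch. The `scores` list and its values are made up for illustration and are not part of the notes. It contrasts the built-in function `sorted`, which takes the list as an argument, with the list methods `append` and `index`, which are called on the list itself.

```python
# Hypothetical example data (an assumption for illustration only)
scores = [3, 1, 2]

# Built-in function: the list is passed in, and a NEW sorted list comes back
ordered = sorted(scores)

# Methods: called on the list object with dot notation
scores.append(5)             # changes scores in place
position = scores.index(2)   # index of the first occurrence of 2

print(ordered)   # [1, 2, 3]
print(scores)    # [3, 1, 2, 5]
print(position)  # 2
```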
```python
# Some list methods
x_list = ['a', 'b', 'a']
x_list.index('a')   # Returns the index of the first occurrence of 'a'
x_list.count('a')   # Returns the number of times 'a' appears in the list
x_list.append('a')  # Adds 'a' to the end of the list
```

```python
# Some string methods
x_str = 'hello world'
x_str.capitalize()        # Returns a copy with the first character capitalized
x_str.replace('a', 'b')   # Returns a copy with all occurrences of 'a' replaced by 'b'
x_str.count('o')          # Counts the number of o's in x_str
```

An **attribute** is a value stored on an object. It is accessed with dot notation like a method, but without parentheses, e.g. `object.shape`.

A **package** is a directory of Python scripts where each script is a **module**. A **module** specifies functions, methods, and types.

```python
# Importing Python packages
import numpy as np
# Importing a specific name (here the `array` function) from a package
from numpy import array
```

### NumPy

**NumPy** is Numeric Python.

**SPEED!!!** Because a NumPy array enforces a single data type, many of the statistical calculations below are much faster than they would be on a list.

One advantage of a **NumPy** array over a Python #list is that you can perform mathematical operations on whole **NumPy** arrays, element by element. For a Python #list you would need to loop through each value and perform the operation.

```python
import numpy as np
# In this example we are showing the difference between calculating BMI using a list vs a numpy array
height = [1, 2, 3, 4]
weight = [3, 4, 5, 6]

# Python list example
bmi = []
for i in range(len(height)):
    bmi.append(weight[i] / (height[i] ** 2))
print(bmi)

# NumPy array example
np_height = np.array(height)
np_weight = np.array(weight)
np_bmi = np_weight / np_height ** 2
print(np_bmi)

# Result
# [3.0, 1.0, 0.5555555555555556, 0.375]
# [3.         1.         0.55555556 0.375     ]
```

NumPy arrays behave similarly to a Python #list.

```python
# Cool subsetting in NumPy arrays, similar to DataFrames (this needs an array, not a list)
np_bmi[np_bmi > 23]  # will return an array with the elements greater than 23
```

**2D NumPy Arrays** are created using a Python list of lists!
```python
import numpy as np
# Three columns so the subsetting examples below stay in bounds
np_2d = np.array([[1, 2, 3], [4, 5, 6]])
```

**Subsetting 2D Arrays**

```python
np_2d[0][2]    # returns the first row's third element
np_2d[0, 2]    # same as above. Can be interpreted as first row, third column
np_2d[:][1:3]  # NOT a column slice: np_2d[:] is the whole array, so [1:3] slices rows
np_2d[:, 1:3]  # returns all rows, only columns 2 and 3
```

```python
# Datacamp useful exercise
import numpy as np

# `baseball` is a list of [height, weight] lists supplied by the exercise
# Create np_baseball (2 cols)
np_baseball = np.array(baseball)

# Print out the 50th row of np_baseball
print(np_baseball[49])

# Select the entire second column of np_baseball: np_weight_lb
np_weight_lb = np_baseball[:, 1]

# Print out height of 124th player
print(np_baseball[123, 0])
```

**Statistics in NumPy**

Calculating the #mean in NumPy

```python
import numpy as np
np.mean(np_2d[:, 0])  # calculating the mean of the first column
```

Calculating the #median in NumPy

```python
import numpy as np
np.median(np_2d[:, 0])  # calculating the median of the first column
```

Calculating the #correlation in NumPy using the #pearson-correlation-coefficient (Pearson product-moment correlation coefficient)

```python
import numpy as np
np.corrcoef(np_2d[:, 0], np_2d[:, 1])  # calculating the correlation of the first column and the second column
```

Calculating the #standard-deviation in NumPy

```python
import numpy as np
np.std(np_2d[:, 0])  # calculating the standard deviation of the first column
```

```python
# Nice statistics exercise from Datacamp
import numpy as np

# `heights` and `positions` are lists supplied by the exercise
# Convert positions and heights to numpy arrays: np_positions, np_heights
np_heights = np.array(heights)
np_positions = np.array(positions)

# Heights of the goalkeepers: gk_heights
gk_heights = np_heights[np_positions == 'GK']

# Heights of the other players: other_heights
other_heights = np_heights[np_positions != 'GK']

# Print out the median height of goalkeepers
print("Median height of goalkeepers: " + str(np.median(gk_heights)))

# Print out the median height of other players
print("Median height of other players: " + str(np.median(other_heights)))
```
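
The exercises above rely on data supplied by Datacamp, so here is a small self-contained sketch of the same statistics calls. The height/weight values and the `positions` labels are invented for illustration (an assumption, not the exercise data). The weights are chosen to be an exact linear function of the heights so the correlation is easy to check.

```python
import numpy as np

# Hypothetical data: column 0 = height (m), column 1 = weight (kg)
np_2d = np.array([[1.70, 60.0],
                  [1.80, 80.0],
                  [1.90, 100.0]])
heights = np_2d[:, 0]   # first column
weights = np_2d[:, 1]   # second column

mean_height = np.mean(heights)
median_weight = np.median(weights)
std_height = np.std(heights)    # population standard deviation (ddof=0 by default)

# np.corrcoef returns a 2x2 correlation matrix; the coefficient
# between the two inputs sits off the diagonal at [0, 1]
r = np.corrcoef(heights, weights)[0, 1]

# Boolean mask, as in the goalkeeper exercise (labels are made up)
positions = np.array(['GK', 'D', 'GK'])
gk_heights = heights[positions == 'GK']

print(mean_height, median_weight, std_height, r)
print(gk_heights)
```

Note that `np.corrcoef` gives back a matrix rather than a single number, which is why the sketch indexes `[0, 1]` to pull out the coefficient.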