### Transforming DataFrames

#### Introduction to DataFrames

#pandas is built on top of #numpy and #matplotlib.

**Rectangular Data** or **Tabular Data** is data stored in row/column format. #pandas is designed to work with **Rectangular Data**, which it represents as a DataFrame object.

```python
# Exploring a DataFrame
print(dogs.head())      # returns the first 5 rows of the DataFrame
dogs.info()             # prints columns, dtypes, and non-null counts (prints directly, returns None)
print(dogs.shape)       # tuple: number of rows, then number of columns
print(dogs.describe())  # calculates summary statistics of the numerical columns
print(dogs.values)      # returns a 2D NumPy array of the DataFrame's values
print(dogs.columns)     # returns the column names
print(dogs.index)       # returns the row index (a RangeIndex by default)
```

#### Sorting and subsetting

**Sorting** a DataFrame

```python
dogs.sort_values('weight_kg')                            # ascending
dogs.sort_values('weight_kg', ascending=False)           # descending
dogs.sort_values(['weight_kg', 'height_cm'])             # sorts by weight, then height
dogs.sort_values(['weight_kg', 'height_cm'],
                 ascending=[True, False])                # weight ascending, height descending
```

**Subsetting columns**

```python
dogs['breed']                 # subset one column (returns a Series)
dogs[['breed', 'height_cm']]  # subset multiple columns (returns a DataFrame)
```

**Subsetting rows**

```python
dogs[dogs['height_cm'] > 50]  # all dogs taller than 50 cm

# Using conditions
is_lab = dogs['breed'] == 'Labrador'
is_brown = dogs['color'] == 'Brown'
dogs[is_lab & is_brown]
# &  -> and
# |  -> or
# ~  -> not

# Multiple categorical conditions
dogs[dogs['color'].isin(['Black', 'Brown'])]
```

#### New Columns

Adding new columns goes by several names:

- mutating
- transforming
- feature engineering

```python
dogs['height_m'] = dogs['height_cm'] / 100
```

****

### Aggregating DataFrames

****

### Slicing and Indexing DataFrames

****

### Creating and Visualizing DataFrames
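
The snippets under Transforming DataFrames assume a `dogs` DataFrame already exists. As a minimal sketch, one can be created from a dictionary of lists and then sorted, filtered, and extended the same way. The rows below are invented sample data, and the intermediate variable names (`heaviest_first`, `brown_labs`, `black_or_brown`) are illustrative only.

```python
import pandas as pd

# Hypothetical sample data: these values are made up for illustration.
dogs = pd.DataFrame({
    "breed": ["Labrador", "Poodle", "Labrador", "Chihuahua"],
    "color": ["Brown", "Black", "Black", "Tan"],
    "height_cm": [56, 43, 59, 18],
    "weight_kg": [25, 23, 29, 2],
})

# Sort with the heaviest dog first
heaviest_first = dogs.sort_values("weight_kg", ascending=False)

# Combine two boolean masks with & (element-wise "and")
is_lab = dogs["breed"] == "Labrador"
is_brown = dogs["color"] == "Brown"
brown_labs = dogs[is_lab & is_brown]

# Keep rows whose color matches any value in the list
black_or_brown = dogs[dogs["color"].isin(["Black", "Brown"])]

# Add a new column derived from an existing one
dogs["height_m"] = dogs["height_cm"] / 100

print(heaviest_first)
print(brown_labs)
print(black_or_brown)
print(dogs.shape)
```

Each filter returns a new DataFrame and leaves `dogs` unchanged, while the `height_m` assignment modifies `dogs` in place.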