# Using 'as' allows us to avoid typing the full module name each time it is referenced
import pandas as pd


sr1 = pd.Series([150.0, 120.0, 3000.0])
sr1

0     150.0
1     120.0
2    3000.0
dtype: float64


sr1[0] # Indexing is similar to standard Python sequences

150.0


sr1[sr1 > 200]

2    3000.0
dtype: float64
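The filter `sr1[sr1 > 200]` works in two steps. First, the comparison builds a boolean Series (a mask). Then, indexing with that mask keeps only the elements where it is `True`. A minimal sketch; the names `mask`, `selected` and `between` are illustrative:

```python
import pandas as pd

sr1 = pd.Series([150.0, 120.0, 3000.0])

# Step 1: an element-wise comparison returns a boolean Series (a "mask")
mask = sr1 > 200
print(mask.tolist())  # [False, False, True]

# Step 2: indexing with the mask keeps the elements where it is True
selected = sr1[mask]
print(selected)

# Conditions are combined with & (and), | (or) and ~ (not);
# each condition needs its own parentheses
between = sr1[(sr1 > 100) & (sr1 < 200)]
print(between.tolist())  # [150.0, 120.0]
```

The original index labels (here `2`) are kept in the filtered result.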


d = {'apple': 150.0, 'banana': 120.0, 'watermelon': 3000.0}


sr2 = pd.Series(d)
sr2

apple          150.0
banana         120.0
watermelon    3000.0
dtype: float64


sr2.iloc[0] # Positional access like this would be impossible with a standard dictionary

150.0


sr2.index

Index(['apple', 'banana', 'watermelon'], dtype='object')
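Because `sr2` has a string index, its elements can be reached both by label and by position. Being explicit with `.loc` (labels) and `.iloc` (positions) avoids ambiguity; positional access through plain `[]` on a non-integer index is deprecated in recent pandas versions. A short sketch, where the variable names are illustrative:

```python
import pandas as pd

sr2 = pd.Series({'apple': 150.0, 'banana': 120.0, 'watermelon': 3000.0})

# Label-based access
by_label = sr2.loc['banana']

# Position-based access
by_position = sr2.iloc[1]

# Slicing by label INCLUDES the end label...
label_slice = sr2.loc['apple':'banana']
# ...while slicing by position EXCLUDES the end position, as in standard Python
position_slice = sr2.iloc[0:1]

print(by_label, by_position)    # 120.0 120.0
print(label_slice.tolist())     # [150.0, 120.0]
print(position_slice.tolist())  # [150.0]
```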


data = {'fruit': ['apple', 'banana', 'watermelon'], # DataFrame can be constructed from
        'weight': [150.0, 120.0, 3000.0],           # a dict of equal-length lists/arrays
        'berry': [False, True, True]}           
df = pd.DataFrame(data)
df
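A DataFrame can also be built from a list of dicts, one per row (record), with the dict keys becoming column names. The sketch below, with illustrative names `records` and `df_records`, builds the same table as above and then inspects its dimensions and column types:

```python
import pandas as pd

data = {'fruit': ['apple', 'banana', 'watermelon'],
        'weight': [150.0, 120.0, 3000.0],
        'berry': [False, True, True]}
df = pd.DataFrame(data)

# The same table, constructed row by row from a list of records
records = [
    {'fruit': 'apple', 'weight': 150.0, 'berry': False},
    {'fruit': 'banana', 'weight': 120.0, 'berry': True},
    {'fruit': 'watermelon', 'weight': 3000.0, 'berry': True},
]
df_records = pd.DataFrame(records)

print(df_records.equals(df))  # True
print(df.shape)               # (3, 3): 3 rows, 3 columns
print(df.dtypes)              # one data type per column
```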


df.iloc[0] # First row

fruit     apple
weight    150.0
berry     False
Name: 0, dtype: object


df.iloc[:,0] # First column

0         apple
1        banana
2    watermelon
Name: fruit, dtype: object


df.iloc[:2] # Select the first two rows by integer position


df[:2]  # Convenience shortcut: slicing with [] selects rows


df.loc[:, ['fruit', 'berry']] # Select the columns 'fruit' and 'berry'


df[['fruit', 'berry']] # Shortcut


df.columns # Retrieve the names of all columns

Index(['fruit', 'weight', 'berry'], dtype='object')


df.columns[0] # This Index object is subsettable

'fruit'


df.columns.str.startswith('fr') # As column names are strings, we can apply str methods

array([ True, False, False])


df.iloc[:,df.columns.str.startswith('fr')] # This is helpful with more complicated column selection criteria
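The boolean array produced by `str.startswith()` can also be passed to `.loc`. For simple name patterns, `DataFrame.filter()` is a more compact alternative through its `like` (substring) and `regex` parameters. A brief sketch:

```python
import pandas as pd

df = pd.DataFrame({'fruit': ['apple', 'banana', 'watermelon'],
                   'weight': [150.0, 120.0, 3000.0],
                   'berry': [False, True, True]})

# .loc also accepts a boolean array for the column axis
starts_fr = df.loc[:, df.columns.str.startswith('fr')]

# DataFrame.filter() selects columns by their names (it does not filter rows by value)
by_regex = df.filter(regex='^fr')    # names starting with 'fr'
by_substring = df.filter(like='er')  # names containing 'er'

print(starts_fr.columns.tolist())     # ['fruit']
print(by_regex.columns.tolist())      # ['fruit']
print(by_substring.columns.tolist())  # ['berry']
```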


df[df.loc[:,'berry'] == False] # Select rows where fruits are not berries


df[df['berry'] == False] # The same can be achieved with more concise syntax
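Row filters follow the same mask logic as for Series, so conditions can be negated with `~` and combined with `&` or `|`. `DataFrame.query()` expresses the same kind of filter as a string. A sketch with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({'fruit': ['apple', 'banana', 'watermelon'],
                   'weight': [150.0, 120.0, 3000.0],
                   'berry': [False, True, True]})

# ~ negates a boolean column, equivalent to df['berry'] == False
not_berries = df[~df['berry']]

# Both conditions must hold; wrap each one in parentheses
light_berries = df[df['berry'] & (df['weight'] < 1000)]

# The same kind of filter written as a query string
heavy = df.query('weight > 200')

print(not_berries['fruit'].tolist())    # ['apple']
print(light_berries['fruit'].tolist())  # ['banana']
print(heavy['fruit'].tolist())          # ['watermelon']
```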


weight200 = df[df['weight'] > 200] # Create a new DataFrame with the rows where weight is greater than 200
weight200
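pandas may treat a filtered DataFrame such as `weight200` as a possible view of `df`, so assigning a new column to it can trigger a `SettingWithCopyWarning`, depending on the pandas version. Calling `.copy()` makes the subset an independent DataFrame. A minimal sketch; `heavy` and `weight_kg` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'fruit': ['apple', 'banana', 'watermelon'],
                   'weight': [150.0, 120.0, 3000.0],
                   'berry': [False, True, True]})

# .copy() makes the filtered rows an independent DataFrame
heavy = df[df['weight'] > 200].copy()

# Adding a column to the copy leaves the original df untouched
heavy['weight_kg'] = heavy['weight'] / 1000

print(heavy)
print('weight_kg' in df.columns)  # False
```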


df['fruit'].map(lambda x: x.upper())

0         APPLE
1        BANANA
2    WATERMELON
Name: fruit, dtype: object


transform = lambda x: x.capitalize()


transformed = df['fruit'].map(transform)


transformed

0         Apple
1        Banana
2    Watermelon
Name: fruit, dtype: object
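For common string operations, the vectorized `.str` accessor gives the same result as `map()` with a lambda, and the result can be stored as a new column by assignment. A sketch; the column names `fruit_cap` and `weight_kg` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'fruit': ['apple', 'banana', 'watermelon'],
                   'weight': [150.0, 120.0, 3000.0],
                   'berry': [False, True, True]})

# Vectorized string method, equivalent to .map(lambda x: x.capitalize())
df['fruit_cap'] = df['fruit'].str.capitalize()

# Arithmetic on a numeric column is applied element-wise
df['weight_kg'] = df['weight'] / 1000

print(df[['fruit_cap', 'weight_kg']])
```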


f = open('../temp/test.txt', 'w') # Create a new file object in write mode


f.write('This is a test file.') # Write a string of characters to it

20


f.close() # Flush output buffers to disk and close the connection


with open('../temp/test.txt', 'r') as f: # Note that we use 'r' mode for reading
    text = f.read()


text

'This is a test file.'
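Besides `'w'` and `'r'`, the `'a'` mode appends to an existing file instead of overwriting it. The sketch below writes to a throwaway temporary directory (an assumption made to keep it self-contained, instead of `../temp/`) and uses `with` throughout, so every file is closed automatically:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'test.txt')

    # 'w' creates the file (or truncates an existing one)
    with open(path, 'w') as f:
        n_chars = f.write('First line\n')  # write() returns the number of characters written
        f.write('Second line\n')

    # 'a' appends to the end of the existing file
    with open(path, 'a') as f:
        f.write('Third line\n')

    # 'r' reads; splitlines() drops the trailing newline characters
    with open(path, 'r') as f:
        lines = f.read().splitlines()

print(n_chars)   # 11
print(lines)     # ['First line', 'Second line', 'Third line']
print(f.closed)  # True: the with block closed the file
```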


# Use the first two rows of the file as a two-level (MultiIndex) column header
kaggle2021 = pd.read_csv('../data/kaggle_survey_2021_responses.csv', header = [0,1])

/home/tpaskhalis/.local/lib/python3.8/site-packages/IPython/core/interactiveshell.py:3441: DtypeWarning: Columns (195,201) have mixed types.Specify dtype option on import or set low_memory=False.
  exec(code_obj, self.user_global_ns, self.user_ns)
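Passing `header=[0, 1]` turns the first two rows into a two-level (MultiIndex) column header. The sketch below uses a tiny made-up CSV held in memory via `io.StringIO` (hypothetical data, not the Kaggle file). As the warning above suggests, mixed-type columns can be handled by passing `dtype=` or `low_memory=False` to `read_csv()`.

```python
import io

import pandas as pd

# Hypothetical two-row header: question codes, then question texts
csv_text = (
    "Q1,Q2\n"
    "What is your age (# years)?,What is your gender?\n"
    "50-54,Man\n"
    "22-24,Woman\n"
)

survey = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

print(survey.columns.nlevels)                       # 2
print(survey.columns.get_level_values(0).tolist())  # ['Q1', 'Q2']

# Select a column with a (level 0, level 1) tuple
ages = survey[('Q1', 'What is your age (# years)?')]

# Keep only the short codes as column names
survey.columns = survey.columns.droplevel(1)
print(survey.columns.tolist())  # ['Q1', 'Q2']
```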


kaggle2021.head() # Returns the top n (n=5 default) rows


kaggle2021.tail() # Returns the bottom n (n=5 default) rows
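Both methods accept the number of rows as an argument, and `shape` reports the overall dimensions, which is useful for a large survey file where only a handful of rows is printed. A sketch on the small fruit table:

```python
import pandas as pd

df = pd.DataFrame({'fruit': ['apple', 'banana', 'watermelon'],
                   'weight': [150.0, 120.0, 3000.0],
                   'berry': [False, True, True]})

first_two = df.head(2)  # top 2 rows
last_one = df.tail(1)   # bottom row

print(first_two['fruit'].tolist())  # ['apple', 'banana']
print(last_one['fruit'].tolist())   # ['watermelon']
print(df.shape)                     # (3, 3): (rows, columns)
```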


kaggle2021.to_csv('../temp/kaggle2021.csv')
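By default, `to_csv()` also writes the row index as an unnamed first column, which comes back as `Unnamed: 0` when the file is read in again. Passing `index=False` avoids this. A round-trip sketch kept in memory (calling `to_csv()` without a path returns the CSV text as a string):

```python
import io

import pandas as pd

df = pd.DataFrame({'fruit': ['apple', 'banana', 'watermelon'],
                   'weight': [150.0, 120.0, 3000.0],
                   'berry': [False, True, True]})

# Default: the index is written as an extra, unnamed column
with_index = pd.read_csv(io.StringIO(df.to_csv()))

# index=False writes only the data columns
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

print(with_index.columns.tolist())     # ['Unnamed: 0', 'fruit', 'weight', 'berry']
print(without_index.columns.tolist())  # ['fruit', 'weight', 'berry']
```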

| Expression | Selection operation |
|---|---|
| `df[val]` | Column or sequence of columns; also convenience shortcuts (e.g. a slice or a boolean array selects rows) |
| `df.loc[lab_i]` | Row or subset of rows by label |
| `df.loc[:, lab_j]` | Column or subset of columns by label |
| `df.loc[lab_i, lab_j]` | Both rows and columns by label |
| `df.iloc[i]` | Row or subset of rows by integer position |
| `df.iloc[:, j]` | Column or subset of columns by integer position |
| `df.iloc[i, j]` | Both rows and columns by integer position |
| `df.at[lab_i, lab_j]` | Single scalar value by row and column label |
| `df.iat[i, j]` | Single scalar value by row and column integer position |
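With the default integer index, labels and positions coincide, which hides the difference between `.loc` and `.iloc`. Setting `fruit` as the index (an illustrative choice) makes the expressions in the table easier to tell apart:

```python
import pandas as pd

df = pd.DataFrame({'fruit': ['apple', 'banana', 'watermelon'],
                   'weight': [150.0, 120.0, 3000.0],
                   'berry': [False, True, True]})
fruits = df.set_index('fruit')  # row labels are now fruit names

# Single value: row 'banana', column 'weight' (positions 1 and 0)
v_loc = fruits.loc['banana', 'weight']
v_iloc = fruits.iloc[1, 0]
v_at = fruits.at['banana', 'weight']
v_iat = fruits.iat[1, 0]

# Subsets of rows and columns
sub_loc = fruits.loc[['apple', 'watermelon'], ['weight']]
sub_iloc = fruits.iloc[[0, 2], [0]]

print(v_loc, v_iloc, v_at, v_iat)  # 120.0 120.0 120.0 120.0
print(sub_loc.equals(sub_iloc))    # True
```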

	Time from Start to Finish (seconds)	Q1	Q2	Q3	Q4	Q5	Q6	Q7_Part_1	Q7_Part_2	Q7_Part_3	...	Q38_B_Part_3	Q38_B_Part_4	Q38_B_Part_5	Q38_B_Part_6	Q38_B_Part_7	Q38_B_Part_8	Q38_B_Part_9	Q38_B_Part_10	Q38_B_Part_11	Q38_B_OTHER
	Duration (in seconds)	What is your age (# years)?	What is your gender? - Selected Choice	In which country do you currently reside?	What is the highest level of formal education that you have attained or plan to attain within the next 2 years?	Select the title most similar to your current role (or most recent title if retired): - Selected Choice	For how many years have you been writing code and/or programming?	What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Python	What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - R	What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - SQL	...	In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - Comet.ml	In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - Sacred + Omniboard	In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - TensorBoard	In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - Guild.ai	In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - Polyaxon	In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - ClearML	In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - Domino Model Monitor	In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - MLflow	In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - None	In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - Other
0	910	50-54	Man	India	Bachelor’s degree	Other	5-10 years	Python	R	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	784	50-54	Man	Indonesia	Master’s degree	Program/Project Manager	20+ years	NaN	NaN	SQL	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	None	NaN
2	924	22-24	Man	Pakistan	Master’s degree	Software Engineer	1-3 years	Python	NaN	NaN	...	NaN	NaN	TensorBoard	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	575	45-49	Man	Mexico	Doctoral degree	Research Scientist	20+ years	Python	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	None	NaN
4	781	45-49	Man	India	Doctoral degree	Other	< 1 years	Python	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

	Time from Start to Finish (seconds)	Q1	Q2	Q3	Q4	Q5	Q6	Q7_Part_1	Q7_Part_2	Q7_Part_3	...	Q38_B_Part_3	Q38_B_Part_4	Q38_B_Part_5	Q38_B_Part_6	Q38_B_Part_7	Q38_B_Part_8	Q38_B_Part_9	Q38_B_Part_10	Q38_B_Part_11	Q38_B_OTHER
	Duration (in seconds)	What is your age (# years)?	What is your gender? - Selected Choice	In which country do you currently reside?	What is the highest level of formal education that you have attained or plan to attain within the next 2 years?	Select the title most similar to your current role (or most recent title if retired): - Selected Choice	For how many years have you been writing code and/or programming?	What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Python	What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - R	What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - SQL	...	In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - Comet.ml	In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - Sacred + Omniboard	In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - TensorBoard	In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - Guild.ai	In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - Polyaxon	In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - ClearML	In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - Domino Model Monitor	In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - MLflow	In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - None	In the next 2 years, do you hope to become more familiar with any of these tools for managing ML experiments? (Select all that apply) - Selected Choice - Other
25968	1756	30-34	Man	Egypt	Bachelor’s degree	Data Analyst	1-3 years	Python	NaN	SQL	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
25969	253	22-24	Man	China	Master’s degree	Student	1-3 years	Python	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
25970	494	50-54	Man	Sweden	Doctoral degree	Research Scientist	I have never written code	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	None	NaN
25971	277	45-49	Man	United States of America	Master’s degree	Data Scientist	5-10 years	Python	NaN	SQL	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
25972	255	18-21	Man	India	Bachelor’s degree	Business Analyst	I have never written code	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	None	NaN

Day 1, Part 3: Pandas and Data I/O

Introduction to Python

Tom Paskhalis

2022-06-27

RECSM Summer School 2022

Rectangular data

Tidy data

Data in Python

Pandas

Series

Indexing in Series

DataFrame - the workhorse of data analysis

Indexing in DataFrame

Summary of indexing in DataFrame

Subsetting in DataFrame

Columns in DataFrame

Filtering in DataFrame

Variable transformation

File object

Data input and output

Data output example

Data input example

Reading and writing data in `pandas`

Reading data in `pandas` example

Visual data inspection

Visual data inspection continued

Reading in other (non-`.csv`) data files

Writing data out in `pandas`

Additional pandas materials

Tomorrow
