Pandas Basics (DataFrame and Series)

Kola (Yan-Hao Wang)

Mar 16, 2023

DataFrame

Initialization: a DataFrame can be created from a list, a dict, or a NumPy array.
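As a minimal sketch of the three options, the snippet below builds small DataFrames from each kind of input. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# From a list of rows; column names are supplied separately.
df_from_list = pd.DataFrame([[1, "a"], [2, "b"]], columns=["num", "letter"])

# From a dict: keys become column names, values become the column data.
df_from_dict = pd.DataFrame({"num": [1, 2], "letter": ["a", "b"]})

# From a 2-D NumPy array: each array row becomes a DataFrame row.
df_from_numpy = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["x", "y"])
```

The list and dict versions describe the same table; which one reads better usually depends on whether the data arrives row by row or column by column.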

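Loading data:

read_csv()
a. chunksize parameter: when the data is too large to fit in RAM, the whole file cannot be read at once. Set chunksize to the number of rows to load per chunk; read_csv() then returns an iterator that yields one DataFrame per chunk.
b. usecols=['column1', 'column2', 'column4'] parameter: specifies which columns to read. This article covers the usage in detail and even shows how to exclude specific columns.
c. index_col parameter: a CSV written by to_csv() with default settings stores the old row index as its first column (the index column). Pass index_col=0 to use that column as the index instead of reading it as an extra data column. index_col=False tells pandas not to use the first column as the index; the column itself is still read. https://stackoverflow.com/questions/45532711/pandas-read-csv-method-is-using-too-much-ram
d. nrows parameter: the number of data rows to read. The header row is not counted, so nrows=0 reads only the column names and nrows=1 reads the column names plus the first data row.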
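The read_csv() parameters above can be tried on a small in-memory file. This is only a sketch: the CSV text, its column names, and the variable names are hypothetical, and io.StringIO stands in for a real file path.

```python
import io
import pandas as pd

# Hypothetical CSV text whose first column is an index written by an earlier to_csv().
csv_text = ",name,score,city\n0,Ann,90,Taipei\n1,Bob,85,Tainan\n2,Cat,70,Hsinchu\n"

# chunksize: iterate over the file two rows at a time instead of loading it whole.
chunk_shapes = [chunk.shape for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2)]

# usecols: keep only the listed columns.
subset = pd.read_csv(io.StringIO(csv_text), usecols=["name", "score"])

# index_col=0: use the first column as the index rather than as a data column.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

# nrows=0 reads only the header; nrows=1 reads the header plus one data row.
header_only = pd.read_csv(io.StringIO(csv_text), nrows=0)
first_row = pd.read_csv(io.StringIO(csv_text), nrows=1)
```

With a real file, each chunk would normally be filtered or aggregated inside the loop so that only the reduced result stays in memory.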

DataFrame information:

df.shape returns the shape of the DataFrame as a tuple of array dimensions: (number of rows, number of columns). https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.shape.html
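A short illustration with made-up data; note that shape is an attribute, so it is read without parentheses.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# shape is a (rows, columns) tuple and can be unpacked directly.
n_rows, n_cols = df.shape
```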

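Accessing data:

head(a) returns the first a rows.
Square brackets: df['column'] selects a column.
loc[] selects by label (row and column names); iloc[] selects by integer position. Both are used with square brackets, not parentheses.
at[] and iat[] are the label-based and position-based counterparts of the above, but they access a single element.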
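The sketch below shows head(), square brackets, loc[] and iloc[] side by side. The data and the row labels r1 to r3 are invented for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cat"], "score": [90, 85, 70]},
    index=["r1", "r2", "r3"],
)

first_two = df.head(2)            # first 2 rows
scores = df["score"]              # one column, returned as a Series
both = df[["name", "score"]]      # list of columns, returned as a DataFrame
by_label = df.loc["r2", "score"]  # row label, column label
by_position = df.iloc[1, 1]       # row position, column position
label_slice = df.loc["r1":"r2"]   # label slices include the end label
pos_slice = df.iloc[0:2]          # position slices exclude the end position
```

The two slices return the same rows, which is an easy detail to trip over: loc[] includes its end label while iloc[] follows the usual Python half-open convention.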

at and iat are meant to access a scalar, that is, a single element in the dataframe, while loc and iloc are meant to access several elements at the same time, potentially to perform vectorized operations.
https://stackoverflow.com/questions/28757389/pandas-loc-vs-iloc-vs-at-vs-iat
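To make the scalar versus multi-element distinction concrete, here is a small sketch with hypothetical data: at[] and iat[] read and write one cell, while loc[] updates several cells in one vectorized step.

```python
import pandas as pd

df = pd.DataFrame({"score": [90, 85, 70]}, index=["r1", "r2", "r3"])

value = df.at["r2", "score"]   # single element by label
same = df.iat[1, 0]            # the same element by position
df.at["r2", "score"] = 88      # write a single cell

# loc[] handles several elements at once, here a vectorized update of two rows.
df.loc[["r1", "r3"], "score"] += 5
```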

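Iteration: avoid iterating over rows where possible, because it is slow.

for index, row in df.iterrows(): visits every row. index is the row's index label (counting from 0 with the default index; the column-name row is not included), and row is a Series holding that row's values. Note that changing a value in row does not reliably change the value in df, because row may be a copy; write to df directly instead. https://blog.csdn.net/Softdiamonds/article/details/80218777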
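A minimal sketch of the point about iterrows(), using made-up data with mixed column types (in which case row is a copy): assigning to row leaves df untouched, writing through df.at[] works, and a vectorized expression avoids the loop entirely.

```python
import pandas as pd

df = pd.DataFrame({"item": ["pen", "cup", "bag"], "price": [10, 20, 30]})

# Assigning to `row` only changes the per-row copy, not df.
for index, row in df.iterrows():
    row["price"] = row["price"] * 2
unchanged = df["price"].tolist()

# Write through df itself, using the index label that iterrows() provides.
for index, row in df.iterrows():
    df.at[index, "price"] = row["price"] * 2
doubled = df["price"].tolist()

# Vectorized alternative: no Python-level loop, usually much faster.
df["price"] = df["price"] * 2
```

On a three-row table the speed difference is invisible, but on large DataFrames the vectorized form is generally the one to reach for first.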

Solutions for the pandas SettingWithCopyWarning

Series

…
