Iterating over a pandas.DataFrame with for loops

Editor · posted 2023-5-4 17:23:42

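When you loop over a pandas.DataFrame with a plain for statement, you get the column names one at a time. To get the values of each column or each row, use the iteration methods described in this post.
The following pandas.DataFrame is used as the example.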
import pandas as pd
df = pd.DataFrame({'age': [24, 42], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])
print(df)
#        age state  point
# Alice   24    NY     64
# Bob     42    CA     92
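This post covers the following topics:

Using a pandas.DataFrame in a for loop

Retrieving column by column

DataFrame.items() (formerly DataFrame.iteritems())

Retrieving row by row

DataFrame.iterrows()

DataFrame.itertuples()

Retrieving the values of specific columns

Updating values in a loop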
Using a pandas.DataFrame in a for loop
When a pandas.DataFrame is used directly in a for loop, the column names are returned in order.
for column_name in df:
    print(type(column_name))
    print(column_name)
    print('======\n')
# <class 'str'>
# age
# ======
#
# <class 'str'>
# state
# ======
#
# <class 'str'>
# point
# ======
#
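The for loop calls the DataFrame's __iter__() method, so calling it explicitly gives the same result.
for column_name in df.__iter__():
    print(type(column_name))
    print(column_name)
    print('======\n')
# <class 'str'>
# age
# ======
#
# <class 'str'>
# state
# ======
#
# <class 'str'>
# point
# ======
#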
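Because iterating a DataFrame yields its column labels, the same labels can also be collected with list() or read from the columns attribute. The following is a minimal sketch that rebuilds the example frame so it can run on its own; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'age': [24, 42], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

# Iterating the DataFrame yields column labels, so list() collects them.
names_from_iter = list(df)
# The columns attribute holds the same labels.
names_from_columns = list(df.columns)

print(names_from_iter)
# ['age', 'state', 'point']
print(names_from_iter == names_from_columns)
# True
```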
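Retrieving column by column
DataFrame.items() (formerly DataFrame.iteritems())
The items() method returns, one column at a time, a tuple (column name, Series) holding the column name and that column's data as a pandas.Series. Older code uses iteritems(), which was deprecated in pandas 1.5 and removed in pandas 2.0; items() behaves the same way.
From a pandas.Series you can retrieve the value of a row by index label, by position, or by attribute.
for column_name, item in df.items():
    print(type(column_name))
    print(column_name)
    print('~~~~~~')
    print(type(item))
    print(item)
    print('------')
    print(item['Alice'])   # by index label
    print(item.iloc[0])    # by position (item[0] is deprecated for label-indexed Series)
    print(item.Alice)      # by attribute
    print('======\n')
# <class 'str'>
# age
# ~~~~~~
# <class 'pandas.core.series.Series'>
# Alice    24
# Bob      42
# Name: age, dtype: int64
# ------
# 24
# 24
# 24
# ======
#
# <class 'str'>
# state
# ~~~~~~
# <class 'pandas.core.series.Series'>
# Alice    NY
# Bob      CA
# Name: state, dtype: object
# ------
# NY
# NY
# NY
# ======
#
# <class 'str'>
# point
# ~~~~~~
# <class 'pandas.core.series.Series'>
# Alice    64
# Bob      92
# Name: point, dtype: int64
# ------
# 64
# 64
# 64
# ======
#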
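As a small illustration of what the (column name, Series) pairs are good for, the sketch below collects the maximum of every numeric column into a dict. The helper pd.api.types.is_numeric_dtype is used to skip the string column; the name numeric_max is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'age': [24, 42], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

numeric_max = {}
for column_name, item in df.items():
    # Only numeric columns have a meaningful numeric maximum.
    if pd.api.types.is_numeric_dtype(item):
        numeric_max[column_name] = int(item.max())

print(numeric_max)
# {'age': 42, 'point': 92}
```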
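Retrieving row by row
The methods that retrieve one row at a time are iterrows() and itertuples(). itertuples() is the faster of the two.
If you only need the values of specific columns, it is faster still to select those columns and iterate over them directly in the for loop, as described later in this post.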
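The speed difference can be checked with a rough timing sketch like the one below. The frame big and the three helper functions are hypothetical and exist only for this comparison; the absolute numbers depend on the machine and the pandas version, so no timings are shown.

```python
import timeit

import pandas as pd

# Hypothetical larger frame, used only to make the difference measurable.
big = pd.DataFrame({'age': range(2000), 'point': range(2000)})

def total_with_iterrows():
    # Builds one Series per row.
    return sum(row['age'] + row['point'] for _, row in big.iterrows())

def total_with_itertuples():
    # Builds one namedtuple per row.
    return sum(row.age + row.point for row in big.itertuples())

def total_with_zip():
    # Iterates only over the two columns that are needed.
    return sum(a + p for a, p in zip(big['age'], big['point']))

for func in (total_with_iterrows, total_with_itertuples, total_with_zip):
    seconds = timeit.timeit(func, number=3)
    print(f'{func.__name__}: {seconds:.3f} s')
```

All three functions compute the same total; only the way the rows are produced differs.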

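DataFrame.iterrows()
The iterrows() method returns, one row at a time, a tuple (index, Series) holding the row label and that row's data as a pandas.Series.
From a pandas.Series you can retrieve the value of a column by column name, by position, or by attribute.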
for index, row in df.iterrows():
    print(type(index))
    print(index)
    print('~~~~~~')
    print(type(row))
    print(row)
    print('------')
    print(row['point'])   # by column name
    print(row.iloc[2])    # by position (row[2] is deprecated for label-indexed Series)
    print(row.point)      # by attribute
    print('======\n')
# <class 'str'>
# Alice
# ~~~~~~
# <class 'pandas.core.series.Series'>
# age      24
# state    NY
# point    64
# Name: Alice, dtype: object
# ------
# 64
# 64
# 64
# ======
#
# <class 'str'>
# Bob
# ~~~~~~
# <class 'pandas.core.series.Series'>
# age      42
# state    CA
# point    92
# Name: Bob, dtype: object
# ------
# 92
# 92
# 92
# ======
#
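DataFrame.itertuples()
The itertuples() method returns, one row at a time, a tuple containing the index label (row name) and that row's data. The first element of the tuple is the index label.
By default it returns a namedtuple named Pandas. Because it is a namedtuple, each element can be accessed by field name as well as by position.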
for row in df.itertuples():
    print(type(row))
    print(row)
    print('------')
    print(row[3])      # by position (position 0 is the index label)
    print(row.point)   # by field name
    print('======\n')
# <class 'pandas.core.frame.Pandas'>
# Pandas(Index='Alice', age=24, state='NY', point=64)
# ------
# 64
# 64
# ======
#
# <class 'pandas.core.frame.Pandas'>
# Pandas(Index='Bob', age=42, state='CA', point=92)
# ------
# 92
# 92
# ======
#
If the name argument is None, a plain tuple is returned.
for row in df.itertuples(name=None):
    print(type(row))
    print(row)
    print('------')
    print(row[3])
    print('======\n')
# <class 'tuple'>
# ('Alice', 24, 'NY', 64)
# ------
# 64
# ======
#
# <class 'tuple'>
# ('Bob', 42, 'CA', 92)
# ------
# 92
# ======
#
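itertuples() also accepts an index argument, and name can be any string rather than only None. The sketch below drops the index label from each tuple and names the namedtuple Row; the name Row and the variable rows are arbitrary choices for this example.

```python
import pandas as pd

df = pd.DataFrame({'age': [24, 42], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

# index=False leaves the index label out; name sets the namedtuple's type name.
rows = list(df.itertuples(index=False, name='Row'))

for row in rows:
    print(row)
# Row(age=24, state='NY', point=64)
# Row(age=42, state='CA', point=92)

# Without the index label, position 0 is now the first column.
print(rows[0][0], rows[0].point)
# 24 64
```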
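Retrieving the values of specific columns
The iterrows() and itertuples() methods above retrieve the elements of every column in each row. If you only need the elements of specific columns, you can use the following approach instead.
A column of a pandas.DataFrame is a pandas.Series.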
print(df['age'])
# Alice    24
# Bob      42
# Name: age, dtype: int64
print(type(df['age']))
# <class 'pandas.core.series.Series'>
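Looping over a pandas.Series with for yields its values in order, so selecting a column of the pandas.DataFrame and looping over it yields the values in that column in order.
for age in df['age']:
    print(age)
# 24
# 42
With the built-in function zip(), you can retrieve the values of several columns together.
for age, point in zip(df['age'], df['point']):
    print(age, point)
# 24 64
# 42 92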
To get the index (row names), use the index attribute. As in the example above, it can be combined with other columns through zip().
print(df.index)
# Index(['Alice', 'Bob'], dtype='object')
print(type(df.index))
# <class 'pandas.core.indexes.base.Index'>
for index in df.index:
    print(index)
# Alice
# Bob
for index, state in zip(df.index, df['state']):
    print(index, state)
# Alice NY
# Bob CA
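For a single column, Series.items() gives the same (index label, value) pairs without an explicit zip(). The sketch below shows that, and also builds a lookup dict from the index and a column; the name state_by_name is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'age': [24, 42], 'state': ['NY', 'CA'], 'point': [64, 92]},
                  index=['Alice', 'Bob'])

# Series.items() yields (index label, value) pairs for one column.
for name, age in df['age'].items():
    print(name, age)
# Alice 24
# Bob 42

# zip() over the index and a column can feed dict() directly.
state_by_name = dict(zip(df.index, df['state']))
print(state_by_name)
# {'Alice': 'NY', 'Bob': 'CA'}
```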
Updating values in a loop
The pandas.Series that iterrows() returns for each row is a copy, not a view, so changing it does not update the original data.
for index, row in df.iterrows():
    row['point'] += row['age']
print(df)
#        age state  point
# Alice   24    NY     64
# Bob     42    CA     92
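A related caveat, noted in the pandas documentation for iterrows(): because each row is rebuilt as a single Series, dtypes are not preserved across the row. The sketch below uses a separate one-row frame (df_num, a name chosen for this example) with an integer and a float column.

```python
import pandas as pd

# Hypothetical frame with one integer column and one float column.
df_num = pd.DataFrame([[1, 1.5]], columns=['int', 'float'])

# Take the first (index, Series) pair that iterrows() yields.
_, row = next(df_num.iterrows())

print(df_num['int'].dtype)
# int64
print(row['int'].dtype)
# float64
```

The integer was upcast to float so that both values fit in one Series, which is another reason to prefer itertuples() or column-wise iteration when the original types matter.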
Selecting the element in the original DataFrame with at[] and modifying it there does update the data.
for index, row in df.iterrows():
    df.at[index, 'point'] += row['age']
print(df)
#        age state  point
# Alice   24    NY     88
# Bob     42    CA    134
For more on at[], see the following article.
Getting and setting values at any position in pandas (at, iat, loc, iloc)
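Note that the at[] loop above is only an example. In many cases you do not need a for loop to update elements or to add a new column based on existing columns, and the code without a for loop is both simpler to write and faster.
The following does the same processing as above without a loop. The object updated above is updated once more.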
df['point'] += df['age']
print(df)
#        age state  point
# Alice   24    NY    112
# Bob     42    CA    176
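To confirm that the loop and the loop-free version really do the same thing, the sketch below applies each one once to a fresh copy of the original example frame. The names base, looped and vectorized are only for this comparison.

```python
import pandas as pd

base = pd.DataFrame({'age': [24, 42], 'state': ['NY', 'CA'], 'point': [64, 92]},
                    index=['Alice', 'Bob'])

# Loop version: update each element through at[].
looped = base.copy()
for index, row in looped.iterrows():
    looped.at[index, 'point'] += row['age']

# Loop-free version: operate on the whole column at once.
vectorized = base.copy()
vectorized['point'] += vectorized['age']

print(looped['point'].tolist())
# [88, 134]
print(vectorized['point'].tolist())
# [88, 134]
```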
You can also add a new column.
df['new'] = df['point'] + df['age'] * 2
print(df)
#        age state  point  new
# Alice   24    NY    112  160
# Bob     42    CA    176  260
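In addition to simple arithmetic, NumPy functions can be applied to every element of a column. The following example takes the square root. Older versions of this example reached NumPy through pd.np, but that alias was deprecated in pandas 1.0 and removed in pandas 2.0, so NumPy is imported directly here.
import numpy as np
df['age_sqrt'] = np.sqrt(df['age'])
print(df)
#        age state  point  new  age_sqrt
# Alice   24    NY    112  160  4.898979
# Bob     42    CA    176  260  6.480741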
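For strings, the str accessor provides string methods that operate directly on a column (Series). The following example converts to lowercase and extracts the first character.
df['state_0'] = df['state'].str.lower().str[0]
print(df)
#        age state  point  new  age_sqrt state_0
# Alice   24    NY    112  160  4.898979       n
# Bob     42    CA    176  260  6.480741       c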

