Adding a column to a DataFrame and assigning values iteratively: the assign method of pd.DataFrame

Updated: 2023-05-18 19:55:53

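I have been unwell lately and paused updates for a few days. My apologies, and I hope you will all understand. Separately, at work I have started handling to-C business, which means processing fairly large data sets, so I have been busy learning and applying pandas and SQL. Over the coming posts I will summarize some of the commonly used knowledge points.
Adding new column(s): direct assignment
This one is fairly simple, so straight to the code.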
import pandas as pd
import numpy as np
df = pd.DataFrame({'temp_c': [17.0, 25.0]} ,
index=('Portland', 'Berkeley') )
df
temp_c
Portland 17.0
Berkeley 25.0
df['temp_f']= df.temp_c * 9 / 5 + 32
# equivalent to: df['temp_f'] = df['temp_c'] * 9 / 5 + 32
df
temp_c temp_f
Portland 17.0 62.6
Berkeley 25.0 77.0
Direct assignment works much like key-value assignment on a dict:
If the column does not exist, a new column is created;
If the column already exists, the values in that column are updated.
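To make the two cases concrete, here is a minimal sketch that reuses the temp_c frame from above. The rounding step is only my illustration of an update; it is not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'temp_c': [17.0, 25.0]},
                  index=['Portland', 'Berkeley'])

# Key not present yet: a new column is created, and df itself is modified.
df['temp_f'] = df['temp_c'] * 9 / 5 + 32
columns_after_create = list(df.columns)

# Key already present: the existing column is overwritten, no column is added.
df['temp_f'] = df['temp_f'].round(0)
columns_after_update = list(df.columns)

print(columns_after_create)
print(columns_after_update)
print(df)
```

Both steps change df in place, which is the main difference from assign discussed next.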
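Adding new column(s): the assign method
assign is a built-in method of the pd.DataFrame class, used to add or update the column(s) of a pd.DataFrame. This is the main topic of today's post.
The definition of assign in the source code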
class DataFrame(NDFrame):
def assign(self, **kwargs) -> "DataFrame":
r"""
Assign new columns to a DataFrame.
Returns a new object with all original columns in addition to new ones.
Existing columns that are re-assigned will be overwritten.
Parameters
----------
**kwargs : dict of {str: callable or Series}
The column names are keywords. If the values are
callable, they are computed on the DataFrame and
assigned to the new columns. The callable must not
change input DataFrame (though pandas doesn't check it).
If the values are not callable, (e.g. a Series, scalar, or array),
they are simply assigned.
Returns
-------
DataFrame
A new DataFrame with the new columns in addition to
all the existing columns.
Notes
-----
Assigning multiple columns within the same ``assign`` is possible.
Later items in '\*\*kwargs' may refer to newly created or modified
columns in 'df'; items are computed and assigned into 'df' in order.
.. versionchanged:: 0.23.0
Keyword argument order is maintained.
Examples
--------
>>> df = pd.DataFrame({'temp_c': [17.0, 25.0]},
...                  index=['Portland', 'Berkeley'])
>>> df
temp_c
Portland    17.0
Berkeley    25.0
Where the value is a callable, evaluated on `df`:
>>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)
temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0
Alternatively, the same behavior can be achieved by directly
referencing an existing Series or sequence:
>>> df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)
temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0
You can create multiple columns within the same assign where one
of the columns depends on another one defined within the same assign:
>>> df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
...          temp_k=lambda x: (x['temp_f'] +  459.67) * 5 / 9)
temp_c  temp_f  temp_k
Portland    17.0    62.6  290.15
Berkeley    25.0    77.0  298.15
"""
Basic explanation
What it does: (1) Returns a new object with all original columns in addition to new ones. In other words, it returns a pd.DataFrame that keeps every column of the original DataFrame and adds the new column(s). (2) Existing columns that are re-assigned will be overwritten. In other words, if a column named in assign already exists, its values are updated in the returned object.
Parameters: **kwargs : dict of {str: callable or Series}. The column names are keywords. If the values are callable, they are computed on the DataFrame and assigned to the new columns. The callable must not change input DataFrame (though pandas doesn't check it). If the values are not callable, (e.g. a Series, scalar, or array), they are simply assigned.
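As a rough sketch of the parameter description, the block below passes one value of each kind the docstring mentions. The column names temp_c_int, unit and rank, and the "temp_c x2" style names, are made up for illustration. The last call builds the keyword arguments as a dict and unpacks it with **, which is one way to add columns whose names are generated in a loop or are not valid Python identifiers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'temp_c': [17.0, 25.0]},
                  index=['Portland', 'Berkeley'])

result = df.assign(
    temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,  # callable: called with the DataFrame
    temp_c_int=df['temp_c'].astype(int),         # Series: aligned on the index
    unit='celsius',                              # scalar: repeated on every row
    rank=np.array([2, 1]),                       # array: assigned by position
)

# Hypothetical generated column names, passed by unpacking a dict.
scaled = df.assign(**{f'temp_c x{k}': df['temp_c'] * k for k in (2, 3)})

print(result)
print(scaled)
```

Since each keyword becomes a column name, the dict form is only needed when the name cannot be written as a plain keyword.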
Code examples
Code example 1
import pandas as pd
import numpy as np
df = pd.DataFrame({'temp_c': [17.0, 25.0]} ,
index=('Portland', 'Berkeley') )
df
temp_c
Portland 17.0
Berkeley 25.0
df.assign(  temp_f=df['temp_c'] * 9 / 5 + 32 )
temp_c temp_f
Portland 17.0 62.6
Berkeley 25.0 77.0
df
temp_c
Portland 17.0
Berkeley 25.0
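The code is simple and needs little explanation: the keyword name temp_f becomes the name of the new column, and the keyword value df['temp_c'] * 9 / 5 + 32 becomes the values in that column. Note one thing: the original df is unchanged.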
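Because the original df is unchanged, the result has to be bound to a name if you want to keep it. A small sketch of that point, with variable names of my own choosing:

```python
import pandas as pd

df = pd.DataFrame({'temp_c': [17.0, 25.0]},
                  index=['Portland', 'Berkeley'])

# The return value is discarded here, so df still has a single column.
df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)
columns_without_rebinding = list(df.columns)

# Binding the result back to df keeps the new column.
df = df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)
columns_with_rebinding = list(df.columns)

# Re-assigning an existing column name overwrites it in the returned copy only.
rounded = df.assign(temp_f=lambda x: x['temp_f'].round(0))

print(columns_without_rebinding)
print(columns_with_rebinding)
print(rounded)
```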
Code example 2
df = pd.DataFrame({'temp_c': [17.0, 25.0]} ,
index=('Portland', 'Berkeley') )
df
temp_c
Portland 17.0
Berkeley 25.0
temp_f=lambda x: x.temp_c * 9 / 5 + 32
print(temp_f(  df ))
print( type(temp_f(  df )))
Portland    62.6
Berkeley    77.0
Name: temp_c, dtype: float64
<class 'pandas.core.series.Series'>
df.assign(temp_f=temp_f)
temp_c temp_f
Portland 17.0 62.6
Berkeley 25.0 77.0
temp_f = lambda x: list(x.temp_c * 9 / 5 + 32)
print(temp_f(  df ))
print( type(temp_f(  df )))
print( df.assign(temp_f=df['temp_c'] * 9 / 5 + 32))
[62.6, 77.0]
<class 'list'>
temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0
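Somewhat interesting, right? Note the following three points:
It is still the name=value form of keyword argument, where the keyword name defines the name of the new (or to-be-updated) column.
Here temp_f is an anonymous function (a lambda), and assign calls it with the current pd.DataFrame object as its argument.
How you define the callable is up to you. The first time I simply used arithmetic on a column of the pd.DataFrame; the second time I converted the result to a list. More complex logic is possible, but typically the callable computes values derived from a few columns of the current pd.DataFrame.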
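One place where the callable form seems most useful is a method chain, where the intermediate frame has no variable name to refer to. A minimal sketch, assuming an extra Phoenix row and an above_mean column that I invented for the illustration:

```python
import pandas as pd

df = pd.DataFrame({'temp_c': [17.0, 25.0, 31.0]},
                  index=['Portland', 'Berkeley', 'Phoenix'])

warm = (
    df[df['temp_c'] > 20]          # intermediate result with no name of its own
    .assign(
        temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
        # x is the filtered frame, so the mean covers the warm rows only
        above_mean=lambda x: x['temp_c'] > x['temp_c'].mean(),
    )
)

print(warm)
```

Writing df['temp_c'] instead of the lambda here would refer to the unfiltered frame, which is not what the chain is working on.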
Code example 3
df = pd.DataFrame({'temp_c': [17.0, 25.0]} ,
index=('Portland', 'Berkeley') )
temp_f = lambda x: x.temp_c * 9 / 5 + 32
temp_k = lambda x: (x['temp_f'] +  459.67) * 5 / 9
df.assign(temp_f= temp_f , temp_k = temp_k )
temp_c temp_f temp_k
Portland 17.0 62.6 290.15
Berkeley 25.0 77.0 298.15
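Interesting. It looks complicated, but it is actually simple. Two takeaways:
You can assign multiple columns in a single call.
One of the columns depends on another one defined within the same assign; that is, among the columns being added, the values of one column can be computed from another.
An interesting example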
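The docstring says items are computed and assigned in keyword order, so the order of the keywords matters when one column depends on another. A small sketch of both orders; the names to_f and to_k are mine:

```python
import pandas as pd

df = pd.DataFrame({'temp_c': [17.0, 25.0]},
                  index=['Portland', 'Berkeley'])

to_f = lambda x: x['temp_c'] * 9 / 5 + 32
to_k = lambda x: (x['temp_f'] + 459.67) * 5 / 9

# temp_f is created first, so to_k can read it.
ok = df.assign(temp_f=to_f, temp_k=to_k)

# Reversed order: to_k runs before temp_f exists, so the lookup fails.
try:
    df.assign(temp_k=to_k, temp_f=to_f)
    reversed_order_failed = False
except KeyError:
    reversed_order_failed = True

print(ok)
print(reversed_order_failed)
```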
df = pd.DataFrame({'col1':list('abcde'),'col2':range(5,10),'col3':[1.3,2.5,3.6,4.6,5.8]},
index=range(1,6))
df
col1 col2 col3
1 a 5 1.3
2 b 6 2.5
3 c 7 3.6
4 d 8 4.6
5 e 9 5.8
df.assign(C=pd.Series(list('def')))
col1 col2 col3 C
1 a 5 1.3 e
2 b 6 2.5 f
3 c 7 3.6 NaN
4 d 8 4.6 NaN
5 e 9 5.8 NaN
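Question: why do NaN values appear? (Hint: index alignment.) The indexes on the two sides of the assign are different, so which one decides the index of the result?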
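One way to check the answer is to inspect the result directly. In the sketch below the result keeps the DataFrame's index, the Series is matched to it by label, and rows without a matching label get NaN. The two workarounds at the end, and the values 'vwxyz', are my own additions rather than part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'col1': list('abcde'),
                   'col2': range(5, 10),
                   'col3': [1.3, 2.5, 3.6, 4.6, 5.8]},
                  index=range(1, 6))

s = pd.Series(list('def'))   # default index 0, 1, 2

# Labels 1 and 2 exist on both sides. Label 0 ('d') has no row in df and is
# dropped, and labels 3, 4, 5 have no value in s, so they become NaN.
aligned = df.assign(C=s)

# A plain list carries no index, so it is assigned by position.
# Its length must then equal the number of rows.
positional = df.assign(C=list('vwxyz'))

# Alternatively, give the Series the same index as the DataFrame.
relabelled = df.assign(C=pd.Series(list('vwxyz'), index=df.index))

print(aligned)
print(positional)
```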
