
Adding a column to a DataFrame and assigning values iteratively: the pd.DataFrame assign method

Updated: 2023-05-18 19:55:53


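I have been unwell recently and paused updates for a few days. My sincere apologies, and I hope you'll all bear with me. Also, at work I have started on to-C (consumer-facing) business, which involves fairly large datasets, so I have been busy learning and using pandas and SQL. Over the next few posts I will summarize and organize some commonly used points.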

Adding new column(s): direct assignment

This part is simple, so straight to the code.

import pandas as pd
import numpy as np

df = pd.DataFrame({'temp_c': [17.0, 25.0]},
                  index=['Portland', 'Berkeley'])

          temp_c
Portland    17.0
Berkeley    25.0

df['temp_f'] = df.temp_c * 9 / 5 + 32
# equivalent to: df['temp_f'] = df['temp_c'] * 9 / 5 + 32

          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0

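Direct assignment works much like key-value assignment on a dict:

If the column does not exist, a new column is created.

If the column already exists, its values are updated (overwritten). Either way, df itself is modified in place.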
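Here is a minimal sketch of the dict-like behaviour described above. It reuses this post's temperature DataFrame; the +1 adjustment is purely illustrative. It also shows one consequence of overwriting: a column derived earlier is not recomputed automatically.

```python
import pandas as pd

df = pd.DataFrame({'temp_c': [17.0, 25.0]},
                  index=['Portland', 'Berkeley'])

# 'temp_f' does not exist yet, so this creates a new column
df['temp_f'] = df['temp_c'] * 9 / 5 + 32
print(list(df.columns))        # ['temp_c', 'temp_f']

# 'temp_c' already exists, so this overwrites its values (illustrative +1)
df['temp_c'] = df['temp_c'] + 1
print(df['temp_c'].tolist())   # [18.0, 26.0]

# temp_f was computed from the old temp_c and is NOT recomputed
print(df['temp_f'].tolist())   # [62.6, 77.0]
```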

Adding new column(s): the assign method

assign is a built-in method of the pd.DataFrame class, used to add or update columns of a DataFrame. This is the main topic of today's post.

The definition of assign (signature and docstring from the pandas source)

class DataFrame(NDFrame):
    def assign(self, **kwargs) -> "DataFrame":
        r"""
        Assign new columns to a DataFrame.

        Returns a new object with all original columns in addition to new ones.
        Existing columns that are re-assigned will be overwritten.

        Parameters
        ----------
        **kwargs : dict of {str: callable or Series}
            The column names are keywords. If the values are
            callable, they are computed on the DataFrame and
            assigned to the new columns. The callable must not
            change input DataFrame (though pandas doesn't check it).
            If the values are not callable, (e.g. a Series, scalar, or array),
            they are simply assigned.

        Returns
        -------
        DataFrame
            A new DataFrame with the new columns in addition to
            all the existing columns.

        Notes
        -----
        Assigning multiple columns within the same ``assign`` is possible.
        Later items in '\*\*kwargs' may refer to newly created or modified
        columns in 'df'; items are computed and assigned into 'df' in order.

        .. versionchanged:: 0.23.0

           Keyword argument order is maintained.

        Examples
        --------
        >>> df = pd.DataFrame({'temp_c': [17.0, 25.0]},
        ...                   index=['Portland', 'Berkeley'])
        >>> df
                  temp_c
        Portland    17.0
        Berkeley    25.0

        Where the value is a callable, evaluated on `df`:

        >>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)
                  temp_c  temp_f
        Portland    17.0    62.6
        Berkeley    25.0    77.0

        Alternatively, the same behavior can be achieved by directly
        referencing an existing Series or sequence:

        >>> df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)
                  temp_c  temp_f
        Portland    17.0    62.6
        Berkeley    25.0    77.0

        You can create multiple columns within the same assign where one
        of the columns depends on another one defined within the same assign:

        >>> df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
        ...           temp_k=lambda x: (x['temp_f'] + 459.67) * 5 / 9)
                  temp_c  temp_f  temp_k
        Portland    17.0    62.6  290.15
        Berkeley    25.0    77.0  298.15
        """

Basic explanation

What it does:

(1) Returns a new object with all original columns in addition to new ones. The return value is a pd.DataFrame that keeps all columns of the original DataFrame and adds the new column(s).

(2) Existing columns that are re-assigned will be overwritten. If a column named in assign already exists, its values are replaced in the returned DataFrame.
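Both points can be seen in a short sketch that reuses the same temperature data. The -0.5 "calibration offset" is a made-up value, used only to show that an existing column gets overwritten.

```python
import pandas as pd

df = pd.DataFrame({'temp_c': [17.0, 25.0]},
                  index=['Portland', 'Berkeley'])

# (1) assign returns a NEW DataFrame: old columns plus the new one
out = df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)
print(out is df)              # False
print(list(out.columns))      # ['temp_c', 'temp_f']

# (2) re-assigning an existing column overwrites it in the returned copy
#     (the -0.5 offset is a hypothetical calibration correction)
adjusted = df.assign(temp_c=df['temp_c'] - 0.5)
print(adjusted['temp_c'].tolist())  # [16.5, 24.5]
print(df['temp_c'].tolist())        # [17.0, 25.0], original untouched
```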

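Parameters: **kwargs : dict of {str: callable or Series}. The column names are keywords. If the values are callable, they are computed on the DataFrame and assigned to the new columns. The callable must not change the input DataFrame (though pandas doesn't check it). If the values are not callable (e.g. a Series, scalar, or array), they are simply assigned. In short, each keyword becomes a column name. A callable value is called with the DataFrame, and its result becomes the column. Any other value is assigned directly, though a Series is still aligned on the index.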
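The different kinds of value can be sketched in a single call. The extra columns station, unit and row_no, and their values, are hypothetical. They are there only to put a Series, a scalar and an array next to a callable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'temp_c': [17.0, 25.0]},
                  index=['Portland', 'Berkeley'])

out = df.assign(
    # callable: called as f(df); its return value becomes the column
    temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
    # Series (hypothetical station codes): aligned on the index labels
    station=pd.Series(['PDX', 'BRK'], index=['Portland', 'Berkeley']),
    # scalar: broadcast to every row
    unit='C',
    # array: must have the same length as df; inserted by position
    row_no=np.array([1, 2]),
)
print(out)
```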

Code examples

Example 1

import pandas as pd
import numpy as np

df = pd.DataFrame({'temp_c': [17.0, 25.0]},
                  index=['Portland', 'Berkeley'])

          temp_c
Portland    17.0
Berkeley    25.0

df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)

          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0

df  # the original DataFrame

          temp_c
Portland    17.0
Berkeley    25.0

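The code is simple and needs little explanation. The keyword name temp_f becomes the new column's name, and the value df['temp_c'] * 9 / 5 + 32 supplies the new column's values. Note one thing: the original df is unchanged, because assign returns a new DataFrame.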
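Because df is left unchanged, the result has to be kept somewhere. This minimal sketch shows the two usual options: rebinding a name, or chaining further methods. The query threshold of 70 is an arbitrary illustrative value.

```python
import pandas as pd

df = pd.DataFrame({'temp_c': [17.0, 25.0]},
                  index=['Portland', 'Berkeley'])

df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)   # result is discarded
print('temp_f' in df.columns)                 # False: df was not modified

# Option 1: keep the result under a name
df2 = df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)
print('temp_f' in df2.columns)                # True

# Option 2: chain further methods on the returned copy
# (a lambda is used so it sees the intermediate DataFrame)
result = (
    df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32)
      .query('temp_f > 70')                   # illustrative threshold
)
print(result.index.tolist())                  # ['Berkeley']
```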


Example 2

df = pd.DataFrame({'temp_c': [17.0, 25.0]},
                  index=['Portland', 'Berkeley'])

          temp_c
Portland    17.0
Berkeley    25.0

temp_f = lambda x: x.temp_c * 9 / 5 + 32
print(temp_f(df))
print(type(temp_f(df)))

Portland    62.6
Berkeley    77.0
Name: temp_c, dtype: float64
<class 'pandas.core.series.Series'>

df.assign(temp_f=temp_f)

          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0

temp_f = lambda x: list(x.temp_c * 9 / 5 + 32)
print(temp_f(df))
print(type(temp_f(df)))
print(df.assign(temp_f=temp_f))

[62.6, 77.0]
<class 'list'>
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0

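Interesting, right? Note the following three points:

The form is still keyword = value. The keyword defines the name of the new (or to-be-updated) column.

Here temp_f is an anonymous function (a lambda). assign calls it with the current pd.DataFrame as its only argument.

How you define the callable is up to you. The first time, I simply used arithmetic on a DataFrame column; the second time, I converted the result to a list. You can add more complex logic too, but usually the result is a value computed from one or more columns of the current DataFrame.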
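Here is a sketch of a callable that uses a statistic over the whole DataFrame. It assumes we want each city's deviation from the mean temperature plus a simple label. The column names temp_dev and warm, and the 20-degree threshold, are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'temp_c': [17.0, 25.0]},
                  index=['Portland', 'Berkeley'])

out = df.assign(
    # deviation from the column mean (a statistic over the whole DataFrame)
    temp_dev=lambda x: x['temp_c'] - x['temp_c'].mean(),
    # a callable may also return a NumPy array of matching length
    # (20 degrees is a hypothetical threshold)
    warm=lambda x: np.where(x['temp_c'] > 20, 'yes', 'no'),
)
print(out)
```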

Example 3

df = pd.DataFrame({'temp_c': [17.0, 25.0]},
                  index=['Portland', 'Berkeley'])

temp_f = lambda x: list(x.temp_c * 9 / 5 + 32)
temp_k = lambda x: (x['temp_f'] + 459.67) * 5 / 9

df.assign(temp_f=temp_f, temp_k=temp_k)

          temp_c  temp_f  temp_k
Portland    17.0    62.6  290.15
Berkeley    25.0    77.0  298.15

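Interesting: it looks complicated, but it is actually simple. There are two points:

You can assign several columns in a single assign call.

One of the columns can depend on another one defined within the same assign. That is, among the newly added columns, one column's values can be computed from another's. This works because the keyword arguments are computed and assigned in order (guaranteed since pandas 0.23.0, per the Notes in the docstring), so temp_f must come before temp_k.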
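Order matters here. This minimal sketch uses the same two callables. They work when temp_f comes first, but raise a KeyError when temp_k is listed first, because temp_f does not yet exist when temp_k is evaluated.

```python
import pandas as pd

df = pd.DataFrame({'temp_c': [17.0, 25.0]},
                  index=['Portland', 'Berkeley'])

# Works: temp_f is created first, so temp_k can read it
ok = df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
               temp_k=lambda x: (x['temp_f'] + 459.67) * 5 / 9)
print(ok.round(2))

# Fails: temp_k is evaluated first, when temp_f does not exist yet
failed = False
try:
    df.assign(temp_k=lambda x: (x['temp_f'] + 459.67) * 5 / 9,
              temp_f=lambda x: x['temp_c'] * 9 / 5 + 32)
except KeyError as err:
    failed = True
    print('KeyError:', err)
```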

An interesting example

df = pd.DataFrame({'col1': list('abcde'), 'col2': range(5, 10),
                   'col3': [1.3, 2.5, 3.6, 4.6, 5.8]},
                  index=range(1, 6))

  col1  col2  col3
1    a     5   1.3
2    b     6   2.5
3    c     7   3.6
4    d     8   4.6
5    e     9   5.8

df.assign(C=pd.Series(list('def')))

  col1  col2  col3    C
1    a     5   1.3    e
2    b     6   2.5    f
3    c     7   3.6  NaN
4    d     8   4.6  NaN
5    e     9   5.8  NaN

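Something to think about: why do NaN values appear? (Hint: index alignment.) The DataFrame and the Series being assigned have different indexes. Which one determines the index of the result?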
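Here is a minimal sketch of the alignment at work, reusing the five-row df above. The Series is matched to the DataFrame by index label, and the result keeps df's index (labels 1 to 5). The Series' label 0 ('d') has no match and is dropped, and labels 3 to 5 get NaN. If the values are meant to fill the first rows by position, one option is to give the Series df's labels explicitly. The names s, s_pos, aligned and positional are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': list('abcde'),
                   'col2': range(5, 10),
                   'col3': [1.3, 2.5, 3.6, 4.6, 5.8]},
                  index=range(1, 6))

s = pd.Series(list('def'))   # default index: 0, 1, 2

# Aligned by label: 1 -> 'e', 2 -> 'f'; label 0 is dropped, 3-5 get NaN
aligned = df.assign(C=s)
print(aligned.index.tolist())        # [1, 2, 3, 4, 5]: df's index wins
print(aligned['C'].isna().tolist())  # [False, False, True, True, True]

# Positional intent: relabel the Series with df's first three labels first
s_pos = pd.Series(list('def'), index=df.index[:3])
positional = df.assign(C=s_pos)
print(positional['C'].iloc[:3].tolist())  # ['d', 'e', 'f']
```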


Published: 2023-05-18 19:55:53

Permalink: https://www.wtabcd.cn/fanwen/fan/89/914000.html
