Pandas Data Types: Editing DataFrame Data

Article link: https://blog.csdn.net/weixin_48668114/article/details/126280154

Editing columns
Editing rows
- Direct assignment with loc
- Direct assignment with at

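Editing columns

Direct assignment

df[col] = value: if the column does not exist yet, a new column is added; a scalar value is broadcast to every row.
df.loc[:, col] = value: the same assignment written with loc.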

import pandas as pd
df1 = pd.read_excel(r".\study\test_excel.xlsx", sheet_name="student")
print("df1---------------\n", df1, type(df1))

df1---------------
   name  age sex address  score
0   刘一   18   女      上海    100
1   花二   40   男      上海     99
2   张三   25   男      北京     80
3   李四   30   男      西安     40
4   王五   70   男      青岛     70
5   孙六   65   女      泰州     90 <class 'pandas.core.frame.DataFrame'>

df1["class"]= [1,2,3,4,5,6]
df1.loc[:, "grade"]= ["A", "B", "C", "D", "E", "F"]
print(df1)
df1["school"] = 5
print(df1)

  name  age sex address  score  class grade
0   刘一   18   女      上海    100      1     A
1   花二   40   男      上海     99      2     B
2   张三   25   男      北京     80      3     C
3   李四   30   男      西安     40      4     D
4   王五   70   男      青岛     70      5     E
5   孙六   65   女      泰州     90      6     F
  name  age sex address  score  class grade  school
0   刘一   18   女      上海    100      1     A       5
1   花二   40   男      上海     99      2     B       5
2   张三   25   男      北京     80      3     C       5
3   李四   30   男      西安     40      4     D       5
4   王五   70   男      青岛     70      5     E       5
5   孙六   65   女      泰州     90      6     F       5
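
The example above depends on a local Excel file. As a self-contained sketch of the same idea, the block below uses a small hypothetical DataFrame (the names and scores are made up for illustration) and also shows what happens when a list of the wrong length is assigned.

```python
import pandas as pd

# Hypothetical sample data standing in for the Excel sheet used above
df = pd.DataFrame({"name": ["A", "B", "C"], "score": [100, 99, 80]})

# A list-like value must have exactly one element per row
df["class"] = [1, 2, 3]
df.loc[:, "grade"] = ["A", "B", "C"]

# A scalar is broadcast to every row
df["school"] = 5

# A list of the wrong length is rejected with ValueError
try:
    df["bad"] = [1, 2]
    length_error = False
except ValueError:
    length_error = True

print(df)
print("length mismatch rejected:", length_error)
```

Both df[col] = ... and df.loc[:, col] = ... modify the DataFrame in place.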

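The loc method

loc, like iloc, indexes into a DataFrame, but by label rather than by integer position. Assignment usually takes the form data.loc[index, col] = value.
Use : to select all rows before assigning.
If col does not exist yet, a new column is added; a scalar value is broadcast to every row.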

df1 = pd.DataFrame([1, 2, 3],["a", "b", "c"], columns=["first"])
print(df1)
df1.loc[:, 'second'] = 0
df1

   first
a      1
b      2
c      3

	first	second
a	1	0
b	2	0
c	3	0
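
To make the difference between loc and iloc concrete, here is a minimal sketch on a hypothetical one-column frame. It assumes standard pandas behavior: loc creates a column for an unknown label, while iloc refuses to address a position that does not exist yet.

```python
import pandas as pd

df = pd.DataFrame({"first": [1, 2, 3]}, index=["a", "b", "c"])

# loc is label-based: an unknown column label creates the column
df.loc[:, "second"] = 0

# loc can also target a single cell by row label and column label
df.loc["b", "second"] = 20

# iloc is position-based and cannot enlarge the frame:
# column position 2 does not exist yet, so this raises IndexError
try:
    df.iloc[:, 2] = 0
    iloc_enlarged = True
except IndexError:
    iloc_enlarged = False

print(df)
print("iloc enlarged the frame:", iloc_enlarged)
```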

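The insert method

insert(loc, column, value, allow_duplicates=False)
loc: integer position at which to insert the column (0 <= loc <= number of columns)
column: label of the new column
value: data for the new column (a scalar, Series, or array-like)
allow_duplicates: by default a DataFrame refuses to insert a column whose label already exists; pass True to allow it. The default is False.
insert modifies the DataFrame in place and returns None.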

df1 = pd.read_excel(r".\study\test_excel.xlsx", sheet_name="student")
print("df1---------------\n", df1, type(df1))
df1.insert(df1.shape[1], 'class', 3, allow_duplicates=False)
df1

df1---------------
   name  age sex address  score
0   刘一   18   女      上海    100
1   花二   40   男      上海     99
2   张三   25   男      北京     80
3   李四   30   男      西安     40
4   王五   70   男      青岛     70
5   孙六   65   女      泰州     90 <class 'pandas.core.frame.DataFrame'>

	name	age	sex	address	score	class
0	刘一	18	女	上海	100	3
1	花二	40	男	上海	99	3
2	张三	25	男	北京	80	3
3	李四	30	男	西安	40	3
4	王五	70	男	青岛	70	3
5	孙六	65	女	泰州	90	3
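
The example above appends the column at the end. The following sketch, on hypothetical data, inserts at position 0 instead and shows how allow_duplicates changes the outcome when the label already exists.

```python
import pandas as pd

df = pd.DataFrame({"name": ["A", "B"], "score": [100, 99]})

# insert works in place and returns None; position 0 puts the column first
result = df.insert(0, "id", [1, 2])

# Inserting an existing column label raises ValueError by default
try:
    df.insert(1, "score", 0)
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

# allow_duplicates=True permits a second column with the same label
df.insert(len(df.columns), "score", [1, 2], allow_duplicates=True)

print(df)
print(result, duplicate_rejected)
```

Duplicate column labels make later selection by name return several columns, so allowing them is rarely a good idea outside of demonstrations.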

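The concat method

pd.concat joins pandas objects of the same kind (for example several DataFrames) and returns a new object.
With the default axis=0, frames that share the same column names (or have no explicit column names) are stacked vertically into more rows.
If the column names differ, the result also gains columns: it contains the union of all columns, and cells with no source data are filled with NaN.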

# Same column name
df1 = pd.DataFrame([1, 2, 3],["a", "b", "c"], columns=["第1列"])
df2 = pd.DataFrame([4, 5, 6],["x", "y", "z"], columns=["第1列"])
pd.concat([df1, df2], sort=False)

	第1列
a	1
b	2
c	3
x	4
y	5
z	6

# Different column names
df1 = pd.DataFrame([1, 2, 3],["a", "b", "c"], columns=["first"])
df2 = pd.DataFrame([4, 5, 6],["x", "y", "z"], columns=["second"])
df3 = pd.concat([df1, df2], sort=False)
df3

	first	second
a	1.0	NaN
b	2.0	NaN
c	3.0	NaN
x	NaN	4.0
y	NaN	5.0
z	NaN	6.0
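
The NaN pattern above arises because the two frames share neither row labels nor column names. When the goal is to add a column, a sketch along the following lines may be closer to what is wanted: it assumes both frames use the same row index and concatenates along axis=1. The frames are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({"first": [1, 2, 3]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"second": [4, 5, 6]}, index=["a", "b", "c"])

# axis=1 aligns on the row index and places the frames side by side
wide = pd.concat([df1, df2], axis=1)

# axis=0 (the default) stacks rows; ignore_index=True renumbers them 0..n-1
df3 = pd.DataFrame({"first": [7, 8]}, index=["a", "b"])
tall = pd.concat([df1, df3], ignore_index=True)

print(wide)
print(tall)
```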

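DataFrame.assign()

assign can add several columns in one call.
DataFrame.assign() returns a new object containing all the original columns plus the new ones. Existing columns that are reassigned are overwritten.
Signature: DataFrame.assign(**kwargs), where kwargs is a dict of {str: callable or Series}.
Each keyword is a column name. If the value is callable, it is called with the DataFrame and its result is assigned to the new column. The callable must not modify the input DataFrame (pandas does not check this). If the value is not callable (for example a Series, scalar, or array), it is assigned directly.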

df1 = pd.DataFrame({'temp_c': [17.0, 25.0]},
                  index=['Portland', 'Berkeley'])
df1

	temp_c
Portland	17.0
Berkeley	25.0

df2 = df1.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)
print("df2----------", df2)
# Several columns can be created in one assign call; a later column (temp_k) may depend on an earlier one (temp_f):
df4 = df1.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
          temp_k=lambda x: (x['temp_f'] +  459.67) * 5 / 9)
print("df4----------", df4)
df5 = df1.assign(air_quality=[20, 50], pollution_index=[70, 80])
print("df5----------", df5)

df2----------           temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0
df4----------           temp_c  temp_f  temp_k
Portland    17.0    62.6  290.15
Berkeley    25.0    77.0  298.15
df5----------           temp_c  air_quality  pollution_index
Portland    17.0           20               70
Berkeley    25.0           50               80
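
Two properties of assign described above are easy to overlook: the original frame is not modified, and reusing an existing column name overwrites that column in the result. A minimal sketch using the same temperature data (the variable names with_f and rounded are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({"temp_c": [17.0, 25.0]}, index=["Portland", "Berkeley"])

# assign returns a new DataFrame; df itself is left unchanged
with_f = df.assign(temp_f=df["temp_c"] * 9 / 5 + 32)

# Reusing an existing column name overwrites that column in the result
rounded = df.assign(temp_c=lambda x: x["temp_c"].round().astype(int))

print(list(df.columns))
print(with_f)
print(rounded)
```

To keep the new columns, bind the result to a name, for example df = df.assign(...).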

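Editing rows

Direct assignment with loc

The column indexer can be omitted, or written as :.
If the label in loc[index] already exists, the values of that row are replaced; if it does not exist, a new row is appended.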

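df1 = pd.DataFrame({'one':[1, 2, 3, 4], 'two':[11, 22, 33, 44]})
new_row = [66, 88]
df1.loc[4, :] = new_row  # label 4 does not exist yet, so a row is appended; same as df1.loc[4] = new_row
print(df1)

print(df1.loc[0:4, ['one', 'two']])  # label slices include both endpoints
df1.loc[0:1, :] = [[30, 50], [70, 80]]  # modify several rows at once
print(df1)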

    one   two
0   1.0  11.0
1   2.0  22.0
2   3.0  33.0
3   4.0  44.0
4  66.0  88.0
    one   two
0   1.0  11.0
1   2.0  22.0
2   3.0  33.0
3   4.0  44.0
4  66.0  88.0
    one   two
0  30.0  50.0
1  70.0  80.0
2   3.0  33.0
3   4.0  44.0
4  66.0  88.0
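
The overwrite-or-append rule can be checked in isolation. The sketch below uses a hypothetical two-row frame; the row labels 1 and 5 are arbitrary and only chosen to show that new labels need not be consecutive.

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2], "two": [11, 22]})

# Label 1 exists, so the row is overwritten
df.loc[1] = [20, 220]

# Label 5 does not exist, so a new row is appended
df.loc[5] = [66, 88]

# A scalar is broadcast across all columns of the row
df.loc[0] = 0

# Label slices include both endpoints: this selects rows 0 and 1
first_two = df.loc[0:1]

print(df)
print(first_two)
```

Whether the columns stay integer or become float after a row is appended can depend on the pandas version, so the sketch makes no claim about dtypes.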

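Direct assignment with at

# Note: at is designed for access to a single cell (one row label, one column label).
# Passing : for the columns worked in the pandas version used here by falling back to loc;
# recent versions may raise InvalidIndexError, in which case use df1.loc[2021, :] = ... instead.
df1.at[2021, :] = [100, 200]  # new row with label 2021
df1.at[2022, :] = 88          # new row with label 2022; the scalar fills every column
df1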

	one	two
0	30.0	50.0
1	70.0	80.0
2	3.0	33.0
3	4.0	44.0
4	66.0	88.0
2021	100.0	200.0
2022	88.0	88.0
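
Since at is documented as a scalar accessor, a version-independent way to get the same effect is to use at for single cells and loc for whole rows. A minimal sketch on a hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2], "two": [11, 22]})

# at reads and writes exactly one cell, addressed by row label and column label
df.at[0, "one"] = 30
value = df.at[1, "two"]

# For a whole row, loc is the indexer designed for the job
df.loc[2021] = [100, 200]

print(df)
print(value)
```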