Python: adding a header to data that has no column names

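When working with external data, you will often run into files that have no column names (no header row). One example is the global airport data downloaded from https://openflights.org/data.htm.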

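That data has no header row. Because it comes from an external source, editing the file to add a header makes later data updates awkward, so modifying the original data source is not considered here.
To make later processing easier, we want to attach a header to the data when loading it, so that each column can be referred to by name.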

Specifying the column names when reading the file

import pandas as pd

# Define the column names
airports_colums = ['AirportID', 'Name', 'City', 'Country', 'IATA', 'ICAO', 'Latitude', 'Longitude', 'Altitude', 'Timezone', 'DST', 'Tz database time zone', 'Type', 'Source']
# Load airports.csv, supplying the column names directly
airports_data = pd.read_csv('data/airports.csv', header=None, names=airports_colums)
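  • read_csv: header=None tells pandas that the file has no header row, so the first line is read as data rather than as column names. names=airports_colums uses the list airports_colums as the column names. Without names, pandas would label the columns with the integers 0, 1, 2, … instead.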
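The behaviour described above can be checked without the real file. The following is a minimal sketch, assuming a made-up two-row sample in place of data/airports.csv; the names sample_csv, sample_columns and sample_data, the shortened column list, and the row values are all hypothetical and only serve the illustration.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for data/airports.csv (values are made up).
sample_csv = io.StringIO(
    '1,"Sample Airport A","City A","Country A"\n'
    '2,"Sample Airport B","City B","Country B"\n'
)

# Shortened, illustrative column list (the real file has 14 columns).
sample_columns = ['AirportID', 'Name', 'City', 'Country']

# header=None: the first line is data; names=...: use our own column names.
sample_data = pd.read_csv(sample_csv, header=None, names=sample_columns)

print(sample_data.columns.tolist())  # the names we supplied
print(len(sample_data))              # both lines were kept as data rows
```

Because header=None is set, the first line of the sample stays in the DataFrame as an ordinary row instead of being consumed as a header.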

Setting the column names on the DataFrame after loading

import pandas as pd

# Load airports.csv without a header row
airports_data = pd.read_csv('data/airports.csv', header=None)
# Define the column names
airports_colums = ['AirportID', 'Name', 'City', 'Country', 'IATA', 'ICAO', 'Latitude', 'Longitude', 'Altitude', 'Timezone', 'DST', 'Tz database time zone', 'Type', 'Source']
# Apply the column names to the DataFrame
airports_data.columns = airports_colums
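  • read_csv: header=None tells pandas that the file has no header row. Because no names are given, pandas labels the columns with the integers 0, 1, 2, … (one per column).
  • airports_colums: the list that defines the header.
  • airports_data.columns: assigning the list to the columns attribute replaces the integer labels with the names. The list must contain exactly one name per column.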
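The two steps listed above can be sketched on a small in-memory sample. This is only an illustration under stated assumptions: the three-column sample rows and the names raw, default_labels, renamed_labels and mismatch_rejected are hypothetical and do not come from the airports file.

```python
import io

import pandas as pd

# Hypothetical sample with three columns and no header row (values are made up).
raw = pd.read_csv(
    io.StringIO('1,"Sample Airport A","City A"\n2,"Sample Airport B","City B"\n'),
    header=None,
)

# With header=None and no names, the column labels are integers.
default_labels = raw.columns.tolist()

# Assigning a list to .columns replaces the labels in place.
raw.columns = ['AirportID', 'Name', 'City']
renamed_labels = raw.columns.tolist()

# A list of the wrong length is rejected with a ValueError.
try:
    raw.columns = ['AirportID', 'Name']
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(default_labels, renamed_labels, mismatch_rejected)
```

The length check is a useful safeguard when the upstream file gains or loses a column: the assignment fails loudly instead of silently mislabeling data.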
