CSV Data
- What is a CSV?
- Reading a CSV
- Writing to a CSV
What is a CSV?
CSV stands for “Comma Separated Values.” It is the simplest form of storing data in tabular form as plain text. It is important to know to work with CSV because we mostly rely on CSV data in our day-to-day lives as data scientists.
Structure of CSV:
# Hinweis YearsExperience, Salary <- HEADER 1.1, 39343.00 - 1.3, 46205.00 | 1.5, 37731.00 | 2.0, 43525.00 | 2.2, 39891.00 |<- Values, seperated by a comma 2.9, 56642.00 | 3.0, 60150.00 | 3.2, 54445.00 | 3.2, 64445.00 - ...
We have a file named Salary_Data.csv
The first line of a CSV file is the header and contains the names of the fields/features.
After the header, each line of the file is an observation/a record. The values of a record are separated by “comma.”
Reading a CSV
CSV files can be handled in multiple ways in Python.
Using csv.reader
Reading a CSV using Python’s inbuilt module
called csv using csv.reader
object.
Steps to read a CSV file:
Import the csv library
# Hinweis import csv
Open the CSV file
The .open()
method
in python is used to open files and return a file object.
file = open("/tmp/Salary_Data.csv") print(type(file))
The type of file is “_io.TextIOWrapper
” which is a file object that is returned by the open()
method.
3. Use the csv.reader object to read the CSV file
import csv file = open("/tmp/Salary_Data.csv") csvreader = csv.reader(file) print(type(csvreader)) print(csvreader)
Extract the field names
Create an empty list called header. Use the next()
method to obtain the header.
The .next()
method returns the current row and moves to the next row.
The first time you run next()
it returns the header and the next time you run it returns the first record and so on.
import csv file = open("/tmp/Salary_Data.csv") csvreader = csv.reader(file) header = [] header = next(csvreader) print(header)
Extract the rows/records
Create an empty list called rows and iterate through the csvreader object and append each row to the rows list.
import csv file = open("/tmp/Salary_Data.csv") csvreader = csv.reader(file) rows = [] for row in csvreader: rows.append(row) print(rows) for row in rows: print(row)
Close the file
.close()
method is used to close the opened file. Once it is closed, we cannot perform any operations on it.
import csv file = open("/tmp/Salary_Data.csv") csvreader = csv.reader(file) header = [] header = next(csvreader) print(header) rows = [] for row in csvreader: rows.append(row) print(rows) for row in rows: print(row) file.close()
Naturally, we might forget to close an open file. To avoid that we can use the with()
statement
to automatically release the resources. In simple terms, there is no need to call the .close()
method if we are using with()
statement.
Implementing the above code using with() statement:
Syntax: with open(filename, mode) as alias_filename:
Modes:
‘r’ – to read an existing file,
‘w’ – to create a new file if the given file doesn’t exist and write to it,
‘a’ – to append to existing file content,
‘+’ – to create a new file for reading and writing
import csv rows = [] with open("/tmp/Salary_Data.csv", 'r') as file: csvreader = csv.reader(file) header = next(csvreader) for row in csvreader: rows.append(row) print(header) print(rows)
Using .readlines()
Now the question is – “Is it possible to fetch the header, rows using only
open()
and with()
statements and without the csv library?” Let’s see…
.readlines()
method is the answer. It returns all the lines in a file as a list.
Each item of the list is a row of our CSV file.
The first row of the file.readlines()
is the header and the rest of them are the records.
import csv with open('/tmp/Salary_Data.csv') as file: content = file.readlines() header = content[:1] rows = content[1:] print(header) print(rows)
**The \n
from the output can be removed using .strip()
method.
What if we have a huge dataset with hundreds of features and thousands of records. Would it be possible to handle lists??
Here comes the pandas library into the picture.
2.3 Using pandas
Steps of reading CSV files using pandas
1. Import pandas library
import pandas as pd
2. Load CSV files to pandas using
read_csv()
Basic Syntax: pandas.read_csv(filename, delimiter=’,’)
import pandas as pd data= pd.read_csv("/tmp/Salary_Data.csv") print(data)
Extract the field names
.columns
is used to obtain the header/field names.
import pandas as pd data= pd.read_csv("/tmp/Salary_Data.csv") print(data.columns)
4. Extract the rows
All the data of a data frame can be accessed using the field names.
import pandas as pd data= pd.read_csv("/tmp/Salary_Data.csv") print(data.Salary)
3. Writing to a CSV file
We can write to a CSV file in multiple ways.
3.1 Using csv.writer
Let’s assume we are recording 3 Students data (Name, M1 Score, M2 Score)
header = ['Name', 'M1 Score', 'M2 Score'] data = [['Alex', 62, 80], ['Brad', 45, 56], ['Joey', 85, 98]]
Steps of writing to a CSV file:
1. Import csv library
import csv
2. Define a filename and Open the file using open()
3. Create a csvwriter
object using csv.writer()
4. Write the header
5. Write the rest of the data
code for steps 2-5
import csv header = ['Name', 'M1 Score', 'M2 Score'] data = [['Alex', 62, 80], ['Brad', 45, 56], ['Joey', 85, 98]] filename = 'Students_Data.csv' with open(filename, 'w', newline="") as file: csvwriter = csv.writer(file) # 1. create a csvwriter object csvwriter.writerow(header) # 2. write the header csvwriter.writerows(data) # 3. write the rest of the data file.close() # print current directory import os files = [f for f in os.listdir('.') if os.path.isfile(f)] for f in files: print(f) print() # contents of the data file file = open(filename, 'r') print(file.read())
Using writelines()
Iterate through each list and convert the list elements to a string and write to the csv file.
import csv header = ['Name', 'M1 Score', 'M2 Score'] data = [['Alex', 62, 80], ['Brad', 45, 56], ['Joey', 85, 98]] filename = 'Student_scores_2.csv' with open(filename, 'w') as file: for header in header: file.write(str(header)+', ') file.write('\n') for row in data: for x in row: file.write(str(x)+', ') file.write('\n') # print current directory import os files = [f for f in os.listdir('.') if os.path.isfile(f)] for f in files: print(f) print() # contents of the data file file = open(filename, 'r') print(file.read())
Using pandas
Steps to writing to a CSV using pandas
Import pandas library
# Hinweis import pandas as pd
Create a pandas dataframe using
pd.DataFrame
Syntax: pd.DataFrame(data, columns)
The data parameter takes the records/observations and the columns parameter takes the columns/field names.
import pandas as pd header = ['Name', 'M1 Score', 'M2 Score'] data = [['Alex', 62, 80], ['Brad', 45, 56], ['Joey', 85, 98]] data = pd.DataFrame(data, columns=header) print(data)
Write to a CSV file using
to_csv()
Syntax: DataFrame.to_csv(filename, sep=’,’, index=False)
**separator is ‘,’ by default.
index=False
to remove the index numbers.
import pandas as pd header = ['Name', 'M1 Score', 'M2 Score'] data = [['Alex', 62, 80], ['Brad', 45, 56], ['Joey', 85, 98]] data = pd.DataFrame(data, columns=header) print(data) filename = 'Stu_data.csv' data.to_csv(filename, index=False) # print current directory import os files = [f for f in os.listdir('.') if os.path.isfile(f)] for f in files: print(f) print() # contents of the data file file = open(filename, 'r') print(file.read())
zurück: 12