Using Pandas in Python

Hello all,

This is the fourth article in the Python for Data Science series. If you are new to the series, we recommend reading our previous articles first:

  1. Python for Data Science Series - Part 1
  2. Python for Data Science Series - Part 2
  3. Using Numpy in Python


To install pandas in your Python environment, run the following command:

pip install pandas


Please refer to the videos below for a detailed explanation of Pandas.


After you have installed Pandas, work through the following notebook to understand how to use its core functionality.




In [1]:
import pandas as pd
import os

Creating a DataFrame

In [2]:
data1 = [1,2,3]
data2 = [4,5,6]
data3 = [7,8,9]
pd.DataFrame([data1,data2,data3],columns = ['a','b','c'],index = ['row1','row2','row3'])
Out[2]:
a b c
row1 1 2 3
row2 4 5 6
row3 7 8 9
In [3]:
pd.DataFrame([data1,data2,data3])
Out[3]:
0 1 2
0 1 2 3
1 4 5 6
2 7 8 9
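Besides a list of rows, a DataFrame can be built from a dictionary of columns; this is a minimal sketch using the same toy data as above:

```python
import pandas as pd

# Each dictionary key becomes a column name; each list becomes that column's values
df = pd.DataFrame({'a': [1, 4, 7], 'b': [2, 5, 8], 'c': [3, 6, 9]},
                  index=['row1', 'row2', 'row3'])
```

This is often more convenient when your data already lives in per-column lists.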

Reading a CSV File with Pandas

In [4]:
csv_file_path = r"E:\openknowledgeshare.blogspot.com\Python\Data\SBIN.NS.csv"
In [5]:
df = pd.read_csv(csv_file_path)
df
Out[5]:
Date Open High Low Close Adj Close Volume
0 1996-01-01 18.691200 18.978901 18.540199 18.823200 0.442401 43733533.0
1 1996-01-02 18.893999 18.964800 17.738199 18.224100 0.428321 56167281.0
2 1996-01-03 18.327900 18.568501 17.643801 17.738199 0.416900 68296319.0
3 1996-01-04 17.502300 17.832500 17.224001 17.676901 0.415460 86073880.0
4 1996-01-05 17.738199 17.785400 17.459900 17.577801 0.413131 76613040.0
5 1996-01-08 17.478701 17.643801 16.922100 17.063601 0.401045 55395173.0
6 1996-01-09 16.889000 18.681700 16.705000 17.997700 0.422999 82057541.0
7 1996-01-10 17.408001 17.714600 17.054100 17.172100 0.403595 54360749.0
8 1996-01-11 16.983400 17.926901 16.983400 17.827801 0.419006 65973105.0
9 1996-01-12 17.879700 18.233500 17.573099 17.837299 0.419230 102152486.0
10 1996-01-15 18.021200 18.200500 17.785400 17.893900 0.420560 68169665.0
11 1996-01-16 17.903299 19.530899 17.785400 18.271299 0.429430 92497162.0
12 1996-01-17 17.856100 17.974100 17.455099 17.521200 0.411800 65477621.0
13 1996-01-18 17.318300 17.625000 17.124901 17.403200 0.409027 67620658.0
14 1996-01-19 17.290001 17.672100 17.290001 17.610800 0.413906 47652365.0
15 1996-01-22 17.738199 18.313700 17.700500 18.224100 0.428321 81512773.0
16 1996-01-23 18.389200 18.771400 18.162800 18.417500 0.432866 130297069.0
17 1996-01-24 17.832500 17.926901 17.351400 17.794800 0.418231 54484753.0
18 1996-01-25 18.115601 18.275999 17.879700 17.964600 0.422222 67701207.0
19 1996-01-26 NaN NaN NaN NaN NaN NaN
20 1996-01-29 18.162800 18.960100 18.073099 18.785500 0.441515 75689902.0
21 1996-01-30 19.059099 19.700701 18.587400 19.205400 0.451384 89163372.0
22 1996-01-31 18.776100 19.129900 18.214701 18.549601 0.435971 56831283.0
23 1996-02-01 18.422199 19.625200 18.422199 19.469500 0.457591 96599350.0
24 1996-02-02 19.861099 20.946199 19.719601 20.814100 0.489193 106010377.0
25 1996-02-05 21.087700 22.880400 21.087700 22.125601 0.520017 160942921.0
26 1996-02-06 22.455799 23.163401 21.795300 22.314301 0.524452 147345447.0
27 1996-02-07 21.597200 22.361401 20.757500 21.922701 0.515249 67480756.0
28 1996-02-08 21.988800 22.903999 21.578300 22.545401 0.529884 52796396.0
29 1996-02-09 22.762400 24.460800 22.351999 24.139999 0.567362 74580228.0
... ... ... ... ... ... ... ...
5495 2017-09-21 269.750000 271.399994 267.350006 268.500000 268.500000 8509419.0
5496 2017-09-22 267.250000 267.399994 261.100006 261.850006 261.850006 12860892.0
5497 2017-09-25 261.500000 261.799988 254.399994 258.750000 258.750000 13937050.0
5498 2017-09-26 258.500000 259.649994 256.000000 258.250000 258.250000 11728345.0
5499 2017-09-27 259.899994 259.899994 249.500000 250.399994 250.399994 17080769.0
5500 2017-09-28 250.250000 253.550003 249.500000 252.550003 252.550003 17776419.0
5501 2017-09-29 254.699997 255.500000 253.050003 253.850006 253.850006 10073641.0
5502 2017-10-03 253.850006 256.399994 250.600006 251.300003 251.300003 10547836.0
5503 2017-10-04 251.649994 254.649994 251.350006 253.199997 253.199997 14063896.0
5504 2017-10-05 253.699997 254.100006 249.199997 251.600006 251.600006 13476692.0
5505 2017-10-06 252.149994 257.899994 252.100006 256.750000 256.750000 11932139.0
5506 2017-10-09 256.799988 258.600006 254.250000 256.850006 256.850006 8009211.0
5507 2017-10-10 256.850006 258.200012 255.699997 256.950012 256.950012 9742878.0
5508 2017-10-11 257.149994 258.450012 250.449997 251.750000 251.750000 14192548.0
5509 2017-10-12 252.000000 252.949997 248.500000 251.199997 251.199997 13404759.0
5510 2017-10-13 251.100006 255.350006 250.199997 252.100006 252.100006 15697555.0
5511 2017-10-16 254.600006 254.699997 250.100006 252.050003 252.050003 12171830.0
5512 2017-10-17 252.050003 254.699997 250.649994 251.149994 251.149994 13505216.0
5513 2017-10-18 246.000000 247.800003 242.550003 243.750000 243.750000 20677891.0
5514 2017-10-23 243.800003 247.000000 241.149994 245.949997 245.949997 12360204.0
5515 2017-10-24 246.000000 255.300003 245.949997 254.449997 254.449997 17197805.0
5516 2017-10-25 279.850006 328.049988 279.850006 324.899994 324.899994 262677081.0
5517 2017-10-26 330.000000 351.299988 317.500000 320.500000 320.500000 112804243.0
5518 2017-10-27 320.250000 323.600006 310.100006 311.049988 311.049988 27580765.0
5519 2017-10-30 312.000000 322.350006 310.600006 312.000000 312.000000 28530254.0
5520 2017-10-31 313.000000 313.850006 304.299988 305.799988 305.799988 16455750.0
5521 2017-11-01 309.399994 321.450012 309.399994 319.850006 319.850006 34907283.0
5522 2017-11-02 320.000000 323.149994 313.549988 314.350006 314.350006 23110527.0
5523 2017-11-03 315.450012 327.500000 312.600006 325.000000 325.000000 26898400.0
5524 2017-11-06 323.600006 330.000000 321.450012 329.000000 329.000000 17143303.0

5525 rows × 7 columns

In [6]:
df.shape
Out[6]:
(5525, 7)
In [7]:
df.columns
Out[7]:
Index(['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume'], dtype='object')
In [8]:
df.rename(columns={'Date':'NewDate'},inplace=True)
In [9]:
df
Out[9]:
NewDate Open High Low Close Adj Close Volume
0 1996-01-01 18.691200 18.978901 18.540199 18.823200 0.442401 43733533.0
1 1996-01-02 18.893999 18.964800 17.738199 18.224100 0.428321 56167281.0
2 1996-01-03 18.327900 18.568501 17.643801 17.738199 0.416900 68296319.0
3 1996-01-04 17.502300 17.832500 17.224001 17.676901 0.415460 86073880.0
4 1996-01-05 17.738199 17.785400 17.459900 17.577801 0.413131 76613040.0
5 1996-01-08 17.478701 17.643801 16.922100 17.063601 0.401045 55395173.0
6 1996-01-09 16.889000 18.681700 16.705000 17.997700 0.422999 82057541.0
7 1996-01-10 17.408001 17.714600 17.054100 17.172100 0.403595 54360749.0
8 1996-01-11 16.983400 17.926901 16.983400 17.827801 0.419006 65973105.0
9 1996-01-12 17.879700 18.233500 17.573099 17.837299 0.419230 102152486.0
10 1996-01-15 18.021200 18.200500 17.785400 17.893900 0.420560 68169665.0
11 1996-01-16 17.903299 19.530899 17.785400 18.271299 0.429430 92497162.0
12 1996-01-17 17.856100 17.974100 17.455099 17.521200 0.411800 65477621.0
13 1996-01-18 17.318300 17.625000 17.124901 17.403200 0.409027 67620658.0
14 1996-01-19 17.290001 17.672100 17.290001 17.610800 0.413906 47652365.0
15 1996-01-22 17.738199 18.313700 17.700500 18.224100 0.428321 81512773.0
16 1996-01-23 18.389200 18.771400 18.162800 18.417500 0.432866 130297069.0
17 1996-01-24 17.832500 17.926901 17.351400 17.794800 0.418231 54484753.0
18 1996-01-25 18.115601 18.275999 17.879700 17.964600 0.422222 67701207.0
19 1996-01-26 NaN NaN NaN NaN NaN NaN
20 1996-01-29 18.162800 18.960100 18.073099 18.785500 0.441515 75689902.0
21 1996-01-30 19.059099 19.700701 18.587400 19.205400 0.451384 89163372.0
22 1996-01-31 18.776100 19.129900 18.214701 18.549601 0.435971 56831283.0
23 1996-02-01 18.422199 19.625200 18.422199 19.469500 0.457591 96599350.0
24 1996-02-02 19.861099 20.946199 19.719601 20.814100 0.489193 106010377.0
25 1996-02-05 21.087700 22.880400 21.087700 22.125601 0.520017 160942921.0
26 1996-02-06 22.455799 23.163401 21.795300 22.314301 0.524452 147345447.0
27 1996-02-07 21.597200 22.361401 20.757500 21.922701 0.515249 67480756.0
28 1996-02-08 21.988800 22.903999 21.578300 22.545401 0.529884 52796396.0
29 1996-02-09 22.762400 24.460800 22.351999 24.139999 0.567362 74580228.0
... ... ... ... ... ... ... ...
5495 2017-09-21 269.750000 271.399994 267.350006 268.500000 268.500000 8509419.0
5496 2017-09-22 267.250000 267.399994 261.100006 261.850006 261.850006 12860892.0
5497 2017-09-25 261.500000 261.799988 254.399994 258.750000 258.750000 13937050.0
5498 2017-09-26 258.500000 259.649994 256.000000 258.250000 258.250000 11728345.0
5499 2017-09-27 259.899994 259.899994 249.500000 250.399994 250.399994 17080769.0
5500 2017-09-28 250.250000 253.550003 249.500000 252.550003 252.550003 17776419.0
5501 2017-09-29 254.699997 255.500000 253.050003 253.850006 253.850006 10073641.0
5502 2017-10-03 253.850006 256.399994 250.600006 251.300003 251.300003 10547836.0
5503 2017-10-04 251.649994 254.649994 251.350006 253.199997 253.199997 14063896.0
5504 2017-10-05 253.699997 254.100006 249.199997 251.600006 251.600006 13476692.0
5505 2017-10-06 252.149994 257.899994 252.100006 256.750000 256.750000 11932139.0
5506 2017-10-09 256.799988 258.600006 254.250000 256.850006 256.850006 8009211.0
5507 2017-10-10 256.850006 258.200012 255.699997 256.950012 256.950012 9742878.0
5508 2017-10-11 257.149994 258.450012 250.449997 251.750000 251.750000 14192548.0
5509 2017-10-12 252.000000 252.949997 248.500000 251.199997 251.199997 13404759.0
5510 2017-10-13 251.100006 255.350006 250.199997 252.100006 252.100006 15697555.0
5511 2017-10-16 254.600006 254.699997 250.100006 252.050003 252.050003 12171830.0
5512 2017-10-17 252.050003 254.699997 250.649994 251.149994 251.149994 13505216.0
5513 2017-10-18 246.000000 247.800003 242.550003 243.750000 243.750000 20677891.0
5514 2017-10-23 243.800003 247.000000 241.149994 245.949997 245.949997 12360204.0
5515 2017-10-24 246.000000 255.300003 245.949997 254.449997 254.449997 17197805.0
5516 2017-10-25 279.850006 328.049988 279.850006 324.899994 324.899994 262677081.0
5517 2017-10-26 330.000000 351.299988 317.500000 320.500000 320.500000 112804243.0
5518 2017-10-27 320.250000 323.600006 310.100006 311.049988 311.049988 27580765.0
5519 2017-10-30 312.000000 322.350006 310.600006 312.000000 312.000000 28530254.0
5520 2017-10-31 313.000000 313.850006 304.299988 305.799988 305.799988 16455750.0
5521 2017-11-01 309.399994 321.450012 309.399994 319.850006 319.850006 34907283.0
5522 2017-11-02 320.000000 323.149994 313.549988 314.350006 314.350006 23110527.0
5523 2017-11-03 315.450012 327.500000 312.600006 325.000000 325.000000 26898400.0
5524 2017-11-06 323.600006 330.000000 321.450012 329.000000 329.000000 17143303.0

5525 rows × 7 columns

In [10]:
df.head()
Out[10]:
NewDate Open High Low Close Adj Close Volume
0 1996-01-01 18.691200 18.978901 18.540199 18.823200 0.442401 43733533.0
1 1996-01-02 18.893999 18.964800 17.738199 18.224100 0.428321 56167281.0
2 1996-01-03 18.327900 18.568501 17.643801 17.738199 0.416900 68296319.0
3 1996-01-04 17.502300 17.832500 17.224001 17.676901 0.415460 86073880.0
4 1996-01-05 17.738199 17.785400 17.459900 17.577801 0.413131 76613040.0
In [11]:
df.head(10)
Out[11]:
NewDate Open High Low Close Adj Close Volume
0 1996-01-01 18.691200 18.978901 18.540199 18.823200 0.442401 43733533.0
1 1996-01-02 18.893999 18.964800 17.738199 18.224100 0.428321 56167281.0
2 1996-01-03 18.327900 18.568501 17.643801 17.738199 0.416900 68296319.0
3 1996-01-04 17.502300 17.832500 17.224001 17.676901 0.415460 86073880.0
4 1996-01-05 17.738199 17.785400 17.459900 17.577801 0.413131 76613040.0
5 1996-01-08 17.478701 17.643801 16.922100 17.063601 0.401045 55395173.0
6 1996-01-09 16.889000 18.681700 16.705000 17.997700 0.422999 82057541.0
7 1996-01-10 17.408001 17.714600 17.054100 17.172100 0.403595 54360749.0
8 1996-01-11 16.983400 17.926901 16.983400 17.827801 0.419006 65973105.0
9 1996-01-12 17.879700 18.233500 17.573099 17.837299 0.419230 102152486.0
In [12]:
df.tail()
Out[12]:
NewDate Open High Low Close Adj Close Volume
5520 2017-10-31 313.000000 313.850006 304.299988 305.799988 305.799988 16455750.0
5521 2017-11-01 309.399994 321.450012 309.399994 319.850006 319.850006 34907283.0
5522 2017-11-02 320.000000 323.149994 313.549988 314.350006 314.350006 23110527.0
5523 2017-11-03 315.450012 327.500000 312.600006 325.000000 325.000000 26898400.0
5524 2017-11-06 323.600006 330.000000 321.450012 329.000000 329.000000 17143303.0
In [13]:
df.describe()
Out[13]:
Open High Low Close Adj Close Volume
count 5401.000000 5401.000000 5401.000000 5401.000000 5401.000000 5.401000e+03
mean 123.851330 125.677336 121.835167 123.665226 75.381429 3.252849e+07
std 98.179606 99.383543 96.796041 98.012898 91.862829 3.718180e+07
min 13.478200 13.959400 13.214000 13.346100 0.401045 0.000000e+00
25% 24.946699 25.409000 24.328699 24.809900 1.202995 1.302272e+07
50% 94.918198 96.616501 93.507599 94.814400 25.289371 2.101967e+07
75% 217.725006 220.850006 214.000000 217.227005 134.646362 3.698718e+07
max 345.899994 351.500000 341.325012 343.345001 329.000000 4.469483e+08
In [14]:
df.head()
Out[14]:
NewDate Open High Low Close Adj Close Volume
0 1996-01-01 18.691200 18.978901 18.540199 18.823200 0.442401 43733533.0
1 1996-01-02 18.893999 18.964800 17.738199 18.224100 0.428321 56167281.0
2 1996-01-03 18.327900 18.568501 17.643801 17.738199 0.416900 68296319.0
3 1996-01-04 17.502300 17.832500 17.224001 17.676901 0.415460 86073880.0
4 1996-01-05 17.738199 17.785400 17.459900 17.577801 0.413131 76613040.0
In [15]:
df.set_index('NewDate').head()
Out[15]:
Open High Low Close Adj Close Volume
NewDate
1996-01-01 18.691200 18.978901 18.540199 18.823200 0.442401 43733533.0
1996-01-02 18.893999 18.964800 17.738199 18.224100 0.428321 56167281.0
1996-01-03 18.327900 18.568501 17.643801 17.738199 0.416900 68296319.0
1996-01-04 17.502300 17.832500 17.224001 17.676901 0.415460 86073880.0
1996-01-05 17.738199 17.785400 17.459900 17.577801 0.413131 76613040.0
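Instead of calling set_index afterwards, read_csv can parse the date column and set it as the index in one step. A sketch of that, using a tiny in-memory stand-in for the CSV file so it runs anywhere:

```python
import io
import pandas as pd

# Small stand-in for the SBIN.NS.csv file, just to keep the example self-contained
csv_text = "Date,Open,Close\n1996-01-01,18.69,18.82\n1996-01-02,18.89,18.22\n"

# parse_dates converts the column to datetime; index_col makes it the index at load time
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'], index_col='Date')
```

A datetime index also unlocks time-based slicing such as `df.loc['1996-01']`.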
In [16]:
df["High"].head()
Out[16]:
0    18.978901
1    18.964800
2    18.568501
3    17.832500
4    17.785400
Name: High, dtype: float64
In [17]:
type(df)
Out[17]:
pandas.core.frame.DataFrame
In [18]:
type(df["High"])
Out[18]:
pandas.core.series.Series
In [19]:
df["High"].to_frame().head()
Out[19]:
High
0 18.978901
1 18.964800
2 18.568501
3 17.832500
4 17.785400
In [20]:
df.iloc[10:20]
Out[20]:
NewDate Open High Low Close Adj Close Volume
10 1996-01-15 18.021200 18.200500 17.785400 17.893900 0.420560 68169665.0
11 1996-01-16 17.903299 19.530899 17.785400 18.271299 0.429430 92497162.0
12 1996-01-17 17.856100 17.974100 17.455099 17.521200 0.411800 65477621.0
13 1996-01-18 17.318300 17.625000 17.124901 17.403200 0.409027 67620658.0
14 1996-01-19 17.290001 17.672100 17.290001 17.610800 0.413906 47652365.0
15 1996-01-22 17.738199 18.313700 17.700500 18.224100 0.428321 81512773.0
16 1996-01-23 18.389200 18.771400 18.162800 18.417500 0.432866 130297069.0
17 1996-01-24 17.832500 17.926901 17.351400 17.794800 0.418231 54484753.0
18 1996-01-25 18.115601 18.275999 17.879700 17.964600 0.422222 67701207.0
19 1996-01-26 NaN NaN NaN NaN NaN NaN
In [21]:
df.iloc[10:20][["Close","High"]]
Out[21]:
Close High
10 17.893900 18.200500
11 18.271299 19.530899
12 17.521200 17.974100
13 17.403200 17.625000
14 17.610800 17.672100
15 18.224100 18.313700
16 18.417500 18.771400
17 17.794800 17.926901
18 17.964600 18.275999
19 NaN NaN
In [22]:
df.iloc[[1,3,10]][["Close","High"]]
Out[22]:
Close High
1 18.224100 18.9648
3 17.676901 17.8325
10 17.893900 18.2005
In [23]:
a = df.head().values
a
Out[23]:
array([['1996-01-01', 18.6912, 18.978901, 18.540198999999998, 18.8232,
        0.44240100000000004, 43733533.0],
       ['1996-01-02', 18.893998999999997, 18.9648, 17.738198999999998,
        18.2241, 0.428321, 56167281.0],
       ['1996-01-03', 18.3279, 18.568501, 17.643801, 17.738198999999998,
        0.4169, 68296319.0],
       ['1996-01-04', 17.5023, 17.8325, 17.224001, 17.676901,
        0.41546000000000005, 86073880.0],
       ['1996-01-05', 17.738198999999998, 17.7854, 17.4599, 17.577801,
        0.413131, 76613040.0]], dtype=object)
In [24]:
a.shape
Out[24]:
(5, 7)
In [25]:
type(a)
Out[25]:
numpy.ndarray
In [26]:
df.head()
Out[26]:
NewDate Open High Low Close Adj Close Volume
0 1996-01-01 18.691200 18.978901 18.540199 18.823200 0.442401 43733533.0
1 1996-01-02 18.893999 18.964800 17.738199 18.224100 0.428321 56167281.0
2 1996-01-03 18.327900 18.568501 17.643801 17.738199 0.416900 68296319.0
3 1996-01-04 17.502300 17.832500 17.224001 17.676901 0.415460 86073880.0
4 1996-01-05 17.738199 17.785400 17.459900 17.577801 0.413131 76613040.0
In [ ]:
 

Manipulating Cell Values of a DataFrame

In [27]:
data1 = [1,2,3]
data2 = [4,5,6]
data3 = [7,8,9]
df1 = pd.DataFrame([data1,data2,data3],columns = ['a','b','c'])
df1
Out[27]:
a b c
0 1 2 3
1 4 5 6
2 7 8 9
In [28]:
df1.iloc[1]['c']
Out[28]:
6
In [29]:
df1.iloc[1]['c'] = 10
df1
Out[29]:
a b c
0 1 2 3
1 4 5 10
2 7 8 9
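A caution on the cell above: `df1.iloc[1]['c'] = 10` is chained indexing, and pandas does not guarantee that it writes back into the original frame (it may raise a SettingWithCopyWarning or silently modify a copy). The reliable spelling uses a single .loc call:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['a', 'b', 'c'])

# One-step label-based assignment is guaranteed to modify df1 itself
df1.loc[1, 'c'] = 10
```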
In [30]:
df1.iloc[2]
Out[30]:
a    7
b    8
c    9
Name: 2, dtype: int64
In [31]:
type(df1.iloc[2])
Out[31]:
pandas.core.series.Series
In [32]:
df1.iloc[2] = [10,20,30]
In [33]:
df1
Out[33]:
a b c
0 1 2 3
1 4 5 10
2 10 20 30
In [34]:
df1.iloc[[1,2]] = [[100,200,300],[1030,2010,3000]]
In [35]:
df1
Out[35]:
a b c
0 1 2 3
1 100 200 300
2 1030 2010 3000
In [36]:
df1.iloc[3] = [10,4,1]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-36-4d6d8c5cbdab> in <module>
----> 1 df1.iloc[3] = [10,4,1]

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in __setitem__(self, key, value)
    187         else:
    188             key = com.apply_if_callable(key, self.obj)
--> 189         indexer = self._get_setitem_indexer(key)
    190         self._setitem_with_indexer(indexer, value)
    191 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in _get_setitem_indexer(self, key)
    173 
    174         try:
--> 175             return self._convert_to_indexer(key, is_setter=True)
    176         except TypeError as e:
    177 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in _convert_to_indexer(self, obj, axis, is_setter)
   2245 
   2246         try:
-> 2247             self._validate_key(obj, axis)
   2248             return obj
   2249         except ValueError:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_key(self, key, axis)
   2068             return
   2069         elif is_integer(key):
-> 2070             self._validate_integer(key, axis)
   2071         elif isinstance(key, tuple):
   2072             # a tuple should already have been caught by this point

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_integer(self, key, axis)
   2137         len_axis = len(self.obj._get_axis(axis))
   2138         if key >= len_axis or key < -len_axis:
-> 2139             raise IndexError("single positional indexer is out-of-bounds")
   2140 
   2141     def _getitem_tuple(self, tup):

IndexError: single positional indexer is out-of-bounds
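The IndexError above arises because .iloc only addresses positions that already exist; it cannot enlarge the frame. Assigning through .loc with a new index label, by contrast, appends the row. A sketch:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2, 3]], columns=['a', 'b', 'c'])

# .loc with a label not yet in the index enlarges the frame instead of failing
df1.loc[1] = [10, 4, 1]
```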

Appending Rows to a DataFrame

In [37]:
a = [1,2,3]
a.append(5)
a
Out[37]:
[1, 2, 3, 5]
In [38]:
df1
Out[38]:
a b c
0 1 2 3
1 100 200 300
2 1030 2010 3000
In [39]:
df1.append([[10,8,4]])
Out[39]:
a b c 0 1 2
0 1.0 2.0 3.0 NaN NaN NaN
1 100.0 200.0 300.0 NaN NaN NaN
2 1030.0 2010.0 3000.0 NaN NaN NaN
0 NaN NaN NaN 10.0 8.0 4.0
In [40]:
df1
Out[40]:
a b c
0 1 2 3
1 100 200 300
2 1030 2010 3000

Case 1: DataFrame without Column Names

In [41]:
data1 = [1,2,3]
data2 = [4,5,6]
data3 = [7,8,9]
df2 = pd.DataFrame([data1,data2,data3])
df2
Out[41]:
0 1 2
0 1 2 3
1 4 5 6
2 7 8 9
In [42]:
df2 = df2.append([[10,20,30]])
df2
Out[42]:
0 1 2
0 1 2 3
1 4 5 6
2 7 8 9
0 10 20 30
In [43]:
df2.reset_index()
Out[43]:
index 0 1 2
0 0 1 2 3
1 1 4 5 6
2 2 7 8 9
3 0 10 20 30
In [44]:
df2.reset_index(drop=True,inplace = True)
df2
Out[44]:
0 1 2
0 1 2 3
1 4 5 6
2 7 8 9
3 10 20 30

Case 2: DataFrame with Column Names

In [45]:
df1
Out[45]:
a b c
0 1 2 3
1 100 200 300
2 1030 2010 3000
In [46]:
df1.append([[10,20,30]],ignore_index=True)
Out[46]:
a b c 0 1 2
0 1.0 2.0 3.0 NaN NaN NaN
1 100.0 200.0 300.0 NaN NaN NaN
2 1030.0 2010.0 3000.0 NaN NaN NaN
3 NaN NaN NaN 10.0 20.0 30.0
In [47]:
df1.append({'a':10,'b':20,'c':30},ignore_index=True)
Out[47]:
a b c
0 1 2 3
1 100 200 300
2 1030 2010 3000
3 10 20 30
In [48]:
columnnames_list = list(df1.columns)
columnnames_list
Out[48]:
['a', 'b', 'c']
In [49]:
data = [1000,2000,3000]
In [50]:
data_dict = {}
for i in range(0,len(columnnames_list)):
    each_column = columnnames_list[i]
    this_data_point = data[i]
    data_dict[each_column] = this_data_point
data_dict
Out[50]:
{'a': 1000, 'b': 2000, 'c': 3000}
In [51]:
df1 = df1.append(data_dict,ignore_index = True)
df1
Out[51]:
a b c
0 1 2 3
1 100 200 300
2 1030 2010 3000
3 1000 2000 3000
In [52]:
df1
Out[52]:
a b c
0 1 2 3
1 100 200 300
2 1030 2010 3000
3 1000 2000 3000
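Note that DataFrame.append, used throughout this section, was deprecated in pandas 1.4 and removed in pandas 2.0. On recent versions the equivalent is pd.concat with a one-row frame:

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1], 'b': [2], 'c': [3]})

# Wrap the new row's dict in a one-row DataFrame, then concatenate
new_row = pd.DataFrame([{'a': 1000, 'b': 2000, 'c': 3000}])
df1 = pd.concat([df1, new_row], ignore_index=True)
```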

Sampling a DataFrame

In [53]:
import numpy as np
In [54]:
a = [1,2,3,5,6,7]
np.random.choice(a,3)
Out[54]:
array([5, 7, 1])
In [55]:
df.sample(10)
Out[55]:
NewDate Open High Low Close Adj Close Volume
395 1997-07-07 33.509102 33.768600 33.325199 33.376999 1.042259 89206826.0
339 1997-04-18 NaN NaN NaN NaN NaN NaN
3782 2010-10-15 328.100006 328.100006 315.024994 316.565002 151.746780 14198900.0
3961 2011-07-05 243.899994 247.839996 240.949997 247.330002 136.146362 15940790.0
2912 2007-04-10 93.597298 94.205803 91.049797 93.521797 24.729269 11069930.0
426 1997-08-19 29.933201 30.268200 29.725599 29.886000 0.933246 47957605.0
4316 2012-12-11 232.970001 233.854996 228.270004 230.880005 154.805847 20557710.0
2018 2003-09-25 39.627899 40.363800 39.156101 39.344799 4.891809 64702354.0
3431 2009-05-22 168.699997 173.990005 167.600006 173.199997 60.047340 27543220.0
3930 2011-05-23 231.399994 231.399994 223.824997 224.975006 123.840775 24033650.0
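Two sample parameters worth knowing: frac draws a proportion of the rows instead of a fixed count, and random_state fixes the seed so the sample is reproducible. A sketch on a small synthetic frame:

```python
import pandas as pd

df = pd.DataFrame({'x': range(100)})

ten_rows = df.sample(10, random_state=42)          # same 10 rows on every run
one_tenth = df.sample(frac=0.1, random_state=42)   # 10% of the rows
```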

Merge, Join and Concatenate DataFrames

In [56]:
df1
Out[56]:
a b c
0 1 2 3
1 100 200 300
2 1030 2010 3000
3 1000 2000 3000
In [57]:
df2
Out[57]:
0 1 2
0 1 2 3
1 4 5 6
2 7 8 9
3 10 20 30
In [58]:
df3 = pd.merge(df1,df2,left_index=True,right_index=True,how='outer')
df3
Out[58]:
a b c 0 1 2
0 1 2 3 1 2 3
1 100 200 300 4 5 6
2 1030 2010 3000 7 8 9
3 1000 2000 3000 10 20 30
In [59]:
data1 = [1,2,3]
data2 = [4,5,6]
data3 = [7,8,9]
df2 = pd.DataFrame([data1,data2,data3])
df2
Out[59]:
0 1 2
0 1 2 3
1 4 5 6
2 7 8 9
In [60]:
df3 = pd.merge(df1,df2,left_index=True,right_index=True,how='outer')
df3
Out[60]:
a b c 0 1 2
0 1 2 3 1.0 2.0 3.0
1 100 200 300 4.0 5.0 6.0
2 1030 2010 3000 7.0 8.0 9.0
3 1000 2000 3000 NaN NaN NaN
In [61]:
df3 = pd.merge(df1,df2,left_index=True,right_index=True,how='inner')
df3
Out[61]:
a b c 0 1 2
0 1 2 3 1 2 3
1 100 200 300 4 5 6
2 1030 2010 3000 7 8 9
In [62]:
df3 = pd.merge(df1,df2,left_index=True,right_index=True,how='left')
df3
Out[62]:
a b c 0 1 2
0 1 2 3 1.0 2.0 3.0
1 100 200 300 4.0 5.0 6.0
2 1030 2010 3000 7.0 8.0 9.0
3 1000 2000 3000 NaN NaN NaN
In [63]:
df3 = pd.merge(df1,df2,left_index=True,right_index=True,how='right')
df3
Out[63]:
a b c 0 1 2
0 1 2 3 1 2 3
1 100 200 300 4 5 6
2 1030 2010 3000 7 8 9
In [64]:
df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                    'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})


df2 = pd.DataFrame({'key': ['K0', 'K1', 'K4', 'K3'],
                    'C': ['C0', 'C1', 'C4', 'C3'],
                      'D': ['D0', 'D1', 'D4', 'D3']})
In [65]:
df1
Out[65]:
key A B
0 K0 A0 B0
1 K1 A1 B1
2 K2 A2 B2
3 K3 A3 B3
In [66]:
df2
Out[66]:
key C D
0 K0 C0 D0
1 K1 C1 D1
2 K4 C4 D4
3 K3 C3 D3
In [67]:
df3 = pd.merge(df1,df2,left_index=True,right_index=True,how='outer')
df3
Out[67]:
key_x A B key_y C D
0 K0 A0 B0 K0 C0 D0
1 K1 A1 B1 K1 C1 D1
2 K2 A2 B2 K4 C4 D4
3 K3 A3 B3 K3 C3 D3
In [68]:
df3 = pd.merge(df1,df2,on ='key',how='outer')
df3
Out[68]:
key A B C D
0 K0 A0 B0 C0 D0
1 K1 A1 B1 C1 D1
2 K2 A2 B2 NaN NaN
3 K3 A3 B3 C3 D3
4 K4 NaN NaN C4 D4
In [69]:
df3 = pd.merge(df1,df2,on ='key',how='inner')
df3
Out[69]:
key A B C D
0 K0 A0 B0 C0 D0
1 K1 A1 B1 C1 D1
2 K3 A3 B3 C3 D3
In [70]:
df3 = pd.merge(df1,df2,on ='key',how='left')
df3
Out[70]:
key A B C D
0 K0 A0 B0 C0 D0
1 K1 A1 B1 C1 D1
2 K2 A2 B2 NaN NaN
3 K3 A3 B3 C3 D3
In [71]:
df3 = pd.merge(df1,df2,on ='key',how='right')
df3
Out[71]:
key A B C D
0 K0 A0 B0 C0 D0
1 K1 A1 B1 C1 D1
2 K3 A3 B3 C3 D3
3 K4 NaN NaN C4 D4

Concat

In [72]:
s3 = pd.Series([0, 1, 2, 3], name='foo')
s4 = pd.Series([4, 5, 6, 7,12])
s5 = pd.Series([11, 10, 9, 8,13,14])
In [73]:
s3
Out[73]:
0    0
1    1
2    2
3    3
Name: foo, dtype: int64
In [74]:
s4
Out[74]:
0     4
1     5
2     6
3     7
4    12
dtype: int64
In [75]:
s5
Out[75]:
0    11
1    10
2     9
3     8
4    13
5    14
dtype: int64
In [76]:
pd.concat([s3, s4, s5], axis=1)
Out[76]:
foo 0 1
0 0.0 4.0 11
1 1.0 5.0 10
2 2.0 6.0 9
3 3.0 7.0 8
4 NaN 12.0 13
5 NaN NaN 14
In [77]:
pd.concat([s3, s4, s5], axis=0)
Out[77]:
0     0
1     1
2     2
3     3
0     4
1     5
2     6
3     7
4    12
0    11
1    10
2     9
3     8
4    13
5    14
dtype: int64
In [78]:
df1
Out[78]:
key A B
0 K0 A0 B0
1 K1 A1 B1
2 K2 A2 B2
3 K3 A3 B3
In [79]:
df2
Out[79]:
key C D
0 K0 C0 D0
1 K1 C1 D1
2 K4 C4 D4
3 K3 C3 D3
In [80]:
df1.append(df2)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py:6692: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

  sort=sort)
Out[80]:
A B C D key
0 A0 B0 NaN NaN K0
1 A1 B1 NaN NaN K1
2 A2 B2 NaN NaN K2
3 A3 B3 NaN NaN K3
0 NaN NaN C0 D0 K0
1 NaN NaN C1 D1 K1
2 NaN NaN C4 D4 K4
3 NaN NaN C3 D3 K3
In [81]:
df3 = pd.concat([df1,df2], ignore_index=True, sort=False)
df3
Out[81]:
key A B C D
0 K0 A0 B0 NaN NaN
1 K1 A1 B1 NaN NaN
2 K2 A2 B2 NaN NaN
3 K3 A3 B3 NaN NaN
4 K0 NaN NaN C0 D0
5 K1 NaN NaN C1 D1
6 K4 NaN NaN C4 D4
7 K3 NaN NaN C3 D3

Changing Column Order

In [82]:
df3[['A','B','C','D','key']]
Out[82]:
A B C D key
0 A0 B0 NaN NaN K0
1 A1 B1 NaN NaN K1
2 A2 B2 NaN NaN K2
3 A3 B3 NaN NaN K3
4 NaN NaN C0 D0 K0
5 NaN NaN C1 D1 K1
6 NaN NaN C4 D4 K4
7 NaN NaN C3 D3 K3

Handling Missing Values

In [83]:
df3
Out[83]:
key A B C D
0 K0 A0 B0 NaN NaN
1 K1 A1 B1 NaN NaN
2 K2 A2 B2 NaN NaN
3 K3 A3 B3 NaN NaN
4 K0 NaN NaN C0 D0
5 K1 NaN NaN C1 D1
6 K4 NaN NaN C4 D4
7 K3 NaN NaN C3 D3
In [84]:
df4 = df.head().copy()
df4
Out[84]:
NewDate Open High Low Close Adj Close Volume
0 1996-01-01 18.691200 18.978901 18.540199 18.823200 0.442401 43733533.0
1 1996-01-02 18.893999 18.964800 17.738199 18.224100 0.428321 56167281.0
2 1996-01-03 18.327900 18.568501 17.643801 17.738199 0.416900 68296319.0
3 1996-01-04 17.502300 17.832500 17.224001 17.676901 0.415460 86073880.0
4 1996-01-05 17.738199 17.785400 17.459900 17.577801 0.413131 76613040.0
In [85]:
df3
Out[85]:
key A B C D
0 K0 A0 B0 NaN NaN
1 K1 A1 B1 NaN NaN
2 K2 A2 B2 NaN NaN
3 K3 A3 B3 NaN NaN
4 K0 NaN NaN C0 D0
5 K1 NaN NaN C1 D1
6 K4 NaN NaN C4 D4
7 K3 NaN NaN C3 D3
In [86]:
df4
Out[86]:
NewDate Open High Low Close Adj Close Volume
0 1996-01-01 18.691200 18.978901 18.540199 18.823200 0.442401 43733533.0
1 1996-01-02 18.893999 18.964800 17.738199 18.224100 0.428321 56167281.0
2 1996-01-03 18.327900 18.568501 17.643801 17.738199 0.416900 68296319.0
3 1996-01-04 17.502300 17.832500 17.224001 17.676901 0.415460 86073880.0
4 1996-01-05 17.738199 17.785400 17.459900 17.577801 0.413131 76613040.0
In [87]:
df5 = pd.concat([df3,df4],ignore_index=True, sort=False)
df5
Out[87]:
key A B C D NewDate Open High Low Close Adj Close Volume
0 K0 A0 B0 NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 K1 A1 B1 NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 K2 A2 B2 NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 K3 A3 B3 NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 K0 NaN NaN C0 D0 NaN NaN NaN NaN NaN NaN NaN
5 K1 NaN NaN C1 D1 NaN NaN NaN NaN NaN NaN NaN
6 K4 NaN NaN C4 D4 NaN NaN NaN NaN NaN NaN NaN
7 K3 NaN NaN C3 D3 NaN NaN NaN NaN NaN NaN NaN
8 NaN NaN NaN NaN NaN 1996-01-01 18.691200 18.978901 18.540199 18.823200 0.442401 43733533.0
9 NaN NaN NaN NaN NaN 1996-01-02 18.893999 18.964800 17.738199 18.224100 0.428321 56167281.0
10 NaN NaN NaN NaN NaN 1996-01-03 18.327900 18.568501 17.643801 17.738199 0.416900 68296319.0
11 NaN NaN NaN NaN NaN 1996-01-04 17.502300 17.832500 17.224001 17.676901 0.415460 86073880.0
12 NaN NaN NaN NaN NaN 1996-01-05 17.738199 17.785400 17.459900 17.577801 0.413131 76613040.0
In [88]:
df5.isna()
Out[88]:
key A B C D NewDate Open High Low Close Adj Close Volume
0 False False False True True True True True True True True True
1 False False False True True True True True True True True True
2 False False False True True True True True True True True True
3 False False False True True True True True True True True True
4 False True True False False True True True True True True True
5 False True True False False True True True True True True True
6 False True True False False True True True True True True True
7 False True True False False True True True True True True True
8 True True True True True False False False False False False False
9 True True True True True False False False False False False False
10 True True True True True False False False False False False False
11 True True True True True False False False False False False False
12 True True True True True False False False False False False False
In [89]:
df5.fillna("EMPTY")
Out[89]:
key A B C D NewDate Open High Low Close Adj Close Volume
0 K0 A0 B0 EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY
1 K1 A1 B1 EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY
2 K2 A2 B2 EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY
3 K3 A3 B3 EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY
4 K0 EMPTY EMPTY C0 D0 EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY
5 K1 EMPTY EMPTY C1 D1 EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY
6 K4 EMPTY EMPTY C4 D4 EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY
7 K3 EMPTY EMPTY C3 D3 EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY
8 EMPTY EMPTY EMPTY EMPTY EMPTY 1996-01-01 18.6912 18.9789 18.5402 18.8232 0.442401 4.37335e+07
9 EMPTY EMPTY EMPTY EMPTY EMPTY 1996-01-02 18.894 18.9648 17.7382 18.2241 0.428321 5.61673e+07
10 EMPTY EMPTY EMPTY EMPTY EMPTY 1996-01-03 18.3279 18.5685 17.6438 17.7382 0.4169 6.82963e+07
11 EMPTY EMPTY EMPTY EMPTY EMPTY 1996-01-04 17.5023 17.8325 17.224 17.6769 0.41546 8.60739e+07
12 EMPTY EMPTY EMPTY EMPTY EMPTY 1996-01-05 17.7382 17.7854 17.4599 17.5778 0.413131 7.6613e+07
In [90]:
df5.replace(np.nan,0,inplace = True)
In [91]:
df5
Out[91]:
key A B C D NewDate Open High Low Close Adj Close Volume
0 K0 A0 B0 0 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0
1 K1 A1 B1 0 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0
2 K2 A2 B2 0 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0
3 K3 A3 B3 0 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0
4 K0 0 0 C0 D0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0
5 K1 0 0 C1 D1 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0
6 K4 0 0 C4 D4 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0
7 K3 0 0 C3 D3 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0
8 0 0 0 0 0 1996-01-01 18.691200 18.978901 18.540199 18.823200 0.442401 43733533.0
9 0 0 0 0 0 1996-01-02 18.893999 18.964800 17.738199 18.224100 0.428321 56167281.0
10 0 0 0 0 0 1996-01-03 18.327900 18.568501 17.643801 17.738199 0.416900 68296319.0
11 0 0 0 0 0 1996-01-04 17.502300 17.832500 17.224001 17.676901 0.415460 86073880.0
12 0 0 0 0 0 1996-01-05 17.738199 17.785400 17.459900 17.577801 0.413131 76613040.0
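Besides fillna and replace, missing rows or columns can simply be dropped with dropna, and numeric columns can be imputed with a statistic such as the column mean. A minimal sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [4.0, 5.0, np.nan]})

drop_rows = df.dropna()              # keep only rows with no NaN at all
drop_cols = df.dropna(axis=1)        # drop every column containing a NaN
filled = df.fillna(df.mean())        # column-wise mean imputation
```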

Filtering a DataFrame

In [92]:
df5['key']
Out[92]:
0     K0
1     K1
2     K2
3     K3
4     K0
5     K1
6     K4
7     K3
8      0
9      0
10     0
11     0
12     0
Name: key, dtype: object
In [93]:
df5[df5['key'] == 'K0']
Out[93]:
key A B C D NewDate Open High Low Close Adj Close Volume
0 K0 A0 B0 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0
4 K0 0 0 C0 D0 0 0.0 0.0 0.0 0.0 0.0 0.0
In [94]:
df5[(df5['key'] == 'K4') & (df5['Open'] == 0)]
Out[94]:
key A B C D NewDate Open High Low Close Adj Close Volume
6 K4 0 0 C4 D4 0 0.0 0.0 0.0 0.0 0.0 0.0
In [95]:
keylist = ['K2','K3']
df5[(df5['key'].isin(keylist)) & (df5['Open'] == 0)]
Out[95]:
key A B C D NewDate Open High Low Close Adj Close Volume
2 K2 A2 B2 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0
3 K3 A3 B3 0 0 0 0.0 0.0 0.0 0.0 0.0 0.0
7 K3 0 0 C3 D3 0 0.0 0.0 0.0 0.0 0.0 0.0
In [96]:
df5[df5['Open'] < 100]
Out[96]:
key A B C D NewDate Open High Low Close Adj Close Volume
0 K0 A0 B0 0 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0
1 K1 A1 B1 0 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0
2 K2 A2 B2 0 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0
3 K3 A3 B3 0 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0
4 K0 0 0 C0 D0 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0
5 K1 0 0 C1 D1 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0
6 K4 0 0 C4 D4 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0
7 K3 0 0 C3 D3 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.0
8 0 0 0 0 0 1996-01-01 18.691200 18.978901 18.540199 18.823200 0.442401 43733533.0
9 0 0 0 0 0 1996-01-02 18.893999 18.964800 17.738199 18.224100 0.428321 56167281.0
10 0 0 0 0 0 1996-01-03 18.327900 18.568501 17.643801 17.738199 0.416900 68296319.0
11 0 0 0 0 0 1996-01-04 17.502300 17.832500 17.224001 17.676901 0.415460 86073880.0
12 0 0 0 0 0 1996-01-05 17.738199 17.785400 17.459900 17.577801 0.413131 76613040.0
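The masks above combine conditions with `&`; `|` (OR) and `~` (NOT) work the same way, and each condition must be parenthesized. `DataFrame.query` expresses the same filter as a string. A sketch on a toy frame (column values are illustrative):

```python
import pandas as pd

toy = pd.DataFrame({"key": ["K0", "K1", "K4"], "Open": [0.0, 10.0, 0.0]})

# OR combines masks just like & above; parentheses around each condition
either = toy[(toy["key"] == "K4") | (toy["Open"] > 5)]

# query() expresses the same filter more readably
same = toy.query("key == 'K4' or Open > 5")

print(either.equals(same))
```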

Renaming Columns

In [97]:
temp = df.sample(10).copy()
temp
Out[97]:
NewDate Open High Low Close Adj Close Volume
3849 2011-01-21 252.889999 260.700012 252.199997 259.690002 124.483513 29086400.0
3826 2010-12-21 270.924988 276.500000 270.924988 274.375000 131.522842 19613340.0
1549 2001-12-07 19.342199 19.658300 18.780800 18.856300 1.340288 3678530.0
2520 2005-09-05 76.382698 77.623497 76.109100 77.090401 16.440184 14539912.0
574 1998-03-13 NaN NaN NaN NaN NaN NaN
675 1998-08-03 19.625200 19.625200 18.611000 19.111000 0.752655 35140718.0
4872 2015-03-13 290.000000 291.049988 280.000000 280.750000 271.203003 13414248.0
4201 2012-06-22 214.699997 218.449997 212.850006 215.779999 144.681274 35627920.0
88 1996-05-02 28.541500 29.390699 27.574400 28.083900 0.660055 90601602.0
5089 2016-01-29 184.199997 185.750000 178.100006 179.899994 175.976257 24259147.0
In [98]:
temp.rename(columns = {'NewDate':'OneMoreNewDate','Adj Close':'ABC'},inplace = True)
temp
Out[98]:
OneMoreNewDate Open High Low Close ABC Volume
3849 2011-01-21 252.889999 260.700012 252.199997 259.690002 124.483513 29086400.0
3826 2010-12-21 270.924988 276.500000 270.924988 274.375000 131.522842 19613340.0
1549 2001-12-07 19.342199 19.658300 18.780800 18.856300 1.340288 3678530.0
2520 2005-09-05 76.382698 77.623497 76.109100 77.090401 16.440184 14539912.0
574 1998-03-13 NaN NaN NaN NaN NaN NaN
675 1998-08-03 19.625200 19.625200 18.611000 19.111000 0.752655 35140718.0
4872 2015-03-13 290.000000 291.049988 280.000000 280.750000 271.203003 13414248.0
4201 2012-06-22 214.699997 218.449997 212.850006 215.779999 144.681274 35627920.0
88 1996-05-02 28.541500 29.390699 27.574400 28.083900 0.660055 90601602.0
5089 2016-01-29 184.199997 185.750000 178.100006 179.899994 175.976257 24259147.0
In [99]:
temp.columns = ['A','B','CD','EF','GH','i',3939]
temp
Out[99]:
A B CD EF GH i 3939
3849 2011-01-21 252.889999 260.700012 252.199997 259.690002 124.483513 29086400.0
3826 2010-12-21 270.924988 276.500000 270.924988 274.375000 131.522842 19613340.0
1549 2001-12-07 19.342199 19.658300 18.780800 18.856300 1.340288 3678530.0
2520 2005-09-05 76.382698 77.623497 76.109100 77.090401 16.440184 14539912.0
574 1998-03-13 NaN NaN NaN NaN NaN NaN
675 1998-08-03 19.625200 19.625200 18.611000 19.111000 0.752655 35140718.0
4872 2015-03-13 290.000000 291.049988 280.000000 280.750000 271.203003 13414248.0
4201 2012-06-22 214.699997 218.449997 212.850006 215.779999 144.681274 35627920.0
88 1996-05-02 28.541500 29.390699 27.574400 28.083900 0.660055 90601602.0
5089 2016-01-29 184.199997 185.750000 178.100006 179.899994 175.976257 24259147.0
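Note the difference between the two approaches above: `rename(columns=...)` maps only the names you list and leaves the rest untouched, while assigning to `df.columns` replaces every name and must supply exactly one label per column. A quick sketch (the toy frame and names are illustrative):

```python
import pandas as pd

toy = pd.DataFrame({"a": [1], "b": [2], "c": [3]})

# rename: partial, returns a copy unless inplace=True is passed
renamed = toy.rename(columns={"a": "alpha"})

# wholesale assignment: the list length must match the column count
toy.columns = ["x", "y", "z"]

print(list(renamed.columns), list(toy.columns))
```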

Deleting Columns and Rows

In [100]:
temp.drop("B",axis=1,inplace = True)
temp
Out[100]:
A CD EF GH i 3939
3849 2011-01-21 260.700012 252.199997 259.690002 124.483513 29086400.0
3826 2010-12-21 276.500000 270.924988 274.375000 131.522842 19613340.0
1549 2001-12-07 19.658300 18.780800 18.856300 1.340288 3678530.0
2520 2005-09-05 77.623497 76.109100 77.090401 16.440184 14539912.0
574 1998-03-13 NaN NaN NaN NaN NaN
675 1998-08-03 19.625200 18.611000 19.111000 0.752655 35140718.0
4872 2015-03-13 291.049988 280.000000 280.750000 271.203003 13414248.0
4201 2012-06-22 218.449997 212.850006 215.779999 144.681274 35627920.0
88 1996-05-02 29.390699 27.574400 28.083900 0.660055 90601602.0
5089 2016-01-29 185.750000 178.100006 179.899994 175.976257 24259147.0
In [101]:
temp.drop(724,axis=0,inplace = True)
temp
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-101-a77a313d8f56> in <module>
----> 1 temp.drop(724,axis=0,inplace = True)
      2 temp

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in drop(self, labels, axis, index, columns, level, inplace, errors)
   3938                                            index=index, columns=columns,
   3939                                            level=level, inplace=inplace,
-> 3940                                            errors=errors)
   3941 
   3942     @rewrite_axis_style_signature('mapper', [('copy', True),

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in drop(self, labels, axis, index, columns, level, inplace, errors)
   3778         for axis, labels in axes.items():
   3779             if labels is not None:
-> 3780                 obj = obj._drop_axis(labels, axis, level=level, errors=errors)
   3781 
   3782         if inplace:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in _drop_axis(self, labels, axis, level, errors)
   3810                 new_axis = axis.drop(labels, level=level, errors=errors)
   3811             else:
-> 3812                 new_axis = axis.drop(labels, errors=errors)
   3813             result = self.reindex(**{axis_name: new_axis})
   3814 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in drop(self, labels, errors)
   4963             if errors != 'ignore':
   4964                 raise KeyError(
-> 4965                     '{} not found in axis'.format(labels[mask]))
   4966             indexer = indexer[~mask]
   4967         return self.delete(indexer)

KeyError: '[724] not found in axis'
In [102]:
temp.drop([1499,1145],axis=0,inplace = True)
temp
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-102-ad62c42643be> in <module>
----> 1 temp.drop([1499,1145],axis=0,inplace = True)
      2 temp

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in drop(self, labels, axis, index, columns, level, inplace, errors)
   3938                                            index=index, columns=columns,
   3939                                            level=level, inplace=inplace,
-> 3940                                            errors=errors)
   3941 
   3942     @rewrite_axis_style_signature('mapper', [('copy', True),

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in drop(self, labels, axis, index, columns, level, inplace, errors)
   3778         for axis, labels in axes.items():
   3779             if labels is not None:
-> 3780                 obj = obj._drop_axis(labels, axis, level=level, errors=errors)
   3781 
   3782         if inplace:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in _drop_axis(self, labels, axis, level, errors)
   3810                 new_axis = axis.drop(labels, level=level, errors=errors)
   3811             else:
-> 3812                 new_axis = axis.drop(labels, errors=errors)
   3813             result = self.reindex(**{axis_name: new_axis})
   3814 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in drop(self, labels, errors)
   4963             if errors != 'ignore':
   4964                 raise KeyError(
-> 4965                     '{} not found in axis'.format(labels[mask]))
   4966             indexer = indexer[~mask]
   4967         return self.delete(indexer)

KeyError: '[1499 1145] not found in axis'
In [103]:
temp.drop("GH",axis=1,inplace = True)
temp
Out[103]:
A CD EF i 3939
3849 2011-01-21 260.700012 252.199997 124.483513 29086400.0
3826 2010-12-21 276.500000 270.924988 131.522842 19613340.0
1549 2001-12-07 19.658300 18.780800 1.340288 3678530.0
2520 2005-09-05 77.623497 76.109100 16.440184 14539912.0
574 1998-03-13 NaN NaN NaN NaN
675 1998-08-03 19.625200 18.611000 0.752655 35140718.0
4872 2015-03-13 291.049988 280.000000 271.203003 13414248.0
4201 2012-06-22 218.449997 212.850006 144.681274 35627920.0
88 1996-05-02 29.390699 27.574400 0.660055 90601602.0
5089 2016-01-29 185.750000 178.100006 175.976257 24259147.0
In [104]:
df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                    'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})


df2 = pd.DataFrame({'key': ['K0', 'K1', 'K4', 'K3'],
                    'C': ['C0', 'C1', 'C4', 'C3'],
                      'D': ['D0', 'D1', 'D4', 'D3']})
In [105]:
df1
Out[105]:
key A B
0 K0 A0 B0
1 K1 A1 B1
2 K2 A2 B2
3 K3 A3 B3
In [106]:
df1.drop('A',axis=1,inplace = True)
df1
Out[106]:
key B
0 K0 B0
1 K1 B1
2 K2 B2
3 K3 B3
In [107]:
df1.drop(2,axis=0,inplace = True)
df1
Out[107]:
key B
0 K0 B0
1 K1 B1
3 K3 B3
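The two `KeyError` tracebacks above come from dropping labels that are not in the index (the earlier `sample(10)` kept only ten arbitrary row labels). Passing `errors='ignore'` makes `drop` skip missing labels instead of raising. A sketch on a toy frame:

```python
import pandas as pd

# Toy frame whose index only contains labels 0 and 3
toy = pd.DataFrame({"key": ["K0", "K1"], "B": ["B0", "B1"]}, index=[0, 3])

# 724 is not in the index; errors='ignore' skips it instead of raising
result = toy.drop([724, 3], axis=0, errors="ignore")

print(result)
```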

Row-wise and Column-wise Functions

In [108]:
temp
Out[108]:
A CD EF i 3939
3849 2011-01-21 260.700012 252.199997 124.483513 29086400.0
3826 2010-12-21 276.500000 270.924988 131.522842 19613340.0
1549 2001-12-07 19.658300 18.780800 1.340288 3678530.0
2520 2005-09-05 77.623497 76.109100 16.440184 14539912.0
574 1998-03-13 NaN NaN NaN NaN
675 1998-08-03 19.625200 18.611000 0.752655 35140718.0
4872 2015-03-13 291.049988 280.000000 271.203003 13414248.0
4201 2012-06-22 218.449997 212.850006 144.681274 35627920.0
88 1996-05-02 29.390699 27.574400 0.660055 90601602.0
5089 2016-01-29 185.750000 178.100006 175.976257 24259147.0
In [109]:
temp['CD'].dtype
Out[109]:
dtype('float64')
In [110]:
temp['CD'] = temp['CD'].astype('int32')
temp['CD']
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-110-1a7810bf8276> in <module>
----> 1 temp['CD'] = temp['CD'].astype('int32')
      2 temp['CD']

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors, **kwargs)
   5689             # else, only a single dtype is given
   5690             new_data = self._data.astype(dtype=dtype, copy=copy, errors=errors,
-> 5691                                          **kwargs)
   5692             return self._constructor(new_data).__finalize__(self)
   5693 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in astype(self, dtype, **kwargs)
    529 
    530     def astype(self, dtype, **kwargs):
--> 531         return self.apply('astype', dtype=dtype, **kwargs)
    532 
    533     def convert(self, **kwargs):

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
    393                                             copy=align_copy)
    394 
--> 395             applied = getattr(b, f)(**kwargs)
    396             result_blocks = _extend_blocks(applied, result_blocks)
    397 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\blocks.py in astype(self, dtype, copy, errors, values, **kwargs)
    532     def astype(self, dtype, copy=False, errors='raise', values=None, **kwargs):
    533         return self._astype(dtype, copy=copy, errors=errors, values=values,
--> 534                             **kwargs)
    535 
    536     def _astype(self, dtype, copy=False, errors='raise', values=None,

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\blocks.py in _astype(self, dtype, copy, errors, values, **kwargs)
    631 
    632                     # _astype_nansafe works fine with 1-d only
--> 633                     values = astype_nansafe(values.ravel(), dtype, copy=True)
    634 
    635                 # TODO(extension)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\dtypes\cast.py in astype_nansafe(arr, dtype, copy, skipna)
    674 
    675         if not np.isfinite(arr).all():
--> 676             raise ValueError('Cannot convert non-finite values (NA or inf) to '
    677                              'integer')
    678 

ValueError: Cannot convert non-finite values (NA or inf) to integer
In [111]:
temp
Out[111]:
A CD EF i 3939
3849 2011-01-21 260.700012 252.199997 124.483513 29086400.0
3826 2010-12-21 276.500000 270.924988 131.522842 19613340.0
1549 2001-12-07 19.658300 18.780800 1.340288 3678530.0
2520 2005-09-05 77.623497 76.109100 16.440184 14539912.0
574 1998-03-13 NaN NaN NaN NaN
675 1998-08-03 19.625200 18.611000 0.752655 35140718.0
4872 2015-03-13 291.049988 280.000000 271.203003 13414248.0
4201 2012-06-22 218.449997 212.850006 144.681274 35627920.0
88 1996-05-02 29.390699 27.574400 0.660055 90601602.0
5089 2016-01-29 185.750000 178.100006 175.976257 24259147.0
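The `ValueError` above occurs because NaN has no integer representation. Two common ways around it: fill the missing values before casting, or use pandas' nullable `Int32` dtype, which can hold missing values. A sketch on a toy Series (values are illustrative):

```python
import numpy as np
import pandas as pd

# Toy Series with one missing value, standing in for temp['CD'] above
s = pd.Series([260.0, np.nan, 19.0])

# Option 1: fill the missing values first, then cast to a plain int dtype
as_int = s.fillna(0).astype("int32")

# Option 2: the nullable integer dtype keeps the missing value as <NA>
nullable = s.astype("Int32")

print(as_int.tolist(), nullable.dtype)
```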
In [112]:
def range_indicator(input_value):
    output_value = False
    if(input_value > 50):
        output_value = True
    return output_value
In [113]:
range_indicator(278)
Out[113]:
True
In [114]:
temp['CD']
Out[114]:
3849    260.700012
3826    276.500000
1549     19.658300
2520     77.623497
574            NaN
675      19.625200
4872    291.049988
4201    218.449997
88       29.390699
5089    185.750000
Name: CD, dtype: float64
In [115]:
temp['Range_Indicator'] = temp['CD'].apply(lambda x: range_indicator(x))
temp
Out[115]:
A CD EF i 3939 Range_Indicator
3849 2011-01-21 260.700012 252.199997 124.483513 29086400.0 True
3826 2010-12-21 276.500000 270.924988 131.522842 19613340.0 True
1549 2001-12-07 19.658300 18.780800 1.340288 3678530.0 False
2520 2005-09-05 77.623497 76.109100 16.440184 14539912.0 True
574 1998-03-13 NaN NaN NaN NaN False
675 1998-08-03 19.625200 18.611000 0.752655 35140718.0 False
4872 2015-03-13 291.049988 280.000000 271.203003 13414248.0 True
4201 2012-06-22 218.449997 212.850006 144.681274 35627920.0 True
88 1996-05-02 29.390699 27.574400 0.660055 90601602.0 False
5089 2016-01-29 185.750000 178.100006 175.976257 24259147.0 True
In [116]:
lambda x: range_indicator(x)
Out[116]:
<function __main__.<lambda>(x)>
In [117]:
for each_item in [12,4,5,321]:
    print(range_indicator(each_item))
False
False
False
True
In [118]:
list(map(range_indicator,[12,4,5,321]))
Out[118]:
[False, False, False, True]
In [119]:
list(map(range_indicator,temp['CD']))
Out[119]:
[True, True, False, True, False, False, True, True, False, True]
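`apply` and `map` call the Python function once per element. For a simple threshold like `range_indicator`, a vectorized comparison gives the same boolean Series and runs at C speed. A sketch on a toy Series:

```python
import pandas as pd

cd = pd.Series([260.7, 19.6, 291.0])  # toy values standing in for temp['CD']

# Element-wise apply, as in the cells above
via_apply = cd.apply(lambda x: x > 50)

# Vectorized comparison: same result, no per-element Python call
via_vector = cd > 50

print(via_apply.equals(via_vector))
```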

Pandas Visualization

In [120]:
import matplotlib.pyplot as plt
In [121]:
df.head()
Out[121]:
NewDate Open High Low Close Adj Close Volume
0 1996-01-01 18.691200 18.978901 18.540199 18.823200 0.442401 43733533.0
1 1996-01-02 18.893999 18.964800 17.738199 18.224100 0.428321 56167281.0
2 1996-01-03 18.327900 18.568501 17.643801 17.738199 0.416900 68296319.0
3 1996-01-04 17.502300 17.832500 17.224001 17.676901 0.415460 86073880.0
4 1996-01-05 17.738199 17.785400 17.459900 17.577801 0.413131 76613040.0
In [122]:
df.tail()
Out[122]:
NewDate Open High Low Close Adj Close Volume
5520 2017-10-31 313.000000 313.850006 304.299988 305.799988 305.799988 16455750.0
5521 2017-11-01 309.399994 321.450012 309.399994 319.850006 319.850006 34907283.0
5522 2017-11-02 320.000000 323.149994 313.549988 314.350006 314.350006 23110527.0
5523 2017-11-03 315.450012 327.500000 312.600006 325.000000 325.000000 26898400.0
5524 2017-11-06 323.600006 330.000000 321.450012 329.000000 329.000000 17143303.0
In [123]:
df.plot(x='NewDate',y = 'Open')
Out[123]:
<matplotlib.axes._subplots.AxesSubplot at 0x2c1f06bd2b0>
In [124]:
df.plot(x='Open',y = 'Close',kind='scatter')
Out[124]:
<matplotlib.axes._subplots.AxesSubplot at 0x2c1f074a6a0>
In [125]:
df['Open'].hist()
Out[125]:
<matplotlib.axes._subplots.AxesSubplot at 0x2c1f00c4160>
In [126]:
df.shape
Out[126]:
(5525, 7)
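`DataFrame.plot` is a thin wrapper over matplotlib and returns an `Axes` object. Inside a notebook the chart renders automatically; in a plain script you need `plt.show()` or `savefig`. A sketch on a toy frame, using the non-interactive `Agg` backend so it also runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; safe on machines without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy price frame standing in for the SBIN data above
toy = pd.DataFrame({"Open": [18.7, 18.9, 18.3], "Close": [18.8, 18.2, 17.7]})

ax = toy.plot(y="Open")          # line plot; returns a matplotlib Axes
fig = ax.get_figure()
fig.savefig("open_prices.png")   # or plt.show() in an interactive script
plt.close(fig)
```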

Statistics

In [127]:
df.head()
Out[127]:
NewDate Open High Low Close Adj Close Volume
0 1996-01-01 18.691200 18.978901 18.540199 18.823200 0.442401 43733533.0
1 1996-01-02 18.893999 18.964800 17.738199 18.224100 0.428321 56167281.0
2 1996-01-03 18.327900 18.568501 17.643801 17.738199 0.416900 68296319.0
3 1996-01-04 17.502300 17.832500 17.224001 17.676901 0.415460 86073880.0
4 1996-01-05 17.738199 17.785400 17.459900 17.577801 0.413131 76613040.0
In [130]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5525 entries, 0 to 5524
Data columns (total 7 columns):
NewDate      5525 non-null object
Open         5401 non-null float64
High         5401 non-null float64
Low          5401 non-null float64
Close        5401 non-null float64
Adj Close    5401 non-null float64
Volume       5401 non-null float64
dtypes: float64(6), object(1)
memory usage: 302.2+ KB
In [131]:
df.describe()
Out[131]:
Open High Low Close Adj Close Volume
count 5401.000000 5401.000000 5401.000000 5401.000000 5401.000000 5.401000e+03
mean 123.851330 125.677336 121.835167 123.665226 75.381429 3.252849e+07
std 98.179606 99.383543 96.796041 98.012898 91.862829 3.718180e+07
min 13.478200 13.959400 13.214000 13.346100 0.401045 0.000000e+00
25% 24.946699 25.409000 24.328699 24.809900 1.202995 1.302272e+07
50% 94.918198 96.616501 93.507599 94.814400 25.289371 2.101967e+07
75% 217.725006 220.850006 214.000000 217.227005 134.646362 3.698718e+07
max 345.899994 351.500000 341.325012 343.345001 329.000000 4.469483e+08
In [132]:
df["Open"].hist()
Out[132]:
<matplotlib.axes._subplots.AxesSubplot at 0x2c1f058c588>
In [194]:
df.corr()
Out[194]:
Open High Low Close Adj Close Volume
Open 1.000000 0.999716 0.999686 0.999410 0.903800 -0.332790
High 0.999716 1.000000 0.999577 0.999734 0.903005 -0.330531
Low 0.999686 0.999577 1.000000 0.999733 0.905389 -0.334000
Close 0.999410 0.999734 0.999733 1.000000 0.904266 -0.331829
Adj Close 0.903800 0.903005 0.905389 0.904266 1.000000 -0.282631
Volume -0.332790 -0.330531 -0.334000 -0.331829 -0.282631 1.000000
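`corr()` computes pairwise Pearson correlations over the numeric columns; pulling out a single column and sorting it is a quick way to rank which columns move together. A sketch on a toy frame (the values loosely mimic the price data above):

```python
import pandas as pd

toy = pd.DataFrame({
    "Open":   [18.7, 18.9, 18.3, 17.5],
    "Close":  [18.8, 18.2, 17.7, 17.7],
    "Volume": [43.7, 56.2, 68.3, 86.1],
})

corr = toy.corr()

# Correlation of every column with Close, strongest first
ranked = corr["Close"].sort_values(ascending=False)
print(ranked)
```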
In [200]:
df.head()
Out[200]:
NewDate Open High Low Close Adj Close Volume
0 1996-01-01 18.691200 18.978901 18.540199 18.823200 0.442401 43733533.0
1 1996-01-02 18.893999 18.964800 17.738199 18.224100 0.428321 56167281.0
2 1996-01-03 18.327900 18.568501 17.643801 17.738199 0.416900 68296319.0
3 1996-01-04 17.502300 17.832500 17.224001 17.676901 0.415460 86073880.0
4 1996-01-05 17.738199 17.785400 17.459900 17.577801 0.413131 76613040.0
In [204]:
df.head().sum(axis=0).sum()
Out[204]:
330884417.046214
In [133]:
a = [1,2,3,4,5,10,20,50]
In [134]:
sum(a)/len(a)
Out[134]:
11.875
In [136]:
#Standard Deviation
import numpy as np
In [137]:
a  = np.asarray(a)
In [138]:
a
Out[138]:
array([ 1,  2,  3,  4,  5, 10, 20, 50])
In [139]:
np.mean(a)
Out[139]:
11.875
In [140]:
np.std(a)
Out[140]:
15.519644809079878
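One caveat worth knowing here: `np.std` defaults to the population formula (`ddof=0`), while pandas' `Series.std` defaults to the sample formula (`ddof=1`), so the two disagree on the same data until you make `ddof` explicit:

```python
import numpy as np
import pandas as pd

a = np.asarray([1, 2, 3, 4, 5, 10, 20, 50])
s = pd.Series(a)

pop = np.std(a)      # population std, ddof=0 (the 15.5196... above)
sample = s.std()     # sample std, ddof=1, slightly larger

# They agree once ddof is made explicit
print(np.isclose(np.std(a, ddof=1), sample))
```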
In [141]:
df.head()
Out[141]:
NewDate Open High Low Close Adj Close Volume
0 1996-01-01 18.691200 18.978901 18.540199 18.823200 0.442401 43733533.0
1 1996-01-02 18.893999 18.964800 17.738199 18.224100 0.428321 56167281.0
2 1996-01-03 18.327900 18.568501 17.643801 17.738199 0.416900 68296319.0
3 1996-01-04 17.502300 17.832500 17.224001 17.676901 0.415460 86073880.0
4 1996-01-05 17.738199 17.785400 17.459900 17.577801 0.413131 76613040.0
In [142]:
df1
Out[142]:
key B
0 K0 B0
1 K1 B1
3 K3 B3
In [146]:
df.apply(lambda x: x['Open'] + x['High'], axis=1)
Out[146]:
0        37.670101
1        37.858799
2        36.896401
3        35.334800
4        35.523599
5        35.122502
6        35.570700
7        35.122601
8        34.910301
9        36.113200
10       36.221700
11       37.434198
12       35.830200
13       34.943300
14       34.962101
15       36.051899
16       37.160600
17       35.759401
18       36.391600
19             NaN
20       37.122900
21       38.759800
22       37.906000
23       38.047399
24       40.807298
25       43.968100
26       45.619200
27       43.958601
28       44.892799
29       47.223200
           ...    
5495    541.149994
5496    534.649994
5497    523.299988
5498    518.149994
5499    519.799988
5500    503.800003
5501    510.199997
5502    510.250000
5503    506.299988
5504    507.800003
5505    510.049988
5506    515.399994
5507    515.050018
5508    515.600006
5509    504.949997
5510    506.450012
5511    509.300003
5512    506.750000
5513    493.800003
5514    490.800003
5515    501.300003
5516    607.899994
5517    681.299988
5518    643.850006
5519    634.350006
5520    626.850006
5521    630.850006
5522    643.149994
5523    642.950012
5524    653.600006
Length: 5525, dtype: float64
In [149]:
df[["Open","High","Low","Close"]].apply(lambda x: 2*x)
Out[149]:
Open High Low Close
0 37.382400 37.957802 37.080398 37.646400
1 37.787998 37.929600 35.476398 36.448200
2 36.655800 37.137002 35.287602 35.476398
3 35.004600 35.665000 34.448002 35.353802
4 35.476398 35.570800 34.919800 35.155602
5 34.957402 35.287602 33.844200 34.127202
6 33.778000 37.363400 33.410000 35.995400
7 34.816002 35.429200 34.108200 34.344200
8 33.966800 35.853802 33.966800 35.655602
9 35.759400 36.467000 35.146198 35.674598
10 36.042400 36.401000 35.570800 35.787800
11 35.806598 39.061798 35.570800 36.542598
12 35.712200 35.948200 34.910198 35.042400
13 34.636600 35.250000 34.249802 34.806400
14 34.580002 35.344200 34.580002 35.221600
15 35.476398 36.627400 35.401000 36.448200
16 36.778400 37.542800 36.325600 36.835000
17 35.665000 35.853802 34.702800 35.589600
18 36.231202 36.551998 35.759400 35.929200
19 NaN NaN NaN NaN
20 36.325600 37.920200 36.146198 37.571000
21 38.118198 39.401402 37.174800 38.410800
22 37.552200 38.259800 36.429402 37.099202
23 36.844398 39.250400 36.844398 38.939000
24 39.722198 41.892398 39.439202 41.628200
25 42.175400 45.760800 42.175400 44.251202
26 44.911598 46.326802 43.590600 44.628602
27 43.194400 44.722802 41.515000 43.845402
28 43.977600 45.807998 43.156600 45.090802
29 45.524800 48.921600 44.703998 48.279998
... ... ... ... ...
5495 539.500000 542.799988 534.700012 537.000000
5496 534.500000 534.799988 522.200012 523.700012
5497 523.000000 523.599976 508.799988 517.500000
5498 517.000000 519.299988 512.000000 516.500000
5499 519.799988 519.799988 499.000000 500.799988
5500 500.500000 507.100006 499.000000 505.100006
5501 509.399994 511.000000 506.100006 507.700012
5502 507.700012 512.799988 501.200012 502.600006
5503 503.299988 509.299988 502.700012 506.399994
5504 507.399994 508.200012 498.399994 503.200012
5505 504.299988 515.799988 504.200012 513.500000
5506 513.599976 517.200012 508.500000 513.700012
5507 513.700012 516.400024 511.399994 513.900024
5508 514.299988 516.900024 500.899994 503.500000
5509 504.000000 505.899994 497.000000 502.399994
5510 502.200012 510.700012 500.399994 504.200012
5511 509.200012 509.399994 500.200012 504.100006
5512 504.100006 509.399994 501.299988 502.299988
5513 492.000000 495.600006 485.100006 487.500000
5514 487.600006 494.000000 482.299988 491.899994
5515 492.000000 510.600006 491.899994 508.899994
5516 559.700012 656.099976 559.700012 649.799988
5517 660.000000 702.599976 635.000000 641.000000
5518 640.500000 647.200012 620.200012 622.099976
5519 624.000000 644.700012 621.200012 624.000000
5520 626.000000 627.700012 608.599976 611.599976
5521 618.799988 642.900024 618.799988 639.700012
5522 640.000000 646.299988 627.099976 628.700012
5523 630.900024 655.000000 625.200012 650.000000
5524 647.200012 660.000000 642.900024 658.000000

5525 rows × 4 columns

In [154]:
df["NewDate"] = pd.to_datetime(df["NewDate"],format='%Y-%m-%d')
In [155]:
df["NewDate"].dtype
Out[155]:
dtype('<M8[ns]')
In [159]:
df['NewDate'].iloc[0].day
Out[159]:
1
In [161]:
df['NewDate'].iloc[0].date()
Out[161]:
datetime.date(1996, 1, 1)
In [162]:
import datetime
In [166]:
datetime.timedelta(days=10,hours=10)
Out[166]:
datetime.timedelta(days=10, seconds=36000)
In [168]:
df['NewDate'].iloc[0] + datetime.timedelta(days=10,hours=10)
Out[168]:
Timestamp('1996-01-11 10:00:00')
In [169]:
a = df['NewDate'].iloc[0] + datetime.timedelta(days=10,hours=10)
In [170]:
a
Out[170]:
Timestamp('1996-01-11 10:00:00')
In [171]:
a.strptime()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-171-b950a9e83a09> in <module>
----> 1 a.strptime()

TypeError: strptime() takes exactly 2 arguments (0 given)
In [172]:
import datetime
d = datetime.datetime.strptime("01/27/2012", "%m/%d/%Y")
In [173]:
d
Out[173]:
datetime.datetime(2012, 1, 27, 0, 0)
In [175]:
b = datetime.datetime.strftime(d,"%Y-%m-%d")
b
Out[175]:
'2012-01-27'
In [176]:
type(b)
Out[176]:
str
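The `.iloc[0].day` pattern above works one element at a time. Once a column has the `datetime64` dtype, the `.dt` accessor applies component access and `strftime` formatting to the whole column at once:

```python
import pandas as pd

# Toy date column, parsed the same way as NewDate above
dates = pd.to_datetime(pd.Series(["1996-01-01", "1996-01-02"]), format="%Y-%m-%d")

days = dates.dt.day                      # vectorized component access
labels = dates.dt.strftime("%Y/%m/%d")   # vectorized formatting back to strings

print(days.tolist(), labels.tolist())
```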
In [178]:
import time
In [181]:
type(time.ctime())
Out[181]:
str

Timestamp Folder Creation

In [187]:
timestamp_now = datetime.datetime.now().strftime('%Y%m%d_%H%M%S')
In [188]:
import os
In [191]:
timestamp_now = datetime.datetime.now().strftime('%Y%m%d_%H%M%S')
main_folder_path = r'E:\openknowledgeshare.blogspot.com\Python\Outputs'
os.makedirs(os.path.join(main_folder_path,timestamp_now))
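One refinement: `os.makedirs` raises `FileExistsError` if the folder already exists, so `exist_ok=True` makes the call safe to repeat. A sketch using a temporary directory in place of the fixed Outputs path above:

```python
import datetime
import os
import tempfile

timestamp_now = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
main_folder_path = tempfile.mkdtemp()  # stand-in for the Outputs folder above

target = os.path.join(main_folder_path, timestamp_now)
os.makedirs(target, exist_ok=True)  # idempotent: calling it again is harmless
os.makedirs(target, exist_ok=True)

print(os.path.isdir(target))
```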

Time Taken

In [193]:
start_time = time.clock()

for i in range(0,10):
    i
    
end_time = time.clock()
print("Time taken for this code is {} seconds".format(end_time - start_time))
Time taken for this code is 0.00012940000033268007 seconds
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: DeprecationWarning: time.clock has been deprecated in Python 3.3 and will be removed from Python 3.8: use time.perf_counter or time.process_time instead
  """Entry point for launching an IPython kernel.
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:7: DeprecationWarning: time.clock has been deprecated in Python 3.3 and will be removed from Python 3.8: use time.perf_counter or time.process_time instead
  import sys
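As the deprecation warnings note, `time.clock` was removed in Python 3.8; `time.perf_counter` is the drop-in replacement for this kind of timing:

```python
import time

start_time = time.perf_counter()

for i in range(10):
    pass  # the work being timed

elapsed = time.perf_counter() - start_time
print("Time taken for this code is {} seconds".format(elapsed))
```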

Stacking

In [205]:
df1
Out[205]:
key B
0 K0 B0
1 K1 B1
3 K3 B3
In [206]:
df2
Out[206]:
key C D
0 K0 C0 D0
1 K1 C1 D1
2 K4 C4 D4
3 K3 C3 D3
In [207]:
df2.stack()
Out[207]:
0  key    K0
   C      C0
   D      D0
1  key    K1
   C      C1
   D      D1
2  key    K4
   C      C4
   D      D4
3  key    K3
   C      C3
   D      D3
dtype: object
In [208]:
df2.unstack()
Out[208]:
key  0    K0
     1    K1
     2    K4
     3    K3
C    0    C0
     1    C1
     2    C4
     3    C3
D    0    D0
     1    D1
     2    D4
     3    D3
dtype: object
In [211]:
import pandas as pd
import numpy as np

header = pd.MultiIndex.from_product([['Semester1','Semester2'],['Maths','Science']])

d=([[12,45,67,56],[78,89,45,67],[45,67,89,90],[67,44,56,55]])

df = pd.DataFrame(d,
                  columns=['Alisa','Bobby','Cathrine','Jack'],
                  index=header)
In [212]:
df
Out[212]:
Alisa Bobby Cathrine Jack
Semester1 Maths 12 45 67 56
Science 78 89 45 67
Semester2 Maths 45 67 89 90
Science 67 44 56 55
In [214]:
df.xs("Semester1")
Out[214]:
Alisa Bobby Cathrine Jack
Maths 12 45 67 56
Science 78 89 45 67
In [215]:
df.xs("Semester2")
Out[215]:
Alisa Bobby Cathrine Jack
Maths 45 67 89 90
Science 67 44 56 55
In [219]:
df.xs("Maths",level=1)
Out[219]:
Alisa Bobby Cathrine Jack
Semester1 12 45 67 56
Semester2 45 67 89 90
In [220]:
df
Out[220]:
Alisa Bobby Cathrine Jack
Semester1 Maths 12 45 67 56
Science 78 89 45 67
Semester2 Maths 45 67 89 90
Science 67 44 56 55
In [224]:
df.unstack()
Out[224]:
Alisa Bobby Cathrine Jack
Maths Science Maths Science Maths Science Maths Science
Semester1 12 78 45 89 67 45 56 67
Semester2 45 67 67 44 89 56 90 55
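`stack` and `unstack` are inverses: `stack` moves the column labels into the innermost index level, and `unstack` moves an index level back into the columns. A round-trip sketch on a smaller version of the semester frame (rebuilt here so the block is self-contained):

```python
import pandas as pd

header = pd.MultiIndex.from_product([["Semester1", "Semester2"], ["Maths", "Science"]])
marks = pd.DataFrame(
    [[12, 45], [78, 89], [45, 67], [67, 44]],
    columns=["Alisa", "Bobby"],
    index=header,
)

stacked = marks.stack()         # student names become the innermost index level
round_trip = stacked.unstack()  # unstack restores the original frame

print(round_trip.equals(marks))
```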