Hello all,

This is the fourth article in the series Python for Data Science. If you are new to this series, we would recommend you to read our previous articles

To install pandas in your python environment, please use the below command

pip install pandas

Please refer the videos below for detailed explanation on Pandas

After you have installed Pandas, please refer the following notebook to understand on how to use Pandas functionalities.

In [1]:

import pandas as pd
import os

Creating a DataFrame¶

In [2]:

data1 = [1,2,3]
data2 = [4,5,6]
data3 = [7,8,9]
pd.DataFrame([data1,data2,data3],columns = ['a','b','c'],index = ['row1','row2','row3'])

Out[2]:

	a	b	c
row1	1	2	3
row2	4	5	6
row3	7	8	9

In [3]:

pd.DataFrame([data1,data2,data3])

Out[3]:

	0	1	2
0	1	2	3
1	4	5	6
2	7	8	9

Reading CSV thru Pandas¶

In [4]:

csv_file_path = r"E:\openknowledgeshare.blogspot.com\Python\Data\SBIN.NS.csv"

In [5]:

df = pd.read_csv(csv_file_path)
df

Out[5]:

	Date	Open	High	Low	Close	Adj Close	Volume
0	1996-01-01	18.691200	18.978901	18.540199	18.823200	0.442401	43733533.0
1	1996-01-02	18.893999	18.964800	17.738199	18.224100	0.428321	56167281.0
2	1996-01-03	18.327900	18.568501	17.643801	17.738199	0.416900	68296319.0
3	1996-01-04	17.502300	17.832500	17.224001	17.676901	0.415460	86073880.0
4	1996-01-05	17.738199	17.785400	17.459900	17.577801	0.413131	76613040.0
5	1996-01-08	17.478701	17.643801	16.922100	17.063601	0.401045	55395173.0
6	1996-01-09	16.889000	18.681700	16.705000	17.997700	0.422999	82057541.0
7	1996-01-10	17.408001	17.714600	17.054100	17.172100	0.403595	54360749.0
8	1996-01-11	16.983400	17.926901	16.983400	17.827801	0.419006	65973105.0
9	1996-01-12	17.879700	18.233500	17.573099	17.837299	0.419230	102152486.0
10	1996-01-15	18.021200	18.200500	17.785400	17.893900	0.420560	68169665.0
11	1996-01-16	17.903299	19.530899	17.785400	18.271299	0.429430	92497162.0
12	1996-01-17	17.856100	17.974100	17.455099	17.521200	0.411800	65477621.0
13	1996-01-18	17.318300	17.625000	17.124901	17.403200	0.409027	67620658.0
14	1996-01-19	17.290001	17.672100	17.290001	17.610800	0.413906	47652365.0
15	1996-01-22	17.738199	18.313700	17.700500	18.224100	0.428321	81512773.0
16	1996-01-23	18.389200	18.771400	18.162800	18.417500	0.432866	130297069.0
17	1996-01-24	17.832500	17.926901	17.351400	17.794800	0.418231	54484753.0
18	1996-01-25	18.115601	18.275999	17.879700	17.964600	0.422222	67701207.0
19	1996-01-26	NaN	NaN	NaN	NaN	NaN	NaN
20	1996-01-29	18.162800	18.960100	18.073099	18.785500	0.441515	75689902.0
21	1996-01-30	19.059099	19.700701	18.587400	19.205400	0.451384	89163372.0
22	1996-01-31	18.776100	19.129900	18.214701	18.549601	0.435971	56831283.0
23	1996-02-01	18.422199	19.625200	18.422199	19.469500	0.457591	96599350.0
24	1996-02-02	19.861099	20.946199	19.719601	20.814100	0.489193	106010377.0
25	1996-02-05	21.087700	22.880400	21.087700	22.125601	0.520017	160942921.0
26	1996-02-06	22.455799	23.163401	21.795300	22.314301	0.524452	147345447.0
27	1996-02-07	21.597200	22.361401	20.757500	21.922701	0.515249	67480756.0
28	1996-02-08	21.988800	22.903999	21.578300	22.545401	0.529884	52796396.0
29	1996-02-09	22.762400	24.460800	22.351999	24.139999	0.567362	74580228.0
...	...	...	...	...	...	...	...
5495	2017-09-21	269.750000	271.399994	267.350006	268.500000	268.500000	8509419.0
5496	2017-09-22	267.250000	267.399994	261.100006	261.850006	261.850006	12860892.0
5497	2017-09-25	261.500000	261.799988	254.399994	258.750000	258.750000	13937050.0
5498	2017-09-26	258.500000	259.649994	256.000000	258.250000	258.250000	11728345.0
5499	2017-09-27	259.899994	259.899994	249.500000	250.399994	250.399994	17080769.0
5500	2017-09-28	250.250000	253.550003	249.500000	252.550003	252.550003	17776419.0
5501	2017-09-29	254.699997	255.500000	253.050003	253.850006	253.850006	10073641.0
5502	2017-10-03	253.850006	256.399994	250.600006	251.300003	251.300003	10547836.0
5503	2017-10-04	251.649994	254.649994	251.350006	253.199997	253.199997	14063896.0
5504	2017-10-05	253.699997	254.100006	249.199997	251.600006	251.600006	13476692.0
5505	2017-10-06	252.149994	257.899994	252.100006	256.750000	256.750000	11932139.0
5506	2017-10-09	256.799988	258.600006	254.250000	256.850006	256.850006	8009211.0
5507	2017-10-10	256.850006	258.200012	255.699997	256.950012	256.950012	9742878.0
5508	2017-10-11	257.149994	258.450012	250.449997	251.750000	251.750000	14192548.0
5509	2017-10-12	252.000000	252.949997	248.500000	251.199997	251.199997	13404759.0
5510	2017-10-13	251.100006	255.350006	250.199997	252.100006	252.100006	15697555.0
5511	2017-10-16	254.600006	254.699997	250.100006	252.050003	252.050003	12171830.0
5512	2017-10-17	252.050003	254.699997	250.649994	251.149994	251.149994	13505216.0
5513	2017-10-18	246.000000	247.800003	242.550003	243.750000	243.750000	20677891.0
5514	2017-10-23	243.800003	247.000000	241.149994	245.949997	245.949997	12360204.0
5515	2017-10-24	246.000000	255.300003	245.949997	254.449997	254.449997	17197805.0
5516	2017-10-25	279.850006	328.049988	279.850006	324.899994	324.899994	262677081.0
5517	2017-10-26	330.000000	351.299988	317.500000	320.500000	320.500000	112804243.0
5518	2017-10-27	320.250000	323.600006	310.100006	311.049988	311.049988	27580765.0
5519	2017-10-30	312.000000	322.350006	310.600006	312.000000	312.000000	28530254.0
5520	2017-10-31	313.000000	313.850006	304.299988	305.799988	305.799988	16455750.0
5521	2017-11-01	309.399994	321.450012	309.399994	319.850006	319.850006	34907283.0
5522	2017-11-02	320.000000	323.149994	313.549988	314.350006	314.350006	23110527.0
5523	2017-11-03	315.450012	327.500000	312.600006	325.000000	325.000000	26898400.0
5524	2017-11-06	323.600006	330.000000	321.450012	329.000000	329.000000	17143303.0

5525 rows × 7 columns

In [6]:

df.shape

Out[6]:

(5525, 7)

In [7]:

df.columns

Out[7]:

Index(['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume'], dtype='object')

In [8]:

df.rename(columns={'Date':'NewDate'},inplace=True)

In [9]:

df

Out[9]:

	NewDate	Open	High	Low	Close	Adj Close	Volume
0	1996-01-01	18.691200	18.978901	18.540199	18.823200	0.442401	43733533.0
1	1996-01-02	18.893999	18.964800	17.738199	18.224100	0.428321	56167281.0
2	1996-01-03	18.327900	18.568501	17.643801	17.738199	0.416900	68296319.0
3	1996-01-04	17.502300	17.832500	17.224001	17.676901	0.415460	86073880.0
4	1996-01-05	17.738199	17.785400	17.459900	17.577801	0.413131	76613040.0
5	1996-01-08	17.478701	17.643801	16.922100	17.063601	0.401045	55395173.0
6	1996-01-09	16.889000	18.681700	16.705000	17.997700	0.422999	82057541.0
7	1996-01-10	17.408001	17.714600	17.054100	17.172100	0.403595	54360749.0
8	1996-01-11	16.983400	17.926901	16.983400	17.827801	0.419006	65973105.0
9	1996-01-12	17.879700	18.233500	17.573099	17.837299	0.419230	102152486.0
10	1996-01-15	18.021200	18.200500	17.785400	17.893900	0.420560	68169665.0
11	1996-01-16	17.903299	19.530899	17.785400	18.271299	0.429430	92497162.0
12	1996-01-17	17.856100	17.974100	17.455099	17.521200	0.411800	65477621.0
13	1996-01-18	17.318300	17.625000	17.124901	17.403200	0.409027	67620658.0
14	1996-01-19	17.290001	17.672100	17.290001	17.610800	0.413906	47652365.0
15	1996-01-22	17.738199	18.313700	17.700500	18.224100	0.428321	81512773.0
16	1996-01-23	18.389200	18.771400	18.162800	18.417500	0.432866	130297069.0
17	1996-01-24	17.832500	17.926901	17.351400	17.794800	0.418231	54484753.0
18	1996-01-25	18.115601	18.275999	17.879700	17.964600	0.422222	67701207.0
19	1996-01-26	NaN	NaN	NaN	NaN	NaN	NaN
20	1996-01-29	18.162800	18.960100	18.073099	18.785500	0.441515	75689902.0
21	1996-01-30	19.059099	19.700701	18.587400	19.205400	0.451384	89163372.0
22	1996-01-31	18.776100	19.129900	18.214701	18.549601	0.435971	56831283.0
23	1996-02-01	18.422199	19.625200	18.422199	19.469500	0.457591	96599350.0
24	1996-02-02	19.861099	20.946199	19.719601	20.814100	0.489193	106010377.0
25	1996-02-05	21.087700	22.880400	21.087700	22.125601	0.520017	160942921.0
26	1996-02-06	22.455799	23.163401	21.795300	22.314301	0.524452	147345447.0
27	1996-02-07	21.597200	22.361401	20.757500	21.922701	0.515249	67480756.0
28	1996-02-08	21.988800	22.903999	21.578300	22.545401	0.529884	52796396.0
29	1996-02-09	22.762400	24.460800	22.351999	24.139999	0.567362	74580228.0
...	...	...	...	...	...	...	...
5495	2017-09-21	269.750000	271.399994	267.350006	268.500000	268.500000	8509419.0
5496	2017-09-22	267.250000	267.399994	261.100006	261.850006	261.850006	12860892.0
5497	2017-09-25	261.500000	261.799988	254.399994	258.750000	258.750000	13937050.0
5498	2017-09-26	258.500000	259.649994	256.000000	258.250000	258.250000	11728345.0
5499	2017-09-27	259.899994	259.899994	249.500000	250.399994	250.399994	17080769.0
5500	2017-09-28	250.250000	253.550003	249.500000	252.550003	252.550003	17776419.0
5501	2017-09-29	254.699997	255.500000	253.050003	253.850006	253.850006	10073641.0
5502	2017-10-03	253.850006	256.399994	250.600006	251.300003	251.300003	10547836.0
5503	2017-10-04	251.649994	254.649994	251.350006	253.199997	253.199997	14063896.0
5504	2017-10-05	253.699997	254.100006	249.199997	251.600006	251.600006	13476692.0
5505	2017-10-06	252.149994	257.899994	252.100006	256.750000	256.750000	11932139.0
5506	2017-10-09	256.799988	258.600006	254.250000	256.850006	256.850006	8009211.0
5507	2017-10-10	256.850006	258.200012	255.699997	256.950012	256.950012	9742878.0
5508	2017-10-11	257.149994	258.450012	250.449997	251.750000	251.750000	14192548.0
5509	2017-10-12	252.000000	252.949997	248.500000	251.199997	251.199997	13404759.0
5510	2017-10-13	251.100006	255.350006	250.199997	252.100006	252.100006	15697555.0
5511	2017-10-16	254.600006	254.699997	250.100006	252.050003	252.050003	12171830.0
5512	2017-10-17	252.050003	254.699997	250.649994	251.149994	251.149994	13505216.0
5513	2017-10-18	246.000000	247.800003	242.550003	243.750000	243.750000	20677891.0
5514	2017-10-23	243.800003	247.000000	241.149994	245.949997	245.949997	12360204.0
5515	2017-10-24	246.000000	255.300003	245.949997	254.449997	254.449997	17197805.0
5516	2017-10-25	279.850006	328.049988	279.850006	324.899994	324.899994	262677081.0
5517	2017-10-26	330.000000	351.299988	317.500000	320.500000	320.500000	112804243.0
5518	2017-10-27	320.250000	323.600006	310.100006	311.049988	311.049988	27580765.0
5519	2017-10-30	312.000000	322.350006	310.600006	312.000000	312.000000	28530254.0
5520	2017-10-31	313.000000	313.850006	304.299988	305.799988	305.799988	16455750.0
5521	2017-11-01	309.399994	321.450012	309.399994	319.850006	319.850006	34907283.0
5522	2017-11-02	320.000000	323.149994	313.549988	314.350006	314.350006	23110527.0
5523	2017-11-03	315.450012	327.500000	312.600006	325.000000	325.000000	26898400.0
5524	2017-11-06	323.600006	330.000000	321.450012	329.000000	329.000000	17143303.0

5525 rows × 7 columns

In [10]:

df.head()

Out[10]:

	NewDate	Open	High	Low	Close	Adj Close	Volume
0	1996-01-01	18.691200	18.978901	18.540199	18.823200	0.442401	43733533.0
1	1996-01-02	18.893999	18.964800	17.738199	18.224100	0.428321	56167281.0
2	1996-01-03	18.327900	18.568501	17.643801	17.738199	0.416900	68296319.0
3	1996-01-04	17.502300	17.832500	17.224001	17.676901	0.415460	86073880.0
4	1996-01-05	17.738199	17.785400	17.459900	17.577801	0.413131	76613040.0

In [11]:

df.head(10)

Out[11]:

	NewDate	Open	High	Low	Close	Adj Close	Volume
0	1996-01-01	18.691200	18.978901	18.540199	18.823200	0.442401	43733533.0
1	1996-01-02	18.893999	18.964800	17.738199	18.224100	0.428321	56167281.0
2	1996-01-03	18.327900	18.568501	17.643801	17.738199	0.416900	68296319.0
3	1996-01-04	17.502300	17.832500	17.224001	17.676901	0.415460	86073880.0
4	1996-01-05	17.738199	17.785400	17.459900	17.577801	0.413131	76613040.0
5	1996-01-08	17.478701	17.643801	16.922100	17.063601	0.401045	55395173.0
6	1996-01-09	16.889000	18.681700	16.705000	17.997700	0.422999	82057541.0
7	1996-01-10	17.408001	17.714600	17.054100	17.172100	0.403595	54360749.0
8	1996-01-11	16.983400	17.926901	16.983400	17.827801	0.419006	65973105.0
9	1996-01-12	17.879700	18.233500	17.573099	17.837299	0.419230	102152486.0

In [12]:

df.tail()

Out[12]:

	NewDate	Open	High	Low	Close	Adj Close	Volume
5520	2017-10-31	313.000000	313.850006	304.299988	305.799988	305.799988	16455750.0
5521	2017-11-01	309.399994	321.450012	309.399994	319.850006	319.850006	34907283.0
5522	2017-11-02	320.000000	323.149994	313.549988	314.350006	314.350006	23110527.0
5523	2017-11-03	315.450012	327.500000	312.600006	325.000000	325.000000	26898400.0
5524	2017-11-06	323.600006	330.000000	321.450012	329.000000	329.000000	17143303.0

In [13]:

df.describe()

Out[13]:

	Open	High	Low	Close	Adj Close	Volume
count	5401.000000	5401.000000	5401.000000	5401.000000	5401.000000	5.401000e+03
mean	123.851330	125.677336	121.835167	123.665226	75.381429	3.252849e+07
std	98.179606	99.383543	96.796041	98.012898	91.862829	3.718180e+07
min	13.478200	13.959400	13.214000	13.346100	0.401045	0.000000e+00
25%	24.946699	25.409000	24.328699	24.809900	1.202995	1.302272e+07
50%	94.918198	96.616501	93.507599	94.814400	25.289371	2.101967e+07
75%	217.725006	220.850006	214.000000	217.227005	134.646362	3.698718e+07
max	345.899994	351.500000	341.325012	343.345001	329.000000	4.469483e+08

In [14]:

df.head()

Out[14]:

	NewDate	Open	High	Low	Close	Adj Close	Volume
0	1996-01-01	18.691200	18.978901	18.540199	18.823200	0.442401	43733533.0
1	1996-01-02	18.893999	18.964800	17.738199	18.224100	0.428321	56167281.0
2	1996-01-03	18.327900	18.568501	17.643801	17.738199	0.416900	68296319.0
3	1996-01-04	17.502300	17.832500	17.224001	17.676901	0.415460	86073880.0
4	1996-01-05	17.738199	17.785400	17.459900	17.577801	0.413131	76613040.0

In [15]:

df.set_index('NewDate').head()

Out[15]:

	Open	High	Low	Close	Adj Close	Volume
NewDate
1996-01-01	18.691200	18.978901	18.540199	18.823200	0.442401	43733533.0
1996-01-02	18.893999	18.964800	17.738199	18.224100	0.428321	56167281.0
1996-01-03	18.327900	18.568501	17.643801	17.738199	0.416900	68296319.0
1996-01-04	17.502300	17.832500	17.224001	17.676901	0.415460	86073880.0
1996-01-05	17.738199	17.785400	17.459900	17.577801	0.413131	76613040.0

In [16]:

df["High"].head()

Out[16]:

0    18.978901
1    18.964800
2    18.568501
3    17.832500
4    17.785400
Name: High, dtype: float64

In [17]:

type(df)

Out[17]:

pandas.core.frame.DataFrame

In [18]:

type(df["High"])

Out[18]:

pandas.core.series.Series

In [19]:

df["High"].to_frame().head()

Out[19]:

	High
0	18.978901
1	18.964800
2	18.568501
3	17.832500
4	17.785400

In [20]:

df.iloc[10:20]

Out[20]:

	NewDate	Open	High	Low	Close	Adj Close	Volume
10	1996-01-15	18.021200	18.200500	17.785400	17.893900	0.420560	68169665.0
11	1996-01-16	17.903299	19.530899	17.785400	18.271299	0.429430	92497162.0
12	1996-01-17	17.856100	17.974100	17.455099	17.521200	0.411800	65477621.0
13	1996-01-18	17.318300	17.625000	17.124901	17.403200	0.409027	67620658.0
14	1996-01-19	17.290001	17.672100	17.290001	17.610800	0.413906	47652365.0
15	1996-01-22	17.738199	18.313700	17.700500	18.224100	0.428321	81512773.0
16	1996-01-23	18.389200	18.771400	18.162800	18.417500	0.432866	130297069.0
17	1996-01-24	17.832500	17.926901	17.351400	17.794800	0.418231	54484753.0
18	1996-01-25	18.115601	18.275999	17.879700	17.964600	0.422222	67701207.0
19	1996-01-26	NaN	NaN	NaN	NaN	NaN	NaN

In [21]:

df.iloc[10:20][["Close","High"]]

Out[21]:

	Close	High
10	17.893900	18.200500
11	18.271299	19.530899
12	17.521200	17.974100
13	17.403200	17.625000
14	17.610800	17.672100
15	18.224100	18.313700
16	18.417500	18.771400
17	17.794800	17.926901
18	17.964600	18.275999
19	NaN	NaN

In [22]:

df.iloc[[1,3,10]][["Close","High"]]

Out[22]:

	Close	High
1	18.224100	18.9648
3	17.676901	17.8325
10	17.893900	18.2005

In [23]:

a = df.head().values
a

Out[23]:

array([['1996-01-01', 18.6912, 18.978901, 18.540198999999998, 18.8232,
        0.44240100000000004, 43733533.0],
       ['1996-01-02', 18.893998999999997, 18.9648, 17.738198999999998,
        18.2241, 0.428321, 56167281.0],
       ['1996-01-03', 18.3279, 18.568501, 17.643801, 17.738198999999998,
        0.4169, 68296319.0],
       ['1996-01-04', 17.5023, 17.8325, 17.224001, 17.676901,
        0.41546000000000005, 86073880.0],
       ['1996-01-05', 17.738198999999998, 17.7854, 17.4599, 17.577801,
        0.413131, 76613040.0]], dtype=object)

In [24]:

a.shape

Out[24]:

(5, 7)

In [25]:

type(a)

Out[25]:

numpy.ndarray

In [26]:

df.head()

Out[26]:

	NewDate	Open	High	Low	Close	Adj Close	Volume
0	1996-01-01	18.691200	18.978901	18.540199	18.823200	0.442401	43733533.0
1	1996-01-02	18.893999	18.964800	17.738199	18.224100	0.428321	56167281.0
2	1996-01-03	18.327900	18.568501	17.643801	17.738199	0.416900	68296319.0
3	1996-01-04	17.502300	17.832500	17.224001	17.676901	0.415460	86073880.0
4	1996-01-05	17.738199	17.785400	17.459900	17.577801	0.413131	76613040.0

In [ ]:

Manipulating Cell Values of Dataframe¶

In [27]:

data1 = [1,2,3]
data2 = [4,5,6]
data3 = [7,8,9]
df1 = pd.DataFrame([data1,data2,data3],columns = ['a','b','c'])
df1

Out[27]:

	a	b	c
0	1	2	3
1	4	5	6
2	7	8	9

In [28]:

df1.iloc[1]['c']

Out[28]:

In [29]:

df1.iloc[1]['c'] = 10
df1

Out[29]:

	a	b	c
0	1	2	3
1	4	5	10
2	7	8	9

In [30]:

df1.iloc[2]

Out[30]:

a    7
b    8
c    9
Name: 2, dtype: int64

In [31]:

type(df1.iloc[2])

Out[31]:

pandas.core.series.Series

In [32]:

df1.iloc[2] = [10,20,30]

In [33]:

df1

Out[33]:

	a	b	c
0	1	2	3
1	4	5	10
2	10	20	30

In [34]:

df1.iloc[[1,2]] = [[100,200,300],[1030,2010,3000]]

In [35]:

df1

Out[35]:

	a	b	c
0	1	2	3
1	100	200	300
2	1030	2010	3000

In [36]:

df1.iloc[3] = [10,4,1]

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-36-4d6d8c5cbdab> in <module>
----> 1 df1.iloc[3] = [10,4,1]

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in __setitem__(self, key, value)
    187         else:
    188             key = com.apply_if_callable(key, self.obj)
--> 189         indexer = self._get_setitem_indexer(key)
    190         self._setitem_with_indexer(indexer, value)
    191 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in _get_setitem_indexer(self, key)
    173 
    174         try:
--> 175             return self._convert_to_indexer(key, is_setter=True)
    176         except TypeError as e:
    177 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in _convert_to_indexer(self, obj, axis, is_setter)
   2245 
   2246         try:
-> 2247             self._validate_key(obj, axis)
   2248             return obj
   2249         except ValueError:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_key(self, key, axis)
   2068             return
   2069         elif is_integer(key):
-> 2070             self._validate_integer(key, axis)
   2071         elif isinstance(key, tuple):
   2072             # a tuple should already have been caught by this point

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in _validate_integer(self, key, axis)
   2137         len_axis = len(self.obj._get_axis(axis))
   2138         if key >= len_axis or key < -len_axis:
-> 2139             raise IndexError("single positional indexer is out-of-bounds")
   2140 
   2141     def _getitem_tuple(self, tup):

IndexError: single positional indexer is out-of-bounds

Appending Row to DataFrame¶

In [37]:

a = [1,2,3]
a.append(5)
a

Out[37]:

[1, 2, 3, 5]

In [38]:

df1

Out[38]:

	a	b	c
0	1	2	3
1	100	200	300
2	1030	2010	3000

In [39]:

df1.append([[10,8,4]])

Out[39]:

	a	b	c	0	1	2
0	1.0	2.0	3.0	NaN	NaN	NaN
1	100.0	200.0	300.0	NaN	NaN	NaN
2	1030.0	2010.0	3000.0	NaN	NaN	NaN
0	NaN	NaN	NaN	10.0	8.0	4.0

In [40]:

df1

Out[40]:

	a	b	c
0	1	2	3
1	100	200	300
2	1030	2010	3000

Case 1: DataFrame without Column Names¶

In [41]:

data1 = [1,2,3]
data2 = [4,5,6]
data3 = [7,8,9]
df2 = pd.DataFrame([data1,data2,data3])
df2

Out[41]:

	0	1	2
0	1	2	3
1	4	5	6
2	7	8	9

In [42]:

df2 = df2.append([[10,20,30]])
df2

Out[42]:

	0	1	2
0	1	2	3
1	4	5	6
2	7	8	9
0	10	20	30

In [43]:

df2.reset_index()

Out[43]:

	index	0	1	2
0	0	1	2	3
1	1	4	5	6
2	2	7	8	9
3	0	10	20	30

In [44]:

df2.reset_index(drop=True,inplace = True)
df2

Out[44]:

	0	1	2
0	1	2	3
1	4	5	6
2	7	8	9
3	10	20	30

Case2 : DataFrame with Column Names¶

In [45]:

df1

Out[45]:

	a	b	c
0	1	2	3
1	100	200	300
2	1030	2010	3000

In [46]:

df1.append([[10,20,30]],ignore_index=True)

Out[46]:

	a	b	c	0	1	2
0	1.0	2.0	3.0	NaN	NaN	NaN
1	100.0	200.0	300.0	NaN	NaN	NaN
2	1030.0	2010.0	3000.0	NaN	NaN	NaN
3	NaN	NaN	NaN	10.0	20.0	30.0

In [47]:

df1.append({'a':10,'b':20,'c':30},ignore_index=True)

Out[47]:

	a	b	c
0	1	2	3
1	100	200	300
2	1030	2010	3000
3	10	20	30

In [48]:

columnnames_list = list(df1.columns)
columnnames_list

Out[48]:

['a', 'b', 'c']

In [49]:

data = [1000,2000,3000]

In [50]:

data_dict = {}
for i in range(0,len(columnnames_list)):
    each_column = columnnames_list[i]
    this_data_point = data[i]
    data_dict[each_column] = this_data_point
data_dict

Out[50]:

{'a': 1000, 'b': 2000, 'c': 3000}

In [51]:

df1 = df1.append(data_dict,ignore_index = True)
df1

Out[51]:

	a	b	c
0	1	2	3
1	100	200	300
2	1030	2010	3000
3	1000	2000	3000

In [52]:

df1

Out[52]:

	a	b	c
0	1	2	3
1	100	200	300
2	1030	2010	3000
3	1000	2000	3000

Sampling DataFrame¶

In [53]:

import numpy as np

In [54]:

a = [1,2,3,5,6,7]
np.random.choice(a,3)

Out[54]:

array([5, 7, 1])

In [55]:

df.sample(10)

Out[55]:

	NewDate	Open	High	Low	Close	Adj Close	Volume
395	1997-07-07	33.509102	33.768600	33.325199	33.376999	1.042259	89206826.0
339	1997-04-18	NaN	NaN	NaN	NaN	NaN	NaN
3782	2010-10-15	328.100006	328.100006	315.024994	316.565002	151.746780	14198900.0
3961	2011-07-05	243.899994	247.839996	240.949997	247.330002	136.146362	15940790.0
2912	2007-04-10	93.597298	94.205803	91.049797	93.521797	24.729269	11069930.0
426	1997-08-19	29.933201	30.268200	29.725599	29.886000	0.933246	47957605.0
4316	2012-12-11	232.970001	233.854996	228.270004	230.880005	154.805847	20557710.0
2018	2003-09-25	39.627899	40.363800	39.156101	39.344799	4.891809	64702354.0
3431	2009-05-22	168.699997	173.990005	167.600006	173.199997	60.047340	27543220.0
3930	2011-05-23	231.399994	231.399994	223.824997	224.975006	123.840775	24033650.0

Merge, Join and Concatenate DataFrames¶

In [56]:

df1

Out[56]:

	a	b	c
0	1	2	3
1	100	200	300
2	1030	2010	3000
3	1000	2000	3000

In [57]:

df2

Out[57]:

	0	1	2
0	1	2	3
1	4	5	6
2	7	8	9
3	10	20	30

In [58]:

df3 = pd.merge(df1,df2,left_index=True,right_index=True,how='outer')
df3

Out[58]:

	a	b	c	0	1	2
0	1	2	3	1	2	3
1	100	200	300	4	5	6
2	1030	2010	3000	7	8	9
3	1000	2000	3000	10	20	30

In [59]:

data1 = [1,2,3]
data2 = [4,5,6]
data3 = [7,8,9]
df2 = pd.DataFrame([data1,data2,data3])
df2

Out[59]:

	0	1	2
0	1	2	3
1	4	5	6
2	7	8	9

In [60]:

df3 = pd.merge(df1,df2,left_index=True,right_index=True,how='outer')
df3

Out[60]:

	a	b	c	0	1	2
0	1	2	3	1.0	2.0	3.0
1	100	200	300	4.0	5.0	6.0
2	1030	2010	3000	7.0	8.0	9.0
3	1000	2000	3000	NaN	NaN	NaN

In [61]:

df3 = pd.merge(df1,df2,left_index=True,right_index=True,how='inner')
df3

Out[61]:

	a	b	c	0	1	2
0	1	2	3	1	2	3
1	100	200	300	4	5	6
2	1030	2010	3000	7	8	9

In [62]:

df3 = pd.merge(df1,df2,left_index=True,right_index=True,how='left')
df3

Out[62]:

	a	b	c	0	1	2
0	1	2	3	1.0	2.0	3.0
1	100	200	300	4.0	5.0	6.0
2	1030	2010	3000	7.0	8.0	9.0
3	1000	2000	3000	NaN	NaN	NaN

In [63]:

df3 = pd.merge(df1,df2,left_index=True,right_index=True,how='right')
df3

Out[63]:

	a	b	c	0	1	2
0	1	2	3	1	2	3
1	100	200	300	4	5	6
2	1030	2010	3000	7	8	9

In [64]:

df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                    'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})


df2 = pd.DataFrame({'key': ['K0', 'K1', 'K4', 'K3'],
                    'C': ['C0', 'C1', 'C4', 'C3'],
                      'D': ['D0', 'D1', 'D4', 'D3']})

In [65]:

df1

Out[65]:

	key	A	B
0	K0	A0	B0
1	K1	A1	B1
2	K2	A2	B2
3	K3	A3	B3

In [66]:

df2

Out[66]:

	key	C	D
0	K0	C0	D0
1	K1	C1	D1
2	K4	C4	D4
3	K3	C3	D3

In [67]:

df3 = pd.merge(df1,df2,left_index=True,right_index=True,how='outer')
df3

Out[67]:

	key_x	A	B	key_y	C	D
0	K0	A0	B0	K0	C0	D0
1	K1	A1	B1	K1	C1	D1
2	K2	A2	B2	K4	C4	D4
3	K3	A3	B3	K3	C3	D3

In [68]:

df3 = pd.merge(df1,df2,on ='key',how='outer')
df3

Out[68]:

	key	A	B	C	D
0	K0	A0	B0	C0	D0
1	K1	A1	B1	C1	D1
2	K2	A2	B2	NaN	NaN
3	K3	A3	B3	C3	D3
4	K4	NaN	NaN	C4	D4

In [69]:

df3 = pd.merge(df1,df2,on ='key',how='inner')
df3

Out[69]:

	key	A	B	C	D
0	K0	A0	B0	C0	D0
1	K1	A1	B1	C1	D1
2	K3	A3	B3	C3	D3

In [70]:

df3 = pd.merge(df1,df2,on ='key',how='left')
df3

Out[70]:

	key	A	B	C	D
0	K0	A0	B0	C0	D0
1	K1	A1	B1	C1	D1
2	K2	A2	B2	NaN	NaN
3	K3	A3	B3	C3	D3

In [71]:

df3 = pd.merge(df1,df2,on ='key',how='right')
df3

Out[71]:

	key	A	B	C	D
0	K0	A0	B0	C0	D0
1	K1	A1	B1	C1	D1
2	K3	A3	B3	C3	D3
3	K4	NaN	NaN	C4	D4

Concat¶

In [72]:

s3 = pd.Series([0, 1, 2, 3], name='foo')
s4 = pd.Series([4, 5, 6, 7,12])
s5 = pd.Series([11, 10, 9, 8,13,14])

In [73]:

s3

Out[73]:

0    0
1    1
2    2
3    3
Name: foo, dtype: int64

In [74]:

s4

Out[74]:

0     4
1     5
2     6
3     7
4    12
dtype: int64

In [75]:

s5

Out[75]:

0    11
1    10
2     9
3     8
4    13
5    14
dtype: int64

In [76]:

pd.concat([s3, s4, s5], axis=1)

Out[76]:

	foo	0	1
0	0.0	4.0	11
1	1.0	5.0	10
2	2.0	6.0	9
3	3.0	7.0	8
4	NaN	12.0	13
5	NaN	NaN	14

In [77]:

pd.concat([s3, s4, s5], axis=0)

Out[77]:

0     0
1     1
2     2
3     3
0     4
1     5
2     6
3     7
4    12
0    11
1    10
2     9
3     8
4    13
5    14
dtype: int64

In [78]:

df1

Out[78]:

	key	A	B
0	K0	A0	B0
1	K1	A1	B1
2	K2	A2	B2
3	K3	A3	B3

In [79]:

df2

Out[79]:

	key	C	D
0	K0	C0	D0
1	K1	C1	D1
2	K4	C4	D4
3	K3	C3	D3

In [80]:

df1.append(df2)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py:6692: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

  sort=sort)

Out[80]:

	A	B	C	D	key
0	A0	B0	NaN	NaN	K0
1	A1	B1	NaN	NaN	K1
2	A2	B2	NaN	NaN	K2
3	A3	B3	NaN	NaN	K3
0	NaN	NaN	C0	D0	K0
1	NaN	NaN	C1	D1	K1
2	NaN	NaN	C4	D4	K4
3	NaN	NaN	C3	D3	K3

In [81]:

df3 = pd.concat([df1,df2], ignore_index=True, sort=False)
df3

Out[81]:

	key	A	B	C	D
0	K0	A0	B0	NaN	NaN
1	K1	A1	B1	NaN	NaN
2	K2	A2	B2	NaN	NaN
3	K3	A3	B3	NaN	NaN
4	K0	NaN	NaN	C0	D0
5	K1	NaN	NaN	C1	D1
6	K4	NaN	NaN	C4	D4
7	K3	NaN	NaN	C3	D3

Changing Column Order¶

In [82]:

df3[['A','B','C','D','key']]

Out[82]:

	A	B	C	D	key
0	A0	B0	NaN	NaN	K0
1	A1	B1	NaN	NaN	K1
2	A2	B2	NaN	NaN	K2
3	A3	B3	NaN	NaN	K3
4	NaN	NaN	C0	D0	K0
5	NaN	NaN	C1	D1	K1
6	NaN	NaN	C4	D4	K4
7	NaN	NaN	C3	D3	K3

Handling Missing Values¶

In [83]:

df3

Out[83]:

	key	A	B	C	D
0	K0	A0	B0	NaN	NaN
1	K1	A1	B1	NaN	NaN
2	K2	A2	B2	NaN	NaN
3	K3	A3	B3	NaN	NaN
4	K0	NaN	NaN	C0	D0
5	K1	NaN	NaN	C1	D1
6	K4	NaN	NaN	C4	D4
7	K3	NaN	NaN	C3	D3

In [84]:

df4 = df.head().copy()
df4

Out[84]:

	NewDate	Open	High	Low	Close	Adj Close	Volume
0	1996-01-01	18.691200	18.978901	18.540199	18.823200	0.442401	43733533.0
1	1996-01-02	18.893999	18.964800	17.738199	18.224100	0.428321	56167281.0
2	1996-01-03	18.327900	18.568501	17.643801	17.738199	0.416900	68296319.0
3	1996-01-04	17.502300	17.832500	17.224001	17.676901	0.415460	86073880.0
4	1996-01-05	17.738199	17.785400	17.459900	17.577801	0.413131	76613040.0

In [85]:

df3

Out[85]:

	key	A	B	C	D
0	K0	A0	B0	NaN	NaN
1	K1	A1	B1	NaN	NaN
2	K2	A2	B2	NaN	NaN
3	K3	A3	B3	NaN	NaN
4	K0	NaN	NaN	C0	D0
5	K1	NaN	NaN	C1	D1
6	K4	NaN	NaN	C4	D4
7	K3	NaN	NaN	C3	D3

In [86]:

df4

Out[86]:

	NewDate	Open	High	Low	Close	Adj Close	Volume
0	1996-01-01	18.691200	18.978901	18.540199	18.823200	0.442401	43733533.0
1	1996-01-02	18.893999	18.964800	17.738199	18.224100	0.428321	56167281.0
2	1996-01-03	18.327900	18.568501	17.643801	17.738199	0.416900	68296319.0
3	1996-01-04	17.502300	17.832500	17.224001	17.676901	0.415460	86073880.0
4	1996-01-05	17.738199	17.785400	17.459900	17.577801	0.413131	76613040.0

In [87]:

df5 = pd.concat([df3,df4],ignore_index=True, sort=False)
df5

Out[87]:

	key	A	B	C	D	NewDate	Open	High	Low	Close	Adj Close	Volume
0	K0	A0	B0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	K1	A1	B1	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	K2	A2	B2	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	K3	A3	B3	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	K0	NaN	NaN	C0	D0	NaN	NaN	NaN	NaN	NaN	NaN	NaN
5	K1	NaN	NaN	C1	D1	NaN	NaN	NaN	NaN	NaN	NaN	NaN
6	K4	NaN	NaN	C4	D4	NaN	NaN	NaN	NaN	NaN	NaN	NaN
7	K3	NaN	NaN	C3	D3	NaN	NaN	NaN	NaN	NaN	NaN	NaN
8	NaN	NaN	NaN	NaN	NaN	1996-01-01	18.691200	18.978901	18.540199	18.823200	0.442401	43733533.0
9	NaN	NaN	NaN	NaN	NaN	1996-01-02	18.893999	18.964800	17.738199	18.224100	0.428321	56167281.0
10	NaN	NaN	NaN	NaN	NaN	1996-01-03	18.327900	18.568501	17.643801	17.738199	0.416900	68296319.0
11	NaN	NaN	NaN	NaN	NaN	1996-01-04	17.502300	17.832500	17.224001	17.676901	0.415460	86073880.0
12	NaN	NaN	NaN	NaN	NaN	1996-01-05	17.738199	17.785400	17.459900	17.577801	0.413131	76613040.0

In [88]:

df5.isna()

Out[88]:

	key	A	B	C	D	NewDate	Open	High	Low	Close	Adj Close	Volume
0	False	False	False	True	True	True	True	True	True	True	True	True
1	False	False	False	True	True	True	True	True	True	True	True	True
2	False	False	False	True	True	True	True	True	True	True	True	True
3	False	False	False	True	True	True	True	True	True	True	True	True
4	False	True	True	False	False	True	True	True	True	True	True	True
5	False	True	True	False	False	True	True	True	True	True	True	True
6	False	True	True	False	False	True	True	True	True	True	True	True
7	False	True	True	False	False	True	True	True	True	True	True	True
8	True	True	True	True	True	False	False	False	False	False	False	False
9	True	True	True	True	True	False	False	False	False	False	False	False
10	True	True	True	True	True	False	False	False	False	False	False	False
11	True	True	True	True	True	False	False	False	False	False	False	False
12	True	True	True	True	True	False	False	False	False	False	False	False

In [89]:

df5.fillna("EMPTY")

Out[89]:

	key	A	B	C	D	NewDate	Open	High	Low	Close	Adj Close	Volume
0	K0	A0	B0	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY
1	K1	A1	B1	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY
2	K2	A2	B2	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY
3	K3	A3	B3	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY
4	K0	EMPTY	EMPTY	C0	D0	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY
5	K1	EMPTY	EMPTY	C1	D1	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY
6	K4	EMPTY	EMPTY	C4	D4	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY
7	K3	EMPTY	EMPTY	C3	D3	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY
8	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	1996-01-01	18.6912	18.9789	18.5402	18.8232	0.442401	4.37335e+07
9	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	1996-01-02	18.894	18.9648	17.7382	18.2241	0.428321	5.61673e+07
10	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	1996-01-03	18.3279	18.5685	17.6438	17.7382	0.4169	6.82963e+07
11	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	1996-01-04	17.5023	17.8325	17.224	17.6769	0.41546	8.60739e+07
12	EMPTY	EMPTY	EMPTY	EMPTY	EMPTY	1996-01-05	17.7382	17.7854	17.4599	17.5778	0.413131	7.6613e+07

In [90]:

df5.replace(np.nan,0,inplace = True)

In [91]:

df5

Out[91]:

	key	A	B	C	D	NewDate	Open	High	Low	Close	Adj Close	Volume
0	K0	A0	B0	0	0	0	0.000000	0.000000	0.000000	0.000000	0.000000	0.0
1	K1	A1	B1	0	0	0	0.000000	0.000000	0.000000	0.000000	0.000000	0.0
2	K2	A2	B2	0	0	0	0.000000	0.000000	0.000000	0.000000	0.000000	0.0
3	K3	A3	B3	0	0	0	0.000000	0.000000	0.000000	0.000000	0.000000	0.0
4	K0	0	0	C0	D0	0	0.000000	0.000000	0.000000	0.000000	0.000000	0.0
5	K1	0	0	C1	D1	0	0.000000	0.000000	0.000000	0.000000	0.000000	0.0
6	K4	0	0	C4	D4	0	0.000000	0.000000	0.000000	0.000000	0.000000	0.0
7	K3	0	0	C3	D3	0	0.000000	0.000000	0.000000	0.000000	0.000000	0.0
8	0	0	0	0	0	1996-01-01	18.691200	18.978901	18.540199	18.823200	0.442401	43733533.0
9	0	0	0	0	0	1996-01-02	18.893999	18.964800	17.738199	18.224100	0.428321	56167281.0
10	0	0	0	0	0	1996-01-03	18.327900	18.568501	17.643801	17.738199	0.416900	68296319.0
11	0	0	0	0	0	1996-01-04	17.502300	17.832500	17.224001	17.676901	0.415460	86073880.0
12	0	0	0	0	0	1996-01-05	17.738199	17.785400	17.459900	17.577801	0.413131	76613040.0

Filtering DataFrame¶

In [92]:

df5['key']

Out[92]:

0     K0
1     K1
2     K2
3     K3
4     K0
5     K1
6     K4
7     K3
8      0
9      0
10     0
11     0
12     0
Name: key, dtype: object

In [93]:

df5[df5['key'] == 'K0']

Out[93]:

	key	A	B	C	D	NewDate	Open	High	Low	Close	Adj Close	Volume
0	K0	A0	B0	0	0	0	0.0	0.0	0.0	0.0	0.0	0.0
4	K0	0	0	C0	D0	0	0.0	0.0	0.0	0.0	0.0	0.0

In [94]:

df5[(df5['key'] == 'K4') & (df5['Open'] == 0)]

Out[94]:

	key	A	B	C	D	NewDate	Open	High	Low	Close	Adj Close	Volume
6	K4	0	0	C4	D4	0	0.0	0.0	0.0	0.0	0.0	0.0

In [95]:

keylist = ['K2','K3']
df5[(df5['key'].isin(keylist)) & (df5['Open'] == 0)]

Out[95]:

	key	A	B	C	D
2	K2	A2	B2	0	0
3	K3	A3	B3	0	0
7	K3	0	0	C3	D3

In [96]:

df5[df5['Open'] < 100]

Out[96]:

	key	A	B	C	D	NewDate	Open	High	Low	Close	Adj Close	Volume
0	K0	A0	B0	0	0	0	0.000000	0.000000	0.000000	0.000000	0.000000	0.0
1	K1	A1	B1	0	0	0	0.000000	0.000000	0.000000	0.000000	0.000000	0.0
2	K2	A2	B2	0	0	0	0.000000	0.000000	0.000000	0.000000	0.000000	0.0
3	K3	A3	B3	0	0	0	0.000000	0.000000	0.000000	0.000000	0.000000	0.0
4	K0	0	0	C0	D0	0	0.000000	0.000000	0.000000	0.000000	0.000000	0.0
5	K1	0	0	C1	D1	0	0.000000	0.000000	0.000000	0.000000	0.000000	0.0
6	K4	0	0	C4	D4	0	0.000000	0.000000	0.000000	0.000000	0.000000	0.0
7	K3	0	0	C3	D3	0	0.000000	0.000000	0.000000	0.000000	0.000000	0.0
8	0	0	0	0	0	1996-01-01	18.691200	18.978901	18.540199	18.823200	0.442401	43733533.0
9	0	0	0	0	0	1996-01-02	18.893999	18.964800	17.738199	18.224100	0.428321	56167281.0
10	0	0	0	0	0	1996-01-03	18.327900	18.568501	17.643801	17.738199	0.416900	68296319.0
11	0	0	0	0	0	1996-01-04	17.502300	17.832500	17.224001	17.676901	0.415460	86073880.0
12	0	0	0	0	0	1996-01-05	17.738199	17.785400	17.459900	17.577801	0.413131	76613040.0

Renaming Columns¶

In [97]:

temp = df.sample(10).copy()
temp

Out[97]:

	NewDate	Open	High	Low	Close	Adj Close	Volume
3849	2011-01-21	252.889999	260.700012	252.199997	259.690002	124.483513	29086400.0
3826	2010-12-21	270.924988	276.500000	270.924988	274.375000	131.522842	19613340.0
1549	2001-12-07	19.342199	19.658300	18.780800	18.856300	1.340288	3678530.0
2520	2005-09-05	76.382698	77.623497	76.109100	77.090401	16.440184	14539912.0
574	1998-03-13	NaN	NaN	NaN	NaN	NaN	NaN
675	1998-08-03	19.625200	19.625200	18.611000	19.111000	0.752655	35140718.0
4872	2015-03-13	290.000000	291.049988	280.000000	280.750000	271.203003	13414248.0
4201	2012-06-22	214.699997	218.449997	212.850006	215.779999	144.681274	35627920.0
88	1996-05-02	28.541500	29.390699	27.574400	28.083900	0.660055	90601602.0
5089	2016-01-29	184.199997	185.750000	178.100006	179.899994	175.976257	24259147.0

In [98]:

temp.rename(columns = {'NewDate':'OneMoreNewDate','Adj Close':'ABC'},inplace = True)
temp

Out[98]:

	OneMoreNewDate	Open	High	Low	Close	ABC	Volume
3849	2011-01-21	252.889999	260.700012	252.199997	259.690002	124.483513	29086400.0
3826	2010-12-21	270.924988	276.500000	270.924988	274.375000	131.522842	19613340.0
1549	2001-12-07	19.342199	19.658300	18.780800	18.856300	1.340288	3678530.0
2520	2005-09-05	76.382698	77.623497	76.109100	77.090401	16.440184	14539912.0
574	1998-03-13	NaN	NaN	NaN	NaN	NaN	NaN
675	1998-08-03	19.625200	19.625200	18.611000	19.111000	0.752655	35140718.0
4872	2015-03-13	290.000000	291.049988	280.000000	280.750000	271.203003	13414248.0
4201	2012-06-22	214.699997	218.449997	212.850006	215.779999	144.681274	35627920.0
88	1996-05-02	28.541500	29.390699	27.574400	28.083900	0.660055	90601602.0
5089	2016-01-29	184.199997	185.750000	178.100006	179.899994	175.976257	24259147.0

In [99]:

temp.columns = ['A','B','CD','EF','GH','i',3939]
temp

Out[99]:

	A	B	CD	EF	GH	i	3939
3849	2011-01-21	252.889999	260.700012	252.199997	259.690002	124.483513	29086400.0
3826	2010-12-21	270.924988	276.500000	270.924988	274.375000	131.522842	19613340.0
1549	2001-12-07	19.342199	19.658300	18.780800	18.856300	1.340288	3678530.0
2520	2005-09-05	76.382698	77.623497	76.109100	77.090401	16.440184	14539912.0
574	1998-03-13	NaN	NaN	NaN	NaN	NaN	NaN
675	1998-08-03	19.625200	19.625200	18.611000	19.111000	0.752655	35140718.0
4872	2015-03-13	290.000000	291.049988	280.000000	280.750000	271.203003	13414248.0
4201	2012-06-22	214.699997	218.449997	212.850006	215.779999	144.681274	35627920.0
88	1996-05-02	28.541500	29.390699	27.574400	28.083900	0.660055	90601602.0
5089	2016-01-29	184.199997	185.750000	178.100006	179.899994	175.976257	24259147.0

Deleting Column¶

In [100]:

temp.drop("B",axis=1,inplace = True)
temp

Out[100]:

	A	CD	EF	GH	i	3939
3849	2011-01-21	260.700012	252.199997	259.690002	124.483513	29086400.0
3826	2010-12-21	276.500000	270.924988	274.375000	131.522842	19613340.0
1549	2001-12-07	19.658300	18.780800	18.856300	1.340288	3678530.0
2520	2005-09-05	77.623497	76.109100	77.090401	16.440184	14539912.0
574	1998-03-13	NaN	NaN	NaN	NaN	NaN
675	1998-08-03	19.625200	18.611000	19.111000	0.752655	35140718.0
4872	2015-03-13	291.049988	280.000000	280.750000	271.203003	13414248.0
4201	2012-06-22	218.449997	212.850006	215.779999	144.681274	35627920.0
88	1996-05-02	29.390699	27.574400	28.083900	0.660055	90601602.0
5089	2016-01-29	185.750000	178.100006	179.899994	175.976257	24259147.0

In [101]:

temp.drop(724,axis=0,inplace = True)
temp

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-101-a77a313d8f56> in <module>
----> 1 temp.drop(724,axis=0,inplace = True)
      2 temp

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in drop(self, labels, axis, index, columns, level, inplace, errors)
   3938                                            index=index, columns=columns,
   3939                                            level=level, inplace=inplace,
-> 3940                                            errors=errors)
   3941 
   3942     @rewrite_axis_style_signature('mapper', [('copy', True),

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in drop(self, labels, axis, index, columns, level, inplace, errors)
   3778         for axis, labels in axes.items():
   3779             if labels is not None:
-> 3780                 obj = obj._drop_axis(labels, axis, level=level, errors=errors)
   3781 
   3782         if inplace:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in _drop_axis(self, labels, axis, level, errors)
   3810                 new_axis = axis.drop(labels, level=level, errors=errors)
   3811             else:
-> 3812                 new_axis = axis.drop(labels, errors=errors)
   3813             result = self.reindex(**{axis_name: new_axis})
   3814 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in drop(self, labels, errors)
   4963             if errors != 'ignore':
   4964                 raise KeyError(
-> 4965                     '{} not found in axis'.format(labels[mask]))
   4966             indexer = indexer[~mask]
   4967         return self.delete(indexer)

KeyError: '[724] not found in axis'

In [102]:

temp.drop([1499,1145],axis=0,inplace = True)
temp

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-102-ad62c42643be> in <module>
----> 1 temp.drop([1499,1145],axis=0,inplace = True)
      2 temp

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in drop(self, labels, axis, index, columns, level, inplace, errors)
   3938                                            index=index, columns=columns,
   3939                                            level=level, inplace=inplace,
-> 3940                                            errors=errors)
   3941 
   3942     @rewrite_axis_style_signature('mapper', [('copy', True),

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in drop(self, labels, axis, index, columns, level, inplace, errors)
   3778         for axis, labels in axes.items():
   3779             if labels is not None:
-> 3780                 obj = obj._drop_axis(labels, axis, level=level, errors=errors)
   3781 
   3782         if inplace:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in _drop_axis(self, labels, axis, level, errors)
   3810                 new_axis = axis.drop(labels, level=level, errors=errors)
   3811             else:
-> 3812                 new_axis = axis.drop(labels, errors=errors)
   3813             result = self.reindex(**{axis_name: new_axis})
   3814 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in drop(self, labels, errors)
   4963             if errors != 'ignore':
   4964                 raise KeyError(
-> 4965                     '{} not found in axis'.format(labels[mask]))
   4966             indexer = indexer[~mask]
   4967         return self.delete(indexer)

KeyError: '[1499 1145] not found in axis'

In [103]:

temp.drop("GH",axis=1,inplace = True)
temp

Out[103]:

	A	CD	EF	i	3939
3849	2011-01-21	260.700012	252.199997	124.483513	29086400.0
3826	2010-12-21	276.500000	270.924988	131.522842	19613340.0
1549	2001-12-07	19.658300	18.780800	1.340288	3678530.0
2520	2005-09-05	77.623497	76.109100	16.440184	14539912.0
574	1998-03-13	NaN	NaN	NaN	NaN
675	1998-08-03	19.625200	18.611000	0.752655	35140718.0
4872	2015-03-13	291.049988	280.000000	271.203003	13414248.0
4201	2012-06-22	218.449997	212.850006	144.681274	35627920.0
88	1996-05-02	29.390699	27.574400	0.660055	90601602.0
5089	2016-01-29	185.750000	178.100006	175.976257	24259147.0

In [104]:

df1 = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                    'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})


df2 = pd.DataFrame({'key': ['K0', 'K1', 'K4', 'K3'],
                    'C': ['C0', 'C1', 'C4', 'C3'],
                      'D': ['D0', 'D1', 'D4', 'D3']})

In [105]:

df1

Out[105]:

	key	A	B
0	K0	A0	B0
1	K1	A1	B1
2	K2	A2	B2
3	K3	A3	B3

In [106]:

df1.drop('A',axis=1,inplace = True)
df1

Out[106]:

	key	B
0	K0	B0
1	K1	B1
2	K2	B2
3	K3	B3

In [107]:

df1.drop(2,axis=0,inplace = True)
df1

Out[107]:

	key	B
0	K0	B0
1	K1	B1
3	K3	B3

Rowwise and Column wise Fuctions¶

In [108]:

temp

Out[108]:

	A	CD	EF	i	3939
3849	2011-01-21	260.700012	252.199997	124.483513	29086400.0
3826	2010-12-21	276.500000	270.924988	131.522842	19613340.0
1549	2001-12-07	19.658300	18.780800	1.340288	3678530.0
2520	2005-09-05	77.623497	76.109100	16.440184	14539912.0
574	1998-03-13	NaN	NaN	NaN	NaN
675	1998-08-03	19.625200	18.611000	0.752655	35140718.0
4872	2015-03-13	291.049988	280.000000	271.203003	13414248.0
4201	2012-06-22	218.449997	212.850006	144.681274	35627920.0
88	1996-05-02	29.390699	27.574400	0.660055	90601602.0
5089	2016-01-29	185.750000	178.100006	175.976257	24259147.0

In [109]:

temp['CD'].dtype

Out[109]:

dtype('float64')

In [110]:

temp['CD'] = temp['CD'].astype('int32')
temp['CD']

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-110-1a7810bf8276> in <module>
----> 1 temp['CD'] = temp['CD'].astype('int32')
      2 temp['CD']

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors, **kwargs)
   5689             # else, only a single dtype is given
   5690             new_data = self._data.astype(dtype=dtype, copy=copy, errors=errors,
-> 5691                                          **kwargs)
   5692             return self._constructor(new_data).__finalize__(self)
   5693 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in astype(self, dtype, **kwargs)
    529 
    530     def astype(self, dtype, **kwargs):
--> 531         return self.apply('astype', dtype=dtype, **kwargs)
    532 
    533     def convert(self, **kwargs):

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
    393                                             copy=align_copy)
    394 
--> 395             applied = getattr(b, f)(**kwargs)
    396             result_blocks = _extend_blocks(applied, result_blocks)
    397 

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\blocks.py in astype(self, dtype, copy, errors, values, **kwargs)
    532     def astype(self, dtype, copy=False, errors='raise', values=None, **kwargs):
    533         return self._astype(dtype, copy=copy, errors=errors, values=values,
--> 534                             **kwargs)
    535 
    536     def _astype(self, dtype, copy=False, errors='raise', values=None,

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals\blocks.py in _astype(self, dtype, copy, errors, values, **kwargs)
    631 
    632                     # _astype_nansafe works fine with 1-d only
--> 633                     values = astype_nansafe(values.ravel(), dtype, copy=True)
    634 
    635                 # TODO(extension)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\dtypes\cast.py in astype_nansafe(arr, dtype, copy, skipna)
    674 
    675         if not np.isfinite(arr).all():
--> 676             raise ValueError('Cannot convert non-finite values (NA or inf) to '
    677                              'integer')
    678 

ValueError: Cannot convert non-finite values (NA or inf) to integer

In [111]:

temp

Out[111]:

	A	CD	EF	i	3939
3849	2011-01-21	260.700012	252.199997	124.483513	29086400.0
3826	2010-12-21	276.500000	270.924988	131.522842	19613340.0
1549	2001-12-07	19.658300	18.780800	1.340288	3678530.0
2520	2005-09-05	77.623497	76.109100	16.440184	14539912.0
574	1998-03-13	NaN	NaN	NaN	NaN
675	1998-08-03	19.625200	18.611000	0.752655	35140718.0
4872	2015-03-13	291.049988	280.000000	271.203003	13414248.0
4201	2012-06-22	218.449997	212.850006	144.681274	35627920.0
88	1996-05-02	29.390699	27.574400	0.660055	90601602.0
5089	2016-01-29	185.750000	178.100006	175.976257	24259147.0

In [112]:

def range_indicator(input_value):
    output_value = False
    if(input_value > 50):
        output_value = True
    return output_value

In [113]:

range_indicator(278)

Out[113]:

True

In [114]:

temp['CD']

Out[114]:

3849    260.700012
3826    276.500000
1549     19.658300
2520     77.623497
574            NaN
675      19.625200
4872    291.049988
4201    218.449997
88       29.390699
5089    185.750000
Name: CD, dtype: float64

In [115]:

temp['Range_Indicator'] = temp['CD'].apply(lambda x: range_indicator(x))
temp

Out[115]:

	A	CD	EF	i	3939	Range_Indicator
3849	2011-01-21	260.700012	252.199997	124.483513	29086400.0	True
3826	2010-12-21	276.500000	270.924988	131.522842	19613340.0	True
1549	2001-12-07	19.658300	18.780800	1.340288	3678530.0	False
2520	2005-09-05	77.623497	76.109100	16.440184	14539912.0	True
574	1998-03-13	NaN	NaN	NaN	NaN	False
675	1998-08-03	19.625200	18.611000	0.752655	35140718.0	False
4872	2015-03-13	291.049988	280.000000	271.203003	13414248.0	True
4201	2012-06-22	218.449997	212.850006	144.681274	35627920.0	True
88	1996-05-02	29.390699	27.574400	0.660055	90601602.0	False
5089	2016-01-29	185.750000	178.100006	175.976257	24259147.0	True

In [116]:

lambda x: range_indicator(x)

Out[116]:

<function __main__.<lambda>(x)>

In [117]:

for each_item in [12,4,5,321]:
    print(range_indicator(each_item))

False
False
False
True

In [118]:

list(map(range_indicator,[12,4,5,321]))

Out[118]:

[False, False, False, True]

In [119]:

list(map(range_indicator,temp['CD']))

Out[119]:

[True, True, False, True, False, False, True, True, False, True]

In [ ]:

Pandas Visualization¶

In [120]:

import matplotlib.pyplot as plt

In [121]:

df.head()

Out[121]:

	NewDate	Open	High	Low	Close	Adj Close	Volume
0	1996-01-01	18.691200	18.978901	18.540199	18.823200	0.442401	43733533.0
1	1996-01-02	18.893999	18.964800	17.738199	18.224100	0.428321	56167281.0
2	1996-01-03	18.327900	18.568501	17.643801	17.738199	0.416900	68296319.0
3	1996-01-04	17.502300	17.832500	17.224001	17.676901	0.415460	86073880.0
4	1996-01-05	17.738199	17.785400	17.459900	17.577801	0.413131	76613040.0

In [122]:

df.tail()

Out[122]:

	NewDate	Open	High	Low	Close	Adj Close	Volume
5520	2017-10-31	313.000000	313.850006	304.299988	305.799988	305.799988	16455750.0
5521	2017-11-01	309.399994	321.450012	309.399994	319.850006	319.850006	34907283.0
5522	2017-11-02	320.000000	323.149994	313.549988	314.350006	314.350006	23110527.0
5523	2017-11-03	315.450012	327.500000	312.600006	325.000000	325.000000	26898400.0
5524	2017-11-06	323.600006	330.000000	321.450012	329.000000	329.000000	17143303.0

In [123]:

df.plot(x='NewDate',y = 'Open')

Out[123]:

<matplotlib.axes._subplots.AxesSubplot at 0x2c1f06bd2b0>

In [124]:

df.plot(x='Open',y = 'Close',kind='scatter')

Out[124]:

<matplotlib.axes._subplots.AxesSubplot at 0x2c1f074a6a0>

In [125]:

df['Open'].hist()

Out[125]:

<matplotlib.axes._subplots.AxesSubplot at 0x2c1f00c4160>

In [126]:

df.shape

Out[126]:

(5525, 7)

Statistics¶

In [127]:

df.head()

Out[127]:

	NewDate	Open	High	Low	Close	Adj Close	Volume
0	1996-01-01	18.691200	18.978901	18.540199	18.823200	0.442401	43733533.0
1	1996-01-02	18.893999	18.964800	17.738199	18.224100	0.428321	56167281.0
2	1996-01-03	18.327900	18.568501	17.643801	17.738199	0.416900	68296319.0
3	1996-01-04	17.502300	17.832500	17.224001	17.676901	0.415460	86073880.0
4	1996-01-05	17.738199	17.785400	17.459900	17.577801	0.413131	76613040.0

In [130]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5525 entries, 0 to 5524
Data columns (total 7 columns):
NewDate      5525 non-null object
Open         5401 non-null float64
High         5401 non-null float64
Low          5401 non-null float64
Close        5401 non-null float64
Adj Close    5401 non-null float64
Volume       5401 non-null float64
dtypes: float64(6), object(1)
memory usage: 302.2+ KB

In [131]:

df.describe()

Out[131]:

	Open	High	Low	Close	Adj Close	Volume
count	5401.000000	5401.000000	5401.000000	5401.000000	5401.000000	5.401000e+03
mean	123.851330	125.677336	121.835167	123.665226	75.381429	3.252849e+07
std	98.179606	99.383543	96.796041	98.012898	91.862829	3.718180e+07
min	13.478200	13.959400	13.214000	13.346100	0.401045	0.000000e+00
25%	24.946699	25.409000	24.328699	24.809900	1.202995	1.302272e+07
50%	94.918198	96.616501	93.507599	94.814400	25.289371	2.101967e+07
75%	217.725006	220.850006	214.000000	217.227005	134.646362	3.698718e+07
max	345.899994	351.500000	341.325012	343.345001	329.000000	4.469483e+08

In [132]:

df["Open"].hist()

Out[132]:

<matplotlib.axes._subplots.AxesSubplot at 0x2c1f058c588>

In [194]:

df.corr()

Out[194]:

	Open	High	Low	Close	Adj Close	Volume
Open	1.000000	0.999716	0.999686	0.999410	0.903800	-0.332790
High	0.999716	1.000000	0.999577	0.999734	0.903005	-0.330531
Low	0.999686	0.999577	1.000000	0.999733	0.905389	-0.334000
Close	0.999410	0.999734	0.999733	1.000000	0.904266	-0.331829
Adj Close	0.903800	0.903005	0.905389	0.904266	1.000000	-0.282631
Volume	-0.332790	-0.330531	-0.334000	-0.331829	-0.282631	1.000000

In [200]:

df.head()

Out[200]:

	NewDate	Open	High	Low	Close	Adj Close	Volume
0	1996-01-01	18.691200	18.978901	18.540199	18.823200	0.442401	43733533.0
1	1996-01-02	18.893999	18.964800	17.738199	18.224100	0.428321	56167281.0
2	1996-01-03	18.327900	18.568501	17.643801	17.738199	0.416900	68296319.0
3	1996-01-04	17.502300	17.832500	17.224001	17.676901	0.415460	86073880.0
4	1996-01-05	17.738199	17.785400	17.459900	17.577801	0.413131	76613040.0

In [204]:

df.head().sum(axis=0).sum()

Out[204]:

330884417.046214

In [133]:

a = [1,2,3,4,5,10,20,50]

In [134]:

sum(a)/len(a)

Out[134]:

11.875

In [136]:

#Standard Deviation
import numpy as np

In [137]:

a  = np.asarray(a)

In [138]:

Out[138]:

array([ 1,  2,  3,  4,  5, 10, 20, 50])

In [139]:

np.mean(a)

Out[139]:

11.875

In [140]:

np.std(a)

Out[140]:

15.519644809079878

In [141]:

df.head()

Out[141]:

	NewDate	Open	High	Low	Close	Adj Close	Volume
0	1996-01-01	18.691200	18.978901	18.540199	18.823200	0.442401	43733533.0
1	1996-01-02	18.893999	18.964800	17.738199	18.224100	0.428321	56167281.0
2	1996-01-03	18.327900	18.568501	17.643801	17.738199	0.416900	68296319.0
3	1996-01-04	17.502300	17.832500	17.224001	17.676901	0.415460	86073880.0
4	1996-01-05	17.738199	17.785400	17.459900	17.577801	0.413131	76613040.0

In [142]:

df1

Out[142]:

	key	B
0	K0	B0
1	K1	B1
3	K3	B3

In [146]:

df.apply(lambda x: x['Open']+ x['High'])

Out[146]:

0        37.670101
1        37.858799
2        36.896401
3        35.334800
4        35.523599
5        35.122502
6        35.570700
7        35.122601
8        34.910301
9        36.113200
10       36.221700
11       37.434198
12       35.830200
13       34.943300
14       34.962101
15       36.051899
16       37.160600
17       35.759401
18       36.391600
19             NaN
20       37.122900
21       38.759800
22       37.906000
23       38.047399
24       40.807298
25       43.968100
26       45.619200
27       43.958601
28       44.892799
29       47.223200
           ...    
5495    541.149994
5496    534.649994
5497    523.299988
5498    518.149994
5499    519.799988
5500    503.800003
5501    510.199997
5502    510.250000
5503    506.299988
5504    507.800003
5505    510.049988
5506    515.399994
5507    515.050018
5508    515.600006
5509    504.949997
5510    506.450012
5511    509.300003
5512    506.750000
5513    493.800003
5514    490.800003
5515    501.300003
5516    607.899994
5517    681.299988
5518    643.850006
5519    634.350006
5520    626.850006
5521    630.850006
5522    643.149994
5523    642.950012
5524    653.600006
Length: 5525, dtype: float64

In [149]:

df[["Open","High","Low","Close"]].apply(lambda x: 2*x)

Out[149]:

	Open	High	Low	Close
0	37.382400	37.957802	37.080398	37.646400
1	37.787998	37.929600	35.476398	36.448200
2	36.655800	37.137002	35.287602	35.476398
3	35.004600	35.665000	34.448002	35.353802
4	35.476398	35.570800	34.919800	35.155602
5	34.957402	35.287602	33.844200	34.127202
6	33.778000	37.363400	33.410000	35.995400
7	34.816002	35.429200	34.108200	34.344200
8	33.966800	35.853802	33.966800	35.655602
9	35.759400	36.467000	35.146198	35.674598
10	36.042400	36.401000	35.570800	35.787800
11	35.806598	39.061798	35.570800	36.542598
12	35.712200	35.948200	34.910198	35.042400
13	34.636600	35.250000	34.249802	34.806400
14	34.580002	35.344200	34.580002	35.221600
15	35.476398	36.627400	35.401000	36.448200
16	36.778400	37.542800	36.325600	36.835000
17	35.665000	35.853802	34.702800	35.589600
18	36.231202	36.551998	35.759400	35.929200
19	NaN	NaN	NaN	NaN
20	36.325600	37.920200	36.146198	37.571000
21	38.118198	39.401402	37.174800	38.410800
22	37.552200	38.259800	36.429402	37.099202
23	36.844398	39.250400	36.844398	38.939000
24	39.722198	41.892398	39.439202	41.628200
25	42.175400	45.760800	42.175400	44.251202
26	44.911598	46.326802	43.590600	44.628602
27	43.194400	44.722802	41.515000	43.845402
28	43.977600	45.807998	43.156600	45.090802
29	45.524800	48.921600	44.703998	48.279998
...	...	...	...	...
5495	539.500000	542.799988	534.700012	537.000000
5496	534.500000	534.799988	522.200012	523.700012
5497	523.000000	523.599976	508.799988	517.500000
5498	517.000000	519.299988	512.000000	516.500000
5499	519.799988	519.799988	499.000000	500.799988
5500	500.500000	507.100006	499.000000	505.100006
5501	509.399994	511.000000	506.100006	507.700012
5502	507.700012	512.799988	501.200012	502.600006
5503	503.299988	509.299988	502.700012	506.399994
5504	507.399994	508.200012	498.399994	503.200012
5505	504.299988	515.799988	504.200012	513.500000
5506	513.599976	517.200012	508.500000	513.700012
5507	513.700012	516.400024	511.399994	513.900024
5508	514.299988	516.900024	500.899994	503.500000
5509	504.000000	505.899994	497.000000	502.399994
5510	502.200012	510.700012	500.399994	504.200012
5511	509.200012	509.399994	500.200012	504.100006
5512	504.100006	509.399994	501.299988	502.299988
5513	492.000000	495.600006	485.100006	487.500000
5514	487.600006	494.000000	482.299988	491.899994
5515	492.000000	510.600006	491.899994	508.899994
5516	559.700012	656.099976	559.700012	649.799988
5517	660.000000	702.599976	635.000000	641.000000
5518	640.500000	647.200012	620.200012	622.099976
5519	624.000000	644.700012	621.200012	624.000000
5520	626.000000	627.700012	608.599976	611.599976
5521	618.799988	642.900024	618.799988	639.700012
5522	640.000000	646.299988	627.099976	628.700012
5523	630.900024	655.000000	625.200012	650.000000
5524	647.200012	660.000000	642.900024	658.000000

5525 rows × 4 columns

In [154]:

df["NewDate"] = pd.to_datetime(df["NewDate"],format='%Y-%m-%d')

In [155]:

df["NewDate"].dtype

Out[155]:

dtype('<M8[ns]')

In [159]:

df['NewDate'].iloc[0].day

Out[159]:

In [161]:

df['NewDate'].iloc[0].date()

Out[161]:

datetime.date(1996, 1, 1)

In [162]:

import datetime

In [166]:

datetime.timedelta(days=10,hours=10)

Out[166]:

datetime.timedelta(days=10, seconds=36000)

In [168]:

df['NewDate'].iloc[0] + datetime.timedelta(days=10,hours=10)

Out[168]:

Timestamp('1996-01-11 10:00:00')

In [169]:

a = df['NewDate'].iloc[0] + datetime.timedelta(days=10,hours=10)

In [170]:

Out[170]:

Timestamp('1996-01-11 10:00:00')

In [171]:

a.strptime()

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-171-b950a9e83a09> in <module>
----> 1 a.strptime()

TypeError: strptime() takes exactly 2 arguments (0 given)

In [172]:

import datetime
d = datetime.datetime.strptime("01/27/2012", "%m/%d/%Y")

In [173]:

Out[173]:

datetime.datetime(2012, 1, 27, 0, 0)

In [175]:

b = datetime.datetime.strftime(d,"%Y-%m-%d")
b

Out[175]:

'2012-01-27'

In [176]:

type(b)

Out[176]:

str

In [178]:

import time

In [181]:

time.ctime()

Out[181]:

str

TimeStamp Folder Creation¶

In [187]:

timestamp_now = datetime.datetime.now().strftime('%Y%m%d_%H%M%S')

In [188]:

import os

In [191]:

timestamp_now = datetime.datetime.now().strftime('%Y%m%d_%H%M%S')
main_folder_path = r'E:\openknowledgeshare.blogspot.com\Python\Outputs'
os.makedirs(os.path.join(main_folder_path,timestamp_now))

Time Taken¶

In [193]:

start_time = time.clock()

for i in range(0,10):
    i
    
end_time = time.clock()
print("Time taken for this code is {} seconds".format(end_time - start_time))

Time taken for this code is 0.00012940000033268007 seconds

C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:1: DeprecationWarning: time.clock has been deprecated in Python 3.3 and will be removed from Python 3.8: use time.perf_counter or time.process_time instead
  """Entry point for launching an IPython kernel.
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:7: DeprecationWarning: time.clock has been deprecated in Python 3.3 and will be removed from Python 3.8: use time.perf_counter or time.process_time instead
  import sys

Stacking¶

In [205]:

df1

Out[205]:

	key	B
0	K0	B0
1	K1	B1
3	K3	B3

In [206]:

df2

Out[206]:

	key	C	D
0	K0	C0	D0
1	K1	C1	D1
2	K4	C4	D4
3	K3	C3	D3

In [207]:

df2.stack()

Out[207]:

0  key    K0
   C      C0
   D      D0
1  key    K1
   C      C1
   D      D1
2  key    K4
   C      C4
   D      D4
3  key    K3
   C      C3
   D      D3
dtype: object

In [208]:

df2.unstack('')

Out[208]:

key  0    K0
     1    K1
     2    K4
     3    K3
C    0    C0
     1    C1
     2    C4
     3    C3
D    0    D0
     1    D1
     2    D4
     3    D3
dtype: object

In [211]:

import pandas as pd
import numpy as np

header = pd.MultiIndex.from_product([['Semester1','Semester2'],['Maths','Science']])

d=([[12,45,67,56],[78,89,45,67],[45,67,89,90],[67,44,56,55]])

df = pd.DataFrame(d,
                  columns=['Alisa','Bobby','Cathrine','Jack'],
                  index=header)

In [212]:

df

Out[212]:

		Alisa	Bobby	Cathrine	Jack
Semester1	Maths	12	45	67	56
Semester1	Science	78	89	45	67
Semester2	Maths	45	67	89	90
Semester2	Science	67	44	56	55

In [214]:

df.xs("Semester1")

Out[214]:

	Alisa	Bobby	Cathrine	Jack
Maths	12	45	67	56
Science	78	89	45	67

In [215]:

df.xs("Semester2")

Out[215]:

	Alisa	Bobby	Cathrine	Jack
Maths	45	67	89	90
Science	67	44	56	55

In [219]:

df.xs("Maths",level=1)

Out[219]:

	Alisa	Bobby	Cathrine	Jack
Semester1	12	45	67	56
Semester2	45	67	89	90

In [220]:

df

Out[220]:

		Alisa	Bobby	Cathrine	Jack
Semester1	Maths	12	45	67	56
Semester1	Science	78	89	45	67
Semester2	Maths	45	67	89	90
Semester2	Science	67	44	56	55

In [224]:

df.unstack()

Out[224]:

	Alisa		Bobby		Cathrine		Jack
	Maths	Science	Maths	Science	Maths	Science	Maths	Science
Semester1	12	78	45	89	67	45	56	67
Semester2	45	67	67	44	89	56	90	55

In [ ]:

Open Knowledge Share

Using Pandas in Python

Creating a DataFrame¶

Reading CSV thru Pandas¶

Manipulating Cell Values of Dataframe¶

Appending Row to DataFrame¶

Case 1: DataFrame without Column Names¶

Case2 : DataFrame with Column Names¶

Sampling DataFrame¶

Merge, Join and Concatenate DataFrames¶

Concat¶

Changing Column Order¶

Handling Missing Values¶

Filtering DataFrame¶

Renaming Columns¶

Deleting Column¶

Rowwise and Column wise Fuctions¶

Pandas Visualization¶

Statistics¶

TimeStamp Folder Creation¶

Time Taken¶

Stacking¶

Comments

Post a Comment

Popular posts from this blog

How to run Jupyter Notebooks in Cloud

How to download , install and configure Anaconda - Python

Project: Implementation of Label Editor

	A	B	C	D	key
0	A0	B0	NaN	NaN	K0
1	A1	B1	NaN	NaN	K1
2	A2	B2	NaN	NaN	K2
3	A3	B3	NaN	NaN	K3
0	NaN	NaN	C0	D0	K0
1	NaN	NaN	C1	D1	K1
2	NaN	NaN	C4	D4	K4
3	NaN	NaN	C3	D3	K3

	key	A	B	C	D
0	K0	A0	B0	NaN	NaN
1	K1	A1	B1	NaN	NaN
2	K2	A2	B2	NaN	NaN
3	K3	A3	B3	NaN	NaN
4	K0	NaN	NaN	C0	D0
5	K1	NaN	NaN	C1	D1
6	K4	NaN	NaN	C4	D4
7	K3	NaN	NaN	C3	D3

	A	B	C	D	key
0	A0	B0	NaN	NaN	K0
1	A1	B1	NaN	NaN	K1
2	A2	B2	NaN	NaN	K2
3	A3	B3	NaN	NaN	K3
4	NaN	NaN	C0	D0	K0
5	NaN	NaN	C1	D1	K1
6	NaN	NaN	C4	D4	K4
7	NaN	NaN	C3	D3	K3

	key	A	B	C	D
0	K0	A0	B0	NaN	NaN
1	K1	A1	B1	NaN	NaN
2	K2	A2	B2	NaN	NaN
3	K3	A3	B3	NaN	NaN
4	K0	NaN	NaN	C0	D0
5	K1	NaN	NaN	C1	D1
6	K4	NaN	NaN	C4	D4
7	K3	NaN	NaN	C3	D3

	key	A	B	C	D
0	K0	A0	B0	NaN	NaN
1	K1	A1	B1	NaN	NaN
2	K2	A2	B2	NaN	NaN
3	K3	A3	B3	NaN	NaN
4	K0	NaN	NaN	C0	D0
5	K1	NaN	NaN	C1	D1
6	K4	NaN	NaN	C4	D4
7	K3	NaN	NaN	C3	D3

	key	A	B	C	D
2	K2	A2	B2	0	0
3	K3	A3	B3	0	0
7	K3	0	0	C3	D3

	A	B	C	D	key
0	A0	B0	NaN	NaN	K0
1	A1	B1	NaN	NaN	K1
2	A2	B2	NaN	NaN	K2
3	A3	B3	NaN	NaN	K3
0	NaN	NaN	C0	D0	K0
1	NaN	NaN	C1	D1	K1
2	NaN	NaN	C4	D4	K4
3	NaN	NaN	C3	D3	K3

	key	A	B	C	D
0	K0	A0	B0	NaN	NaN
1	K1	A1	B1	NaN	NaN
2	K2	A2	B2	NaN	NaN
3	K3	A3	B3	NaN	NaN
4	K0	NaN	NaN	C0	D0
5	K1	NaN	NaN	C1	D1
6	K4	NaN	NaN	C4	D4
7	K3	NaN	NaN	C3	D3

	A	B	C	D	key
0	A0	B0	NaN	NaN	K0
1	A1	B1	NaN	NaN	K1
2	A2	B2	NaN	NaN	K2
3	A3	B3	NaN	NaN	K3
4	NaN	NaN	C0	D0	K0
5	NaN	NaN	C1	D1	K1
6	NaN	NaN	C4	D4	K4
7	NaN	NaN	C3	D3	K3

	key	A	B	C	D
0	K0	A0	B0	NaN	NaN
1	K1	A1	B1	NaN	NaN
2	K2	A2	B2	NaN	NaN
3	K3	A3	B3	NaN	NaN
4	K0	NaN	NaN	C0	D0
5	K1	NaN	NaN	C1	D1
6	K4	NaN	NaN	C4	D4
7	K3	NaN	NaN	C3	D3

	key	A	B	C	D
0	K0	A0	B0	NaN	NaN
1	K1	A1	B1	NaN	NaN
2	K2	A2	B2	NaN	NaN
3	K3	A3	B3	NaN	NaN
4	K0	NaN	NaN	C0	D0
5	K1	NaN	NaN	C1	D1
6	K4	NaN	NaN	C4	D4
7	K3	NaN	NaN	C3	D3

	key	A	B	C	D
2	K2	A2	B2	0	0
3	K3	A3	B3	0	0
7	K3	0	0	C3	D3

	A	B	C	D	key
0	A0	B0	NaN	NaN	K0
1	A1	B1	NaN	NaN	K1
2	A2	B2	NaN	NaN	K2
3	A3	B3	NaN	NaN	K3
0	NaN	NaN	C0	D0	K0
1	NaN	NaN	C1	D1	K1
2	NaN	NaN	C4	D4	K4
3	NaN	NaN	C3	D3	K3

	key	A	B	C	D
0	K0	A0	B0	NaN	NaN
1	K1	A1	B1	NaN	NaN
2	K2	A2	B2	NaN	NaN
3	K3	A3	B3	NaN	NaN
4	K0	NaN	NaN	C0	D0
5	K1	NaN	NaN	C1	D1
6	K4	NaN	NaN	C4	D4
7	K3	NaN	NaN	C3	D3

	A	B	C	D	key
0	A0	B0	NaN	NaN	K0
1	A1	B1	NaN	NaN	K1
2	A2	B2	NaN	NaN	K2
3	A3	B3	NaN	NaN	K3
4	NaN	NaN	C0	D0	K0
5	NaN	NaN	C1	D1	K1
6	NaN	NaN	C4	D4	K4
7	NaN	NaN	C3	D3	K3

	key	A	B	C	D
0	K0	A0	B0	NaN	NaN
1	K1	A1	B1	NaN	NaN
2	K2	A2	B2	NaN	NaN
3	K3	A3	B3	NaN	NaN
4	K0	NaN	NaN	C0	D0
5	K1	NaN	NaN	C1	D1
6	K4	NaN	NaN	C4	D4
7	K3	NaN	NaN	C3	D3

	key	A	B	C	D
0	K0	A0	B0	NaN	NaN
1	K1	A1	B1	NaN	NaN
2	K2	A2	B2	NaN	NaN
3	K3	A3	B3	NaN	NaN
4	K0	NaN	NaN	C0	D0
5	K1	NaN	NaN	C1	D1
6	K4	NaN	NaN	C4	D4
7	K3	NaN	NaN	C3	D3

	key	A	B	C	D
2	K2	A2	B2	0	0
3	K3	A3	B3	0	0
7	K3	0	0	C3	D3