Python Overview Interview Questions

So you have landed an interview and worked hard at upskilling your Python knowledge. There are going to be some questions about Python and its different aspects that you will need to be able to talk about, and they are not all about coding!

Here we discuss some of the key elements that you should be comfortable explaining.

What are the key features of Python?

Below I have outlined some of the key benefits you should be comfortable discussing.

Python is open source and well supported, so you will almost always find an answer to your question somewhere.

It is also easy to write and understand, so the ability to upskill quickly and deliver useful programs is a massive benefit.

Because it has been ported to many platforms, Python code can run in a wide range of environments with little effort. Being usable across a number of development environments without too much tweaking is a massive boost.

Finally, some languages require you to compile the application first; Python does not need a separate compile step, it just runs.

What are the limitations of Python?

While there is a lot of enthusiasm about Python, it also comes with some caveats that you should be able to discuss.

One of the first things to mention is speed: Python is generally slower than compiled languages, which can inhibit how well an application performs. If you require real-time data and are using Python, you need to consider how much performance will be affected.

There are scenarios where an application is written in an older version of Python and you want to introduce new functionality that requires a newer version. This can lead to existing code no longer working and needing to be rewritten, so additional programming time may have to be factored in to fix the compatibility issues found.

Finally, as Python can use a lot of memory, you need to run it on a computer or server that can handle the memory requirements. This is especially important where the application is being used in real time and needs to deliver output to the user interface quickly.

What is Python good for?

There are many uses of Python; the points below are by no means an exhaustive list.

A common theme for some of the points below is that Python can process data and surface information you were not previously aware of, which can aid decision-making.

Alternatively, it can be used as a tool for automating repetitive tasks or predicting the behaviour of the subjects it models. These benefits are not always obvious, but they speed up the delivery of certain repetitive work.

What data types does Python support?

Finally, below are the data types you should be familiar with and be able to discuss; some of them come up in almost every program.

They come from the official Python documentation on data types, which is a good reference point if you need to further your understanding or improve your knowledge.
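
As a quick refresher, here is a minimal sketch of my own (the example values are made up) that prints the name of each of the main built-in types:

# A quick tour of the standard built-in data types
examples = {
    "int": 42,
    "float": 3.14,
    "complex": 2 + 3j,
    "bool": True,
    "str": "hello",
    "bytes": b"hello",
    "bytearray": bytearray(b"hello"),
    "list": [1, 2, 3],
    "tuple": (1, 2, 3),
    "range": range(3),
    "dict": {"a": 1},
    "set": {1, 2, 3},
    "frozenset": frozenset({1, 2}),
    "NoneType": None,
}

for name, value in examples.items():
    print(name, "->", type(value).__name__)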

What Is An Array In Python?

Python arrays hold data in a structured form, which the data analyst or data scientist can then use in their analysis.

What is an Array?

An array has the following properties:

  1. The data in it is of the same data type.
  2. The data is stored in contiguous memory locations, with the first value at index 0 and the last value at the highest index.
  3. Arrays assign index values to each data point contained within it.
  4. You can append values to the array.
  5. You can delete values from the array; in other words, it is mutable (see the short example after this list).
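
Below is a minimal sketch of those properties using the standard library array module (the values are just made up for illustration):

import array

# 'i' means every element must be a signed integer (same data type throughout)
a = array.array('i', [10, 20, 30])

print(a[0])      # 10 - index values start at 0
a.append(40)     # append a value to the end
a.remove(20)     # delete a value - the array is mutable
print(a)         # array('i', [10, 30, 40])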

What are the differences between arrays and lists?

For the most part arrays and lists are the same, but with one difference:

A list can store any mix of data types you want, e.g. strings, integers etc.

On the other hand, arrays can only store data that is of the same data type.
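
A short sketch of that difference, again using the standard library array module:

import array

mixed_list = [1, 'two', 3.0]   # a list happily stores mixed data types
print(mixed_list)

try:
    array.array('i', [1, 'two', 3])   # an integer array rejects the string
except TypeError as err:
    print("TypeError:", err)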

What are the different ways that I can create an array?

(A) Use Numpy

# Use numpy

import numpy as np

a = np.array([1,2,3,4])

print(a)
print(type(a))
print(a.dtype)

Output:
[1 2 3 4]
<class 'numpy.ndarray'>
int32

If one of the values is a string, all the other values are converted to strings as well, as follows:
# Use numpy

import numpy as np

a = np.array([1,2,3,'4'])

print(a)
print(type(a))
print(a.dtype)

Output:

['1' '2' '3' '4']
<class 'numpy.ndarray'>
<U11

# <U11 - this dtype means the array holds Unicode strings of up to 11 characters
# Because one of the values was a string, all the other values were converted to strings as well
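
If you actually want the numbers kept as integers, one option (a small sketch of my own, not from the original post) is to pass an explicit dtype so NumPy converts the string for you:

import numpy as np

a = np.array([1, 2, 3, '4'], dtype=int)

print(a)        # [1 2 3 4]
print(a.dtype)  # int32 or int64, depending on your platform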

(B) Use array

import array as test_array

a = test_array.array('i',[1,2,3])

print(a)
print(type(a))

Output:
array('i', [1, 2, 3])
<class 'array.array'>

The Python.org documentation for the array module lists the type codes that can be used in the above program, depending on what your need is:
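
The most commonly used type codes from that documentation are:

  • 'b' / 'B' – signed / unsigned char (1-byte integers)
  • 'h' / 'H' – signed / unsigned short (2-byte integers)
  • 'i' / 'I' – signed / unsigned int (integers, at least 2 bytes)
  • 'l' / 'L' – signed / unsigned long (integers, at least 4 bytes)
  • 'q' / 'Q' – signed / unsigned long long (8-byte integers)
  • 'f' – float (4 bytes)
  • 'd' – double (8 bytes)
  • 'u' – Unicode character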

When should I use arrays?

It really depends on the nature of your Python program, but below are some examples that may help you make a decision:

(A) Many variables of the same type: there may be a scenario where you need to store a lot of data of the same data type, for example a list of codes to look up against, which are all integers.

(B) Faster and more efficient: if speed is what you are looking for, using arrays will help improve the performance of your program, as the equivalent operations on lists are slower.

(C) Compactness and efficiency: if your program needs to store large amounts of data that must be accessed quickly, that is a good reason to use them.

(D) Ability to retrieve data quickly through indexing: as arrays have index values associated with each element, the data can be easily retrieved.

(E) Need to compute some mathematical values: arrays are excellent for any numerical operations you need to complete, as the example below shows, and the amount of code required is minimal.

import array as test_array

a = test_array.array('i',[1,2,3])

mydivider = 2

mynewlist = [x / mydivider for x in a]
print(mynewlist)

Output:
[0.5, 1.0, 1.5]

So in summary:

Speed, efficiency and ease of use are the main reasons to use an array.

We use arrays in our post on how to show percentage differences between files in Python, so why not go over and see it in action!

How to show percentage differences between files in Python

In our previous post on how to compare CSV files for differences we showed how to find the differences, but what if you wanted to see whether a record was a 100% match or not?

Here we are going to use SequenceMatcher, a Python class from the difflib module that compares two sequences and returns how closely they match as a ratio, which we can express as a percentage.

Let’s look at the code

Import statements and reading in the CSV files. Here we do the usual imports of the libraries we need and read in the two CSV files:

Note that we also set the data frame display options here, so we can see the full data frame properly further down.

import pandas as pd
import numpy as np
from difflib import SequenceMatcher
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)


#Read in the CSV files
df1 = pd.read_csv('CSV1.csv')
df2 = pd.read_csv('CSV2.csv')

The two CSV files, CSV1.csv and CSV2.csv, each contain a list of films with a number, film name, year and length.

Next, we convert each data frame to a NumPy array. Creating the arrays allows the comparison to be completed, as each record gets an index position; the records at matching indexes in each array can then be compared and the percentage match calculated.

#Create an array for both dataframes
array1 = np.array(df1)
array2 = np.array(df2)

Our next step is to transfer the arrays to a data frame, change all integer values to a string, and then join both data frames into one.

In this instance, changing the values into strings allows those values to be iterated over; otherwise you will get the error "TypeError: 'int' object is not iterable".

#Transfer the arrays to a dataframe
df_CSV_1 = pd.DataFrame(array1, columns=['No1','Film1','Year1','Length1'])
df_CSV_2 = pd.DataFrame(array2, columns=['No2','Film2','Year2','Length2'])

#Change all the values to a string, as numbers cannot be iterated over.
df_CSV_1['Year1'] = df_CSV_1['Year1'].astype('str')
df_CSV_2['Year2'] = df_CSV_2['Year2'].astype('str')
df_CSV_1['Length1'] = df_CSV_1['Length1'].astype('str')
df_CSV_2['Length2'] = df_CSV_2['Length2'].astype('str')

#join the dataframes
df = pd.concat([df_CSV_1,df_CSV_2], axis=1)

We are now moving to the main part of the program, which gives us the answers we need. Here we create a function that does the calculations for us:

#Create a function to calculate the differences and show as a ratio.
def create_ratio(df, columna, columnb):
    return SequenceMatcher(None,df[columna],df[columnb]).ratio()

Next we calculate the differences and format the output:

#Here we use apply, which passes each row of the data frame to the function above.
df['Film_comp'] = df.apply(create_ratio,args=('Film1','Film2'),axis=1)
df['Year_comp'] = df.apply(create_ratio,args=('Year1','Year2'),axis=1)
df['Length_comp'] = df.apply(create_ratio,args=('Length1','Length2'),axis=1)

#This creates the values we are looking for
df['Film_comp'] = round(df['Film_comp'].astype('float'),2)*100
df['Year_comp'] = round(df['Year_comp'].astype('float'),2)*100
df['Length_comp'] = round(df['Length_comp'].astype('float'),2)*100

#this removes the decimal point that is added as a result of using the datatype 'Float'
df['Film_comp'] = df['Film_comp'].astype('int')
df['Year_comp'] = df['Year_comp'].astype('int')
df['Length_comp'] = df['Length_comp'].astype('int')
#Print the output
print(df)

And the final output looks like this:

An explanation of the output

As can be seen, the last three columns are the percentages of the match obtained, with 100 being an exact match.

For index value 1 there is Joker in the first file, but Jokers is in the second file.

The ratio is calculated as follows:

Joker is 5 characters long and Jokers is 6 characters long, giving 11 characters in total.

SequenceMatcher counts the matching characters (all 5 characters of Joker appear, in the same order, in Jokers), doubles that figure and divides it by the total number of characters in both strings:

(10/11) * 100 = 90.90

Finally, the round function sets the value we are looking for to 91.

On the same line, we shall compare the year:

2019 and 2008 are a total of eight characters.

Only the first two characters, 2 and 0, match and are found in both years; doubling the two matches gives four, so the ratio is as follows:

(4 / 8) * 100 = 50

For index 20 we also compared the film names: The Dirt is 8 characters (including the space) and The Dirty is 9 characters, 17 characters in total. All 8 characters of The Dirt are matched within The Dirty, so the ratio is (2 * 8) / 17, which is approximately 0.94.
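
If you want to check these figures yourself, the short sketch below reproduces the three comparisons directly with SequenceMatcher:

from difflib import SequenceMatcher

# ratio() returns 2*M/T, where M is the number of matching characters
# and T is the total number of characters in both strings
print(SequenceMatcher(None, 'Joker', 'Jokers').ratio())        # 0.9090909090909091
print(SequenceMatcher(None, '2019', '2008').ratio())           # 0.5
print(SequenceMatcher(None, 'The Dirt', 'The Dirty').ratio())  # 0.9411764705882353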

In order to understand this better I have compiled the below:

Index 1 – Year comparison (2019 vs 2008, 8 characters in total)

2        Correct spot     1
0        Correct spot     1
1        Incorrect spot   0
9        Incorrect spot   0
2        Found            1
0        Found            1
1        Found            0
9        Found            0
Total in comparison: 8
Ratio: 4/8 = 0.50

Index 1 – Film comparison (Joker vs Jokers, 11 characters in total)

J        Correct spot     1
o        Correct spot     1
k        Correct spot     1
e        Correct spot     1
r        Correct spot     1
J        Found            1
o        Found            1
k        Found            1
e        Found            1
r        Found            1
Total in comparison: 11
Ratio: 10/11 = 0.91

Index 20 – Film comparison (The Dirt vs The Dirty, 17 characters in total)

T        Correct spot     1
h        Correct spot     1
e        Correct spot     1
(space)  Correct spot     1
D        Correct spot     1
i        Correct spot     1
r        Correct spot     1
t        Correct spot     1
T        Found            1
h        Found            1
e        Found            1
(space)  Found            1
D        Found            1
i        Found            1
r        Found            1
t        Found            1
Total in comparison: 17
Ratio: 16/17 = 0.94

How to Add Formulas to Excel using Python

You may be automating some exports to Excel with Python to compare files, or simply adding formulas to an Excel file before you open it up.

Here we explain two ways of adding the calculations to your Excel output: writing Excel formulas into specific cells, or calculating the values with Numpy and writing the results in.

Adding formulas to specific cells

First of all, let's look at a normal spreadsheet with some calculations where the formulas have been typed in by hand. The ultimate objective is to have the Python code do this for us, making it one less step.

As can be seen, the cells have the formulas in them, but this would be a very time-consuming process if you had to do it multiple times, in multiple spreadsheets.

To get around this we can write the Python logic as follows:

  1. Import the libraries, then create two Numpy arrays, a tuple of calculation labels and three dataframes as follows.
import numpy as np
import pandas as pd
from openpyxl import load_workbook

datasetA_list = np.array([1,2,3,4,5,6,7,8,9,10])

datasetB_list = np.array([9,8,65,43,3,21,3,2,1,7])

dataset_list = ('sum','average','median','standard deviation','count','correlation')


datasetA = pd.DataFrame(datasetA_list,columns=['ValueA'])
datasetB = pd.DataFrame(datasetB_list,columns=['ValueB'])
dataset_list_calcs = pd.DataFrame(dataset_list, columns=['Calcs'])

2. Next, create a path to the Excel file you are going to write to. Note that with this approach the file (output.xlsx here) needs to exist already, because we load it in the next step:

path = 'output.xlsx'

3. In this next step, open the existing workbook and make its sheets available to pandas, then write the data created in step 1 to particular locations on the spreadsheet. (Assigning workbook.book and workbook.sheets like this works with older pandas versions; in recent releases these attributes are managed by pandas itself.)

workbook = pd.ExcelWriter(path, engine='openpyxl')
workbook.book = load_workbook(path)
workbook.sheets = dict((ws.title,ws) for ws in workbook.book.worksheets)

datasetA.to_excel(workbook,sheet_name="Sheet1", startrow=1,index=False, header=True,)
datasetB.to_excel(workbook,sheet_name="Sheet1", startrow=1, startcol=2,index=False, header=True)
dataset_list_calcs.to_excel(workbook,sheet_name="Sheet1", startrow=1, startcol=4,index=False, header=True)

4. Load the formulas into cells beside their relevant headings. This places each formula next to the relevant heading created in step 1.

###Creating calculations for datasetA

sheet = workbook.sheets['Sheet1']
sheet['E2'] = 'CalcsA'
sheet['F3'] = '=SUM(A3:A12)'
sheet['F4'] = '=AVERAGE(A3:A12)'
sheet['F5'] = '=MEDIAN(A3:A12)'
sheet['F6'] = '=STDEV(A3:A12)'
sheet['F7'] = '=COUNT(A3:A12)'
sheet['F8'] = '=CORREL(A3:A12,C3:C12)'

###Creating calculations for datasetB

sheet = workbook.sheets['Sheet1']
sheet['H2'] = 'CalcsB'
sheet['H3'] = '=SUM(C3:C12)'
sheet['H4'] = '=AVERAGE(C3:C12)'
sheet['H5'] = '=MEDIAN(C3:C12)'
sheet['H6'] = '=STDEV(C3:C12)'
sheet['H7'] = '=COUNT(C3:C12)'
sheet['H8'] = '=CORREL(A3:A12,C3:C12)'

Use Numpy to create the calculations

a. Create the calculations that you will populate into the spreadsheet, using Numpy

a = np.sum(datasetA_list)
b = np.average(datasetA_list)
c = np.median(datasetA_list)
d = np.std(datasetA_list,ddof=1) ## ddof=1 gives the sample standard deviation, matching Excel's STDEV; the default ddof=0 would give a different figure
f = np.count_nonzero(datasetA_list)
g = np.corrcoef(datasetA_list,datasetB_list) ## corrcoef returns a 2x2 correlation matrix

b. Create the headings and assign them to particular cells

sheet['E14'] = 'Numpy Calculations'
sheet['E15'] = 'Sum'
sheet['E16'] = 'Average'
sheet['E17'] = 'Median'
sheet['E18'] = 'Standard Deviation'
sheet['E19'] = 'Count'
sheet['E20'] = 'Correlation'

c. Assign the variables in step a to a set of cells

sheet['F15'] = a
sheet['F16'] = b
sheet['F17'] = c
sheet['F18'] = d
sheet['F19'] = f
sheet['F20'] = str(g)  # g is a 2x2 matrix, so it is written as a string; g[0, 1] would give the single correlation value

d. Save the workbook and close it. This step is important; always include it.

workbook.save()
workbook.close()

And the final output looks like…

How to add sine and cosine in Python code

Explaining what this post is about
I was recently online providing help to a fellow Python coder, and a query came up about how you would rewrite some code that included the sine function and the cosine function.

We were asked to see if we could translate this code:

L_1=1;
L_2=1.5;
q_0=0.5;
q_1=pi()/4;
q_2=pi()/6;
%Position of the end effetor
x= q_0+L_1*cos(q_1)-L_2*cos(pi()-q_1-q_2)
y=L_1*sin(q_1)+L_2*sin(pi()-q_1-q_2)

into its Python equivalent.

Some background about the output

In order to get the desired result, we need to import a package that provides the mathematical functions required; this was achieved using Numpy.

This package allows the following functions to be used in the logic:

  • Pi (np.pi)
  • Cosine (np.cos)
  • Sine (np.sin)

And the result is

As a result, the below shows the output of the above question converted to its Python equivalent:

import numpy as np

a= np.pi
print("PI value is ", a)

L_1=1
L_2=1.5
q_0=0.5
q_1=a/4
q_2=a/6

print("L_1 value is",L_1)
print("L_2 value is",L_2)
print("q_0 value is",q_0)
print("q_1 value is",q_1)
print("q_2 value is",q_2)


x= (q_0+(L_1*(np.cos(q_1)))-(L_2*(np.cos(a-q_1-q_2))))
y= (L_1*(np.sin(q_1))+(L_2*(np.sin(a-q_1-q_2))))

print("x value is " , x)
print("y value is " , y)

with its output showing:

PI value is  3.141592653589793
L_1 value is 1
L_2 value is 1.5
q_0 value is 0.5
q_1 value is 0.7853981633974483
q_2 value is 0.5235987755982988
x value is  1.5953353488403288
y value is  2.15599552062015