Common Analytics Interview Questions
Question 1: Can you outline the various steps in an analytics project?
Broadly speaking, these are the steps. They may vary slightly depending on the type of problem, the data, the tools available, etc.

1. Problem definition
The first step is to understand the business problem. What problem are you trying to solve, and what is the business context? Very often, however, a client will simply give you a lot of data and ask you to do something with it. In that case you need to take a more exploratory look at the data. If the client does have a specific problem to tackle, the first step is to define and understand it clearly. You then need to convert the business problem into an analytics problem; in other words, you need to understand exactly what the model you build is going to predict. There is no point in building a fabulous model only to realise later that what it predicts is not what the business needs.

2. Data exploration
Once the problem is defined, the next step is to explore the data and become familiar with it. This is especially important when dealing with a completely new data set.

3. Data preparation
Now that you have a good understanding of the data, you need to prepare it for modelling: identify and treat missing values, detect outliers, transform variables, create binary variables if required, and so on. This stage is strongly influenced by the modelling technique you will use next. For example, regression involves a fair amount of data preparation, decision trees may need less, and clustering requires a different kind of preparation altogether.

4. Modelling
Once the data is prepared, you can begin modelling. This is usually an iterative process: run a model, evaluate the results, tweak your approach, run another model, evaluate again, and so on, until you have a model you are satisfied with or one you feel is the best possible result with the given data.

5. Validation
The final model (or perhaps the best 2-3 models) should then go through validation. In this process you test the model on a completely new data set, i.e. data that was not used to build the model. This ensures that the model is good in general and not just very good on the specific data used to build it (technically, this is called avoiding overfitting).

6. Implementation and tracking
The final model is chosen after validation. You then implement it and track the results to see how the model performs over time. In general, the accuracy of a model goes down over time; how quickly depends on how dynamic or static the variables are, and how dynamic or static the general environment is.

Question 2: What do you do in data exploration?
Data exploration is done to become familiar with the data. This step is especially important when dealing with new data. There are a number of things you will want to do in this step:
a. What is in the data: look at the list of all the variables in the data set and understand the meaning of each one using the data dictionary. Go back to the business for more information in case of any confusion.
b. How much data there is: look at the volume of the data (how many records) and its time frame (last 3 months, last 6 months, etc.).
c. Quality of the data: how much information is missing, and what is the quality of the data in each variable? Are all fields usable? If a field has data for only 10% of the observations, it may not be usable.
d. You will also identify some important variables and investigate them more deeply, for example their averages, minimum and maximum values, and perhaps the 10th and 90th percentiles as well.
e. You may also identify fields that you need to transform in the data preparation stage.

Question 3: What do you do in data preparation?
In data preparation, you prepare the data for the next stage, i.e. modelling. What you do here is influenced by the technique you choose for that stage, but some things are done in most cases: identifying and treating missing values, identifying and treating outliers (unusual values), transforming variables, creating binary variables if required, etc. This is also the stage where you partition the data.
That is, you create a training data set (for modelling) and a validation data set (for validation).

Question 4: How will you treat missing values?
The first step is to identify the variables with missing values and assess the extent of the missing data. Is there a pattern in the missing values? If yes, try to identify it; it may lead to interesting insights. If there is no pattern, you can either ignore the missing values (SAS modelling procedures such as PROC REG and PROC LOGISTIC exclude any observation that has a missing value in one of the model variables) or impute them:
a. Simple imputation: substitute the mean or median value.
b. Case-wise imputation: impute from similar records; for example, if the income field has missing values, fill them with the average income of customers with a similar profile.

Question 5: How will you treat outlier values?
You can identify outliers using graphical analysis and univariate analysis. If there are only a few outliers, you can assess them individually. If there are many, you may want to substitute the outlier values with the 1st percentile or the 99th percentile values. If there is a lot of data, you may decide to ignore the records with outliers.
Not all extreme values are outliers.
Not all outliers are extreme values.

Question 6: How do you assess the results of a logistic regression analysis?
You can use different methods to assess how good a logistic model is:
a. Concordance: tells you about the ability of the model to discriminate between the event happening and not happening.
b. Lift: helps you assess how much better the model is compared to random selection.
c. Classification matrix: helps you look at the true and false positives and the true and false negatives.

Some other general questions you will most likely be asked:
What have you done to improve your data analytics knowledge in the past year?
What are your career goals?
Why do you want a career in data analytics?
The answers to these questions have to be unique to the person answering them. The key is to show confidence and give well-thought-out answers that demonstrate you are knowledgeable about the industry and have the conviction to work hard and excel as a data analyst.
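To make the data preparation answers in Questions 3 to 5 concrete, here is a minimal SAS sketch. The data set CUSTOMERS and its values are hypothetical. It imputes missing INCOME with the median (Question 4), caps INCOME at its 1st and 99th percentiles (Question 5), and splits the rows roughly 70/30 into training and validation sets (Question 3). PROC STDIZE and PROC SURVEYSELECT belong to SAS/STAT, so the sketch assumes that product is available.

```sas
/* Hypothetical customer data: INCOME has missing values and one extreme value */
data customers;
   input id income;
   datalines;
1 42000
2 .
3 39000
4 51000
5 .
6 47000
7 950000
8 44000
;
run;

/* Question 4: simple imputation. REPONLY replaces only the missing values
   with the location measure (here the median); nothing is standardized */
proc stdize data=customers out=customers_imp method=median reponly;
   var income;
run;

/* Question 5: compute the 1st and 99th percentiles, stored as P1 and P99 */
proc univariate data=customers_imp noprint;
   var income;
   output out=pctl pctlpts=1 99 pctlpre=p;
run;

/* Cap INCOME at those percentiles */
data customers_capped;
   if _n_ = 1 then set pctl;   /* P1 and P99 are read once and retained on every row */
   set customers_imp;
   income = min(max(income, p1), p99);
   drop p1 p99;
run;

/* Question 3: flag a 70% random sample (variable SELECTED), then split */
proc surveyselect data=customers_capped out=split samprate=0.7 seed=12345 outall;
run;

data train valid;
   set split;
   if selected then output train;
   else output valid;
   drop selected;
run;
```

With only eight rows the 1st and 99th percentiles are essentially the minimum and maximum, so the capping step barely changes this toy data; on a realistically sized data set it pulls in the extreme tail.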
Macro Interview Questions (for Freshers)
1. Have you used macros? For what purpose have you used them?
Yes, I have. I used macros to create analysis data sets and tables where a small change has to be made throughout the program and where the same code has to be used again and again.
2. How would you invoke a macro?
After I have defined a macro, I can invoke it by adding the percent sign prefix to its name, like this: %macro-name (for example, %mymacro). A semicolon is not required when invoking a macro, though adding one generally does no harm.
3. How can you create a macro variable within a DATA step?
With CALL SYMPUT (or CALL SYMPUTX, which also removes leading and trailing blanks).
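As a small illustration of answer 3, the DATA step below stores the number of observations in SASHELP.CLASS (a sample data set shipped with SAS) in a macro variable. The name NOBS is arbitrary. CALL SYMPUTX is used here instead of CALL SYMPUT because it converts the numeric value without a conversion note and strips the surrounding blanks.

```sas
data _null_;
   set sashelp.class end=last;
   /* On the last iteration _N_ equals the number of rows read */
   if last then call symputx('nobs', _n_);
run;

/* The macro variable can be referenced once the DATA step has finished */
%put NOTE: SASHELP.CLASS has &nobs observations.;
```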
4. How would you identify a macro variable?
By the ampersand (&) prefix.
5. How would you define the end of a macro?
The end of a macro is defined by the %MEND statement.
6. For what purposes have you used SAS macros?
For example, to execute the same PROC step on multiple data sets. Macros let us accomplish repetitive tasks quickly and efficiently: a macro program can be reused many times, and parameters passed to it customize the results without having to change the code inside the macro. With macros we can make a small change in one place and have SAS echo that change throughout the program.
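A sketch of the first use mentioned in answer 6: one macro that runs the same PROC step on several data sets. The macro name MEANS_ALL and its parameter DSLIST are hypothetical. The list is space-separated, so %SCAN is given a blank as its only delimiter; its default delimiters include the period, which would split SASHELP.CLASS in two.

```sas
%macro means_all(dslist);
   %local i ds;
   /* COUNTW with a blank delimiter returns the number of data set names */
   %do i = 1 %to %sysfunc(countw(&dslist, %str( )));
      %let ds = %scan(&dslist, &i, %str( ));
      title "Summary of &ds";
      proc means data=&ds n mean min max;
      run;
   %end;
   title;   /* clear the title afterwards */
%mend means_all;

%means_all(sashelp.class sashelp.cars)
```

Adding a third data set only means adding its name to the call; the PROC MEANS code itself is written once.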
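7. What is the difference between %LOCAL and %GLOBAL?
%LOCAL creates macro variables in the local symbol table of the macro that is currently executing; they exist only while that macro runs. %GLOBAL creates macro variables in the global symbol table; they can be used anywhere in the session, including open code (outside any macro). Macro variables created in open code, for example with %LET, are global automatically.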
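The difference between the two symbol tables can be seen in a few lines. The macro DEMO and the variable names SCOPE and CREATED_INSIDE are hypothetical.

```sas
%let scope = global value;   /* open code: goes to the global symbol table */

%macro demo;
   %local scope;             /* a separate variable in DEMO's local table */
   %let scope = local value;
   %global created_inside;   /* forced into the global table */
   %let created_inside = visible after DEMO ends;
   %put Inside DEMO: &scope;
%mend demo;

%demo
%put After DEMO: &scope;            /* the local copy is gone; the global value remains */
%put After DEMO: &created_inside;
```

The log shows "local value" inside DEMO, then "global value" and "visible after DEMO ends" after it has finished.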
8. How long can a macro variable be? A token?
A macro variable value can be up to 65,534 characters long, and a macro variable name up to 32 characters. As for tokens: a component of SAS known as the word scanner breaks the program text into fundamental units called tokens. Tokens are passed on demand to the compiler, which requests tokens until it receives a semicolon and then performs a syntax check on the statement.
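9. If you use a SYMPUT in a DATA step, when and where can you use the macro variable?
A macro variable created by the CALL SYMPUT routine cannot be referenced with an ampersand in the same DATA step in which it is created, because macro references in a step are resolved when the step is compiled, before CALL SYMPUT executes. (Within that step its value can still be retrieved at execution time with the SYMGET function.) Once the step has finished, the macro variable can be used anywhere.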
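A minimal sketch of the timing rule in answer 9. The macro variable name FLAG is hypothetical.

```sas
data _null_;
   call symputx('flag', 'done');
   /* An ampersand reference to FLAG here would be resolved when this step
      is compiled, before CALL SYMPUTX runs. SYMGET looks the value up at
      execution time instead, so it already sees 'done'. */
   length x $ 8;
   x = symget('flag');
   put x=;
run;

/* After the step boundary the reference resolves normally */
%put flag=&flag;
```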
10. What do you code to create a macro? End one?
We create a macro with the %MACRO statement and end it with the %MEND statement.
11. What is the difference between %PUT and SYMBOLGEN?
%PUT is a macro statement that writes text, such as user-defined messages or the values of macro variables, to the SAS log when it executes. SYMBOLGEN is a system option (OPTIONS SYMBOLGEN;) that makes SAS write the resolved value of each macro variable reference to the log.
12. How do you add a number to a macro variable?
Use the %EVAL function for integer arithmetic, or the %SYSEVALF function if the numbers are floating point (non-integer).
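A short sketch of answer 12, showing why the choice between the two functions matters. The variable names are hypothetical; the values 9 and 5 match one of the certification questions further down.

```sas
%let value  = 9;
%let value2 = 5;

%let int_div   = %eval(&value / &value2);      /* integer division truncates: 1 */
%let float_div = %sysevalf(&value / &value2);  /* floating point: 1.8 */

%let counter = 1;
%let counter = %eval(&counter + 1);            /* add a number: 2 */

%let mixed = %sysevalf(0.5 + 5);               /* decimals need %SYSEVALF: 5.5 */

%put int_div=&int_div float_div=&float_div counter=&counter mixed=&mixed;
```

Passing a decimal such as 0.5 to %EVAL instead produces an error in the log rather than a number.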
13. Can you execute a macro within a macro? Describe.
Yes. Such macros are called nested macros: you simply invoke the inner macro inside the definition of the outer macro, and pass values to it through its parameters or through global macro variables.
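A minimal sketch of answer 13. The macros INNER and OUTER are hypothetical; SASHELP.CLASS and SASHELP.CARS are sample data sets shipped with SAS.

```sas
/* Inner macro: print the first N rows of a data set */
%macro inner(ds, n=3);
   title "First &n rows of &ds";
   proc print data=&ds(obs=&n);
   run;
   title;
%mend inner;

/* Outer macro: simply calls INNER, passing values through its parameters */
%macro outer;
   %inner(sashelp.class)
   %inner(sashelp.cars, n=5)
%mend outer;

%outer
```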
14. If you need the value of a variable rather than the variable itself, what would you use to load the value into a macro variable?
Use CALL SYMPUT (or CALL SYMPUTX) in a DATA step, or the INTO clause of a SELECT statement in PROC SQL. Both take the value of a data set variable at execution time and store it in a macro variable. %LET, by contrast, only stores the text you type: for example, %let A=xyz; %put x="&A"; writes x="xyz" to the log, because &A resolves to the text xyz, not to the value of a variable called xyz.
15. Can you execute a macro within another macro? If so, how would SAS know where the current macro ended and the new one began?
Yes. Executing a macro within another macro is called nesting macros, and it is allowed. Every macro definition begins with the %MACRO statement and ends with the %MEND statement, so SAS can tell where each one starts and ends.
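16. How are parameters passed to a macro?
A macro variable defined in parentheses in a %MACRO statement is a macro parameter; macro parameters allow you to pass information into a macro. Because YVAR= and XVAR= below are keyword parameters, their values must be passed by name when the macro is called:
%macro plot(yvar= ,xvar= );
   proc plot;
      plot &yvar*&xvar;
   run;
%mend plot;
%plot(yvar=age, xvar=sex)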
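For comparison, a sketch of the two parameter styles side by side. The macro names PLOT_POS and PLOT_KW are hypothetical, and SASHELP.CLASS is used as sample data.

```sas
/* Positional parameters: values are matched by their order in the call */
%macro plot_pos(ds, yvar, xvar);
   proc plot data=&ds;
      plot &yvar*&xvar;
   run;
%mend plot_pos;

%plot_pos(sashelp.class, weight, height)

/* Keyword parameters: matched by name, may have defaults, any order */
%macro plot_kw(ds=sashelp.class, yvar=weight, xvar=height);
   proc plot data=&ds;
      plot &yvar*&xvar;
   run;
%mend plot_kw;

%plot_kw(xvar=age)   /* WEIGHT by AGE; DS and YVAR keep their defaults */
```

Keyword parameters are more verbose to call but let you rely on defaults and make calls self-documenting; positional parameters are shorter when every value must be supplied anyway.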
17. How would you code a macro statement to produce information on the SAS log? Can this statement be coded anywhere?
OPTIONS MPRINT MLOGIC MERROR SYMBOLGEN;
Yes. OPTIONS is a global statement, so it can be coded anywhere in the program.
Advanced SAS Certification Questions
Which data set options control input/output buffering?
Ans. BUFSIZE= and BUFNO=
The following SAS program is submitted:
%macro execute;
   proc print data=sasuser.houses;
   run;
%end;
%mend;
%execute
Which statement completes the program so that it executes on Tuesday?
a) %if &sysday=Tuesday %then %do;
b) %if &sysday=Tuesday %then %do;
c) %if &sysdate= Tuesday %then %do;
d) %if &sysdate=Tuesday %then %do;
Assume today is Tuesday, August 15, 2006. Which statement, submitted at the beginning of a SAS session, assigns the value Tuesday, August 15, 2006 to the macro variable START?
a) %let start=%eval(today(), weekdate.);
b) %let start=%sysfunc(today(), weekdate.);
c) %let start=%sysexec(today(), weekdate.);
d) %let start=%sysevalf(today(), weekdate.);
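%SYSFUNC is the macro function that calls a DATA step function (here TODAY()) and, optionally, formats the result. A minimal sketch; the variable name START_SHORT is hypothetical.

```sas
/* Call TODAY() from macro code and format it with WEEKDATE. */
%let start = %sysfunc(today(), weekdate.);
%put start=&start;

/* Any SAS format can be used, for example DATE9. */
%let start_short = %sysfunc(today(), date9.);
%put start_short=&start_short;
```

%LET removes the leading blanks that the right-aligned WEEKDATE. format produces, so START holds just the text of the date.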
The following program is submitted:
%let value=0.5;
%let add=5;
%let newval=%eval(&value+&add);
What is the value of the macro variable NEWVAL?
a) 5
b) 5.5
c) 0.5+5
d) null
The SAS data set ONE has a variable X on which an index has been created. The data sets ONE and THREE are sorted by X. The following SAS program is submitted:
data two;
   set three;
   set one key=X;
run;
What is the purpose of including the KEY= option in the program?
a) It forces SAS to use the index X.
b) It re-creates the index X on the output data set TWO.
c) It instructs SAS to do a sequential read of both sorted data sets.
d) It gives SAS the option to use the index X or to do a sequential read of the data set ONE.
The following SAS program is submitted:
data new(bufsize=6144 bufno=4);
   set old;
run;
What is the difference between the usage of the BUFSIZE= and BUFNO= options?
a) BUFSIZE= specifies the size of the input buffer in bytes; BUFNO= specifies the number of input buffers.
b) BUFSIZE= specifies the size of the output buffer in bytes; BUFNO= specifies the number of output buffers.
c) BUFSIZE= specifies the size of the input buffer in kilobytes; BUFNO= specifies the number of input buffers.
d) BUFSIZE= specifies the size of the output buffer in kilobytes; BUFNO= specifies the number of output buffers.
Given the data set SASHELP.CLASS:
SASHELP.CLASS
NAME      AGE
-------   ---
Mary      15
Philip    16
Robert    12
Ronald    15
The following SAS program is submitted:
%let value = Philip;
proc print data = sashelp.class;
run;
Which WHERE statement successfully completes the program and produces a report?
a) where upcase(name) = upcase(&value);
b) where upcase(name) = %upcase(&value);
c) where upcase(name) = "upcase(&value)";
d) where upcase(name) = "%upcase(&value)";
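Why quoting matters here: macro references resolve inside double quotes but not inside single quotes, and %UPCASE runs before the step is compiled, so the WHERE clause SAS actually sees is a comparison with the literal "PHILIP". A minimal sketch:

```sas
%let value = Philip;

proc print data=sashelp.class;
   /* After macro resolution this reads: where upcase(name) = "PHILIP"; */
   where upcase(name) = "%upcase(&value)";
run;
```

Without the quotes, the resolved text PHILIP would be treated as a variable name rather than a character constant.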
The following SAS program is submitted:
data combine;
   merge one two;
   by id;
run;
Which SQL procedure program produces the same results?
A. proc sql;
      create table combine as
      select coalesce(one.id, two.id) as id, name, salary
      from one full join two
      on one.id = two.id;
   quit;
B. proc sql;
      create table combine as
      select one.id, name, salary
      from one inner join two
      on one.id = two.id;
   quit;
C. proc sql;
      create table combine as
      select coalesce(one.id, two.id) as id, name, salary
      from one, two
      where one.id = two.id;
   quit;
D. proc sql;
      create table combine as
      select one.id, name, salary
      from one full join two
      where one.id = two.id;
   quit;
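The equivalence between a match-merge and a full join with COALESCE can be checked directly. A minimal sketch with hypothetical data sets ONE and TWO in which each ID occurs at most once per data set; with duplicate IDs (many-to-many matches) a DATA step merge and an SQL join produce different results.

```sas
data one;
   input id name $;
   datalines;
1 Ann
2 Bob
4 Dee
;
run;

data two;
   input id salary;
   datalines;
1 50000
3 42000
4 61000
;
run;

/* DATA step match-merge (both inputs already sorted by ID) */
data combine_ds;
   merge one two;
   by id;
run;

/* Full join keeps unmatched rows from both sides;
   COALESCE takes whichever ID is not missing */
proc sql;
   create table combine_sql as
   select coalesce(one.id, two.id) as id, name, salary
      from one full join two
      on one.id = two.id
      order by id;
quit;

proc compare base=combine_ds compare=combine_sql;
run;
```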
Given the SAS data sets CLASS1 and CLASS2:
CLASS1                    CLASS2
NAME      COURSE          NAME      COURSE
--------  ------          --------  ------
Lauren    MATH1           Smith     MATH2
Patel     MATH1           Farmer    MATH2
Chang     MATH1           Patel     MATH2
                          Hillier   MATH2
The following SAS program is submitted:
proc sql;
   select name from CLASS1
   select name from CLASS2;
quit;
The following output is desired:
NAME
--------
Chang
Lauren
Which SQL set operator completes the program and generates the desired output?
A. UNION
B. EXCEPT
C. INTERSECT
D. OUTER UNION CORR
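EXCEPT keeps the rows returned by the first query that do not appear in the second and removes duplicates. A minimal sketch that rebuilds CLASS1 and CLASS2 from the tables above; the output table name ONLY_CLASS1 is hypothetical.

```sas
data class1;
   input name $ course $;
   datalines;
Lauren MATH1
Patel MATH1
Chang MATH1
;
run;

data class2;
   input name $ course $;
   datalines;
Smith MATH2
Farmer MATH2
Patel MATH2
Hillier MATH2
;
run;

/* Names in CLASS1 that are not in CLASS2: Patel is removed */
proc sql;
   create table only_class1 as
   select name from class1
   except
   select name from class2;
quit;
```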
The following SAS program is submitted:
%macro loop;
   data one;
   %do I = 1 %to 3;
      var&I = &i;
   %end;
   run;
%mend;
%loop
After this program executes, the following is written to the SAS log:
(LOOP): Beginning execution.
(LOOP): %DO loop beginning; index variable I; start value is 1; stop value is 3; by value is 1.
(LOOP): %DO loop index variable I is now 2; loop will iterate again.
(LOOP): %DO loop index variable I is now 3; loop will iterate again.
(LOOP): %DO loop index variable I is now 4; loop will not iterate again.
(LOOP): Ending execution.
Which SAS System option displays the notes in the SAS log?
A. MACRO
B. MLOGIC
C. MPRINT
D. SYMBOLGEN
The following SAS program is submitted:
data temp;
   array points{2,3} (10, 15, 20, 25, 30, 35);
run;
What impact does the ARRAY statement have in the Program Data Vector (PDV)?
A. The variables named POINTS1, POINTS2, POINTS3, POINTS4, POINTS5, POINTS6 are created in the PDV.
B. The variables named POINTS10, POINTS15, POINTS20, POINTS25, POINTS30, POINTS35 are created in the PDV.
C. The variables named POINTS11, POINTS12, POINTS13, POINTS21, POINTS22, POINTS23 are created in the PDV.
D. No variables are created in the PDV.
Which SAS integrity constraint type ensures that a specific set or range of values are the only values in a variable?
A. CHECK
B. UNIQUE
C. NOT NULL
D. PRIMARY KEY

The following SAS program is submitted:
data new (bufsize = 6144 bufno = 4);
   set old;
run;
What is the difference between the usage of BUFSIZE= and BUFNO= options?
A. BUFSIZE= specifies the size of the input buffer in bytes; BUFNO= specifies the number of input buffers.
B. BUFSIZE= specifies the size of the output buffer in bytes; BUFNO= specifies the number of output buffers.
C. BUFSIZE= specifies the size of the input buffer in kilobytes; BUFNO= specifies the number of input buffers.
D. BUFSIZE= specifies the size of the output buffer in kilobytes; BUFNO= specifies the number of output buffers.
The following SAS program is submitted:
%let first = yourname;
%let last = first;
%put &&&last;
What is written to the SAS log?
A. First
B. &&first
C. yourname
D. &yourname

Given the following SAS data set ONE:
ONE
REP      COST
-----    ----
SMITH     200
SMITH     400
JONES     100
SMITH     600
JONES     100
JONES     200
JONES     400
SMITH     800
JONES     100
JONES     300
The following SAS program is submitted:
proc sql;
   select rep, avg(cost) as AVERAGE
      from one
      group by rep
      having avg(cost) > (select avg(cost) from one);
quit;
Which one of the following reports is generated?
A. REP      AVERAGE
   JONES    200
B. REP      AVERAGE
   JONES    320
C. REP      AVERAGE
   SMITH    320
D. REP      AVERAGE
   SMITH    500

The following SAS program is submitted:
%let value = 9;
%let value2 = 5;
%let newval = %eval(&value / &value2);
Which one of the following is the resulting value of the macro variable NEWVAL?
A. 1
B. 2
C. 1.8
D. null
The SAS data set ONE has a variable X on which an index has been created. The data sets ONE and THREE are sorted by X. Which one of the following SAS programs uses the index to select observations from the data set ONE?
A. data two; set three; set one key = X; run;
B. data two; set three key = X; set one; run;
C. data two; set one; set three key = X; run;
D. data two; set three; set one (key = X); run;
The following SAS program is submitted:
proc sql;
   select rep, area, count(*) as TOTAL
      from one
      group by rep, area;
quit;
Which one of the following reports is generated?
A. REP      AREA     COUNT
   JONES    EAST     100
   JONES    NORTH    600
   JONES    WEST     500
   SMITH    NORTH    800
   SMITH    SOUTH    200
B. REP      AREA     TOTAL
   JONES    EAST     100
   JONES    NORTH    600
   JONES    WEST     500
   SMITH    NORTH    800
   SMITH    SOUTH    200
C. REP      AREA     TOTAL
   JONES    EAST     1
   JONES    NORTH    2
   JONES    WEST     3
   SMITH    NORTH    3
   JONES    WEST     3
   SMITH    NORTH    3
   SMITH    SOUTH    1
D. REP      AREA     TOTAL
   JONES    EAST     1
   JONES    NORTH    2
   JONES    WEST     3
   SMITH    NORTH    3
   SMITH    SOUTH    1
   SMITH    NORTH    3
   SMITH    SOUTH    1
The following SAS program is submitted:
data temp;
   array points{3,2} _temporary_ (10, 20, 30, 40, 50, 60);
   score = points{2,1};
run;
Which one of the following is the value of the variable SCORE in the data set TEMP?
A. 10
B. 20
C. 30
D. 40
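Two small DATA steps make the array questions concrete. The first shows that an ARRAY statement with no variable names creates POINTS1-POINTS6; the second shows that a multidimensional array is filled row by row, so with three rows of two elements POINTS{2,1} is the third initial value. The data set name TEMP2 is hypothetical.

```sas
data temp;
   /* No variable names listed: SAS creates POINTS1-POINTS6 in the PDV */
   array points{2,3} (10, 15, 20, 25, 30, 35);
run;

data temp2;
   /* _TEMPORARY_ elements exist only while the step runs and are not output */
   array points{3,2} _temporary_ (10, 20, 30, 40, 50, 60);
   /* Row 1 = 10 20, row 2 = 30 40, row 3 = 50 60 */
   score = points{2,1};
run;
```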
The following SAS program is submitted:
%macro execute;
   proc print data = sasuser.houses;
   run;
%end;
%mend;
Which of the following completes the above program so that it executes on Tuesday?
A. %if &sysday = Tuesday %then %do;
B. %if &sysday = 'Tuesday' %then %do;
C. %if "&sysday" = Tuesday %then %do;
D. %if '&sysday' = 'Tuesday' %then %do;
Which one of the following SAS integrity constraint types ensures that a specific set or range of values are the only values in a variable?
A. CHECK
B. UNIQUE
C. FORMAT
D. DISTINCT
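A CHECK constraint can be added with the IC CREATE statement of PROC DATASETS. A minimal sketch; the data set GRADES, its values and the constraint name VALID_GRADE are hypothetical.

```sas
data grades;
   input student $ grade $;
   datalines;
Ann A
Bob C
;
run;

proc datasets library=work nolist;
   modify grades;
   /* From now on only these GRADE values are accepted */
   ic create valid_grade = check(where=(grade in ('A', 'B', 'C', 'D', 'F')))
      message='GRADE must be A, B, C, D or F';
quit;

/* The output lists the integrity constraints, including VALID_GRADE */
proc contents data=grades;
run;
```

Once the constraint exists, an attempt to add a row with, say, GRADE='Z' through PROC SQL INSERT or PROC APPEND is rejected with an error.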
Which one of the following options displays the value of a macro variable in the SAS log?
A. MACRO
B. SOURCE
C. SOURCE2
D. SYMBOLGEN
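What is the correct syntax to create a macro variable with PROC SQL?
proc sql noprint;
   select distinct country into :cur separated by ' ' from tablename;
quit;
(The quoted delimiter, here a blank, is written after SEPARATED BY.)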
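A runnable version of the INTO syntax, using the sample data set SASHELP.CARS in place of the placeholder table; the macro variable name ORIGINS is hypothetical. ORDER BY makes the order of the values in the list predictable.

```sas
proc sql noprint;
   select distinct origin
      into :origins separated by ' '
      from sashelp.cars
      order by origin;
quit;

%put origins=&origins;
```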
The following SAS program is submitted:
options yearcutoff = 1950;
%macro y2kopt(date);
   %if &date >= 14610 %then %do;
      options yearcutoff = 2000;
   %end;
   %else %do;
      options yearcutoff = 1900;
   %end;
%mend;
data _null_;
   date = "01jan2000"d;
   call symput("date", left(date));
run;
%y2kopt(&date)
The SAS date for January 1, 2000 is 14610 and the SAS system option for YEARCUTOFF is set to 1920 prior to submitting the above program. Which one of the following is the value of YEARCUTOFF when the macro finishes execution?
A. 1900
B. 1920
C. 1950
D. 2000
Check the syntax: what will happen when we submit this program?
data aa;
   length x y 5 z;
run;
Answer: the data set will not be created. The LENGTH statement gives no length for Z, so the step stops with a syntax error.
Which one of the following statements about compressed SAS data sets is always true?
A. Each observation is treated as a single string of bytes.
B. Each observation occupies the same number of bytes.
C. An updated observation is stored in its original location.
D. New observations are added to the end of the SAS data set.
Given the following SAS data set ONE:
ONE
LEVEL    AGE
-----    ---
1        10
2        20
3        20
2        10
1        10
2        30
3        10
2        20
3        30
1        10
The following SAS program is submitted:
proc sql;
   select level, max(age) as MAX
      from one
      group by level
      having max(age) > (select avg(age) from one);
quit;
Which one of the following reports is generated?
A. LEVEL    AGE
   2        20
   3        20
B. LEVEL    AGE
   2        30
   3        30
C. LEVEL    MAX
   2        20
   3        30
D. LEVEL    MAX
   2        30
   3        30
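The HAVING-with-subquery pattern can be checked by running it on the LEVEL/AGE data above. The subquery computes one overall mean (170/10 = 17), and HAVING then compares each group's largest AGE (10, 30 and 30 for levels 1, 2 and 3) against it. The output table name HIGH_LEVELS is hypothetical.

```sas
data one;
   input level age;
   datalines;
1 10
2 20
3 20
2 10
1 10
2 30
3 10
2 20
3 30
1 10
;
run;

proc sql;
   create table high_levels as
   select level, max(age) as max
      from one
      group by level
      having max(age) > (select avg(age) from one);
quit;
```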
The following SAS program is submitted:
filename sales ('external-file1' 'external-file2');
data new;
   infile sales;
   input date date9. company $ revenue;
run;
Which one of the following is the result of including the FILENAME statement in this program?
A. The FILENAME statement produces an ERROR message in the SAS log.
B. The FILENAME statement associates SALES with external-file2 followed by external-file1.
C. The FILENAME statement associates SALES with external-file1 followed by external-file2.
D. The FILENAME statement reads record 1 from external-file1, reads record 1 from external-file2, and combines them into one record.
Which techniques can be used to find the unique values in a data set?
FIRST. and LAST. variables with a BY statement, PROC SQL with SELECT DISTINCT (or UNIQUE), or PROC SORT with the NODUPKEY option.
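The three techniques side by side, each returning the distinct values of SEX in SASHELP.CLASS. The output data set names are hypothetical.

```sas
/* 1. PROC SORT with NODUPKEY keeps the first row for each BY value */
proc sort data=sashelp.class(keep=sex) out=u_sort nodupkey;
   by sex;
run;

/* 2. PROC SQL with DISTINCT */
proc sql;
   create table u_sql as
   select distinct sex from sashelp.class;
quit;

/* 3. FIRST. variable with BY-group processing (input must be sorted by SEX) */
proc sort data=sashelp.class(keep=sex) out=class_sorted;
   by sex;
run;

data u_first;
   set class_sorted;
   by sex;
   if first.sex;   /* keep only the first row of each BY group */
run;
```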
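Where can't we use the NOTSORTED option?
With MERGE: the NOTSORTED option of the BY statement cannot be used with the MERGE (or UPDATE) statement.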
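A data set has a variable CODE with the values C, M, A, R, P (in that order). What happens when this program is submitted?
proc print data=dataset-name;
   by code;
run;
Answer: no output will print; the data are not sorted by CODE, so the BY statement causes an error.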
Which statement is used to write data to an external file?
The FILE statement (used together with the PUT statement).
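A minimal sketch of FILE and PUT working together. The fileref OUTF is hypothetical; FILENAME ... TEMP asks SAS for a temporary file, so no real path is needed.

```sas
filename outf temp;

data _null_;
   set sashelp.class;
   file outf;       /* route PUT output to the external file */
   put name age;    /* list output: values separated by a blank */
run;

/* Read the file back and echo the first three lines to the log */
data _null_;
   infile outf obs=3;
   input;
   put _infile_;
run;
```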
Which options display the generated macro code and the macro execution details in the log window?
MPRINT (the SAS code generated by the macro) and MLOGIC (the macro execution flow).
When you use a DATA step view, when are messages written to the log?
Both times: when the view is created and when it is executed.