Life saver tip for comparing PROC SQL join with SAS data step merge

By Charu Shankar on SAS Learning Post May 27, 2015 Topics | Learn SAS Programming Tips

“Phew! This tip alone was a life saver,” said a student in one of my SAS SQL classes. “Before, I would have to read about ten Google search results before I could find content of the sort you shared in class.”

That student was referring to the tip I shared: the slides that compare and contrast the SAS data step merge with the PROC SQL join. Since this caught the attention of the entire class as well, I thought it would be helpful to share it with you too, my readers.

As a SAS instructor, I’m very fortunate to rub shoulders with some of the most brilliant minds in the world. You guessed right -- SAS users are one brilliant bunch. The sheer amount of work they do to rationalize code and processes so that business can become more efficient is bound to take anyone’s breath away. They need to be admired, put on a pedestal and worshiped for the value they bring businesses!

I love to see the value users get from taking a SAS training course. They look around for all kinds of resources to help manage their SAS jobs, whether online, peer-to-peer support or coming to take a class. That’s why they appreciate the tips they receive in class. As instructors, it’s rewarding to see how customers benefit from our training. It’s well worth the years of ongoing research we do. Especially breathtaking is to watch how customers light up with the knowledge that they can now apply at work.

We also have a video on this topic, check it out:

Did you find my comparison of the PROC SQL join with the data step merge useful? Do you have any visuals that you go to time and again that you’d like to share? Have you taken a SAS training class yet? I’d love to hear from you, and hope to see you in class soon!

About Author

Charu Shankar
Technical Training Specialist

Charu Shankar has been a Technical Training Specialist with SAS since 2007. She started part-time, and has taught computer languages, business and English Language skills. At SAS, Charu teaches the SAS language, SQL, SAS Enterprise Guide and Business Intelligence. She interviews clients to recommend the right SAS training to help them meet their needs. She is helping build a center for special needs kids in this project. http://www.handicareintl.org/pankaja/pankaja.swf

14 Comments

Peter Wall on Month 10, 2015 11:57 am

It's great to see the interest triggered by my comment. I especially like Chris Graffeuille's comment, "These diagrams are correct when the set of the key (ID) variables are not unique in *at most one* table." Very well stated and absolutely correct. I also liked the statements by Anders Skollermo concerning many-to-one and many-to-many joins, which are also quite correct.

In my many years of working in IT, I have far too often seen SQL code that did not quite accomplish what was intended - records were unintentionally dropped due to incorrect coding of the where clause (note that an SQL where clause can be used to imply an inner join) or, worse still, records were unintentionally duplicated due to improper key selection. I tend to prefer the merge construct over SQL, particularly when developing complex joins involving multiple tables with complex rules for inclusion/exclusion in the resultant data, simply for the reason that it is possible to consider all records and output the "exclusions" to a separate file so that they may be reviewed. I have received comments from some that such techniques are inefficient compared to SQL processing, however that is very rarely the case if all factors are considered.

That is not to say I don't use SQL. I have used it quite extensively as well. The moral here, I suppose, is that you must know your data so that the result is what is expected!

Concerning the comment by Aaron Dukes, that would be a many-to-many join situation in SQL. What the merge construct does, within a BY group, is a one-to-one join for each of the observations in the table that has the "least" number of observations, and it then uses the last observation in that table to join with the remaining unmatched observations in the other table. For example, if table A contains 3 observations and table B contains 5 observations, then obs 1 of table A is joined with obs 1 of table B, obs 2 of A with obs 2 of B, obs 3 of A with obs 3 of B, obs 3 of A with obs 4 of B, and obs 3 of A with obs 5 of B. It does not matter which table has the least number of observations. A one-to-one join is done on the first two observations, while a one-to-many is done on the last three observations in this case.
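
The 3-versus-5 case described above can be reproduced with a small sketch. The data set names (three_obs, five_obs, merged), the key id, and the variables a_obs and b_obs are made up for illustration and do not come from the discussion.

```sas
/* Hypothetical data: a single BY group (id=1) with 3 rows in one table and 5 in the other */
data three_obs;
  id = 1;
  do a_obs = 1 to 3;
    output;
  end;
run;

data five_obs;
  id = 1;
  do b_obs = 1 to 5;
    output;
  end;
run;

/* Both inputs are already in id order, so no PROC SORT is needed here */
data merged;
  merge three_obs five_obs;
  by id;
run;

proc print data=merged noobs;
run;
```

The printed result should have five rows, with a_obs stuck at 3 for the rows where b_obs is 4 and 5, since the last observation of the shorter group is carried forward.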

Happy SAS programming, one and all!
Sinead Kennedy-Guy on June 27, 2015 6:13 pm

Hi, love this page on joins. So nicely explained for those of us who are learning. I have been trying to do a full outer join but rows from the right table are coming out as null. I will try the proc sql statement with a where clause for records in a not equal to b.
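
The original query is not shown, so the following is only a guess at the situation: a frequent cause of null keys in a full outer join is selecting the key from one side only. A minimal sketch, with made-up tables demo_left and demo_right, uses COALESCE to take the key from whichever side has it, and a second query keeps only the rows that did not match.

```sas
/* Hypothetical tables: id 1 exists only on the left, id 3 only on the right */
data demo_left;
  input id x;
  datalines;
1 10
2 20
;
run;

data demo_right;
  input id y;
  datalines;
2 200
3 300
;
run;

proc sql;
  /* Full outer join: COALESCE fills the key from whichever table supplied the row */
  create table full_joined as
  select coalesce(l.id, r.id) as id, l.x, r.y
  from demo_left as l
       full join demo_right as r
       on l.id = r.id;

  /* Rows present in only one of the two tables */
  create table unmatched as
  select coalesce(l.id, r.id) as id, l.x, r.y
  from demo_left as l
       full join demo_right as r
       on l.id = r.id
  where l.id is null or r.id is null;
quit;
```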
Aaron Dukes on June 15, 2015 1:30 pm

Many-to-Many: SQL produces a Cartesian join, whereas the result from merge is not intuitive. Example:

/* Five rows for byvar=1 */
data a;
  byvar=1;
  do a=1 to 5;
    output;
  end;
run;

/* Two rows for the same byvar=1 */
data b;
  byvar=1;
  do b=1 to 2;
    output;
  end;
run;

proc sql noprint;
  create table sqljoin as
  select a.byvar as a_byvar, b.byvar as b_byvar, a.a, b.b
  from a inner join b
  on a.byvar = b.byvar
  ;
quit;

data merge_a_b;
  merge a(in=in_a) b(in=in_b);
  by byvar;
  if in_a and in_b;
run;

Running this code on SAS 9.4 on my 64-bit Windows machine produces the following results:

The dataset SQLJOIN has the following 10 observations:

A B
= =
1 1
1 2
2 1
2 2
3 1
3 2
4 1
4 2
5 1
5 2

The dataset merge_a_b has the following 5 observations:

A B
= =
1 1
2 2
3 2
4 2
5 2

Due to this behavior, I try to ensure that no more than one dataset in a merge has multiple records with the same set of values for the BY variables. And since my attempts to ensure this are not always successful, I make a point of looking in my SAS log for the following note:

"NOTE: MERGE statement has more than one data set with repeats of BY values."

I treat the appearance of that note as a bug in my code that must be resolved.

These situations most often arise when the programmer forgets to include an additional key variable in his ON or BY list. While the SQL result makes more intuitive sense in response to the code submitted, it too would not have been what the programmer wanted - the computer is not a mind-reader.
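
One way to look for repeated BY values before merging, rather than waiting for the log note, is a grouped count. The macro name dup_keys and its parameters are hypothetical; the sketch assumes a single key variable and the data sets a and b merged in the example above.

```sas
/* Hypothetical helper: list key values that occur more than once in a data set */
%macro dup_keys(ds=, key=);
  title "Repeated values of &key in &ds";
  proc sql;
    select &key, count(*) as n_rows
    from &ds
    group by &key
    having count(*) > 1;
  quit;
  title;
%mend dup_keys;

/* If both calls list the same key value, the merge is many-to-many for that value */
%dup_keys(ds=a, key=byvar)
%dup_keys(ds=b, key=byvar)
```

When a data set has no repeated keys, PROC SQL prints nothing for it and only writes a note to the log that no rows were selected.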

Hope this adds clarity.

Aaron Dukes, Principal Scientist,
IDeaS, a SAS Company
Anders Skollermo on May 31, 2015 1:55 pm

Hi! Some further comments:
If the ID is unique, then I think that SQL and Merge are of equal value.

If the ID is not unique, then you have two different situations:
many-to-one - which can be handled in a good way by Merge. The IN= data set option can be used to control the behaviour further.

many-to-many - this can also be handled by Merge in a similar way.
It is important to remember that Merge with IN is only a tool to try to help the user achieve what he/she wants. Merge and SQL are NOT the solution, they are tools that may be useful, when used in the right way. It is up to the user to find out what is the right way.
The user MUST, at least after a while, have a good and gradually better and better knowledge and understanding of how he/she really wants the observations to be combined.

Perhaps the best solution is to restart. Add one or more variables to the ID-variable combination. Re-sort the table. Examine and verify in a data step that the new ID-variable combination is unique.
Then proceed as described above.
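
The verification step could look something like the sketch below. The table mytable and the key variables id and visit are hypothetical; the idea is that a key combination is unique exactly when its row is both the first and the last in its BY group.

```sas
/* Hypothetical table: the combination id=2, visit=1 appears twice */
data mytable;
  input id visit value;
  datalines;
1 1 10
1 2 11
2 1 20
2 1 21
;
run;

/* Re-sort by the extended ID-variable combination */
proc sort data=mytable out=mytable_srt;
  by id visit;
run;

/* Split rows into unique and non-unique key combinations */
data unique_ok not_unique;
  set mytable_srt;
  by id visit;
  if first.visit and last.visit then output unique_ok;
  else output not_unique;
run;
```

If not_unique ends up with zero observations, the new key combination is unique and the merge can go ahead.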
I hope that this comment can give some further insight.
/ Br Anders Sköllermo Ph.D., Actuary "Retired, but not tired!"
Chris Graffeuille on May 30, 2015 3:12 am

We can't emphasize Peter's point enough, though he is not quite right.
These diagrams are correct when the set of the key (ID) variables are not unique in *at most one* table. With non-unique IDs in both tables, the similarity to SQL stops.

The data step allows fine-grained manipulation and reporting, for example:
- if this is the second row for the same ID, process it differently and output to a different table
- process inner, outer, left, and right join conditions at once by just testing IN variables
- report how many rows were read, added, merged, rejected etc
Anders Skollermo on May 29, 2015 3:50 pm
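
A minimal sketch of the second and third points follows, assuming two made-up tables demo_l and demo_r keyed by id. A single pass over the merge routes matched and unmatched rows to separate data sets by testing the IN= variables, and writes row counts to the log at the end.

```sas
/* Hypothetical inputs: id 1 only in demo_l, id 3 only in demo_r */
data demo_l;
  input id x;
  datalines;
1 10
2 20
;
run;

data demo_r;
  input id y;
  datalines;
2 200
3 300
;
run;

data both left_only right_only;
  merge demo_l(in=in_l) demo_r(in=in_r) end=last_row;
  by id;
  /* Sum statements keep running counts across observations */
  if in_l and in_r then do;
    output both;
    n_both + 1;
  end;
  else if in_l then do;
    output left_only;
    n_left + 1;
  end;
  else do;
    output right_only;
    n_right + 1;
  end;
  /* Report the counts once, after the last merged observation */
  if last_row then
    put 'Matched: ' n_both 'Left only: ' n_left 'Right only: ' n_right;
  drop n_both n_left n_right;
run;
```

The inner join is then both, the left join is both plus left_only, and the full join is all three outputs together.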

1) Peter Wall is QUITE right. Please note that with Merge the user can check, using the IN= data set option, how multiple observations with the same BY-value shall be treated.
I do NOT know exactly what SQL will do. REALLY good point!

2) Lex Jansen is correct in the sense that the SEUGI paper is available. Many thanks for that!!
This paper was the starting point for a published paper in SAS Observations online. I worked on this paper together with SAS Institute (and got 150 dollars - "about one dollar per hour").
I think that the Observations paper is better.

Moreover, I later found out that there is a (small) error in both versions. So the (hopefully) correct version is available from me.
This also shows the amazing difficulties in getting it EXACTLY correct.
Lex Jansen on May 29, 2015 1:40 pm

The paper Anders mentioned is still available:
http://www.sascommunity.org/seugi/SEUGI1995/The%20Importance%20of%20IN.pdf

(go to: http://www.lexjansen.com/cgi-bin/saspapers_query.php and search for "Skollermo")
Peter Wall on May 29, 2015 10:49 am

A caveat should be provided: these diagrams depend upon the values of the key (ID) variables being unique in both tables!
Anders Skollermo on May 28, 2015 5:35 pm

Hi! I agree with Charu. Both SQL and Merge have their pluses and minuses. My suggestion:
* If your problem is solved in a complete way by SQL, then you can use it.
* As an alternative - If your problem is solved in a complete way by Merge as described above then you can use that instead.

Please note that SQL views and Data step views can be used to give the variables the names that you like, and also to put them in the required order. The computing cost of using a view is very small, even for enormous sets of data.
If you have more difficult problems in combining two tables, I suggest that you use Merge with the IN= data set option.
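
As a rough illustration of the point about views, the sketch below renames and reorders the columns of a made-up table scores once with a PROC SQL view and once with a DATA step view. All table, view, and variable names are hypothetical.

```sas
/* Hypothetical source table */
data scores;
  input id score1 score2;
  datalines;
1 90 85
2 70 75
;
run;

/* SQL view: the SELECT list sets both the names and the column order */
proc sql;
  create view scores_v as
  select id     as student_id,
         score2 as final_score,
         score1 as midterm_score
  from scores;
quit;

/* DATA step view: RENAME= sets the names, RETAIN before SET sets the order */
data scores_dv / view=scores_dv;
  retain student_id final_score midterm_score;
  set scores(rename=(id=student_id score1=midterm_score score2=final_score));
run;

proc print data=scores_v noobs;
run;

proc print data=scores_dv noobs;
run;
```

Neither view copies the data; the rows are read from scores each time the view is used.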

I wrote a paper documenting the IN= option many years ago together with SAS Institute: "The Importance of the IN= Data Set Option in Merging Data Sets".
I am planning to write a new paper, describing very complex combinations of SAS tables. Please send me an email at [email protected] if you have any such problem.
christopher bennett on May 28, 2015 12:01 pm

Very useful. Thanks for sharing.
- Satish Vorkady on May 29, 2015 7:20 am
  
  Yes, super useful stuff.
Srinivas on May 28, 2015 2:13 am

Which one is more efficient (SQL join or MERGE)?
How does it work in the backend?
- charu on May 28, 2015 10:31 am
  
  Hi Srinivas, thanks for your comment. Both have their differing advantages. I like SQL joins as PROC SQL affords more flexibility with column names etc. I like data step merges for the extensive data step manipulation that only a data step offers; things like hash objects, arrays etc. are all possible within a data step. Ultimately it is an it-depends question. Benchmarking both techniques in your unique environment is the only way to determine which one is more efficient for your needs. There isn't a one-size-fits-all when it comes to prescribing the best SAS technique. Hope this helps.
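
A benchmark along those lines might be sketched as follows. The tables big_a and big_b and their sizes are invented test data; the FULLSTIMER system option asks SAS to write detailed timing and memory notes to the log for each step, which can then be compared.

```sas
options fullstimer;  /* detailed real time, CPU time and memory notes in the log */

/* Hypothetical test data: one million unique keys in each table, already in id order */
data big_a;
  do id = 1 to 1000000;
    a_val = ranuni(1);
    output;
  end;
run;

data big_b;
  do id = 1 to 1000000;
    b_val = ranuni(2);
    output;
  end;
run;

/* Candidate 1: PROC SQL inner join */
proc sql;
  create table sql_result as
  select a.id, a.a_val, b.b_val
  from big_a as a
       inner join big_b as b
       on a.id = b.id;
quit;

/* Candidate 2: DATA step merge keeping only matches */
data merge_result;
  merge big_a(in=in_a) big_b(in=in_b);
  by id;
  if in_a and in_b;
run;

options nofullstimer;
```

With real data that is not already sorted, the time for any PROC SORT steps needed before the merge belongs in the comparison as well.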
- fankaiqing on August 31, 2017 5:10 pm
  
  Srinivas, MERGE is more efficient, but you should be careful. Here are some tips to avoid the potential risks.
  Tips to avoid the potential risks when using DATA STEP MERGE
  1), before merging, standardize the sort keys to the same lengths or formats
  As SAS developers, we often see the following message:
  WARNING: Multiple lengths were specified for the BY variable Name by input data sets. This might cause unexpected results.
  
  How to handle multiple lengths is highly important.
  Way1, resetting the lengths with a length statement, is very popularly suggested online and in papers by many SAS programmers. But it has potential risks. For example, if we use PROC IMPORT to read in xlsx or csv files, we cannot simply use a length statement such as length Name $ 11. Gender $ 6.; to change these sorted keys’ lengths. This way causes potential truncations. I have not tested other data types. So my suggestion is that we’d better not use this way to merge .sas7bdat data files.
  %let path = /sas/model/model_dev/enterprise_reports/models/aggregation_sas/test_merge;
  proc import out=boy_class_xlsx datafile="&path./boy_class.xlsx" dbms=xlsx replace;
  getnames=YES; /*here the length of Name is $ 10. , Gender is $ 4. */
  run;
  
  /*bad code that causes truncations*/
  data boy_class_xlsx_relength;
  length Name $ 11. Gender $ 6.; /*reset the length of the sorted keys here*/
  set boy_class_xlsx;
  run;
  proc sort data=boy_class_xlsx_relength out=boy_class_xlsx_relength_srt; by Name; run;
  proc sort data=kids_class out=kids_class_srt; by Name; run;
  data merged_class_xlsx_relength;
  /*length Name $ 11. Gender $ 6.; */ /* or reset the lengths of the sorted keys here*/
  merge boy_class_xlsx_relength_srt(in=a) kids_class_srt(in=b);
  by Name;
  if a^=1 and b=1;
  run;
  Way2, using format to define the sorted keys’ lengths or formats
  At this moment, a format statement such as format Name $ 11. Gender $ 6.; is the best way to fix the problem of multiple lengths and avoid truncations.
  /*good code using format method*/
  data boy_class_xlsx_format;
  format Name $ 11. Gender $ 6.; /*reformat the sorted keys here*/
  set boy_class_xlsx;
  run;
  proc sort data=boy_class_xlsx_format out=boy_class_xlsx_format_srt; by Name; run;
  proc sort data=kids_class out=kids_class_srt; by Name; run;
  data merged_class_xlsx_format; format Name $ 11. Gender $ 6.;
  /*format Name $ 11. Gender $ 6.; */ /* or reformat the sorted keys here*/
  merge boy_class_xlsx_format_srt(in=a) kids_class_srt(in=b);
  by Name;
  if a^=1 and b=1;
  run;
  Way3, create new variables from the old sorted keys, then drop the old sorted keys, and rename these new variables to the names of the sorted keys.
  Creating new variables with standard lengths from the old sorted keys is a good way to standardize multiple lengths and avoid truncations.
  /*good code using new variables creation method*/
  data boy_class_xlsx_revar(rename=(Name_new=Name Gender_new=Gender));
  set boy_class_xlsx;
  length Name_new $ 11. Gender_new $ 6.;
  Name_new = left(strip(Name));
  Gender_new = strip(Gender);
  drop Name Gender;
  run;
  proc sort data=boy_class_xlsx_revar out=boy_class_xlsx_revar_srt; by Name; run;
  proc sort data=kids_class out=kids_class_srt; by Name; run;
  data merged_class_xlsx_revar;
  merge boy_class_xlsx_revar_srt(in=a) kids_class_srt(in=b);
  by Name;
  if a^=1 and b=1;
  run;
  Comparing the above three ways, format is the simplest one; new variable creation is a nice choice; and resetting the lengths using a length statement is not recommended.
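
Whichever way is chosen, it may help to check the stored lengths of the BY variable in each input before merging. The sketch below is one possible check, using two made-up data sets (short_key and long_key) and the DICTIONARY.COLUMNS table that PROC SQL exposes.

```sas
/* Hypothetical inputs whose key variable Name has different lengths */
data short_key;
  length Name $ 10;
  Name = 'Alfred';
run;

data long_key;
  length Name $ 11;
  Name = 'Alfred';
run;

/* Compare the stored type and length of Name across the data sets to be merged */
proc sql;
  select memname, name, type, length
  from dictionary.columns
  where libname = 'WORK'
    and memname in ('SHORT_KEY', 'LONG_KEY')
    and upcase(name) = 'NAME';
quit;
```

Differing values in the length column indicate that the multiple-lengths warning will appear if these data sets are merged by Name as they are.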
