shopify data science interview
Published on: February 9 2023 by pipiads
Table of Contents About shopify data science interview
- Shopify SQL Interview Question for Data Scientists and Data Analysts (StrataScratch 2118)
- Cameron Davidson-Pilon - Data science at Shopify
- Fireside Chat: Solmaz Shahalizadeh, VP of Data Science & Engineering at Shopify (Data Driven NYC)
- Breaking Everything as a Shopify Intern
- Talking Pair Programming with Shopify's VP of Engineering
- DataMasters S2E1: Shopify’s Ella Hilal: Attacking data science with an “always learning mindset”
Shopify SQL Interview Question for Data Scientists and Data Analysts (StrataScratch 2118)
What's up, YouTube? Today we're going to take a look at a Shopify SQL interview question. This was marked as medium, but it is pretty hard, so let's get into it. This one's called "Most Sold in Germany". I'm actually from Germany, if you haven't guessed yet, so this one's interesting for me. Our task is to find the product with the most orders from users in Germany and output the market name of the product, or products in case of a tie. We have four tables: shopify_orders, shopify_users, dim_product and map_product_order. shopify_orders seems to contain orders from Shopify users and has a bunch of IDs in there, dates and all that. shopify_users has user information: first name, last name, country. I think we're going to have to use country to filter to Germany, since the question asks for users in Germany, not products in Germany or anything. dim_product is a dimension table for products, which has a product name, product brand and market name, and we're supposed to output the market name of the product, or products in case of a tie. That "in case of a tie" is important: if more than one product has the top number of orders from users in Germany, then we should output more than one. Finally we have map_product_order, which maps products to orders, so there's just an order ID and a product ID in there. This one's there to make everything more performant and to avoid having everything in one place: it just takes an order ID and lists which products were in that order, instead of having everything in shopify_orders. That's the structure of everything. Four tables is a lot; usually we don't have that many. On StrataScratch there's often only one table in the question, which is not always the case in real interviews, where there are often two tables to test your ability to join them. But four tables is quite a lot, so let's work through that.
The question itself is quite simple, but it can be hard to work with so many tables. When I look at a task like this, I usually want to find out whether I have to use all of the tables and which of them contain the important information. We do need shopify_orders, since we want to count orders and then filter to the top number of orders from users in Germany. Users in Germany can be found through the country field in shopify_users, so we're going to have to use shopify_users too. dim_product has the market name, and we should output the market name in the end. And we don't know which order contains which product if we don't use map_product_order. So basically we want an output that counts the number of orders per product from users in Germany, ranks that with the highest number of orders on top, and then limits it to only the products that have the highest number of orders, rank number one. I'm saying "limit" and "rank", and these are the two things we could use for this question. If we were to use LIMIT and just limit the number of rows we output, we wouldn't be able to handle ties: with LIMIT 1 we wouldn't allow for a tie at all, and with LIMIT 2 we would still cut a product off if three were tied. So we're going to have to rank them: use the RANK window function after we combine all these tables, and then just filter to the rank being one. That's something we do often; I think in the last few videos, three or four out of ten were "filter on the rank being one". So that's what we're going to do here again. But first let's take care of combining these tables. We're going to SELECT * FROM shopify_orders and join that to shopify_users. I'm actually going to write it out: shopify_orders.user_id equals shopify_users.id. The ID is just called id in here, and these seem to be the keys linking them. Then we have map_product_order, and we have order_id and product_id in here.
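The LIMIT-versus-RANK trade-off described above can be checked on a toy table. This is a minimal sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer); the table `product_orders`, its columns and its rows are made up for illustration and are not part of the StrataScratch data.

```python
import sqlite3

# Hypothetical pre-aggregated table: one row per product with its order count.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_orders (market_name TEXT, n_orders INTEGER)")
conn.executemany(
    "INSERT INTO product_orders VALUES (?, ?)",
    [("Product A", 5), ("Product B", 5), ("Product C", 3)],
)

# LIMIT 1 keeps a single row, so one of the two tied products is lost.
limit_rows = conn.execute(
    """
    SELECT market_name
    FROM product_orders
    ORDER BY n_orders DESC, market_name
    LIMIT 1
    """
).fetchall()

# RANK() gives every tied product rank 1, so filtering on r = 1 keeps them all.
rank_rows = conn.execute(
    """
    SELECT market_name
    FROM (
        SELECT market_name,
               RANK() OVER (ORDER BY n_orders DESC) AS r
        FROM product_orders
    )
    WHERE r = 1
    ORDER BY market_name
    """
).fetchall()

print(limit_rows)  # [('Product A',)]
print(rank_rows)   # [('Product A',), ('Product B',)]
```

The rank filter does not need to know in advance how many products are tied, which is why it is preferred here over any fixed LIMIT.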
order_id is something we've seen before in shopify_orders, so that's what I'm going to use to link them. Then I'm going to link dim_product, which contains a product ID that we haven't had before, so we have to join in this logical order for it to make sense. So let's join map_product_order on shopify_orders.order_id equal to map_product_order.order_id. And then finally we have dim_product, and we're going to use the order ID from shopify_orders... no, we have to use the product ID from map_product_order, that's what I just said, and match it to dim_product.prod_sku_id. Once again, the product ID has a different name in dim_product, so you've got to be careful here, but that's the mapping we have to do. I hope I don't have any typos, so let's just run that and see if it works. I do get a "missing FROM-clause" error. Oh, the table is just called map_product_order. Okay, that gives us a huge table, because it combines all of these columns, but it does work, so we have all the information in there. We need to use all four tables. Now let's get back to our task of finding the product with the most orders from users in Germany. What I'm going to do right here, before I forget, is filter the country to Germany. country is in shopify_users; to make it clear I'm going to spell out the table name, even though country is only in this one table. This will reduce our orders to users from Germany only. Now we can do a count. So let's select product_id and COUNT(DISTINCT order_id) to get the number of orders, and if we group by product_id, that should give us a count of orders per product ID. order_id is ambiguous, though, because it's in multiple tables, so let's say shopify_orders.order_id. product_id should not be ambiguous, because the other column is called prod_sku_id.
It does get messy with this many tables, but I guess that's the main challenge here. So we have a count. We could order by that to have the highest count on top, but I wanted to use a rank window function anyway, so I'm going to create a rank based on that count directly. I'm going to spell out RANK() OVER and type out my window function syntax, PARTITION BY and ORDER BY. I want to create a rank based on the order count, so the highest rank should go to the highest order count. So I'm going to order by this entire COUNT expression I just came up with, in descending order, so the highest count will be on top and will be ranked number one. If I run this, I get another error: I just have to get rid of that PARTITION BY, because I'm not using it. That's just part of my template. Now we do get rank one for the first two product IDs. Then it jumps to rank three, because we're using the regular RANK window function. If I were to use DENSE_RANK, it would go to 2 for the next product ID, because DENSE_RANK continues with the next rank even if there's a tie. So it's going to go 1, 1, 2 instead of 1, 1, 3. Which one you want depends on the problem definition, but in this case we just need to output rank number one, so it doesn't matter which one we use. I'm just going to change it back to regular RANK and call that column r; I always call the rank column r. That takes us pretty far, and now we just need to take all of this, put it into a subquery and filter to the rank being one. We also want to output the market name of the product instead of the product ID, so I'm going to add the market name as another column here and add that to the GROUP BY.
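The difference between RANK and DENSE_RANK mentioned above (1, 1, 3 versus 1, 1, 2) can be seen side by side. This is a small sketch on made-up counts, again using Python's sqlite3 module; the `counts` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counts (product_id INTEGER, n_orders INTEGER)")
conn.executemany(
    "INSERT INTO counts VALUES (?, ?)",
    [(1, 4), (2, 4), (3, 2), (4, 1)],  # products 1 and 2 are tied on top
)

rows = conn.execute(
    """
    SELECT product_id,
           RANK()       OVER (ORDER BY n_orders DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY n_orders DESC) AS dense_rnk
    FROM counts
    ORDER BY n_orders DESC, product_id
    """
).fetchall()

ranks = [row[1] for row in rows]
dense_ranks = [row[2] for row in rows]

print(ranks)        # [1, 1, 3, 4]  RANK skips a position after the tie
print(dense_ranks)  # [1, 1, 2, 3]  DENSE_RANK continues with the next number
```

Both functions assign 1 to every tied top row, so a filter on rank one returns the same products either way.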
The count wouldn't work without the GROUP BY, and the count in the window function wouldn't work either, so I'm going to have to leave it here and add market_name as well. All right, so we do have the market name in there now. So let's put that into a subquery and select our final output from it. We're going to select market_name from this subquery, let's call it german_orders_ranked, and filter to the rank being one.
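Putting the steps of the walkthrough together, the whole query might look like the sketch below. It is not the exact query typed in the video: the column names (in particular `prod_sku_id`) are assumed from the spoken description, the sample rows are invented, and the count and the rank are split into two CTEs for readability, whereas the video ranks directly over the COUNT expression inside one subquery. It runs on Python's sqlite3 module, so syntax details may differ from the database the platform uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Assumed, simplified versions of the four tables described above.
    CREATE TABLE shopify_users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, country TEXT);
    CREATE TABLE shopify_orders (order_id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE map_product_order (order_id INTEGER, product_id INTEGER);
    CREATE TABLE dim_product (prod_sku_id INTEGER PRIMARY KEY, market_name TEXT);

    -- Invented sample data: users 1 and 2 are in Germany, user 3 is not.
    INSERT INTO shopify_users VALUES
        (1, 'Anna', 'Berg', 'Germany'),
        (2, 'Ben', 'Klein', 'Germany'),
        (3, 'Cara', 'Smith', 'USA');
    INSERT INTO shopify_orders VALUES (10, 1), (11, 1), (12, 2), (13, 3), (14, 3);
    INSERT INTO map_product_order VALUES
        (10, 100), (10, 101), (11, 100), (12, 101), (12, 102), (13, 102), (14, 102);
    INSERT INTO dim_product VALUES (100, 'Alpha'), (101, 'Beta'), (102, 'Gamma');
    """
)

query = """
WITH german_product_orders AS (
    -- Join all four tables, keep German users, count distinct orders per product.
    SELECT p.market_name,
           COUNT(DISTINCT o.order_id) AS n_orders
    FROM shopify_orders AS o
    JOIN shopify_users AS u ON o.user_id = u.id
    JOIN map_product_order AS m ON o.order_id = m.order_id
    JOIN dim_product AS p ON m.product_id = p.prod_sku_id
    WHERE u.country = 'Germany'
    GROUP BY m.product_id, p.market_name
),
german_orders_ranked AS (
    -- Rank products by order count; tied products share rank 1.
    SELECT market_name,
           RANK() OVER (ORDER BY n_orders DESC) AS r
    FROM german_product_orders
)
SELECT market_name
FROM german_orders_ranked
WHERE r = 1
ORDER BY market_name
"""

top_products = conn.execute(query).fetchall()
print(top_products)  # [('Alpha',), ('Beta',)]
```

In this sample, Gamma has the most orders overall, but only one of them comes from a German user, so the country filter changes the answer; Alpha and Beta tie with two German orders each and are both returned.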
Cameron Davidson-Pilon - Data science at Shopify
was very, very small, I think. We were just called the data team back then, and I joined that team of six or so people. From there we just started building data science and data engineering capabilities and, in parallel, data science took off. There were lots of new conferences and new educational programs, mostly because industry and academia were finding value in data science. And I guess the world of data science has changed a lot since you dived into it in 2012, both in terms of the tooling and the expectations on data scientists. What are some of the biggest changes that you've seen between, say, 2012-ish and today? I would say that the biggest change has been in tooling. The barrier to entry is definitely much, much lower. I remember in 2012, early 2013 actually, at Shopify, we were debating whether to use Spark, which was this brand new technology right out of Berkeley, and we weren't sure about it, I think it was still pre-1.0 at the time, or to use the more traditional technology like HDFS and Hadoop. We went with Spark and that was a good decision, so we could use our Python skills in Spark. We paid the pioneering cost of using Spark, that is, fighting all the bugs and working out the internals of Spark ourselves. But now you have Spark, I think, three, or maybe it's 2.6 or something, which is really easy to use. They have a high-level API for playing with your data and like dodging data. So that's totally changed; Spark has totally changed. I also remember in my master's program I took a course in machine learning, and the professor mentioned kind of offhand that neural networks used to be big, but then they kind of fell out of favor versus SVMs, but they're kind of back now. He just offhandedly mentioned that. Since then, I mean, neural networks have really blown up.
And when TensorFlow was announced in, I think, 2015, that was a big deal, because then there was this high-level API for doing neural nets, and then you had Keras, which is an even higher-level API for doing deep learning, and so on and so on. So the barrier to entry is just so low now in terms of what you can do, and that's great, because it means you can spend less time tinkering with technology and bugs and more time actually doing productive work. And what impact has that had on the day-to-day job of the average data scientist? Because I'd imagine if you were spending most of your time trying to fix bugs in Spark and getting TensorFlow off the ground, or whatever it is, and now that's been taken away, you have more time to focus on other things, and so the expectations might shift. What are some of the ways that's manifested? I think you can do more interesting work. You're thinking less about the actual technology and more about the algorithm or what to add to the algorithm. If I think about just working with Spark, it allowed us to do things we couldn't do before. It opened up new doors. I've sort of heard a bifurcation of opinions when it comes to the evolving role of data scientists. On the one hand, some people say data scientists are being forced to develop more engineering skills, because the expectation is that you've got to be able to scale up your technology more now that it's being used in production. On the flip side, some people say it's going in a more business- and product-focused direction, maybe more like analytics. Maybe both, too. I don't know, is that something you've observed at Shopify? Yeah, definitely. I think about the productionization of even a simple analysis; it could be just a simple algorithm or it can be very complex.
One of our teams spent the better part of a year just trying to put a model into production. Mind you, it was a real-time model, so there was additional complexity there, but it was really hard. Even thinking back, if I wanted to share a simple algorithm with a business unit, let's say I'm working with the HR team and I want to share a model with them, how would I do that? Let's say it's pre-2019, how would I do that? In my ETL workflow, where I'm massaging data, I might embed the algorithm there, but then it goes into a SQL database, and then I need a technology like Tableau or Mode to display my results, and it's not user-interactive. They can't really change things, because the model's output has been frozen. I could hand over an IPython notebook, but that's kind of rude, almost, to say, "Hey, run this IPython notebook." If you work in HR, you don't know what a command line is, so you can't really hand over your notebook, and even if you freeze the output and put it into a Jupyter notebook, they're going to see code and things like that, which they don't want to see. One of my favorite technologies now is Streamlit. It's a new technology as of October last year, in 2019, and it allows you to create beautiful web UIs where you can display your model. How do I put this? You can have an interactive UI where you can change parameters and the model's going to update in real time. You can add your own data, you can create figures, and from a business owner's point of view it looks like a web app. It looks like Tableau or Mode or Looker or something. It looks just as pretty as those, but behind the scenes it's a very, very simple Python web app, and you can embed your analysis in there.
So if I'm a data scientist today, I'm not thinking about deployment of models. I'm not thinking about how to share this really important analysis. Do I need a data engineer to set up a web server, and then do I need to write HTML and CSS? How do I do all that? Today I'm thinking more about: how can I make this UI simpler for my user? Or how can I improve my model? How can I sit beside my business user and ask them, "Does this make sense to you? Let's quickly go behind the scenes and change the wording here." It just takes a lot of burden off the engineering side, and you again have more time for adding business value.
Fireside Chat: Solmaz Shahalizadeh, VP of Data Science & Engineering at Shopify (Data Driven NYC)
Welcome. Thank you so much. Very excited to have you here tonight, thanks for joining us. You are the VP of Data Science and Engineering at Shopify. Yes. Maybe let's start with a level set on Shopify: what it is, and what it has become over the years. For sure. So Shopify is a commerce platform helping entrepreneurs of every size, from small and medium-sized to really large merchants, set up, sell online and be wherever their buyers are. The last reported numbers are that we have over 800,000 merchants in over 175 countries, so we have expanded rapidly. We were founded in 2009, so we've grown quite fast, and we went public in 2015. And part of the FirstMark family from the beginning, so, yes, we are incredibly proud to have been part of that journey and to have done a very small part to help. I mean, the idea of starting the company at the time, where it was geographically, and to think that it is now a 30 billion dollar market cap company, it's just incredible. It's also been amazing to be inside the company. When I joined we were 300 people and now we are over 5,000, so it's been massive growth over the years. With an office in New York now, right? I said that, but it may or may not be public. Is that public? Sorry. You have an office in New York? Yes, as of a couple of weeks ago we welcomed the Handshake team to Shopify, so we're going to have an office here. But we're headquartered in Ottawa, in Canada, so we're close. We also have offices in Toronto, Montreal, Waterloo and Berlin now. Very good. Why don't we start with your journey?
So you were born in Iran, I believe, to a professor father and mother, correct? Tell us, how did you end up in your current position? As I was growing up, I think one of the luxuries I had was that my parents always gamified math and computer science. About one of my earliest interactions, actually, let me ask a question, because everyone was asking about modern technologies. How many people in the audience know what a punch card is? How many have ever seen one being used? Awesome, okay. So my mom was doing this thing and I was like, "Oh, are you doing crafts? You're making holes in these cards?" And she said, "No, I'm actually telling a computer to do something," and as a four-year-old, that was just the most fascinating thing. So I don't think I ever learned math or computer science to be good at it, but for what it enabled us to do, and I think that's how I got started. And then I think you did a master's in Sweden, correct, and then you worked at Sloan Kettering? Yeah, so I did a master's in Sweden. Right after I finished my undergrad in computer science, I really quickly realized that I enjoy using computer science and what I've learned in other domains. So I went into bioinformatics and I studied in Sweden. Then I did research with Sloan Kettering. It was around 2005 and neural networks were not cool yet again, so everyone was like, "Oh my god, it's not the 80s, why are you using neural networks?" We were using them to predict the structure of a cell after multiple drugs have come in. And it's kind of fascinating: what took me two years along with a team to do, right now, with the advances over the last few years in compute and in storage, you can probably do in about a week, and that has been really fascinating.
And then after that I did another degree in Canada, at McGill, where we used genomic data for predicting the outcome of breast cancer, so that at the time of diagnosis we can give more personalized treatment to people. And then you found your way into Shopify, which is actually, by the way, an interesting story for anyone who has been trying to recruit data scientists. Yeah, so there was this hackathon called Random Hacks of Kindness, where basically companies or people get together to solve problems for nonprofits that don't have a large tech team. I went to the Shopify office, where they were holding the hackathon, and I started talking to people there, and they said, "Yeah, we're growing a data team, why don't you come in for a talk?" So I went in for a talk, which was the interview, and then I got an offer. So I never actually submitted an application, and I think that's one of the strengths of Shopify's hiring: we don't go super conventional ways. We just try to get to know people, learn about their experiences, and get them excited about the problems we are trying to solve. And to the point about Sloan Kettering, where neural networks were not hot yet, I think at some point you were a data scientist before data scientists became hot as well. I heard or read this somewhere. I think so. Ten years ago probably, yeah, more than that. Bioinformatics became a very hot topic because we were able to get access to this massive amount of biological data. You can see all the genes that are expressed in the cell, which is around 25,000, and then you can get copies of it. So you had these massive sets of data and you had few samples, so we had to figure out a way to find patterns, to find something.
So I think the fields that had to work in an interdisciplinary way were sort of the first versions of data science, and I don't think it's limited to bioinformatics. Once we figured that out, it helped us a lot in hiring, because we realized that the social sciences also have to deal with a lot of messy data and draw conclusions from it, and they have had methods for that and have been really good at inference. So we've hired from there, and from astrophysics. I think those are the fields that were doing data science maybe before it was a field of its own, and I hope as a community we tap into those resources more and more, because they come with a lot of experience. Yeah, and you know, even that term, data scientist, which everybody talks about all the time, is actually very, very new. Like, one of the first events we had here, with Jeff Hammerbacher and DJ Patil, talked about how they came up with the term. It's sort of crazy to think that a few years ago that title just did not exist, and they came up with it. What I also find fascinating is that there isn't a clear definition of what it is, right? Every day there's a new Venn diagram of the intersection of different skills, and it's interesting that the field keeps reinventing itself as time goes on. So what does the team look like now at Shopify? What is the data science team in a company of that scale? When I joined, the data science team was 20 people. Right now my team is over 200 people. The approach we've taken through the years is that we're first and foremost a product company focused on our merchants, so we have also formed our data science teams as teams that are embedded in different business units or different product areas. Interesting, so there are no central data scientists? We all report into the same structure, to me, because we care about the coherence of the craft, and we also think data is an artifact that doesn't end within
one product: a merchant lives through a spectrum of services, so we didn't want to have that disconnect. So we have a central data team, a central data platform, data assets, a data lake. But what we learned through the years is that the most value comes from data scientists understanding the context of the problems they're trying to solve. In the very early days we would tag-team: I would say, I'll cover finance and maybe a little bit of product, and so on. But as time went by, we realized the most value we can give is when we understand the problem, because then we can apply the tools of data science, machine learning and data engineering better to the problems of the merchants. That's why, day to day, the teams are embedded within the product teams. They work closely with the product managers, UX researchers and development teams, but we also have the craft and org.
Breaking Everything as a Shopify Intern
Hey there, if you don't know me, my name is Maria and I like to make technology videos on the internet. Today I thought it would be nice to sit down and tell you about the time that I messed up. What inspired this was that I started my internship in January as a data engineering intern at Shopify, and in my second week I did something that wasn't great. I learned from it and it was actually a positive experience, so that's why I wanted to share it. A few weeks after that, I attended an event specifically made for interns, called something like "my mess-up story", where a bunch of full-time employees at Shopify, even the VP of Engineering, shared the times that they messed up and what they learned from it. That was really cool, and I wanted to share with you what I did, obviously more vaguely, because I can't say specific things since it's internal information, but I think it's still interesting to share. And of course I'm wearing my baka earrings, which means idiot in Japanese, because the thing I did was pretty stupid. So, starting with the story: my first week was obviously onboarding and getting started with things and, as you saw in my first-week-as-a-data-engineering-intern video, I started this thing called a code lab at the end of the week. It's basically a tutorial that helps people learn how to set up a data pipeline at Shopify and learn some of the technologies that we use. I started it at the end of the day on Friday and continued it into my second week. But what happened was that, since I only work a few days a week, the certificate that I had created expired. This is a certificate you need to authenticate that you are who you say you are.
So this is where the problem stems from. When I was doing the code lab, you create the certificate with a command that you type in the terminal, and it creates it and adds it to an EJSON file, so it's encrypted JSON. Then you upload that to GitHub, and then you can deploy your pipeline and all this stuff. So you have to get it encrypted. What happened was that I had done this on, I think, Tuesday or something, and I don't work on Wednesdays; I work on Tuesday, Thursday and Friday. So by the time I came back on Thursday, my certificate had expired and I had to create a new one. I don't remember what I was doing on Thursday, but somehow I ended up doing this on Friday, or I couldn't figure it out or something. I have such a bad memory, but end of day Friday is when I was doing this. I was with my mentor, we were pair programming for this, because I couldn't figure out why it wasn't working. I was trying to run the thing again, because I have the file, I can see it right in front of me, and I can see the two old things in it: it has a certificate and I think maybe a key or something, two different secrets that the command creates. And I was like, okay, I'm running this command that I ran before, so why isn't it changing this file? All it's saying is that these two files are being created on my computer. And I thought, what if I just open those two files and copy and paste that information into my file? I didn't tell this to my mentor, because he was like, "Oh, I have to go work on something else." So I said, okay, I'll continue working on it and I'll tell you how it goes.
I didn't tell him what I was going to do, because I don't think he assumed that I would do this kind of thing, even though I think now he thinks a few steps ahead of what I might be doing. So I did that. I was like, oh, these are pretty big, whatever, I'll just push it up to GitHub and see if it fixes things. The reason I found out that my certificate had expired in the first place was that I couldn't deploy my pipeline: it wasn't working because the certificate only lasts for 24 hours. So I had to fix it, and then I thought, okay, maybe now I can deploy my pipeline. But no, things weren't working. I sent my most recent commit to my mentor and asked, "What's wrong, why isn't this working?" He was like, "What did you just do, Maria? Can you explain?" And that was pretty funny, because basically I didn't encrypt anything. I had missed a step because I was trying to go super fast; I rushed myself because I was very impatient. What you maybe don't know is that I've done three internships at Shopify so far. In my first one I felt like I was doing pretty well, and by the end of it I was a big contributor to what we were working on. On my second team I felt like I just fell off. I didn't have a mentor, I didn't feel like I was contributing that much, I don't know if I learned as much as I wanted to learn, and I kind of lost the skills I had gained in my first internship. This is my third one, and I was like, okay, I need to come back and get things done really quickly so I can be really good on this team and learn a lot. So I was just pushing myself too much, because I wanted to finish onboarding and start getting tasks done and getting issues assigned to me.
So that's why I was trying to rush, and that was a bad idea, because I missed a step. There was actually not just one command that you had to run in the terminal; there was a second command, and I didn't look back at the tutorial to see it. So that was pretty bad of me, and I had just randomly thought of this stupid idea to put the unencrypted certificate and key into a file and upload it to GitHub, which is not safe. What happened after that was the more interesting part of the story. The first thing my mentor did was write a message in a Slack channel that has a bunch of people from the data org, saying that we did something, this is what happened, and here's the branch on GitHub to look at. Then a bunch of people started a huge thread in that Slack channel, and then more threads and more threads, because they were all trying to figure out what to do. Why is this a whole problem? It's because I pushed up the stuff that you're supposed to keep secret, and the actual key is used across all the other apps that use this certificate. So they would all need to rotate their certificates, because this was put on GitHub. It's a private repository, not deployed at all, but it's still on GitHub. GitHub is now owned by Microsoft, and even before that it was just another company, so it's like another company can access your secret keys. You can email GitHub and say, "Hey, can you delete this stuff, fully delete it, so you don't have access to it?" And they did email them, but an extra level of safety is still to rotate all of your certificates and keys and everything. So that was the plan.
And then they had this one guy who was kind of leading the discussion, but then he had to go. The worst part was that, again, it was the end of the day on Friday, so this is 6 pm, 7 pm, 8 pm. People were supposed to be starting their weekends and relaxing, and I had caused this whole issue that roped in something like 30 different people, plus a bunch of people who didn't realize that this was going to affect them, because it took us a few hours to figure out: okay, how do we actually rotate this stuff?
Talking Pair Programming with Shopify's VP of Engineering
hey, farhan, how's it going? good, how you doing? i'm doing great. i'm psyched, um, it's great to have you here. you were like, i think, as i was researching you, i think you're probably one of the top pair programming advocates in the world. i don't know if i've seen anyone that has done so much like blog posts and uh, conference talks and things like that, like trying to sell people on pairing. yeah, i mean, i think it's maybe self-reported that, uh, i have, i'm assuming, maybe the largest pair programming office in the world. right, about 120, 125 pairs. yeah, about 250 people in one office. uh, engineers pairing, i'm pretty sure that's maybe one of the biggest. yeah, i mean, i'd be surprised if anyone could beat that, really. but so you're, you're the. currently you're a vp of engineering at shopify, small ecommerce startup in canada. yeah, small, um, uh. before that you were vp engineering at xtreme labs, which i think is where that giant pairing culture was that you were talking about, um, and you also ran the largest pivotal office at one point, yep, uh, and you were a cto at helpful, which got acquired by shopify. that's right. you like summarized the last 10 years? yeah, just just like that, um, so i mean that's, that's a lot of engineering and pairing experience to have there. yeah, i mean, i, you know, have to give kudos to how i, you know, learned about pairing at all. um, i went to school with this guy named amar varma and he had me kind of just come by the xtreme labs office when it was first starting- like five or ten people, and i never had seen pairing before, i'd heard about it. you know like, you know rumor mill or you know the legend of people who pair program. and then he told me: oh yeah, we've learned this from this company called pivotal labs. and you got to come by, just come by and check it out. 
and one of the reasons i, you know, decided to like quit my job and join xtreme labs was like i'd never seen this level of intensity or learning that i saw in people pairing. right, i walked by a few people and i was like they are talking and coding and intensely doing that eight hours a day and doing nothing else and i had to learn more. and uh, the genius of um he said: even though you're coming in as vp engineering, you're gonna pair on a project to learn from the inside out what it means. and i paired with an intern and between the two of us- you know either of us probably couldn't have built anything. i didn't know ruby on rails, we were building a ruby on rails application and he was an intern at waterloo, super smart but had never built anything at scale, and over three months we built a very complicated hedge fund trading system and i was sold. yeah, that's, that's beautiful. my, my introduction to pairing was i got my first ruby on rails job, which is like my first good professional development job, and i had done a lot of like dabbling on my own to like teach myself a bit of ruby and a bit of rails. uh, but i sat and paired with someone who was much more senior than i was for maybe three or four months and i learned just a ridiculous amount from him. that was like really a huge turning point in my like career as a developer, because i didn't just learn ruby and rails but also all of the other tools and the habits that go into being a professional developer that i wasn't going to pick up from a book about ruby. right, including keyboard shortcuts and how you look at like where do you go to search for information, and you know we, we um would send emails. this is like you know, when i was doing xtreme labs, it was pre-slack- we would send these emails to our clients and we would pair on writing the email. 
yeah, and it sounds weird and people would say, oh, is that when you know somebody goes to the bathroom, whatever, like no, no, our engineers will say that's the most important part when we're pairing, when we're emailing the client. that's super important. it's not just the coding, it's the triaging the backlog, it's the writing the email, it's looking up information, like it tends to be maybe even more important than the coding in some ways. and so, oh, yeah, yeah, and like source control and debugging techniques, and like i learned vim because of that and that has lasted me through languages, right, like i don't write much ruby anymore, but i still have this wonderful text editor ability that i started during those pairing sessions. right, and i think the one misnomer i want to you know, um counter, is that the learning isn't only for three or four months, right, i know that there are some companies who pair program and they're like, oh well, we pair at the beginning, i'm like cool, like i love it. but the question i would have then is: do you think that learning stops? like, why did you stop pairing? is it just that there's nothing more to learn? and i think that's where this idea of pairing as a high fidelity way to work with somebody equally smart but with a different, overlapping venn diagram of experience really bears fruit. yep, strongly agree. so, but before we go further, i'm curious: what? how do you define pair programming? good question. so for me it's two developers on one machine. it could have, usually does have, two keyboards, two mice and two monitors, but one computer. and of course, um, you can extend that definition to be remote. but in the typical definition it's two people also sitting uh beside each other. yeah, got it. so do you care in this definition about how often someone drives or how often you're swapping who's driving, hands on keyboard versus not? so i don't. 
i know that there's a lot of interesting techniques that have uh emerged as i started researching more into pair programming after i got into it, like ping pong pairing, and there's another word for one- where somebody drives like somebody does the test and the other person solves the test, like this. so there's lots of structured techniques. i didn't really focus on the technique so much as just the pairing itself. the one mantra we did try to work through at xtreme labs- but it wasn't as- uh, i would say, as effective as it could have been- is that usually the more junior, like junior employee on the experience side would want to be the driver, and so that's what we tried to do. it wasn't always possible, because you do end up going back and forth, but i think the notion of the two keyboards, two monitors, two mice idea and both people actively being engaged in the problem was my minimum bar and then other techniques after that were more like uh, like pro tips. yeah, i, i like that too. that's, i share that definition, or i sort of echo it. and one of the reasons i think it's a good definition is because i think it takes a little bit of the intimidation factor out of pairing. like, i think pairing, kind of like vim actually, has a like maybe a little bit of a marketing problem where people hear it and they think it's this complicated thing, it's gonna be hard and or not fun or i need to spend months learning how to do this. but at the core of pairing, let's say, it's kind of just like you're sort of just working on some code with somebody, like if you've ever been like, hey, can you take a look at this with me? and they sit down next to you, like, to my definition, you're pairing. it's great if they can bring a keyboard and you swap off, you know, who's driving from time to time. but even if it's just two heads looking at the same code, to me, you're, you're on your way. yeah, see, so i love that def. so we're aligned on this definition. 
um, i would say it's almost like if somebody, when somebody was doing that, you could almost like take a picture and be like, congrats, you're pairing. like you don't even know, you didn't know you were pairing. but guess what you're doing? you're pairing. like, i agree with you, people don't know that and i i can't imagine any developer today who has never done that, who's never said, hey, can you take a look at this for a sec? or gone over to help somebody with a problem. i don't know if they necessarily would tag it as pairing, but that would mean that every developer likely has done some sort of som.
DataMasters S2E1: Shopify’s Ella Hilal: Attacking data science with an “always learning mindset”
this is the solution. is it that? every single solution, regardless how beautiful it is, how elegant it is, it's a debt because you need to evolve it, you need to make it better, you need to enhance it. so, by thinking back to the first principles, you always can challenge the solution and make sure that you're always having a solved problem. welcome to another episode of data masters. i'm anthony deighton, chief product officer at tamr. my guest today is ella hilal. she is shopify's director of data, as well as an adjunct assistant professor at the university of waterloo, a data scientist and an evangelist for women in technology. today we'll talk to ella about why questioning conventional wisdom is often at the heart of good data science, why it's critical to have an always learning mindset, and how her team is playing a crucial role in the growth and revenue at shopify. ella, welcome to data masters. thank you for having me. i'm super excited to be here. excellent. so you know, before we get into the details of shopify, i'm sure people would love to know a bit more about your personal background, uh, and you know what stops you've made along the way in your career and life that landed you in the current position you have at shopify. um, i started my career actually as a software developer. i was a java software developer and grew into like a full stack. i also did my masters and my phd in pattern analysis and machine intelligence and then i took it from there like, uh, one job after the other, going into my career, uh, getting to lead from the middle, then started leading a team, moving into a manager role, then a like director role. so, prior to joining shopify, i was the head of data for a company called intelligent mechatronic systems. uh, i was- i'm also, up to now- an adjunct assistant professor at the university of waterloo. um, and yeah, i've been with shopify now for- wow, three years. time flies, but yeah, it's been a fun ride so far. 
it's uh, it's exciting and every day there's something new to do and to learn. i think a lot of people in data science often have uh sort of deep academic backgrounds. um, and now you work in industry and i think a lot of people sort of think about that distinction and is there anything you can share? partly, you know, based on your- you've spent time in academia, you've spent time in industry- thinking about that distinction. i think the key thing that academia like provided me is learning how to learn and, um, i think this in itself is a skill set. um, you don't have to be an academic to have it. like, there are people who acquire it over years doing other stuff, but i think academia definitely facilitates like enhancement of this skill, because this is exactly what you do for so many years: you're learning to learn, you're learning to learn from others' knowledge and then apply, extrapolate and traverse this knowledge, um, comfortably. so i think this is a key thing that academia provides, is like the ability of learning to learn, and definitely this is something that becomes a superpower in uh industry: anybody who is able to grow in an industry role, because of how fast technology changes, because of how fast uh things evolve and how fast we need to ship like meaningful products for our customers and our merchants, being able to learn and pivot and change. having this constant learner's mindset is essential. it's not limited to being an academic, but, uh, this is one of the key things that you pick up doing your grad studies, so, uh, it helps, but it's not solely limited to it. no, that makes uh perfect sense. now, one thing you didn't mention but that i also did take a note of from your background is your interest in art, and i understand you are, uh, in addition to the work you do, you're an artist. you've in fact sold some paintings. 
um, it struck me as something fairly unique for someone with your, you know, set of accomplishments in the field of data, and then, you know, data scientists and artists are not often, uh, a congruent set. um, it's something that's a bit more sort of abstract and creative versus analytical and data oriented. maybe you could share a little bit about that, and how maybe that side of your uh personality and brain uh helps. okay. well, uh, that's true. so i, i'm an aspiring artist. i won't call myself an artist. i, i i like to paint. it's actually very therapeutic for me and it's a way that i use the different side of my brain. um, i, i do acrylic on canvas and i had a few exhibits and actually a few of my pieces sold, and i remember that every time somebody bought, like something like this happens, i would be always like, are you sure? like, are you sure? uh, because the data scientist in me like looks at being very pedantic about all the different things that, um, went wrong, that shouldn't go wrong, and in my head it's: um, i think what gives me an air cover is the fact that i do abstract painting, so like it's very forgiving, like you might be painting a dog and it comes out like an airplane and people are like, oh, it's brilliant. i was like, yeah. uh, jokes aside, uh, i actually, i actually think, uh, data in itself, the way you think about it, the way you actually build the data narrative and the data storytelling, um, as well as the way you build visualization, in itself is a form of an art. like it's not just about knowing what you need to know or slicing the data or building the engineering pipelines. it's also about taking this whole data point and, like, seeing it in your head in a way that builds this whole narrative and story that you can communicate to your stakeholders. so, um, i actually see the two sides of my, my world, like, like mapping and cross-pollinating a lot. 
uh, also the, the understanding of, um, what catches the eye and how to make sure, like, if you have an important data point, how do you visualize it? how do you make sure that, uh, you're able to communicate it? and that applies whether you're building an analytical product or even if you're building like commercial facing product. either way, you want to make sure that you're providing the most actionable and meaningful insights to your audience, so it does. it does map a bit, not not too far apart. yeah, i know it's. ultimately, you know, much of what anyone does is about communicating, whether you're communicating uh through art or communicating uh through data and, as you point out, visualization is really the intersection between the two. but i also imagine you talked before about this idea of the learning mindset and really being something that you drew from your academic background, something that that you find valuable in your work at shopify around data science, so maybe we could dig into that a little bit more. um, you know, talking about the always learning mindset, how did you- which, by the way, i think is a really both useful and unique perspective to bring? are there some? how did you come to find it to be so important in your work? are there some specific experiences you had that drove this home for you? yeah, that's a great question. so, like when i think about it, um, technology keeps on evolving. every other day there is a new library, new tool, new algorithm, new model, new database coming out. so, with the right learning mindset, with the, with the ability to self-reflect, being humble about the things that we don't know and seeking the knowledge helps. uh, helps set us up for success, regardless what area we're working on and what's the field. 
with a very dynamic field like data science, this becomes a huge multiplier because, like, if you think just about it, like the amount of uh- whether it's research, the amount of new tools, the amount of new libraries, the amount of uh best practices that keeps on coming up, it's huge. like this is a very prime field, uh, lush and green. so there's a lot there that is happening, which is amazing for all data scientists. but with this right mindset, we can actually evolve and grow. also, one key thing i learned very early and, as i mentioned, i started as a developer: loving your code or loving your solutions or your product.