shopify engineering
Published on: January 13 2023 by pipiads
Table of Contents About shopify engineering
- A Day in the Life of a Shopify Software Engineering Intern (2022)
- Shopify's Architecture to handle 80K RPS Celebrity Sales • Simon Eskildsen • GOTO 2017
- I Got an Offer From Shopify! - My Developer Interview Experience
- How Shopify’s engineering improved writes by 50% with ULID
- ShipIt! Presents: Deleting the Undeletable | Shopify Engineering
- Day in the Life of a Remote Software Engineer Intern At Shopify
A Day in the Life of a Shopify Software Engineering Intern (2022)
Hey guys, welcome back to the channel. In this video I'm going to take you through a work day in my life as a remote Shopify back-end developer intern. My day starts at 6:30 am, when my alarm first goes off and my Philips Hue light bulbs blaze on. I spend 30 minutes lying in bed watching productive study-with-me and day-in-the-life videos so I can figure out how to make my day not boring. At 7 am I finally get up and don my Gymshark gear, as today is just another day of pretending to be Ali Abdaal. ("That's really good.") I head over to the bathroom, where I brush my teeth, put in my contacts and wash my face. Next I walk to the kitchen to make myself my morning bottle of Athletic Greens. I never really get used to the pungent greens taste, even though I've been drinking it for several months now. I guess Tim Ferriss wasn't being totally honest when he said it's delicious. I also make and drink my daily LMNT electrolyte drink, which is quite salty but much better. Yes, my life is controlled by podcast advertisers and I'm not even sponsored by them. What a shame. I then take the elevator down to the basement gym. I've been working with an online running coach for a week now, which means that I'm supposed to run every Tuesday, Thursday and Friday. We use this app called VO2 and she assigns me a custom routine with a warm-up, sets and reps, and a cool-down. It's actually so helpful to have someone checking in on me, because it definitely forces me to be accountable and actually get off my ass. Also, why the hell did no one tell me my back is all bent when jogging? This is what happens when you never played any sports as a kid and were only in the math club. ("Nerd. That is why you're not on the team anyway.") I cut my workout slightly short, as it was getting close to 8 am and I need to get ready for my morning stand-up. I navigate back to my apartment and jump in the shower to wash away the sweat of a half-assed run and clear my head.
Before diving into the work day, I put on my Shopify-themed sweatsuit, which the company requires I wear every single day. I'm such a minimalist. Next, I make my daily cup of coffee using this fancy temperature-controlled kettle and a Chemex, in an attempt to make Matt D'Avella proud. I bring my coffee over to my desk and set up for my first meeting of the day. So the time is around 8:40 am right now. I have a few minutes before my first meeting, which is the daily stand-up at 8:45. I've got my coffee right here in my Ember mug. Incredible. I only have a few minutes, so I just wanted to explain what the daily stand-up is. What is this meeting that happens every single day before I begin my work? The daily stand-up is just a 15-minute morning meeting, every day of the week, where every member of my team basically stands up and talks about what they're going to do that day. It's kind of like a daily check-in because, especially at a remote company, everyone kind of does their own thing and is very self-motivated and independent. That's why the daily stand-up is a great way to check in on everyone and make sure they're making progress and moving in the right direction. But anyway, my meeting's about to begin, so maybe I'll let you guys see how it actually goes. "So I'm going to be still working my way through onboarding stuff today. Whenever I have open time I'll be doing that stuff. I'm also going to be looking for a Ruby on Rails course, potentially, just some materials to work through slowly over the next month or so, to get a better understanding of Rails. But then I'm also pairing with Sandrine later today, and there's some typing and Sorbet stuff I'm doing on my own as well, and then I'll circle back with her and we'll work on that together in the afternoon. So that's my day."
So I've just left the daily stand-up. Morning meeting's over, and I just wanted to take this moment to explain what Shopify actually is and what I do at Shopify. If you guys haven't heard of Shopify, Shopify is this e-commerce software company. You can use Shopify and they will make your website, make your online store, and deal with all of the orders and all the payments, so all you have to think about is actually creating the product. It's kind of like a competitor or alternative to Amazon, mainly focused on independent merchants. There are huge companies, like Gymshark and Hot Ones; even Ali Abdaal's Essentially stationery brand is using Shopify. But you also have small businesses, small mom-and-pop shops that sell homemade soaps, food, basically anything can be sold on Shopify. Now, my role at Shopify is in SFN, which is Shopify Fulfillment Network. This is kind of like a startup inside Shopify. What SFN does is give merchants the ability to ship their products to Shopify, and then Shopify will fulfill all those orders. Basically, Shopify will ship all of the products to the customers, so you don't even have to deal with shipping at that point. SFN makes it so the only thing merchants have to worry about is creating the product, and half of them outsource it to China anyway. So it's really this all-in-one, complete process that SFN and Shopify take care of. SFN has this huge infrastructure all across North America, with a bunch of warehouses and a bunch of other stuff too, to make the shipping process as easy as possible for merchants. Now, SFN inside Shopify is still pretty huge. The team I'm on specifically is SFN inbounding, so we deal with all the inbound transfers. Basically, we're the ones who make the software that merchants use to ship their products to SFN warehouses, and that the SFN warehouses use to accept those shipments and categorize them. If you guys are curious about Shopify, SFN, or SFN inbounding, I'll link some more resources down below.
You can go to those websites and watch those videos. But anyway, it's like 9:10, 9:15 right now. I should probably get my day going. Like every good employee of a big tech company, I start my work day off by disappointing Cal Newport: I check email, then I check Slack, and then I check email again. This is the fourth week of my internship, so I'm still going through onboarding and learning the ropes. Before working for Shopify, I never realized how long onboarding actually is for full-time employees. It's literally four to five months straight, and then you're still starting off. I can't actually tell you what I'm working on, because tech companies are very hush-hush about their new projects and my manager, Chris, would murder me. But I'm basically going through the code base and watching tutorials on what all the major files are for. Shopify is primarily written in Ruby on Rails, so a lot of onboarding is sifting through esoteric code and trying to figure out what the hell is going on. I'm also working on my first issue, which is, straight up, changing a few variable names and trying to push it to production. I'm mainly attempting to learn the system for writing new code and merging it into Shopify, something that isn't trivial. There are a lot of hoops that you have to jump through to actually make any changes, which is probably for the better. They definitely don't want some random-ass intern crashing the entire application. I'm listening to some Ed Sheeran bangers while working. I know this technically makes me less productive but honestly, who gives a...? It makes the workday much more fun. At around 11:30 am I take my lunch break. I walk over to the fridge and take out my lunch. For today I'm having ghost pepper chicken with sun-dried tomato mashed potatoes. I usually have this healthy meal delivery service called Territory Foods. They basically send me a box of 9 to 12 pre-made meals for the week. Then all I have to do is shove them into the microwave.
I've only selected options with at least 40 grams of protein, and all of the meals taste really good and are nutritionally diverse, meaning they have a protein, a vegetable and a carb, which is great. This is so much faster than trying to cook healthy meals for myself every day. While heating up lunch, I'm listening to the audi.
Shopify's Architecture to handle 80K RPS Celebrity Sales • Simon Eskildsen • GOTO 2017
[Music] Thank you, Kevin. My name is Simon. As he said, I work as an infrastructure lead at a company called Shopify, based out of Canada. Shopify, just by way of introduction, is a company that powers a fairly large amount of online commerce, both here in Scandinavia and in North America, so if you bought something that was not on Amazon, there's a good probability that you went through a Shopify store and used some of the infrastructure that we're going to talk about today. The talk today is going to be fairly high level, and I'm going to be talking a lot about the architecture that we built at Shopify to support these very, very large sales that we see from some of the biggest American celebrities and some of the largest brands, both in Europe and in North America. We're going to be talking especially about the architecture that we've evolved over the past five years. We won't go super deep into any particular point, but rather give a comprehensive overview of all the different components and how they fit together, and hopefully inspire something that I think can be leveraged in a lot of other companies, especially SaaS companies that can have a similar architecture to ours. (The clicker's having trouble.) So Shopify is a company that helps people sell both online and in many other places, and about five years ago we faced a bit of a fork in the road. We were starting to see these customers that were sort of online-first. When retailers went online about 7, 8, 10 years ago, often they'd had a physical presence and then moved online. But we were starting to see a lot more customers that started with an online presence, and they also started to require new ways of selling.
With social media becoming very, very big over the past decade, and especially five years ago, smartphones were in everyone's hands, and Instagram, Facebook, Twitter and so on had become quite large, and this enabled new ways of selling online that people hadn't quite been able to do in the past. We call this a flash sale, and the idea of a flash sale is that you announce, sometimes minutes in advance, sometimes hours in advance, sometimes days in advance, that you're going to release some kind of exclusive product at this day and time. As you can imagine, that can drive an astonishing amount of traffic to one site from one minute to the next. It's one minute before noon and everything is good, but at noon you have hundreds of thousands of people coming in, trying to buy some of the same products at the same time, placing an enormous strain on the system. You can imagine this from the Super Bowl, or Kylie Jenner, if some of you know her, famous for being famous and out of the Kardashian family, selling lipstick; she can drive an enormous amount of traffic, and Kanye can do that as well. About five years ago, it was not these very large Super Bowls and things like that that were starting to use this way of selling, but rather customers that could still drive an enormous amount of traffic, and we were faced with a fork in the road. Do we become a company that tries to support these sales, or do we just tell them to go and find another platform, a platform that actually didn't exist? It's a very hard thing to be profitable from; it takes thousands of engineering hours to be able to scale to these sales when you haven't architected for it from day one. We chose the path of trying to support these. We chose to see these sales as a way to identify bottlenecks; it's a bit of a canary in a coal mine as to what's going to break next.
The flash sales of today tell us something about what the traffic is going to look like one or two years from now. (This was... wrong direction.) This was such a powerful decision at the time, and so important to our CEO, who's very technical, that he wrote an internal essay on why we support flash sales. This is really important in our company philosophy: in the face of adversity, we want to become stronger and not weaker, and use it as an opportunity to grow. When something bad happens, we want to come out stronger, and he wrote an essay about why flash sales were such a case. We have a very strong culture internally of trying to inject chaos into various parts of the organization and trying to come out stronger as a result. If you drop a piece of glass, it breaks. If you drop a rock, it's indifferent. But what can you drop that becomes stronger as a result? You know, if you go down to the gym and run on a treadmill, you come back a couple of days later and you can run further and faster. How can you build software in that way? Flash sales were such a thing for our organization. They were a way for our infrastructure to become stronger. Just a couple of things before we get into the meat of the talk, with this preface of how important flash sales have been to the history of our application. Here are some numbers to keep in the back of your head for the scale that we're running at. We support about half a million paying merchants all over the world. We processed almost six billion dollars in the last quarter. We run up to 80,000 requests per second to our Ruby on Rails processes. We do about 42 deploys a day to this, and we have over 2,000 employees, a large fraction of those engineers, deploying all the time. The mental model that I want to use for this talk is that we have about three different tiers that we're going to talk about.
We have the traffic tier, with all the traffic coming in, which is responsible for getting your requests from your home network all the way to our infrastructure and serving back the page that you're requesting. We're going to be talking about the application and the data tiers that work together to serve those pages, and we're going to be talking a bit about some of these arrows: how do we fail over between regions with zero downtime? How do we shard our application, and how do we balance those shards as some shops grow bigger, so that Kanye and Kylie are not on the same shard but rather spread out? How do we get from the traffic layer into the application layer and fail over regions? How does all of this stuff work together? You should have a pretty good idea of how that works for us by the end of this talk. We're going to start by talking about the traffic tier. As I mentioned, the traffic tier to me is the layer that is responsible for getting the request from A to B, in this case from your home network to our network. We're going to be talking about how we do global routing, because this is something that we've had a lot of debate about internally. We're going to be talking about a technology called OpenResty, which is absolutely incredible but quite underused. We're going to be talking about how we protect ourselves from bots during sales, how we serve cache hits from the load balancers instead of the application tier, orders of magnitude faster, and how we throttle the checkouts during very, very large sales. To understand how your request gets from the customer to our data centers, or regions, as we call them, we need to understand a little bit about how the internet works. This is a very high-level overview of how the internet works and how routes propagate.
If you are Facebook or Google and you have one domain, and not a bunch of customers pointing their DNS to your domain, then you can do a lot of really complex traffic engineering at the DNS layer. But when you have about a million domains from half a million different customers pointing to your IP, you have a very limited amount of control at the traffic level. You only have the IPs that the customers are pointing to; you can't control anything at the DNS level. So we did something that hasn't had a lot of use in the industry. For those of you who know about networks, it's called BGP TCP anycast, and I'll explain here how it works. How does traffic get from the customer to our regions with this BGP TCP anycast? So what happens is:
I Got an Offer From Shopify! - My Developer Interview Experience
Hello everybody, and welcome to another YouTube video. In today's video, I'm going to share with you my Shopify interview experience. The reason I'm making this video is because Shopify is a really cool company. I actually really enjoyed interviewing there, and I would have loved to work there. The only reason I did not was because I took a job offer at Microsoft instead. Now, full disclosure and transparency here: both of these positions were intern positions. I was offered a backend developer intern position from Shopify, and instead I decided to go to Microsoft as a software engineer intern. I'm sure these are pretty much the same thing. I mean, the titles are not that relevant, especially for an intern position. But anyways, I really enjoyed my time at Shopify. I was actually there in person. It was probably over a year ago that I did these interviews, and I just wanted to share my experience in case any of you are considering working there, and talk about what I had to go through and how it was different from my interviews at Microsoft and other tech companies. So, with that said, let's get into the video after a quick word from our sponsor. Now, even though Shopify didn't give me your typical algorithmic-style interview, I still needed to prepare, and I used AlgoExpert, the sponsor of this video, to do so. AlgoExpert is the best platform to use to prepare for your software engineering interviews. It not only has over 110 coding interview practice questions, but also has something called SystemsExpert. SystemsExpert teaches the fundamentals of system design with comprehensive video explanations, real code examples and practice system design questions. It covers all the important concepts you need to know to ace your system design interviews, like caching, proxies, load balancers, hashing and over 20 more important topics.
After mastering these, you can move on to the systems design quiz to test your knowledge and then head over to the practice system design questions, where you can learn how to design large-scale systems like Google Drive, Netflix and many others. Get started using AlgoExpert and SystemsExpert by hitting the link in the description and using the code "techwithtim" for a discount on the platform. So I'm going to try to keep this video fairly structured. I'm going to talk about the following: first, the application process; second, my remote interview; third, my in-person interview; and finally, my offer. I will try to talk about some details of the offer, like the compensation and the benefits and all of that, but I don't actually have a hard copy of the offer, or I would have attached it to the video and you guys could have looked at it for yourselves. But anyways, let's talk about the first thing, which was the application process. How did I apply to Shopify? How did I actually get these interviews? First of all, the resume I used to apply: I actually have a video where I went through it, kind of reviewed it and talked about it, so I'll leave a link to that video in the description. I'll try to remember to add a card; someone leave a comment, because I almost guarantee you I'll forget to do that. But anyways, you can look at my resume in that video, so I'm not going to discuss it here. The time that I was applying for this job was November 2019. I was in my second year of university, just wrapping up my first semester, and I was looking for an internship in the summer of 2020. The time you apply for those would be, you know, October or November, at least for these kinds of big tech companies. I saw that Shopify was close to me. I was located in Ottawa at the time, and Shopify's home office, or headquarters, or home base, or whatever you call it, is located in Ottawa.
They're actually a Canadian company, one of the largest Canadian companies and one of the fastest-growing ones as well. So I went on the Shopify website and saw there was a bunch of different positions, and one of the positions that stood out to me was backend developer intern. They also had front-end developer intern, they had software architect, they had a bunch of other ones as well, all kinds of web-dev-related stuff. I wasn't that good at web development. I was really just good at Python, to be honest, and a little bit of front-end stuff. So I applied for backend developer anyways. It took probably two and a half to three months for them to actually get back to me. So, funny story: I applied and sent my resume in. I had to fill out some information about myself as well. There was some supplementary thing, nothing crazy, but you had to type in some paragraphs and answer some questions and stuff, with no coding assessment or exam or anything like that. Anyways, I was actually in Seattle on January 30th, when I had my Microsoft interviews, and, funny story, about two hours after I finished my Microsoft interviews (again, this was before the pandemic, when you could travel and all of that), I received an email from Shopify saying that I had an interview scheduled four days later. So that was fun. I came back from Seattle the next day, and then three days later I had my interview at Shopify. The first interview was a remote interview, and this is what they called a life story interview. They said: you don't need to prepare for this at all. There's going to be no coding or technical questions. We're just going to ask you some behavioral questions. We want to get to know you and just see if you'd be a good cultural fit. And they definitely were not lying. I had my first interview.
I don't know what time it was at, but it was on February 4th or something like that, and I just got asked a bunch of behavioral questions. The first ones I got asked were kind of, you know, what do you like to do in your free time? What are your hobbies? What do you enjoy? Why'd you pick this school? How'd you get into coding? And then, after maybe the first 20 minutes of just some warm-up, nice, friendly, get-to-know-you kind of questions, they started asking me your standard questions, like why do you want this job, how much experience do you have in this programming language, whatever it may be. I kind of took this as them trying to filter people out, so they didn't bring too many people to the on-site if they weren't kind of worthy of being there. Unfortunately I can't remember much more about that interview, but I remember it lasted about an hour. It was really casual, and I honestly had a fun time talking with the person. And then, I guess maybe a day later, I received an email that said: hey, we'd love to bring you in for on-site interviews; your scheduled date is this, whatever. So I actually walked to my on-site interview. It wasn't too cold, fortunately, when I went there, and at my on-site interview I met with two developers. I went to the Shopify office, which is actually really cool. It's in downtown Ottawa, in this really big, massive building, and they have a bunch of different floors in there. I walked into the office: there are plants everywhere, there are computers, they have a lounge, they had catering, really all the stuff you would expect at a big tech company, essentially. And then I had my interview with the two devs. What they said was that I should come prepared with a coding project that I had worked on. They said I did not have to do anything else: just bring some coding project you have worked on, on your laptop.
The devs are gonna sit with you. You should know the project in and out and just walk them through it: talk about, you know, architectural decisions, why you decided to do specific things, how it works, all of that. So, fun story. If you want to actually see th.
How Shopify’s engineering improved writes by 50% with ULID
A few months ago, Shopify Engineering posted this interesting blog, brilliant in fact, talking about the top 10 tips for building resilient payment systems, and I'll reference the blog for you guys in the description and the show notes. Each of these 10 tips is absolutely well crafted, just for their use cases. In this particular episode, I'd like to focus on just one of these tips because, frankly speaking, each one of these tips is its own content, its own article. There's not much detail, but boy, you can extract so much if you understand the fundamentals. So for this particular show I'll focus on the database engineering aspects, specifically tip number six, which is to use idempotency keys, and how they use a unique key that optimizes their inserts and select queries in this case. How about we jump into it? All right. So I talked about what idempotency is in another video. Basically, in a nutshell, an idempotent request, or an idempotent back end, is when you send a request and this request is repeatable, such that repeating it doesn't change the state on the back end. An example is a GET request. A GET request, by definition, must be idempotent, because if I do a read on a specific endpoint and I send that read twice, it doesn't matter. Nothing changes on the back end. Nothing should change, right? POST, on the other hand, is by definition not idempotent unless you make it so. If you POST, if you insert a row, repeating that insert will change the state. You don't want that; it's not a desired behavior. So if your endpoint is, say, /post and that creates a new entry, for example... (let me fix my...). And of course, Shopify being a payment system, you want the ability to retry a payment without actually causing a double spend. You don't want to pay for something twice. That's never fine, right?
And on the opposite side, for merchants, you don't want an accidental double refund if that happens. So that's why idempotency here is a critical concept. I'm going to reference the video for you guys if you're interested to learn more about that. Idempotency is a very critical concept to them. You have to build it yourself; you have to configure your back end to be idempotent. It doesn't come for free. That's why something called an upsert is a thing, where you insert, but if the row already exists, it becomes an update. An upsert is an idempotent concept. OK, so let's go ahead and read this blurb. In this particular blurb they talk about the importance of idempotency, blah, blah, blah, we know that, but here's the important thing: I am going to spend most of the show just talking about this particular tip, because it fascinated me, the brilliance of Shopify engineering when it comes to database-level tuning and data modeling, which is very underrated. All right, let's go ahead and read this: "An idempotency key needs to be unique." Well, that's important, because in this particular case, when they send a request, they add a key to uniquely identify payment requests. That's how you identify a payment or a request. If someone retries the same request with the same key, then you know this is an actual retry, whether that's happening from the user, or from a proxy, a reverse proxy, an API gateway, any middle layer that retried. It doesn't matter; we know that a retry has happened. Sometimes the user goes back and then forward and then hits refresh, and you get this message: "Do you want to resend it again?" You say yes, and that sends the same technical request ID, unless you went all the way back, generated a brand new request ID, and actually wanted to pay again. That's a different story.
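To make the retry behavior concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module. It is not Shopify's implementation: the `payments` table, the `charge` function, and the key values are hypothetical names chosen for illustration, and SQLite stands in for whatever database a real payment back end uses. The only idea it shows is that a unique idempotency key turns a repeated request into a no-op.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a hypothetical payments table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments ("
        " idempotency_key TEXT PRIMARY KEY,"  # uniqueness is enforced here
        " amount_cents INTEGER NOT NULL)"
    )
    return conn


def charge(conn, idempotency_key, amount_cents):
    """Record a charge at most once per key.

    Returns True when the charge was newly recorded and False when the
    same key was seen before, i.e. the call was a retry.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO payments (idempotency_key, amount_cents)"
        " VALUES (?, ?)",
        (idempotency_key, amount_cents),
    )
    conn.commit()
    # rowcount is 1 for a real insert and 0 when the row was ignored.
    return cur.rowcount == 1


if __name__ == "__main__":
    db = open_db()
    print(charge(db, "request-123", 500))  # first attempt
    print(charge(db, "request-123", 500))  # retry with the same key
```

A client that deliberately wants to pay again would send a fresh key, which the table treats as a separate payment.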
But most of the requests with the same idempotency key are sent within a few seconds. What they say here is another one: "The key needs to be unique for the time we want the request to be retryable," and that's a very critical use case for them. They don't want their requests to be retryable infinitely. If you send a payment request in 2018, it's not going to live until 2022. That doesn't make any sense. They estimate a payment request to live within 24 hours. If a payment didn't go through within 24 hours, say it failed on the back end for any reason, we can retry it within this window. But after that it says, hey, you know what, all bets are off, just do it again; we will email you to say we could not retry that. "So typically, 24 hours or less." I think that's something you decide as a designer, an architect. And here's the interesting part: "We prefer using an Universally" (that sounds like a typo) "Unique Lexicographically Sortable Identifier," or this thing that's called a ULID, "for these idempotency keys instead of a random version 4 UUID." If you don't know, a universally unique identifier, or sometimes in Microsoft land we call them GUIDs, globally unique identifiers, is a certain number of bits. I forgot, I think 128 bits, I'm probably mistaken. These numbers are practically guaranteed to be unique: you can generate them on the device and be 99 percent sure they're going to be unique, which is a powerful concept. Why do you want to use those? You want to use those because you want the client to generate a unique ID, as opposed to a database or a back end generating a sequential unique identifier. Because, you see, sequential identifiers are very powerful, because sequence is beautiful in databases. Databases like ordered things.
A database likes things that are ordered because it can put them in the same page, you can query them, and they're tucked in nicely next to each other. The problem is that generating a sequence is very expensive, because you have to talk to the database to give you a unique sequence number. So there is a central point for generating, almost a central point of failure, where we ask someone to give us a unique ID, versus the client just generating it and us knowing it's unique. That's why UUIDs are very powerful. The problem with UUIDs is that they are random. Why is that a problem? Let's continue reading and explain that a little bit more. "ULIDs contain a 48-bit timestamp followed by 80 bits of random data." So I was right about 128, is that right? Yeah, 128, if I can do math. So ULIDs have some sort of an order to them: the first 48 bits are a timestamp, and this timestamp injects some sort of an order into these random IDs. What is the benefit of this? "The timestamp allows ULIDs to be sorted," unlike random UUIDs, which are not sorted, "which works much better with the B-tree data structure databases use for indexes. In one high-throughput system at Shopify, we've seen a 50 percent decrease in INSERT statement duration by switching from UUIDv4 to ULID for idempotency keys." And that's all they say. They don't tell you how, they don't tell you why, but I'm here to actually explain why this is the case and why it is faster. Because once you understand how databases work, the fundamentals, the first principles of databases, this is just like reading one plus one equals two. So let's explain that. You see, their database here, they don't spell it out, but it's MySQL, right? MySQL.
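The layout described above, 48 bits of timestamp followed by 80 bits of randomness, can be sketched in a few lines of Python. This is an illustrative sketch rather than a reference implementation: `make_ulid` and its parameters are names made up for this example, and it assumes the usual ULID text form of 26 Crockford base32 characters. Because the timestamp occupies the most significant bits and the alphabet is in ascending ASCII order, comparing the strings compares the timestamps first.

```python
import os
import time

# Crockford base32 alphabet (no I, L, O, U); characters are in ascending
# ASCII order, so string comparison matches numeric comparison.
CROCKFORD32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def make_ulid(timestamp_ms=None, randomness=None):
    """Build a 26-character ULID-style string.

    timestamp_ms: milliseconds since the Unix epoch (defaults to now).
    randomness:   an 80-bit integer (defaults to os.urandom).
    Both parameters exist only so the example can be made deterministic.
    """
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    if randomness is None:
        randomness = int.from_bytes(os.urandom(10), "big")

    # 48 high bits of timestamp, 80 low bits of random data: 128 bits total.
    value = ((timestamp_ms & ((1 << 48) - 1)) << 80) | (randomness & ((1 << 80) - 1))

    # Encode 5 bits at a time, least significant group first, then reverse.
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD32[value & 31])
        value >>= 5
    return "".join(reversed(chars))


if __name__ == "__main__":
    earlier = make_ulid(timestamp_ms=1_000)
    later = make_ulid(timestamp_ms=2_000)
    print(earlier, later, earlier < later)
```

Two IDs generated in the same millisecond still differ in their random part, so they sort next to each other but in no particular order relative to one another.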
Okay, primary keys there are what's called a clustered index, which means that if you pick a primary key that is, for example, an integer, the clustered primary key index is the table itself. So what does that mean? If your integer is the primary key, then at the bottom of the index structure there are the leaf pages, and what this integer points to is basically the actual pages of data, right? So if you have row one, row two, row three, row four, row five, row six, seven, eight, nine, ten, these are tucked in nicely together in a single page, and not only there are the values.
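The locality argument can be sketched with a toy model. This is not how MySQL implements its index; it is a simplified simulation under stated assumptions: leaf pages are plain sorted lists, `PAGE_CAPACITY` is an arbitrary number, a full page splits in half, and the function name `count_last_page_inserts` is invented for this example. It counts how many inserts land on the rightmost page, which is the page most likely to still be hot in memory.

```python
import bisect
import random

PAGE_CAPACITY = 64  # hypothetical number of keys that fit in one leaf page


def count_last_page_inserts(keys):
    """Insert keys into sorted leaf pages; return how many hit the rightmost page."""
    pages = [[]]
    last_page_hits = 0
    for key in keys:
        # Pick the page whose key range covers this key: the last page
        # whose first key is <= key (page 0 catches everything smaller).
        first_keys = [page[0] for page in pages[1:]]
        idx = bisect.bisect_right(first_keys, key)
        if idx == len(pages) - 1:
            last_page_hits += 1
        bisect.insort(pages[idx], key)
        # Split an overfull page into two halves, keeping global order.
        if len(pages[idx]) > PAGE_CAPACITY:
            page = pages[idx]
            mid = len(page) // 2
            pages[idx:idx + 1] = [page[:mid], page[mid:]]
    return last_page_hits


n = 5_000
ordered_keys = list(range(n))                            # time-ordered, like ULIDs
random_keys = random.Random(0).sample(range(10 ** 9), n)  # scattered, like UUIDv4

ordered_hits = count_last_page_inserts(ordered_keys)
random_hits = count_last_page_inserts(random_keys)
print(ordered_hits, random_hits)
```

With ordered keys every insert goes to the same page at the end of the index, while random keys spread across all pages, so each insert may need a different page to be found and loaded.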
ShipIt! Presents: Deleting the Undeletable | Shopify Engineering
[Music]. Thank you for joining us for ShipIt! number 17, Deleting the Undeletable. ShipIt! is our monthly virtual gathering where we bring together engineering Shopifolk to talk about the things they're working on and take questions from all of you who decided to join us today. My name is Anita Clark, I'm the senior managing editor at Shopify, and I run Shopify's engineering blog. Today we're talking about how Shopify's privacy and data science teams came together to build a platform to safely manage personally identifiable information, aka PII. I'm welcoming two senior members of the privacy and engineering team to talk to us about this effort. We're taking questions at the end of this presentation, so please use the Q&A mode in the chat to ask. Welcome and thank you, Beirus and Jason, for being here. Really excited to see your presentation.
Thanks. Hello everyone, my name is Peruse and I'm a privacy engineer at Shopify. Today we're going to talk about a metamorphosis in our data analytics platform, in particular how to make deleting the undeletable possible. I'm sure most of you are familiar with Shopify, but if not, we are a global commerce company with more than 1.7 million merchants of all sizes all around the globe. We've had more than 300 billion dollars' worth of sales on our platform and, as you can imagine, there's a lot going on on this platform. In fact, we collect a variety of analytical events from our merchants and buyers to improve our own platform and also to help our merchants with their decision making. So here is how our analytical platform used to look in the past. We had different types of events getting fired from all across our platform: things like a sign-up when a new merchant signs up, point-of-sale transactions, or checkouts.
They would then go through our messaging pipeline, Kafka in this instance, and they would get persisted into our data warehouse, and that's where our data scientists could analyze and run jobs on these data sets. You may be wondering why I am telling you about this. It has to do with what you see if you take a closer look inside our data warehouse: oftentimes these events contain personally identifiable information, or PII, and we had obligations to find and act on this personal information upon request in an efficient manner. This turned out to be a very challenging task, at least a lot harder than what we initially thought.
That was because our data warehouse was designed to be immutable by default, which meant that once data entered the warehouse it was very difficult to change or delete it. Most of these events were collected in an ad hoc manner with no guaranteed structure, which meant that, except for the person who created an event, nobody knew how to parse or read it, so they were essentially binary blobs. Next, their privacy context was missing, which meant that even if we could find personal information, for example an IP address, we couldn't say to whom it actually belonged, that is, which data subject owned that piece of PII. A lot of production jobs and dashboards depended on these data sets, and any change could cause cascading failures. Last but not least was scale, both in terms of the data, which was petabytes of information, and in terms of the hundreds of people who depended on using these data sets.
And it wasn't just these challenges. There were a lot of maintainability costs and other issues the data team had with these data sets as well. So it brought up a natural, golden opportunity for us, the privacy team and the data team, to work together and address these issues. In the rest of this talk, I'm going to tell you about our collaboration with the data team and how we addressed these challenges.
Our first step was context collection, because we believed that missing context was the root cause of a lot of the problems we were facing. For context collection, we introduced a new schematization system, and you can see an example of a schema for an event, a sign-up event in this case, on the right side. Let's read through the schema and see what kind of information and context it collects. First, its structure: all fields in a schema now have a concrete data type and a short description of what they are. Next, we have versioning information: over time, as these events start to change and evolve, we capture that via our versioning system. Last and most importantly is privacy context: every personal field in the schema is tagged as such, and you can see, for example, that here we have an email and an IP address that are marked as email and IP. There are also some privacy handler tags that say what to do with this personal information; we will talk about them later.
So we talked about tracking and finding personal information using these schemas, but we still haven't addressed how to delete the information in our data warehouse. For that, we decided to not have any personal information in our data warehouse at all. That sounds kind of counterintuitive: if we don't have any personal data there, we might be losing analytical value. So let's see how that makes sense. To do that we use what GDPR calls pseudonymization. Pseudonymization is processing personal data in a manner such that it can no longer be attributed to a data subject without using additional information. In particular, we use two types of pseudonymization: obfuscation and tokenization. In obfuscation, the identifying bits of the data are masked or removed, and it's an irreversible action. That means, for example, that for an IP address half of it gets masked.
Or, for a user agent, all the identifying bits like fonts, resolution, or clock skew get removed. But we also enrich this type of personal information with the aggregate-level data that most data scientists are after in the first place. For example, IP addresses get geolocated, and for user agents we extract high-level information like the device or operating system, and this is actually the most important information that most data scientists need. Even though obfuscation solves a lot of problems and addresses a lot of use cases for our data scientists, there are cases where we still need access to the raw value of personal information, and that's what tokenization is for. In tokenization, the raw value of the PII is exchanged for a consistent, random token, and this consistency is a key property. It means that if you see the same personal information, you're going to get the same token again and again, and these tokens are exchangeable for the actual PII value.
So this was a lot to process. Let's see an example of pseudonymization in practice. Again on the left side we have an example of a sign-up event with an IP address and an email address, and we built a tool called Scrubber that applies pseudonymization to these events. The IP address got masked and geolocated. The email address got masked: its local part is redacted, while the domain part, gmail.com, remains because it's a popular domain. It also got tokenized, so every other time we see this email it's going to get the same token again and again. This is the consistency property that I talked about earlier.
So how does this help? If you think about it, we now end up with three kinds of data in our data warehouse. First is non-personal data, things like IDs or timestamps. Next is obfuscated data, which I talked about earlier. In these two groups there is nothing personal left to be deleted. And there remains the third category, tokenized data.
So, tokenized data: if we figure out deletion for tokenized data, we're kind of good to go. It's important to mention, though, that data that has gone through pseudonymization is still considered PII as long as there remains a way to re-identify it using additional information. So we're not completely, 100 percent removing the risk here, because there are always still some re-identification risks, but it's significantly reduced. So to u.
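The two kinds of pseudonymization described in this talk can be sketched roughly as follows. This is not the Scrubber tool itself: the names `TokenVault`, `scrub`, `POPULAR_DOMAINS`, and the event field names are all assumptions made for illustration, and geolocation and user-agent handling are left out.

```python
import secrets

# Hypothetical allow-list: domains common enough that keeping them does not
# identify anyone. The talk only mentions gmail.com as an example.
POPULAR_DOMAINS = {"gmail.com"}


class TokenVault:
    """Consistent tokenization: the same raw value always maps to the same token."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value):
        token = self._token_by_value.get(value)
        if token is None:
            token = secrets.token_hex(16)  # random, not derived from the value
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return token

    def detokenize(self, token):
        # Tokens are exchangeable for the raw value via the vault.
        return self._value_by_token[token]


def obfuscate_ipv4(ip):
    """Irreversibly mask the second half of an IPv4 address."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted IPv4 address")
    return ".".join(octets[:2] + ["0", "0"])


def obfuscate_email(email):
    """Redact the local part; keep the domain only if it is a popular one."""
    _, _, domain = email.rpartition("@")
    kept_domain = domain if domain in POPULAR_DOMAINS else "REDACTED"
    return "REDACTED@" + kept_domain


def scrub(event, vault):
    """Return a pseudonymized copy of a sign-up event (field names assumed)."""
    return {
        "ip_masked": obfuscate_ipv4(event["ip"]),
        "email_masked": obfuscate_email(event["email"]),
        "email_token": vault.tokenize(event["email"]),
    }


vault = TokenVault()
first = scrub({"ip": "203.0.113.42", "email": "jane@gmail.com"}, vault)
second = scrub({"ip": "198.51.100.7", "email": "jane@gmail.com"}, vault)
print(first["email_token"] == second["email_token"])
```

The masked fields cannot be reversed, while the token can only be turned back into the raw email by whoever holds the vault, which is the "additional information" that keeps tokenized data in the PII category.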
Day in the Life of a Remote Software Engineer Intern At Shopify
[Music]. [Applause]. [Music]. That was a lot of effort for a single [Music] shot. Okay, but we don't talk about the back of my computer. So welcome to a day in the life of a remote software engineering intern. My name is Justin. I'm currently an intern at Shopify this summer on the front-end side of things. Now, Shopify is digital by default, which means that basically we don't really have an office anymore. It's going to be remote first, even after the pandemic. Kind of sad, I never got to go to the office when it was still there. But I'm about a month in, and it's definitely been a really great experience so far.
One of the things that I really like about Shopify is how open and trusting they are. I had access to basically everything, and there's nothing like an intern project here per se. It's really like you're contributing as a full member of the team, and it's definitely been a huge learning opportunity for me. You usually have a mentor who's there to help you, pair program with you, get you through stuff, and get you onboarded well. There's also a manager who's there to oversee everything. So you definitely have a lot of help, and what I found is everyone is very, very open to helping. They're just a Slack message away.
It's nine o'clock right now, which means that the work day is about to begin. It's very flexible here. You can work anytime you want, as long as you're working the full eight hours a day. There are a lot of people from a ton of different time zones, but today is also No Meeting Wednesday, which means that it's all about putting your head down and being able to focus on what you're supposed to do. I do have a couple of calls scheduled today, though, and in the afternoon I'm looking forward to a virtual intern gathering as well as an IRL intern dinner, kind of an unofficial thing that's happening. So I'm looking forward to both things today.
It's time to get to work. So, just finished lunch. It's a really nice day out today, so I'm going to head outside and get some work done. This morning I worked on fixing up some old code, writing tests, and getting that all packaged up nicely into my PR, and now I'm waiting on a couple of new approvals for that to go through and ship. For this afternoon, I just self-assigned a new ticket and started asking some questions, trying to get more context as to what actually needs to be built, and now it's time to build. So let's head on back inside and get to work. All right, we're back inside. Time to get some work done. [Music].
So we're on the way now to an interns' dinner that's happening here in Ottawa, so I'm looking forward to meeting some people in real life. Grab some food, let's see how it goes. We're out here hanging out with some Shopify interns, grabbing some food together out in the park, out in the sun. [Music]. [Applause]. [Music]. So we must put all our belongings that are not supposed to be on the floor in the trash. [Music]. It's not slo-mo.
All right, so it's about 10 pm now, so we're going to be wrapping up the day: showering, cleaning up, getting ready for bed. I've also been reading a couple of books lately. Thankfully, Shopify lets us expense some learning items, so I got a couple of books off our internal book bar. Had a really great day today, and there's definitely a lot going on all the time. But that does it for today. Thank you so much for watching. Until next time. [Music]. Peace. [Music]. Oh.