
Virtualizing macOS at Scale for iOS DevOps



MacStadium recently hosted a successful panel discussion, Virtualizing macOS at Scale for iOS DevOps, with some of our best customers at VMworld in Las Vegas. Panelists from Capital One, Box, and Travis CI highlighted how MacStadium's VMware-based Mac cloud helps them meet their needs for Apple development and continuous integration. The panel focused on their best practices for reducing build times and increasing efficiency, and also touched on which tools and methods they have found most useful for achieving those goals.

There were quite a few highlights in the discussion. For one, all-flash SAN storage has made a big difference to our customers in reducing the time it takes to create VMs. After switching to Pure Storage, Capital One saw the time to go from the template they generate in vSphere to a provisioned VM drop from 20 minutes to approximately 20 seconds. When you are running enterprise-level CI and starting up to 4,000 builds a day, that matters.

Capital One was not the only organization that saw dramatic improvements. Travis CI, which provides developers with a software-as-a-service solution for continuous integration and continuous deployment, creates different images to test against that represent what developers may use (different versions of Swift and Xcode, for example). As a result, Travis CI ends up creating up to 54,000 VMs a day. Having Mac infrastructure capable of supporting what they offer as a service is critical. Travis CI regards it as part of its very business model.

Moving from VMware 6.0 to 6.5 has also paid large dividends for MacStadium customers. Linked clones help VMs look and act more like containers, greatly speeding up both provisioning and deployment. With 6.5 and Pure Storage, Box went from 20 to 30 minutes to clone a VM down to about 10 seconds. And the changes and support VMware provides for Mac virtualization are only growing. More support for new VMware features is coming soon!

There are many more observations and insights in the presentation, so it's definitely worth checking out if you have not seen it yet. You can watch the video of MacStadium's Virtualizing macOS at Scale for iOS DevOps panel here:

Or you can find a transcript of the proceedings below:

Speakers

Greg McGraw, CEO of MacStadium

Ray Sennewald, Senior Software Engineer for Box

Alex Niderberg, Senior Manager and Lead Software Engineer at Capital One

Josh Kalderimis, Vice President of Product at Travis CI

Preston Lasebikan, Lead Systems Engineer for MacStadium

Greg :

Hi, my name is Greg McGraw, and I'm CEO of MacStadium. I'll talk a little about MacStadium in a moment, but this afternoon we're going to talk about virtualizing macOS at scale for iOS DevOps. It's a pretty unique area for us because there are not many companies that host Macs. I have a good panel joining me today, from enterprise-class companies and service providers, plus MacStadium's resident VMware expert at the end. With me here is Ray Sennewald, Senior Software Engineer at Box. Next to him is Alex Niderberg, senior manager and senior software engineer at Capital One; Josh Kalderimis, Vice President of Product at Travis CI; and Preston Lasebikan, senior systems engineer for MacStadium. I would like to let each of them tell you a little about themselves and about the environments they administer daily.

Ray :

Yes, for sure, I'll get started. So again, my name is Ray. I work at Box. I am a senior software engineer, but I specialize in build and release for application engineers, particularly the macOS team as well as the iOS team. So I help support their CI infrastructure, which includes MacStadium.

Alex:

I'm Alex. I'm at Capital One. I work on a team that builds tools for mobile engineers across the United States, the UK and Canada. We ultimately look to build tools on MacStadium and AWS that empower our mobile engineers to ship features to customers.

Josh:

My name is Josh. I'm one of the founders of Travis CI. I work on the product side of our development. At Travis, we offer developers a software-as-a-service solution for continuous integration and continuous deployment. So, essentially, as a developer, you want to test your software in an environment similar to your development machine, while we need to run untrusted code across Linux and Mac. We do this on MacStadium for Mac and on Google Compute Engine for Linux.

Preston :

I'm Preston. I am a senior systems engineer at MacStadium. My main focus is supporting customers and troubleshooting issues that may occur with VMware running on Apple hardware.

Greg :

Excellent, thank you, gentlemen. Just to talk about MacStadium for a moment, for those of you in the room who have not heard of us or don't really know what we do: we are the only provider of enterprise-class Mac hosting solutions as infrastructure as a service. Nobody else does this. Google Cloud does not. Azure does not, and Apple does not. So primarily we have taken the form factor of the Mac mini and Mac Pro, added some innovation, and basically given it new life as data center compute. So we basically built a resource pool of Macs.

Greg :

Where this starts is really "Why Macs?" I get this question a lot when I'm at a conference or at the bar. They say, "You host Macs?" I say, "Yes, we have 20,000 of them," and mainly because we respect the fact that if you really want to develop rich iOS applications, you really need to use Xcode. Xcode runs only on macOS. macOS runs only on genuine Apple hardware, and that is why MacStadium exists. Other cloud vendors, as I said, do not really do this, and what we also find is that it's difficult to manage these very different Mac resources internally. So we try to make that easier.

Greg :

So why VMware? A few years ago, one of our customers who initially ran only bare metal at MacStadium said, "Hey, can I run VMware on this?" We were already a VMware partner, a global partner, so we said, "Sure." Through this integration and these experiences, we've continued to build on how to improve the performance of these Mac resource pools in an enterprise-class environment. I will let these gentlemen talk about each of these scenarios for their companies and how they use best practices and the best tools to reduce build times and increase efficiency and quality of operations. In fact, VMware is really the only virtualization option for Mac today. There are a few other small startups, but they really come up short when you try to build something at scale.

Greg :

So now I would like to put some questions to the panel and get each of you to give us your story. Please say a few words about the CI infrastructure you run on Mac in your environment.

Ray :

Yes, for sure. So at Box, we have MacStadium as our Mac infrastructure provider. We use Jenkins as our CI platform, and we use the Jenkins vSphere plugin, which lets us do some cool things, especially giving us one-time use of our VMs. What we do is use Terraform to provision all our VMs, so it's infrastructure as code. With some scripting we add these as nodes to Jenkins, and using this plugin, we can have the VMs revert to a snapshot after each build. So it gives us container-like functionality, but for macOS, and that's something we wanted to have at Box. We wanted a clean environment for each build, for all developers; otherwise you just run into problems.
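The revert-after-each-build model Ray describes can be illustrated with a toy simulation. This is only a sketch of the idea, not Box's setup: `BuildVM`, `run_build`, and the file names are hypothetical stand-ins, and nothing here talks to vSphere or Jenkins.

```python
from dataclasses import dataclass, field


@dataclass
class BuildVM:
    """Toy stand-in for a macOS build VM; not a real vSphere object."""
    name: str
    files: set = field(default_factory=set)
    _snapshot: frozenset = frozenset()

    def take_snapshot(self) -> None:
        # Remember the clean state the VM should return to.
        self._snapshot = frozenset(self.files)

    def revert_to_snapshot(self) -> None:
        # Discard everything written since the snapshot was taken.
        self.files = set(self._snapshot)


def run_build(vm: BuildVM, job: str) -> set:
    """Run one job, then revert so the next job starts clean."""
    vm.files.add(f"{job}/DerivedData")   # leftovers a build might write
    seen_during_build = set(vm.files)
    vm.revert_to_snapshot()
    return seen_during_build


vm = BuildVM("mac-builder-01", files={"Xcode.app"})
vm.take_snapshot()
first = run_build(vm, "job-1")
second = run_build(vm, "job-2")
print("job-1 leftovers visible to job-2:", "job-1/DerivedData" in second)
```

Because every build ends with a revert, the second job never sees files left behind by the first, which is the container-like property described above.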

Alex :

At Capital One we use a mix of enterprise Jenkins and hardware that we run at MacStadium. We use Terraform plus Ansible to provision VMs that currently live for about a week. This infrastructure ends up serving our mobile development teams. They are empowered to build CI pipelines that run across Jenkins, GitHub, AWS and MacStadium. These pipelines execute on build nodes/slaves that are attached to Jenkins. My team provides a standardized execution environment where they can run both their iOS and Android builds. We usually run 3,000 to 4,000 builds a day.

Josh :

So at Travis we use MacStadium as well, as you can guess. We run 84 Mac Pros across two different vSphere clusters for HA and failover, and on top of that we maintain 10 different images, because CI, when you test open source, is essentially about testing against what developers may use. So you want to test against different Xcode versions and different Swift versions, and we offer different versions of these. Over a day we do about 54,000 VM starts. We start a new VM, run it for about six minutes on average, and throw it away afterwards. Some of our workload is for open source, so we can help open source communities test their software better for free, because nobody really wants to pay for a Mac to sit in a basement when you're just working on open source. So we provide a lot of infrastructure for the open source community, while we also have a commercial offering.
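To put those figures in perspective, here is a rough back-of-the-envelope estimate using Little's law (average number in the system = arrival rate × time in the system). The inputs are the numbers quoted above. The assumption that VM starts are spread evenly over 24 hours is ours, made only for illustration, so real peaks would be higher.

```python
# Back-of-the-envelope estimate using Little's law: L = arrival_rate * time_in_system.
# Inputs are the figures quoted in the panel; the even spread over 24 hours
# is an illustrative assumption, not something the panel stated.

VM_STARTS_PER_DAY = 54_000      # quoted by Travis CI
AVG_VM_LIFETIME_MIN = 6         # quoted average run time per VM
MAC_PRO_HOSTS = 84              # quoted host count
MINUTES_PER_DAY = 24 * 60


def average_concurrent_vms(starts_per_day: int, lifetime_min: float) -> float:
    """Average number of VMs alive at once, assuming a steady arrival rate."""
    arrivals_per_min = starts_per_day / MINUTES_PER_DAY
    return arrivals_per_min * lifetime_min


concurrent = average_concurrent_vms(VM_STARTS_PER_DAY, AVG_VM_LIFETIME_MIN)
per_host = concurrent / MAC_PRO_HOSTS

print(f"Average concurrent VMs: {concurrent:.0f}")
print(f"Average VMs per host:   {per_host:.1f}")
```

Under that steady-state assumption the fleet averages about 225 live VMs at any moment, or a little under three per Mac Pro.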

Greg :

Clearly, many of our customers tell us that the real value they get out of the Mac infrastructure is being able to virtualize it. Tell me what went into the decision-making process when you moved from running only bare metal to a virtualized environment.

Ray :

So I can start. This was actually a decision that was made before I came to Box, but I was very happy to learn that was the case when I started there. I've been there for about a year. At my former organization we managed a set of Mac minis, and I say "we", but there was no really clear ownership of them. Developers did not want to maintain them, but they needed them for testing. My team, as the CI infrastructure team, wanted to maintain the software on them, but we did not want to maintain the hardware. The team that ran the server room did not really want to maintain them either, because they were drastically different from anything else in their server room. So that was a really big pain for us. So I'm glad Box came to this decision, because it makes my life a lot easier, and developers' lives too.

Alex :

At Capital One, we realized that we had invested a lot in being able to deploy and keep backend API infrastructure running across different cloud providers. We had also begun to invest in a growing number of talented mobile engineers, and realized that we had underinvested in stable tooling for them. In the end, we realized that it was not sustainable to stick with the bare-metal management methods we had been using if we were to achieve the scale and stability we required. We realized that virtualization really enabled us to provide a consistent environment where developers could get their test results and make sure all mobile features were stable before shipping to our customers.

Josh :

For us, bare metal has never been an option, because with CI, when you run someone else's code, you might run customer one at one moment and the next build is customer two. You cannot have leftover artifacts. You cannot have any file system changes or log files left behind. So it's about having clean-room, sandboxed environments. So VMs are very important to how we do this in a secure way. Even if you take into account all of the container discussion that's happening on the Linux side, that's not really what CI is aiming for either, because security is very important for a trusted multi-tenant environment.

Greg :

Excellent. Tell us a little about some of the benefits, or how your clouds and your deployments are configured today, whether using 6.5, linked clones, or ephemeral builds, just the various ways your environment is set up.

Ray :

Yes, so I touched on this a bit, but we actually only use regular clones with VMware, on vSphere 6.5. It has been much faster. I think this is also due to storage improvements, which I think we'll get into later in the conversation. But we use Terraform to provision these, and we attach them to Jenkins as I already mentioned. So that's pretty much the meat and potatoes of how we do it.

Alex :

We also run on VMware 6.5. We have a process where we take an existing VM and add new versions of Xcode and other tools that developers need. Much of our process is tied to the use of pipelines. So we provide a VM that lets developers choose the version of Xcode they need for their run, and then execute it in this standardized environment.

Josh :

So we also use 6.5. We do not use snapshots. We use the new SAN environments, so in the many iterations we've gone through with MacStadium as we've increased our load and scaled out, we've focused on how to bring down boot times. How do we make sure we have fewer spikes, smaller peaks? We use Terraform and Packer for much of our VM building, but we boot VMs fresh. We take a bit of a penalty for this because of the different images that we are booting at the same time. We're not starting just one image, and we're not going through a suspend/resume state or a snapshot. We boot everything fresh and we take about a 60-second penalty. I know there's some discussion coming in a later question about some of the improvements we've seen, but working closely with MacStadium, we've also gone through some SAN improvements, which reduced this from three minutes to 60 [seconds] and gave us great speed improvements over time.

Greg :

Preston, we have heard this from many of our customers. It's about speed, performance, build times, and those kinds of things. What are some of the things that MacStadium has put in place, whether in the SAN or in the network topology, to improve performance?

Preston :

So originally we used certain SAN arrays like NetApp for example.

Greg :

Spindle-based.

Preston :

Yes, spindle-based at first, and those are a bit dated, and of course the throughput you can get out of something like that is much, much lower. It did the job at the time, but there is much newer technology. So we eventually came around to replacing everything with all-flash storage, and I'm sure some of these guys can tell you that there was a huge increase in performance from switching to flash storage. Throughput went up and build times were cut by minutes; I think they changed by massive factors. We've also gone through several network upgrades, including moving off older Cisco devices to a new standard campus-style topology and moving towards VXLAN. So it's mainly your standard Cisco 9K series using VXLAN. The throughput has completely changed. Instead of the entire backbone having an east-west capacity of, say, 10 or 20 gigs, internally right now we're running at 160 gigs of throughput east-west. Just from the simpler topology.

Greg :

So I know, Preston, you've looked at 6.7, which is basically rolling out just now. Are there clear performance improvements in 6.7?

Preston :

So with 6.7 I've seen more tweaks that can help developers running on MacStadium make these builds much faster. There have been changes in things like the cloning process, which originally came from VMware's VDI. It's the API that most of these developers are actually hitting. It uses exactly the same process. So there are changes and tweaks to how a clone works.

Preston :

For example, in 6.5 it used to be called VMFork, and now its full name is Instant Clone. So the model was: I have a base image, I make clones of it from the parent, and now I'm running the child image. In 6.5 they have a very tight relationship; if you do something to that parent, you basically break the link to some of the clones. In 6.7, with the new functionality they've given you, you can now have things like HA and DRS, and you can make changes live. You no longer need to freeze that clone. Before, what you really had to do was build your original image: you build an image, save it, shut it down, and that's it. You do not touch it. Now you can make changes on the fly, and I think that is a marked difference. It certainly adds a lot of usability for people running things now.

Greg :

Yes. You mentioned all-flash SAN, and I know that at MacStadium we have standardized on Pure. They are actually exhibiting this week. It seems to have all the right security, all the right performance, all the right aspects to optimize the environment. I would like to hear from each of the panelists how the change from the older NetApp storage has affected your builds and your results.

Ray :

Yes, so that was great for us. As we said, we maintain around four templates and we end up provisioning 100 VMs from them. The old NetApp was also running an older version of vSphere, I think it was 6.0. It would take us 20 to 30 minutes to clone a VM, and with 6.5 and Pure Storage, we got it down to about 10 seconds on average. So we can reprovision our entire fleet of VMs in 10 minutes now, which is great for us, because now we can roll out changes much more easily. We really do not want to maintain state on these different VMs. So if we wanted to upgrade Xcode, for example, I'd just blow away all these VMs and provision them again, and that's what Terraform is really good at. Snapshots are also very useful for us.

Ray :

So it really allowed us to take advantage of this model we had been working towards, and that's great for us. In the past, we would not really accommodate Xcode upgrades unless we absolutely had to. Developers would be at my door asking, "Hey man, can we get Xcode 9.2?" and I'd be like, "That's going to take us a lot of work." Now it is much easier for us. So that's very nice.

Alex :

Yes, we have a similar process in which we try to stay close to the Xcode releases, as I think you do, even trying to get some of the betas out there for our teams to use. When we switched to the all-flash arrays, we saw the time to go from the template we generate in vSphere to a provisioned VM drop from 20 minutes to approximately 20 seconds. So it was a big improvement in terms of having these things up and being able to update them much faster.
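As a quick worked check of the clone and provisioning times quoted so far, the speedup factors can be computed directly. This is just arithmetic on the round numbers mentioned by Box and Capital One, so the results should be read as approximate.

```python
def speedup(before_s: float, after_s: float) -> float:
    """How many times faster the 'after' duration is than the 'before' one."""
    return before_s / after_s


MIN = 60  # seconds per minute

# (before, after) in seconds, taken from the figures quoted in the panel.
cases = {
    "Capital One, template to provisioned VM": (20 * MIN, 20),
    "Box, clone a VM (low end)": (20 * MIN, 10),
    "Box, clone a VM (high end)": (30 * MIN, 10),
}

for label, (before, after) in cases.items():
    print(f"{label}: {speedup(before, after):.0f}x faster")
```

On those figures, Capital One's provisioning is roughly 60 times faster and Box's cloning roughly 120 to 180 times faster than before the move to flash storage and vSphere 6.5.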

Josh :

So we moved off NetApp, where I think we went from 10 minutes to a one-minute average, and for us this was a huge improvement, because when you work with 55,000 jobs per day, these spikes eat up a lot. In addition, handling our images was much more reliable with SAN caching as well, along with how we spread our different images across the jobs. I guess what matters most to us in all of this is the build-versus-buy question.

Josh :

What I mean is, instead of running this in-house, it has been more effective for us to work with a partner to understand how best to scale VMware and how you scale it effectively across a fleet that keeps growing, because as our customer base grows, I do not want to spend time growing our build infrastructure organization. I want to work with a partner on how best to build that infrastructure. So going from NetApp to the new SAN, and even before that, we would not have been able to do it ourselves without working with an effective partner.

Ray :

I'll actually add to that a little. There was another thing that went into our decision-making process: we did not want to keep extra Mac hardware on hand. When we were trying to figure out how to solve this problem, it was nice to have a partner who has extra Macs available and has 24×7 hands on site to go and fix things. That allowed us to focus much more on delivering the software and tools, and on optimizing and managing some of our networking concerns, as opposed to handling things at the hardware level.

Greg :

One of the things, of course, since we started out hosting Macs as just bare metal and added virtualization later, is that it's feedback from customers and partners like these that has given us the opportunity to find out how we can make recommendations or tune the system to really drive performance in the environment. I mean, talking about a Mac Pro and IOPS in the same sentence is a bit of an interesting dichotomy at MacStadium. Let's talk about automation. Clearly with CI, the whole point is automation: automated builds, automated tools.

Greg :

And you mentioned a few other tools you have used and integrated into your environments. How would you characterize the tooling landscape and the tool vendors? I know there's a new one popping up, it seems, every month. But what is the value they deliver that you are looking for in your environments?

Ray :

Yes, I mean the key value we really look for is the ability to have as much as possible checked into source control so that we can identify where problems arise. When you do things manually, or do something that has not been checked into source control, it's hard to find out where the problems are. That was something that was key for us, and that's where Terraform and Ansible can really help people provision their infrastructure. In other parts of the organization we use AWS. We also do Windows, and we use AWS for that.

Ray :

We do similar things there using Ansible and Puppet to do the provisioning on that side. But this is something we did not have before on the MacStadium side; we used to manage many of these VMs manually. Bringing in something like Terraform made it much easier for us, because if we make a configuration change, it is tied to source control. So if developers suddenly start reporting problems, I can ask, okay, what changed that day? Before, I would ask my team and nobody would remember. We could probably go through the logs on the VMware side, but that does not seem like the right way to do it. I'd say that's where the power of it comes from on our side. The second part is being able to revert to snapshot as a piece of automation, which we use the vSphere API for. That's key for us so we can ensure we get clean builds on every VM. Beyond that, it's just knowing what's available out there, and as Greg said, there are many tools that change all the time. I just try to see what others are using. I'm not really trying to reinvent the wheel at my organization. I feel that there are many organizations out there that have already solved this problem, so finding out what's out there and reusing it is usually the way I approach such problems.

Alex :

One thing we saw as an interesting challenge was how to keep up with developers' requirements and how we could have something that could ultimately scale to their needs. The way our setup works today is that we have a bakery where we store the latest golden image. We start from the latest golden image and add a new version of Xcode, SwiftLint and other tools that our devs require. After the bake completes, we publish and store this new golden master image in the bakery. Then we run the new golden master through some automated processes to send it to vCenter and create a template. Once it's in vCenter, we use Terraform and Ansible to deploy the new GM VM. Then we use Ansible for any configuration of the VM instance (network, hostname changes, …). Then we have a Jenkins validation pipeline written in Python to ensure that the VM is healthy, and then register it with Jenkins as stable and available to service mobile build, test and deploy requests.

We continue to work on improving the overall set of deployment tests for infrastructure. Many of our internal customers contribute to a repository that contains the validation tests we run before a VM is considered stable and put into service. That's one thing we've found very powerful as we've progressed on our automation journey: the people who depend on the tools actually help us define what we do to assess whether a VM is ready for service. Finally, we run a simple iOS app and Android app that perform UI testing and unit testing, just to ensure it all works as we expect.

As we continue to mature our approach, we keep adding further validation to the rollout process. This allows us to ensure that infrastructure changes will not cause problems for the mobile engineers we support. (section edited for clarity)
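The bake, template, deploy, configure, validate, register flow Alex describes can be sketched as an ordered list of stages that stops at the first failure. This is a minimal illustration only: every function name, the `ctx` dictionary, and its keys are hypothetical, and the stages just record what a real step would do rather than calling Packer, vCenter, Terraform, Ansible, or Jenkins.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, object]
Stage = Tuple[str, Callable[[Context], bool]]


def bake_image(ctx: Context) -> bool:
    ctx["image"] = f"golden-xcode-{ctx['xcode']}"   # new golden master
    return True


def publish_template(ctx: Context) -> bool:
    ctx["template"] = f"{ctx['image']}-template"    # template in vCenter
    return True


def deploy_vm(ctx: Context) -> bool:
    ctx["vm"] = f"{ctx['template']}-vm-01"          # VM provisioned from template
    return True


def configure_vm(ctx: Context) -> bool:
    ctx["hostname"] = "mac-build-01"                # network / hostname changes
    return True


def validate_vm(ctx: Context) -> bool:
    return bool(ctx.get("healthy", False))          # stand-in for validation tests


def register_with_jenkins(ctx: Context) -> bool:
    ctx["registered"] = True                        # VM now serves build requests
    return True


STAGES: List[Stage] = [
    ("bake", bake_image),
    ("template", publish_template),
    ("deploy", deploy_vm),
    ("configure", configure_vm),
    ("validate", validate_vm),
    ("register", register_with_jenkins),
]


def run_pipeline(stages: List[Stage], ctx: Context) -> List[str]:
    """Run stages in order; stop at the first one that reports failure."""
    completed = []
    for name, step in stages:
        if not step(ctx):
            break
        completed.append(name)
    return completed


good = {"xcode": "9.2", "healthy": True}
bad = {"xcode": "9.2", "healthy": False}
print(run_pipeline(STAGES, good))
print(run_pipeline(STAGES, bad))
```

The point of the ordering is that a VM failing validation never reaches the registration stage, so it is never offered to Jenkins as a build node.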

Josh :

You know what I love about this? I'm taking mental notes too. There are a lot of similarities here and, strangely enough, I struggle to think what the world was like before Terraform. It feels so long ago, and yet it was only one and a half or two years ago. But with Terraform, Packer scripts, Chef, Puppet, Ansible, all of these, everything is about documentation. How we manage our infrastructure is about using these tools as a method of documentation, with team pull requests, so we can see what changes are made and how they are applied, and keep logs and state. We use Terraform heavily throughout the organization to build VMs on Linux and on Mac, and also to roll out vSphere changes. We have also built CLIs to interact with MacStadium and vSphere to check Mac Pros in and out when we need to do maintenance on something.

Josh :

We use a lot of Golang for CLIs too. It's all about automation for us: the less we have to use VNC, or the HTML5 client, or the Flash client of the day, and the more that can be automated, the better it is for us.

Preston :

Internally, MacStadium also uses automation, mainly a mix of PowerCLI, Python and Ansible, to do testing before turning these environments over to customers. Traditionally, before I worked at MacStadium, I would have one vCenter with lots of virtual machines that I had to manage. Now that I work at MacStadium, it's a little different: I have hundreds of thousands of vCenters that I have to manage or keep track of: "OK, this customer has this, and this customer does this, and this customer does that." OK, we've turned this environment over.

Preston :

So with the help of automation, we can say, "OK, there's a small setting missing here, which is causing the build to fail and you can't figure out why. This one works." Every environment we hand over is exactly the same. We try to do that to keep things clean, because with one small thing missed, your logs may not come through, you may not even realize that something has been turned off, or some simple little thing, like vMotion not being enabled on one host. You try to create builds and vCenter just spits out errors and says, "OK, I can't build anything. I can't do that." So automation has been pretty big in that regard.
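The kind of pre-handover check Preston describes, making sure every environment matches the same baseline, can be sketched as a simple configuration comparison. This is only an illustration: the setting names, host names, and `find_drift` helper are hypothetical, and a real check would read these values from vCenter (for example through PowerCLI or Python) rather than from hard-coded dictionaries.

```python
from typing import Dict

Settings = Dict[str, object]

# Hypothetical baseline that every host in an environment should match.
BASELINE: Settings = {
    "vmotion_enabled": True,
    "ntp_configured": True,
    "syslog_target_set": True,
}


def find_drift(baseline: Settings, hosts: Dict[str, Settings]) -> Dict[str, Settings]:
    """Return, per host, the settings whose actual value differs from the baseline."""
    drift: Dict[str, Settings] = {}
    for host, settings in hosts.items():
        diffs = {
            key: settings.get(key)
            for key, expected in baseline.items()
            if settings.get(key) != expected
        }
        if diffs:
            drift[host] = diffs
    return drift


hosts = {
    "macpro-01": {"vmotion_enabled": True, "ntp_configured": True, "syslog_target_set": True},
    "macpro-02": {"vmotion_enabled": False, "ntp_configured": True, "syslog_target_set": True},
}

for host, diffs in find_drift(BASELINE, hosts).items():
    print(f"{host} differs from baseline: {diffs}")
```

Run before handover, a check like this would flag the one host with vMotion disabled instead of leaving it to surface later as a failed build.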

Greg :

One of the things we've heard from our customers and our prospects is that they do not want to do this themselves anymore. The same is really true for the software they run as well. Josh, I wanted to ask you specifically, since Travis has a hosted CI platform and also an on-premise version. I keep hearing Jenkins here, and Jenkins does not have a hosted platform. What kinds of customers do you have that choose either the hosted version or the on-prem version? And do they migrate between the two?

Josh :

Sure, we've got TravisCI.com and .org; there's a longer story about why there's two. And then we've got TravisCI Enterprise which is our on-prem solution which we've got large customers like IBM which will be running it for their own security requirements and needs which connect to the enterprise. One of our customers, Schibsted, a customer as well of MacStadium, uses the exact set up that we use for running Mac builds and what we generally see is this difference … It's a really funny topic for me of private clouds because back in the day we talked about “the cloud” as being AWS. And then “the cloud” moved to being VPCs, like being this private cloud and this is exactly how we see MacStadium. It's just a private cloud of Macs, it works just in the same way.

Josh:       

We've got APIs and what we want to do is provide the same interface that we use for running our CI solution for our private customers, where the private cloud is really just a contract. And because we're adhering to this contract, we can pass over the same contract to our customers so they can plug it in to the vendor that we recommend. Schibsted is running Mac builds for their needs and I believe we're working on a shared case study about how we use Mac builds with MacStadium and TravisCI. Does that passionately answer it? I get a little bit rambly at times, I'm sorry.

Greg:              

That's quite all right. Let's shift gears a little bit to security. I'm sure everyone in the room and the companies that you work in security is really paramount and getting more and more important from all the data protection laws, data privacy laws, GDPR. The same thing holds true in the dev environments, and where those dev environments would live and are deployed. Can you tell me a little about some of the things that have impacted, especially in your Mac environments, have impacted you on the security side?

Ray:           

I can't really think of specifics that have really impacted us aside from recently when we were doing a renewal with you guys. Our security team had done a re-audit of MacStadium and all I know is that I'm very thankful for your team working with our security team to make sure all those check boxes were checked, so many acronyms I can't remember them all. I heard you mention a couple there. It's not really my domain; all I know is that we were able to get through that, and we have to vet every single cloud vendor that we use, so it's the same thing that we do with AWS, with Azure, with Google Cloud. We use nearly every cloud provider at Box; different teams use them for different purposes. So for us, security is a huge concern. We're managing customer data, it's not something that we want to worry about, and we actually can't use cloud providers unless they maintain all of the security requirements that we have. I don't remember what they are off the top of my head, to be completely honest, but that's what I can say.

Alex:          

Similar from our side. Being on the engineering side of a financial services company, security's a massive concern. One of the things before we were really able to start working with the MacStadium platform was a very thorough evaluation and there are continued checks to make sure that it's ultimately providing something that meets the security and compliance standards of the company. The other thing that's been nice about being in an isolated environment that we ultimately control/configure is having the opportunity to connect that back from our Mac environment to development environments. This allows iOS simulators running dev builds, to be able to talk to internal development APIs, and be able to validate tests beyond mocks.

Josh:  

There's kind of like two levels to this question for me. The first one is what virtual machines give us, which is the security that we require for running untrusted code in a multi-tenant situation, because we don't know what one person's code is going to do and whether it will try to interfere with someone else's code. So security is paramount in the isolation, not just CPU and memory but networking. The other side is our code is all open source. And we do this in such a way so we can share how we do it, but we can also do it in such a way that if people take a bit away that we can do something, then we're all open source and GitHub people can contribute. It's partially security by means of community involvement, but security in the sense of how we utilize the platform and why we are utilizing a certain technology.

Greg:

Preston, on the infrastructure side of MacStadium, are there any of the Cisco enhancements, firewalls, those kind of things that are also complementing those higher security requirements today?

Preston:            

One major thing is there is no shared cloud at MacStadium. Unlike lots of other cloud providers, you're not getting a virtual machine or a virtual private server that sits on the same host as someone else's or shares anything. So right now everything from the firewall on down is dedicated to an individual customer. It also plays partially into the point about hitting the API for their calls to automate everything: you're not sharing the API, and that vCenter belongs strictly to each individual customer. Sometimes that may create a difference in manageability, because each person's doing something different so the problems may come out differently, but it also isolates them. Ok, is that a problem that hits Box, but is not a problem that's going to hit TravisCI or Capital One? It's just you, it won't affect anybody else.

Greg:       

You bring up a good point, it is one of the questions I get often: "Why can't I spin up an Apple-virtualized environment and just buy by the drink like I can at AWS or Azure?" The reason is because Apple does not allow that. Every deployment in MacStadium is dedicated, whether it's a single mini or a hundred Mac Pros or a thousand minis. You've got root-level access to that, so you've got full control over what goes in and what goes out, and full visibility into that. Whereas we, as a supplier, do not have access to your data and do not have access to your guest OSs. That's the kind of abstraction that we've been (a) forced to build in, but (b) actually welcome building as well. It really separates the security responsibility between us.

Greg:

Looking ahead, we talked a little bit about 6.7, and Apple is also a hardware company. As such, they keep coming out with hardware that may look like this, may be this wide, may be this deep. There are some rumors about a new Mac mini and Mac Pro, and the iMac Pro just came out recently. Obviously Apple hasn't really focused on the enterprise client in quite a while. It's one of the things we try to do a stop-gap on. Are there any of those other technologies that Apple's come out with that you're looking at to implement?

Ray:             

Really not too much. The advantage for us is that we're running on VMs. We've got these Mac Pros that are hosted by MacStadium. We really want to ensure that we have a stable environment that we can test on and if there's new hardware that's faster, that's something that we might look to but this is the advantage in my opinion of running on the VMs you don't have to worry about trying out these different pieces of hardware. So from our point of view, we're just standardizing on the Mac Pros and they work pretty well for us.

Alex:

I will echo that. I think a lot of the value we get in running within VMware is we really have the opportunity to configure the environment a little differently on top of that same hardware. I think, to your point, if there are some performance improvements, there may be a case to consider upgrading some of the underlying infrastructure, but largely we just want to make sure that we have enough capacity and enough speed to keep the developers happy.

Josh:     

I'm going to use my phrase of "there are two levels to this" again, because for us on the CI side there is the question of, "What's the speed of the CPU?" That greatly affects how fast something runs; the newer CPUs just run faster. Then there's also, how many cores do you dedicate to a VM? How many gigs of RAM? The complexity in that question is … And it all depends on what language you're running on that VM, because if you're running Ruby, for example, then that's bound to one core. Node is bound to one core. But if you're using Swift and Xcode, like you should be if you're using Macs, those are multi-core, but are they multi-core during the build process? Are they multi-core during all the other bits that they're doing?

Josh:        

So usually the more cores that you give to a VM doesn't necessarily make the build faster. Sometimes you'll kind of learn where the peak is. So in part, we want to optimize for developer happiness, things need to run fast, we need to make it cost-effective. The newer CPUs are going to give us the quickest win. The iMac Pros that are coming out … That's really interesting, we haven't experimented with those yet. We're starting to do lots of other experiments with giving more cores and making sure that people are using them effectively because then you get into compiler tricks and how you can actually speed that up. Then there's also caching, the land of CI just gets into this fun part of "there are actually a million little knobs to tweak," and then there's one big easy knob of like faster CPUs.
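Josh's point that more cores do not necessarily make a build faster can be illustrated with Amdahl's law. What follows is a minimal sketch, not Travis CI's tooling: the parallel fraction, the candidate core counts, and the 10% gain threshold are all assumptions chosen for illustration, and a real build would need to be measured rather than modeled.

```python
from typing import Sequence


def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup over one core when only part of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def pick_core_count(parallel_fraction: float,
                    candidates: Sequence[int] = (1, 2, 4, 8, 12),
                    min_gain: float = 0.10) -> int:
    """Walk up the candidate core counts and stop at the first step
    whose relative speedup gain falls below min_gain (hypothetical rule)."""
    chosen = candidates[0]
    for nxt in candidates[1:]:
        gain = (amdahl_speedup(parallel_fraction, nxt)
                / amdahl_speedup(parallel_fraction, chosen)) - 1.0
        if gain < min_gain:
            break
        chosen = nxt
    return chosen


if __name__ == "__main__":
    # A single-threaded job (assumed 0% parallel) gains nothing from extra cores.
    print(pick_core_count(0.0))
    # A build assumed to be 60% parallel stops paying off past a certain size.
    print(pick_core_count(0.6))
```

Under these assumed numbers the single-threaded job stays at one core, while the partly parallel build stops improving meaningfully well before the largest VM size, which is the "peak" Josh describes.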

Greg:       

It's not by accident, but that's how we rack the iMac Pro right there, in a data center. Because again it is a faster processor, it's a little bit more super-charged than the Mac Pro, and for certain applications it really has some value. First of all, I think all of you have answered the fundamental question. You also had a DIY environment for Mac to basically support your dev teams. Then you moved off to outsourcing that where you could and sort of improve it as you go along. I'm sure many of you attendees in the room have lived that same life; I’d love to just get your final thoughts in terms of your overall approach and overall environment, and maybe some helpful hints and tidbits for the audience.

Ray:              

My recommendations are if you guys have questions, ask us once we have some time because I'm sure we have very similar questions or perhaps we don't have the answers but other people in the audience might have the answers. Everybody getting out there and just asking the questions that you've got about those environments would be helpful because I found the macOS environment to be one of the environments that is not the easiest to find answers on Stack Overflow for. But my final thoughts are going back to what Josh said – the tuning in VMware has been really, really helpful to us. We found that we have a lot of CI jobs that really don't need very many CPU cores.

Ray:  

They're running tests, and they're running functional tests, and they're not doing a build. So using VMware we can really leverage these Mac hosts to their full potential. We can run hundreds of VMs on 12 hosts because they really only need one core to run the operating system and to run some functional tests. Then we can parallelize our functional tests and we can get them done really fast and then we can also have some of the VMs that have eight cores so they can do our Xcode build as fast as possible. This is something that's pretty powerful with VMware, I feel like. It's something that we wouldn't really have the capability of doing if we were doing bare metal hardware and having an infrastructure provider like MacStadium is huge to us.

Ray:  

Developers are getting more and more used to having cloud providers. I can't tell you the number of times somebody's come up to me and said, "Hey, how come we can't just use AWS and just deploy a Mac over there?" I'm like, "Oh man, if I could I would." I'm being completely honest, I want the easiest tool for the job. So having something that resembles that as closely as possible is something that MacStadium can provide to us by utilizing VMware. VMware has some nice automation such as DRS, which allows you to treat it like a cloud as closely as possible. I really don't want to have to worry about tuning it and doing things like that. I want that to get figured out by itself, and DRS has some of those capabilities. If you've seen the other talks, you've probably seen more information about that.

Alex:

Definitely look at your options. I think there are things like Travis out there that you may be able to use and that may meet your needs. If you find yourself requiring a more secure environment-

Josh:          

Then continue to use Travis.

Alex:       

I'd say invest in the automation and make sure you're able to provide a stable service. A lot of the back and forth we worked on with the development teams was getting them to understand the value of consistency and then helping draw the bounds where they understood when the test fails and the infrastructure's healthy, that means the test is working. That does not mean to call us and panic. "The system's broken," is what we would hear. It took a while to gain the confidence of those folks that the tools were working and very healthy. I think that's something where you want to have tests that you can show everything's healthy from an infrastructure/platform perspective and push that back on them to figure out why their tests are failing or why somebody committed something that's ultimately breaking the pipeline.

Ray:

Completely true. I think a lot of us are very small teams. When you're that small of a team, that much back and forth just creates so many problems. I'm completely with you there. It took a while to build that trust, because our environment was so unstable for the longest time. A lot of times, I'd look at it and, "a test failed." "That is not my problem. This is CI doing its job." Once you can get it to be very stable, then you gain the developers' trust and they get what they need to get out of it.
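The distinction Alex and Ray draw, between a failing test on healthy infrastructure and a genuinely broken platform, can be made explicit in tooling. This is a hedged sketch rather than either team's actual system: the health-check names are hypothetical, and each check is just a callable that returns True when that part of the platform is healthy.

```python
from enum import Enum
from typing import Callable, Dict


class Verdict(Enum):
    PASSED = "passed"
    TEST_FAILURE = "test failure: CI is doing its job"
    INFRA_FAILURE = "infrastructure failure: platform team should look"


def classify(job_succeeded: bool,
             health_checks: Dict[str, Callable[[], bool]]) -> Verdict:
    """Blame the infrastructure only if some platform health check fails."""
    if job_succeeded:
        return Verdict.PASSED
    failing = [name for name, check in health_checks.items() if not check()]
    return Verdict.INFRA_FAILURE if failing else Verdict.TEST_FAILURE


if __name__ == "__main__":
    # Hypothetical checks; real ones might probe vCenter, storage, or a VM template.
    checks = {
        "vcenter_reachable": lambda: True,
        "template_boots": lambda: True,
    }
    print(classify(False, checks).value)
```

Publishing the result of checks like these alongside each failed job gives developers evidence that the platform was healthy when their test failed.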

Josh:        

We were having a bit of a debrief in the speakers room before coming here, and we were all talking about our artisanal, one-of-a-kind setups. We've all got little differences, but what we also realized is that there is a tremendous amount of similarities. We're all doing maybe something a little bit differently to achieve a different means, but it kind of reminded me of how we do it. We need to talk about it more so there's more that we as Travis want to talk about how we use VMware, how we use MacStadium, but how we essentially provision 55,000 VMs per day and why we do it in a certain way and how we build these VMs. Also, how we need to share more so we can actually bring improvements across this.

Josh:             

I think the biggest improvements that are coming to VMware that we've seen incrementally throughout the years have been the automation improvements that are built into the APIs. For us, we don't want to open up the HTML5 client and click around. We want this to be as automated as possible so we can hook this into CI jobs, we can hook this into CLIs that our developers can run when doing maintenance tasks. As you said, we've got an incredibly small team. Across the six million builds that we're doing per month, and those are builds not jobs, we've only got two people on our Mac team. While we've got eight people in our infrastructure team, that's two people on Mac servicing our VMware cluster, working with MacStadium, working with our developers and how to make that better.

Josh:           

A lot of this is us planning for how we can continue to grow, because Mac use is not shrinking. I'm betting there's 80% of people here have got an iPhone. You're all here because you're interested in Mac in some shape or form. Is that an iPhone or … yeah, there we go. At least we've got one person here with an iPhone. I think that's me kind of sidetracking and waffling a bit.

Greg:

Preston?

Preston:         

I'd say one thing, to bridge off that point, is that MacStadium wants to facilitate a place where not only you use our infrastructure but where everybody can collaborate. It goes to that same point. Everybody has the same chain of things that they have to do and accomplish. If they're not stepping on each other's toes, they're internal. We're trying to get something done. Like anything else, the community is a huge portion of that. Sharing that back on how things work is what we want to be able to do, because that one small company may have figured out how to do something that multi-billion dollar companies might not have seen. That's the difference is everything running perfectly well.

Greg:         

We're not competing on best practices. Your companies may be competing on your core products, and that's fine. But if we can share those best practices, everybody wins. One final question, since you brought it up: AWS. As you know, you can't run macOS on a Windows machine. You can run Windows and Linux on a Mac machine. A couple of our customers who have spun up environments and are working them well through their CI process for Mac builds, they're contemplating or have already brought over some of their Android builds. I’m just curious – this is not a test – I'm just curious, have you considered that as well?

Ray:          

Yeah. My team is very, very small. We can barely support the desktop applications team and, technically, the mobile team, but it's really just the iOS side. The Android side today, they don't actually utilize AWS themselves. Like I was mentioning, Box is very different depending on what team you're on. They use core productivity engineering teams' infrastructure, which is actually in-house infrastructure, which is … I forget what it is to be completely honest. I don't really deal with the Android side of things. I do know that we do have some blade servers hosted by you guys. We run a DHCP server there. We do run some Windows machines that we need to have them as long running.

Ray:          

AWS, one of the good things you can do there is you can spin up a machine and then you can throw it away and you can pay for it for its usage. If you have something longer lived, that's not really the best purpose. It wouldn't be my recommendation to run that necessarily on AWS. That would be something that we're actually looking to bring over to the blade servers. Stuff that we do continuous testing on for our staging environment for functional testing, for example. We could run it on AWS, but it's a little bit more expensive.

Greg:          

Got it.

Alex:        

I can talk to that a little bit. We run Android emulation on HP blades that live in our MacStadium environment. That's really because on the Android side, you have the opportunity in AWS to run a lot of your build. That's where we'll run Android builds. We have an Android build pool that we maintain in AWS. We have an Android emulation pool that we maintain on a combination of the Mac and kind of more commoditized hardware. Then we have our iOS build pool that we'll maintain. That's really the three things that are giving developers on the iOS and Android side the opportunity to run unit tests and run UI tests, as well as all their other build needs.

Josh:        

Can you repeat the question just once more so I can make sure I don't sidetrack too much?

Greg:         

Just any Android, any non-Mac builds, moving over to the Mac infrastructure?

Josh:           

Right now, we use a mixture of AWS and Google for our Linux. We use MacStadium for our Mac and iOS. We're not looking at moving our hosted usage, our hosted Linux over. What we are interested in is how we can use VMware and Macs with on-prem clients. When they're using Travis CI Enterprise, they get Mac and Linux and Windows all within one configuration and setup. It's, I guess, less complicated licensing wise and also less complex than having to have, "Here's your Linux and here's your Mac." If we can give people the simplicity of having an entire CI setup on VMware, we see that as a very interesting product.

Greg:

Excellent. Well, thank you very much for sharing your comments, insights, and experiences. We've got about 10 minutes left. I'd love to open up the floor for questions. I'm sure you've got some questions you'd like to ask the panel.

Speaker 1:

Actually, I have a question, maybe for the MacStadium guys. How do you guys deal with managing the infrastructure? Do you have out-of-band stuff? I mean, in some of the pictures, it looked like you guys have external PCI chassis for maybe 10 gig cards, but I didn't see anything for management. How do you guys deal with that?

Preston:             

As far as managing ESXi and things like that?

Speaker 1:           

I'm talking about if you have a drive fail, you have a PSOD or something like that. Is there anything that you can do so you don't have to actually be in the physical colocation space?

Preston:

Depending on what kind of hardware's actually sitting there, you can do remote boots. It's basically when we give access to the customer. Even through the dashboard, they have access to the power controls. With Apple hardware, you can set things to do restart on power on. You can set that pre-installation.

Speaker 1:            

But with an HP, for example –

Greg:          

Knock on wood, our rate of failure on the Mac hardware is really, really low. In most of our data centers, we have full-time staff, especially with the Mac minis.

Speaker 1:          

With an HP, you can remote into an iLO? Do customers just not have visibility if for some reason it's not responding to ping or they can't spin up a VM or something like that? Then they just reach out to you guys and you guys reboot it or something like that?

Preston:

Essentially, that would work, because nobody's running a single host or two hosts. Everybody has multiple hosts. You have internal alerting that'll say, "This host is not responding." We're actually working on some extra deeper alerting, because the one problem we have, it being Apple hardware, even hearing from VMware, Apple hardware's a black box. They have new features in 6.5, like proactive HA, for instance. That can work on HP, that can work on Dell, that can work on Cisco UCS. That can't work on Apple, because nobody has any idea what's internally working on it. No clue what's going on there.

Preston:

Essentially, we rely on alerting and VMware saying, "This host is not responding," or the management network not responding, and then proactively removing that host from the cluster and putting a new one in its place. On the side, we can worry about, "What caused that one specifically to fail?" Everything just continues on. HA does its job, VMs and builds keep going. It's transparent to whatever their customers or internal teams are doing; they can keep running while that happens in the background, with hosts moving in and out.
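The replace-on-failure flow Preston describes can be sketched in a few lines. This is not MacStadium's tooling and does not call any vSphere API: the host names are hypothetical, and the responsiveness check is a stand-in for whatever alerting reports that a host has stopped responding.

```python
from typing import Callable, List, Tuple


def replace_unresponsive(cluster: List[str],
                         spares: List[str],
                         is_responding: Callable[[str], bool]
                         ) -> List[Tuple[str, str]]:
    """Swap each unresponsive host for a spare, in place.

    Returns (failed_host, replacement) pairs so the failed hardware
    can be investigated later, off the critical path.
    """
    swaps = []
    for index, host in enumerate(cluster):
        if is_responding(host):
            continue
        if not spares:
            break  # nothing left to swap in; leave remaining hosts as they are
        replacement = spares.pop(0)
        cluster[index] = replacement
        swaps.append((host, replacement))
    return swaps


if __name__ == "__main__":
    cluster = ["mac-01", "mac-02", "mac-03"]   # hypothetical host names
    spares = ["spare-01"]
    down = {"mac-02"}                          # pretend alerting flagged this host
    print(replace_unresponsive(cluster, spares, lambda h: h not in down))
    print(cluster)
```

Keeping the list of swaps separate from the swap itself mirrors the point made above: the cluster is restored first, and the root cause of the failure is examined afterwards.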

Greg:            

In some cases, we have a non-Mac or HP blade cluster for management of the Mac environment, both utilizing the same for basically stored images and so forth.

Speaker 1:          

Also, I heard you guys talk a little bit about the builds or the OSs that you make available to your customers or the end users. How do you generate those? Are there any specific tools used to build the OS and get Xcode on it, or does someone do it by hand? Is it automation?

Ray:            

For our desktop team, it's actually mostly done by hand. For the iOS team, we use Ansible for most of the provisioning. We'll take the base ISO file and then there are some things that we'll do manually. I don't really recall. I don't work on the iOS templates very much. For example, if it's going to take a 60-line AppleScript, we might just do that one time manually, save that as an image, never mess with that again, and then build with Ansible on top of that. That's what we do.

Speaker 1:           

The last question, do you guys use APFS at all? I know that it's not necessary with the Xcode. Have you dealt with it at all?

Preston:           

Yes. We got APFS working in 6.7. With 6.5, you essentially had to create a little hack to ignore APFS to get [macOS] 10.13 running.

Speaker 1:

Basically, it told it to install in HFS Plus instead then?

Preston:          

Yeah. Now with 6.7, APFS is natively supported. Essentially, the only thing you're doing, via terminal before you boot up that virtual machine with the ISO attached, is configuring and formatting the storage. It runs perfectly normally.

Speaker 1:

Do you have a lot of requests for it, or is it very uncommon?

Preston:

What'd you say?

Alex:

Do you have a lot of requests for APFS or is it pretty uncommon?

Preston:

Now, yes, with the fact that 10.14 is coming out very soon. Get lots of requests on, "Hey, when is 6.7 supported?"

Speaker 1:

Do you have any concerns that, when you have a mass upgrade once, what is it, Mojave comes out, there's going to be a lot of need to do that?

Preston:

Starting with 6.5, they added Update Manager directly, and that was a big help. It no longer puts the burden on us to say, "You have to shut down everything. We've got to rebuild everything for a customer," or "We've got to have separate update managers running." It's part of vCenter, you can do it. What we're doing now is empowering the customer to have access to their vCenter. That's the way we're going towards that. You run your update and ta-da, there it is. You just basically do the updates for it.

Preston:

It's been a little tough sometimes with the multiple changes in ESXi coming out. VMware, if you're running on Dell or any other standard hardware, everything seems to work perfectly fine. Then you try it on Apple hardware and things are clearly broken. We ran into that with 6.5, the GA release. A lot of customers wanted to move to it and the internal M2 drives. Basically, they removed one of the drivers that was present in 5.5 and 6, and it was not present in 6.5. We were wondering, "What's happened?"

Speaker 1:

I think originally they didn't list the Mac Pro on the HCL and then they added it after…?

Preston:

Then they added it, yes.

Greg:  

As a VMware partner, too, we very, very frequently provide that real-life comment back to VMware for their development folks. We send feedback back to Apple, too, 'cause at some point it'd be nice to get them in the same room, but that'll never happen.

Speaker 1:

Thanks for answering the questions.

Greg:

Thank you.

Preston:

Thank you.

Greg:

Yes, sir?

Speaker 2:

I have a couple questions, too, along the same line as the infrastructure questions. How do you guys handle firmware updates to the Mac hardware that only get released in macOS upgrade cycles and also the networking side of it? How do you guys get 10G to the backend SAN with the limitations of the Mac hardware?

Greg:

Preston?

Preston:

Essentially, we use hardware boxes from Sonnet that connect to the Mac Pros. That enables us to utilize 10 gig and Fibre Channel. That throughput has been a saving grace, because otherwise you have your standard one gig copper. Everybody here pretty much knows that Apple does not … they're not enterprise. None of this stuff is enterprise. You have to basically use a little elbow grease to get these things to work in that sense. As far as firmware goes, we have done some updates to it, depending on when that hardware was actually released. We've noticed from the VMware side that there wasn't this need to constantly keep up with the change, because it didn't really make too much difference. It was more on the bare metal side versus the ESXi side. I heard from VMware that they do a lot of guesswork on getting ESXi to run on Mac, simply because they really just don't know. A memory pointer, for instance. "What does this do?" "I don't know. Let's just see what it does." "Oh, look. It works." "All right. We're good. We're set."

Greg:

These are some of the things that we've spent our time and energy working on. We've got six patents around how we rack and stack Mac infrastructure. I mean, the Mac mini has got a single power supply. In a data center, you've got A and B. Well, we create C. We deliver A and B power to every single Mac device, even a Mac mini. This is our patented sled, where it has a PCI Express Sonnet box with some additional connectors to it, dual 10 gig cards, as well as fiber to the SAN. We've taken that, the Mac Pro in particular, and added the right connectivity and the right structure to it to modularize it to make it work in a data center. Then it becomes a very scalable Mac resource, just like an HP blade or a Windows or Dell and who else is in the room here. Yes, sir?

Speaker 3:

Hi. I work for a college in California. One of the things is that our whole infrastructure at the college is VDI. We don't have true computers on anything. One of the questions I have is, anybody thought about just … we have a programming class that's interested in Xcode. For us to spend $20,000 on Mac computers is just not reasonable. Anybody thought about just virtualizing the Xcode program and of getting it onto Horizon 7? Is that even possible? I'm willing to work with Mac.

Greg:

I'll take the answer from the infrastructure side. We have a university in London that uses our – we have a VDI software component; in fact, we just acquired the company that does that – with 75 Mac minis in a resource pool that they can expose to the VDI as a basically virtual Mac instance to the students. Again, it's that one-to-one relationship, but instead of buying 70 minis for that session, that mini is now allocated to that virtual instance of the mini using the VDI software. There are ways to really do that as opposed to hacking into all the Xcode and that kind of stuff. That's more of the infrastructure side.

Speaker 3:  

Thank you.

Greg:

Absolutely. Yes, sir? We've got time for probably one more question and we've saved it for you.

Speaker 4:

More of a selfish question. You're obviously running a SAN system. Is there anything to stop you connecting a smaller number of Macs up and running VSAN across them instead? Have you tried that? Can you run VSAN across a cluster of Mac minis instead?

Preston:

We've done it in testing, and done it on Xserve as well, given that Xserve has been deprecated for quite a while and we were trying to figure out, "What do we do with all this hardware?" We've run VSAN on there, but the performance has really been somewhat sketchy, mainly because of the hardware and the HCL. There are ways to get the Mac Pro running on there as well … Addo, I believe is the company, has external hard drives. You keep the M2 drive internally for cache and then storage on the external. It works, but we've seen better performance from still just keeping to standard Fibre Channel or IP-based storage versus VSAN. I'm not sure if that's based on the Apple hardware itself or something else in the ether that's happening, why it's not as quick.

Speaker 4:

You could use the external PCI boxes and the PCI hard drive in there, have your 10 gig connection as well?

Preston:

Yeah. Essentially, you still meet the requirements. The fact that the Apple Mac Pro is on the HCL, it technically is supported.

Speaker 4:  

That'll do it for me.

Greg:  

Thank you very much. Again, we're here for any questions afterwards. I encourage and ask you to please fill out your survey. This is useful data that we use afterwards. Obviously, you can enter into a drawing for a VMware company store card. Thank you very much for your time, and thanks again for coming.

