Article Two: My text-to-img Journey - Hardware

This is the second post in a series of articles I have been writing about text-to-image software. In this series, I will cover how the technology works, ways you can use it, how to set up your own system to batch out images, technical advice on hardware and software, tips for getting the images you want, and some of the legal, ethical, and cultural challenges we are seeing around this technology. My goal is to keep it practical, so that anyone with a little basic computer knowledge can use these articles to add another tool to their art toolbox as cheaply and practically as possible. In this second post, I want to go into your options for running in the cloud or on physical hardware. I will outline some options, then tell you why I went with physical hardware.

Many people ask me how I make the AI images I do. Most are asking because they are amazed by the images and not so interested in the process. A few are asking because they want to do the same. Most people in this second group are aware of the free text-to-image sites on the web that generate one image at a time. When I tell them I can batch out one high-resolution image a minute, they start getting excited. For those of us who want control over the tools we use to create, we know understanding the tool is the key to making it do what we want.


To get the images you want, there are some important steps to take. I will break these steps down into three areas:

  • Hardware environment 

  • Software

  • Models


In this article I will focus on the hardware.


Hardware is important because neural networks and bots require speed. Specifically, they require GPU (Graphics Processing Unit) cycles, and those are expensive. You can use cloud resources (Google Cloud Platform or GCP, Amazon Web Services or AWS, Microsoft Azure…), but that requires strong base cloud knowledge and can quickly become expensive. Search for “Cloud GPU Pricing” in your favorite search engine and you will see the starting costs. That pricing may seem low, until you factor in the number of GPUs, GPU memory, and processing time; it quickly adds up if you want to process a lot of images. One option is looking at free/low-cost services such as Google Colab, but again that takes a strong technical base. I may write some articles on that later, but let's move on to physical hardware.
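
To get a feel for how quickly that adds up, here is a rough back-of-the-envelope sketch in Python. The hourly rate, throughput, and image count below are hypothetical placeholders, not quotes from any provider, so plug in the real numbers you find.

    # Back-of-the-envelope cloud GPU cost estimate.
    # All figures are hypothetical placeholders for illustration only.
    HOURLY_GPU_RATE_USD = 1.50   # assumed per-hour price for a single mid-range GPU instance
    IMAGES_PER_MINUTE = 1        # roughly what a modest dedicated GPU can batch out
    TARGET_IMAGES = 5000         # an assumed month of steady batching

    hours_needed = TARGET_IMAGES / (IMAGES_PER_MINUTE * 60)
    estimated_cost = hours_needed * HOURLY_GPU_RATE_USD
    print(f"{hours_needed:.1f} GPU-hours -> about ${estimated_cost:.2f}")
    # 83.3 GPU-hours -> about $125.00, before any storage or data transfer charges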


For my personal use I went with a physical computer for multiple reasons:

  1. Control - My computer is next to my desk, on a UPS (uninterruptible power supply) that protects it from power spikes and outages. I can keep making images even if the power goes out.

  2. Cost - The initial cost of buying a physical system is higher, but the only additional cost after that is electricity (see the rough break-even sketch after this list). I am using Linux and open-source software, so I have no cost for licenses. Most importantly, I don’t have to track my monthly cloud computing expenses to ensure I didn’t exceed my monthly budget.

  3. Accessibility - I have a web front end that I can access anywhere my Wi-Fi reaches. I will cover how to do this in a later article.

  4. Security - My server sits on my internal network and only connects to the internet for updates and new model checkpoints.

  5. Comfortable knowledge base - I have built more computers than I can count at this point. I know physical computer hardware. 
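
To put rough numbers on the cost argument in point 2, here is a simple break-even sketch. The hardware price, electricity cost, and cloud rate are all assumptions for illustration; swap in your own figures.

    # Rough break-even between buying a dedicated machine and renting cloud GPU time.
    # All figures are hypothetical assumptions for illustration only.
    HARDWARE_COST_USD = 1800         # assumed one-time cost of a mid-range GPU build
    ELECTRICITY_PER_HOUR_USD = 0.05  # assumed power cost while the box is generating
    CLOUD_RATE_PER_HOUR_USD = 1.50   # assumed per-hour cloud GPU price

    # Every hour of generation at home saves the difference between the two rates.
    hourly_savings = CLOUD_RATE_PER_HOUR_USD - ELECTRICITY_PER_HOUR_USD
    break_even_hours = HARDWARE_COST_USD / hourly_savings
    print(f"Break-even after roughly {break_even_hours:.0f} hours of image generation")
    # Break-even after roughly 1241 hours of image generation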


These are my reasons for going with a physical server. Depending on your computer knowledge, cloud experience, and budget, you may choose differently. Most of the later articles will focus on the software and process, so it won’t matter much which direction you go with the environment. The key is finding the environment you can afford and are most comfortable with when you have to troubleshoot issues.


Before I go into the specs I suggest for a server, let's talk about the Apple Mac option. When I was first experimenting with open-source software options, I used my MacBook Pro for some image creation. It worked surprisingly well, but there are a few points to keep in mind:

  • You NEED an M1 or better CPU. M1 works, but M2 is better.

  • You NEED at LEAST 16 GB of RAM. Apple's unified memory is shared between the CPU and GPU, so 16GB on a MacBook Pro is roughly comparable to an 8GB video card on a Windows or Linux machine. This will limit the size of the images you can generate and your options for training your own models.

  • You will run out of storage space. Once you download all those model checkpoints and generate all those images, that 512GB of storage will be full in no time.

  • You will be tethered to a wall. A MacBook Pro will chew through its battery while processing images. No kidding: I went through a fully charged battery in about an hour of continuous image processing, while I can go all day on one charge for just email and web surfing.


Since Apple's desktops and the Mac mini use the same M1 and M2 processors, those may be a great option if portability isn’t a concern. My opinion on the Mac for text-to-image generation is that it is a great option if you can only afford one high-end system that you still need to use for everyday things. Make sure your Apple system has at least 16GB RAM, an M1 chip, and 512GB of storage; better would be 32GB RAM, an M2 Pro chip, and 2TB of storage.
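
If you do go the Mac route and run a PyTorch-based tool, a few lines of Python will confirm whether the Apple GPU is actually being used. This is a minimal sketch that assumes a recent PyTorch build on macOS 12.3 or later; the MPS backend is what exposes the M-series GPU to PyTorch.

    # Check whether PyTorch can see the Apple Silicon GPU (the "MPS" backend).
    # Assumes a recent PyTorch build on macOS 12.3 or later.
    import torch

    if torch.backends.mps.is_available():
        device = torch.device("mps")
        print("Apple GPU (MPS) available - generation will run on the M-series GPU")
    else:
        device = torch.device("cpu")
        print("MPS not available - falling back to CPU, which will be painfully slow")

    # Quick sanity test: run a small tensor operation on the chosen device.
    x = torch.rand(3, 3, device=device)
    print(x.device)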


If you are fortunate enough to be able to build or buy a separate system, I highly recommend it. Here is what I recommend for minimum physical hardware specifications.


I strongly recommend you spend as much of your budget as you can on the video card. An Nvidia RTX card is recommended because there is more software support for it. My minimum recommended graphics card memory is 8GB; I highly recommend 16GB, and if you can get more memory, do it. If you want to train your own models, you will need at least a 16GB graphics card, preferably higher.
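
Once a card is installed, a couple of lines of Python will confirm how much VRAM the software can actually see. This sketch assumes an Nvidia card with a working driver and the CUDA build of PyTorch installed.

    # Report the installed Nvidia GPU and how much VRAM it has.
    # Assumes a working Nvidia driver and the CUDA build of PyTorch.
    import torch

    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        vram_gb = props.total_memory / 1024**3
        print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")
    else:
        print("No CUDA-capable GPU detected - check your Nvidia driver install")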


For the operating system, I highly recommend Linux, but use Windows if that is what you are comfortable with. I will go through some of the Linux tuning in future articles, but there are many resources on the web for using Windows.


If you decide not to use a physical hardware system, there are other resources on the web. Below I have listed some of those, including links from a couple of the open-source projects and the primary software that I will be using in later articles, Automatic1111 stable-diffusion-webui:


Automatic1111 stable-diffusion-webui - follow the Online Services link in the project's documentation:

  • Google Colab

DiffusionBee - a Stable Diffusion interface for Apple Macs. Limited, but simple to install and use.


Keep an eye out for the upcoming article, which will focus on installing Automatic1111 on a Linux server and tweaking it so you can connect from anywhere on your network and it automatically restarts when issues occur, allowing you to focus on creating instead of the technology.
