Article Three: My text-to-img Journey - Software

This is the third post in a series of articles I have been writing about text-to-image software. In this series, I will talk about how the technology works, ways you can use it, how to set up your own system to batch out images, technical advice on hardware and software, advice on how to get the images you want, and some of the legal, ethical, and cultural challenges we are seeing around this technology. My goal is to keep it practical, so that anyone with a little basic computer knowledge can understand and use these articles to add another tool to their art toolbox, as cheaply and practically as possible. In this third post, I will discuss software options and detail the install of Automatic1111 stable-diffusion-webui on an Ubuntu Linux system.

To summarize the first two posts, I briefly touched on how generated images are created using a combination of neural networks and other machine learning techniques, which require specific hardware (or an equivalent cloud environment) to run. The software breaks down the text, commonly referred to as a prompt, into smaller parts and generates imagery for each part before combining it all into a final image.

Overall, text-to-image software is a powerful tool that can be used for a variety of applications, from creating illustrations for books and articles to generating realistic images for video games and movies. As the technology continues to improve, it's likely that we'll see even more sophisticated and realistic images generated from text descriptions in the years to come.

If you have read this far, I am going to assume you are on the journey to create your own images. There are many options to choose from, including web-based tools (such as the new Bing Edge plugin), cloud resources, and software that you can download and run on your own computer. To save time and stick to what I have tested, I am focusing on just a couple of different tools in this article. You can search for commercial and open-source tools by entering “text-to-image software” in your favorite search engine to see all your options.

Before we start talking about the tools, let's download a text-to-image model. I frequent two sites for models, huggingface.co and Civitai.com, and these links will take you to our first model. Click here to start the download. It is a big file, at almost 8GB, so if size or bandwidth is an issue, look around and see what you like. Most models are between 2.5GB and 8GB.
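
If you would rather grab the model from the command line on your Linux server (handy if you are working over SSH), a wget command like the sketch below works. The filename and URL here are placeholders only; browse the model page, copy the direct download link for the file you actually want, and substitute it in:
    # Placeholder URL and filename - copy the real download link from the model page
    wget -O ~/Downloads/8buffgen.safetensors "https://huggingface.co/EightBuff/8Buff_Gen/resolve/main/YOUR-MODEL-FILE.safetensors"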

The two tools I personally tested are DiffusionBee, for Mac, and Automatic1111 stable-diffusion-webui, which I installed on a Mac for testing and later on Linux for long-term use and production. First, I will quickly discuss DiffusionBee, then move on to the install of Automatic1111 on Linux.

DiffusionBee - A free Stable Diffusion interface you can download for your Apple Mac. The software is somewhat limited, but simple and easy to install for any Mac user. It requires an M1/M2 chipset, and I personally recommend 16GB of system RAM. You can use custom models in checkpoint (.ckpt) files, but DiffusionBee did not support safetensors (.safetensors) files at the time of testing. (More on the difference between those file types in a later article.) DiffusionBee is a great option if you have a Mac and just want to experiment a little. The detailed documentation and links here will enable you to get started quickly: https://github.com/divamgupta/diffusionbee-stable-diffusion-ui/blob/master/DOCUMENTATION.md


Automatic1111 stable-diffusion-webui can be installed on Microsoft Windows, Mac, or Linux. I have used it on Mac and Linux with great results. It is no surprise that hardware makes a huge difference in the speed of image creation, especially when increasing pixel counts and sampling steps. Currently I have it on a dedicated Linux server with an Nvidia RTX A4500 20GB video card, an i5 CPU, and 32GB of RAM, and it runs great. I can create 760x760 images with 50 sampling steps in about 30 seconds. Most important to me, I can merge models and train models directly from images. The web browser user interface is open on my home Wi-Fi, so I can access the page from anywhere in range, from any device that runs a Chrome browser. The rest of this article will walk you through setting up a similar system for your own image creation.
I am going to assume that you have a physical server up and running, with Ubuntu Linux installed, the basic default Graphical User Interface (GUI) working, and an internet connection. There are many great install guides for Ubuntu, so if you need any help, a quick search should solve most of your issues. If you are one of our Patreon patrons, feel free to message us if you need any help.
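
One optional sanity check I suggest before installing anything (my own habit, not an official requirement): if your server has an Nvidia card, confirm that the driver is loaded, since image generation will be painfully slow, or fail, without it:
    # Should print your GPU model, driver version, and VRAM usage.
    # If the command is not found, install the Nvidia driver first (Ubuntu's "Additional Drivers" tool works well).
    nvidia-smi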

First, let’s get updates for the installed software and the Ubuntu OS to reduce issues with the install; then we will walk through the install of Automatic1111. If you run into issues updating software and Ubuntu, see here for help, leave a comment on this article, or hit us up on Patreon.

Automatic1111 Installation on Ubuntu (or a Debian-based Linux OS). Follow the steps below. See the GitHub site for more information if needed:
  1. Open a terminal window. Fetch updates by typing the following command at the prompt: 
    sudo apt-get update
    OR 
    sudo apt update 
  2. To install the newest version of all packages, type the following command at the prompt:
    sudo apt-get upgrade
    OR 
    sudo apt upgrade
  3. Sometimes a new Linux kernel is installed to address security issues and a reboot is needed. It’s not a bad idea to reboot after updates to ensure everything restarts as expected. Reboot by typing the following command at the prompt: sudo reboot
  4. After you have rebooted and are back to the terminal, install the dependencies for AUTOMATIC1111/stable-diffusion-webui by typing in the terminal window:
    sudo apt-get install wget git python3 python3-venv 
    OR 
    sudo apt install wget git python3 python3-venv
  5. Now for the last piece. To install AUTOMATIC1111/stable-diffusion-webui in your home directory, under a new directory called “stable-diffusion-webui”, run the following command:
    bash <(wget -qO- https://raw.githubusercontent.com/AUTOMATIC1111/stable-diffusion-webui/master/webui.sh)
  6. To run the program you need to make sure you are in the stable-diffusion-webui directory and then run webui.sh. You can do this by entering the following command:
    cd /home/$(whoami)/stable-diffusion-webui/
    THEN type in the terminal window:
    ./webui.sh
  7. This will run for a minute. If the install was successful, the last lines of output will look like the text and image below:
    Running on local URL:  http://127.0.0.1:7860
    To create a public link, set `share=True` in `launch()`.
  8. Now you can open a browser on the Linux box (Chrome is recommended) and enter the HTTP address. The address is your computer's local loopback address followed by the port the webUI is running on: http://127.0.0.1:7860
  9. This should bring up the webUI, which should look like the picture below:

  10. You will need a model for the next step, as mentioned above. If you have already downloaded the file, move on to step 11 to move the model to the correct location. If not, any checkpoint or safetensors file will do; you can download one from either of these locations: Hugging Face - https://huggingface.co/EightBuff/8Buff_Gen/tree/main or Civitai - https://civitai.com/models/41928/8buffgen
  11. To install the checkpoint or safetensors file, move the downloaded file to the "/home/$(whoami)/stable-diffusion-webui/models/Stable-diffusion" directory on your Linux server (see the example command after this list). After you have moved or copied the checkpoint or safetensors file, scroll to the bottom of the webUI and click the “Reload UI” button. This may take up to a minute to reload. If it fails, just refresh the web page until it reloads.
  12. Once you have the page back up, in the top left corner you will see a drop-down that says “Stable Diffusion Checkpoint”. To pick the checkpoint or safetensors file that you want, click on the down arrow to see all the checkpoint and safetensors files you have available. Note that when you switch between them, it will take a few seconds to load.
  13. Now let's test it. In the “txt2img” tab, click in the top-left open text box and type “plant growing,” making sure it is followed by a comma. Then click the big orange “Generate” button. See the photo below for reference.
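
As mentioned in step 11, here is what moving a downloaded model into place can look like from the terminal. The filename is only an example; use the name of the file you actually downloaded:
    # Example only - adjust the filename to match your download
    mv ~/Downloads/8buffgen.safetensors /home/$(whoami)/stable-diffusion-webui/models/Stable-diffusion/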

If you got an image, Congratulations! You have a working text-to-image server!

If you want to pause now and generate a few pictures, go ahead. I will wait. It works, but we are not done yet. Once you are done, come back and read the next article, where I will show you how to open the webUI page to your home network for access from any computer, and how to set up the software to auto-start on reboot and auto-restart when errors occur.
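
If you cannot wait, here is a quick preview (we will cover it properly next time): starting the webui with the --listen flag makes the page reachable from other devices on your local network. Read the security notes below before doing this.
    # Makes the webUI reachable from other devices on your local network
    ./webui.sh --listen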

NOTES:
  • The Safari browser has some issues with the webUI; use Chrome or a similar browser
  • Some notes we will go over in more detail in later articles:
    • WebUI has some security, but I would not consider it secure. Some security features/issues to note:
      • You cannot install extensions or updates unless you are running locally
      • Traffic is not encrypted
  • If you start the webui with --listen, you share the interface on your local network
  • If you start the webui with --share, you share it with the world through a public Gradio link; not recommended unless you want to share your GPU cycles.
  • Like all software, the webui will crash at times. Setting up a systemd service for the webui to auto-start on reboot and auto-restart when errors occur will save time and headaches (a minimal sketch follows these notes).
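
To give you a head start on that last note, here is a minimal sketch of what such a systemd service can look like. It assumes a username of "youruser" and the default install path from this article; adjust both to match your system, save the file with sudo as /etc/systemd/system/stable-diffusion-webui.service, and we will walk through this setup properly in the next article.
    # Sketch only - replace "youruser" with your actual username
    [Unit]
    Description=AUTOMATIC1111 stable-diffusion-webui
    After=network.target

    [Service]
    User=youruser
    WorkingDirectory=/home/youruser/stable-diffusion-webui
    ExecStart=/home/youruser/stable-diffusion-webui/webui.sh
    Restart=on-failure
    RestartSec=10

    [Install]
    WantedBy=multi-user.target
After saving the file, reload systemd and enable the service with: sudo systemctl daemon-reload && sudo systemctl enable --now stable-diffusion-webui.service
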
If you have questions, feedback, or issues please leave a comment on this post or contact us through our Patreon page.
