How-to Guides: What is AI

Artificial Intelligence (AI) is a relatively young technology, dating back to the 1950s, but it wasn't until 2006, when researchers introduced the concept of deep learning, that AI really came into the spotlight. The AI we're talking about today refers specifically to the boom set off by ChatGPT, and more precisely to "generative AI".

Generative AI is an artificial intelligence technology that can create or generate new content based on existing data. It's like a robot that can draw and paint: after studying thousands of paintings, it can produce a new one on its own. This kind of AI works in areas such as music, text, images, and video by understanding patterns and regularities in data and then applying that knowledge to create something new. In short, generative AI is an intelligent system that can "imagine" and create new things.

Generative Artificial Intelligence (GenAI)

Generative AI can be understood as AI that generates content from a prompt. Depending on the output medium, it can be categorized into these directions: text-to-text, text-to-image, text-to-video, and text-to-audio.

Text-to-text: ChatGPT, Claude, and similar large language models can be understood as text-to-text: give them a prompt, and they will generate text according to your requirements. The scenarios for generated text include writing, summarization, dialogue, sentiment analysis, and more.

Text-to-image and image-to-image: Stable Diffusion and Midjourney are based on diffusion techniques; you input a prompt and the AI generates a corresponding image.

Text-to-video: Sora, which exploded in popularity some time ago, is OpenAI's text-to-video technology; it sparked worldwide attention on the strength of preview videos released by OpenAI alone, without any public beta yet. Stable Video Diffusion (SVD), also based on diffusion techniques, is in development as well.

Text-to-audio: TTS (text-to-speech) technology can be used for voice cloning, speech synthesis, music synthesis, and so on. I have not studied these technologies systematically; I only know that they offer these capabilities.

By maturity, the ordering is roughly "text-to-text > text-to-audio > text-to-image > text-to-video". The AI technologies now being discussed all center on these generative directions. If you don't need them in your life or work, then perhaps you don't need AI and your life can still be good; but if you are interested in AI and want to step into this torrent, then let's start learning AI.

The minimum necessary knowledge to master in the field of AI, in my opinion, is the principles and use of generative AI. As mentioned above, by understanding the most popular generative large language models and Stable Diffusion techniques and applying them in your own work and life, you can already be ahead of 80% of people.

Use a Large Language Model for the First Time

This step is like print("Hello World") for someone learning to code for the first time: it welcomes you into the world of AI.

Nowadays there are many large language models to choose from, such as ChatGPT, Claude, and Gemini. Open one of their websites or apps and issue your first command, and you have entered the new question-and-answer paradigm of generative AI.

Prompt is a word you will see constantly from here on, in text-to-image, text-to-video, and beyond. You provide a prompt to the model, and it generates content according to that prompt. In the field of large language models, a good prompt can significantly improve the quality of what the model generates. If you will be working with text-generation scenarios, learning structured prompts is a must.
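A structured prompt simply groups the instructions into labeled sections. The section names below (Role, Context, Task, Output format) are one common convention for illustration, not an official standard:

```python
# A sketch of a "structured prompt" template. The four section names
# are a common convention, not a fixed standard.
def build_structured_prompt(role: str, context: str, task: str,
                            output_format: str) -> str:
    """Assemble the four sections into one prompt string."""
    return (
        f"# Role\n{role}\n\n"
        f"# Context\n{context}\n\n"
        f"# Task\n{task}\n\n"
        f"# Output format\n{output_format}"
    )

prompt = build_structured_prompt(
    role="You are an experienced technical editor.",
    context="The reader is a beginner learning generative AI.",
    task="Summarize the article below in three bullet points.",
    output_format="A Markdown list, each bullet under 20 words.",
)
print(prompt)
```

Paste the resulting string into any chat model; the labeled sections make it much easier for the model (and for you) to keep the instructions straight.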

Generate Pictures with Midjourney

Mastering Midjourney allows you to produce great-looking pictures even if you can't draw. Enter a few keywords in Midjourney and it will generate them for you. Midjourney has its own techniques for prompts; you can read the official Midjourney documentation to learn them.
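A Midjourney prompt is a comma-separated description followed by parameter flags. The flags below (--ar for aspect ratio, --v for model version, --stylize for stylization strength) are real Midjourney options; the subject and style wording is just an illustration:

```python
# A sketch of composing a Midjourney prompt string.
# --ar, --v and --stylize are documented Midjourney parameters.
def midjourney_prompt(subject: str, style: str, aspect_ratio: str = "16:9",
                      version: int = 6, stylize: int = 100) -> str:
    """Join a subject, style keywords, and parameter flags."""
    return f"{subject}, {style} --ar {aspect_ratio} --v {version} --stylize {stylize}"

print(midjourney_prompt("a lighthouse at dusk", "watercolor, soft light"))
```

You would paste the resulting string after the /imagine command in Discord; check the official parameter list, since flags change between versions.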

Generate Images with Stable Diffusion

In theory, both Midjourney and Stable Diffusion use diffusion techniques. So why experience Stable Diffusion after you've experienced Midjourney? Because although they are based on the same technology, they take completely different routes. Midjourney's vision is to lower the difficulty of producing good-looking images; it takes a closed-source approach, training on images from a large number of artists so that users can easily produce good-looking results. Stable Diffusion's vision, on the other hand, is to make diffusion technology more stable, and it has taken the open-source route. Thanks to that open-source ecosystem, many designers and developers have built a very powerful ecosystem of models on top of Stable Diffusion.

In other words, Midjourney only lets you experience image-generation models; to really learn and study them, you still have to learn Stable Diffusion. Deploying Stable Diffusion locally requires substantial computing power, but you can try it through a number of online services, for example Stability AI's official playground, DreamStudio: enter a prompt and select different models to generate pictures.

Generate Music with Suno AI

Suno recently updated its v3 model, and the quality of the music it generates has improved so much that it's getting a lot of attention. You can ask ChatGPT to generate some lyrics, then hand them to Suno to turn into a piece of music.

Generate a Video with Pika or Runway

Pika and Runway are similar in that they can make a static image move, and some creators are already using these tools to make videos. However, they are still in their infancy and can only produce simple movements. You can take an image generated by Midjourney, upload it to Pika or Runway, and enter a few descriptive words to create a video.

AI Characteristics

The biggest characteristic of the generative AI field is speed: its history is short, it develops fast, and it changes rapidly. For example, the diffusion techniques behind Stable Diffusion only took off around 2020, and Stable Diffusion itself was released in 2022. This means the knowledge we learn is cutting-edge and fragmented; much of it cannot yet be learned directly from books.

A great deal of knowledge comes from the internet: update announcements from large-model vendors, video tutorials, and so on. This relies heavily on an individual's information-processing skills: retrieving information, filtering it, processing it, learning from fragments, and organizing what you learn.

Find Your Interest

There are many sub-directions in AI development, and each is currently growing at a high rate. There are also many different technology routes and application scenarios within each of the four major categories (text-to-text, text-to-image, text-to-audio, and text-to-video).

Different technology routes, such as chatbots, RAG (Retrieval-Augmented Generation), and agents (further divided into single-agent, multi-agent, and autonomous-agent approaches), apply very differently in different scenarios. Facing such a novel field with so many directions, frankly speaking, your ability and energy cannot support doing everything. The best approach is "T-shaped learning": research one area you are most interested in in depth, while maintaining basic learning activity in the others. Here, interest is the best teacher. Find the direction that interests you, study it deeply, learn the most cutting-edge knowledge, and find practical scenarios where it lands; the time you invest will definitely be rewarded. At this very early stage, the ROI of invested time is very high.

Finding GenAI Best Practices in Your Field

Learning and thinking go hand in hand, and the only way to really learn deeply is to act, get feedback, and reflect on it. Learning by doing is the fastest and most effective. Apply what you learn about AI to events in your work and life; knowledge acquired by doing is more solid and deeper, and you learn even more by doing and then reviewing. Once you have surveyed the learning areas of AI, find a best practice in a field you are familiar with.

One of the characteristics of generative AI is generalization. Take ChatGPT as an example: I believe every industry and every role may use it differently. Teachers may use ChatGPT for things like student evaluations, but as a product manager I may not understand at all what problems civil servants or teachers would solve with generative AI in their productivity scenarios; only through interviews can I find out, because the application scenarios of generative AI are very decentralized and vertical. So by combining your industry or professional methodology and your knowledge system with GenAI in the fields you know well, you can definitely develop your own "best practices".

For Developers

Use an LLM to Develop an AI Dialog Bot

For the most entry-level integration, you can call any LLM (Large Language Model) vendor's API and combine it with a local visualization WebUI project, such as Gradio or Ollama WebUI, to build a conversational bot that runs locally.
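The wiring can be sketched in a few lines with Gradio's ChatInterface. Here call_llm is a placeholder stub; in a real bot you would replace its body with a vendor API call (requires `pip install gradio`):

```python
# A minimal local chat UI sketch using Gradio.
# `call_llm` is a stub: swap in any vendor's chat-completion API call.
def call_llm(message: str, history) -> str:
    """Placeholder reply; replace with a real LLM API call."""
    return f"(model would answer here) You said: {message}"

def main():
    import gradio as gr  # lazy import: only needed to launch the UI
    # ChatInterface passes (message, history) to the fn on each turn.
    gr.ChatInterface(fn=call_llm, title="Local AI Chatbot").launch()

# main()  # uncomment to launch the web UI in your browser
```

Running main() starts a local web server with a chat window; everything except the model call itself stays on your machine.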

Call a Rich Set of Models with Hugging Face

Hugging Face is a company specializing in Natural Language Processing (NLP) that provides an open-source platform of the same name, designed to promote the development and application of deep learning and natural language processing technologies. On its website, Hugging Face offers a Model Hub where users can find and share a wide variety of pre-trained models and datasets, as well as online demos that let you test many models directly in the browser. You can pick a model to call according to your needs. In addition, Hugging Face develops the powerful Transformers library, which handles pre-training, fine-tuning, and inference for large models.
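Calling a Model Hub checkpoint through the Transformers library is typically a pipeline() call. A sketch, assuming `pip install transformers` (the model id below is a real sentiment-analysis checkpoint, but any Hub id can be substituted):

```python
# A sketch of running a pre-trained Hub model with transformers.pipeline().
MODEL_ID = "distilbert-base-uncased-finetuned-sst-2-english"

def pipeline_args(task: str, model_id: str) -> dict:
    """The arguments we would hand to transformers.pipeline()."""
    return {"task": task, "model": model_id}

def run_sentiment_demo(texts):
    from transformers import pipeline  # lazy import: heavy dependency
    args = pipeline_args("sentiment-analysis", MODEL_ID)
    clf = pipeline(args["task"], model=args["model"])
    return clf(texts)

# run_sentiment_demo(["Hugging Face makes model reuse easy."])
# downloads the model on first run, then returns label/score dicts
```

The same pattern works for other tasks (summarization, translation, text generation); only the task string and model id change.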

Develop a RAG Conversation Bot

The next challenge on the text route is RAG (Retrieval-Augmented Generation): content is stored in vectorized form, each generation is combined with a vector search, and the result closest to the question is passed to the large language model as context to help it generate better answers. You can build this on LangChain, an open-source programming framework designed for developing applications powered by large language models (LLMs). With LangChain's modular design, developers can more easily implement chained calls, memory mechanisms, streaming processing, and so on to complete an AI application.
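The retrieve-then-generate loop can be shown with a toy example. This sketch uses a bag-of-words "vector store" and cosine similarity purely for illustration; real systems use embedding models and vector databases, and the final prompt would go to an LLM:

```python
# A toy RAG sketch: "embed" documents, retrieve the closest one,
# and stuff it into the prompt as context.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count (real RAG uses a model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

documents = [
    "Stable Diffusion is an open source image generation model.",
    "LangChain is a framework for building LLM applications.",
    "Suno generates music from lyrics and style prompts.",
]
index = [(doc, embed(doc)) for doc in documents]  # the "vectorized" store

def retrieve(question: str) -> str:
    """Return the stored document most similar to the question."""
    q = embed(question)
    return max(index, key=lambda item: cosine(q, item[1]))[0]

def build_rag_prompt(question: str) -> str:
    """Combine the retrieved context and the question into one prompt."""
    return (f"Answer using this context:\n{retrieve(question)}\n\n"
            f"Question: {question}")

print(build_rag_prompt("Which framework helps build LLM applications?"))
```

LangChain packages exactly this flow (splitters, embeddings, retrievers, chains) so you don't hand-roll each step.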

Deploy Stable Diffusion by Yourself

If you choose the image-generation route, the most important step is to deploy Stable Diffusion locally. You can build it with Stable Diffusion WebUI or ComfyUI.

Using Open Source Diffusion Models

Model-sharing websites host a large number of models shared by users, which you can download and try out locally; such a site is a bit like Hugging Face for LLMs. But since I don't work in image generation, I don't know much about this area and won't go into it. Of course, there are many more directions that can be explored, but since my interest is in LLMs and my research into other fields is still limited, I can't share more.

AI Creation

The next area of learning AI is creation.

Of course, this step requires stronger expertise: basic knowledge of deep learning, natural language understanding, image understanding, and so on. Since I don't have that yet and am still learning, I can only offer some pointers. One direction is fine-tuning a large language model to solve a problem you have encountered: if you are doing some fixed text-generation task and find that existing large models don't perform well on it, you can adapt one through fine-tuning techniques. The most widely used fine-tuning method today is LoRA, which, put simply, fine-tunes the generation style: by feeding in QA pairs you correct the model's output, and the fine-tuned model will match what you want to generate more closely.
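A back-of-the-envelope sketch of why LoRA is cheap: instead of updating a full d x d weight matrix, it trains two low-rank matrices B (d x r) and A (r x d) so that W' = W + BA. The hidden size and rank below are typical illustrative values, not from any specific model:

```python
# Counting trainable parameters: full fine-tuning vs. LoRA.
def full_finetune_params(d: int) -> int:
    """Updating a d x d weight matrix touches every entry."""
    return d * d

def lora_params(d: int, r: int) -> int:
    """LoRA trains B (d x r) and A (r x d) instead."""
    return 2 * d * r

d, r = 4096, 8                   # a typical hidden size and a small rank
print(full_finetune_params(d))   # 16777216 trainable weights
print(lora_params(d, r))         # 65536 -- about 0.4% of the full matrix
```

This is why LoRA adapters can be trained on a single consumer GPU and shared as small files, while the base model stays frozen.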

At present, major large-language-model vendors all provide fine-tuning interfaces that you can use to fine-tune with your own data; if your model is deployed locally, you can also fine-tune it with the Hugging Face Transformers library mentioned earlier.
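Most vendors' fine-tuning interfaces accept training data as chat-style JSONL, one record per line. The exact field names vary by vendor (the "messages" schema below is one common shape; check your provider's docs), but packaging QA pairs looks roughly like this:

```python
# A sketch of packaging QA pairs into chat-style JSONL training data.
# The "messages" field layout is one common vendor format, not universal.
import json

qa_pairs = [
    ("What is LoRA?", "A low-rank fine-tuning method for large models."),
    ("What does RAG stand for?", "Retrieval-Augmented Generation."),
]

def to_jsonl(pairs) -> str:
    """One JSON record per line: a user question and the target answer."""
    lines = []
    for question, answer in pairs:
        record = {"messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]}
        lines.append(json.dumps(record))
    return "\n".join(lines)

print(to_jsonl(qa_pairs))
```

You would save this output to a .jsonl file and upload it through the vendor's fine-tuning interface.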

Training your Stable Diffusion Model

Stable Diffusion also supports fine-tuning, and it is reportedly simpler to train than fine-tuning text models. As far as I know, you input a large number of images, label them, and submit them for training. Again, since I'm not going down the image-generation route, I can't go into more detail here.


There's not much more I can write beyond the creation part, mainly due to my own limited ability. Knowledge grows fractally: new knowledge always appears at the boundaries. Any piece of knowledge has boundaries, and when you get close enough to it you learn the limits of its use; exploring those boundaries often reveals new knowledge and creates new connections. In the field of creation there are certainly more directions, many at the frontier of current scholarship. This is all I can write with my current shallow knowledge; I'll come back and share more when I have learned more. Human sources of information can be divided into three categories: reading, doing, and talking. We have covered reading and doing above; what remains is communication.

Communication is very useful. Find like-minded friends: besides finding peers, you increase the chances that ideas collide, and colliding ideas often produce sparks.

Whatever you do, find friends in that field. To learn AI, naturally, find friends with common goals to discuss and learn with. Also talk with people who are better than you; that is knowledge you cannot get from books. The content of books is general, while the content of conversations with people is specific. Much important knowledge, and many profound experiences, are difficult to put into words and cannot be recorded in a book. "It is not what you know, but who you know that counts." Knowledge must be acquired through your own effort, whereas connections give you access to the knowledge others hold. If you can use human leverage, you can outsource some of the time you spend learning and let others learn for you. One way to connect with people is to be open about your learning process.

As mentioned above, you can publicize your learning process on platforms like Twitter, sharing your unique ways of using AI, which will attract friends who share your interests. At present, there are still very few people who are willing to dedicate themselves to learning AI, so this circle should not be too big.

Published by YooCare Editor on May 25, 2024 1:13 pm, last updated on May 26, 2024 2:16 am
