Maybe the best way to give a proper answer is to define these terms separately.
Image annotation can be simply defined as the process of assigning labels to images so that they become understandable to a Machine Learning model during training, which should lead to highly valuable data outcomes. A simple example: an image containing a single object (an animal, or whatever you imagine). A human annotator labels it with the correct animal name and forwards it to the dataset that is being prepared for Machine Learning training and processing. It may sound simple, but try to imagine processing thousands of images with multiple objects in them on a daily basis. This process is one of the essentials of Artificial Intelligence and its related fields of learning and development (Machine Learning, Deep Learning, Computer Vision).
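To make the idea concrete, here is a minimal sketch of what a manual annotation workflow might produce. All names (file paths, labels, helper functions) are illustrative, not any particular tool's API:

```python
# Minimal sketch of manual image annotation (all names are illustrative).
# Each record pairs an image file with the label a human annotator assigned,
# ready to be collected into a training dataset.

def annotate(image_path, label):
    """Return a single annotation record for one image."""
    return {"image": image_path, "label": label}

def build_dataset(annotations):
    """Index annotation records by label, forming a simple dataset base."""
    dataset = {}
    for record in annotations:
        dataset.setdefault(record["label"], []).append(record["image"])
    return dataset

records = [
    annotate("img_001.jpg", "cat"),
    annotate("img_002.jpg", "dog"),
    annotate("img_003.jpg", "cat"),
]
dataset = build_dataset(records)
print(dataset)  # {'cat': ['img_001.jpg', 'img_003.jpg'], 'dog': ['img_002.jpg']}
```

In real projects these records usually also carry coordinates (bounding boxes or polygons) per object, which is exactly why images with multiple objects multiply the workload.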
Automatic image annotation is the process in which a computer system or Machine Learning model automatically assigns metadata to a digital image, in the form of a caption or keyword. In practice this means the machine recognizes an object it has seen in a previous training dataset and annotates the new image according to those earlier examples.
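A toy sketch of that idea, assuming we already have feature vectors for each image (the two-number vectors below stand in for real image features; production systems use trained neural networks instead of this nearest-neighbour shortcut):

```python
# Toy automatic annotation: the "model" memorises previously labelled
# examples (feature vector -> label) and tags a new image with the label
# of its closest previous example. Feature vectors are illustrative.

def nearest_label(features, examples):
    """Return the label of the closest previously annotated example."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(examples, key=lambda ex: sq_dist(features, ex["features"]))
    return best["label"]

training = [
    {"features": [0.9, 0.1], "label": "cat"},
    {"features": [0.1, 0.9], "label": "dog"},
]

print(nearest_label([0.8, 0.2], training))  # cat
```

The point is the data flow, not the algorithm: earlier human-annotated examples are what let the machine label new images on its own.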
Image segmentation is the process of separating one image into multiple segments. Every pixel in each segment represents the semantic concept of its label. This process is highly sensitive to human error, such as fatigue after reviewing numerous images and their details. Image segmentation can be split into 3 process types:
- Semantic segmentation: Process where each pixel in the image is associated with a semantic label, regardless of instances
- Instance segmentation: Process where each object in the image is annotated at the pixel level. You can think of it as a pixel-accurate bounding box
- Panoptic segmentation: This process is a combination of the 2 listed above, where each pixel is associated with both a semantic label and the instance of the object it belongs to.
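The three outputs can be sketched side by side on a tiny made-up "image" (a 4x4 grid with two separate objects of the same class; class 1 = "animal", class 0 = background, all values illustrative):

```python
# Semantic segmentation: one class label per pixel; the two objects
# of the same class are NOT distinguished from each other.
semantic = [
    [0, 1, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

# Instance segmentation: each object gets its own id (1 and 2),
# even though both belong to the same semantic class.
instance = [
    [0, 1, 0, 2],
    [0, 1, 0, 2],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

# Panoptic segmentation combines both: each pixel carries a
# (class, instance) pair.
panoptic = [
    [(c, i) for c, i in zip(c_row, i_row)]
    for c_row, i_row in zip(semantic, instance)
]

print(panoptic[0])  # [(0, 0), (1, 1), (0, 0), (1, 2)]
```

Reading the first row of `panoptic`: the two animal pixels share semantic class 1 but carry different instance ids, which is exactly the extra information instance and panoptic segmentation add over plain semantic segmentation.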