If you have string data that contains non-ASCII characters and want to strip all of them out, the following regular expression will help you.
[^\u0000-\u007F]+
Explanation
- [^\u0000-\u007F]+ matches a single character not present in the list below
  Quantifier: + matches between one and unlimited times, as many times as possible
- \u0000-\u007F a single character in the range between the following two characters
- \u0000 the character with code point U+0000 (NUL)
- \u007F the character with code point U+007F (DEL)
- ^ is the not operator. Placed first inside the brackets, it tells the regex to match every character that is not in the list, instead of every character that is.
The \u####-\u#### part says which characters are in the range. \u0000-\u007F covers the first 128 code points of Unicode, which are exactly the ASCII characters (in UTF-8 each of them is encoded as a single byte). Because of the not operator, the expression matches every non-ASCII character.
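To see where the range ends, here is a minimal sketch that checks a few characters against the ASCII range. The class name AsciiRangeDemo and the helper isAscii are made up for this illustration; the only library call assumed is String.matches from the standard library.

```java
class AsciiRangeDemo {
    // Hypothetical helper: true when every character lies in U+0000..U+007F.
    static boolean isAscii(String s) {
        // Double backslashes so the regex engine, not the compiler, reads the escapes.
        return s.matches("[\\u0000-\\u007F]*");
    }

    public static void main(String[] args) {
        System.out.println(isAscii("A"));       // true: U+0041 is inside the range
        System.out.println(isAscii("\u007F"));  // true: DEL is the last ASCII character
        System.out.println(isAscii("\u0080"));  // false: first code point past ASCII
        System.out.println(isAscii("\u00E9"));  // false: e with acute accent
    }
}
```

Anything from U+0080 upward falls outside the range, which is why the negated class picks it up.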
I had a string like the one below, which contained many non-standard characters
name 1= Chanel 51������������������������������������������������������������
Applying Java's replaceAll method as below
String s = "name 1= Chanel 51������������������������������������������������������������";
s = s.replaceAll("[^\\u0000-\\u007F]+", "");
System.out.println(s);
would output the following to the console
name 1= Chanel 51
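If the same cleanup runs on many strings, one possible variation is to compile the pattern once and reuse it, since String.replaceAll compiles the regex again on every call. This is only a sketch: the class name AsciiFilter, the method stripNonAscii and the shortened sample string are assumptions for the example, not part of the original code.

```java
import java.util.regex.Pattern;

class AsciiFilter {
    // Compiled once and reused for every call.
    private static final Pattern NON_ASCII = Pattern.compile("[^\\u0000-\\u007F]+");

    // Hypothetical helper: removes every run of non-ASCII characters.
    static String stripNonAscii(String input) {
        return NON_ASCII.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Three U+FFFD replacement characters stand in for the garbled tail.
        String s = "name 1= Chanel 51\uFFFD\uFFFD\uFFFD";
        System.out.println(stripNonAscii(s)); // prints: name 1= Chanel 51
    }
}
```

Java's Pattern class also has a predefined \p{ASCII} class, so "[^\\p{ASCII}]+" should behave the same way if you prefer a named range.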
Test it here: https://regex101.com
ASCII Table for reference.
Extended ASCII characters